From 6844bc1c17bd255047c2bb01875ad9d3c99c1904 Mon Sep 17 00:00:00 2001
From: Ahmed Darrazi
Date: Tue, 30 Dec 2025 02:56:28 +0100
Subject: [PATCH] spec: restore run wizard

---
 specs/011-restore-run-wizard/plan.md  |  75 +++++++++
 specs/011-restore-run-wizard/spec.md  | 224 ++++++++++++++++++++++++++
 specs/011-restore-run-wizard/tasks.md |  43 +++++
 3 files changed, 342 insertions(+)
 create mode 100644 specs/011-restore-run-wizard/plan.md
 create mode 100644 specs/011-restore-run-wizard/spec.md
 create mode 100644 specs/011-restore-run-wizard/tasks.md

diff --git a/specs/011-restore-run-wizard/plan.md b/specs/011-restore-run-wizard/plan.md
new file mode 100644
index 0000000..36c191d
--- /dev/null
+++ b/specs/011-restore-run-wizard/plan.md
@@ -0,0 +1,75 @@
+# Implementation Plan: Restore Run Wizard (011)
+
+**Branch**: `feat/011-restore-run-wizard` | **Date**: 2025-12-30
+**Input**: Feature specification in `specs/011-restore-run-wizard/spec.md`
+
+## Summary
+Refactor Restore Run creation into a **Filament Wizard** that enforces **Safety First**:
+source → scope → safety checks → preview → confirm + execute.
+
+Leverage existing restore primitives (`RestoreService::preview()` / `RestoreService::execute()`) and incrementally introduce:
+- structured **risk checks**
+- **diff preview** artifacts/summaries
+- stronger **execution gating** + audit fields
+
+## Technical Context (current code)
+- Filament Resource: `app/Filament/Resources/RestoreRunResource.php` (single form today)
+- Restore engine: `app/Services/Intune/RestoreService.php` (preview + execute)
+- Diff tools: `app/Services/Intune/PolicyNormalizer.php` + `app/Services/Intune/VersionDiff.php`
+- Data model: `restore_runs` already stores `preview`, `results`, `metadata`, `requested_items`
+
+## Phase 1 — Data + State Model (Wizard-ready)
+- Define restore run lifecycle statuses (string enum values).
+- Decide what is stored as dedicated columns vs `restore_runs.metadata` JSON.
+- Add minimal persistence for wizard state:
+  - `scope_mode`, `check_summary`, `check_results`, `preview_summary`, `confirmed_at/by`, `environment`, `highlander_label`.
+
+**Checkpoint**: RestoreRun can represent wizard progression and persist computed results (checks, preview).
+
+## Phase 2 — Filament Wizard UI (Create Restore Run)
+- Replace the single Create form with a 5-step wizard UI.
+- Implement step-level validation and state resets (changing the backup set resets downstream state).
+- Keep dry-run default ON, and make execution UI unavailable until the wizard rules are satisfied.
+
+**Checkpoint**: Wizard is usable end-to-end in dry-run.
+
+## Phase 3 — Restore Scope Builder (Selection UX)
+- Build grouped selection UI for BackupItems (type/platform), with search and “select all”.
+- Clearly mark:
+  - foundations vs policies
+  - preview-only types
+  - items missing policy_version linkage / snapshot completeness hints (optional)
+
+**Checkpoint**: Scoping is explicit, scalable, and safe.
+
+## Phase 4 — Safety & Conflict Checks (RestoreRiskChecker)
+- Implement server-side checks for the chosen scope.
+- Persist results on the RestoreRun and display with severity badges.
+- Block execution if blockers exist.
+
+**Checkpoint**: Defensive layer in place; blockers stop execution.
+
+## Phase 5 — Preview (RestoreDiffGenerator)
+- Generate a diff summary (minimum) comparing backup snapshot vs current target state.
+- Persist preview summary (and optionally per-item diffs with limits).
+- Require preview completion before allowing execute.
+
+**Checkpoint**: Preview step is a hard gate for execute and is auditable.
+
+## Phase 6 — Confirm & Execute
+- Add explicit confirmations:
+  - “I reviewed the impact”
+  - tenant hard-confirm (Highlander)
+  - environment badge (frozen at run creation)
+- Execute restore via queue job (preferred) or synchronous execution (only if queue is out of scope for MVP).
+- Update run statuses and persist outcomes.
+
+**Checkpoint**: Execution is safe, gated, and traceable.
+
+## Phase 7 — Tests + QA
+- Pest feature tests for:
+  - wizard gating rules (execute disabled until conditions satisfied)
+  - safety checks persistence and blocking behavior
+  - preview summary generation
+- Run targeted tests and Pint.
+
diff --git a/specs/011-restore-run-wizard/spec.md b/specs/011-restore-run-wizard/spec.md
new file mode 100644
index 0000000..96b6b9b
--- /dev/null
+++ b/specs/011-restore-run-wizard/spec.md
@@ -0,0 +1,224 @@
+# Feature Specification: Restore Run Wizard (011)
+
+**Feature Branch**: `feat/011-restore-run-wizard`
+**Created**: 2025-12-30
+**Status**: Draft
+**Input**: Restore Run Wizard requirements (Safety First / Defensive Restore)
+
+## Overview
+Implement **Restore Runs** as a **multi-step Wizard** (instead of a single “Create Restore Run” form) to enforce **Safety First / Defensive Restore**.
+
+Restore is a high-risk workflow. The wizard must guide admins through explicit checkpoints:
+source selection → scoping → safety checks → preview → confirmation + execution.
+
+## Problem Statement
+The current Restore Run creation is a single form that can lead to:
+- picking the wrong backup source
+- restoring too broad a scope unintentionally
+- executing without a structured “risk + preview + explicit confirmation” flow
+
+## Goals
+- Make restore a **deliberate, stepwise** process with strong defaults.
+- Make **dry-run** the default, and keep “Execute” disabled until all safety gates are satisfied.
+- Add **server-side safety/conflict checks** and persist results for auditability.
+- Provide a **preview** (diff summary at minimum) before allowing execution.
+
+## Non-Goals (v1)
+- Approval workflows / multi-person approvals (but the design must not block adding them later).
+- Perfect diff UX parity with Intune (basic normalized diff output is enough).
+- A generic wizard framework (restore-specific implementation is fine).
+
+---
+
+## UX Principles
+- **Dry-run default = ON**
+- Wizard progression should slow the user down and force explicit decisions.
+- “Execute” stays disabled until:
+  - Preview has been completed
+  - No blocking checks exist
+  - “I reviewed the impact” checkbox is checked
+  - Tenant hard-confirm matches (Highlander principle)
+
+---
+
+## Wizard Steps
+
+### Step 1 — Select Backup Set (Source of Truth)
+**Question:** “What are we restoring from?”
+
+**Inputs**
+- Backup Set (required)
+
+**Read-only**
+- Snapshot timestamp
+- Tenant name
+- Count of policies/items
+- Types (Config / Security / Scripts …)
+
+**Validation**
+- `backup_set_id` is required
+- Changing the backup set resets downstream state (scope, checks, preview, confirmation)
+
+### Step 2 — Define Restore Scope (Selectivity)
+**Question:** “What exactly should be restored?”
+
+**Inputs**
+- Scope mode: `all` (default) or `selected`
+- If `selected`: item multiselect with search + select all
+
+**UI**
+- Prefer grouping by **type** and **platform**
+- Mark “preview-only” types clearly
+- Foundations should be discoverable (scope tags, assignment filters, notification templates)
+
+**Notes**
+- “Empty = all” only when scope mode is `all` (not when `selected`)
+
+### Step 3 — Safety & Conflict Checks (Defensive Layer)
+**Question:** “Is this dangerous?”
+
+**Checks (server-side, persisted)**
+- Target policy missing in target tenant?
+- Target policy newer than backup? (staleness / overwrite risk)
+- Assignment conflicts (e.g., mapping required / orphaned groups)
+- Scope tag conflicts (mapping required / missing)
+- Preview-only policies included in scope (should raise a warning and force dry-run)
+
+**Severity**
+- ❌ blocking
+- ⚠️ warning
+- ✅ safe
+
+**Rules**
+- Blocking checks prevent execution.
+- The wizard may allow proceeding to preview, but must never allow execution while blockers exist.
+
+### Step 4 — Preview (Dry-Run Simulation)
+**Question:** “What would happen?”
+
+**Outputs**
+- Diff summary (at minimum):
+  - X policies changed
+  - Y assignments changed
+  - Z scope tags changed
+- Per-item normalized diff (nice-to-have for v1, but plan for it)
+
+**Defaults**
+- “Preview only (Dry-run)” is ON by default
+
+### Step 5 — Confirm & Execute (Point of No Return)
+**Question:** “Do you really want to do this?”
+
+**Confirmations**
+- Checkbox: “I reviewed the impact”
+- Tenant hard-confirm input (must match tenant display identifier)
+- Environment badge (Prod/Test), highly visible (frozen at run creation for audit)
+
+**Rules**
+- Execute disabled if:
+  - `dry_run = true`
+  - blockers exist
+  - tenant confirm mismatch
+  - acknowledgement unchecked
+
+---
+
+## Domain Model (v1-aligned)
+We already have a `restore_runs` aggregate (`restore_runs` table) with:
+- `backup_set_id`, `requested_items`, `preview`, `results`, `status`, `metadata`, timestamps, and `group_mapping`.
+
+**v1 approach**
+- Keep the existing primary key type (bigint) to avoid a disruptive migration.
+- Extend the lifecycle/status semantics and persist wizard computations (checks + diff summaries) in structured fields:
+  - Prefer adding dedicated JSON columns only if needed; otherwise use `metadata` for wizard state.
+
+### RestoreRun Lifecycle (proposed statuses)
+`draft → scoped → checked → previewed → queued → running → completed|partial|failed|cancelled`
+
+### Persisted Wizard State (minimum)
+- `backup_set_id` (existing)
+- `requested_items` (selected IDs, existing)
+- `metadata.scope_mode` (`all|selected`)
+- `metadata.environment` (`prod|test`)
+- `metadata.highlander_label` (tenant identifier string, frozen)
+- `metadata.check_summary` + `metadata.check_results` (Step 3)
+- `metadata.preview_summary` + `metadata.preview_diffs` (Step 4; diffs may be truncated/limited)
+- `metadata.confirmed_at`, `metadata.confirmed_by` (Step 5)
+
+---
+
+## Services / Responsibilities
+- **RestoreScopeBuilder**: build selectable restore items (grouped, searchable), include foundations & mark preview-only.
+- **RestoreRiskChecker**: run safety checks, return structured results + summary.
+- **RestoreDiffGenerator**: generate diff summary (and optionally per-item diffs) for preview.
+- **RestoreExecutor**: execute restore (idempotent, tenant/run locking), write detailed outcomes.
+- **RestoreRunPolicy**: enforce invariants (no execution without preview + confirmations).
+
+---
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 — Wizard-driven Restore Run (Priority: P1)
+As an admin, I can create a restore run via a 5-step wizard and I cannot accidentally execute without preview + explicit confirmations.
+
+**Why this priority**: This is the safety foundation; without it, the restore UX remains risky.
+
+**Independent Test**: In Filament, create a restore run with dry-run, see checks + preview, and confirm execute stays disabled until all gates are satisfied.
+
+**Acceptance Scenarios**
+1. **Given** I selected a backup set and moved past Step 1, **When** I change the backup set, **Then** scope, check, preview, and confirmation state is reset.
+2. **Given** I keep dry-run enabled, **When** I reach Step 5, **Then** Execute is disabled.
+3. **Given** I disable dry-run, **When** I have not completed preview, **Then** Execute is disabled.
+
+---
+
+### User Story 2 — Safety Checks block execution (Priority: P1)
+As an admin, I see blocking vs warning checks, and execution is blocked when blockers exist.
+
+**Why this priority**: Defensive restore requires an explicit risk layer.
+
+**Independent Test**: Create a scope that triggers a blocking check and verify execution cannot proceed.
+
+**Acceptance Scenarios**
+1. **Given** a blocking check exists, **When** I reach Step 5, **Then** Execute remains disabled and blockers are visible.
+2. **Given** only warnings exist, **When** I acknowledge impact and hard-confirm tenant, **Then** I can execute (dry-run off).
+
+---
+
+### User Story 3 — Preview diff summary (Priority: P2)
+As an admin, I can preview what would change before executing restore.
+
+**Why this priority**: A restore without preview is operationally unsafe.
+
+**Independent Test**: Run Step 4 preview and verify diff summary is computed and persisted on the RestoreRun.
+
+**Acceptance Scenarios**
+1. **Given** I scoped items, **When** I run preview, **Then** I see a summary (changed policies count) and it persists on the restore run.
+
+---
+
+## Edge Cases
+- Very large backup sets (hundreds/thousands of items): selection/search must remain responsive.
+- Switching backup set mid-flow resets downstream state safely.
+- Policies not present in target tenant: shown as warning/blocker depending on restore mode.
+- RBAC-limited tenant setup: checks must clearly show “inventory/restore may be partial”.
+
+---
+
+## Functional Requirements
+- **FR-011.1**: System MUST implement Restore Run creation as a 5-step wizard in Filament.
+- **FR-011.2**: System MUST default `dry_run = true` and prevent execution while dry-run is enabled.
+- **FR-011.3**: System MUST run server-side safety checks and persist results (summary + details) for audit.
+- **FR-011.4**: System MUST generate at least a diff summary on preview and persist it.
+- **FR-011.5**: System MUST require explicit acknowledgement + tenant hard-confirm before allowing execution.
+- **FR-011.6**: System MUST freeze the environment badge and tenant label at run creation for audit.
+- **FR-011.7**: System MUST keep execution disabled if any blocking checks exist.
+- **FR-011.8**: System MUST record execution outcomes and leave an auditable trail (existing audit log patterns).
+
+---
+
+## Success Criteria
+- **SC-011.1**: Admins can only execute after preview + confirmations; no accidental execution path exists.
+- **SC-011.2**: Blocking checks reliably prevent execution.
+- **SC-011.3**: Preview produces a persisted summary for every run.
+
diff --git a/specs/011-restore-run-wizard/tasks.md b/specs/011-restore-run-wizard/tasks.md
new file mode 100644
index 0000000..323e85b
--- /dev/null
+++ b/specs/011-restore-run-wizard/tasks.md
@@ -0,0 +1,43 @@
+# Tasks: Restore Run Wizard (011)
+
+**Branch**: `feat/011-restore-run-wizard` | **Date**: 2025-12-30
+**Input**: `specs/011-restore-run-wizard/spec.md`, `specs/011-restore-run-wizard/plan.md`
+
+## Phase 0 — Specs (this PR)
+- [x] T001 Create `spec.md`, `plan.md`, `tasks.md` for Feature 011.
+
+## Phase 1 — Data Model + Status Semantics
+- [ ] T002 Define RestoreRun lifecycle statuses and transitions (draft→scoped→checked→previewed→queued→running→completed|partial|failed|cancelled).
+- [ ] T003 Add minimal persistence for wizard state (prefer JSON in `restore_runs.metadata` unless columns are required).
+- [ ] T004 Freeze `environment` + `highlander_label` at run creation for audit.
+
+## Phase 2 — Filament Wizard (Create Restore Run)
+- [ ] T005 Replace current single-form create with a 5-step wizard (Step 1–5 as in spec).
+- [ ] T006 Ensure changing `backup_set_id` resets downstream wizard state.
+- [ ] T007 Enforce “dry-run default ON” and keep execute disabled until all gates are satisfied.
+
+## Phase 3 — Restore Scope UX
+- [ ] T008 Implement scoped selection UI grouped by policy type + platform with search and bulk toggle.
+- [ ] T009 Mark preview-only types clearly and ensure they never execute.
+- [ ] T010 Ensure foundations are discoverable (assignment filters, scope tags, notification templates).
+
+## Phase 4 — Safety & Conflict Checks
+- [ ] T011 Implement `RestoreRiskChecker` (server-side) and persist `check_summary` + `check_results`.
+- [ ] T012 Render check results with severity (blocking/warning/safe) and block execute when blockers exist.
+
+## Phase 5 — Preview (Diff)
+- [ ] T013 Implement `RestoreDiffGenerator` using `PolicyNormalizer` + `VersionDiff`.
+- [ ] T014 Persist preview summary (and per-item diffs with safe limits) and require preview completion before execute.
+
+## Phase 6 — Confirm & Execute
+- [ ] T015 Implement Step 5 confirmations (ack checkbox + tenant hard-confirm).
+- [ ] T016 Execute restore via a queued Job (preferred) and update statuses + timestamps.
+- [ ] T017 Persist execution outcomes and ensure audit logging entries exist for execution start/finish.
+
+## Phase 7 — Tests + Formatting
+- [ ] T018 Add Pest tests for wizard gating rules and status transitions.
+- [ ] T019 Add Pest tests for safety checks persistence and blocking behavior.
+- [ ] T020 Add Pest tests for preview summary generation.
+- [ ] T021 Run `./vendor/bin/pint --dirty`.
+- [ ] T022 Run targeted tests (e.g. `./vendor/bin/sail artisan test --filter=RestoreRunWizard` once tests exist).
+
-- 
2.45.2