# Feature Specification: Restore Run Wizard (011)

**Feature Branch**: `feat/011-restore-run-wizard`  
**Created**: 2025-12-30  
**Status**: Draft  
**Input**: Restore Run Wizard requirements (Safety First / Defensive Restore)

## Overview

Implement **Restore Runs** as a **multi-step Wizard** (instead of a single “Create Restore Run” form) to enforce **Safety First / Defensive Restore**.

Restore is a high-risk workflow. The wizard must guide admins through explicit checkpoints: source selection → scoping → safety checks → preview → confirmation + execution.

## Problem Statement

The current Restore Run creation is a single form that can lead to:

- picking the wrong backup source
- restoring too broad a scope unintentionally
- executing without a structured “risk + preview + explicit confirmation” flow

## Goals

- Make restore a **deliberate, stepwise** process with strong defaults.
- Make **dry-run** the default, and keep “Execute” disabled until all safety gates are satisfied.
- Add **server-side safety/conflict checks** and persist results for auditability.
- Provide a **preview** (diff summary at minimum) before allowing execution.

## Non-Goals (v1)

- Approval workflows / multi-person approvals (but design must not block future addition).
- Perfect diff UX parity with Intune (basic normalized diff output is enough).
- A generic wizard framework (restore-specific implementation is fine).

---

## UX Principles

- **Dry-run default = ON**
- Wizard progression should slow the user down and force explicit decisions.
- “Execute” stays disabled until:
  - Preview has been completed
  - No blocking checks exist
  - “I reviewed the impact” checkbox is checked
  - Tenant hard-confirm matches (Highlander principle)

---

## Wizard Steps

### Step 1 — Select Backup Set (Source of Truth)

**Question:** “What are we restoring from?”

**Inputs**

- Backup Set (required)

**Read-only**

- Snapshot timestamp
- Tenant name
- Count of policies/items
- Types (Config / Security / Scripts …)

**Validation**

- `backup_set_id` is required
- Changing the backup set resets downstream state (scope, checks, preview, confirmation)

### Step 2 — Define Restore Scope (Selectivity)

**Question:** “What exactly should be restored?”

**Inputs**

- Scope mode: `all` (default) or `selected`
- If `selected`: item multiselect with search + select all

**UI**

- Prefer grouping by **type** and **platform**
- Mark “preview-only” types clearly
- Foundations should be discoverable (scope tags, assignment filters, notification templates)

**Notes**

- “Empty = all” applies only when scope mode is `all` (not when `selected`)

### Step 3 — Safety & Conflict Checks (Defensive Layer)

**Question:** “Is this dangerous?”

**Checks (server-side, persisted)**

- Target policy missing in target tenant?
- Target policy newer than backup? (staleness / overwrite risk)
- Assignment conflicts (e.g., mapping required / orphaned groups)
- Scope tag conflicts (mapping required / missing)
- Preview-only policies included in scope (warn and automatically force dry-run)

**Severity**

- ❌ blocking
- ⚠️ warning
- ✅ safe

**Rules**

- Blocking checks prevent execution.
- The wizard may allow proceeding to preview, but must never allow execution while blockers exist.
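
The Step 3 severity rules can be sketched as a small summary function. This is a minimal, language-neutral illustration in Python, not the project's implementation: `Severity`, `CheckResult`, `summarize_checks`, and the check codes are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    BLOCKING = "blocking"
    WARNING = "warning"
    SAFE = "safe"


@dataclass(frozen=True)
class CheckResult:
    code: str          # hypothetical check identifier
    severity: Severity
    message: str


def summarize_checks(results: list[CheckResult]) -> dict:
    """Count results per severity and flag whether execution must stay blocked."""
    counts = {severity.value: 0 for severity in Severity}
    for result in results:
        counts[result.severity.value] += 1
    # A single blocking result is enough to keep Execute disabled.
    return {**counts, "has_blockers": counts["blocking"] > 0}


# Example input with one result of each severity (codes are illustrative only).
example_results = [
    CheckResult("target_policy_newer", Severity.WARNING, "Target policy is newer than the backup."),
    CheckResult("scope_tag_mapping", Severity.BLOCKING, "Scope tag mapping is required."),
    CheckResult("target_policy_present", Severity.SAFE, "Target policy exists in the tenant."),
]
example_summary = summarize_checks(example_results)
```

Keeping the summary separate from the detailed results mirrors the requirement to persist both: the summary drives the Execute gate, while the full list stays available for audit.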
### Step 4 — Preview (Dry-Run Simulation)

**Question:** “What would happen?”

**Outputs**

- Diff summary (at minimum):
  - X policies changed
  - Y assignments changed
  - Z scope tags changed
- Per-item normalized diff (nice-to-have for v1, but plan for it)

**Defaults**

- “Preview only (Dry-run)” is ON by default

### Step 5 — Confirm & Execute (Point of No Return)

**Question:** “Do you really want to do this?”

**Confirmations**

- Checkbox: “I reviewed the impact”
- Tenant hard-confirm input (must match tenant display identifier)
- Environment badge (Prod/Test) highly visible (frozen at run start for audit)

**Rules**

- Execute is disabled if:
  - `dry_run = true`
  - preview has not been completed
  - blockers exist
  - tenant confirm mismatch
  - acknowledgement unchecked

---

## Domain Model (v1-aligned)

We already have a `restore_runs` aggregate (`restore_runs` table) with:

- `backup_set_id`, `requested_items`, `preview`, `results`, `status`, `metadata`, timestamps, and `group_mapping`.

**v1 approach**

- Keep the existing primary key type (bigint) to avoid a disruptive migration.
- Extend the lifecycle/status semantics and persist wizard computations (checks + diff summaries) in structured fields:
  - Prefer adding dedicated JSON columns only if needed; otherwise use `metadata` for wizard state.
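
The Step 5 rules, combined with the preview requirement from the UX Principles, can be sketched as a single gate function. This is a hedged illustration in Python rather than the actual Filament/Laravel code: `WizardState`, its field names, `execute_blockers`, and the reason strings are all hypothetical, and exact string equality for the tenant hard-confirm is an assumption.

```python
from dataclasses import dataclass


@dataclass
class WizardState:
    dry_run: bool = True              # dry-run is ON by default
    preview_completed: bool = False
    blocking_checks: int = 0          # number of blocking check results
    acknowledged: bool = False        # "I reviewed the impact" checkbox
    tenant_confirmation: str = ""     # what the admin typed
    tenant_label: str = ""            # frozen tenant display identifier


def execute_blockers(state: WizardState) -> list[str]:
    """Return every reason Execute must stay disabled (empty list = allowed)."""
    reasons = []
    if state.dry_run:
        reasons.append("dry_run_enabled")
    if not state.preview_completed:
        reasons.append("preview_missing")
    if state.blocking_checks > 0:
        reasons.append("blocking_checks")
    if not state.acknowledged:
        reasons.append("not_acknowledged")
    # Assumption: the confirmation must equal the frozen label exactly.
    if not state.tenant_label or state.tenant_confirmation != state.tenant_label:
        reasons.append("tenant_mismatch")
    return reasons


def can_execute(state: WizardState) -> bool:
    return not execute_blockers(state)
```

Returning the full list of reasons, rather than a bare boolean, lets the UI explain why Execute is disabled. The same function should run server-side so that a manipulated client cannot bypass the gate.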
### RestoreRun Lifecycle (proposed statuses)

`draft → scoped → checked → previewed → queued → running → completed|partial|failed|cancelled`

### Persisted Wizard State (minimum)

- `backup_set_id` (existing)
- `requested_items` (selected IDs, existing)
- `metadata.scope_mode` (`all|selected`)
- `metadata.environment` (`prod|test`)
- `metadata.highlander_label` (tenant identifier string, frozen)
- `metadata.check_summary` + `metadata.check_results` (Step 3)
- `metadata.preview_summary` + `metadata.preview_diffs` (Step 4; diffs may be truncated/limited)
- `metadata.confirmed_at`, `metadata.confirmed_by` (Step 5)

---

## Services / Responsibilities

- **RestoreScopeBuilder**: build selectable restore items (grouped, searchable), include foundations & mark preview-only.
- **RestoreRiskChecker**: run safety checks, return structured results + summary.
- **RestoreDiffGenerator**: generate diff summary (and optionally per-item diffs) for preview.
- **RestoreExecutor**: execute restore (idempotent, tenant/run locking), write detailed outcomes.
- **RestoreRunPolicy**: enforce invariants (no execution without preview + confirmations).

---

## User Scenarios & Testing *(mandatory)*

### User Story 1 — Wizard-driven Restore Run (Priority: P1)

As an admin, I can create a restore run via a 5-step wizard and I cannot accidentally execute without preview + explicit confirmations.

**Why this priority**: This is the safety foundation; without it, restore remains a risky UX.

**Independent Test**: In Filament, create a restore run with dry-run, see checks + preview, and confirm Execute stays disabled until all gates are satisfied.

**Acceptance Scenarios**

1. **Given** I have selected a backup set and moved to a later step, **When** I change the backup set, **Then** scope/check/preview state is reset.
2. **Given** I keep dry-run enabled, **When** I reach Step 5, **Then** Execute is disabled.
3. **Given** I disable dry-run, **When** I have not completed preview, **Then** Execute is disabled.
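
The reset behaviour from Acceptance Scenario 1 can be sketched against the persisted wizard state listed above. This is a minimal Python illustration using a plain dict in place of the `restore_runs` record; `change_backup_set` and `DOWNSTREAM_METADATA_KEYS` are hypothetical names. Two choices here are assumptions rather than requirements from this spec: the status returns to `draft`, and `metadata.environment` and `metadata.highlander_label` are left untouched.

```python
import copy

# Metadata keys produced by Steps 3 to 5; all are cleared on a source change.
DOWNSTREAM_METADATA_KEYS = (
    "check_summary",
    "check_results",
    "preview_summary",
    "preview_diffs",
    "confirmed_at",
    "confirmed_by",
)


def change_backup_set(run: dict, new_backup_set_id: int) -> dict:
    """Return a copy of the run with downstream wizard state reset."""
    if run["backup_set_id"] == new_backup_set_id:
        return run  # nothing changed, keep existing state

    updated = copy.deepcopy(run)
    updated["backup_set_id"] = new_backup_set_id
    updated["requested_items"] = []           # scope selection no longer valid
    updated["status"] = "draft"               # assumption: restart the lifecycle
    updated["metadata"]["scope_mode"] = "all"  # back to the default scope mode
    for key in DOWNSTREAM_METADATA_KEYS:
        updated["metadata"].pop(key, None)
    return updated
```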

---

### User Story 2 — Safety Checks block execution (Priority: P1)

As an admin, I see blocking vs warning checks, and execution is blocked when blockers exist.

**Why this priority**: Defensive restore requires an explicit risk layer.

**Independent Test**: Create a scope that triggers a blocking check and verify execution cannot proceed.

**Acceptance Scenarios**

1. **Given** a blocking check exists, **When** I reach Step 5, **Then** Execute remains disabled and blockers are visible.
2. **Given** only warnings exist, **When** I acknowledge impact and hard-confirm tenant, **Then** I can execute (dry-run off).

---

### User Story 3 — Preview diff summary (Priority: P2)

As an admin, I can preview what would change before executing restore.

**Why this priority**: A restore without preview is operationally unsafe.

**Independent Test**: Run Step 4 preview and verify diff summary is computed and persisted on the RestoreRun.

**Acceptance Scenarios**

1. **Given** I scoped items, **When** I run preview, **Then** I see a summary (changed policies count) and it persists on the restore run.

---

## Edge Cases

- Very large backup sets (hundreds/thousands of items): selection/search must remain responsive.
- Switching backup set mid-flow resets downstream state safely.
- Policies not present in target tenant: shown as warning/blocker depending on restore mode.
- RBAC-limited tenant setup: checks must clearly show “inventory/restore may be partial”.

---

## Functional Requirements

- **FR-011.1**: System MUST implement Restore Run creation as a 5-step wizard in Filament.
- **FR-011.2**: System MUST default `dry_run = true` and prevent execution while dry-run is enabled.
- **FR-011.3**: System MUST run server-side safety checks and persist results (summary + details) for audit.
- **FR-011.4**: System MUST generate at least a diff summary on preview and persist it.
- **FR-011.5**: System MUST require explicit acknowledgement + tenant hard-confirm before allowing execution.
- **FR-011.6**: System MUST freeze environment badge and tenant label for audit on run creation.
- **FR-011.7**: System MUST keep execution disabled if any blocking checks exist.
- **FR-011.8**: System MUST record execution outcomes and leave an auditable trail (existing audit log patterns).

---

## Success Criteria

- **SC-011.1**: Admins can only execute after preview + confirmations; no accidental execution path exists.
- **SC-011.2**: Blocking checks reliably prevent execution.
- **SC-011.3**: Preview produces a persisted summary for every run.