From 6844bc1c17bd255047c2bb01875ad9d3c99c1904 Mon Sep 17 00:00:00 2001
From: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Date: Tue, 30 Dec 2025 02:56:28 +0100
Subject: [PATCH] spec: restore run wizard

---
 specs/011-restore-run-wizard/plan.md  |  75 +++++++++
 specs/011-restore-run-wizard/spec.md  | 224 ++++++++++++++++++++++++++
 specs/011-restore-run-wizard/tasks.md |  43 +++++
 3 files changed, 342 insertions(+)
 create mode 100644 specs/011-restore-run-wizard/plan.md
 create mode 100644 specs/011-restore-run-wizard/spec.md
 create mode 100644 specs/011-restore-run-wizard/tasks.md

diff --git a/specs/011-restore-run-wizard/plan.md b/specs/011-restore-run-wizard/plan.md
new file mode 100644
index 0000000..36c191d
--- /dev/null
+++ b/specs/011-restore-run-wizard/plan.md
@@ -0,0 +1,75 @@
+# Implementation Plan: Restore Run Wizard (011)
+
+**Branch**: `feat/011-restore-run-wizard` | **Date**: 2025-12-30  
+**Input**: Feature specification in `specs/011-restore-run-wizard/spec.md`
+
+## Summary
+Refactor Restore Run creation into a **Filament Wizard** that enforces **Safety First**:
+source → scope → safety checks → preview → confirm + execute.
+
+Leverage existing restore primitives (`RestoreService::preview()` / `RestoreService::execute()`) and incrementally introduce:
+- structured **risk checks**
+- **diff preview** artifacts/summaries
+- stronger **execution gating** + audit fields
+
+## Technical Context (current code)
+- Filament Resource: `app/Filament/Resources/RestoreRunResource.php` (single form today)
+- Restore engine: `app/Services/Intune/RestoreService.php` (preview + execute)
+- Diff tools: `app/Services/Intune/PolicyNormalizer.php` + `app/Services/Intune/VersionDiff.php`
+- Data model: `restore_runs` already stores `preview`, `results`, `metadata`, `requested_items`
+
+## Phase 1 — Data + State Model (Wizard-ready)
+- Define restore run lifecycle statuses (string enum values).
+- Decide what is stored as dedicated columns vs `restore_runs.metadata` JSON.
+- Add minimal persistence for wizard state:
+  - `scope_mode`, `check_summary`, `check_results`, `preview_summary`, `confirmed_at/by`, `environment`, `highlander_label`.
+
+**Checkpoint**: RestoreRun can represent wizard progression and persist computations.
+
+## Phase 2 — Filament Wizard UI (Create Restore Run)
+- Replace the single Create form with a 5-step wizard UI.
+- Implement step-level validation and state resets (changing backup set resets downstream).
+- Keep dry-run default ON, and make execution UI unavailable until the wizard rules are satisfied.
+
+**Checkpoint**: Wizard is usable end-to-end in dry-run.
+
+## Phase 3 — Restore Scope Builder (Selection UX)
+- Build grouped selection UI for BackupItems (type/platform), with search and “select all”.
+- Clearly mark:
+  - foundations vs policies
+  - preview-only types
+  - items missing policy_version linkage / snapshot completeness hints (optional)
+
+**Checkpoint**: Scoping is explicit, scalable, and safe.
+
+## Phase 4 — Safety & Conflict Checks (RestoreRiskChecker)
+- Implement server-side checks for the chosen scope.
+- Persist results on the RestoreRun and display with severity badges.
+- Block execution if blockers exist.
+
+**Checkpoint**: Defensive layer in place; blockers stop execution.
+
+## Phase 5 — Preview (RestoreDiffGenerator)
+- Generate a diff summary (minimum) comparing backup snapshot vs current target state.
+- Persist preview summary (and optionally per-item diffs with limits).
+- Require preview completion before allowing execute.
+
+**Checkpoint**: Preview step is a hard gate for execute and is auditable.
+
+## Phase 6 — Confirm & Execute
+- Add explicit confirmations:
+  - “I reviewed the impact”
+  - tenant hard-confirm (Highlander)
+  - environment badge (frozen at run creation)
+- Execute restore via queue job (preferred) or synchronous execution (only if queue is out of scope for MVP).
+- Update run statuses and persist outcomes.
+
+**Checkpoint**: Execution is safe, gated, and traceable.
+
+## Phase 7 — Tests + QA
+- Pest feature tests for:
+  - wizard gating rules (execute disabled until conditions satisfied)
+  - safety checks persistence and blocking behavior
+  - preview summary generation
+- Run targeted tests and Pint.
+
diff --git a/specs/011-restore-run-wizard/spec.md b/specs/011-restore-run-wizard/spec.md
new file mode 100644
index 0000000..96b6b9b
--- /dev/null
+++ b/specs/011-restore-run-wizard/spec.md
@@ -0,0 +1,224 @@
+# Feature Specification: Restore Run Wizard (011)
+
+**Feature Branch**: `feat/011-restore-run-wizard`  
+**Created**: 2025-12-30  
+**Status**: Draft  
+**Input**: Restore Run Wizard requirements (Safety First / Defensive Restore)
+
+## Overview
+Implement **Restore Runs** as a **multi-step Wizard** (instead of a single “Create Restore Run” form) to enforce **Safety First / Defensive Restore**.
+
+Restore is a high-risk workflow. The wizard must guide admins through explicit checkpoints:
+source selection → scoping → safety checks → preview → confirmation + execution.
+
+## Problem Statement
+The current Restore Run creation is a single form that can lead to:
+- picking the wrong backup source
+- restoring too broad a scope unintentionally
+- executing without a structured “risk + preview + explicit confirmation” flow
+
+## Goals
+- Make restore a **deliberate, stepwise** process with strong defaults.
+- Make **dry-run** the default, and keep “Execute” disabled until all safety gates are satisfied.
+- Add **server-side safety/conflict checks** and persist results for auditability.
+- Provide a **preview** (diff summary at minimum) before allowing execution.
+
+## Non-Goals (v1)
+- Approval workflows / multi-person approvals (but design must not block future addition).
+- Perfect diff UX parity with Intune (basic normalized diff output is enough).
+- A generic wizard framework (restore-specific implementation is fine).
+
+---
+
+## UX Principles
+- **Dry-run default = ON**
+- Wizard progression should slow the user down and force explicit decisions.
+- “Execute” stays disabled until:
+  - Preview has been completed
+  - No blocking checks exist
+  - “I reviewed the impact” checkbox is checked
+  - Tenant hard-confirm matches (Highlander principle)
+
+---
+
+## Wizard Steps
+
+### Step 1 — Select Backup Set (Source of Truth)
+**Question:** “What are we restoring from?”
+
+**Inputs**
+- Backup Set (required)
+
+**Read-only**
+- Snapshot timestamp
+- Tenant name
+- Count of policies/items
+- Types (Config / Security / Scripts …)
+
+**Validation**
+- `backup_set_id` is required
+- Changing the backup set resets downstream state (scope, checks, preview, confirmation)
+
+### Step 2 — Define Restore Scope (Selectivity)
+**Question:** “What exactly should be restored?”
+
+**Inputs**
+- Scope mode: `all` (default) or `selected`
+- If `selected`: item multiselect with search + select all
+
+**UI**
+- Prefer grouped by **type** and **platform**
+- Mark “preview-only” types clearly
+- Foundations should be discoverable (scope tags, assignment filters, notification templates)
+
+**Notes**
+- “Empty = all” only when scope mode is `all` (not when `selected`)
+
+### Step 3 — Safety & Conflict Checks (Defensive Layer)
+**Question:** “Is this dangerous?”
+
+**Checks (server-side, persisted)**
+- Target policy missing in target tenant?
+- Target policy newer than backup? (staleness / overwrite risk)
+- Assignment conflicts (e.g., mapping required / orphaned groups)
+- Scope tag conflicts (mapping required / missing)
+- Preview-only policies included in scope (warn; these items are forced to dry-run and never executed)
+
+**Severity**
+- ❌ blocking
+- ⚠️ warning
+- ✅ safe
+
+**Rules**
+- Blocking checks prevent execution.
+- Wizard may allow proceeding to preview, but must never allow execute while blockers exist.
+
+### Step 4 — Preview (Dry-Run Simulation)
+**Question:** “What would happen?”
+
+**Outputs**
+- Diff summary (at minimum):
+  - X policies changed
+  - Y assignments changed
+  - Z scope tags changed
+- Per-item normalized diff (nice-to-have for v1, but plan for it)
+
+**Defaults**
+- “Preview only (Dry-run)” is ON by default
+
+### Step 5 — Confirm & Execute (Point of No Return)
+**Question:** “Do you really want to do this?”
+
+**Confirmations**
+- Checkbox: “I reviewed the impact”
+- Tenant hard-confirm input (must match tenant display identifier)
+- Environment badge (Prod/Test), highly visible (frozen at run creation for audit)
+
+**Rules**
+- Execute disabled if:
+  - `dry_run = true` or preview not completed
+  - blockers exist
+  - tenant confirm mismatch
+  - acknowledgement unchecked
+
+---
+
+## Domain Model (v1-aligned)
+We already have a RestoreRun aggregate (`restore_runs` table) with:
+- `backup_set_id`, `requested_items`, `preview`, `results`, `status`, `metadata`, timestamps, and `group_mapping`.
+
+**v1 approach**
+- Keep the existing primary key type (bigint) to avoid a disruptive migration.
+- Extend the lifecycle/status semantics and persist wizard computations (checks + diff summaries) in structured fields:
+  - Prefer adding dedicated JSON columns only if needed; otherwise use `metadata` for wizard state.
+
+### RestoreRun Lifecycle (proposed statuses)
+`draft → scoped → checked → previewed → queued → running → completed|partial|failed|cancelled`
+
+### Persisted Wizard State (minimum)
+- `backup_set_id` (existing)
+- `requested_items` (selected IDs, existing)
+- `metadata.scope_mode` (`all|selected`)
+- `metadata.environment` (`prod|test`)
+- `metadata.highlander_label` (tenant identifier string, frozen)
+- `metadata.check_summary` + `metadata.check_results` (Step 3)
+- `metadata.preview_summary` + `metadata.preview_diffs` (Step 4; diffs may be truncated/limited)
+- `metadata.confirmed_at`, `metadata.confirmed_by` (Step 5)
+
+---
+
+## Services / Responsibilities
+- **RestoreScopeBuilder**: build selectable restore items (grouped, searchable), include foundations & mark preview-only.
+- **RestoreRiskChecker**: run safety checks, return structured results + summary.
+- **RestoreDiffGenerator**: generate diff summary (and optionally per-item diffs) for preview.
+- **RestoreExecutor**: execute restore (idempotent, tenant/run locking), write detailed outcomes.
+- **RestoreRunPolicy**: enforce invariants (no execution without preview + confirmations).
+
+---
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 — Wizard-driven Restore Run (Priority: P1)
+As an admin, I can create a restore run via a 5-step wizard and I cannot accidentally execute without preview + explicit confirmations.
+
+**Why this priority**: This is the safety foundation; without it, restore remains risky UX.
+
+**Independent Test**: In Filament, create a restore run with dry-run enabled, review checks + preview, and confirm that Execute stays disabled until all gates are satisfied.
+
+**Acceptance Scenarios**
+1. **Given** I have selected a backup set and moved to a later step, **When** I change the backup set, **Then** scope/check/preview/confirmation state is reset.
+2. **Given** I keep dry-run enabled, **When** I reach Step 5, **Then** Execute is disabled.
+3. **Given** I disable dry-run, **When** I have not completed preview, **Then** Execute is disabled.
+
+---
+
+### User Story 2 — Safety Checks block execution (Priority: P1)
+As an admin, I see blocking vs warning checks, and execution is blocked when blockers exist.
+
+**Why this priority**: Defensive restore requires an explicit risk layer.
+
+**Independent Test**: Create a scope that triggers a blocking check and verify execution cannot proceed.
+
+**Acceptance Scenarios**
+1. **Given** a blocking check exists, **When** I reach Step 5, **Then** Execute remains disabled and blockers are visible.
+2. **Given** only warnings exist, **When** I acknowledge impact and hard-confirm tenant, **Then** I can execute (dry-run off).
+
+---
+
+### User Story 3 — Preview diff summary (Priority: P2)
+As an admin, I can preview what would change before executing restore.
+
+**Why this priority**: A restore without preview is operationally unsafe.
+
+**Independent Test**: Run Step 4 preview and verify diff summary is computed and persisted on the RestoreRun.
+
+**Acceptance Scenarios**
+1. **Given** I scoped items, **When** I run preview, **Then** I see a summary (changed policies count) and it persists on the restore run.
+
+---
+
+## Edge Cases
+- Very large backup sets (hundreds/thousands of items): selection/search must remain responsive.
+- Switching backup set mid-flow resets downstream state safely.
+- Policies not present in target tenant: shown as warning/blocker depending on restore mode.
+- RBAC-limited tenant setup: checks must clearly show “inventory/restore may be partial”.
+
+---
+
+## Functional Requirements
+- **FR-011.1**: System MUST implement Restore Run creation as a 5-step wizard in Filament.
+- **FR-011.2**: System MUST default `dry_run = true` and prevent execution while dry-run is enabled.
+- **FR-011.3**: System MUST run server-side safety checks and persist results (summary + details) for audit.
+- **FR-011.4**: System MUST generate at least a diff summary on preview and persist it.
+- **FR-011.5**: System MUST require explicit acknowledgement + tenant hard-confirm before allowing execution.
+- **FR-011.6**: System MUST freeze environment badge and tenant label for audit on run creation.
+- **FR-011.7**: System MUST keep execution disabled if any blocking checks exist.
+- **FR-011.8**: System MUST record execution outcomes and leave an auditable trail (existing audit log patterns).
+
+---
+
+## Success Criteria
+- **SC-011.1**: Admins can only execute after preview + confirmations; no accidental execution path exists.
+- **SC-011.2**: Blocking checks reliably prevent execution.
+- **SC-011.3**: Preview produces a persisted summary for every run.
+
diff --git a/specs/011-restore-run-wizard/tasks.md b/specs/011-restore-run-wizard/tasks.md
new file mode 100644
index 0000000..323e85b
--- /dev/null
+++ b/specs/011-restore-run-wizard/tasks.md
@@ -0,0 +1,43 @@
+# Tasks: Restore Run Wizard (011)
+
+**Branch**: `feat/011-restore-run-wizard` | **Date**: 2025-12-30  
+**Input**: `specs/011-restore-run-wizard/spec.md`, `specs/011-restore-run-wizard/plan.md`
+
+## Phase 0 — Specs (this PR)
+- [x] T001 Create `spec.md`, `plan.md`, `tasks.md` for Feature 011.
+
+## Phase 1 — Data Model + Status Semantics
+- [ ] T002 Define RestoreRun lifecycle statuses and transitions (draft→scoped→checked→previewed→queued→running→completed|partial|failed|cancelled).
+- [ ] T003 Add minimal persistence for wizard state (prefer JSON in `restore_runs.metadata` unless columns are required).
+- [ ] T004 Freeze `environment` + `highlander_label` at run creation for audit.
+
+## Phase 2 — Filament Wizard (Create Restore Run)
+- [ ] T005 Replace current single-form create with a 5-step wizard (Step 1–5 as in spec).
+- [ ] T006 Ensure changing `backup_set_id` resets downstream wizard state.
+- [ ] T007 Enforce “dry-run default ON” and keep execute disabled until all gates are satisfied.
+
+## Phase 3 — Restore Scope UX
+- [ ] T008 Implement scoped selection UI grouped by policy type + platform with search and bulk toggle.
+- [ ] T009 Mark preview-only types clearly and ensure they never execute.
+- [ ] T010 Ensure foundations are discoverable (assignment filters, scope tags, notification templates).
+
+## Phase 4 — Safety & Conflict Checks
+- [ ] T011 Implement `RestoreRiskChecker` (server-side) and persist `check_summary` + `check_results`.
+- [ ] T012 Render check results with severity (blocking/warning/safe) and block execute when blockers exist.
+
+## Phase 5 — Preview (Diff)
+- [ ] T013 Implement `RestoreDiffGenerator` using `PolicyNormalizer` + `VersionDiff`.
+- [ ] T014 Persist preview summary (and per-item diffs with safe limits) and require preview completion before execute.
+
+## Phase 6 — Confirm & Execute
+- [ ] T015 Implement Step 5 confirmations (ack checkbox + tenant hard-confirm).
+- [ ] T016 Execute restore via a queued Job (preferred) and update statuses + timestamps.
+- [ ] T017 Persist execution outcomes and ensure audit logging entries exist for execution start/finish.
+
+## Phase 7 — Tests + Formatting
+- [ ] T018 Add Pest tests for wizard gating rules and status transitions.
+- [ ] T019 Add Pest tests for safety checks persistence and blocking behavior.
+- [ ] T020 Add Pest tests for preview summary generation.
+- [ ] T021 Run `./vendor/bin/pint --dirty`.
+- [ ] T022 Run targeted tests (e.g. `./vendor/bin/sail artisan test --filter=RestoreRunWizard` once tests exist).
+
-- 
2.45.2