Ahmed Darrazi 1e2958df27 feat: implement restore safety integrity and queue slide-over

2026-04-07 01:29:56 +02:00

Implementation Plan: Restore Safety Integrity

Branch: 181-restore-safety-integrity | Date: 2026-04-06 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/spec.md

Summary

Harden the restore flow so operators can distinguish stale versus current preview truth, stale versus current checks truth, technical startability versus safety readiness, and run completion versus real follow-up truth without adding a new recovery persistence model. The implementation keeps RestoreRun and OperationRun as the existing sources of truth, introduces a narrow derived restore-safety layer for scope fingerprinting and integrity assessment, persists only a compact execution-time safety snapshot inside existing RestoreRun.metadata when needed, hardens the wizard and detail surfaces, and preserves restore-specific truth on the canonical operation detail page.

Key approach: work inside the existing RestoreRunResource, CreateRestoreRun, restore form component views, restore infolist entry views, and restore-linked OperationRunResource seams; add derived restore safety helpers under the existing application structure; keep all changes Filament v5 and Livewire v4 compliant; avoid new tables and new Graph contract paths; validate the result with focused Pest, Livewire, hardening, ops-UX, and RBAC coverage.

Technical Context

Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing RestoreRunResource, RestoreService, RestoreRiskChecker, RestoreDiffGenerator, OperationRunResource, TenantlessOperationRunViewer, shared badge infrastructure, and existing RBAC or write-gate helpers
Storage: PostgreSQL with existing restore_runs and operation_runs records plus JSON or array-backed metadata, preview, results, and context; no schema change planned
Testing: Pest feature tests, Livewire page and action tests, unit tests for narrow derived restore-safety helpers, all run through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep restore wizard and detail surfaces server-driven, avoid new render-time external calls, preserve quick operator scanability on confirm and result surfaces, and keep canonical operation detail DB-only at render time
Constraints: No new central recovery-state table, no new Graph contract path, no route identity change, no RBAC drift, no collapse of executable versus safe versus recovered semantics, no ad-hoc badge mappings, and no new global Filament assets
Scale/Scope: One tenant-scoped restore wizard, one tenant restore detail surface, one restore-linked canonical operation detail surface, a narrow derived restore-safety layer, and focused regression coverage across wizard, result, RBAC, and ops-UX behavior

Constitution Check

GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.

Principle	Status	Notes
Inventory-first	Pass	Backups remain immutable snapshots and no inventory ownership rule changes
Read/write separation	Pass	Real restore execution stays behind preview, checks, hard confirmation, audit, and tests
Graph contract path	Pass	No new Graph endpoints or contract registry changes; existing restore calls stay behind current restore services and `GraphClientInterface`
Deterministic capabilities	Pass	Existing capability registry and `UiEnforcement` or capability resolver remain authoritative
RBAC-UX planes and 404 vs 403	Pass	Tenant restore surfaces remain tenant-scoped; canonical `/admin/operations/{run}` remains workspace-safe and tenant-safe
Workspace isolation	Pass	No workspace scope broadening; canonical monitoring remains workspace-member gated
Tenant isolation	Pass	Restore runs, restore previews, checks, and result detail stay tenant-owned and tenant-entitled
Dangerous and destructive confirmations	Pass	Existing archive, restore, rerun, and force-delete actions remain confirmation-gated; real execution remains hard-confirmed in the wizard
Global search safety	Pass	`OperationRunResource` remains non-globally-searchable, and this feature adds no new globally searchable resource. `RestoreRunResource` is not made newly searchable, and it already has a view page if search is later enabled
Run observability	Pass	Existing `restore.execute` operations continue to create or reuse `OperationRun`; no new run model is introduced
Ops-UX 3-surface feedback	Pass	Existing queued toast, progress surfaces, and terminal monitoring behavior remain authoritative
Ops-UX lifecycle ownership	Pass	`OperationRun.status` and `OperationRun.outcome` remain service-owned; this feature only adds restore-specific read truth
Ops-UX summary counts	Pass	No new `OperationRun` summary-count keys are required; restore-specific integrity stays on restore context
Data minimization	Pass	No new secrets or external payload exposure; detail diagnostics remain secondary
Proportionality (PROP-001)	Pass	New logic is limited to derived restore-safety helpers and optional nested metadata snapshotting on existing records
Persisted truth (PERSIST-001)	Pass	No new table; only a narrow execution-time safety snapshot may be stored on the existing restore run
Behavioral state (STATE-001)	Pass	New integrity, safety, and follow-up states directly change operator guidance and execution gating semantics
Badge semantics (BADGE-001)	Pass	Any new restore safety badges or chips must route through central badge or shared primitive semantics, not page-local mapping
Filament-native UI (UI-FIL-001)	Pass	Existing Filament wizard, sections, view fields, infolist entries, and shared primitives remain the primary UI seams
UI naming (UI-NAMING-001)	Pass	The plan preserves `preview`, `checks`, `dry-run`, `restore`, `partial`, and `follow-up` as operator vocabulary
Operator surfaces (OPSURF-001)	Pass	Wizard and result surfaces become more operator-first, not more diagnostic-first
Filament Action Surface Contract	Pass	No redundant view actions or empty action groups are introduced; list inspect model remains row click
Filament UX-001	Pass with documented variance	The wizard remains structured and the detail page remains infolist-based with custom entry views, but still follows summary-first information architecture
Filament v5 / Livewire v4 compliance	Pass	The implementation stays inside the current Filament v5 and Livewire v4 stack
Provider registration location	Pass	No panel or provider changes; Laravel 11+ provider registration remains in `bootstrap/providers.php`
Asset strategy	Pass	No new panel assets are planned; deployment keeps the existing `php artisan filament:assets` step unchanged

Phase 0 Research

Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/research.md.

Key decisions:

Derive a deterministic restore scope fingerprint from existing wizard inputs instead of introducing a new persisted scope entity.
Separate preview and checks integrity from blocker and warning severity so that "no blockers" can no longer be misread as "safe".
Preserve invalidation evidence in wizard state instead of silently clearing prior preview and checks truth.
Persist only a narrow execution-time safety snapshot inside RestoreRun.metadata when historical truth is required for restore detail.
Derive result follow-up truth from existing results, assignment outcomes, and linked OperationRun outcome without adding a recovery entity.
Preserve restore-specific follow-up truth on canonical operation detail via enrichment or a safe deep link rather than an OperationRun schema change.
Reuse Filament wizard, action, and infolist seams plus existing Pest and Livewire test patterns instead of introducing a new UI shell or browser-first harness.
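
The first decision above (a deterministic scope fingerprint derived from wizard inputs) could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names and the input keys (`backup_set_id`, `item_ids`, `is_dry_run`) are hypothetical, and treating every list as order-insensitive is an assumption that the real helper may not share.

```php
<?php

/**
 * Hypothetical helper: recursively normalize scope inputs so that key order
 * and list order do not change the resulting fingerprint.
 */
function normalizeRestoreScope(array $value): array
{
    $value = array_map(
        fn ($v) => is_array($v) ? normalizeRestoreScope($v) : $v,
        $value
    );

    if (array_is_list($value)) {
        // Assumption: lists (for example selected item IDs) are order-insensitive.
        // Compare by encoded bytes so the ordering is total and deterministic.
        usort($value, fn ($a, $b) => strcmp(
            json_encode($a, JSON_THROW_ON_ERROR),
            json_encode($b, JSON_THROW_ON_ERROR)
        ));

        return $value;
    }

    ksort($value, SORT_STRING);

    return $value;
}

/** Hypothetical helper: hash the normalized scope into a stable fingerprint. */
function restoreScopeFingerprint(array $scope): string
{
    return hash('sha256', json_encode(normalizeRestoreScope($scope), JSON_THROW_ON_ERROR));
}
```

With this shape, reordering the same selection yields the same fingerprint, while adding or removing an item changes it, which is the property the wizard needs in order to decide whether preview and checks still apply.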

Phase 1 Design

Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/:

data-model.md: existing entities, narrow metadata additions, and derived restore safety models
contracts/restore-safety-integrity.openapi.yaml: internal logical contract for the wizard, create submission, restore detail, and restore-linked canonical operation detail
quickstart.md: focused automated and manual validation workflow for restore safety hardening

Design decisions:

No schema migration is required; the design reuses RestoreRun, OperationRun, and existing JSON-backed fields.
Historical execution truth may be captured inside existing RestoreRun.metadata as a narrow safety snapshot rather than as a new entity.
Wizard hardening remains inside RestoreRunResource::getWizardSteps() and CreateRestoreRun, with restore form component views displaying integrity state and guidance.
Result hardening remains inside existing restore detail infolist entry views and the restore-linked canonical operation detail seams.
Test coverage stays focused on restore wizard, restore detail, linked operation detail, hardening, ops-UX, and RBAC behavior.

Project Structure

Documentation (this feature)

specs/181-restore-safety-integrity/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── restore-safety-integrity.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md

Source Code (repository root)

app/
├── Filament/
│   ├── Pages/
│   │   └── Operations/
│   │       └── TenantlessOperationRunViewer.php
│   └── Resources/
│       ├── OperationRunResource.php
│       ├── RestoreRunResource.php
│       └── RestoreRunResource/
│           └── Pages/
│               ├── CreateRestoreRun.php
│               ├── ListRestoreRuns.php
│               └── ViewRestoreRun.php
├── Models/
│   └── RestoreRun.php
├── Services/
│   └── Intune/
│       ├── RestoreDiffGenerator.php
│       ├── RestoreRiskChecker.php
│       └── RestoreService.php
└── Support/
    ├── Badges/
    │   └── Domains/
    │       ├── RestoreCheckSeverityBadge.php
    │       ├── RestorePreviewDecisionBadge.php
    │       ├── RestoreResultStatusBadge.php
    │       └── RestoreRunStatusBadge.php
    ├── OpsUx/
    │   └── OperationUxPresenter.php
    └── RestoreRunStatus.php

resources/
└── views/
    └── filament/
        ├── forms/
        │   └── components/
        │       ├── restore-run-checks.blade.php
        │       └── restore-run-preview.blade.php
        └── infolists/
            └── entries/
                ├── restore-preview.blade.php
                └── restore-results.blade.php

tests/
├── Feature/
│   ├── Filament/
│   │   ├── RestorePreviewTest.php
│   │   ├── RestoreRunUiEnforcementTest.php
│   │   └── [new or expanded restore safety integrity page tests]
│   ├── OpsUx/
│   │   └── RestoreExecutionOperationRunSyncTest.php
│   ├── Operations/
│   │   └── [new or expanded restore-linked operation detail tests]
│   ├── Hardening/
│   │   └── [existing restore start gate tests]
│   ├── RestoreRiskChecksWizardTest.php
│   └── RestoreRunWizardExecuteTest.php
└── Unit/
    └── [new narrow tests for the restore safety resolvers that live under app/Support]

Structure Decision: Standard Laravel monolith. The implementation stays inside existing Filament resources, Blade views, restore services, and monitoring seams. Any new helper types stay under existing app/Support or another already-established application namespace. No new base folders or standalone subsystems are required.

Implementation Strategy

Phase A — Introduce Scope Fingerprinting And Derived Integrity State

Goal: Create the smallest possible restore-safety layer that can explain whether preview and checks still apply to the current scope.

Step	File	Change
A.1	`app/Support/RestoreRunStatus.php` plus a new narrow restore safety helper namespace under `app/Support/`	Introduce derived scope fingerprint and integrity assessment helpers without changing persisted `RestoreRunStatus`, and make `invalidate_after_mutation` the explicit freshness policy for wizard-scoped evidence
A.2	`app/Models/RestoreRun.php`	Add narrow metadata accessors or helpers for `scope_basis`, `check_basis`, `preview_basis`, and `execution_safety_snapshot`
A.3	`app/Support/Badges/Domains/` and any shared primitive seam needed	Add central state-to-badge or label mappings only if the new integrity or safety states are surfaced as badges

Phase B — Harden Wizard Invalidation And Confirmation

Goal: Turn the existing wizard into an explicit restore safety gate instead of a sequence that silently forgets prior evaluation work.

Step	File	Change
B.1	`app/Filament/Resources/RestoreRunResource.php`	Extend `getWizardSteps()` to compute and compare scope fingerprints, preserve invalidation evidence, and separate execution readiness from safety readiness
B.2	`app/Filament/Resources/RestoreRunResource/Pages/CreateRestoreRun.php`	Ensure the final create flow validates current preview, current checks, matching fingerprint, and hard-confirm state before a real restore queues
B.3	`resources/views/filament/forms/components/restore-run-checks.blade.php`	Surface `current`, `stale`, `invalidated`, or `not_run` states with one primary next step
B.4	`resources/views/filament/forms/components/restore-run-preview.blade.php`	Surface preview integrity state, generated-at truth, and rerun guidance without calm false positives
B.5	`app/Filament/Resources/RestoreRunResource.php` or `app/Services/Intune/RestoreService.php`	Persist a narrow `execution_safety_snapshot` inside existing `RestoreRun.metadata` when a real restore is queued
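
The confirm-step validation described in B.2 might look roughly like the sketch below. It assumes the integrity states have already been derived as the strings `current`, `stale`, `invalidated`, or `not_run`; the function name, the reason codes, and the returned array shape are hypothetical and only illustrate keeping "can start" separate from "is safe to start".

```php
<?php

/**
 * Hypothetical confirm gate: a real restore may queue only when preview and
 * checks are both current for the active scope and the operator hard-confirmed.
 *
 * @return array{can_queue: bool, reasons: list<string>}
 */
function evaluateRestoreConfirmGate(string $previewState, string $checksState, bool $hardConfirmed): array
{
    $reasons = [];

    if ($previewState !== 'current') {
        $reasons[] = "preview_{$previewState}";
    }

    if ($checksState !== 'current') {
        $reasons[] = "checks_{$checksState}";
    }

    if (! $hardConfirmed) {
        $reasons[] = 'hard_confirmation_missing';
    }

    return ['can_queue' => $reasons === [], 'reasons' => $reasons];
}
```

Returning the full list of reasons, rather than failing on the first one, would let the wizard show one primary next step while keeping the remaining causes available as secondary detail.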

Phase C — Harden Restore Result And Detail Truth

Goal: Ensure restore detail answers follow-up truth and next action before raw result lists.

Step	File	Change
C.1	`app/Filament/Resources/RestoreRunResource.php`	Build a result-attention model from existing `results`, assignment outcomes, and linked run context
C.2	`resources/views/filament/infolists/entries/restore-preview.blade.php`	Show which preview basis applied and whether it was current, stale, or invalidated
C.3	`resources/views/filament/infolists/entries/restore-results.blade.php`	Elevate overall result truth, follow-up truth, primary cause family, and one primary next action above raw item detail

Phase D — Preserve Restore Truth On Canonical Operation Detail

Goal: Keep restore-specific follow-up truth visible in canonical monitoring without duplicating restore persistence.

Step	File	Change
D.1	`app/Filament/Resources/OperationRunResource.php`	Add restore-linked continuation truth for `restore.execute` runs using existing restore linkage and tenant-safe deep-link behavior
D.2	`app/Filament/Pages/Operations/TenantlessOperationRunViewer.php`	Preserve restore-specific guidance or safe restore-detail links without broken navigation when deeper access is unavailable

Phase E — Regression Protection And Focused Verification

Goal: Lock the new safety semantics into automated tests and protect existing restore orchestration behavior.

Step	File	Change
E.1	`tests/Feature/RestoreRunWizardExecuteTest.php`	Extend confirmation coverage to include fingerprint and integrity-state validation
E.2	`tests/Feature/RestoreRiskChecksWizardTest.php`	Extend checks-state persistence and invalidation coverage
E.3	`tests/Feature/Filament/RestorePreviewTest.php` and new restore safety detail tests	Cover preview integrity, stale versus invalidated display, and calmness suppression
E.4	`tests/Feature/Filament/RestoreRunUiEnforcementTest.php`	Preserve 404 versus 403 behavior and disabled-action truth under reduced capability
E.5	`tests/Feature/OpsUx/RestoreExecutionOperationRunSyncTest.php` and new restore-linked operation detail tests	Preserve `OperationRun` continuity and restore-specific follow-up visibility from canonical monitoring
E.6	New unit tests under `tests/Unit/Support/`	Cover scope fingerprint generation, integrity classification, safety assessment, and result attention derivation
E.7	`vendor/bin/sail bin pint --dirty --format agent` and focused Pest runs	Required formatting and targeted verification before implementation is considered complete

Key Design Decisions

D-001 — Scope mismatch must be explicit, not inferred from missing data

The current wizard safety behavior already clears preview and checks when some scope inputs change. This plan formalizes that behavior as explicit invalidation truth so the operator can see that prior work existed and was invalidated by a specific change.

D-002 — Execution-time safety truth belongs to the restore run, not a new recovery entity

The operator needs historical truth about what basis was used when a real restore was queued. That justifies a narrow metadata snapshot on the existing RestoreRun but does not justify a second persisted model.
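
As an illustration of how small that snapshot could stay, the sketch below writes a nested `execution_safety_snapshot` key into an existing metadata array without touching other keys. The function name and the inner keys (`scope_fingerprint`, `preview_state`, `checks_state`, `captured_at`) are assumptions for illustration; only the outer key name comes from this plan.

```php
<?php

/**
 * Hypothetical helper: return a copy of the restore run metadata with a
 * compact execution-time safety snapshot nested under one key.
 */
function withExecutionSafetySnapshot(
    array $metadata,
    string $scopeFingerprint,
    string $previewState,
    string $checksState,
    DateTimeImmutable $queuedAt
): array {
    $metadata['execution_safety_snapshot'] = [
        'scope_fingerprint' => $scopeFingerprint,
        'preview_state' => $previewState,
        'checks_state' => $checksState,
        // Stored as an ISO 8601 string so the JSON-backed column stays portable.
        'captured_at' => $queuedAt->format(DATE_ATOM),
    ];

    return $metadata;
}
```

Because the snapshot is written once at queue time, the detail page can render what was true when the restore started instead of recomputing it from current logic.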

D-003 — Result meaning must be derived from existing restore outputs, not from `RestoreRun.status` alone

The lifecycle statuses completed, partial, and failed remain important, but the operator-facing follow-up truth comes from the combination of lifecycle, item results, assignment outcomes, and linked operation context.
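
A rough sketch of that derivation, under stated assumptions: each item result and assignment outcome is an array with a `status` key, `applied` is the only calm value, and the returned labels are hypothetical names rather than the project's vocabulary. Linked operation context is left out to keep the example short.

```php
<?php

/** Hypothetical check: does any entry report a status other than "applied"? */
function hasNonAppliedEntry(array $entries): bool
{
    foreach ($entries as $entry) {
        if (($entry['status'] ?? null) !== 'applied') {
            return true;
        }
    }

    return false;
}

/**
 * Hypothetical derivation of follow-up truth from lifecycle status plus
 * item results and assignment outcomes.
 */
function deriveRestoreResultAttention(string $runStatus, array $itemResults, array $assignmentOutcomes): string
{
    if ($runStatus === 'failed') {
        return 'failed_needs_review';
    }

    if (
        $runStatus === 'partial'
        || hasNonAppliedEntry($itemResults)
        || hasNonAppliedEntry($assignmentOutcomes)
    ) {
        return 'follow_up_required';
    }

    return 'no_follow_up_detected';
}
```

The point of the sketch is the middle branch: a run whose lifecycle status is completed can still require follow-up when a single assignment outcome did not apply.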

D-004 — Canonical operation detail must acknowledge restore-specific follow-up without becoming the restore source of truth

OperationRun stays the monitoring record. RestoreRun stays the restore truth. The canonical operation surface should expose restore continuation meaning or link to it, not clone restore persistence.

D-005 — Filament-native seams are sufficient for this hardening slice

Filament wizard steps, view fields, custom infolist views, confirmation patterns, and Livewire action tests already fit the feature. The plan therefore avoids a parallel UI framework or custom client-side state layer.

D-006 — Restore evidence freshness is mutation-sensitive, not age-window-driven

This slice uses the repo's existing invalidate_after_mutation freshness language for wizard-scoped derived state. A matching fingerprint plus valid capture markers is enough for evidence to count as current inside the active draft. The invalidated state represents explicit scope drift after a covered mutation, while stale is reserved for legacy or incomplete persisted evidence that cannot prove currentness.
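
The classification rule in D-006 can be sketched as a small pure function. The enum name, function name, and the basis keys (`fingerprint`, `captured_at`) are hypothetical; the four state values are the ones this plan surfaces in the wizard.

```php
<?php

enum RestoreEvidenceIntegrity: string
{
    case NotRun = 'not_run';
    case Current = 'current';
    case Invalidated = 'invalidated';
    case Stale = 'stale';
}

/**
 * Hypothetical classifier for preview or checks evidence.
 *
 * @param array{fingerprint?: string|null, captured_at?: string|null}|null $basis
 */
function classifyRestoreEvidence(?array $basis, string $currentFingerprint): RestoreEvidenceIntegrity
{
    if ($basis === null) {
        return RestoreEvidenceIntegrity::NotRun;
    }

    // Legacy or incomplete evidence cannot prove currentness, so it is stale.
    if (empty($basis['fingerprint']) || empty($basis['captured_at'])) {
        return RestoreEvidenceIntegrity::Stale;
    }

    // Complete evidence for a different scope means the scope drifted.
    if ($basis['fingerprint'] !== $currentFingerprint) {
        return RestoreEvidenceIntegrity::Invalidated;
    }

    return RestoreEvidenceIntegrity::Current;
}
```

Note that no clock or age window appears anywhere in the function, which mirrors the decision that freshness is mutation-sensitive rather than time-driven.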

Risk Assessment

Risk	Impact	Likelihood	Mitigation
Scope fingerprint is too narrow and misses a real execution-affecting change	High	Medium	Define the fingerprint from actual restore inputs used by checks and preview, cover it with unit tests and wizard regression tests
Historical safety truth drifts if the detail page recomputes everything from current logic	High	Medium	Persist a narrow execution-time safety snapshot on the existing restore run
New integrity states exist but the UI still reads calmly	High	Medium	Lock calmness suppression into wizard and detail tests, not only into helper code
Restore-specific truth disappears on canonical operation detail	Medium	Medium	Add explicit restore continuation coverage on the operation detail seams
The slice grows into a recovery dashboard or new persisted health system	Medium	Low	Keep the design constrained to existing restore and operation records, with no new table

Test Strategy

Extend existing restore wizard, preview, hardening, RBAC, and ops-UX Pest coverage before adding any new test harness.
Add unit tests for the narrow derived restore safety helpers so fingerprint, integrity, safety, and result attention logic stay deterministic.
Extend existing restore audit, execution-job, and preview-diff tests so invalidation reasoning remains derivable from restore records and the current execution and diff flows remain behaviorally intact.
Add feature tests that prove stale or invalidated preview and checks suppress calm execution language.
Add feature tests that prove scope changes invalidate prior readiness and that confirm-step validation refuses calm execution when integrity conditions are not met.
Add feature tests that prove partial or completed-with-follow-up results are elevated above raw item lists and do not imply tenant recovery.
Add canonical operation-detail tests that prove restore follow-up truth remains visible or safely linked.
Re-run the existing ops-UX constitution and notification guards for direct status transitions, terminal DB notifications, canonical View run links, queued toast copy, and whitelisted summary_counts so reuse of OperationRun cannot regress the three-surface feedback contract.
Keep the manual quickstart.md validation pass as an explicit completion step so the 15-second and one-click operator outcomes are verified, not merely assumed from automated coverage.
Keep all tests Livewire v4 compatible and run the smallest affected subset through Sail before asking for a full-suite pass.

Complexity Tracking

No constitution violations or exception-driven complexity were identified. The only added complexity is the narrow derived restore-safety layer and the compact persisted execution-time safety snapshot already justified by the proportionality review.

Proportionality Review

Current operator problem: Operators can currently treat a stale preview or stale checks as if they still authorize the current restore scope, and can read a completed status as calmer than the product can prove.
Existing structure is insufficient because: Restore flow data already exists, but its presence alone does not distinguish current from invalidated evidence, or safe from merely executable. Existing result rendering does not elevate follow-up truth strongly enough.
Narrowest correct implementation: Add a narrow derived restore-safety layer plus optional nested metadata snapshotting on the existing restore run. Reuse existing wizard, result, and operation-detail surfaces instead of creating a second workflow or persistence model.
Ownership cost created: A small set of derived helpers, central state mapping, new view-model wiring, and additional unit and feature tests.
Alternative intentionally rejected: A new recovery-health table, a tenant-wide recovery dashboard, or a generalized trust framework. Each was rejected as too broad for the current operator problem.
Release truth: Current-release truth. This feature hardens already-shipped restore behavior before broader backup-quality or recovery-confidence work depends on it.
