Implementation Plan: Restore Safety Integrity

Branch: 181-restore-safety-integrity | Date: 2026-04-06 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/spec.md

Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/spec.md

Summary

Harden the restore flow so operators can distinguish stale versus current preview truth, stale versus current checks truth, technical startability versus safety readiness, and run completion versus real follow-up truth without adding a new recovery persistence model. The implementation keeps RestoreRun and OperationRun as the existing sources of truth, introduces a narrow derived restore-safety layer for scope fingerprinting and integrity assessment, persists only a compact execution-time safety snapshot inside existing RestoreRun.metadata when needed, hardens the wizard and detail surfaces, and preserves restore-specific truth on the canonical operation detail page.

Key approach: work inside the existing RestoreRunResource, CreateRestoreRun, restore form component views, restore infolist entry views, and restore-linked OperationRunResource seams; add derived restore safety helpers under the existing application structure; keep all changes Filament v5 and Livewire v4 compliant; avoid new tables and new Graph contract paths; validate the result with focused Pest, Livewire, hardening, ops-UX, and RBAC coverage.

Technical Context

Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing RestoreRunResource, RestoreService, RestoreRiskChecker, RestoreDiffGenerator, OperationRunResource, TenantlessOperationRunViewer, shared badge infrastructure, and existing RBAC or write-gate helpers
Storage: PostgreSQL with existing restore_runs and operation_runs records plus JSON or array-backed metadata, preview, results, and context; no schema change planned
Testing: Pest feature tests, Livewire page and action tests, unit tests for narrow derived restore-safety helpers, all run through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep restore wizard and detail surfaces server-driven, avoid new render-time external calls, preserve quick operator scanability on confirm and result surfaces, and keep canonical operation detail DB-only at render time
Constraints: No new central recovery-state table, no new Graph contract path, no route identity change, no RBAC drift, no collapse of executable versus safe versus recovered semantics, no ad-hoc badge mappings, and no new global Filament assets
Scale/Scope: One tenant-scoped restore wizard, one tenant restore detail surface, one restore-linked canonical operation detail surface, a narrow derived restore-safety layer, and focused regression coverage across wizard, result, RBAC, and ops-UX behavior

Constitution Check

GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.

| Principle | Status | Notes |
|---|---|---|
| Inventory-first | Pass | Backups remain immutable snapshots and no inventory ownership rule changes |
| Read/write separation | Pass | Real restore execution stays behind preview, checks, hard confirmation, audit, and tests |
| Graph contract path | Pass | No new Graph endpoints or contract registry changes; existing restore calls stay behind current restore services and GraphClientInterface |
| Deterministic capabilities | Pass | Existing capability registry and UiEnforcement or capability resolver remain authoritative |
| RBAC-UX planes and 404 vs 403 | Pass | Tenant restore surfaces remain tenant-scoped; canonical /admin/operations/{run} remains workspace-safe and tenant-safe |
| Workspace isolation | Pass | No workspace scope broadening; canonical monitoring remains workspace-member gated |
| Tenant isolation | Pass | Restore runs, restore previews, checks, and result detail stay tenant-owned and tenant-entitled |
| Dangerous and destructive confirmations | Pass | Existing archive, restore, rerun, and force-delete actions remain confirmation-gated; real execution remains hard-confirmed in the wizard |
| Global search safety | Pass | OperationRunResource already remains non-globally-searchable; this feature adds no new globally searchable resource. RestoreRunResource is not made newly searchable, and it already has a view page if search is later enabled |
| Run observability | Pass | Existing restore.execute operations continue to create or reuse OperationRun; no new run model is introduced |
| Ops-UX 3-surface feedback | Pass | Existing queued toast, progress surfaces, and terminal monitoring behavior remain authoritative |
| Ops-UX lifecycle ownership | Pass | OperationRun.status and OperationRun.outcome remain service-owned; this feature only adds restore-specific read truth |
| Ops-UX summary counts | Pass | No new OperationRun summary-count keys are required; restore-specific integrity stays on restore context |
| Data minimization | Pass | No new secrets or external payload exposure; detail diagnostics remain secondary |
| Proportionality (PROP-001) | Pass | New logic is limited to derived restore-safety helpers and optional nested metadata snapshotting on existing records |
| Persisted truth (PERSIST-001) | Pass | No new table; only a narrow execution-time safety snapshot may be stored on the existing restore run |
| Behavioral state (STATE-001) | Pass | New integrity, safety, and follow-up states directly change operator guidance and execution gating semantics |
| Badge semantics (BADGE-001) | Pass | Any new restore safety badges or chips must route through central badge or shared primitive semantics, not page-local mapping |
| Filament-native UI (UI-FIL-001) | Pass | Existing Filament wizard, sections, view fields, infolist entries, and shared primitives remain the primary UI seams |
| UI naming (UI-NAMING-001) | Pass | The plan preserves preview, checks, dry-run, restore, partial, and follow-up as operator vocabulary |
| Operator surfaces (OPSURF-001) | Pass | Wizard and result surfaces become more operator-first, not more diagnostic-first |
| Filament Action Surface Contract | Pass | No redundant view actions or empty action groups are introduced; list inspect model remains row click |
| Filament UX-001 | Pass with documented variance | The wizard remains structured and the detail page remains infolist-based with custom entry views, but still follows summary-first information architecture |
| Filament v5 / Livewire v4 compliance | Pass | The implementation stays inside the current Filament v5 and Livewire v4 stack |
| Provider registration location | Pass | No panel or provider changes; Laravel 11+ provider registration remains in bootstrap/providers.php |
| Asset strategy | Pass | No new panel assets are planned; deployment keeps the existing php artisan filament:assets step unchanged |

Phase 0 Research

Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/research.md.

Key decisions:

  • Derive a deterministic restore scope fingerprint from existing wizard inputs instead of introducing a new persisted scope entity.
  • Separate preview and checks integrity from blocker and warning severity so that "no blockers" can no longer be misread as "safe".
  • Preserve invalidation evidence in wizard state instead of silently clearing prior preview and checks truth.
  • Persist only a narrow execution-time safety snapshot inside RestoreRun.metadata when historical truth is required for restore detail.
  • Derive result follow-up truth from existing results, assignment outcomes, and linked OperationRun outcome without adding a recovery entity.
  • Preserve restore-specific follow-up truth on canonical operation detail via enrichment or a safe deep link rather than an OperationRun schema change.
  • Reuse Filament wizard, action, and infolist seams plus existing Pest and Livewire test patterns instead of introducing a new UI shell or browser-first harness.
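
The first decision above can be illustrated with a minimal sketch. It assumes the wizard scope reduces to a backup set ID, a list of selected item IDs, and a group mapping; those field names and the function restoreScopeFingerprint are hypothetical and not taken from the codebase. The only point shown is that canonicalized inputs hash to the same value regardless of ordering.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derive a deterministic fingerprint from wizard scope inputs.
 * The keys backup_set_id, item_ids, and group_mapping are assumptions for illustration.
 */
function restoreScopeFingerprint(array $scope): string
{
    // Selection order must not change the fingerprint.
    $itemIds = array_map('strval', $scope['item_ids'] ?? []);
    sort($itemIds, SORT_STRING);

    // Mapping key order must not change the fingerprint either.
    $groupMapping = $scope['group_mapping'] ?? [];
    ksort($groupMapping, SORT_STRING);

    $canonical = [
        'backup_set_id' => (string) ($scope['backup_set_id'] ?? ''),
        'item_ids' => array_values(array_unique($itemIds)),
        'group_mapping' => $groupMapping,
    ];

    return hash('sha256', json_encode($canonical, JSON_THROW_ON_ERROR));
}
```

Two drafts that select the same items in a different order produce the same fingerprint, while removing an item produces a different one. Which inputs actually belong in the fingerprint is a decision for research.md and data-model.md, not for this sketch.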

Phase 1 Design

Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/181-restore-safety-integrity/:

  • data-model.md: existing entities, narrow metadata additions, and derived restore safety models
  • contracts/restore-safety-integrity.openapi.yaml: internal logical contract for the wizard, create submission, restore detail, and restore-linked canonical operation detail
  • quickstart.md: focused automated and manual validation workflow for restore safety hardening

Design decisions:

  • No schema migration is required; the design reuses RestoreRun, OperationRun, and existing JSON-backed fields.
  • Historical execution truth may be captured inside existing RestoreRun.metadata as a narrow safety snapshot rather than as a new entity.
  • Wizard hardening remains inside RestoreRunResource::getWizardSteps() and CreateRestoreRun, with restore form component views displaying integrity state and guidance.
  • Result hardening remains inside existing restore detail infolist entry views and the restore-linked canonical operation detail seams.
  • Test coverage stays focused on restore wizard, restore detail, linked operation detail, hardening, ops-UX, and RBAC behavior.

Project Structure

Documentation (this feature)

specs/181-restore-safety-integrity/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── restore-safety-integrity.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md

Source Code (repository root)

app/
├── Filament/
│   ├── Pages/
│   │   └── Operations/
│   │       └── TenantlessOperationRunViewer.php
│   └── Resources/
│       ├── OperationRunResource.php
│       └── RestoreRunResource.php
│           └── Pages/
│               ├── CreateRestoreRun.php
│               ├── ListRestoreRuns.php
│               └── ViewRestoreRun.php
├── Models/
│   └── RestoreRun.php
├── Services/
│   └── Intune/
│       ├── RestoreDiffGenerator.php
│       ├── RestoreRiskChecker.php
│       └── RestoreService.php
├── Support/
│   ├── Badges/
│   │   └── Domains/
│   │       ├── RestoreCheckSeverityBadge.php
│   │       ├── RestorePreviewDecisionBadge.php
│   │       ├── RestoreResultStatusBadge.php
│   │       └── RestoreRunStatusBadge.php
│   ├── OpsUx/
│   │   └── OperationUxPresenter.php
│   └── RestoreRunStatus.php

resources/
└── views/
    └── filament/
        ├── forms/
        │   └── components/
        │       ├── restore-run-checks.blade.php
        │       └── restore-run-preview.blade.php
        └── infolists/
            └── entries/
                ├── restore-preview.blade.php
                └── restore-results.blade.php

tests/
├── Feature/
│   ├── Filament/
│   │   ├── RestorePreviewTest.php
│   │   ├── RestoreRunUiEnforcementTest.php
│   │   └── [new or expanded restore safety integrity page tests]
│   ├── OpsUx/
│   │   └── RestoreExecutionOperationRunSyncTest.php
│   ├── Operations/
│   │   └── [new or expanded restore-linked operation detail tests]
│   ├── Hardening/
│   │   └── [existing restore start gate tests]
│   ├── RestoreRiskChecksWizardTest.php
│   └── RestoreRunWizardExecuteTest.php
└── Unit/
    └── [new narrow restore safety resolver tests under app/Support]

Structure Decision: Standard Laravel monolith. The implementation stays inside existing Filament resources, Blade views, restore services, and monitoring seams. Any new helper types stay under existing app/Support or another already-established application namespace. No new base folders or standalone subsystems are required.

Implementation Strategy

Phase A — Introduce Scope Fingerprinting And Derived Integrity State

Goal: Create the smallest possible restore-safety layer that can explain whether preview and checks still apply to the current scope.

| Step | File | Change |
|---|---|---|
| A.1 | app/Support/RestoreRunStatus.php plus a new narrow restore safety helper namespace under app/Support/ | Introduce derived scope fingerprint and integrity assessment helpers without changing persisted RestoreRunStatus, and make invalidate_after_mutation the explicit freshness policy for wizard-scoped evidence |
| A.2 | app/Models/RestoreRun.php | Add narrow metadata accessors or helpers for scope_basis, check_basis, preview_basis, and execution_safety_snapshot |
| A.3 | app/Support/Badges/Domains/ and any shared primitive seam needed | Add central state-to-badge or label mappings only if the new integrity or safety states are surfaced as badges |

Phase B — Harden Wizard Invalidation And Confirmation

Goal: Turn the existing wizard into an explicit restore safety gate instead of a sequence that silently forgets prior evaluation work.

| Step | File | Change |
|---|---|---|
| B.1 | app/Filament/Resources/RestoreRunResource.php | Extend getWizardSteps() to compute and compare scope fingerprints, preserve invalidation evidence, and separate execution readiness from safety readiness |
| B.2 | app/Filament/Resources/RestoreRunResource/Pages/CreateRestoreRun.php | Ensure the final create flow validates current preview, current checks, matching fingerprint, and hard-confirm state before a real restore queues |
| B.3 | resources/views/filament/forms/components/restore-run-checks.blade.php | Surface current, stale, invalidated, or not_run states with one primary next step |
| B.4 | resources/views/filament/forms/components/restore-run-preview.blade.php | Surface preview integrity state, generated-at truth, and rerun guidance without calm false positives |
| B.5 | app/Filament/Resources/RestoreRunResource.php or app/Services/Intune/RestoreService.php | Persist a narrow execution_safety_snapshot inside existing RestoreRun.metadata when a real restore is queued |
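
The B.2 gate could look roughly like the following sketch. It is plain PHP without Filament or Livewire wiring, and the function name confirmGateRefusals plus every array key on the draft are assumptions made for illustration. It returns the reasons a real restore should be refused; an empty list means the gate passes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical confirm-step gate. All keys on $draft are illustrative assumptions.
 * Returns a list of refusal reasons; an empty list means the real restore may queue.
 */
function confirmGateRefusals(array $draft): array
{
    $refusals = [];

    // Preview and checks must each be current, not merely present.
    foreach (['preview', 'checks'] as $kind) {
        $integrity = $draft["{$kind}_integrity"] ?? 'not_run';

        if ($integrity !== 'current') {
            $refusals[] = "{$kind}_{$integrity}";
        }
    }

    // The evidence must have been captured for the scope being executed.
    $scope = $draft['scope_fingerprint'] ?? null;
    $evidence = $draft['evidence_fingerprint'] ?? null;

    if ($scope === null || $scope !== $evidence) {
        $refusals[] = 'fingerprint_mismatch';
    }

    // Severity is evaluated separately from integrity.
    if (($draft['blocker_count'] ?? 0) > 0) {
        $refusals[] = 'blockers_present';
    }

    if (($draft['hard_confirmed'] ?? false) !== true) {
        $refusals[] = 'not_hard_confirmed';
    }

    return $refusals;
}
```

Because integrity and severity are checked independently, a draft with zero blockers but stale checks is still refused, which is the distinction this phase is meant to enforce.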

Phase C — Harden Restore Result And Detail Truth

Goal: Ensure restore detail answers follow-up truth and next action before raw result lists.

| Step | File | Change |
|---|---|---|
| C.1 | app/Filament/Resources/RestoreRunResource.php | Build a result-attention model from existing results, assignment outcomes, and linked run context |
| C.2 | resources/views/filament/infolists/entries/restore-preview.blade.php | Show which preview basis applied and whether it was current, stale, or invalidated |
| C.3 | resources/views/filament/infolists/entries/restore-results.blade.php | Elevate overall result truth, follow-up truth, primary cause family, and one primary next action above raw item detail |

Phase D — Preserve Restore Truth On Canonical Operation Detail

Goal: Keep restore-specific follow-up truth visible in canonical monitoring without duplicating restore persistence.

| Step | File | Change |
|---|---|---|
| D.1 | app/Filament/Resources/OperationRunResource.php | Add restore-linked continuation truth for restore.execute runs using existing restore linkage and tenant-safe deep-link behavior |
| D.2 | app/Filament/Pages/Operations/TenantlessOperationRunViewer.php | Preserve restore-specific guidance or safe restore-detail links without broken navigation when deeper access is unavailable |

Phase E — Regression Protection And Focused Verification

Goal: Lock the new safety semantics into automated tests and protect existing restore orchestration behavior.

| Step | File | Change |
|---|---|---|
| E.1 | tests/Feature/RestoreRunWizardExecuteTest.php | Extend confirmation coverage to include fingerprint and integrity-state validation |
| E.2 | tests/Feature/RestoreRiskChecksWizardTest.php | Extend checks-state persistence and invalidation coverage |
| E.3 | tests/Feature/Filament/RestorePreviewTest.php and new restore safety detail tests | Cover preview integrity, stale versus invalidated display, and calmness suppression |
| E.4 | tests/Feature/Filament/RestoreRunUiEnforcementTest.php | Preserve 404 versus 403 behavior and disabled-action truth under reduced capability |
| E.5 | tests/Feature/OpsUx/RestoreExecutionOperationRunSyncTest.php and new restore-linked operation detail tests | Preserve OperationRun continuity and restore-specific follow-up visibility from canonical monitoring |
| E.6 | New unit tests under tests/Unit/Support/ | Cover scope fingerprint generation, integrity classification, safety assessment, and result attention derivation |
| E.7 | vendor/bin/sail bin pint --dirty --format agent and focused Pest runs | Required formatting and targeted verification before implementation is considered complete |

Key Design Decisions

D-001 — Scope mismatch must be explicit, not inferred from missing data

The current wizard safety behavior already clears preview and checks when some scope inputs change. This plan formalizes that behavior as explicit invalidation truth so the operator can see that prior work existed and was invalidated by a specific change.
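
A minimal sketch of "mark instead of clear" follows. The wizard state shape, the invalidated_by key, and the function applyScopeChange are hypothetical; the sketch only shows prior evidence being kept and annotated with the change that invalidated it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical wizard-state transition: when the scope changes, prior preview and
 * checks evidence is kept and marked, not discarded. Key names are assumptions.
 */
function applyScopeChange(array $state, string $newFingerprint, string $changedField): array
{
    $state['scope_fingerprint'] = $newFingerprint;

    foreach (['preview', 'checks'] as $kind) {
        $evidence = $state[$kind] ?? null;

        if (! is_array($evidence)) {
            continue; // nothing was run yet, so there is nothing to invalidate
        }

        if (($evidence['fingerprint'] ?? null) !== $newFingerprint) {
            // Keep the first recorded reason so the original cause stays visible.
            $evidence['invalidated_by'] ??= $changedField;
            $state[$kind] = $evidence;
        }
    }

    return $state;
}
```

Whether reverting the scope to its earlier value should make the evidence current again is left open here; the sketch does not clear an existing marker.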

D-002 — Execution-time safety truth belongs to the restore run, not a new recovery entity

The operator needs historical truth about what basis was used when a real restore was queued. That justifies a narrow metadata snapshot on the existing RestoreRun but does not justify a second persisted model.
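
As a sketch of how narrow that snapshot could be: the execution_safety_snapshot key comes from this plan, while the inner keys, the SNAPSHOT_KEYS constant, and the function name are assumptions. The allow-list keeps large preview or check payloads out of the metadata.

```php
<?php

declare(strict_types=1);

// Hypothetical allow-list; the real snapshot fields belong in data-model.md.
const SNAPSHOT_KEYS = ['scope_fingerprint', 'preview_integrity', 'checks_integrity', 'blocker_count'];

/**
 * Returns a copy of the restore run metadata with a compact safety snapshot nested
 * under execution_safety_snapshot. Existing metadata keys are left untouched.
 */
function withExecutionSafetySnapshot(array $metadata, array $assessment, string $capturedAt): array
{
    // Copy only the allow-listed fields so the snapshot stays compact.
    $snapshot = array_intersect_key($assessment, array_flip(SNAPSHOT_KEYS));
    $snapshot['captured_at'] = $capturedAt;

    $metadata['execution_safety_snapshot'] = $snapshot;

    return $metadata;
}
```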

D-003 — Result meaning must be derived from existing restore outputs, not from RestoreRun.status alone

completed, partial, and failed remain important lifecycle statuses, but the operator-facing follow-up truth comes from the combination of lifecycle, item results, assignment outcomes, and linked operation context.
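
The derivation might be sketched as below. The shapes of the item results and assignment outcomes, the cause labels, and the function deriveResultAttention are all assumptions; linked operation context is omitted for brevity.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical result-attention derivation. Each entry in $itemResults and
 * $assignmentOutcomes is assumed to be an array with a 'status' key.
 */
function deriveResultAttention(string $status, array $itemResults, array $assignmentOutcomes): array
{
    $isFailed = static fn (array $entry): bool => ($entry['status'] ?? '') === 'failed';

    $failedItems = count(array_filter($itemResults, $isFailed));
    $failedAssignments = count(array_filter($assignmentOutcomes, $isFailed));

    // The first matching arm becomes the primary cause family.
    $cause = match (true) {
        $status === 'failed' => 'run_failed',
        $failedItems > 0 => 'item_failures',
        $failedAssignments > 0 => 'assignment_failures',
        $status !== 'completed' => 'incomplete_run',
        default => null,
    };

    return [
        'follow_up_required' => $cause !== null,
        'primary_cause' => $cause,
        'failed_items' => $failedItems,
        'failed_assignments' => $failedAssignments,
    ];
}
```

A run whose lifecycle status is completed but whose assignments partly failed still reports follow_up_required, which is the case RestoreRun.status alone would hide.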

D-004 — Canonical operation detail must acknowledge restore-specific follow-up without becoming the restore source of truth

OperationRun stays the monitoring record. RestoreRun stays the restore truth. The canonical operation surface should expose restore continuation meaning or link to it, not clone restore persistence.
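
One way to picture the "expose or link, never clone" rule is the sketch below. The restore.execute type comes from this plan; the context.restore_run_id key, the array shape of the run, and the access callback are assumptions, and no route or URL is implied.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical continuation resolver for canonical operation detail.
 * Returns null for runs that are not restore executions. The restore run ID is
 * only exposed when the viewer may open restore detail.
 */
function restoreContinuation(array $operationRun, callable $canViewRestoreRun): ?array
{
    if (($operationRun['type'] ?? null) !== 'restore.execute') {
        return null;
    }

    $restoreRunId = $operationRun['context']['restore_run_id'] ?? null;
    $linkable = is_int($restoreRunId) && $canViewRestoreRun($restoreRunId) === true;

    return [
        'restore_run_id' => $linkable ? $restoreRunId : null,
        'linkable' => $linkable,
    ];
}
```

When linkable is false the surface can still show restore-specific guidance as text, so navigation never points at a page the viewer cannot open.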

D-005 — Filament-native seams are sufficient for this hardening slice

Filament wizard steps, view fields, custom infolist views, confirmation patterns, and Livewire action tests already fit the feature. The plan therefore avoids a parallel UI framework or custom client-side state layer.

D-006 — Restore evidence freshness is mutation-sensitive, not age-window-driven

This slice uses the repo's existing invalidate_after_mutation freshness language for wizard-scoped derived state. A matching fingerprint plus valid capture markers is enough to classify evidence as current inside the active draft. The invalidated state represents explicit scope drift after a covered mutation, while stale is reserved for legacy or incomplete persisted evidence that cannot prove currentness.
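
The classification rule can be sketched as follows. The state names come from this plan; the EvidenceIntegrity enum, the classifyEvidence function, and the fingerprint and captured_at keys are hypothetical. There is deliberately no age check.

```php
<?php

declare(strict_types=1);

enum EvidenceIntegrity: string
{
    case NotRun = 'not_run';
    case Current = 'current';
    case Invalidated = 'invalidated';
    case Stale = 'stale';
}

/**
 * Hypothetical classifier. $evidence is null when preview or checks never ran;
 * otherwise it is assumed to carry 'fingerprint' and 'captured_at' markers.
 */
function classifyEvidence(?array $evidence, string $currentFingerprint): EvidenceIntegrity
{
    if ($evidence === null) {
        return EvidenceIntegrity::NotRun;
    }

    $fingerprint = $evidence['fingerprint'] ?? null;
    $capturedAt = $evidence['captured_at'] ?? null;

    // Legacy or incomplete evidence cannot prove currentness.
    if (! is_string($fingerprint) || $fingerprint === '' || ! is_string($capturedAt) || $capturedAt === '') {
        return EvidenceIntegrity::Stale;
    }

    // Valid markers but a different scope: explicit drift after a mutation.
    if (! hash_equals($currentFingerprint, $fingerprint)) {
        return EvidenceIntegrity::Invalidated;
    }

    return EvidenceIntegrity::Current;
}
```

Evidence captured days ago with a matching fingerprint still classifies as current under this rule, since freshness here depends on mutation rather than elapsed time.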

Risk Assessment

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Scope fingerprint is too narrow and misses a real execution-affecting change | High | Medium | Define the fingerprint from actual restore inputs used by checks and preview, cover it with unit tests and wizard regression tests |
| Historical safety truth drifts if the detail page recomputes everything from current logic | High | Medium | Persist a narrow execution-time safety snapshot on the existing restore run |
| New integrity states exist but the UI still reads calmly | High | Medium | Lock calmness suppression into wizard and detail tests, not only into helper code |
| Restore-specific truth disappears on canonical operation detail | Medium | Medium | Add explicit restore continuation coverage on the operation detail seams |
| The slice grows into a recovery dashboard or new persisted health system | Medium | Low | Keep the design constrained to existing restore and operation records, with no new table |

Test Strategy

  • Extend existing restore wizard, preview, hardening, RBAC, and ops-UX Pest coverage before adding any new test harness.
  • Add unit tests for the narrow derived restore safety helpers so fingerprint, integrity, safety, and result attention logic stay deterministic.
  • Extend existing restore audit, execution-job, and preview-diff tests so invalidation reasoning remains derivable from restore records and the current execution and diff flows remain behaviorally intact.
  • Add feature tests that prove stale or invalidated preview and checks suppress calm execution language.
  • Add feature tests that prove scope changes invalidate prior readiness and that confirm-step validation refuses calm execution when integrity conditions are not met.
  • Add feature tests that prove partial or completed-with-follow-up results are elevated above raw item lists and do not imply tenant recovery.
  • Add canonical operation-detail tests that prove restore follow-up truth remains visible or safely linked.
  • Re-run the existing ops-UX constitution and notification guards for direct status transitions, terminal DB notifications, canonical View run links, queued toast copy, and whitelisted summary_counts so reuse of OperationRun cannot regress the three-surface feedback contract.
  • Keep the manual quickstart.md validation pass as an explicit completion step so the 15-second and one-click operator outcomes are verified, not merely assumed from automated coverage.
  • Keep all tests Livewire v4 compatible and run the smallest affected subset through Sail before asking for a full-suite pass.

Complexity Tracking

No constitution violations or exception-driven complexity were identified. The only added complexity is the narrow derived restore-safety layer and the compact persisted execution-time safety snapshot already justified by the proportionality review.

Proportionality Review

  • Current operator problem: Operators can currently treat stale preview or stale checks as if they still authorize the current restore scope, and can read completed as calmer than the product can prove.
  • Existing structure is insufficient because: Restore flow data already exists, but its presence alone does not distinguish current from invalid, or safe from merely executable. Existing result rendering does not elevate follow-up truth strongly enough.
  • Narrowest correct implementation: Add a narrow derived restore-safety layer plus optional nested metadata snapshotting on the existing restore run. Reuse existing wizard, result, and operation-detail surfaces instead of creating a second workflow or persistence model.
  • Ownership cost created: A small set of derived helpers, central state mapping, new view-model wiring, and additional unit and feature tests.
  • Alternative intentionally rejected: A new recovery-health table, a tenant-wide recovery dashboard, or a generalized trust framework. Each was rejected as too broad for the current operator problem.
  • Release truth: Current-release truth. This feature hardens already-shipped restore behavior before broader backup-quality or recovery-confidence work depends on it.