TenantAtlas/specs/181-restore-safety-integrity/research.md
ahmido a107e7e41b feat: restore safety integrity and queue slide-over (#210)
Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #210
2026-04-06 23:37:14 +00:00

Research: Restore Safety Integrity

Decision 1: Derive a deterministic scope fingerprint from existing restore inputs instead of creating a new persisted scope entity

  • Decision: Represent restore scope identity as a deterministic fingerprint derived from the existing restore inputs that materially change what is checked or written: backup_set_id, scope mode, sorted selected item IDs, and normalized group mapping values. Persist that fingerprint only inside existing RestoreRun metadata, and only when historical execution truth needs to be retained.
  • Rationale: The current restore domain already has the raw inputs on RestoreRun and in the wizard state. The missing truth is not a new entity but a reliable way to say whether checks and preview still apply to the current selection. A derived fingerprint solves the mismatch problem without introducing a new table or second scope model.
  • Alternatives considered:
    • Use only timestamps to decide whether preview or checks are current. Rejected because time alone cannot detect scope mismatch.
    • Create a dedicated persisted restore_scope_snapshots table. Rejected because the scope has no independent lifecycle outside the restore run and would violate the feature's proportionality goal.
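
The fingerprint derivation described in Decision 1 could look roughly like the following. This is a minimal sketch, not the project's implementation: the function name `restoreScopeFingerprint`, the exact normalization rules (string-casting IDs, trimming and lowercasing mapping values, dropping empty targets), and the choice of SHA-256 over a JSON payload are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds a deterministic fingerprint for a restore scope.
 * Input names mirror the inputs listed in Decision 1; everything else
 * (normalization rules, hashing scheme) is an assumption of this sketch.
 *
 * @param array<int, int|string>     $selectedItemIds
 * @param array<string, string|null> $groupMapping
 */
function restoreScopeFingerprint(
    int $backupSetId,
    string $scopeMode,
    array $selectedItemIds,
    array $groupMapping,
): string {
    // Cast, de-duplicate, and sort item IDs so selection order never matters.
    $itemIds = array_values(array_unique(array_map('strval', $selectedItemIds)));
    sort($itemIds, SORT_STRING);

    // Normalize mapping values and drop empty targets, then sort by source key.
    $mapping = [];
    foreach ($groupMapping as $source => $target) {
        $normalized = strtolower(trim((string) $target));
        if ($normalized !== '') {
            $mapping[(string) $source] = $normalized;
        }
    }
    ksort($mapping, SORT_STRING);

    // Encode the normalized inputs in a fixed key order and hash the result.
    $payload = json_encode([
        'backup_set_id' => $backupSetId,
        'scope_mode' => $scopeMode,
        'item_ids' => $itemIds,
        'group_mapping' => $mapping,
    ], JSON_THROW_ON_ERROR);

    return hash('sha256', $payload);
}
```

Because the inputs are normalized before hashing, two selections that differ only in ordering or whitespace produce the same fingerprint, while a change to the backup set, scope mode, item set, or mapping produces a different one.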

Decision 2: Keep integrity states separate from risk severity

  • Decision: Model preview integrity and checks integrity as derived state families separate from RestoreRiskChecker severities. The states `current`, `stale`, `invalidated`, and `not_generated` (preview) or `not_run` (checks) answer whether the basis is still trustworthy; blocking and warning counts continue to answer what the risk checker found.
  • Rationale: Existing risk checks already classify blockers and warnings, but they do not answer whether the evaluated scope still matches the operator's current selection. Treating these as one concept would continue the current trust failure, where "no blockers" can be misread as "safe".
  • Alternatives considered:
    • Reuse blocker or warning severity to encode staleness and mismatch. Rejected because severity and integrity have different operator consequences.
    • Collapse integrity into one generic "needs rerun" label. Rejected because the UI needs to distinguish never run, stale, and invalidated as different truths.
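
To illustrate the separation in Decision 2, the sketch below resolves an integrity state without looking at risk severity, and only combines the two at the point where an execution gate is evaluated. It assumes PHP 8.1+ for enums. The names `IntegrityState`, `resolveIntegrity`, and `basisSupportsExecution` are hypothetical, and so is the rule that `stale` means "older than a configurable age" while `invalidated` means "fingerprint no longer matches"; a checks variant would use `not_run` where this one uses `not_generated`.

```php
<?php

declare(strict_types=1);

// Integrity answers "is the basis still trustworthy?" and nothing else.
enum IntegrityState: string
{
    case NotGenerated = 'not_generated';
    case Current = 'current';
    case Stale = 'stale';
    case Invalidated = 'invalidated';
}

// Hypothetical resolver; the one-hour default threshold is an assumption.
function resolveIntegrity(
    ?string $basisFingerprint,
    string $currentFingerprint,
    ?DateTimeImmutable $generatedAt,
    DateTimeImmutable $now,
    int $maxAgeSeconds = 3600,
): IntegrityState {
    if ($basisFingerprint === null || $generatedAt === null) {
        return IntegrityState::NotGenerated;
    }

    // A scope change invalidates the basis regardless of how recent it is.
    if (! hash_equals($basisFingerprint, $currentFingerprint)) {
        return IntegrityState::Invalidated;
    }

    $ageSeconds = $now->getTimestamp() - $generatedAt->getTimestamp();

    return $ageSeconds > $maxAgeSeconds
        ? IntegrityState::Stale
        : IntegrityState::Current;
}

// Severity (blocking count) stays a separate input: zero blockers on a
// non-current basis is not treated as safe.
function basisSupportsExecution(IntegrityState $checks, int $blockingCount): bool
{
    return $checks === IntegrityState::Current && $blockingCount === 0;
}
```

Keeping the blocking count out of the resolver means "no blockers" and "current basis" remain two independent facts that the UI can present separately.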

Decision 3: Preserve invalidation evidence in wizard state instead of silently clearing prior work

  • Decision: Replace the current silent reset behavior for preview and checks with explicit invalidation evidence in the wizard state. The last generated basis must still stop counting as executable truth, but the operator should see that a prior preview or check existed and no longer applies.
  • Rationale: The current wizard already resets check_summary, check_results, checks_ran_at, preview_summary, preview_diffs, and preview_ran_at when scope-affecting inputs change. That preserves safety mechanically, but it does not preserve the operator truth that the prior work was invalidated by a change they just made.
  • Alternatives considered:
    • Keep the current silent clearing behavior and add helper text only. Rejected because it still reads too much like "not generated" instead of "invalidated by your change".
    • Keep the old values visible without marking them invalid. Rejected because it risks making stale truth look reusable.
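
A minimal sketch of the invalidation behavior from Decision 3 follows. The six state keys are the ones named in the rationale above; the function name `invalidateRestoreBasis`, the `preview_invalidation` and `checks_invalidation` keys, and the shape of the evidence array are hypothetical and only illustrate the idea of clearing executable truth while keeping a record that it existed.

```php
<?php

declare(strict_types=1);

// State keys per basis family; the last key in each list is the "ran at" marker.
const RESTORE_BASIS_KEYS = [
    'preview' => ['preview_summary', 'preview_diffs', 'preview_ran_at'],
    'checks' => ['check_summary', 'check_results', 'checks_ran_at'],
];

/**
 * Hypothetical reducer: called when a scope-affecting input changes.
 *
 * @param array<string, mixed> $state
 * @return array<string, mixed>
 */
function invalidateRestoreBasis(array $state, string $reason, DateTimeImmutable $now): array
{
    foreach (RESTORE_BASIS_KEYS as $family => $keys) {
        $previousRanAt = $state[$keys[2]] ?? null;

        // Record evidence only when something had actually been generated.
        if ($previousRanAt !== null) {
            $state[$family . '_invalidation'] = [
                'previous_ran_at' => $previousRanAt,
                'invalidated_at' => $now->format(DATE_ATOM),
                'reason' => $reason,
            ];
        }

        // The old values stop being executable truth either way.
        foreach ($keys as $key) {
            $state[$key] = null;
        }
    }

    return $state;
}
```

With this shape, a view can tell "never generated" (no evidence key) apart from "invalidated by a change" (evidence key present) even though the summary fields are empty in both cases.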

Decision 4: Persist only a narrow execution-time safety snapshot on the existing restore run

  • Decision: When a real restore is queued, persist a compact execution-time safety snapshot inside existing RestoreRun metadata. The snapshot should capture the scope fingerprint, preview basis, checks basis, derived safety state, and primary blocker or warning context that justified or constrained execution.
  • Rationale: The result and detail surfaces need historical truth about what basis was used at confirmation time. Re-deriving that later from mutable thresholds or current UI logic risks rewriting history. A narrow metadata snapshot keeps the audit-relevant truth on the existing restore record without creating a second persisted model.
  • Alternatives considered:
    • Recompute execution-time safety state dynamically from the current code and current timestamps. Rejected because historical truth can drift as code or thresholds change.
    • Persist a full recovery-health document. Rejected because this feature does not claim tenant-wide recovery truth.
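
The narrow snapshot from Decision 4 might be assembled as shown below. This is a sketch under stated assumptions: the function name `buildExecutionSafetySnapshot`, the field names, the `version` marker, and the allow-list of basis fields are hypothetical. The point it illustrates is that the snapshot copies a few audit-relevant values at confirmation time instead of whole preview or check payloads.

```php
<?php

declare(strict_types=1);

// Only these basis fields are copied; diffs and full results are left out.
const SNAPSHOT_BASIS_FIELDS = ['fingerprint' => true, 'ran_at' => true, 'integrity' => true];

/**
 * Hypothetical builder for the value stored inside RestoreRun metadata.
 *
 * @param array<string, mixed> $previewBasis
 * @param array<string, mixed> $checksBasis
 * @return array<string, mixed>
 */
function buildExecutionSafetySnapshot(
    string $scopeFingerprint,
    array $previewBasis,
    array $checksBasis,
    string $safetyState,
    ?string $primaryIssue,
    DateTimeImmutable $capturedAt,
): array {
    return [
        'version' => 1,
        'captured_at' => $capturedAt->format(DATE_ATOM),
        'scope_fingerprint' => $scopeFingerprint,
        'preview_basis' => array_intersect_key($previewBasis, SNAPSHOT_BASIS_FIELDS),
        'checks_basis' => array_intersect_key($checksBasis, SNAPSHOT_BASIS_FIELDS),
        'safety_state' => $safetyState,
        'primary_issue' => $primaryIssue,
    ];
}
```

A caller would merge the returned array into the run's existing metadata under a single key when the restore is queued, so later detail views read the stored values instead of recomputing them from current thresholds.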

Decision 5: Derive result follow-up truth from existing restore results and operation outcomes instead of adding a recovery entity

  • Decision: Compute `completed`, `partial`, `failed`, and `completed_with_follow_up` from existing restore results, assignment outcomes, metadata, and linked OperationRun outcome. Treat cause families and next actions as derived read-model fields for the detail surfaces.
  • Rationale: RestoreRun.results, assignment outcomes, and operation-run linkage already contain enough signal to decide whether operator follow-up remains. The product problem is weak surfacing of that truth, not missing domain storage.
  • Alternatives considered:
    • Add a dedicated persisted recovery status column or table. Rejected because the feature does not need a second source of truth.
    • Use only RestoreRun.status as the result meaning. Rejected because `completed` does not mean recovered and `partial` does not explain the operator consequence on its own.
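
The derivation in Decision 5 can be sketched as a pure function over data the restore run already holds. The four return values come from the decision text; the function name `deriveRestoreResultAttention`, the per-item `status` values (`applied`, `failed`, `skipped`), the operation outcome value `failed`, and the precedence rules are assumptions made for illustration, not the stored shape of RestoreRun.results.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical read-model helper: derives result attention without any
 * additional persisted status.
 *
 * @param list<array{status: string}> $itemResults
 * @param list<array{status: string}> $assignmentOutcomes
 */
function deriveRestoreResultAttention(
    array $itemResults,
    array $assignmentOutcomes,
    ?string $operationOutcome,
): string {
    $statuses = array_column($itemResults, 'status');
    $failed = count(array_keys($statuses, 'failed', true));
    $applied = count(array_keys($statuses, 'applied', true));
    $skipped = count(array_keys($statuses, 'skipped', true));

    // A failed operation, or failures with nothing applied, is a plain failure.
    if ($operationOutcome === 'failed' || ($failed > 0 && $applied === 0)) {
        return 'failed';
    }

    // Some items applied and some failed.
    if ($failed > 0) {
        return 'partial';
    }

    // Everything applied, but assignments or skipped items still need attention.
    $assignmentIssues = array_filter(
        array_column($assignmentOutcomes, 'status'),
        static fn (string $status): bool => $status !== 'applied',
    );

    return (count($assignmentIssues) > 0 || $skipped > 0)
        ? 'completed_with_follow_up'
        : 'completed';
}
```

Because the function only reads existing results and outcomes, the detail surfaces can call it at render time and no second source of truth is introduced.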
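
Decision 6: Enrich canonical operation detail through the existing restore-to-operation linkage instead of persisting restore state on OperationRun

  • Decision: Reuse the existing restore-to-operation linkage and enrich the canonical operation detail for restore.execute runs with restore follow-up truth or a single safe route into the restore detail page. Do not add new OperationRun persistence for restore-specific state.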
  • Rationale: Canonical monitoring is already the shared destination for operational truth. The feature must keep restore meaning visible there, but the restore-specific source of truth still belongs to RestoreRun.
  • Alternatives considered:
    • Persist restore-follow-up labels directly on OperationRun. Rejected because it duplicates restore truth into the monitoring record.
    • Leave canonical operation detail generic and rely entirely on restore detail for follow-up truth. Rejected because it breaks continuity from monitoring.

Decision 7: Reuse Filament wizard, action, and infolist seams already present in the codebase

  • Decision: Implement the feature inside the existing RestoreRunResource::getWizardSteps(), CreateRestoreRun, restore form component views, restore infolist entry views, and the existing canonical operation detail seams. Rely on Filament wizard lifecycle hooks and action testing patterns rather than inventing a new UI shell.
  • Rationale: Filament v5 already supports wizard step validation hooks, confirmation modals for actions, and direct action testing. Existing restore surfaces are already built on these seams, so a narrow hardening slice should stay inside them.
  • Alternatives considered:
    • Rebuild restore safety as a custom standalone screen outside Filament. Rejected because it would duplicate current routing, RBAC, and UI patterns.
    • Push interactivity into custom infolist entry classes. Rejected because Filament custom infolist entries are display-oriented, not Livewire components, and the current restore detail need is presentation hardening rather than a new client-side interaction model.

Decision 8: Extend the existing Pest and Livewire test surface instead of creating a new browser-first harness

  • Decision: Add focused unit and feature coverage around the new integrity resolvers, wizard invalidation, confirmation hardening, result attention, canonical operation continuity, and RBAC-safe degradation by extending the existing restore-related Pest and Livewire tests.
  • Rationale: The repository already has strong restore wizard, preview, execution, hardening, RBAC, and ops-UX regression coverage. Filament's testing guidance supports direct action invocation and visibility assertions, which fit this feature precisely.
  • Alternatives considered:
    • Rely only on manual UI validation. Rejected because this slice is specifically about preventing subtle trust regressions.
    • Add a large browser-only suite as the primary guard. Rejected because the critical assertions are server-driven state and action consequences that fit existing Pest and Livewire tests better.