Research: Restore Safety Integrity

Decision 1: Derive a deterministic scope fingerprint from existing restore inputs instead of creating a new persisted scope entity

Decision: Represent restore scope identity as a deterministic fingerprint derived from the existing restore inputs that materially change what is checked or written: backup_set_id, scope mode, sorted selected item IDs, and normalized group mapping values. Persist that fingerprint only inside existing RestoreRun metadata, and only when historical execution truth needs to be retained.
Rationale: The current restore domain already has the raw inputs on RestoreRun and in the wizard state. The missing truth is not a new entity but a reliable way to say whether checks and preview still apply to the current selection. A derived fingerprint solves the mismatch problem without introducing a new table or second scope model.
Alternatives considered:
- Use only timestamps to decide whether preview or checks are current. Rejected because time alone cannot detect scope mismatch.
- Create a dedicated persisted restore_scope_snapshots table. Rejected because the scope has no independent lifecycle outside the restore run, and a dedicated table would violate the feature's proportionality goal.
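
The fingerprint derivation described in this decision can be sketched as follows. This is a minimal illustration in Python, not the project's implementation (which lives in the PHP codebase). The function name scope_fingerprint, the choice of SHA-256 over canonical JSON, and the normalization rule (trim whitespace, drop empty mappings) are all assumptions made for the example.

```python
import hashlib
import json


def scope_fingerprint(backup_set_id, scope_mode, selected_item_ids, group_mapping):
    """Return a deterministic hex digest for a restore scope (illustrative only)."""
    payload = {
        "backup_set_id": str(backup_set_id),
        "scope_mode": scope_mode,
        # Sorting makes the fingerprint independent of selection order.
        "selected_item_ids": sorted(str(i) for i in selected_item_ids),
        # Assumed normalization: trim keys and values, drop empty mappings.
        "group_mapping": {
            str(k).strip(): str(v).strip()
            for k, v in group_mapping.items()
            if v is not None and str(v).strip() != ""
        },
    }
    # sort_keys plus fixed separators give one canonical serialization per scope.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two selections that differ only in ordering and whitespace share a fingerprint.
fp_a = scope_fingerprint(7, "selected", ["b", "a"], {"g1": " x "})
fp_b = scope_fingerprint(7, "selected", ["a", "b"], {"g1": "x"})
```

Comparing the fingerprint stored with a preview or check run against the fingerprint of the current wizard selection then answers the "does this still apply?" question without a timestamp heuristic.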

Decision 2: Keep integrity states separate from risk severity

Decision: Model preview integrity and checks integrity as derived state families separate from RestoreRiskChecker severities. The states current, stale, invalidated, and not_generated or not_run answer whether the basis is still trustworthy; blocking and warning counts continue to answer what the risk checker found.
Rationale: Existing risk checks already classify blockers and warnings, but they do not answer whether the evaluated scope still matches the operator's current selection. Treating these as one concept would continue the current trust failure, where "no blockers" can be misread as "safe".
Alternatives considered:
- Reuse blocker or warning severity to encode staleness and mismatch. Rejected because severity and integrity have different operator consequences.
- Collapse integrity into one generic "needs rerun" label. Rejected because the UI needs to distinguish never run, stale, and invalidated as different truths.
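
A minimal sketch of how the integrity states could be derived separately from risk severity follows, written in Python purely for illustration. The Basis shape, the function names, and the specific rules are assumptions: here "invalidated" means the scope fingerprint no longer matches (or the basis was explicitly marked invalid), and "stale" means the basis is older than an age threshold. The source does not define these rules precisely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Basis:
    """Evidence recorded for a preview or a check run (hypothetical shape)."""
    fingerprint: Optional[str]   # scope fingerprint the work was generated for
    ran_at: Optional[datetime]   # when the work was generated, if ever
    invalidated: bool = False    # set when a scope-affecting input changed


def integrity_state(basis, current_fingerprint, now, max_age, never_label="not_run"):
    """Classify how trustworthy a basis is; says nothing about risk findings."""
    if basis.ran_at is None and not basis.invalidated:
        return never_label  # "not_run" for checks, "not_generated" for preview
    if basis.invalidated or basis.fingerprint != current_fingerprint:
        return "invalidated"
    if now - basis.ran_at > max_age:
        return "stale"
    return "current"


def is_safe_to_proceed(checks_state, blocking_count):
    """Zero blockers only counts as safe when the checks basis is current."""
    return checks_state == "current" and blocking_count == 0


now = datetime(2024, 1, 1, 12, 0)
max_age = timedelta(minutes=30)
old_checks = Basis(fingerprint="fp", ran_at=now - timedelta(hours=1))
state = integrity_state(old_checks, "fp", now, max_age)
```

Keeping is_safe_to_proceed as a function of both inputs makes the separation explicit: a blocker count of zero from a stale or invalidated check run never reads as safe.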

Decision 3: Preserve invalidation evidence in wizard state instead of silently clearing prior work

Decision: Replace the current silent reset behavior for preview and checks with explicit invalidation evidence in the wizard state. The last generated basis may still be withdrawn as executable truth, but the operator should see that a prior preview or check existed and no longer applies.
Rationale: The current wizard already resets check_summary, check_results, checks_ran_at, preview_summary, preview_diffs, and preview_ran_at when scope-affecting inputs change. That preserves safety mechanically, but it does not preserve the operator truth that the prior work was invalidated by a change they just made.
Alternatives considered:
- Keep the current silent clearing behavior and add helper text only. Rejected because it still reads too much like "not generated" instead of "invalidated by your change".
- Keep the old values visible without marking them invalid. Rejected because it risks making stale truth look reusable.
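
The difference between silent clearing and recorded invalidation can be sketched as below. The six state keys come from the rationale above. The checks_invalidation and preview_invalidation keys, the reason string, and the function name are hypothetical, and Python stands in for the real wizard code only as an illustration.

```python
CHECK_KEYS = ("check_summary", "check_results", "checks_ran_at")
PREVIEW_KEYS = ("preview_summary", "preview_diffs", "preview_ran_at")


def invalidate_on_scope_change(state, reason, now):
    """Clear preview and check results, but keep evidence that they existed."""
    new_state = dict(state)  # leave the caller's state untouched
    for family, keys, ran_at_key in (
        ("checks", CHECK_KEYS, "checks_ran_at"),
        ("preview", PREVIEW_KEYS, "preview_ran_at"),
    ):
        previous_ran_at = state.get(ran_at_key)
        if previous_ran_at is not None:
            # Hypothetical evidence record: prior work existed and was invalidated.
            new_state[f"{family}_invalidation"] = {
                "invalidated_at": now,
                "reason": reason,
                "previous_ran_at": previous_ran_at,
            }
        for key in keys:
            # The old results stop being executable truth either way.
            new_state[key] = None
    return new_state


wizard_state = {
    "check_summary": {"blocking": 0, "warning": 1},
    "check_results": ["..."],
    "checks_ran_at": "2024-01-01T11:40:00Z",
    "preview_summary": None,
    "preview_diffs": None,
    "preview_ran_at": None,
}
after = invalidate_on_scope_change(
    wizard_state, "selected_items_changed", "2024-01-01T12:00:00Z"
)
```

In this example the checks had run and the preview had not, so only the checks family gains an invalidation record. The UI can then show "invalidated by your change" for checks and "not generated" for the preview.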

Decision 4: Persist only a narrow execution-time safety snapshot on the existing restore run

Decision: When a real restore is queued, persist a compact execution-time safety snapshot inside existing RestoreRun metadata. The snapshot should capture the scope fingerprint, preview basis, checks basis, derived safety state, and primary blocker or warning context that justified or constrained execution.
Rationale: The result and detail surfaces need historical truth about what basis was used at confirmation time. Re-deriving that later from mutable thresholds or current UI logic risks rewriting history. A narrow metadata snapshot keeps the audit-relevant truth on the existing restore record without creating a second persisted model.
Alternatives considered:
- Recompute execution-time safety state dynamically from the current code and current timestamps. Rejected because historical truth can drift as code or thresholds change.
- Persist a full recovery-health document. Rejected because this feature does not claim tenant-wide recovery truth.
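
One way to keep the snapshot narrow is to copy only an allow-listed set of fields at queue time, as in this Python sketch. The key names (execution_safety_snapshot, state, ran_at, fingerprint), the allow-list, and both function names are assumptions for illustration; the source only states which concepts the snapshot should capture.

```python
BASIS_FIELDS = ("state", "ran_at", "fingerprint")  # hypothetical allow-list


def narrow(basis):
    """Keep only the audit-relevant fields of a preview or checks basis."""
    return {field: basis.get(field) for field in BASIS_FIELDS}


def build_safety_snapshot(fingerprint, preview_basis, checks_basis,
                          safety_state, primary_issue=None):
    """Freeze the basis that justified or constrained execution."""
    return {
        "scope_fingerprint": fingerprint,
        "preview_basis": narrow(preview_basis),
        "checks_basis": narrow(checks_basis),
        "safety_state": safety_state,
        "primary_issue": primary_issue,  # primary blocker or warning context
    }


def attach_snapshot(metadata, snapshot):
    """Return new run metadata with the snapshot stored under a single key."""
    return {**metadata, "execution_safety_snapshot": snapshot}


snapshot = build_safety_snapshot(
    "abc123",
    {"state": "current", "ran_at": "2024-01-01T11:55:00Z",
     "fingerprint": "abc123", "diffs": ["large payload, not copied"]},
    {"state": "current", "ran_at": "2024-01-01T11:50:00Z", "fingerprint": "abc123"},
    safety_state="ready_with_warnings",
    primary_issue={"severity": "warning", "code": "example_warning"},
)
run_metadata = attach_snapshot({"requested_by": "operator-1"}, snapshot)
```

Because the snapshot is written once and read back as stored, later changes to thresholds or UI logic cannot alter what the result and detail surfaces report about confirmation time.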

Decision 5: Derive result follow-up truth from existing restore results and operation outcomes instead of adding a recovery entity

Decision: Compute completed, partial, failed, and completed_with_follow_up from existing restore results, assignment outcomes, metadata, and linked OperationRun outcome. Treat cause families and next actions as derived read-model fields for the detail surfaces.
Rationale: RestoreRun.results, assignment outcomes, and operation-run linkage already contain enough signal to decide whether operator follow-up remains. The product problem is weak surfacing of that truth, not missing domain storage.
Alternatives considered:
- Add a dedicated persisted recovery status column or table. Rejected because the feature does not need a second source of truth.
- Use only RestoreRun.status as the result meaning. Rejected because "completed" does not mean "recovered", and "partial" does not explain the operator consequence on its own.
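
The derivation of the four result states can be sketched as a pure function over signals that already exist. The status strings ("failed", "skipped", "succeeded"), the precedence of the rules, and the function name are assumptions for this Python illustration; the actual mapping from RestoreRun.results and OperationRun outcomes may differ.

```python
def derive_result_state(item_statuses, assignment_statuses, operation_outcome):
    """Derive a read-model result state without persisting a new status."""
    failed_items = sum(1 for status in item_statuses if status == "failed")
    all_items_failed = bool(item_statuses) and failed_items == len(item_statuses)

    if operation_outcome == "failed" or all_items_failed:
        return "failed"
    if failed_items > 0:
        return "partial"
    # Items restored, but assignments still need operator attention.
    if any(status in ("failed", "skipped") for status in assignment_statuses):
        return "completed_with_follow_up"
    return "completed"


result = derive_result_state(
    item_statuses=["succeeded", "succeeded"],
    assignment_statuses=["succeeded", "skipped"],
    operation_outcome="succeeded",
)
```

In this usage every item restored, but one assignment was skipped, so the derived state is completed_with_follow_up rather than completed.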

Decision 6: Keep restore-specific follow-up truth visible on the canonical operation detail through enrichment or a safe deep link

Decision: Reuse the existing restore-to-operation linkage and enrich the canonical operation detail for restore.execute runs with restore follow-up truth or a single safe route into the restore detail page. Do not add new OperationRun persistence for restore-specific state.
Rationale: Canonical monitoring is already the shared destination for operational truth. The feature must keep restore meaning visible there, but the restore-specific source of truth still belongs to RestoreRun.
Alternatives considered:
- Persist restore-follow-up labels directly on OperationRun. Rejected because it duplicates restore truth into the monitoring record.
- Leave canonical operation detail generic and rely entirely on restore detail for follow-up truth. Rejected because it breaks continuity from monitoring.

Decision 7: Reuse Filament wizard, action, and infolist seams already present in the codebase

Decision: Implement the feature inside the existing RestoreRunResource::getWizardSteps(), CreateRestoreRun, restore form component views, restore infolist entry views, and the existing canonical operation detail seams. Rely on Filament wizard lifecycle hooks and action testing patterns rather than inventing a new UI shell.
Rationale: Filament v5 already supports wizard step validation hooks, confirmation modals for actions, and direct action testing. Existing restore surfaces are already built on these seams, so a narrow hardening slice should stay inside them.
Alternatives considered:
- Rebuild restore safety as a custom standalone screen outside Filament. Rejected because it would duplicate current routing, RBAC, and UI patterns.
- Push interactivity into custom infolist entry classes. Rejected because Filament custom infolist entries are display-oriented, not Livewire components, and the current restore detail need is presentation hardening rather than a new client-side interaction model.

Decision 8: Extend the existing Pest and Livewire test surface instead of creating a new browser-first harness

Decision: Add focused unit and feature coverage around the new integrity resolvers, wizard invalidation, confirmation hardening, result attention, canonical operation continuity, and RBAC-safe degradation by extending the existing restore-related Pest and Livewire tests.
Rationale: The repository already has strong restore wizard, preview, execution, hardening, RBAC, and ops-UX regression coverage. Filament's testing guidance supports direct action invocation and visibility assertions, which fit this feature precisely.
Alternatives considered:
- Rely only on manual UI validation. Rejected because this slice is specifically about preventing subtle trust regressions.
- Add a large browser-only suite as the primary guard. Rejected because the critical assertions are server-driven state and action consequences that fit existing Pest and Livewire tests better.
