Research: Restore Safety Integrity
Decision 1: Derive a deterministic scope fingerprint from existing restore inputs instead of creating a new persisted scope entity
- Decision: Represent restore scope identity as a deterministic fingerprint derived from the existing restore inputs that materially change checked or written behavior: `backup_set_id`, scope mode, sorted selected item IDs, and normalized group mapping values. Persist that fingerprint only inside existing `RestoreRun` metadata when historical execution truth needs to be retained.
- Rationale: The current restore domain already has the raw inputs on `RestoreRun` and in the wizard state. The missing truth is not a new entity but a reliable way to say whether checks and preview still apply to the current selection. A derived fingerprint solves the mismatch problem without introducing a new table or second scope model.
- Alternatives considered:
  - Use only timestamps to decide whether preview or checks are current. Rejected because time alone cannot detect scope mismatch.
  - Create a dedicated persisted `restore_scope_snapshots` table. Rejected because the scope has no independent lifecycle outside the restore run and would violate the feature's proportionality goal.
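
The fingerprint described in Decision 1 can be illustrated with a minimal Python sketch (not the project's PHP code). The function name `scope_fingerprint`, the payload keys, the choice of SHA-256, and the trim-and-lowercase normalization of group mapping values are all assumptions made for illustration; only the four input families come from the decision itself.

```python
import hashlib
import json


def scope_fingerprint(backup_set_id, scope_mode, selected_item_ids, group_mapping):
    """Return a stable SHA-256 hex digest for one restore scope.

    Hypothetical helper: the normalization rules are assumptions.
    """
    payload = {
        "backup_set_id": str(backup_set_id),
        "scope_mode": scope_mode,
        # Sort and de-duplicate so selection order never changes the digest.
        "item_ids": sorted({str(i) for i in (selected_item_ids or [])}),
        # Assumed normalization: trim whitespace and lowercase both sides.
        "group_mapping": {
            str(src).strip().lower(): ("" if dst is None else str(dst).strip().lower())
            for src, dst in (group_mapping or {}).items()
        },
    }
    # sort_keys plus fixed separators give one canonical JSON string per scope.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two selections that differ only in item order or mapping whitespace yield the same digest, while any change to the backup set, scope mode, selected items, or mapping targets yields a different one. That is the property the wizard needs to decide whether an earlier preview or check still matches the current selection.
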
Decision 2: Keep integrity states separate from risk severity
- Decision: Model preview integrity and checks integrity as derived state families separate from `RestoreRiskChecker` severities. `current`, `stale`, `invalidated`, and `not_generated` or `not_run` answer whether the basis is still trustworthy; blocking and warning counts continue to answer what the risk checker found.
- Rationale: Existing risk checks already classify blockers and warnings, but they do not answer whether the evaluated scope still matches the operator's current selection. Treating these as one concept would continue the current trust failure where `no blockers` can be misread as `safe`.
- Alternatives considered:
  - Reuse blocker or warning severity to encode staleness and mismatch. Rejected because severity and integrity have different operator consequences.
  - Collapse integrity into one generic `needs rerun` label. Rejected because the UI needs to distinguish `never run`, `stale`, and `invalidated` as different truths.
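
To show how integrity can stay separate from severity, here is a minimal Python sketch (not the project's PHP code). The `Basis` type, the `resolve_integrity` and `is_clear_to_proceed` helpers, and the age threshold are hypothetical. The sketch also assumes that `invalidated` means the scope changed and `stale` means the basis is too old; the decision names these states without defining them that precisely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Basis:
    fingerprint: Optional[str]  # scope fingerprint the preview or checks ran against
    ran_at: Optional[float]     # epoch seconds; None if never generated
    invalidated: bool = False   # set when a scope-affecting input changed afterwards


def resolve_integrity(basis, current_fingerprint, now, max_age_seconds, never_label):
    """Return one integrity state; never_label is 'not_generated' or 'not_run'."""
    if basis.invalidated:
        return "invalidated"
    if basis.ran_at is None:
        return never_label
    if basis.fingerprint != current_fingerprint:
        return "invalidated"  # assumption: scope mismatch counts as invalidation
    if now - basis.ran_at > max_age_seconds:
        return "stale"        # assumption: staleness is purely age-based
    return "current"


def is_clear_to_proceed(checks_integrity, blocking_count):
    """'No blockers' only means something when the checks basis is current."""
    return checks_integrity == "current" and blocking_count == 0
```

Here a zero blocker count on a stale or invalidated basis never evaluates as clear, which is the misreading the decision is meant to prevent.
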
Decision 3: Preserve invalidation evidence in wizard state instead of silently clearing prior work
- Decision: Replace the current silent reset behavior for preview and checks with explicit invalidation evidence in the wizard state. The last generated basis may still be cleared from being executable truth, but the operator should see that a prior preview or check existed and no longer applies.
- Rationale: The current wizard already resets `check_summary`, `check_results`, `checks_ran_at`, `preview_summary`, `preview_diffs`, and `preview_ran_at` when scope-affecting inputs change. That preserves safety mechanically, but it does not preserve the operator truth that the prior work was invalidated by a change they just made.
- Alternatives considered:
  - Keep the current silent clearing behavior and add helper text only. Rejected because it still reads too much like `not generated` instead of `invalidated by your change`.
  - Keep the old values visible without marking them invalid. Rejected because it risks making stale truth look reusable.
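
The difference between silent clearing and recorded invalidation can be sketched in Python (not the project's PHP code). The six cleared keys come from the rationale above. The `apply_change` helper, the `checks_invalidation` and `preview_invalidation` keys, and all scope field names other than `backup_set_id` are hypothetical.

```python
# Assumed names for the scope-affecting wizard inputs.
SCOPE_FIELDS = {"backup_set_id", "scope_mode", "selected_item_ids", "group_mapping"}

# Basis name -> (summary key, detail key, timestamp key), as listed in the rationale.
BASIS_KEYS = {
    "checks": ("check_summary", "check_results", "checks_ran_at"),
    "preview": ("preview_summary", "preview_diffs", "preview_ran_at"),
}


def apply_change(state, field, value, now):
    """Return a new wizard state after one input change."""
    new_state = dict(state)
    new_state[field] = value
    if field not in SCOPE_FIELDS or state.get(field) == value:
        return new_state  # not scope-affecting, or no real change
    for name, (summary_key, detail_key, ran_at_key) in BASIS_KEYS.items():
        previous_ran_at = state.get(ran_at_key)
        # Clear the basis so it can no longer act as executable truth.
        new_state[summary_key] = None
        new_state[detail_key] = None
        new_state[ran_at_key] = None
        if previous_ran_at is not None:
            # Record evidence that prior work existed and why it no longer applies.
            new_state[name + "_invalidation"] = {
                "previous_ran_at": previous_ran_at,
                "changed_field": field,
                "invalidated_at": now,
            }
    return new_state
```

A basis that never ran gets no invalidation record, so the UI can still tell "never generated" apart from "invalidated by your change".
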
Decision 4: Persist only a narrow execution-time safety snapshot on the existing restore run
- Decision: When a real restore is queued, persist a compact execution-time safety snapshot inside existing `RestoreRun` metadata. The snapshot should capture the scope fingerprint, preview basis, checks basis, derived safety state, and primary blocker or warning context that justified or constrained execution.
- Rationale: The result and detail surfaces need historical truth about what basis was used at confirmation time. Re-deriving that later from mutable thresholds or current UI logic risks rewriting history. A narrow metadata snapshot keeps the audit-relevant truth on the existing restore record without creating a second persisted model.
- Alternatives considered:
  - Recompute execution-time safety state dynamically from the current code and current timestamps. Rejected because historical truth can drift as code or thresholds change.
  - Persist a full recovery-health document. Rejected because this feature does not claim tenant-wide recovery truth.
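
One possible shape for the snapshot, as a minimal Python sketch (not the project's PHP code). The five captured fields follow the decision. The metadata key `execution_safety_snapshot`, the `captured_at` field, the helper name, and the nested key names are assumptions.

```python
import copy

SNAPSHOT_KEY = "execution_safety_snapshot"  # hypothetical metadata key


def with_safety_snapshot(metadata, *, scope_fingerprint, preview_basis,
                         checks_basis, safety_state, primary_issue, captured_at):
    """Return a copy of the run metadata with a frozen snapshot attached."""
    snapshot = {
        "scope_fingerprint": scope_fingerprint,
        # Deep copies keep later wizard or threshold changes out of the record.
        "preview_basis": copy.deepcopy(preview_basis),
        "checks_basis": copy.deepcopy(checks_basis),
        "safety_state": safety_state,
        "primary_issue": copy.deepcopy(primary_issue),
        "captured_at": captured_at,
    }
    updated = dict(metadata or {})
    updated[SNAPSHOT_KEY] = snapshot
    return updated
```

Detail surfaces would then read the stored snapshot rather than recomputing it, so a later change to thresholds or UI logic cannot alter what the record says was true at confirmation time.
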
Decision 5: Derive result follow-up truth from existing restore results and operation outcomes instead of adding a recovery entity
- Decision: Compute `completed`, `partial`, `failed`, and `completed_with_follow_up` from existing restore results, assignment outcomes, metadata, and linked `OperationRun` outcome. Treat cause families and next actions as derived read-model fields for the detail surfaces.
- Rationale: `RestoreRun.results`, assignment outcomes, and operation-run linkage already contain enough signal to decide whether operator follow-up remains. The product problem is weak surfacing of that truth, not missing domain storage.
- Alternatives considered:
  - Add a dedicated persisted recovery status column or table. Rejected because the feature does not need a second source of truth.
  - Use only `RestoreRun.status` as the result meaning. Rejected because `completed` does not mean `recovered` and `partial` does not explain the operator consequence on its own.
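
A derived read model along these lines could look like the following Python sketch (not the project's PHP code). The four state names come from the decision. The precedence rules, the `status` values on items and assignments, and the `next_actions` labels are assumptions chosen only to make the derivation concrete.

```python
def derive_result_attention(run_status, item_results, assignment_outcomes, operation_outcome):
    """Derive a follow-up state from stored outcomes without persisting it."""
    failed_items = [r for r in item_results if r.get("status") == "failed"]
    # Assumption: failed or skipped assignments leave operator work open.
    open_assignments = [
        a for a in assignment_outcomes if a.get("status") in ("failed", "skipped")
    ]

    if run_status == "failed" or operation_outcome == "failed":
        state = "failed"
    elif item_results and len(failed_items) == len(item_results):
        state = "failed"
    elif failed_items:
        state = "partial"
    elif open_assignments:
        state = "completed_with_follow_up"
    else:
        state = "completed"

    next_actions = []
    if failed_items:
        next_actions.append("review_failed_items")
    if open_assignments:
        next_actions.append("review_assignments")
    return {"state": state, "next_actions": next_actions}
```

In this sketch a run whose stored status is `completed` can still surface as `completed_with_follow_up`, which is the gap that relying on `RestoreRun.status` alone would leave.
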
Decision 6: Keep restore-specific follow-up truth visible on the canonical operation detail through enrichment or a safe deep link
- Decision: Reuse the existing restore-to-operation linkage and enrich the canonical operation detail for `restore.execute` runs with restore follow-up truth or a single safe route into the restore detail page. Do not add new `OperationRun` persistence for restore-specific state.
- Rationale: Canonical monitoring is already the shared destination for operational truth. The feature must keep restore meaning visible there, but the restore-specific source of truth still belongs to `RestoreRun`.
- Alternatives considered:
  - Persist restore-follow-up labels directly on `OperationRun`. Rejected because it duplicates restore truth into the monitoring record.
  - Leave canonical operation detail generic and rely entirely on restore detail for follow-up truth. Rejected because it breaks continuity from monitoring.
Decision 7: Reuse Filament wizard, action, and infolist seams already present in the codebase
- Decision: Implement the feature inside the existing `RestoreRunResource::getWizardSteps()`, `CreateRestoreRun`, restore form component views, restore infolist entry views, and the existing canonical operation detail seams. Rely on Filament wizard lifecycle hooks and action testing patterns rather than inventing a new UI shell.
- Rationale: Filament v5 already supports wizard step validation hooks, confirmation modals for actions, and direct action testing. Existing restore surfaces are already built on these seams, so a narrow hardening slice should stay inside them.
- Alternatives considered:
  - Rebuild restore safety as a custom standalone screen outside Filament. Rejected because it would duplicate current routing, RBAC, and UI patterns.
  - Push interactivity into custom infolist entry classes. Rejected because Filament custom infolist entries are display-oriented, not Livewire components, and the current restore detail need is presentation hardening rather than a new client-side interaction model.
Decision 8: Extend the existing Pest and Livewire test surface instead of creating a new browser-first harness
- Decision: Add focused unit and feature coverage around the new integrity resolvers, wizard invalidation, confirmation hardening, result attention, canonical operation continuity, and RBAC-safe degradation by extending the existing restore-related Pest and Livewire tests.
- Rationale: The repository already has strong restore wizard, preview, execution, hardening, RBAC, and ops-UX regression coverage. Filament's testing guidance supports direct action invocation and visibility assertions, which fit this feature precisely.
- Alternatives considered:
  - Rely only on manual UI validation. Rejected because this slice is specifically about preventing subtle trust regressions.
  - Add a large browser-only suite as the primary guard. Rejected because the critical assertions are server-driven state and action consequences that fit existing Pest and Livewire tests better.