# Feature Specification: Restore Safety Integrity
**Feature Branch**: `181-restore-safety-integrity`
**Created**: 2026-04-06
**Status**: Proposed
**Input**: User description: "Spec 181 - Restore Safety Integrity"
## Spec Scope Fields *(mandatory)*
- **Scope**: tenant + canonical-view
- **Primary Routes**:
  - `/admin/t/{tenant}/restore-runs`
  - `/admin/t/{tenant}/restore-runs/create`
  - `/admin/t/{tenant}/restore-runs/{restoreRun}`
  - `/admin/operations`
  - `/admin/operations/{run}`
- **Data Ownership**:
  - `RestoreRun` remains the tenant-owned restore source of truth for selected scope, preview payload, check results, execution intent, and restore result details.
  - `OperationRun` remains the canonical workspace-owned execution record for queued or running restore execution, with tenant linkage preserved through existing relationships and authorization rules.
  - Preview integrity, checks integrity, restore safety, and result follow-up truth remain derived from existing restore inputs and outcomes. The feature may add structured metadata on existing `RestoreRun` records when needed, but it must not add a new table or central recovery-state store.
  - Existing backup artifacts, policy versions, assignment mappings, and write-gate decisions remain owned by their current domains.
- **RBAC**:
  - Tenant membership remains required for every restore-run list, create, and detail surface.
  - Real restore execution remains gated by the existing tenant-manage capability from the canonical capability registry.
  - Canonical operation detail remains workspace-scoped first and tenant-entitlement-safe for any restore-linked context or deep link.
  - Non-members remain `404`; in-scope actors who can view but cannot execute remain limited to truthful read surfaces without misleading execution affordances.
For canonical-view specs, the spec MUST define:
- **Default filter behavior when tenant-context is active**: When operators open canonical monitoring from a tenant restore surface, `/admin/operations` may prefilter to the active tenant and preserve the originating restore-follow-up context. The destination must not flatten restore-specific follow-up into a generic operations list state.
- **Explicit entitlement checks preventing cross-tenant leakage**: Restore-linked canonical operation detail and cross-links from restore surfaces must resolve only after workspace membership and tenant entitlement checks against the referenced restore run or operation run. Unauthorized actors must not see related restore warnings, counts, result hints, or deep-link affordances.
## UI/UX Surface Classification *(mandatory when operator-facing surfaces are changed)*
| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Restore run wizard | Mutation-first wizard | Dedicated create wizard with one safety decision flow | forbidden | Step-level hint actions only | Final execution action inside hard-confirm step only | `/admin/t/{tenant}/restore-runs` | `/admin/t/{tenant}/restore-runs/{restoreRun}` | Active tenant context, backup set, scope mode, selected item count, preview state, checks state | Restore runs / Restore run | Preview currentness, checks currentness, execution readiness, safety readiness | Wizard surface |
| Restore run detail and result | Detail-first operational surface | Dedicated restore-run detail page | forbidden | Related navigation and diagnostics appear after summary and next action | No new destructive action on the detail page | `/admin/t/{tenant}/restore-runs` | `/admin/t/{tenant}/restore-runs/{restoreRun}` | Active tenant context, backup set, execution mode, preview/check basis, result attention | Restore runs / Restore run | Overall result truth, follow-up truth, primary next action, basis integrity | Existing custom infolist entry surface |
| Canonical operation detail for restore-linked runs | Canonical detail | Dedicated operation-run detail page | forbidden | Detail-header navigation and related links only | None introduced by this feature | `/admin/operations` | `/admin/operations/{run}` | Workspace scope, entitled tenant context, run identity, restore linkage | Operations / Operation | Restore-specific follow-up truth is visible or explicitly linked from the canonical run detail | Restore-linked canonical detail |
## Operator Surface Contract *(mandatory when operator-facing surfaces are changed)*
| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Restore run wizard | Tenant operator | Mutation-first wizard | Can I responsibly execute this restore for the currently selected scope? | Selected scope, preview state, checks state, primary blocker or warning, execution readiness, safety readiness, rerun requirement | Raw diff rows, item-level checker output, low-level mapping detail | preview integrity, checks integrity, execution readiness, safety readiness, mutation mode | Preview and checks are simulation-only; real execution mutates the Microsoft tenant | Run checks, Generate preview, Adjust scope, Execute restore | Execute restore |
| Restore run detail and result | Tenant operator | Detail-first operational surface | What did this restore actually mean, and what do I need to do next? | Overall result truth, follow-up truth, summary of applied or failed work, primary next action, whether recovery is still open | Item-by-item failure detail, raw provider diagnostics, deep mapping detail | execution outcome, result follow-up, cause family, recovery confidence boundary | Read-only result interpretation; follow-up actions may lead to simulation-only or future tenant mutation paths outside this surface | Review result, Open related operation, Follow primary next action | None introduced by this feature |
| Canonical operation detail for restore-linked runs | Workspace operator or entitled tenant operator | Canonical detail | How does this operation outcome relate to restore safety and restore follow-up truth? | Operation lifecycle, outcome, restore linkage, visible restore follow-up or direct path to it | Generic operation telemetry, technical traces, low-level run internals | operation lifecycle, operation outcome, restore follow-up continuity | Read-only monitoring surface | Return to operations, Open restore context, Refresh | None introduced by this feature |
## Proportionality Review *(mandatory when structural complexity is introduced)*
- **New source of truth?**: No
- **New persisted entity/table/artifact?**: No
- **New abstraction?**: Yes, but only a narrow derived resolver or presenter layer for preview integrity, checks integrity, restore safety, and result follow-up truth over existing restore data.
- **New enum/state/reason family?**: Yes, as derived restore-domain state families for preview integrity, checks integrity, safety readiness, and result follow-up.
- **New cross-domain UI framework/taxonomy?**: No. This is restore-domain hardening only, not a new product-wide trust framework.
- **Current operator problem**: Operators can currently mistake the presence of a preview for currentness, prior checks for current-scope coverage, technical startability for safety, and restore completion for tenant recovery.
- **Existing structure is insufficient because**: The existing restore flow exposes checks, preview, and result data, but it does not enforce their time-bound and scope-bound meaning strongly enough at the decision surfaces where operators choose whether to execute or stop.
- **Narrowest correct implementation**: Derive integrity and follow-up states from existing `RestoreRun`, `OperationRun`, risk-check, diff, and result data. Allow limited structured metadata on the current `RestoreRun` context if required for scope fingerprinting or invalidation reasons, but do not create a second persisted restore-health model.
- **Ownership cost**: The codebase takes on fingerprint derivation rules, state mapping rules, shared UI semantics for restore safety, and regression tests that keep wizard, detail, and canonical operation surfaces aligned.
- **Alternative intentionally rejected**: A tenant-wide recovery confidence dashboard, a new persisted recovery health table, and a global restore risk engine were all rejected because the immediate trust problem is narrower: the current restore surfaces overstate calmness and understate invalidation.
- **Release truth**: Current-release truth. This feature hardens already shipped restore behavior before broader recovery-confidence or backup-quality work builds on it.
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Decide Whether Real Execution Is Responsible (Priority: P1)
As a tenant operator preparing a restore, I want the wizard to tell me whether the current preview and checks still apply to the scope I selected, so that I do not launch a real restore on stale assumptions.
**Why this priority**: The most dangerous failure is a confident real restore based on outdated or mismatched truth.
**Independent Test**: Can be fully tested by opening the wizard, generating checks and preview, then verifying that the final step clearly distinguishes safe readiness from mere technical startability.
**Acceptance Scenarios**:
1. **Given** a restore scope with current checks, current preview, and no blockers, **When** the operator opens the confirm step, **Then** the surface shows that the restore is technically startable and safety-reviewed for the current scope.
2. **Given** a restore scope with no blockers but unresolved warnings, **When** the operator opens the confirm step, **Then** the surface does not present a calm `safe` or `looks good` message and instead frames the action as risky or cautionary.
3. **Given** preview or checks were never run, **When** the operator reaches the confirm step, **Then** the surface shows rerun requirements before real execution is presented as available.
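
The separation between technical startability and safety readiness in the scenarios above could be sketched as two independent derivations. This is an illustrative Python sketch, not the feature's implementation: the `RestoreDecisionInputs` type, its field names, the `startable` and `not_startable` labels, and the exact mapping of warnings to `ready_with_caution` and of non-current integrity to `risky` are assumptions. Only the safety state names are taken from this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestoreDecisionInputs:
    """Hypothetical inputs for the confirm step (names are illustrative)."""
    preview_integrity: str  # "not_generated" | "current" | "stale" | "invalidated"
    checks_integrity: str   # "not_run" | "current" | "stale" | "invalidated"
    blocker_count: int
    warning_count: int
    can_execute: bool       # write-gate and capability checks passed


def execution_readiness(inputs: RestoreDecisionInputs) -> str:
    """Technical startability only; says nothing about safety."""
    return "startable" if inputs.can_execute else "not_startable"


def safety_readiness(inputs: RestoreDecisionInputs) -> str:
    """Safety truth, derived independently of technical startability."""
    if inputs.blocker_count > 0:
        return "blocked"
    # Assumption: any non-current preview or checks makes the decision risky.
    if inputs.preview_integrity != "current" or inputs.checks_integrity != "current":
        return "risky"
    # Assumption: unresolved warnings suppress the calm "ready" state.
    if inputs.warning_count > 0:
        return "ready_with_caution"
    return "ready"
```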
---
### User Story 2 - Notice Scope Drift Immediately (Priority: P1)
As a tenant operator changing the selected restore items or mapping inputs, I want previously generated preview and checks to become visibly invalid, so that I do not assume the old safety work still applies.
**Why this priority**: Scope drift is the clearest source-of-truth failure in the current flow and must become operator-visible immediately.
**Independent Test**: Can be fully tested by generating checks and preview, then changing selected items, mapping choices, or scope mode and verifying that both states are visibly invalidated rather than silently falling back to the earlier results.
**Acceptance Scenarios**:
1. **Given** an operator generated checks for one selected restore scope, **When** the operator changes the scope, **Then** the previous checks are shown as invalid for the current scope and the wizard asks for a rerun.
2. **Given** an operator generated preview before changing group-mapping inputs that affect restore behavior, **When** the mapping changes, **Then** the previous preview no longer appears as current decision truth.
3. **Given** the operator narrows or broadens the selection after preview and checks exist, **When** the confirm step is revisited, **Then** the calm execution state is suppressed until preview and checks are regenerated for the current fingerprint.
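
One way to make scope drift detectable is to hash a canonical form of every execution-affecting input and compare the digest with the one recorded when preview or checks were generated. The following Python sketch is illustrative only: the function names, the parameter names, and the choice of SHA-256 over sorted JSON are assumptions, not the implemented derivation.

```python
import hashlib
import json


def restore_scope_fingerprint(backup_set_id, scope_mode, selected_item_ids, group_mapping):
    """Return a deterministic digest of the execution-affecting restore inputs."""
    canonical = {
        "backup_set_id": backup_set_id,
        "scope_mode": scope_mode,
        # Sort and de-duplicate so selection order never changes the digest.
        "selected_item_ids": sorted(set(selected_item_ids)),
        "group_mapping": group_mapping,
    }
    # sort_keys keeps the JSON byte-stable regardless of dict insertion order.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def evidence_matches_scope(evidence_fingerprint, current_fingerprint):
    """Preview or checks count as current decision truth only on an exact match."""
    return evidence_fingerprint is not None and evidence_fingerprint == current_fingerprint
```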
---
### User Story 3 - Interpret Restore Results Without Overclaiming Recovery (Priority: P2)
As a tenant operator reviewing a finished restore, I want the result surface to tell me whether the run merely ended or whether follow-up work remains, so that I do not confuse completion with recovery.
**Why this priority**: Partial or mixed restore outcomes can currently be diagnosed, but they are not presented firmly enough to operators, which creates false calm after the run ends.
**Independent Test**: Can be fully tested by opening completed, partial, failed, and completed-with-follow-up results and verifying that the top of the page communicates result truth and next action before raw item lists.
**Acceptance Scenarios**:
1. **Given** a restore run completed with mixed item outcomes, **When** the operator opens the detail page, **Then** the page frames the run as partial or completed with follow-up rather than as a calm success.
2. **Given** a restore run finished with no hard failure but unresolved follow-up work, **When** the operator opens the detail page, **Then** the page states that the run ended but recovery work remains open.
3. **Given** a restore run failed because of provider, write-gate, or item-level errors, **When** the operator opens the detail page, **Then** the page highlights the primary cause family and the next recommended action before low-level diagnostics.
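
The result framing in these scenarios can be read as a small classification over item outcomes and open follow-up work. The sketch below is a hedged Python illustration: the function names, the `applied` and `failed` item labels, and the precedence between `partial` and `completed_with_follow_up` are assumptions, while the result state names follow this spec.

```python
def classify_restore_result(item_outcomes, open_follow_up_causes):
    """Derive a restore result state from item outcomes and open follow-up causes.

    item_outcomes: list of "applied" or "failed" strings, one per restored item.
    open_follow_up_causes: cause-family labels that still need operator work.
    """
    if not item_outcomes:
        return "not_executed"
    failed = sum(1 for outcome in item_outcomes if outcome == "failed")
    if failed == len(item_outcomes):
        return "failed"
    if failed > 0:
        return "partial"
    # Every item applied, but operator work is still open: the run ended,
    # which is not the same as the tenant being recovered.
    if open_follow_up_causes:
        return "completed_with_follow_up"
    return "completed"


def requires_operator_attention(result_state):
    """Everything except a plain completion must be framed as non-calm."""
    return result_state in {"partial", "failed", "completed_with_follow_up"}
```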
---
### User Story 4 - Preserve Restore Truth In Canonical Run Monitoring (Priority: P3)
As a workspace or entitled tenant operator inspecting the linked canonical operation run, I want restore-specific follow-up truth to remain discoverable there, so that generic operation telemetry does not hide restore safety meaning.
**Why this priority**: The canonical run detail is often the first or shared monitoring destination, and it must not flatten restore meaning into generic execution status alone.
**Independent Test**: Can be fully tested by opening restore-linked operation runs from restore surfaces and monitoring surfaces and confirming that restore-specific follow-up truth is visible or reachable within one click.
**Acceptance Scenarios**:
1. **Given** a restore-linked operation run completed but the restore result requires follow-up, **When** the canonical operation detail is opened, **Then** the operator can see or open restore follow-up truth without hunting through unrelated telemetry.
2. **Given** the operator lacks access to a deeper diagnostic surface, **When** the canonical operation detail renders, **Then** the page avoids broken or misleading links while still preserving truthful restore attention.
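
The second scenario implies that the restore link on the canonical operation detail is resolved only after entitlement checks and is omitted otherwise. A minimal Python sketch of that degradation follows; the `Viewer` type, its fields, and the function name are hypothetical, and treating `{tenant}` as a plain string key is an assumption. Only the route shape comes from this spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Viewer:
    """Hypothetical view of the acting operator's access."""
    is_workspace_member: bool
    entitled_tenants: frozenset


def restore_follow_up_link(viewer: Viewer, tenant: str, restore_run_id: int) -> Optional[str]:
    """Return the restore-run detail path, or None when no safe link exists."""
    if not viewer.is_workspace_member:
        return None
    if tenant not in viewer.entitled_tenants:
        return None
    return f"/admin/t/{tenant}/restore-runs/{restore_run_id}"
```

A caller would render the link only when a path is returned, so an unentitled viewer sees no broken or misleading affordance.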
### Edge Cases
- A restore may be technically executable because write-gate and RBAC checks pass, while preview or checks are stale; the surface must not collapse this into a calm `ready` signal.
- A preview may still exist for the same backup set but a different item selection, scope mode, or execution-affecting mapping input; the UI must treat that as scope mismatch, not as acceptable reuse.
- Checks may report no blockers but still include suppressive warnings; the decision surface must remain cautious and avoid positive calmness claims.
- A restore result may show all queued work completed but still leave unresolved assignment, dependency, or payload-quality issues; the result surface must not imply that the tenant is recovered.
- An operator may be allowed to view the restore run but not entitled to all deeper operation or diagnostic targets; the surface must degrade safely with truthful messaging and safe links only.
- A restore may remain preview-only by design for some items or policies; result and confirmation surfaces must keep simulation truth separate from real mutation truth.
## Requirements *(mandatory)*
**Constitution alignment (required):** This feature changes an existing write-capable restore workflow and an existing long-running restore execution path, but it does not introduce a new Microsoft Graph contract, a new queued job family, or a new persisted run model. Existing restore preview, confirmation, audit, and `OperationRun` observability remain authoritative. This spec hardens the safety meaning of those existing steps so a real restore cannot appear calmer than the underlying truth.
**Constitution alignment (PROP-001 / ABSTR-001 / PERSIST-001 / STATE-001 / BLOAT-001):** The feature may introduce narrow derived state families and a restore-domain resolver or presenter because direct presence checks are no longer enough to express currentness, scope binding, or follow-up truth. The solution must remain derived-first. No new persisted restore-health table, no dashboard-grade recovery state, and no cross-domain trust taxonomy are allowed.
**Constitution alignment (OPS-UX):** Existing restore execution continues to create or reuse `OperationRun` records under the existing ops-UX contract. Intent feedback remains toast-only. Progress remains on existing operations surfaces. Terminal truth remains in the canonical monitoring record. `OperationRun.status` and `OperationRun.outcome` remain service-owned; this feature must not add ad-hoc status mutation from UI surfaces. Existing `summary_counts` rules remain authoritative; restore-specific integrity or follow-up truth should stay in restore-specific context unless an allowed numeric summary key already exists. Regression coverage must protect wizard gating, execution blocking or degradation, result truth, linked operation detail continuity, and non-regression of run observability.
**Constitution alignment (RBAC-UX):** This feature spans the tenant admin plane and the admin canonical-view plane. Tenant membership and tenant entitlement remain isolation boundaries for restore surfaces and related links. Non-members remain `404`. Members who can view restore context but lack the capability to start or rerun a restore remain `403` for those execution actions. Authorization must remain server-side through existing scoped record resolution, policies or Gates, and the canonical capability registry. No raw capability strings or role-name shortcuts may be introduced.
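
The `404` versus `403` split described above amounts to checking membership before capability. A small Python sketch of that ordering follows; the function and parameter names are hypothetical, and returning `None` to mean "proceed" is an assumption made for illustration.

```python
from typing import Optional


def execution_denial_status(is_tenant_member: bool, can_manage_tenant: bool) -> Optional[int]:
    """Return the HTTP status that denies a restore execution, or None to proceed."""
    if not is_tenant_member:
        # Membership is the isolation boundary, so it is checked first.
        return 404
    if not can_manage_tenant:
        # Members may view restore context but not start or rerun a restore.
        return 403
    return None
```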
**Constitution alignment (OPS-EX-AUTH-001):** Not applicable. Restore monitoring and execution remain outside any `/auth/*` exception.
**Constitution alignment (BADGE-001):** Any new preview, checks, safety, warning, or result badges must come from centralized status semantics or shared primitives. The feature must not introduce page-local color or border conventions that invent a second restore status language.
**Constitution alignment (UI-FIL-001):** The feature reuses existing Filament wizard steps, Sections, view fields, infolist entries, notifications, and shared badge mappings. Semantic emphasis must come from those existing primitives or other approved shared primitives rather than custom page-local status markup. Existing custom restore infolist entry views remain acceptable only if they are hardened around shared truth rather than bespoke color logic.
**Constitution alignment (UI-NAMING-001):** The target object is the restore run. Existing operator verbs such as `Run checks`, `Generate preview`, `Preview only (dry-run)`, `Restore`, and `Open operation` remain the base vocabulary. New operator-facing labels must keep the distinctions between `preview`, `checks`, `safe`, `risky`, `partial`, and `follow-up` explicit and must not replace them with vague implementation-first language.
**Constitution alignment (UI-CONST-001 / UI-SURF-001 / UI-HARD-001 / UI-EX-001 / UI-REVIEW-001):** The wizard remains the only primary execution surface, the restore-run detail page remains the restore result truth surface, and the canonical operation detail remains the monitoring truth surface. No redundant `View` action is introduced. Row click remains the primary inspect affordance on upstream list surfaces. Dangerous actions remain grouped or confirm-gated according to the existing action-surface contract.
**Constitution alignment (OPSURF-001):** Default-visible content must stay operator-first. The restore wizard must answer whether the scope is current, whether preview and checks still apply, and whether real execution is responsible before raw diffs or item-level diagnostics. The restore result page must answer whether follow-up is still required before long results. The canonical operation detail must not hide restore-specific follow-up truth behind generic run telemetry.
**Constitution alignment (UI-SEM-001 / LAYER-001 / TEST-TRUTH-001):** Direct mapping from existing `preview`, `results`, and check presence to UI calmness is no longer sufficient because those values do not encode whether the truth is current, scope-bound, or still decision-worthy. A narrow derived read model is acceptable only if it replaces calmness-by-presence and avoids storing a redundant second truth. Tests must focus on operator consequences: whether stale inputs suppress execution calmness, whether scope drift invalidates prior truth, and whether finished runs avoid overclaiming recovery.
**Constitution alignment (Filament Action Surfaces):** The Action Surface Contract remains satisfied. The restore-run list keeps one primary inspect model through row click. The wizard remains the only primary execution surface. The detail surface remains read-first. Existing destructive actions continue to require confirmation and stay outside the wizard. No empty action groups or redundant view actions are introduced.
**Constitution alignment (UX-001 — Layout & Information Architecture):** The restore create flow remains a structured wizard with explicit sections. The restore detail page remains an infolist-based view surface, not a disabled edit form. The feature must create a clear reading order of summary first, decision second, diagnostics third on restore result surfaces, and must elevate safety truth before operators commit to execution.
### Functional Requirements
- **FR-181-001**: The system MUST treat restore preview as time-bound and scope-bound decision truth, not as a generic `preview exists` flag.
- **FR-181-002**: The restore preview surface MUST show when the preview was generated, which restore scope it represents, whether it is current, and whether rerun is required.
- **FR-181-003**: The system MUST derive a deterministic restore scope fingerprint from every execution-affecting restore input needed to judge whether preview and checks still match the current restore scope.
- **FR-181-004**: Restore checks MUST be bound to the fingerprint of the scope they evaluated, and the surface MUST make it visible when the current scope no longer matches that evaluated fingerprint.
- **FR-181-005**: Preview truth MUST likewise remain bound to the fingerprint of the scope it previewed, and the surface MUST make it visible when the current scope no longer matches that previewed fingerprint.
- **FR-181-006**: Any operator change to scope mode, selected items, backup source, or execution-affecting mapping input after checks or preview were generated MUST invalidate the prior calm readiness state for the current restore scope.
- **FR-181-007**: Preview integrity MUST surface at least `not_generated`, `current`, `stale`, and `invalidated` as real operator-visible states.
- **FR-181-008**: Checks integrity MUST surface at least `not_run`, `current`, `stale`, and `invalidated` as real operator-visible states.
- **FR-181-009**: The wizard and confirmation surface MUST show execution readiness and safety readiness as separate truths.
- **FR-181-010**: A restore MAY be technically startable while still being safety-risky or integrity-invalid, and the UI MUST NOT collapse those states into one calm `ready` state.
- **FR-181-011**: Blocking issues MUST continue to prevent calm execution approval, and warning-level issues MUST suppress calm `safe`, `ready`, or `looks good` claims even when real execution remains technically possible.
- **FR-181-012**: The final confirm and execute step MUST validate current preview state, current checks state, matching scope fingerprint, absence of blocking issues, and current execution readiness before presenting real execution as available.
- **FR-181-013**: If one or more integrity conditions fail at confirmation time, the surface MUST present the next corrective step, such as rerunning checks, regenerating preview, or correcting scope inputs, before real execution can appear calm.
- **FR-181-014**: The confirmation surface MUST keep simulation-only actions and real tenant mutation clearly separated in operator wording.
- **FR-181-015**: The restore result surface MUST answer what succeeded, what partially succeeded, what failed, whether follow-up is required, and whether the recovery goal is still uncertain.
- **FR-181-016**: The restore result surface MUST treat `partial` and `completed_with_follow_up` as non-calm operator states and MUST NOT present them as an uncomplicated success.
- **FR-181-017**: The restore result surface MUST present one primary next action whenever follow-up is required and MAY present additional secondary actions only after the primary action is visible.
- **FR-181-018**: The restore result surface MUST expose operator-usable cause families for follow-up truth, including execution failure, write-gate or RBAC blocking, provider operability, missing dependency or mapping, payload-quality limitation, scope mismatch, and item-level failure.
- **FR-181-019**: The restore-run detail surface MUST show which preview basis and which checks basis applied to the run or draft, including whether those bases were current, stale, or invalidated when the operator reviewed them.
- **FR-181-020**: No restore surface may imply that `completed` means `tenant recovered`, `restore guaranteed successful`, or `target state confirmed` unless a different feature later proves that truth.
- **FR-181-021**: The canonical operation detail for restore-linked runs MUST show restore-specific follow-up truth directly or provide one safe, entitled path to it.
- **FR-181-022**: The feature MUST remain implementable without a new central recovery-state table, a new tenant-wide recovery dashboard, or a new global risk-scoring model.
- **FR-181-023**: Auditability of scope invalidation and staleness reasoning MUST remain derivable from existing restore records and existing run context without breaking current restore audit flows.
- **FR-181-024**: Regression coverage MUST prove integrity-state classification, wizard invalidation, confirmation hardening, result follow-up truth, restore-linked operation continuity, and RBAC-safe degradation.
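
The confirmation-time validation in FR-181-012 and the corrective-step requirement in FR-181-013 could be combined into one gate that returns what the operator must do next. The Python sketch below is an assumption-laden illustration: the function name, its parameters, and the `Resolve blocking issues` and `Restore execution readiness` labels are hypothetical, while `Generate preview` and `Run checks` reuse the operator verbs already named in this spec.

```python
def corrective_steps_before_execution(
    preview_state,
    checks_state,
    preview_fingerprint,
    checks_fingerprint,
    current_fingerprint,
    blocker_count,
    is_startable,
):
    """Return the ordered corrective steps; an empty list means execution may be offered."""
    steps = []
    # Preview must be current and bound to the scope being confirmed.
    if preview_state != "current" or preview_fingerprint != current_fingerprint:
        steps.append("Generate preview")
    # Checks must be current and bound to the same scope.
    if checks_state != "current" or checks_fingerprint != current_fingerprint:
        steps.append("Run checks")
    if blocker_count > 0:
        steps.append("Resolve blocking issues")
    if not is_startable:
        steps.append("Restore execution readiness")
    return steps
```

Under this sketch, real execution would be presented as available only when the returned list is empty; warnings are handled separately by the safety readiness state.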
### Derived State Semantics
- **Preview integrity family**: `not_generated`, `current`, `stale`, `invalidated`, with optional finer labels such as `scope_mismatch` or `superseded` as derived explanation only.
- **Checks integrity family**: `not_run`, `current`, `stale`, `invalidated`, with optional finer labels such as `scope_mismatch` or `requires_rerun` as derived explanation only.
- **Restore safety family**: `blocked`, `risky`, `ready_with_caution`, `ready`. `ready` is reserved for scopes with current integrity and no suppressive blocker or warning conditions.
- **Restore result family**: `not_executed`, `completed`, `partial`, `failed`, `completed_with_follow_up`. `completed_with_follow_up` means execution finished but operator work is still open; it does not mean full recovery.
- **Freshness policy**: Preview and checks use `invalidate_after_mutation` for this feature. Within an active wizard draft, the system does not introduce a separate age-based timeout; matching scope fingerprint plus required captured-at evidence is sufficient for `current`. `invalidated` is reserved for explicit scope mismatch after a covered mutation. `stale` is reserved for legacy or incomplete persisted evidence whose currentness can no longer be proven even though a direct mismatch is unavailable.
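
The freshness policy above can be read as a four-way classification over stored evidence. The Python sketch below illustrates one such reading; the function name, the evidence dictionary shape, and the `fingerprint` and `captured_at` keys are assumptions rather than the persisted `RestoreRun` structure.

```python
def classify_integrity(evidence, current_fingerprint, missing_state):
    """Classify preview or checks evidence against the current scope fingerprint.

    evidence: None, or a dict with optional "fingerprint" and "captured_at" keys.
    missing_state: "not_generated" for preview, "not_run" for checks.
    """
    if evidence is None:
        return missing_state
    fingerprint = evidence.get("fingerprint")
    captured_at = evidence.get("captured_at")
    # Explicit mismatch after a covered mutation.
    if fingerprint is not None and fingerprint != current_fingerprint:
        return "invalidated"
    # Matching fingerprint plus captured-at evidence is sufficient for current.
    if fingerprint == current_fingerprint and captured_at is not None:
        return "current"
    # Legacy or incomplete evidence: currentness can no longer be proven.
    return "stale"
```

No age-based timeout appears in the sketch, which mirrors the `invalidate_after_mutation` policy for active wizard drafts.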
### Non-Functional Requirements
- **NFR-181-001**: The feature SHOULD ship without a new table, a new globally persisted recovery-state model, or a new tenant-wide reconciliation dashboard.
- **NFR-181-002**: Existing restore orchestration, write-gate evaluation, risk checking, diff generation, and operation-run tracking MUST remain behaviorally intact outside the new integrity and follow-up hardening.
- **NFR-181-003**: New restore safety labels and states MUST remain centrally mappable and regression-testable rather than page-local.
- **NFR-181-004**: The feature MUST preserve current route identity and existing deep-link stability for restore-run detail and canonical operation detail pages.
- **NFR-181-005**: Restore-specific truth added to canonical monitoring MUST remain readable without forcing operators into low-level technical diagnostics.
### Non-Goals
- Redesigning backup-quality surfaces or backup fidelity scoring
- Building a tenant-wide recovery confidence dashboard
- Introducing a semantic version-diff system for backup history choice
- Adding a new post-restore full reconciliation engine for every policy type
- Creating a new global restore risk-scoring model or provider-hardening subsystem
- Creating a new central recovery-health persistence table or workflow hub
### Assumptions
- Existing restore preview, restore checks, and restore execution pipelines already carry enough underlying truth that this feature can harden interpretation without re-architecting the restore domain.
- The scope fingerprint should include every restore input that materially changes what will be checked or what could be written, including backup source, scope selection, and execution-affecting mapping inputs.
- Preview and checks use `invalidate_after_mutation` for this feature. Active wizard drafts do not introduce a separate age-based timeout; `invalidated` covers explicit scope drift after a covered mutation, while `stale` remains reserved for legacy or incomplete persisted evidence whose currentness cannot be proven.
- Existing risk checker outputs, diff outputs, item-level result data, and operation-run linkage remain available to support result attention and next-action derivation.
### Dependencies
- Existing `RestoreRun` domain model and status lifecycle
- Existing `RestoreRiskChecker` logic and output semantics
- Existing `RestoreDiffGenerator` logic and preview semantics
- Existing write-gate and RBAC hardening foundations
- Existing `OperationRun` and restore-run coupling, including canonical operation detail surfaces
- Existing centralized badge or status semantics and tenant-safe navigation rules
### Risks
- If integrity states are added but calm UI language remains unchanged, the feature will add terminology without removing the core trust failure.
- If scope fingerprinting is narrower than the real execution scope, operators may still reuse stale safety truth incorrectly.
- If restore result follow-up truth is only appended below diagnostics, operators will continue to misread completion as recovery.
- If canonical operation detail remains generic while restore detail becomes strict, trust will drift between monitoring and restore surfaces.
- If the feature tries to solve tenant-wide recovery truth now, it will overgrow the slice and violate the proportionality goal.
## UI Action Matrix *(mandatory when Filament is changed)*
| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|
|
|---|---|---|---|---|---|---|---|---|---|---|
|
|
| Restore runs list | `/admin/t/{tenant}/restore-runs` | `New restore run` | Row click to restore-run detail | `Rerun`; `More -> Restore / Archive / Force delete` by record state | `More -> Archive Restore Runs / Restore Restore Runs / Force Delete Restore Runs` | `New restore run` | n/a | n/a | Existing archive, restore, delete, rerun audit behavior remains | The list keeps one primary inspect model and no redundant `View` action. Destructive actions stay grouped and confirmed. |
|
|
| Restore run wizard | `/admin/t/{tenant}/restore-runs/create` | No new destructive header action | n/a | Step hint actions such as `Run checks`, `Generate preview`, `Select all`, `Clear`, and `Sync Groups` remain local to the relevant step | n/a | n/a | n/a | Final create or execute flow remains the single primary save path; real execution remains hard-confirmed | Existing restore queueing and execution audit behavior remains authoritative | The wizard is the only primary execution surface. This feature hardens calmness and gating, not action sprawl. |
|
|
| Restore run detail and result | `/admin/t/{tenant}/restore-runs/{restoreRun}` | None introduced by this feature | n/a | n/a | n/a | n/a | Existing page remains read-first; restore-linked operation navigation may be surfaced when entitled | n/a | No new audit event introduced by this feature | The detail surface must elevate result truth and next action above diagnostics without adding destructive controls. |
|
|
| Canonical operation detail for restore-linked runs | `/admin/operations/{run}` | Existing `Back`, `Refresh`, and related navigation only | n/a | n/a | n/a | n/a | Existing header navigation only; restore follow-up link may appear when entitled | n/a | No new audit event introduced by this feature | No new destructive action is introduced. Restore-specific truth must remain visible or reachable within one click. |

### Key Entities *(include if feature involves data)*

- **Restore Run**: The tenant-owned restore record that carries scope choice, preview data, checks data, execution intent, and restore results.
- **Scope Fingerprint**: The deterministic representation of the restore scope used to prove whether preview and checks still apply to the current restore inputs.
- **Preview Integrity State**: The derived state that answers whether preview exists, is current enough, and still matches the active scope.
- **Checks Integrity State**: The derived state that answers whether safety checks exist, are current enough, and still match the active scope.
- **Restore Safety State**: The derived decision state that separates `blocked`, `risky`, `ready_with_caution`, and `ready` instead of treating all non-blocked restores as safe.
- **Restore Result Follow-Up State**: The derived truth that answers whether the run is merely finished, partially successful, failed, or completed with operator follow-up still required.
- **Restore Safety Summary**: The operator-facing summary that combines integrity, readiness, primary warning or blocker, and next step without claiming tenant recovery.
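
As an illustration of how the four safety states could be kept apart, the sketch below derives a Restore Safety State from integrity and check inputs. It is a hedged example in Python rather than the project's code. The `SafetyInputs` fields, the `current` integrity label, the precedence order, and the mapping of non-current evidence to `risky` are all assumptions made for this example; only the four state names come from this spec, and the functional requirements remain authoritative for the real rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyInputs:
    """Hypothetical inputs; the field names are assumptions for this sketch."""

    preview_integrity: str  # e.g. "current", "invalidated", "stale"
    checks_integrity: str   # same vocabulary as preview_integrity
    blocking_findings: int  # count of hard blockers reported by checks
    warning_findings: int   # count of non-blocking warnings


def restore_safety_state(inputs: SafetyInputs) -> str:
    """Derive one of: blocked, risky, ready_with_caution, ready."""
    if inputs.blocking_findings > 0:
        return "blocked"
    evidence_is_current = (
        inputs.preview_integrity == "current"
        and inputs.checks_integrity == "current"
    )
    if not evidence_is_current:
        # Assumed rule: executable, but not safety-reviewed for this scope.
        return "risky"
    if inputs.warning_findings > 0:
        return "ready_with_caution"
    return "ready"
```

The point of the ordering is that `ready` is only reachable when nothing blocks, both preview and checks are current for the active scope, and no warnings remain, so a merely non-blocked restore never reads as safe.
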

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-181-001**: In seeded acceptance scenarios, operators can determine within 15 seconds whether the current preview still applies, whether checks still match the current scope, whether the restore is merely executable or actually safety-reviewed, and what the next required action is.
- **SC-181-002**: In covered stale or invalidated preview or checks scenarios, 100% of the affected wizard and confirmation surfaces suppress calm `safe` or `ready` claims and visibly require rerun or correction.
- **SC-181-003**: In covered scope-change scenarios, previously generated preview and checks are visibly invalidated before real execution is presented as calm or approved.
- **SC-181-004**: In covered partial, failed, and completed-with-follow-up scenarios, 100% of restore result surfaces elevate follow-up-required truth above raw item lists and do not imply that the tenant is recovered.
- **SC-181-005**: In covered restore-linked operation scenarios, operators can reach restore-specific follow-up truth from canonical operation detail within one click, without encountering broken or misleading links.
- **SC-181-006**: The feature ships without a new central recovery-state table, a new tenant-wide recovery dashboard, or a new global restore risk model.