Ahmed Darrazi 1e2958df27 feat: implement restore safety integrity and queue slide-over

2026-04-07 01:29:56 +02:00

13 KiB


Data Model: Restore Safety Integrity

Overview

This feature does not add or change a top-level persisted domain entity. It introduces a tighter derived safety model around the existing restore flow, built from the existing RestoreRun, OperationRun, risk-check, preview, and result data.

The central design task is to turn existing restore inputs and outputs into explicit operator truth without changing:

- RestoreRun ownership or route identity
- OperationRun ownership or lifecycle ownership
- existing backup, policy-version, and assignment storage
- existing write-gate, RBAC, and audit responsibilities
- the no-new-table boundary of this feature

Existing Persistent Entities

1. RestoreRun

Purpose: Tenant-owned restore record for scope selection, preview basis, checks basis, execution intent, and restore result detail.
Existing persistent fields used by this feature:
- id
- tenant_id
- backup_set_id
- operation_run_id
- status
- is_dry_run
- requested_items
- group_mapping
- preview
- results
- metadata
- requested_by
- started_at
- completed_at
Existing relationships used by this feature:
- tenant
- backupSet
- operationRun

Proposed nested metadata additions

No new columns are required. If persisted historical truth is needed, this feature may add the following nested structures inside RestoreRun.metadata:

| Key | Type | Purpose |
| --- | --- | --- |
| `scope_basis` | object | Historical snapshot of the restore scope used for checks, preview, or execution |
| `check_basis` | object | Fingerprint and timing for the last checks considered valid enough to persist with the run |
| `preview_basis` | object | Fingerprint and timing for the last preview considered valid enough to persist with the run |
| `execution_safety_snapshot` | object | Exact safety truth captured when a real restore was queued or executed |

Minimal persisted shape:

metadata
├── scope_basis
│   ├── fingerprint
│   ├── scope_mode
│   ├── selected_item_ids
│   ├── group_mapping_fingerprint
│   └── captured_at
├── check_basis
│   ├── fingerprint
│   ├── ran_at
│   ├── blocking_count
│   ├── warning_count
│   └── result_codes
├── preview_basis
│   ├── fingerprint
│   ├── generated_at
│   └── summary
└── execution_safety_snapshot
    ├── evaluated_at
    ├── scope_fingerprint
    ├── preview_state
    ├── checks_state
    ├── safety_state
    ├── blocking_count
    ├── warning_count
    ├── primary_issue_code
    └── follow_up_boundary

Notes:

- scope_basis, check_basis, and preview_basis may be persisted only when needed for historical result truth. They do not require independent lifecycle behavior.
- The snapshot is intentionally narrow: it stores the safety basis used at execution time, not a tenant-wide recovery claim.

2. OperationRun

Purpose: Canonical workspace-owned monitoring record for restore execution.
Existing persistent fields used by this feature:
- id
- workspace_id
- tenant_id
- type
- status
- outcome
- context
- summary_counts
- created_at
- started_at
- completed_at
Existing relationship and linkage used by this feature:
- restore execution runs already carry context.restore_run_id or a direct RestoreRun.operation_run_id link

No schema change is planned for OperationRun.

Derived Models

1. RestoreScopeFingerprint

Deterministic representation of the current restore scope.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `backupSetId` | integer | `backup_set_id` | Required |
| `scopeMode` | string | `scope_mode` | `all` or `selected` |
| `selectedItemIds` | list | `backup_item_ids` or `requested_items` | Sorted, unique, empty for `all` scope |
| `groupMapping` | object | normalized `group_mapping` | Keys sorted, explicit `SKIP` retained |
| `fingerprint` | string | derived hash | Canonical equality signal |

Rules:

- The fingerprint must change whenever any execution-affecting restore input changes.
- Pure confirmation inputs such as tenant_confirm or acknowledged_impact are not part of the scope fingerprint.
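
A minimal sketch of the fingerprint derivation, assuming SHA-256 over a canonical, key-sorted JSON encoding. The helper name `restore_scope_fingerprint` and the hash choice are illustrative assumptions, not the project's implementation. The sketch applies the normalization rules from the table: sorted unique item IDs, an empty list for `all` scope, sorted mapping keys with explicit `SKIP` values kept, and no confirmation inputs.

```python
import hashlib
import json
from typing import Mapping, Sequence


def restore_scope_fingerprint(
    backup_set_id: int,
    scope_mode: str,
    selected_item_ids: Sequence[int],
    group_mapping: Mapping[str, str],
) -> str:
    """Hypothetical helper: hash only the execution-affecting restore inputs.

    Confirmation inputs such as tenant_confirm or acknowledged_impact are
    deliberately not parameters, so they can never change the fingerprint.
    """
    if scope_mode not in ("all", "selected"):
        raise ValueError(f"unknown scope_mode: {scope_mode!r}")

    # Sorted and de-duplicated; always empty for the `all` scope.
    item_ids = [] if scope_mode == "all" else sorted(set(selected_item_ids))

    canonical = {
        "backup_set_id": backup_set_id,
        "scope_mode": scope_mode,
        "selected_item_ids": item_ids,
        # Explicit "SKIP" values are kept; sort_keys below orders the mapping keys.
        "group_mapping": dict(group_mapping),
    }
    # Compact, key-sorted JSON gives exactly one byte string per logical scope.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Reordering or duplicating item IDs yields the same fingerprint, while changing a mapping target (for example replacing `SKIP` with a group ID) or the backup set yields a different one.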

2. PreviewIntegrityState

Derived trust state for preview.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `state` | string | derived | `not_generated`, `current`, `stale`, `invalidated` |
| `freshnessPolicy` | string | derived | Fixed to `invalidate_after_mutation` for this feature |
| `fingerprint` | string or null | `preview_basis.fingerprint` or wizard state | Null if never generated |
| `generatedAt` | datetime or null | `preview_ran_at` or `preview_basis.generated_at` | Null if never generated |
| `invalidationReasons` | list | derived | e.g. `scope_mismatch`, `mapping_changed`, `backup_set_changed` |
| `rerunRequired` | boolean | derived | True for all states except `current` |
| `displaySummary` | string | derived | Operator-facing explanation |
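
The preview classification can be sketched as follows, following the validation rules for preview currentness. This is a sketch under assumptions: the function and parameter names are hypothetical, the caller is assumed to pass the covered mutations already observed in the draft as `mutation_reasons`, and `displaySummary` is omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class PreviewIntegrityState:
    state: str  # not_generated | current | stale | invalidated
    fingerprint: Optional[str]
    generated_at: Optional[datetime]
    invalidation_reasons: List[str] = field(default_factory=list)
    freshness_policy: str = "invalidate_after_mutation"

    @property
    def rerun_required(self) -> bool:
        # True for every state except `current`.
        return self.state != "current"


def _parse_timestamp(raw: Optional[str]) -> Optional[datetime]:
    """Return None when the timestamp is missing or not ISO 8601 parseable."""
    if not raw:
        return None
    try:
        return datetime.fromisoformat(raw)
    except ValueError:
        return None


def derive_preview_integrity(
    preview_exists: bool,
    basis_fingerprint: Optional[str],
    generated_at_raw: Optional[str],
    current_fingerprint: str,
    mutation_reasons: List[str],
) -> PreviewIntegrityState:
    if not preview_exists:
        return PreviewIntegrityState("not_generated", None, None)

    generated_at = _parse_timestamp(generated_at_raw)

    # A fingerprint mismatch or a recorded covered mutation invalidates the basis.
    reasons = list(mutation_reasons)
    if basis_fingerprint is not None and basis_fingerprint != current_fingerprint:
        reasons.insert(0, "scope_mismatch")
    if reasons:
        return PreviewIntegrityState("invalidated", basis_fingerprint, generated_at, reasons)

    # Evidence exists, but missing or legacy markers cannot prove currentness.
    if basis_fingerprint is None or generated_at is None:
        return PreviewIntegrityState("stale", basis_fingerprint, generated_at)

    return PreviewIntegrityState("current", basis_fingerprint, generated_at)
```

The mismatch check runs before the marker check, so an explicit mismatch always lands in `invalidated` rather than `stale`.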

3. ChecksIntegrityState

Derived trust state for restore checks.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `state` | string | derived | `not_run`, `current`, `stale`, `invalidated` |
| `freshnessPolicy` | string | derived | Fixed to `invalidate_after_mutation` for this feature |
| `fingerprint` | string or null | `check_basis.fingerprint` or wizard state | Null if never run |
| `ranAt` | datetime or null | `checks_ran_at` or `check_basis.ran_at` | Null if never run |
| `blockingCount` | integer | `check_summary.blocking` | Preserved even if the state becomes invalid |
| `warningCount` | integer | `check_summary.warning` | Preserved even if the state becomes invalid |
| `invalidationReasons` | list | derived | Same family as preview invalidation |
| `rerunRequired` | boolean | derived | True for all states except `current` |
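
The distinctive part of checks integrity is that blocking and warning counts survive a loss of currentness. A sketch with hypothetical names, assuming `check_summary` is a plain mapping with `blocking` and `warning` keys; covered-mutation reasons are left out here for brevity and would be added the same way as for preview.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class ChecksIntegrityState:
    state: str  # not_run | current | stale | invalidated
    fingerprint: Optional[str]
    blocking_count: int
    warning_count: int
    invalidation_reasons: List[str] = field(default_factory=list)

    @property
    def rerun_required(self) -> bool:
        return self.state != "current"


def _is_parseable(raw: Optional[str]) -> bool:
    if not raw:
        return False
    try:
        datetime.fromisoformat(raw)
    except ValueError:
        return False
    return True


def derive_checks_integrity(
    check_summary: Optional[Mapping[str, int]],
    basis_fingerprint: Optional[str],
    ran_at_raw: Optional[str],
    current_fingerprint: str,
) -> ChecksIntegrityState:
    if check_summary is None:
        return ChecksIntegrityState("not_run", None, 0, 0)

    # Counts come from the last check summary and are kept whatever the state.
    blocking = int(check_summary.get("blocking", 0))
    warning = int(check_summary.get("warning", 0))

    if basis_fingerprint is not None and basis_fingerprint != current_fingerprint:
        return ChecksIntegrityState(
            "invalidated", basis_fingerprint, blocking, warning, ["scope_mismatch"]
        )
    if basis_fingerprint is None or not _is_parseable(ran_at_raw):
        return ChecksIntegrityState("stale", basis_fingerprint, blocking, warning)
    return ChecksIntegrityState("current", basis_fingerprint, blocking, warning)
```

Keeping the counts lets the wizard still say "2 blockers were found last time" while refusing to treat that result as current.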

4. ExecutionReadinessState

Technical ability to start restore execution.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `allowed` | boolean | derived from RBAC, write-gate, provider operability, hard blockers | Answers “can the system start?” |
| `blockingReasons` | list | derived | `missing_capability`, `write_gate_blocked`, `provider_unavailable`, `risk_blocker` |
| `mutationScope` | string | derived | `simulation_only` or `microsoft_tenant` |
| `requiredCapability` | string | derived | existing registry entry, not a raw string literal in feature code |
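
One way to derive readiness, sketched under assumptions: the boolean inputs stand in for the real RBAC, write-gate, and provider checks, `Capability` is a hypothetical stand-in for the existing capability registry, and mapping `is_dry_run` to `simulation_only` is an assumption. Every failed check contributes a reason rather than stopping at the first one, so the operator sees all blockers at once.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Capability(str, Enum):
    """Hypothetical stand-in for the existing capability registry."""

    RESTORE_EXECUTE = "tenant.restore.execute"


@dataclass(frozen=True)
class ExecutionReadinessState:
    allowed: bool
    blocking_reasons: List[str]
    mutation_scope: str  # simulation_only | microsoft_tenant
    required_capability: str


def derive_execution_readiness(
    is_dry_run: bool,
    has_capability: bool,
    write_gate_open: bool,
    provider_available: bool,
    risk_blocker_count: int,
) -> ExecutionReadinessState:
    reasons: List[str] = []
    if not has_capability:
        reasons.append("missing_capability")
    if not write_gate_open:
        reasons.append("write_gate_blocked")
    if not provider_available:
        reasons.append("provider_unavailable")
    if risk_blocker_count > 0:
        reasons.append("risk_blocker")
    return ExecutionReadinessState(
        allowed=not reasons,
        blocking_reasons=reasons,
        # Assumption: a dry run never mutates the Microsoft tenant.
        mutation_scope="simulation_only" if is_dry_run else "microsoft_tenant",
        # Referenced through the registry, never typed as a raw literal here.
        required_capability=Capability.RESTORE_EXECUTE.value,
    )
```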

5. RestoreSafetyAssessment

Decision-layer state that separates executable from safe.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `state` | string | derived | `blocked`, `risky`, `ready_with_caution`, `ready` |
| `executionReadiness` | object | `ExecutionReadinessState` | Technical startability |
| `previewIntegrity` | object | `PreviewIntegrityState` | Decision basis currentness |
| `checksIntegrity` | object | `ChecksIntegrityState` | Decision basis currentness |
| `positiveClaimSuppressed` | boolean | derived | True when warnings or integrity issues suppress calm claims |
| `primaryIssueCode` | string or null | derived | Most important blocker or warning reason |
| `primaryNextAction` | string | derived | e.g. `rerun_checks`, `regenerate_preview`, `adjust_scope`, `review_warnings` |

Derived-state rules:

- blocked: execution readiness is false, or risk blockers are present.
- risky: execution may be technically possible, but preview or checks are not current, so they cannot support calm execution, or another integrity problem suppresses approval.
- ready_with_caution: current preview and current checks exist and blockers are absent, but warnings remain suppressive.
- ready: current preview and current checks exist, blockers are absent, warnings are absent or non-suppressive, and the operator can receive a calm execution signal.
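
The derived-state rules can be read as an ordered cascade from most to least severe. A minimal sketch with hypothetical names; `other_integrity_issue` is an assumed flag for the "another integrity problem" clause, and treating every non-`ready` state as claim-suppressing is an interpretation of `positiveClaimSuppressed`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyInputs:
    execution_allowed: bool
    blocking_count: int
    preview_state: str  # PreviewIntegrityState.state
    checks_state: str  # ChecksIntegrityState.state
    suppressive_warning_count: int
    other_integrity_issue: bool = False


def derive_safety_state(inputs: SafetyInputs) -> str:
    # Rules are evaluated from most to least severe; the first match wins.
    if not inputs.execution_allowed or inputs.blocking_count > 0:
        return "blocked"
    if (
        inputs.preview_state != "current"
        or inputs.checks_state != "current"
        or inputs.other_integrity_issue
    ):
        return "risky"
    if inputs.suppressive_warning_count > 0:
        return "ready_with_caution"
    return "ready"


def positive_claim_suppressed(state: str) -> bool:
    # Only `ready` may carry a calm execution signal.
    return state != "ready"
```

Checking blockers first means an invalidated preview never masks a hard blocker, and checking integrity before warnings means `ready_with_caution` is only reachable on a current basis.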

6. RestoreExecutionSafetySnapshot

Historical snapshot stored on the existing restore run when a real restore is queued.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `evaluatedAt` | datetime | confirmation time | Historical anchor |
| `scopeFingerprint` | string | `RestoreScopeFingerprint` | Basis used to queue execution |
| `previewState` | string | `PreviewIntegrityState.state` | Historical truth at queue time |
| `checksState` | string | `ChecksIntegrityState.state` | Historical truth at queue time |
| `safetyState` | string | `RestoreSafetyAssessment.state` | Historical decision truth |
| `blockingCount` | integer | checks summary | Historical fact |
| `warningCount` | integer | checks summary | Historical fact |
| `primaryIssueCode` | string or null | `RestoreSafetyAssessment.primaryIssueCode` | Audit-friendly summary |
| `followUpBoundary` | string | derived | e.g. `run_completed_not_recovery_proven` |
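
A sketch of how the derived snapshot might map onto the snake_case keys under `metadata.execution_safety_snapshot` shown in the persisted shape earlier. The function names are hypothetical, the fixed `follow_up_boundary` value reuses the example from the table, and returning a merged copy (rather than mutating `metadata` in place) is a design assumption.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def build_execution_safety_snapshot(
    scope_fingerprint: str,
    preview_state: str,
    checks_state: str,
    safety_state: str,
    blocking_count: int,
    warning_count: int,
    primary_issue_code: Optional[str],
    evaluated_at: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Map the camelCase derived model onto the snake_case persisted keys."""
    evaluated = evaluated_at or datetime.now(timezone.utc)
    return {
        "evaluated_at": evaluated.isoformat(),
        "scope_fingerprint": scope_fingerprint,
        "preview_state": preview_state,
        "checks_state": checks_state,
        "safety_state": safety_state,
        "blocking_count": blocking_count,
        "warning_count": warning_count,
        "primary_issue_code": primary_issue_code,
        # Example boundary value: states what a finished run does not prove.
        "follow_up_boundary": "run_completed_not_recovery_proven",
    }


def with_snapshot(metadata: Dict[str, Any], snapshot: Dict[str, Any]) -> Dict[str, Any]:
    # Return a copy so other metadata keys (scope_basis and so on) stay untouched.
    merged = dict(metadata)
    merged["execution_safety_snapshot"] = snapshot
    return merged
```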

7. RestoreResultAttention

Derived result-follow-up truth for restore detail and linked monitoring surfaces.

| Field | Type | Source | Notes |
| --- | --- | --- | --- |
| `state` | string | derived | `not_executed`, `completed`, `partial`, `failed`, `completed_with_follow_up` |
| `followUpRequired` | boolean | derived | Primary operator signal |
| `primaryCauseFamily` | string | derived | `execution_failure`, `write_gate_or_rbac`, `provider_operability`, `missing_dependency_or_mapping`, `payload_quality`, `scope_mismatch`, `item_level_failure`, `none` |
| `summary` | string | derived | Short operator-facing summary |
| `primaryNextAction` | string | derived | One leading next step |
| `recoveryClaimBoundary` | string | derived | Explicitly states what the surface is not proving |

Decision rules:

- partial: mixed item outcomes or mixed assignment outcomes remain after execution.
- completed_with_follow_up: execution reached a terminal completed path, but unresolved warnings, skipped items, or open recovery work remain.
- completed: execution finished and no derived follow-up remains visible at the restore-run truth level, without implying tenant recovery.
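
A sketch of the result-state decision. The rules above cover `partial`, `completed_with_follow_up`, and `completed`; the handling of `not_executed` and `failed`, the outcome vocabulary (`"succeeded"` / `"failed"`), and checking mixed outcomes before a run-level failure are all assumptions made for illustration.

```python
from typing import Iterable

# Assumption: item and assignment outcomes are recorded as "succeeded" or "failed".
FOLLOW_UP_STATES = {"partial", "failed", "completed_with_follow_up"}


def derive_result_state(
    executed: bool,
    run_failed: bool,
    item_outcomes: Iterable[str],
    assignment_outcomes: Iterable[str],
    unresolved_warnings: int = 0,
    skipped_items: int = 0,
    open_recovery_items: int = 0,
) -> str:
    if not executed:
        return "not_executed"
    items = set(item_outcomes)
    assignments = set(assignment_outcomes)
    # Mixed outcomes on either axis make the result partial.
    if len(items) > 1 or len(assignments) > 1:
        return "partial"
    if run_failed or items == {"failed"}:
        return "failed"
    if unresolved_warnings or skipped_items or open_recovery_items:
        return "completed_with_follow_up"
    # Completed at the restore-run level only; this never implies tenant recovery.
    return "completed"


def follow_up_required(state: str) -> bool:
    return state in FOLLOW_UP_STATES
```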

8. RestoreWizardPageModel

Server-driven page model for the wizard.

| Field | Type | Purpose |
| --- | --- | --- |
| `currentScope` | `RestoreScopeFingerprint` | Shows what the operator is about to restore |
| `previewIntegrity` | `PreviewIntegrityState` | Shows whether preview still applies |
| `checksIntegrity` | `ChecksIntegrityState` | Shows whether checks still apply |
| `executionReadiness` | `ExecutionReadinessState` | Shows whether the system can technically start |
| `safetyAssessment` | `RestoreSafetyAssessment` | Shows whether the action is safe enough to claim calm readiness |
| `primaryGuidance` | object | One primary next step and supporting explanation |

9. RestoreRunDetailPageModel

Page model for the restore-run detail and result surface.

| Field | Type | Purpose |
| --- | --- | --- |
| `header` | object | identity, backup set, mode, requested by, timestamps |
| `basisTruth` | object | preview basis, checks basis, execution safety snapshot |
| `resultAttention` | `RestoreResultAttention` | overall result truth and next step |
| `itemBreakdown` | list | per-item and assignment outcomes |
| `diagnostics` | list | raw preview, raw results, provider details, mapping detail |

10. RestoreOperationContinuationModel

Minimal restore-specific truth exposed on the canonical operation detail.

| Field | Type | Purpose |
| --- | --- | --- |
| `restoreRunId` | integer | linked restore record |
| `resultAttention` | `RestoreResultAttention` | restore follow-up truth summary |
| `restoreDetailUrl` | string or null | safe deep link when entitled |
| `accessState` | string | `linked`, `unavailable`, `forbidden_by_scope` |
| `unavailableReason` | string or null | truthful degradation without broken links |
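
A sketch of how `accessState` and the deep link might be derived so that the operation detail never renders a broken link. The reason codes, the `url_for` callback, and the entitlement flag are hypothetical; the real URL builder and authorization checks live elsewhere in the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class RestoreContinuationLink:
    restore_run_id: Optional[int]
    access_state: str  # linked | unavailable | forbidden_by_scope
    restore_detail_url: Optional[str]
    unavailable_reason: Optional[str]


def build_continuation_link(
    restore_run_id: Optional[int],
    restore_run_exists: bool,
    viewer_entitled: bool,
    url_for: Callable[[int], str],  # hypothetical URL builder
) -> RestoreContinuationLink:
    # No linked record, or it no longer exists: degrade without a link.
    if restore_run_id is None or not restore_run_exists:
        return RestoreContinuationLink(
            restore_run_id, "unavailable", None, "restore_run_not_found"
        )
    # The record exists but is outside the viewer's scope: still no link.
    if not viewer_entitled:
        return RestoreContinuationLink(
            restore_run_id, "forbidden_by_scope", None, "outside_viewer_scope"
        )
    return RestoreContinuationLink(restore_run_id, "linked", url_for(restore_run_id), None)
```

A URL is only ever produced on the `linked` path, so every non-linked state carries a reason instead of a link.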

Validation Rules

- Preview is current only when a preview basis exists, its fingerprint matches the current scope fingerprint, a parseable generated timestamp exists, and no covered mutation has invalidated the basis.
- Checks are current only when a check basis exists, its fingerprint matches the current scope fingerprint, a parseable checks timestamp exists, and no covered mutation has invalidated the basis.
- A fingerprint mismatch must classify preview or checks as invalidated, not merely stale.
- Preview or checks are classified as stale when evidence exists but the required basis markers are incomplete, legacy, or otherwise insufficient to prove currentness on a persisted draft or run, and no explicit fingerprint mismatch is available.
- This feature uses the freshness policy invalidate_after_mutation; it does not add a separate age-based timeout for preview or checks inside the active wizard draft.
- ready requires ExecutionReadinessState.allowed = true, PreviewIntegrityState.state = current, ChecksIntegrityState.state = current, and no suppressive warnings or blockers.
- ready_with_caution requires current integrity and zero blockers, but at least one suppressive warning remains.
- risky remains possible when execution readiness is true but calm approval is suppressed by integrity or warning truth.
- completed on the result surface must never imply tenant recovery unless another feature later supplies external reconciliation proof.

State Notes

- RestoreRunStatus remains the persisted execution lifecycle enum. This feature does not replace it.
- Preview integrity, checks integrity, restore safety, and result attention are derived state families. They are not new top-level persisted enums.
- The only persisted addition this design allows is a narrow snapshot of the safety basis used for an actual restore run.
