ahmido e840007127 feat: add backup quality truth surfaces (#211)
## Summary
- add a shared backup-quality resolver and summary model for backup sets, backup items, policy versions, and restore selection
- surface backup-quality truth across Filament backup-set, policy-version, and restore-wizard entry points
- add focused Pest coverage and the full Spec Kit artifact set for spec 176

## Testing
- focused backup-quality verification and integrated-browser smoke coverage were completed during implementation
- degraded browser smoke path was validated with temporary seeded records and then cleaned up again
- the workspace already has a prior `vendor/bin/sail artisan test --compact` run exiting non-zero; that full-suite failure was not reworked as part of this PR

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #211
2026-04-07 11:39:40 +00:00


Feature Specification: Backup Quality Truth Surfaces

Feature Branch: [176-backup-quality-truth]
Created: 2026-04-07
Status: Draft
Input: User description: "Spec 176 - Backup Quality Truth Surfaces"

Spec Scope Fields (mandatory)

  • Scope: tenant
  • Primary Routes: /admin/t/{tenant}/backup-sets, /admin/t/{tenant}/backup-sets/{record}, /admin/t/{tenant}/policy-versions, /admin/t/{tenant}/policy-versions/{record}, /admin/t/{tenant}/restore-runs/create
  • Data Ownership: Tenant-owned BackupSet, BackupItem, PolicyVersion, and RestoreRun draft-selection state within the active workspace and tenant scope.
  • RBAC: Workspace plus tenant membership is required on every affected surface. Members with TENANT_VIEW must see backup-quality truth on backup and version surfaces. Restore creation remains gated by TENANT_MANAGE. Backup-set mutation actions remain gated by existing TENANT_SYNC, TENANT_MANAGE, and TENANT_DELETE capabilities.

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Backup sets page | CRUD / list-first resource | Full-row click to backup-set detail | required | One inline safe shortcut plus More | More menu and bulk More | /admin/t/{tenant}/backup-sets | /admin/t/{tenant}/backup-sets/{record} | Workspace context plus tenant context | Backup sets / Backup set | Capture lifecycle and backup quality summary | none |
| Backup set detail | Detail plus relation manager | Direct detail page | forbidden | Contextual summary links plus relation header actions | Resource More and relation-manager More | /admin/t/{tenant}/backup-sets | /admin/t/{tenant}/backup-sets/{record} | Tenant context plus related policy context | Backup set | Quality summary before per-item diagnostics | none |
| Backup items table | Relation-manager table | Full-row click within backup-set detail | required | Relation header actions plus More | More menu and bulk More | /admin/t/{tenant}/backup-sets/{record} | /admin/t/{tenant}/backup-sets/{record} | Parent backup set plus tenant context | Backup items / Backup item | Snapshot mode and assignment-quality truth per item | existing relation-manager pattern |
| Policy versions page | CRUD / list-first resource | Full-row click to policy-version detail | required | More menu | More menu and bulk More | /admin/t/{tenant}/policy-versions | /admin/t/{tenant}/policy-versions/{record} | Workspace context plus tenant context | Policy versions / Policy version | Snapshot mode and version input quality | Empty-state CTA routes to backup sets |
| Policy version detail | Detail / infolist page | Direct detail page | forbidden | Minimal related navigation only | No new destructive detail action placement | /admin/t/{tenant}/policy-versions | /admin/t/{tenant}/policy-versions/{record} | Tenant context plus related policy context | Policy version | Explicit backup-quality truth separate from restore availability | existing minimal header pattern |
| Restore run create wizard | Wizard / selection workflow | Step-driven selection inside restore-run creation | forbidden | Inline descriptions and next-action guidance | None at selection stage | /admin/t/{tenant}/restore-runs | /admin/t/{tenant}/restore-runs/create | Tenant context plus selected backup set | Restore run / Backup selection | Backup-set and item quality before safety checks | none |

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Backup sets page | Tenant operator | List | Which backup sets look strong or weak as recovery input? | Name, item count, capture timing, lifecycle status, compact backup-quality summary | Raw item metadata, per-item details, restore safety analysis | Capture lifecycle, input quality | TenantPilot only for existing archive and restore maintenance | Open backup set, Create backup set | Archive, Restore archived set, Force delete |
| Backup set detail | Tenant operator | Detail | Is this backup set a strong or weak recovery input, and why? | Quality summary, degraded counts, next actions, related context | Raw payloads, raw assignment JSON, integrity detail | Input quality, assignment completeness, lifecycle status | TenantPilot only for existing maintenance actions | Inspect backup items, open related context from contextual links | Archive, Restore archived set, Force delete |
| Backup items table | Tenant operator | Table inside detail | Which items are degraded inside this backup set? | Display name, type, snapshot mode, assignment issue signal, orphaned-assignment signal | Full metadata, raw assignments, low-level IDs | Snapshot completeness, assignment completeness | TenantPilot only for add and remove maintenance; none for visibility | Refresh, Add Policies, inspect row | Remove, Remove selected |
| Policy versions page | Tenant operator | List | Which versions are full-payload versus metadata-only or otherwise degraded? | Policy identity, version number, capture time, snapshot mode, quality signal | Raw JSON, diff payload, redaction detail | Version lifecycle, input quality | TenantPilot only for existing archive and maintenance actions | Open version, open related policy, open backup sets from empty state | Restore via Wizard, Archive, Restore archived version, Force delete, bulk prune |
| Policy version detail | Tenant operator | Detail | Is this version worth using as restore input? | Version identity, explicit backup-quality section, related context | Normalized settings, raw JSON, diff, redaction detail | Input quality, version lineage | None for visibility; existing restore entry remains separately gated | Open related policy | No new destructive detail action |
| Restore run create wizard | Tenant operator with restore rights | Wizard | Which backup set or items should I avoid or inspect before running safety checks? | Backup-set quality summary, per-item quality descriptions, stronger or weaker input hints | Risk-check output, preview diff, unresolved mapping detail | Input quality first, restore safety later | Simulation only until later confirmation and execution steps | Select backup set, select items, continue through wizard | Final restore execution remains later in the flow |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: no
  • New persisted entity/table/artifact?: no
  • New abstraction?: yes
  • New enum/state/reason family?: no
  • New cross-domain UI framework/taxonomy?: no
  • Current operator problem: Operators can currently tell that a backup, item, or version exists, but they cannot quickly tell whether it is strong, degraded, or metadata-only as recovery input before they reach deep detail or restore-safety surfaces.
  • Existing structure is insufficient because: The relevant truth is split across backup metadata, assignment metadata, disabled restore actions, and later restore-safety checks. That fragmentation causes false confidence and late discovery.
  • Narrowest correct implementation: Introduce at most one narrow derived backup-quality helper that reads existing BackupSet, BackupItem, and PolicyVersion metadata and exposes a compact summary for existing list, detail, and wizard surfaces.
  • Ownership cost: A small amount of shared derivation logic plus unit, feature, and RBAC regression tests that keep quality labels aligned with the underlying metadata keys.
  • Alternative intentionally rejected: A persisted backup-health table, a tenant-wide scoring model, or a new recovery-confidence engine were rejected because they would create new truth, new state, and new ownership cost before the current surfaces tell the existing truth well.
  • Release truth: current-release truth hardening

User Scenarios & Testing (mandatory)

User Story 1 - Judge Backup Sets Early (Priority: P1)

A tenant operator opens the backup-set list or detail page and needs to tell within seconds whether a backup set is merely stored or also looks strong enough to inspect further as recovery input.

Why this priority: This is the earliest point where false confidence must be prevented. If the operator misreads "completed" as "good backup", every later restore decision inherits that error.

Independent Test: Can be fully tested by loading the backup-set list and detail pages with full-quality and degraded fixtures and verifying that lifecycle status and backup quality are shown separately.

Acceptance Scenarios:

  1. Given a tenant has one full-quality backup set and one degraded backup set, When the operator opens the backup-set list, Then each row shows capture status separately from a compact backup-quality summary.
  2. Given a backup set contains degraded items, When the operator opens backup-set detail, Then the page shows a quality summary with degradation counts before per-item diagnostics or raw metadata.
  3. Given a completed backup set contains only metadata-only items, When the operator scans the list or detail surface, Then the surface does not imply that the set is safe to restore or broadly recoverable.

User Story 2 - Inspect Item and Version Strength (Priority: P2)

A tenant operator reviewing backup items or policy versions needs to distinguish full payloads from metadata-only or assignment-degraded inputs without inferring that truth from disabled actions or hidden metadata.

Why this priority: Item-level and version-level truth determines whether a backup set is actually useful. If this information stays implicit, operators cannot compare restore inputs confidently.

Independent Test: Can be fully tested by loading the backup-items table, policy-version list, and policy-version detail page with mixed-quality records and verifying explicit per-record quality signals.

Acceptance Scenarios:

  1. Given backup items include full payload, metadata-only, assignment-fetch-failed, and orphaned-assignment examples, When the operator reviews the backup-items table, Then each item shows explicit snapshot mode and assignment-quality signals.
  2. Given policy versions include both full payload and metadata-only snapshots, When the operator reviews the policy-version list, Then snapshot mode is visible without needing to hover disabled actions.
  3. Given a policy-version detail page represents a degraded version, When the operator opens the page, Then the page shows an explicit backup-quality section that explains the weakness without using restore availability as the only signal.

User Story 3 - Select Restore Inputs With Early Warning (Priority: P3)

A tenant operator starting a restore run needs to see weak backup sets and weak items before risk checks or preview steps so that poor input quality is visible at the first selection point.

Why this priority: Restore-safety hardening already exists later in the flow. This story closes the trust gap before the operator commits to a candidate backup set or item selection.

Independent Test: Can be fully tested by opening the restore-run creation wizard with degraded backup-set and backup-item fixtures and verifying that selection step labels or descriptions expose quality truth before safety checks run.

Acceptance Scenarios:

  1. Given a degraded backup set is available for restore, When the operator opens restore wizard step 1, Then the backup-set selection surface shows that the set contains degraded input before the operator reaches safety checks.
  2. Given selected restore items include metadata-only and assignment-degraded inputs, When the operator reviews restore wizard step 2, Then each affected item is clearly marked as degraded before any risk-check action occurs.
  3. Given a backup set is full-quality, When the operator reviews steps 1 and 2, Then the wizard can communicate that no degradations are currently detected without claiming that restore is safe.

User Story 4 - Preserve Truth Under RBAC Boundaries (Priority: P4)

A tenant member with backup or version viewing rights but without restore or maintenance rights still needs to see the same backup-quality truth so that authorization boundaries do not make weak inputs look calmer than they are.

Why this priority: Security boundaries must not distort source-of-truth visibility. Otherwise the UI becomes less truthful for read-only operators than for managers.

Independent Test: Can be fully tested by signing in as a tenant member with TENANT_VIEW but without restore capabilities and verifying that list and detail surfaces still expose quality truth while restore actions remain unavailable.

Acceptance Scenarios:

  1. Given a tenant member has backup and version viewing rights but lacks restore permission, When they open backup-set or policy-version surfaces, Then backup-quality signals remain visible while restore actions stay unavailable.
  2. Given a non-member requests the same tenant-scoped surfaces, When the request is made, Then the system responds with deny-as-not-found semantics instead of exposing resource existence.

Edge Cases

  • A backup set is completed and has zero degradations; the surface must explicitly show that no degradations are detected rather than leaving quality unstated.
  • A backup set mixes full payload items with metadata-only and assignment-degraded items; the summary must show mixed quality without collapsing to a single misleading label.
  • Assignment capture is marked not applicable for a policy type; the surface must not mislabel that condition as a failure.
  • Older items or versions lack enough metadata to derive quality; the surface may show unknown only when no existing authoritative signal is available.
  • Archived backup sets and archived policy versions must retain the same quality truth on list and detail surfaces as active records.
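The edge cases above reduce to a small set of per-item rules. The following Python sketch shows one way to express them as a pure derivation. It is illustrative only and is not the PHP/Filament implementation. It assumes item metadata arrives as a plain dictionary using keys named under Assumptions (`snapshot_source`, `assignments_fetch_failed`, `assignment_capture_reason`, `has_orphaned_assignments`). The values `"full"`, `"metadata_only"`, and `"not_applicable"`, along with `ItemQuality` and `derive_item_quality`, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical metadata values; the real ones are defined by the existing capture code.
FULL = "full"
METADATA_ONLY = "metadata_only"
NOT_APPLICABLE = "not_applicable"


@dataclass(frozen=True)
class ItemQuality:
    snapshot_mode: str                     # "full", "metadata_only", or "unknown"
    degradations: frozenset = frozenset()  # degradation families found for this item

    @property
    def is_degraded(self) -> bool:
        return bool(self.degradations)


def derive_item_quality(metadata: dict) -> ItemQuality:
    """Derive one item's quality from metadata it already carries; nothing is persisted."""
    source = metadata.get("snapshot_source")
    # Older records without an authoritative signal fall back to "unknown".
    snapshot_mode = source if source in (FULL, METADATA_ONLY) else "unknown"

    degradations = set()
    if snapshot_mode == METADATA_ONLY:
        degradations.add("metadata_only")
    # Assignment capture marked "not applicable" for a policy type is not a failure.
    if (metadata.get("assignments_fetch_failed") is True
            and metadata.get("assignment_capture_reason") != NOT_APPLICABLE):
        degradations.add("assignment_capture_failed")
    if metadata.get("has_orphaned_assignments") is True:
        degradations.add("orphaned_assignments")
    return ItemQuality(snapshot_mode, frozenset(degradations))
```

The function only reads metadata that the record already carries. Each list, detail, and wizard surface could therefore call it on render without introducing any new persisted state.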

Requirements (mandatory)

This feature introduces no new Microsoft Graph calls, no new background work, no new OperationRun, and no new persistence. It is a read-first truth-hardening feature that makes existing backup and version metadata visible earlier and more clearly.

Authorization remains in the tenant/admin plane under /admin/t/{tenant}/.... Non-members must continue to receive 404 responses. Established members missing mutation capabilities must continue to receive 403 on execution. Members with TENANT_VIEW must still see backup-quality truth on backup and version surfaces even when restore entry points remain unavailable.
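The order of these checks matters. Membership is resolved before capabilities, so a non-member receives 404 rather than a 403 that would confirm the resource exists. A minimal Python sketch of that decision follows; it is not the application's policy code. The names `Outcome`, `authorize`, and `quality_visible` are hypothetical. The capability map covers only the two gates this spec states explicitly: TENANT_VIEW for viewing and TENANT_MANAGE for restore creation.

```python
from enum import Enum


class Outcome(Enum):
    NOT_FOUND = 404  # non-member: deny as not found, hiding resource existence
    FORBIDDEN = 403  # established member missing the required capability
    ALLOWED = 200


# Only the two gates this spec states explicitly; other mutation gates are omitted.
REQUIRED_CAPABILITY = {
    "view": "TENANT_VIEW",              # backup and version surfaces, incl. quality truth
    "create_restore": "TENANT_MANAGE",  # restore-run creation
}


def authorize(workspace_member: bool, tenant_member: bool,
              capabilities: set, action: str) -> Outcome:
    # Membership is checked first so a non-member never learns whether
    # the resource exists or which capability it would need.
    if not (workspace_member and tenant_member):
        return Outcome.NOT_FOUND
    if REQUIRED_CAPABILITY[action] not in capabilities:
        return Outcome.FORBIDDEN
    return Outcome.ALLOWED


def quality_visible(workspace_member: bool, tenant_member: bool, capabilities: set) -> bool:
    # Quality visibility follows view rights only; restore rights never change it.
    return authorize(workspace_member, tenant_member, capabilities, "view") is Outcome.ALLOWED
```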

Badge and UI semantics must stay centralized. Existing shared badge semantics, especially snapshot-mode badges, remain the canonical language for status-like signals. Any new quality labels or summaries must be derived from shared backup-quality rules rather than page-local color or wording decisions.
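Centralization can be as simple as one read-only registry that every surface consults. The Python sketch below is hypothetical: the family keys, labels, and color names are placeholders. In the product, snapshot-mode wording would still come from the existing snapshot-mode badge. Raising on an unmapped family lets a regression test catch any page that drifts from the shared vocabulary.

```python
from __future__ import annotations

from types import MappingProxyType

# Hypothetical shared registry: every surface looks up wording and color here
# instead of choosing page-local labels. Color names are placeholders.
QUALITY_BADGES = MappingProxyType({
    "none_detected": ("No degradations detected", "success"),
    "metadata_only": ("Metadata only", "warning"),
    "assignment_capture_failed": ("Assignment capture failed", "warning"),
    "orphaned_assignments": ("Orphaned assignments", "warning"),
    "unknown": ("Quality unknown", "gray"),
})


def badge_for(family: str) -> tuple[str, str]:
    """Return (label, color) for a quality family; unmapped families raise so
    a page cannot quietly invent its own wording."""
    try:
        return QUALITY_BADGES[family]
    except KeyError:
        raise ValueError(f"unmapped backup-quality family: {family!r}") from None
```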

The affected Filament surfaces must keep exactly one primary inspect or open model, must not add redundant View actions, and must preserve destructive-action placement and confirmation behavior already defined by the action-surface contract. Quality truth is additive to existing surfaces, not a new local action framework.

If a shared backup-quality helper is introduced, it must replace duplicated page-local derivation instead of layering a second semantic system on top of existing restore-safety logic. Restore safety, preview eligibility, and execution outcome remain separate truths.
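Keeping these truths separate means the quality value must be computed without consulting restore permissions or restore-safety output. The Python sketch below illustrates this for the three version truths named in FR-176-013. The `VersionTruth` type, the `version_truth` function, and the record shape (`archived`, `snapshot_source`) are assumptions. Treating archived versions as not selectable is an illustrative choice, not a rule stated in this spec.

```python
from __future__ import annotations

from dataclasses import dataclass

KNOWN_MODES = ("full", "metadata_only")


@dataclass(frozen=True)
class VersionTruth:
    exists: bool
    selectable: bool      # current permissions plus lifecycle state
    payload_quality: str  # "full", "metadata_only", or "unknown"


def version_truth(version: dict | None, can_restore: bool) -> VersionTruth:
    """Hypothetical record shape: {"archived": bool, "snapshot_source": str}."""
    if version is None:
        return VersionTruth(exists=False, selectable=False, payload_quality="unknown")
    source = version.get("snapshot_source")
    quality = source if source in KNOWN_MODES else "unknown"
    # Quality ignores can_restore and any restore-safety result, so a read-only
    # viewer and a manager see the same quality for the same version.
    selectable = can_restore and not version.get("archived", False)
    return VersionTruth(exists=True, selectable=selectable, payload_quality=quality)
```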

Functional Requirements

  • FR-176-001: The system MUST present backup existence truth separately from backup quality truth so that completed, partial, and failed remain capture-lifecycle states rather than quality claims.
  • FR-176-002: The backup-set list MUST show a compact backup-quality summary per row that indicates either no detected degradations or the presence of one or more degradation families.
  • FR-176-003: The backup-set detail surface MUST show a default-visible quality summary before deep diagnostics, including counts for metadata-only items, assignment-capture issues, orphaned-assignment signals, and any other degradation families that are already authoritatively derivable.
  • FR-176-004: The backup-items table MUST show per-item snapshot mode and per-item assignment-quality signals without requiring the operator to open raw JSON or later restore surfaces.
  • FR-176-005: The policy-version list MUST show snapshot mode for every visible version and MUST make degraded versions distinguishable from full-payload versions at scan speed.
  • FR-176-006: The policy-version detail page MUST show explicit backup-quality truth and MUST NOT rely on disabled restore actions or tooltips as the only signal that a version is weak.
  • FR-176-007: Restore wizard step 1 MUST expose backup-set quality before the operator reaches safety checks, preview generation, or execution confirmation.
  • FR-176-008: Restore wizard step 2 MUST expose item-level quality before safety checks, including metadata-only and assignment-quality degradations where the underlying data already exists.
  • FR-176-009: Metadata-only state MUST appear on backup and version surfaces as soon as the source metadata can establish it, and MUST NOT first surface as a restore-stage surprise.
  • FR-176-010: Assignment-capture failures and orphaned-assignment signals MUST be operator-visible on backup-quality surfaces whenever the metadata already records them.
  • FR-176-011: Backup-quality surfaces MUST NOT claim that a backup set, item, or version is safe to restore, restore-ready, or guaranteed to succeed.
  • FR-176-012: Backup-quality surfaces MUST NOT imply that a strong-looking backup set proves tenant-wide recoverability, a guaranteed rollback path, or a recovery certification outcome.
  • FR-176-013: Version history surfaces MUST separate three truths: version exists, version is selectable under current permissions and lifecycle state, and version has stronger or weaker payload quality.
  • FR-176-014: When a backup set, item, or version is weak, the surface MUST suggest meaningful next actions such as opening detail, inspecting degraded items, preferring a stronger version, or continuing into restore with caution.
  • FR-176-015: Quality signals MUST remain visible to users with backup or version viewing rights even when deeper restore or operations surfaces are inaccessible.
  • FR-176-016: The feature MUST derive backup-quality truth from existing tenant-owned records and metadata and MUST NOT require a new persistence model, new materialized state, or a new cross-tenant scoring engine.
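Several of these requirements (FR-176-001, -002, -003, -011, and -014) converge in the set-level summary. The Python sketch below is a hypothetical aggregation, not the shipped resolver. It keeps lifecycle status as a separate field and counts affected items per degradation family. It also states "No degradations detected" explicitly and rejects wording that would claim restore safety. The family keys, labels, and the `summarize_set` name are assumptions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass

# Hypothetical wording, listed in display order.
FAMILY_LABELS = {
    "metadata_only": "metadata-only",
    "assignment_capture_failed": "assignment issue",
    "orphaned_assignments": "orphaned assignments",
}
FORBIDDEN_CLAIMS = ("safe to restore", "restore-ready", "guaranteed")


@dataclass(frozen=True)
class SetSummary:
    lifecycle_status: str   # capture truth only: "completed", "partial", "failed"
    counts: dict            # degradation family -> number of affected items
    text: str               # compact, default-visible quality summary
    next_action: str | None


def summarize_set(lifecycle_status: str, item_degradations: list) -> SetSummary:
    """item_degradations holds one set of degradation families per backup item."""
    counts = Counter(family for families in item_degradations for family in families)
    if not counts:
        # State the absence of findings explicitly instead of leaving quality blank.
        text, next_action = "No degradations detected", None
    else:
        ordered = [family for family in FAMILY_LABELS if family in counts]
        ordered += sorted(family for family in counts if family not in FAMILY_LABELS)
        text = ", ".join(f"{counts[family]} {FAMILY_LABELS.get(family, family)}" for family in ordered)
        next_action = "Inspect degraded items"
    # Quality wording must never turn into a restore-safety claim.
    if any(claim in text.lower() for claim in FORBIDDEN_CLAIMS):
        raise ValueError(f"summary overclaims restore safety: {text!r}")
    return SetSummary(lifecycle_status, dict(counts), text, next_action)
```

The same `text` value could serve as the row summary on the backup-set list and as the option description in restore wizard step 1. That keeps both surfaces on one wording.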

Assumptions

  • Existing metadata keys such as source, snapshot_source, assignments_fetch_failed, assignment_capture_reason, has_orphaned_assignments, scope-tag metadata, and redaction or integrity notes are authoritative enough for first-pass backup-quality truth.
  • Existing restore-safety checks remain the sole owner of blocker, warning, preview-only, and execution-gating language.
  • Older records may lack some quality metadata; in those cases the product may show unknown quality only when the existing record truly does not contain enough information to derive a stronger statement.

Dependencies

  • Existing tenant-scoped backup, version, and restore resources remain the operator entry points.
  • Existing centralized badge semantics, especially snapshot-mode badges, remain the canonical language for visible status.
  • Existing restore-safety integrity behavior and metadata-only execution blocking remain unchanged and continue to run after the earlier backup-quality surfaces.

Out of Scope and Follow-up

  • No redesign of restore execution, restore-safety logic, backup capture, retention or pruning, tenant-wide recovery scoring, notification domains, or new persisted backup-health artifacts.
  • Reasonable follow-up work includes a backup-health dashboard, a broader recovery-confidence rollup, and version-rollback usefulness guidance after the current truth-hardening slice is complete.

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| BackupSetResource | app/Filament/Resources/BackupSetResource.php | Create backup set | recordUrl() clickable row | Primary related action, More: Restore / Archive / Force delete | More: Archive Backup Sets / Restore Backup Sets / Force Delete Backup Sets | Create backup set | Grouped existing mutations remain; related navigation stays in contextual summary links, not the header | Create backup set submit plus cancel | Existing audit logging remains for restore, archive, and force delete; read-only quality truth adds no new audit event | Action surface contract stays satisfied. Quality summary is additive only. |
| BackupItemsRelationManager | app/Filament/Resources/BackupSetResource/RelationManagers/BackupItemsRelationManager.php | Refresh, Add Policies | Clickable row | More: Remove | More: Remove selected | Add Policies | n/a | n/a | Existing operation-run and audit behavior for remove flows remains; visibility changes are read-only | Existing relation-manager exception remains; no redundant View action is added. |
| PolicyVersionResource | app/Filament/Resources/PolicyVersionResource.php | none | recordUrl() clickable row | Primary related action, More: Restore via Wizard / Archive / Force delete / Restore archived version | More: Prune Versions / Restore Versions / Force Delete Versions | Open backup sets | Existing detail header remains intentionally minimal | n/a | Existing audit logging remains for archive, force delete, and restore; restore-via-wizard keeps existing restore-run and backup creation behavior | Policy-version detail gains explicit quality truth so disabled actions stop being the only signal. |
| Restore run create wizard | app/Filament/Resources/RestoreRunResource.php, app/Filament/Resources/RestoreRunResource/Pages/CreateRestoreRun.php | none | n/a | n/a | n/a | none | n/a | Wizard Previous / Next / Create restore run | Existing restore-run create and execute audit behavior remains unchanged; selection-stage quality visibility is read-only | Step 1 and step 2 gain quality descriptions only. No new destructive action is introduced. |

Key Entities (include if feature involves data)

  • BackupSet: A tenant-owned capture collection that already records lifecycle state, timestamps, item count, and metadata describing how the set was produced.
  • BackupItem: A tenant-owned captured recovery input for one policy or foundation item, including payload, assignments, and metadata that can expose snapshot completeness and assignment-quality issues.
  • PolicyVersion: An immutable tenant-owned version record that stores captured snapshot data, related metadata, assignments, redaction context, and capture timing.
  • Restore selection context: The tenant-scoped backup-set and optional item selection that an operator builds before restore-safety checks and preview generation.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: In validation sessions and acceptance tests, an operator can identify whether a backup set is full-quality or degraded from the list or detail surface in under 10 seconds without opening raw JSON or preview surfaces.
  • SC-002: In 100% of tested cases where existing records contain metadata-only, assignment-fetch-failed, or orphaned-assignment signals, at least one default-visible backup-quality signal appears on every affected list, detail, or wizard selection surface.
  • SC-003: In 100% of RBAC test cases, users with backup or version viewing rights but without restore rights can still see backup-quality truth on list and detail surfaces while restore actions remain unavailable.
  • SC-004: In 100% of degraded restore-input scenarios covered by acceptance tests, backup-set and item quality is visible before the operator reaches restore-safety checks or preview generation.