TenantAtlas/specs/176-backup-quality-truth/spec.md

# Feature Specification: Backup Quality Truth Surfaces

**Feature Branch**: `176-backup-quality-truth`
**Created**: 2026-04-07
**Status**: Draft
**Input**: User description: "Spec 176 - Backup Quality Truth Surfaces"

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant
- **Primary Routes**: `/admin/t/{tenant}/backup-sets`, `/admin/t/{tenant}/backup-sets/{record}`, `/admin/t/{tenant}/policy-versions`, `/admin/t/{tenant}/policy-versions/{record}`, `/admin/t/{tenant}/restore-runs/create`
- **Data Ownership**: Tenant-owned `BackupSet`, `BackupItem`, and `PolicyVersion` records, plus `RestoreRun` draft-selection state, all within the active workspace and tenant scope.
- **RBAC**: Workspace plus tenant membership is required on every affected surface. Members with `TENANT_VIEW` must see backup-quality truth on backup and version surfaces. Restore creation remains gated by `TENANT_MANAGE`. Backup-set mutation actions remain gated by existing `TENANT_SYNC`, `TENANT_MANAGE`, and `TENANT_DELETE` capabilities.

## UI/UX Surface Classification *(mandatory when operator-facing surfaces are changed)*

| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Backup sets page | CRUD / list-first resource | Full-row click to backup-set detail | required | One inline safe shortcut plus More | More menu and bulk More | `/admin/t/{tenant}/backup-sets` | `/admin/t/{tenant}/backup-sets/{record}` | Workspace context plus tenant context | Backup sets / Backup set | Capture lifecycle and backup quality summary | none |
| Backup set detail | Detail plus relation manager | Direct detail page | forbidden | Contextual summary links plus relation header actions | Resource More and relation-manager More | `/admin/t/{tenant}/backup-sets` | `/admin/t/{tenant}/backup-sets/{record}` | Tenant context plus related policy context | Backup set | Quality summary before per-item diagnostics | none |
| Backup items table | Relation-manager table | Full-row click within backup-set detail | required | Relation header actions plus More | More menu and bulk More | `/admin/t/{tenant}/backup-sets/{record}` | `/admin/t/{tenant}/backup-sets/{record}` | Parent backup set plus tenant context | Backup items / Backup item | Snapshot mode and assignment-quality truth per item | existing relation-manager pattern |
| Policy versions page | CRUD / list-first resource | Full-row click to policy-version detail | required | More menu | More menu and bulk More | `/admin/t/{tenant}/policy-versions` | `/admin/t/{tenant}/policy-versions/{record}` | Workspace context plus tenant context | Policy versions / Policy version | Snapshot mode and version input quality | Empty-state CTA routes to backup sets |
| Policy version detail | Detail / infolist page | Direct detail page | forbidden | Minimal related navigation only | No new destructive detail action placement | `/admin/t/{tenant}/policy-versions` | `/admin/t/{tenant}/policy-versions/{record}` | Tenant context plus related policy context | Policy version | Explicit backup-quality truth separate from restore availability | existing minimal header pattern |
| Restore run create wizard | Wizard / selection workflow | Step-driven selection inside restore-run creation | forbidden | Inline descriptions and next-action guidance | None at selection stage | `/admin/t/{tenant}/restore-runs` | `/admin/t/{tenant}/restore-runs/create` | Tenant context plus selected backup set | Restore run / Backup selection | Backup-set and item quality before safety checks | none |

## Operator Surface Contract *(mandatory when operator-facing surfaces are changed)*

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Backup sets page | Tenant operator | List | Which backup sets look strong or weak as recovery input? | Name, item count, capture timing, lifecycle status, compact backup-quality summary | Raw item metadata, per-item details, restore safety analysis | Capture lifecycle, input quality | TenantPilot only for existing archive and restore maintenance | Open backup set, Create backup set | Archive, Restore archived set, Force delete |
| Backup set detail | Tenant operator | Detail | Is this backup set a strong or weak recovery input, and why? | Quality summary, degraded counts, next actions, related context | Raw payloads, raw assignment JSON, integrity detail | Input quality, assignment completeness, lifecycle status | TenantPilot only for existing maintenance actions | Inspect backup items, open related context from contextual links | Archive, Restore archived set, Force delete |
| Backup items table | Tenant operator | Table inside detail | Which items are degraded inside this backup set? | Display name, type, snapshot mode, assignment issue signal, orphaned-assignment signal | Full metadata, raw assignments, low-level IDs | Snapshot completeness, assignment completeness | TenantPilot only for add and remove maintenance; none for visibility | Refresh, Add Policies, inspect row | Remove, Remove selected |
| Policy versions page | Tenant operator | List | Which versions are full-payload versus metadata-only or otherwise degraded? | Policy identity, version number, capture time, snapshot mode, quality signal | Raw JSON, diff payload, redaction detail | Version lifecycle, input quality | TenantPilot only for existing archive and maintenance actions | Open version, open related policy, open backup sets from empty state | Restore via Wizard, Archive, Restore archived version, Force delete, bulk prune |
| Policy version detail | Tenant operator | Detail | Is this version worth using as restore input? | Version identity, explicit backup-quality section, related context | Normalized settings, raw JSON, diff, redaction detail | Input quality, version lineage | None for visibility; existing restore entry remains separately gated | Open related policy | No new destructive detail action |
| Restore run create wizard | Tenant operator with restore rights | Wizard | Which backup set or items should I avoid or inspect before running safety checks? | Backup-set quality summary, per-item quality descriptions, stronger or weaker input hints | Risk-check output, preview diff, unresolved mapping detail | Input quality first, restore safety later | Simulation only until later confirmation and execution steps | Select backup set, select items, continue through wizard | Final restore execution remains later in the flow |

## Proportionality Review *(mandatory when structural complexity is introduced)*

- **New source of truth?**: no
- **New persisted entity/table/artifact?**: no
- **New abstraction?**: yes
- **New enum/state/reason family?**: no
- **New cross-domain UI framework/taxonomy?**: no
- **Current operator problem**: Operators can currently tell that a backup, item, or version exists, but they cannot quickly tell whether it is strong, degraded, or metadata-only as recovery input before they reach deep detail or restore-safety surfaces.
- **Existing structure is insufficient because**: The relevant truth is split across backup metadata, assignment metadata, disabled restore actions, and later restore-safety checks. That fragmentation causes false confidence and late discovery.
- **Narrowest correct implementation**: Introduce at most one narrow derived backup-quality helper that reads existing `BackupSet`, `BackupItem`, and `PolicyVersion` metadata and exposes a compact summary for existing list, detail, and wizard surfaces.
- **Ownership cost**: A small amount of shared derivation logic plus unit, feature, and RBAC regression tests that keep quality labels aligned with the underlying metadata keys.
- **Alternative intentionally rejected**: A persisted backup-health table, a tenant-wide scoring model, and a new recovery-confidence engine were all rejected. Each would create new truth, new state, and new ownership cost before the current surfaces present the existing truth clearly.
- **Release truth**: current-release truth hardening
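
The sketch below shows how narrow such a derived helper could be. It is written in Python for readability; the real helper would live in the PHP application. It reads only three of the existing metadata keys listed under Assumptions: `snapshot_source`, `assignments_fetch_failed`, and `assignment_capture_reason`. It persists nothing. The value strings `"full"`, `"metadata_only"`, and `"not_applicable"` are assumptions, not confirmed values. The names `ItemQuality` and `derive_item_quality` are also hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass(frozen=True)
class ItemQuality:
    """Hypothetical derived view over one backup item's or version's metadata."""

    snapshot_mode: str  # "full", "metadata_only", or "unknown"
    assignments: str  # "captured", "fetch_failed", "not_applicable", or "unknown"
    has_orphaned_assignments: bool

    @property
    def degradations(self) -> List[str]:
        # Only conditions the metadata positively records count as degradations;
        # "not_applicable" and "unknown" are deliberately excluded.
        found: List[str] = []
        if self.snapshot_mode == "metadata_only":
            found.append("metadata_only")
        if self.assignments == "fetch_failed":
            found.append("assignment_fetch_failed")
        if self.has_orphaned_assignments:
            found.append("orphaned_assignments")
        return found


def derive_item_quality(metadata: Mapping[str, Any]) -> ItemQuality:
    """Derive quality from existing metadata keys only; nothing is persisted."""
    # Assumption: snapshot_source holds "full" or "metadata_only" when it is known.
    source = metadata.get("snapshot_source")
    snapshot_mode = source if source in ("full", "metadata_only") else "unknown"

    if metadata.get("assignments_fetch_failed") is True:
        assignments = "fetch_failed"
    elif metadata.get("assignment_capture_reason") == "not_applicable":
        # Assumption: this reason value marks policy types without assignments.
        assignments = "not_applicable"
    elif "assignments_fetch_failed" in metadata:
        assignments = "captured"
    else:
        assignments = "unknown"  # older record without an authoritative signal

    return ItemQuality(
        snapshot_mode=snapshot_mode,
        assignments=assignments,
        has_orphaned_assignments=bool(metadata.get("has_orphaned_assignments", False)),
    )
```

`not_applicable` and `unknown` are kept out of `degradations`. As a result, a policy type without assignment capture is never labeled a failure, and missing metadata never reads as a detected problem.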

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Judge Backup Sets Early (Priority: P1)

A tenant operator opens the backup-set list or detail page and needs to tell within seconds whether a backup set is merely stored or also looks strong enough to inspect further as recovery input.

**Why this priority**: This is the earliest point where false confidence must be prevented. If the operator misreads `completed` as `good backup`, every later restore decision inherits that error.

**Independent Test**: Can be fully tested by loading the backup-set list and detail pages with full-quality and degraded fixtures and verifying that lifecycle status and backup quality are shown separately.

**Acceptance Scenarios**:

1. **Given** a tenant has one full-quality backup set and one degraded backup set, **When** the operator opens the backup-set list, **Then** each row shows capture status separately from a compact backup-quality summary.
2. **Given** a backup set contains degraded items, **When** the operator opens backup-set detail, **Then** the page shows a quality summary with degradation counts before per-item diagnostics or raw metadata.
3. **Given** a completed backup set contains only metadata-only items, **When** the operator scans the list or detail surface, **Then** the surface does not imply that the set is safe to restore or broadly recoverable.

---

### User Story 2 - Inspect Item and Version Strength (Priority: P2)

A tenant operator reviewing backup items or policy versions needs to distinguish full payloads from metadata-only or assignment-degraded inputs without inferring that truth from disabled actions or hidden metadata.

**Why this priority**: Item-level and version-level truth determines whether a backup set is actually useful. If this information stays implicit, operators cannot compare restore inputs confidently.

**Independent Test**: Can be fully tested by loading the backup-items table, policy-version list, and policy-version detail page with mixed-quality records and verifying explicit per-record quality signals.

**Acceptance Scenarios**:

1. **Given** backup items include full payload, metadata-only, assignment-fetch-failed, and orphaned-assignment examples, **When** the operator reviews the backup-items table, **Then** each item shows explicit snapshot mode and assignment-quality signals.
2. **Given** policy versions include both full payload and metadata-only snapshots, **When** the operator reviews the policy-version list, **Then** snapshot mode is visible without the operator needing to hover over disabled actions.
3. **Given** a policy version is degraded, **When** the operator opens its detail page, **Then** the page shows an explicit backup-quality section that explains the weakness without relying on restore availability as the only signal.

---

### User Story 3 - Select Restore Inputs With Early Warning (Priority: P3)

A tenant operator starting a restore run needs to see weak backup sets and weak items before risk checks or preview steps so that poor input quality is visible at the first selection point.

**Why this priority**: Restore-safety hardening already exists later in the flow. This story closes the trust gap before the operator commits to a candidate backup set or item selection.

**Independent Test**: Can be fully tested by opening the restore-run creation wizard with degraded backup-set and backup-item fixtures and verifying that selection-step labels or descriptions expose quality truth before safety checks run.

**Acceptance Scenarios**:

1. **Given** a degraded backup set is available for restore, **When** the operator opens restore wizard step 1, **Then** the backup-set selection surface shows that the set contains degraded input before the operator reaches safety checks.
2. **Given** selected restore items include metadata-only and assignment-degraded inputs, **When** the operator reviews restore wizard step 2, **Then** each affected item is clearly marked as degraded before any risk-check action occurs.
3. **Given** a backup set is full-quality, **When** the operator reviews steps 1 and 2, **Then** the wizard can communicate that no degradations are currently detected without claiming that restore is safe.
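
One way to meet these scenarios is to build the step 1 option description from per-family degradation counts. The sketch below (Python, illustrative only) reports "no degradations detected" for a clean set and never uses safety wording such as "safe to restore" or "restore-ready". The family keys, the label wording, and `backup_set_option_description` are hypothetical.

```python
from typing import Mapping

# Hypothetical family keys and wording; real labels would come from shared quality rules.
FAMILY_LABELS = {
    "metadata_only": "metadata-only",
    "assignment_fetch_failed": "with failed assignment capture",
    "orphaned_assignments": "with orphaned assignments",
}


def backup_set_option_description(item_count: int, degraded_counts: Mapping[str, int]) -> str:
    """Describe a backup set for restore wizard step 1 without any safety claim."""
    noun = "item" if item_count == 1 else "items"
    parts = [
        f"{count} {FAMILY_LABELS[family]}"
        for family, count in degraded_counts.items()
        if count > 0
    ]
    if not parts:
        # Report only what was (not) detected; never "safe to restore" or "restore-ready".
        return f"{item_count} {noun} - no degradations detected"
    return f"{item_count} {noun} - degraded input: " + ", ".join(parts)
```

Step 2 item descriptions could reuse the same labels per item, so both steps speak the same language.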

---

### User Story 4 - Preserve Truth Under RBAC Boundaries (Priority: P4)

A tenant member with backup or version viewing rights but without restore or maintenance rights still needs to see the same backup-quality truth so that authorization boundaries do not make weak inputs look calmer than they are.

**Why this priority**: Security boundaries must not distort source-of-truth visibility. Otherwise the UI becomes less truthful for read-only operators than for managers.

**Independent Test**: Can be fully tested by signing in as a tenant member with `TENANT_VIEW` but without restore capabilities and verifying that list and detail surfaces still expose quality truth while restore actions remain unavailable.

**Acceptance Scenarios**:

1. **Given** a tenant member has backup and version viewing rights but lacks restore permission, **When** they open backup-set or policy-version surfaces, **Then** backup-quality signals remain visible while restore actions stay unavailable.
2. **Given** a non-member requests the same tenant-scoped surfaces, **When** the request is made, **Then** the system responds with deny-as-not-found semantics instead of exposing resource existence.

### Edge Cases

- A backup set is `completed` and has zero degradations; the surface must explicitly show that no degradations are detected rather than leaving quality unstated.
- A backup set mixes full payload items with metadata-only and assignment-degraded items; the summary must show mixed quality without collapsing to a single misleading label.
- Assignment capture is marked not applicable for a policy type; the surface must not mislabel that condition as a failure.
- Older items or versions lack enough metadata to derive quality; the surface may show `unknown` only when no existing authoritative signal is available.
- Archived backup sets and archived policy versions must retain the same quality truth on list and detail surfaces as active records.
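
The set-level edge cases can be made concrete with a small aggregation over per-item signals. This Python sketch is illustrative only. The names `ItemSignals` and `summarize_backup_set` and the headline values are hypothetical. In the real application, the summary would come from the shared helper rather than from page code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Sequence


@dataclass(frozen=True)
class ItemSignals:
    """Hypothetical per-item input: detected degradation families plus an unknown flag."""

    degradations: Sequence[str] = ()
    quality_unknown: bool = False


def summarize_backup_set(items: Iterable[ItemSignals]) -> Dict[str, Any]:
    """Aggregate per-item signals; capture lifecycle status is deliberately not an input."""
    item_list = list(items)
    family_counts = Counter(f for item in item_list for f in item.degradations)
    degraded = sum(1 for item in item_list if item.degradations)
    # An item counts as unknown only when nothing degraded was detected on it.
    unknown = sum(1 for item in item_list if item.quality_unknown and not item.degradations)

    if not item_list or unknown == len(item_list):
        headline = "unknown"  # no authoritative signal anywhere in the set
    elif degraded == len(item_list):
        headline = "degraded"
    elif degraded > 0:
        headline = "mixed"  # partial degradation stays visible instead of one flat label
    else:
        headline = "no_degradations_detected"  # stated explicitly, never left blank

    return {
        "headline": headline,
        "total_items": len(item_list),
        "degraded_items": degraded,
        "unknown_items": unknown,
        "family_counts": dict(family_counts),
    }
```

Per-family counts travel with the headline, so a `mixed` set still shows which families degrade it. Lifecycle status (`completed`, `partial`, `failed`) is never an input, which preserves the separation that FR-176-001 requires. Archived records would flow through the same function, so they keep the same quality truth as active ones.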

## Requirements *(mandatory)*

This feature introduces no new Microsoft Graph calls, no new background work, no new `OperationRun`, and no new persistence. It is a read-first truth-hardening feature that makes existing backup and version metadata visible earlier and more clearly.

Authorization remains in the tenant/admin plane under `/admin/t/{tenant}/...`. Non-members must continue to receive 404 responses. Established members missing mutation capabilities must continue to receive 403 on execution. Members with `TENANT_VIEW` must still see backup-quality truth on backup and version surfaces even when restore entry points remain unavailable.
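
The ordering of these rules matters, so a compact Python sketch may help. It assumes capabilities can be represented as plain strings named after the spec's `TENANT_VIEW` and `TENANT_MANAGE`. `Outcome` and `resolve` are hypothetical names. The spec only prescribes 403 for missing mutation capabilities on execution. Returning 403 for any missing capability is a simplifying assumption of this sketch.

```python
from enum import Enum
from typing import AbstractSet

# Capability names mirror the spec; representing them as plain strings is an assumption.
VIEW_QUALITY_TRUTH = "TENANT_VIEW"
CREATE_RESTORE_RUN = "TENANT_MANAGE"


class Outcome(Enum):
    NOT_FOUND = 404  # non-member: deny as not found, existence stays hidden
    FORBIDDEN = 403  # established member missing the required capability
    ALLOWED = 200


def resolve(is_member: bool, capabilities: AbstractSet[str], required: str) -> Outcome:
    """Hypothetical resolver for one tenant-scoped surface or action."""
    if not is_member:
        # Checked first so that capability details never leak to non-members.
        return Outcome.NOT_FOUND
    if required not in capabilities:
        return Outcome.FORBIDDEN
    return Outcome.ALLOWED
```

A member holding only `TENANT_VIEW` gets `ALLOWED` for quality truth and `FORBIDDEN` for restore creation from the same membership. This is the split that User Story 4 describes.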

Badge and UI semantics must stay centralized. Existing shared badge semantics, especially snapshot-mode badges, remain the canonical language for status-like signals. Any new quality labels or summaries must be derived from shared backup-quality rules rather than page-local color or wording decisions.
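
"Centralized" can be read as a single registry that every surface consults. The Python sketch below is illustrative only. `BadgeSpec`, `QUALITY_BADGES`, and `badge_for` are hypothetical, and the color names are Filament-style placeholders. In the real application, the snapshot-mode entries would reuse the existing shared snapshot-mode badge semantics rather than redeclare them.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BadgeSpec:
    label: str
    color: str  # Filament-style color name, e.g. "success", "warning", "gray"


# One shared registry: surfaces look up labels and colors here instead of choosing
# wording or colors locally. Keys and values are illustrative only.
QUALITY_BADGES: Mapping[str, BadgeSpec] = {
    "full": BadgeSpec("Full payload", "success"),
    "metadata_only": BadgeSpec("Metadata only", "warning"),
    "assignment_fetch_failed": BadgeSpec("Assignments not captured", "warning"),
    "orphaned_assignments": BadgeSpec("Orphaned assignments", "warning"),
    "not_applicable": BadgeSpec("Assignments not applicable", "gray"),  # not a failure
    "unknown": BadgeSpec("Unknown quality", "gray"),
}


def badge_for(signal: str) -> BadgeSpec:
    """Return the shared badge for a signal; fail loudly instead of letting a page improvise."""
    try:
        return QUALITY_BADGES[signal]
    except KeyError:
        raise ValueError(f"No shared badge semantics for signal {signal!r}") from None
```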

The affected Filament surfaces must keep exactly one primary inspect or open model, must not add redundant View actions, and must preserve destructive-action placement and confirmation behavior already defined by the action-surface contract. Quality truth is additive to existing surfaces, not a new local action framework.

If a shared backup-quality helper is introduced, it must replace duplicated page-local derivation instead of layering a second semantic system on top of existing restore-safety logic. Restore safety, preview eligibility, and execution outcome remain separate truths.

### Functional Requirements

- **FR-176-001**: The system MUST present backup existence truth separately from backup quality truth so that `completed`, `partial`, and `failed` remain capture-lifecycle states rather than quality claims.
- **FR-176-002**: The backup-set list MUST show a compact backup-quality summary per row that indicates either no detected degradations or the presence of one or more degradation families.
- **FR-176-003**: The backup-set detail surface MUST show a default-visible quality summary before deep diagnostics, including counts for metadata-only items, assignment-capture issues, orphaned-assignment signals, and any other degradation families that are already authoritatively derivable.
- **FR-176-004**: The backup-items table MUST show per-item snapshot mode and per-item assignment-quality signals without requiring the operator to open raw JSON or later restore surfaces.
- **FR-176-005**: The policy-version list MUST show snapshot mode for every visible version and MUST make degraded versions distinguishable from full-payload versions at scan speed.
- **FR-176-006**: The policy-version detail page MUST show explicit backup-quality truth and MUST NOT rely on disabled restore actions or tooltips as the only signal that a version is weak.
- **FR-176-007**: Restore wizard step 1 MUST expose backup-set quality before the operator reaches safety checks, preview generation, or execution confirmation.
- **FR-176-008**: Restore wizard step 2 MUST expose item-level quality before safety checks, including metadata-only and assignment-quality degradations where the underlying data already exists.
- **FR-176-009**: Metadata-only state MUST appear on backup and version surfaces as soon as the source metadata can establish it, and MUST NOT first surface as a restore-stage surprise.
- **FR-176-010**: Assignment-capture failures and orphaned-assignment signals MUST be operator-visible on backup-quality surfaces whenever the metadata already records them.
- **FR-176-011**: Backup-quality surfaces MUST NOT claim that a backup set, item, or version is safe to restore, restore-ready, or guaranteed to succeed.
- **FR-176-012**: Backup-quality surfaces MUST NOT imply that a strong-looking backup set proves tenant-wide recoverability, a guaranteed rollback path, or a recovery certification outcome.
- **FR-176-013**: Version history surfaces MUST separate three truths: version exists, version is selectable under current permissions and lifecycle state, and version has stronger or weaker payload quality.
- **FR-176-014**: When a backup set, item, or version is weak, the surface MUST suggest meaningful next actions such as opening detail, inspecting degraded items, preferring a stronger version, or continuing into restore with caution.
- **FR-176-015**: Quality signals MUST remain visible to users with backup or version viewing rights even when deeper restore or operations surfaces are inaccessible.
- **FR-176-016**: The feature MUST derive backup-quality truth from existing tenant-owned records and metadata and MUST NOT require a new persistence model, new materialized state, or a new cross-tenant scoring engine.
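
FR-176-013 and FR-176-014 can be read together as a small data shape plus a pure function. The Python sketch below keeps the three truths in separate fields. As a result, an archived or otherwise non-selectable version still reports its quality, and suggested actions never claim safety (FR-176-011). `VersionTruths`, `next_actions`, and the action wording are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class VersionTruths:
    """Three separate truths for one policy version (FR-176-013); names are hypothetical."""

    exists: bool
    selectable: bool  # under current permissions and lifecycle state (e.g. not archived)
    degradations: Sequence[str]  # payload-quality families; empty means none detected


def next_actions(truths: VersionTruths, stronger_version_available: bool) -> List[str]:
    """Suggest follow-ups for a weak version (FR-176-014) without any safety claim."""
    if not truths.exists or not truths.degradations:
        return []
    actions = ["Open version detail"]
    if stronger_version_available:
        actions.append("Prefer a stronger version")
    if truths.selectable:
        # Selectability gates this suggestion; quality never turns into "safe to restore".
        actions.append("Continue into restore with caution")
    return actions
```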

## Assumptions

- Existing metadata keys such as `source`, `snapshot_source`, `assignments_fetch_failed`, `assignment_capture_reason`, `has_orphaned_assignments`, scope-tag metadata, and redaction or integrity notes are authoritative enough for first-pass backup-quality truth.
- Existing restore-safety checks remain the sole owner of blocker, warning, preview-only, and execution-gating language.
- Older records may lack some quality metadata; in those cases the product may show `unknown quality` only when the existing record truly does not contain enough information to derive a stronger statement.

## Dependencies

- Existing tenant-scoped backup, version, and restore resources remain the operator entry points.
- Existing centralized badge semantics, especially snapshot-mode badges, remain the canonical language for visible status.
- Existing restore-safety integrity behavior and metadata-only execution blocking remain unchanged. They continue to run later in the restore flow, after the operator has already seen the backup-quality surfaces.

## Out of Scope and Follow-up

- No redesign of restore execution, restore-safety logic, backup capture, retention or pruning, tenant-wide recovery scoring, notification domains, or new persisted backup-health artifacts.
- Reasonable follow-up work includes a backup-health dashboard, a broader recovery-confidence rollup, and version-rollback usefulness guidance after the current truth-hardening slice is complete.

## UI Action Matrix *(mandatory when Filament is changed)*

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| BackupSetResource | `app/Filament/Resources/BackupSetResource.php` | Create backup set | `recordUrl()` clickable row | Primary related action, More: Restore / Archive / Force delete | More: Archive Backup Sets / Restore Backup Sets / Force Delete Backup Sets | Create backup set | Grouped existing mutations remain; related navigation stays in contextual summary links, not the header | Create backup set submit plus cancel | Existing audit logging remains for restore, archive, and force delete; read-only quality truth adds no new audit event | Action surface contract stays satisfied. Quality summary is additive only. |
| BackupItemsRelationManager | `app/Filament/Resources/BackupSetResource/RelationManagers/BackupItemsRelationManager.php` | Refresh, Add Policies | Clickable row | More: Remove | More: Remove selected | Add Policies | n/a | n/a | Existing operation-run and audit behavior for remove flows remains; visibility changes are read-only | Existing relation-manager exception remains; no redundant View action is added. |
| PolicyVersionResource | `app/Filament/Resources/PolicyVersionResource.php` | none | `recordUrl()` clickable row | Primary related action, More: Restore via Wizard / Archive / Force delete / Restore archived version | More: Prune Versions / Restore Versions / Force Delete Versions | Open backup sets | Existing detail header remains intentionally minimal | n/a | Existing audit logging remains for archive, force delete, and restore; restore-via-wizard keeps existing restore-run and backup creation behavior | Policy-version detail gains explicit quality truth so disabled actions stop being the only signal. |
| Restore run create wizard | `app/Filament/Resources/RestoreRunResource.php`, `app/Filament/Resources/RestoreRunResource/Pages/CreateRestoreRun.php` | none | n/a | n/a | n/a | none | n/a | Wizard Previous / Next / Create restore run | Existing restore-run create and execute audit behavior remains unchanged; selection-stage quality visibility is read-only | Step 1 and step 2 gain quality descriptions only. No new destructive action is introduced. |

## Key Entities *(include if feature involves data)*

- **BackupSet**: A tenant-owned capture collection that already records lifecycle state, timestamps, item count, and metadata describing how the set was produced.
- **BackupItem**: A tenant-owned captured recovery input for one policy or foundation item, including payload, assignments, and metadata that can expose snapshot completeness and assignment-quality issues.
- **PolicyVersion**: An immutable tenant-owned version record that stores captured snapshot data, related metadata, assignments, redaction context, and capture timing.
- **Restore selection context**: The tenant-scoped backup-set and optional item selection that an operator builds before restore-safety checks and preview generation.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: In validation sessions and acceptance tests, an operator can identify whether a backup set is full-quality or degraded from the list or detail surface in under 10 seconds without opening raw JSON or preview surfaces.
- **SC-002**: In 100% of tested cases where existing records contain metadata-only, assignment-fetch-failed, or orphaned-assignment signals, at least one default-visible backup-quality signal appears on every affected list, detail, or wizard selection surface.
- **SC-003**: In 100% of RBAC test cases, users with backup or version viewing rights but without restore rights can still see backup-quality truth on list and detail surfaces while restore actions remain unavailable.
- **SC-004**: In 100% of degraded restore-input scenarios covered by acceptance tests, backup-set and item quality is visible before the operator reaches restore-safety checks or preview generation.