TenantAtlas/specs/184-dashboard-recovery-honesty/spec.md
ahmido f1a73490e4 feat: finalize dashboard recovery honesty (#215)
## Summary
- add a dedicated Recovery Readiness dashboard widget for backup posture and recovery evidence
- group Needs Attention items by domain and elevate the recovery call-to-action
- align restore-run and recovery posture tests with the extracted widget and continuity flows
- include the related spec artifacts for 184-dashboard-recovery-honesty

## Verification
- `cd /Users/ahmeddarrazi/Documents/projects/TenantAtlas/apps/platform && ./vendor/bin/sail bin pint --dirty --format agent`
- `cd /Users/ahmeddarrazi/Documents/projects/TenantAtlas/apps/platform && ./vendor/bin/sail artisan test --compact --filter="DashboardKpisWidget|DashboardRecoveryPosture|TenantDashboardDbOnly|TenantpilotSeedBackupHealthBrowserFixtureCommand|NeedsAttentionWidget"`
- browser smoke verified on the calm, unvalidated, and weakened dashboard states

## Notes
- Livewire v4.0+ compliant with Filament v5
- no panel provider changes; Laravel 11+ provider registration remains in `bootstrap/providers.php`
- Recovery Readiness stays within the existing tenant dashboard asset strategy; no new Filament asset registration required

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #215
2026-04-08 23:21:36 +00:00

Feature Specification: Dashboard Recovery Posture Honesty

Feature Branch: [184-dashboard-recovery-honesty]
Created: 2026-04-08
Status: Draft
Input: User description: "Spec 184 — Dashboard Recovery Posture Honesty"

Spec Scope Fields (mandatory)

  • Scope: tenant
  • Primary Routes: /admin/t/{tenant}, /admin/t/{tenant}/restore-runs, /admin/t/{tenant}/restore-runs/{record}, /admin/t/{tenant}/backup-sets, /admin/t/{tenant}/backup-sets/{record}
  • Data Ownership: Tenant-owned BackupSet, RestoreRun, and linked OperationRun outcome context are read within the active workspace and tenant scope to derive a more honest overview statement. No new persisted recovery-confidence state is introduced.
  • RBAC: Workspace plus tenant membership remains required on every affected surface. Members who can open the tenant dashboard must see honest summary boundaries even when they cannot start or manage restore runs. Existing restore-run creation and mutation actions remain under current restore permissions. Non-members continue to receive deny-as-not-found semantics.

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard KPI strip | Dashboard / stats overview | Explicit stat click per signal | forbidden | Supporting text inside the stat description | none | /admin/t/{tenant} | Signal-specific drill-through to /admin/t/{tenant}/restore-runs or /admin/t/{tenant}/restore-runs/{record} | Workspace context plus tenant context | Dashboard KPIs / Backup posture | Backup health is separate from restore evidence | existing widget pattern |
| Needs Attention / Healthy Checks panel | Dashboard / attention summary | Explicit card CTA per attention item; healthy state is read-only | forbidden | Card CTA and helper copy only | none | /admin/t/{tenant} | /admin/t/{tenant}/restore-runs, /admin/t/{tenant}/restore-runs/{record} | Workspace context plus tenant context | Needs attention / Healthy checks | Unvalidated and weakened recovery confidence are visible before drilldown | existing widget pattern |
| Restore runs page | CRUD / list-first resource | Full-row click to restore-run detail | required | Existing header action plus More menu | Existing More and bulk More groups | /admin/t/{tenant}/restore-runs | /admin/t/{tenant}/restore-runs/{record} | Tenant context plus restore-run identity | Restore runs / Restore run | Recent restore outcome and follow-up reason confirm the overview claim | none |

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard KPI strip | Tenant operator | Dashboard summary | Do healthy backups also have supporting restore evidence, or is recovery posture still unvalidated? | Backup posture, recovery-confidence qualifier, visible claim boundary, next step | Per-run causes, raw backup metadata, deeper restore evidence | backup health, recovery evidence availability, recent restore attention | None; read-only summary | Open restore history, open supporting backup context when backup health itself needs follow-up | none |
| Needs Attention / Healthy Checks panel | Tenant operator | Dashboard attention and healthy-boundary surface | What recovery-confidence issue needs action now, and why? | No restore history, weakened recent restore history, boundary copy, concrete next action | Full restore results, preview or check details, low-level run metadata | backup health, recovery evidence availability, restore result attention, recency | None; read-only summary | Open restore history, open latest problematic restore run | none |
| Restore runs page | Tenant operator | List and detail | Which restore runs explain the dashboard signal? | Recent restore status, result-attention reason, completed timing, related backup context | Assignment-level failures, preview detail, low-level result payloads | execution lifecycle, result attention, follow-up state | Existing restore-run maintenance actions only | Inspect restore run, create restore run | Existing rerun, archive, restore archived, and force-delete actions |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: no
  • New persisted entity/table/artifact?: no
  • New abstraction?: no
  • New enum/state/reason family?: no
  • New cross-domain UI framework/taxonomy?: no
  • Current operator problem: A tenant dashboard can currently look calm or healthy even when restore history is absent or recent restore results weaken confidence, so operators can overread backup health as recovery posture.
  • Existing structure is insufficient because: Backup health, restore history, and restore result attention already exist as separate truths, but the summary surfaces do not yet combine them with an honest claim boundary. Operators must manually cross-check multiple pages to avoid an overclaim.
  • Narrowest correct implementation: Derive a small set of overview honesty signals from existing backup health assessment, restore history presence, and per-run restore result attention, then show them on the existing dashboard widgets and existing restore-run drilldowns.
  • Ownership cost: Additional widget copy, narrow derived-summary logic, and focused feature plus RBAC regression tests that keep overview language and drilldown continuity aligned.
  • Alternative intentionally rejected: A new recovery-confidence score, enum, page, or persisted posture state was rejected because it would introduce new truth and new ownership cost before the current overview surfaces tell the existing truth accurately.
  • Release truth: current-release truth hardening

User Scenarios & Testing (mandatory)

User Story 1 - See Unvalidated Recovery Confidence Early (Priority: P1)

A tenant operator opens the tenant dashboard and needs to know within seconds whether healthy-looking backups are backed by any relevant restore evidence or whether recovery confidence is still unvalidated.

Why this priority: This is the highest-risk trust gap. If the first overview screen quietly converts healthy backups into a healthy recovery impression, later detail truth arrives too late.

Independent Test: Can be fully tested by rendering the tenant dashboard with healthy backup fixtures and no relevant restore history, then verifying that the overview shows an explicit unvalidated recovery-confidence signal instead of an all-clear.

Acceptance Scenarios:

  1. Given a tenant has healthy backup posture and no relevant restore history, When the operator opens the tenant dashboard, Then the summary shows healthy backups plus an explicit unvalidated recovery-confidence message and a next action.
  2. Given the same tenant has no other attention items, When the healthy-check state renders, Then the widget does not show an unqualified all-good message and instead keeps the recovery-confidence boundary visible.
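
The two scenarios above can be illustrated with a small copy-selection sketch. This is a hedged, language-neutral illustration in Python, not the Filament widget code: the function name `recovery_summary`, the tone labels, and every message string are hypothetical placeholders. Only the three state names and the claim-boundary wording come from this spec.

```python
# Hypothetical copy table; the real widget copy is defined by the implementation.
CLAIM_BOUNDARY = (
    "Healthy backups reflect backup inputs only and do not prove restore success."
)

EVIDENCE_COPY = {
    # state: (tone, message, next action)
    "unvalidated": (
        "attention",
        "Recovery confidence is unvalidated: no relevant restore history exists.",
        "Open restore history",
    ),
    "weakened": (
        "attention",
        "Recovery confidence is weakened by a recent restore result.",
        "Open the latest problematic restore run",
    ),
    "no_recent_issues_visible": (
        "healthy",
        "No recent restore issues are visible; this does not prove recovery.",
        None,
    ),
}


def recovery_summary(backup_healthy: bool, evidence_state: str) -> dict:
    """Build summary copy that never turns healthy backups into an all-clear."""
    # An undeterminable state maps to the cautious "unvalidated" copy.
    if evidence_state not in EVIDENCE_COPY:
        evidence_state = "unvalidated"
    tone, message, next_action = EVIDENCE_COPY[evidence_state]
    lines = [message]
    if backup_healthy:
        # Any positive backup statement carries the visible claim boundary.
        lines.insert(0, "Backups look healthy.")
        lines.append(CLAIM_BOUNDARY)
    return {"tone": tone, "lines": lines, "next_action": next_action}
```

With `backup_healthy=True` and `"unvalidated"`, the result keeps an attention tone and a next action, so the healthy-check state cannot render as an unqualified all-good message.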

User Story 2 - Escalate Weak Restore History on Overview (Priority: P2)

A tenant operator reviewing the dashboard needs recent failed, partial, or follow-up restore results to affect the overview immediately instead of hiding inside restore history details.

Why this priority: Weak restore history is evidence that directly changes how much trust the operator should place in recovery posture. It cannot remain a drilldown-only fact.

Independent Test: Can be fully tested by rendering overview surfaces with recent failed, partial, and follow-up restore fixtures and verifying that each case creates a visible confidence-related attention signal with matching drilldown behavior.

Acceptance Scenarios:

  1. Given a tenant has healthy backups but a recent failed or partial restore run, When the operator opens the dashboard, Then Needs Attention shows a recovery-confidence issue that links to restore history explaining the same failure state.
  2. Given a tenant has a recent restore run that completed with follow-up required, When the operator opens the dashboard, Then the overview shows weakened confidence rather than a neutral or healthy-only message.
  3. Given recent restore history exists without a current confidence-weakening attention state, When the operator opens the dashboard, Then the overview may say that no recent restore issues are visible but does not claim that recovery is proven.

User Story 3 - Preserve Honest Drilldowns and RBAC Boundaries (Priority: P3)

A tenant operator or read-only member needs the dashboard signal and the destination surface to tell the same story, while RBAC limits must never make the summary look stronger than the accessible evidence.

Why this priority: Overview honesty fails if the next click contradicts the dashboard or if authorization gaps hide weakness by omission.

Independent Test: Can be fully tested by opening overview signals as different tenant members, verifying that the linked restore-history surface confirms the same reason, and ensuring restricted users still see cautious summary language.

Acceptance Scenarios:

  1. Given the dashboard says recovery confidence is unvalidated because no relevant restore history exists, When the operator follows the dashboard action, Then the destination surface confirms that the tenant lacks relevant restore history.
  2. Given the dashboard says recovery confidence is weakened by a recent problematic restore, When the operator follows the dashboard action, Then the destination surface confirms the same failed, partial, or follow-up reason.
  3. Given a tenant member can see the dashboard but cannot open deeper restore evidence, When the dashboard renders, Then the summary remains cautious and truthful and does not replace missing evidence with a stronger claim.

Edge Cases

  • A tenant has only draft, preview-only, or dry-run restore history; the overview treats recovery confidence as unvalidated rather than positive.
  • A tenant has both an older successful restore and a more recent failed or follow-up restore; the weakened signal takes precedence on the summary surface.
  • A summary signal points to a restore run that is no longer directly openable; the drilldown falls back to tenant-scoped restore history rather than a dead end.
  • A user can see the dashboard but lacks permission to inspect restore runs; the summary still states unvalidated or weakened confidence without suggesting that everything is healthy.
  • Healthy backup posture and backup-automation follow-up can coexist with unvalidated recovery confidence; the overview must not let one healthy-sounding statement erase the other caution.
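
The drilldown fallback described in these edge cases can be sketched as a small URL resolver. This is an illustrative Python sketch under stated assumptions: the function name and its parameters are hypothetical, and returning `None` stands in for the existing disabled-link plus helper-text pattern. The route shapes are the tenant-scoped routes listed under Primary Routes.

```python
from typing import Optional


def restore_drilldown_url(
    tenant: str,
    run_id: Optional[int],
    run_openable: bool,
    can_view_restore_runs: bool,
) -> Optional[str]:
    """Pick the drilldown target for a recovery-confidence signal.

    Returns None when the viewer cannot open restore evidence, so the caller
    can render a disabled link with helper text instead of a dead end.
    """
    if not can_view_restore_runs:
        return None
    history = f"/admin/t/{tenant}/restore-runs"
    if run_id is not None and run_openable:
        # Weakened signal with a reachable run: open that run directly.
        return f"{history}/{run_id}"
    # Missing history, or a run that is no longer openable: fall back to
    # the tenant-scoped restore history list.
    return history
```

The summary copy itself does not depend on the return value; a `None` target changes only the link affordance, never the stated confidence.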

Requirements (mandatory)

This feature introduces no new Microsoft Graph calls, no new background work, no new OperationRun, and no new persistence. It is a read-first truth-hardening slice that makes existing backup and restore evidence visible more honestly on tenant overview surfaces.

Authorization remains in the tenant/admin plane under /admin/t/{tenant}/.... Non-members must continue to receive 404 responses. Established members missing deeper restore capabilities must continue to receive 403 on execution paths, but summary visibility must not depend on restore-mutation rights.
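
As a rough illustration of these response semantics, the following Python sketch separates summary visibility from restore execution. The function names and boolean parameters are hypothetical; the real checks live in the existing RBAC layer.

```python
def summary_status(is_workspace_member: bool, is_tenant_member: bool) -> int:
    """Dashboard summary visibility depends on membership only."""
    if not (is_workspace_member and is_tenant_member):
        return 404  # deny-as-not-found for non-members
    return 200


def restore_execution_status(
    is_workspace_member: bool,
    is_tenant_member: bool,
    has_restore_capability: bool,
) -> int:
    """Execution paths additionally require the restore capability."""
    if not (is_workspace_member and is_tenant_member):
        return 404  # membership is checked first, so non-members never see 403
    if not has_restore_capability:
        return 403  # established member without the capability
    return 200
```

Checking membership before capability keeps the existence of the tenant hidden from non-members, while a read-only member still receives the full, cautious summary.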

This slice reuses existing Filament dashboard widgets, stat descriptions, attention cards, and existing restore-run resource surfaces. No new local badge framework, page-local status language, or extra action surface is introduced. UI-FIL-001 is satisfied by continuing to use existing Filament widget primitives and shared status language. UX-001 create, edit, and detail-form rules are not materially changed; the dashboard keeps its existing layout, and the restore-run resource keeps its existing list-and-view contract.

The affected Filament surfaces keep exactly one primary inspect or open model, add no redundant View actions, and introduce no new destructive actions. Existing destructive restore-run actions continue to follow the current placement and confirmation rules. Action Surface Contract expectations therefore remain satisfied.

Existing per-run restore result attention remains the authoritative signal for restore outcome quality. This feature may summarize or elevate that truth, but it must not duplicate it with a second scoring or status system.

Functional Requirements

  • FR-184-001: The system MUST present tenant backup health and tenant recovery-confidence evidence as separate truths on tenant dashboard summary surfaces.
  • FR-184-002: When backup health is healthy but no relevant restore history exists, the system MUST display an explicit unvalidated recovery-confidence state and MUST NOT present an all-clear summary.
  • FR-184-003: When the system cannot determine recovery confidence from the available restore history, the system MUST map that limitation to the canonical unvalidated overview state and MUST state the limitation directly instead of inferring a positive recovery claim from backup health alone.
  • FR-184-004: Needs Attention or the healthy-boundary surface MUST surface absence of restore history as an overview-relevant condition with a clear next action.
  • FR-184-005: Recent restore history with failed, partial, completed_with_follow_up, or an equivalent confidence-weakening attention state MUST appear on overview surfaces as a recovery-confidence issue.
  • FR-184-006: Overview surfaces MUST distinguish unvalidated confidence from weakened confidence and MUST NOT collapse both states into one ambiguous bucket.
  • FR-184-007: Any positive backup-health summary on the dashboard MUST show a visible claim boundary stating that healthy backups reflect backup inputs only and do not prove restore success.
  • FR-184-008: Healthy checks MUST NOT render an unqualified healthy or all-clear state when recovery confidence is unvalidated or weakened, and any no_recent_issues_visible healthy check MUST preserve the non-proof boundary.
  • FR-184-009: When recovery confidence is unvalidated or weakened, overview copy MUST explain what is missing or concerning, why that affects confidence, and what the operator should do next.
  • FR-184-010: Overview signals about missing restore history MUST drill into a tenant-scoped restore-history surface that confirms the absence or insufficiency of relevant restore evidence.
  • FR-184-011: Overview signals about weakened restore history MUST drill into a tenant-scoped restore-history surface or restore-run detail that confirms the same failed, partial, or follow-up reason shown on the summary surface.
  • FR-184-012: The feature MUST reuse existing per-run restore result attention as the authoritative quality signal for restore outcomes and MUST NOT introduce a parallel positive-scoring or reason system.
  • FR-184-013: The feature MUST NOT introduce a new state or message that claims recovery is proven, guaranteed, or strongly confirmed beyond the evidence the current system already has.
  • FR-184-014: RBAC limits on restore history visibility MUST NOT cause summary surfaces to make stronger recovery claims than the visible evidence supports; when detailed restore evidence cannot be opened, the summary MUST remain cautious and truthful.
  • FR-184-015: This slice MUST NOT introduce or alter tenant-linked recovery posture summaries outside the tenant dashboard. Any future reuse of this posture signal on another surface MUST preserve the same distinction between backup posture and the canonical recovery-evidence states unvalidated, weakened, and no_recent_issues_visible.
  • FR-184-016: The feature MUST derive its summary state from existing tenant backup health, restore history, and restore result attention records and MUST NOT add a new persisted recovery-confidence field, table, or scoring artifact.
  • FR-184-017: When recent restore history exists without a current confidence-weakening attention state, overview surfaces MAY state that no recent restore issues are visible, but MUST stop short of claiming recovery proof.
  • FR-184-018: The feature MUST cap recovery-evidence derivation to the 10 most recent tenant-scoped restore-run candidates, eager-load only the relations required for summary and drillthrough selection, and MUST NOT introduce N+1 row lookups on dashboard or restore-run list surfaces.
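
The derivation these requirements describe can be sketched as a pure function over recent restore runs. This is a minimal Python illustration, not the project's resolver: the `RestoreRun` shape, the attention labels other than `completed_with_follow_up`, and the function name are assumptions. The spec also does not say whether a newer successful run clears an older failed one, so this sketch conservatively treats any weakening run among the capped candidates as weakening.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

MAX_CANDIDATES = 10  # cap from FR-184-018

# Hypothetical labels; the authoritative taxonomy is the existing
# per-run restore result attention.
WEAKENING = {"failed", "partial", "completed_with_follow_up"}
NOT_EVIDENCE = {"draft", "preview_only", "dry_run", "not_executed"}


@dataclass(frozen=True)
class RestoreRun:
    id: int
    completed_at: datetime
    attention: str


def derive_recovery_evidence(
    runs: Iterable[RestoreRun],
) -> tuple[str, Optional[RestoreRun]]:
    """Return (state, run to drill into) from tenant-scoped restore runs."""
    # Only the most recent candidates are considered.
    candidates = sorted(runs, key=lambda r: r.completed_at, reverse=True)
    candidates = candidates[:MAX_CANDIDATES]
    # Draft, preview-only, and dry-run history is not relevant evidence.
    relevant = [r for r in candidates if r.attention not in NOT_EVIDENCE]
    if not relevant:
        return "unvalidated", None
    for run in relevant:  # newest first
        if run.attention in WEAKENING:
            return "weakened", run
    # Never "proven": the strongest state only says nothing looks wrong.
    return "no_recent_issues_visible", None


older_ok = RestoreRun(1, datetime(2026, 3, 1), "completed")
newer_failed = RestoreRun(2, datetime(2026, 4, 1), "failed")
state, run = derive_recovery_evidence([older_ok, newer_failed])
# state is "weakened" and run is newer_failed
```

Because the function returns one of exactly three states and no score, it stays within the existing truth sources and adds no persisted field.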

Assumptions

  • Relevant restore history means tenant-scoped restore runs that have reached an executed result state or another existing result-attention state that the current system can classify. Draft-only, preview-only, or dry-run-only history does not count as relevant restore evidence.
  • Existing restore history surfaces already show enough result detail to confirm failed, partial, and follow-up reasons once the operator drills down from the overview.
  • Workspace-level surfaces that later reuse this posture language should consume the same tenant-level semantics rather than creating a separate recovery-confidence vocabulary.

Dependencies

  • Existing tenant dashboard surfaces remain the operator entry point for this slice.
  • Existing TenantBackupHealthAssessment and TenantBackupHealthResolver remain the source of backup-input truth.
  • Existing RestoreRun history surfaces and RestoreSafetyResolver::resultAttentionForRun(...) remain the source of restore-outcome truth.
  • Existing RBAC helper-text and disabled-link patterns remain the fallback behavior when the operator cannot open deeper restore evidence.

Out of Scope and Follow-up

  • No new recovery-confidence engine, score, enum, or dedicated posture page.
  • No automatic restore validation, scheduled restore probes, or restore execution changes.
  • No new backup-health rules, restore-result-attention taxonomy changes, or restore-safety model redesign.
  • No new claim that a tenant is recovery-proven.
  • Reasonable follow-up work includes broader workspace-level recovery rollups after tenant-level overview honesty is stable.

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard summary widgets | app/Filament/Pages/TenantDashboard.php, app/Filament/Widgets/Dashboard/DashboardKpis.php, app/Filament/Widgets/Dashboard/NeedsAttention.php | none added | Explicit stat and card CTA only; no row click | none | n/a | none | n/a | n/a | no new audit event | Action Surface Contract stays satisfied because the dashboard remains read-only. UI-FIL-001 stays satisfied through existing Filament widget primitives. UX-001 create and edit form rules are not applicable to this dashboard slice. |
| RestoreRunResource list and detail | app/Filament/Resources/RestoreRunResource.php, app/Filament/Resources/RestoreRunResource/Pages/ListRestoreRuns.php, app/Filament/Resources/RestoreRunResource/Pages/ViewRestoreRun.php | Existing New restore run action remains | recordUrl() clickable row to restore-run detail | Existing More-menu maintenance actions remain unchanged | Existing grouped bulk actions remain unchanged | Existing New restore run empty-state CTA remains | none added | Existing restore-run create flow remains unchanged | existing restore-run mutation audit behavior only | This spec reuses restore-run list and detail as canonical drilldowns and adds no new destructive action or placement exception. |

Key Entities (include if feature involves data)

  • Backup health assessment: Tenant-level summary of backup freshness and input health that is useful but not sufficient to prove recovery success.
  • Restore history: Tenant-scoped record of restore runs whose presence, absence, and recent outcomes affect how strongly the product can speak about recovery confidence.
  • Restore result attention: Per-run classification that distinguishes completed, failed, partial, follow-up, and not-executed outcome states that matter for operator trust.
  • Recovery posture summary: Non-persisted dashboard statement that combines backup health, restore history presence, and restore-result attention without becoming a new score or stored state.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: In acceptance testing, operators can identify within 10 seconds whether a tenant has healthy backups plus unvalidated or weakened recovery evidence from /admin/t/{tenant} without opening raw details.
  • SC-002: In 100% of tested tenants with no relevant restore history, the dashboard or healthy-boundary surface shows an explicit unvalidated recovery-confidence signal and never shows a healthy-only all-clear.
  • SC-003: In 100% of tested tenants with recent failed, partial, or follow-up restore runs, the overview shows a confidence-related attention item with a drilldown that confirms the same reason.
  • SC-004: In 100% of tested positive backup-health scenarios, summary-level copy includes the claim boundary that healthy backups do not prove restore success.
  • SC-005: In 100% of tested RBAC-restricted scenarios, summary surfaces remain cautious and truthful even when the user cannot open deeper restore evidence pages.
  • SC-006: In targeted regression coverage, recovery-evidence derivation evaluates no more than the 10 most recent tenant-scoped restore-run candidates and introduces no N+1 row queries on the dashboard or restore-run list surfaces.
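
A query-count guard of the kind SC-006 calls for might look like the following sketch. It uses Python's built-in `sqlite3` module purely for illustration; the table and column names are hypothetical and do not reflect the project's schema, and the real regression tests would run against the application's own database layer. The idea is to load the capped candidates and their related backup context in a single joined query, then count the statements that actually executed.

```python
import sqlite3


def build_fixture() -> sqlite3.Connection:
    """Create an in-memory database with 15 restore runs for tenant 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE backup_sets (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE restore_runs (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER,
            backup_set_id INTEGER,
            status TEXT,
            completed_at TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO backup_sets VALUES (?, ?)",
        [(i, f"set-{i}") for i in range(1, 4)],
    )
    conn.executemany(
        "INSERT INTO restore_runs VALUES (?, ?, ?, ?, ?)",
        [(i, 1, 1 + i % 3, "completed", f"2026-04-{i:02d}T00:00:00")
         for i in range(1, 16)],
    )
    conn.commit()
    return conn


def recent_restore_runs(conn: sqlite3.Connection, tenant_id: int, limit: int = 10):
    """Fetch the newest runs with their backup set in one joined query."""
    sql = """
        SELECT r.id, r.status, r.completed_at, b.name
        FROM restore_runs r
        LEFT JOIN backup_sets b ON b.id = r.backup_set_id
        WHERE r.tenant_id = ?
        ORDER BY r.completed_at DESC, r.id DESC
        LIMIT ?
    """
    return conn.execute(sql, (tenant_id, limit)).fetchall()


def count_queries(conn: sqlite3.Connection, fn):
    """Run fn and return (result, number of SQL statements executed)."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn()
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


conn = build_fixture()
rows, query_count = count_queries(conn, lambda: recent_restore_runs(conn, 1))
```

A test can then assert that `rows` holds at most 10 entries and that `query_count` does not grow with the number of rows, which is the property an N+1 regression would break.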