TenantAtlas/specs/180-tenant-backup-health/plan.md
ahmido 6f8eb28ca2 feat: add tenant backup health signals (#212)
## Summary
- add the Spec 180 tenant backup-health resolver and value objects to derive absent, stale, degraded, healthy, and schedule-follow-up posture from existing backup and schedule truth
- surface backup posture and reason-driven drillthroughs in the tenant dashboard and preserve continuity on backup-set and backup-schedule destinations
- add deterministic local/testing browser-fixture seeding plus a local fixture-login helper for the blocked drillthrough `403` scenario, along with the related spec artifacts and focused regression coverage

## Testing
- `vendor/bin/sail artisan test --compact tests/Feature/Auth/BackupHealthBrowserFixtureLoginTest.php tests/Feature/Console/TenantpilotSeedBackupHealthBrowserFixtureCommandTest.php`
- `vendor/bin/sail artisan test --compact tests/Unit/Support/BackupHealth/TenantBackupHealthResolverTest.php tests/Feature/Filament/DashboardKpisWidgetTest.php tests/Feature/Filament/NeedsAttentionWidgetTest.php tests/Feature/Filament/TenantDashboardTruthAlignmentTest.php tests/Feature/Filament/TenantDashboardTenantScopeTest.php tests/Feature/Filament/TenantDashboardDbOnlyTest.php tests/Feature/Filament/BackupSetListContinuityTest.php tests/Feature/Filament/BackupSetEnterpriseDetailPageTest.php tests/Feature/BackupScheduling/BackupScheduleLifecycleTest.php tests/Feature/Auth/BackupHealthBrowserFixtureLoginTest.php tests/Feature/Console/TenantpilotSeedBackupHealthBrowserFixtureCommandTest.php`

## Notes
- Filament v5 / Livewire v4 compliant; no panel-provider change was needed, so `bootstrap/providers.php` remains unchanged
- no new globally searchable resource was introduced, so global-search behavior is unchanged
- no new destructive action was added; existing destructive actions and confirmation behavior remain unchanged
- no new asset registration was added; the existing deploy-time `php artisan filament:assets` step remains sufficient
- the local fixture login helper route is limited to `local` and `testing` environments
- the focused and broader Spec 180 packs are green; the full suite was not rerun after these changes

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #212
2026-04-07 21:35:58 +00:00


Implementation Plan: Tenant Backup Health Signals

Branch: 180-tenant-backup-health | Date: 2026-04-07 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/spec.md

Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/spec.md

Summary

Harden the tenant dashboard so operators can tell within seconds whether a tenant has no usable backup basis, a stale latest backup basis, a degraded latest backup basis, or a healthy recent backup basis without opening deep backup surfaces first. The implementation keeps BackupSet, BackupItem, BackupSchedule, and the existing backup-quality layer as the only sources of truth, introduces one narrow derived tenant backup-health resolver over those records, adds a config-backed freshness policy with schedule follow-up semantics, integrates the result into DashboardKpis and NeedsAttention, and preserves reason-driven drillthrough into existing backup-set and backup-schedule surfaces without adding a new persistence model or recovery-confidence framework.

Key approach: work inside the existing TenantDashboard, DashboardKpis, NeedsAttention, BackupSetResource, BackupScheduleResource, and BackupQualityResolver seams; derive tenant posture from the latest relevant completed backup set plus existing backup-quality truth and enabled-schedule timing; keep the feature Filament v5 and Livewire v4 compliant; avoid new tables, Graph calls, jobs, or asset registration; validate the result with focused Pest, Livewire, truth-alignment, and RBAC coverage.

Technical Context

Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing DashboardKpis, NeedsAttention, BackupSetResource, BackupScheduleResource, BackupQualityResolver, BackupQualitySummary, ScheduleTimeService, shared badge infrastructure, and existing RBAC helpers
Storage: PostgreSQL with existing tenant-owned backup_sets, backup_items, and backup_schedules records plus existing JSON-backed backup metadata; no schema change planned
Testing: Pest feature tests, Livewire widget and resource tests, and unit tests for the narrow backup-health derivation layer, all run through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep tenant-dashboard rendering DB-only and query-bounded, avoid new N+1 query hotspots while deriving the latest relevant backup basis, and preserve 5-to-10-second operator scanability on the tenant dashboard and its drillthrough destinations
Constraints: No new backup-health table, no recovery-confidence score, no new Graph contract path, no new queue or OperationRun, no RBAC drift, no calmness leakage beyond evidence, no ad-hoc badge mappings, and no new global Filament assets
Scale/Scope: One tenant-scoped dashboard composition, two existing dashboard widgets, one narrow derived backup-health layer, optional config additions in config/tenantpilot.php, and focused regression coverage across resolver, widget, drillthrough, and RBAC behavior

Constitution Check

GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.

| Principle | Status | Notes |
|---|---|---|
| Inventory-first | Pass | Backups remain immutable snapshot truth; the feature only summarizes existing backup and schedule state on read |
| Read/write separation | Pass | This is a read-first dashboard hardening slice; existing backup and schedule mutations remain unchanged and separately confirmed or audited |
| Graph contract path | Pass | No new Microsoft Graph calls or contract-registry changes are introduced |
| Deterministic capabilities | Pass | Existing capability registry and tenant-scoped authorization remain authoritative; no raw capability strings are introduced |
| RBAC-UX planes and 404 vs 403 | Pass | The feature stays in the tenant/admin plane under /admin/t/{tenant}/...; non-members remain 404, and existing in-scope authorization stays server-side |
| Workspace isolation | Pass | No workspace-scope broadening or cross-workspace aggregation is added |
| Tenant isolation | Pass | Backup sets, backup items, schedules, and dashboard summaries stay tenant-owned and tenant-scoped |
| Dangerous and destructive confirmations | Pass | No new destructive action is introduced. Existing backup and schedule destructive actions remain ->requiresConfirmation() and capability-gated |
| Global search safety | Pass | No new globally searchable resource is introduced or changed. BackupSetResource already has a view page, BackupScheduleResource already has an edit page, and global search configuration remains unchanged |
| Run observability | Pass | No new long-running work or OperationRun usage is introduced |
| Ops-UX 3-surface feedback | Pass | No new queued action or run feedback surface is added |
| Ops-UX lifecycle ownership | Pass | OperationRun.status and OperationRun.outcome are untouched |
| Ops-UX summary counts | Pass | No new summary_counts keys are required |
| Data minimization | Pass | The feature reuses existing metadata and timestamps only; no new secret or payload exposure is planned |
| Proportionality (PROP-001) | Pass | Added logic is limited to one narrow tenant backup-health layer plus config-backed freshness semantics |
| Persisted truth (PERSIST-001) | Pass | No new table, column, or stored backup-health mirror is introduced |
| Behavioral state (STATE-001) | Pass | New posture and reason families are derived only because they change operator guidance and dashboard calmness behavior |
| Badge semantics (BADGE-001) | Pass | Existing badge and tag infrastructure remains the semantic source; any new backup-health tone stays inside shared UI primitives rather than local mappings |
| Filament-native UI (UI-FIL-001) | Pass | Existing Filament widgets, stats, tables, and shared primitives remain the implementation seams |
| UI naming (UI-NAMING-001) | Pass | Operator-facing vocabulary stays bounded to backup health, last backup, stale, degraded, no backups, and schedule follow-up, without recoverable or proven claims |
| Operator surfaces (OPSURF-001) | Pass | Default-visible tenant-dashboard content becomes more operator-first by exposing backup posture before deep diagnostics |
| Filament Action Surface Contract | Pass | BackupSetResource and BackupScheduleResource keep existing inspect models and destructive placement; TenantDashboard remains under the current dashboard exemption |
| Filament UX-001 | Pass with documented variance | No new create or edit screen is added. Existing backup-set and backup-schedule resources remain the canonical follow-up surfaces, with summary-first truth added where needed |
| Filament v5 / Livewire v4 compliance | Pass | The implementation stays inside the current Filament v5 and Livewire v4 stack |
| Provider registration location | Pass | No panel or provider changes are planned; Laravel 11+ provider registration remains in bootstrap/providers.php |
| Asset strategy | Pass | No new panel assets are planned; deployment keeps the existing php artisan filament:assets step unchanged |

Phase 0 Research

Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/research.md.

Key decisions:

  • Derive tenant backup health from existing BackupSet, BackupItem, BackupSchedule, and BackupQualityResolver truth instead of introducing persisted backup-health state.
  • Let the latest relevant completed backup set govern tenant posture rather than allowing older healthier history to calm the dashboard.
  • Reuse existing backup-quality summaries for degradation truth and add no competing backup-quality taxonomy.
  • Define backup freshness through one config-backed fallback window on the latest relevant completed backup set, while treating schedule timing as a secondary follow-up signal rather than health proof.
  • Derive schedule follow-up from enabled schedules whose next_run_at or last_run_at values indicate a missed or overdue run beyond a small grace window.
  • Integrate backup health into the existing DashboardKpis and NeedsAttention widgets and keep healthy wording suppressed unless the backing evidence is fully supportive.
  • Route dashboard drillthroughs by problem class: no usable backup basis opens the backup-set list, stale or degraded latest backup opens the latest relevant backup-set detail, and schedule follow-up opens the backup-schedules list.
  • Extend the current widget, truth-alignment, backup-set, schedule, and tenant-scope Pest coverage instead of creating a browser-first harness.
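The second decision above, in which the latest relevant completed backup set governs posture, can be illustrated with a minimal sketch. Python serves only as compact pseudocode for what would be PHP in the resolver. `BackupSetRow`, its fields, and the `"completed"` status value are assumptions. The boolean `degraded` flag is a hypothetical stand-in for real BackupQualitySummary output.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class BackupSetRow:
    # Hypothetical projection of a backup_sets row; field names are assumptions.
    id: int
    status: str                       # assumed values such as "completed", "running"
    completed_at: Optional[datetime]
    degraded: bool                    # stand-in for BackupQualitySummary output


def latest_relevant_basis(rows: Sequence[BackupSetRow]) -> Optional[BackupSetRow]:
    """Return the newest completed backup set, or None when no usable basis exists.

    Older sets never override the newest one, even when they are healthier.
    """
    completed = [r for r in rows if r.status == "completed" and r.completed_at is not None]
    if not completed:
        return None
    # Tie-break on id so identical timestamps still yield a deterministic choice.
    return max(completed, key=lambda r: (r.completed_at, r.id))
```

In the actual resolver this selection would more likely be one ordered, limited query per tenant than an in-memory scan. That keeps dashboard rendering query-bounded.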

Phase 1 Design

Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/:

  • research.md: implementation and domain decisions for tenant backup-health derivation
  • data-model.md: existing entities, config inputs, and derived backup-health models
  • contracts/tenant-backup-health.openapi.yaml: internal logical contract for dashboard summary, backup-set confirmation, and schedule follow-up surfaces
  • quickstart.md: focused automated and manual validation workflow for tenant backup-health signals

Design decisions:

  • No schema migration is required. The design adds only a narrow derived resolver layer and a small config section in config/tenantpilot.php for backup-health freshness semantics.
  • Tenant backup health is derived at render time from the latest relevant completed backup set, existing BackupQualitySummary, and enabled-schedule timing. No new Tenant field, cache table, or materialized rollup is planned.
  • Stale versus degraded precedence is deterministic: absent outranks everything, stale outranks degraded, degraded outranks healthy, and schedule_follow_up remains a secondary reason family. When the latest backup basis is fresh and non-degraded, posture may remain healthy, but schedule_follow_up becomes the active reason and suppresses any positive healthy confirmation until resolved.
  • DashboardKpis owns the primary backup-health stat or card, while NeedsAttention owns reason-specific backup follow-up items and the positive healthy backup check.
  • Backup-set detail remains the confirmation surface for stale and degraded latest-backup posture by combining recency and existing backup-quality summary. Backup-schedules list remains the confirmation surface for schedule-follow-up posture and must foreground one derived follow-up indicator so the missed-run or overdue reason stays scan-fast.
  • The feature stays Filament v5 and Livewire v4 compliant, introduces no new panel provider, and requires no new asset registration.
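The precedence rule in the third design decision can be sketched as a pure function. This Python illustration is hedged and is not the planned PHP API. The reason codes `no_backup_basis`, `latest_backup_stale`, and `latest_backup_degraded`, and the `Assessment` shape, are hypothetical. Only `schedule_follow_up` and the posture names come from this plan. Secondary reasons are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    posture: str                  # "absent" | "stale" | "degraded" | "healthy"
    primary_reason: Optional[str]
    healthy_confirmed: bool       # may the positive healthy check be shown?


def resolve_posture(has_basis: bool, is_stale: bool, is_degraded: bool,
                    schedule_follow_up: bool) -> Assessment:
    # Primary posture precedence: absent > stale > degraded > healthy.
    if not has_basis:
        return Assessment("absent", "no_backup_basis", False)
    if is_stale:
        return Assessment("stale", "latest_backup_stale", False)
    if is_degraded:
        return Assessment("degraded", "latest_backup_degraded", False)
    # Fresh and non-degraded: posture stays healthy, but an open schedule
    # concern becomes the active reason and withholds positive confirmation.
    if schedule_follow_up:
        return Assessment("healthy", "schedule_follow_up", False)
    return Assessment("healthy", None, True)
```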

Project Structure

Documentation (this feature)

specs/180-tenant-backup-health/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── tenant-backup-health.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md

Source Code (repository root, including planned additions for this feature)

app/
├── Filament/
│   ├── Pages/
│   │   └── TenantDashboard.php
│   ├── Resources/
│   │   ├── BackupScheduleResource.php
│   │   └── BackupSetResource.php
│   └── Widgets/
│       └── Dashboard/
│           ├── DashboardKpis.php
│           └── NeedsAttention.php
├── Models/
│   ├── BackupItem.php
│   ├── BackupSchedule.php
│   ├── BackupSet.php
│   └── Tenant.php
├── Support/
│   ├── BackupHealth/
│   │   ├── TenantBackupHealthAssessment.php
│   │   ├── BackupFreshnessEvaluation.php
│   │   ├── BackupScheduleFollowUpEvaluation.php
│   │   ├── BackupHealthActionTarget.php
│   │   ├── BackupHealthDashboardSignal.php
│   │   └── TenantBackupHealthResolver.php
│   ├── BackupQuality/
│   │   ├── BackupQualityResolver.php
│   │   └── BackupQualitySummary.php
│   └── Badges/
│       └── [existing shared badge seams only if new backup-health tone mapping is needed]

config/
└── tenantpilot.php

tests/
├── Feature/
│   ├── BackupScheduling/
│   │   └── BackupScheduleLifecycleTest.php
│   └── Filament/
│       ├── BackupSetListContinuityTest.php
│       ├── BackupSetEnterpriseDetailPageTest.php
│       ├── DashboardKpisWidgetTest.php
│       ├── NeedsAttentionWidgetTest.php
│       ├── TenantDashboardDbOnlyTest.php
│       ├── TenantDashboardTenantScopeTest.php
│       └── TenantDashboardTruthAlignmentTest.php
└── Unit/
    └── Support/
        └── BackupHealth/
            └── TenantBackupHealthResolverTest.php

Structure Decision: Standard Laravel monolith. The implementation stays inside existing dashboard widgets, backup resources, shared support helpers, and current test structure. Any new helper types and lightweight dashboard-facing value objects live under app/Support/BackupHealth/ as a narrow derived layer shared by the dashboard and drillthrough logic.

Implementation Strategy

Phase A — Introduce Narrow Tenant Backup-Health Derivation

Goal: Create one derived path that can answer absent, stale, degraded, or healthy from existing backup and schedule truth without introducing new persistence.

| Step | File | Change |
|---|---|---|
| A.1 | New narrow helper(s) under app/Support/BackupHealth/ | Introduce TenantBackupHealthResolver plus lightweight TenantBackupHealthAssessment, BackupFreshnessEvaluation, BackupScheduleFollowUpEvaluation, BackupHealthActionTarget, and BackupHealthDashboardSignal value objects that derive the latest relevant completed backup basis, posture, primary reason, supporting message, drillthrough target, and healthy-claim boundary with query-bounded latest-basis loading |
| A.2 | app/Support/BackupQuality/BackupQualityResolver.php plus the new backup-health layer | Explicitly reuse BackupQualityResolver and BackupQualitySummary output to classify material degradation instead of creating a second backup-quality system |
| A.3 | config/tenantpilot.php | Add a small backup_health config section for canonical freshness hours and schedule overdue grace so stale logic is explicit, testable, and not hard-coded in widgets |
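Mirrored in Python, step A.3 might look roughly like the sketch below. The key names `freshness_hours` and `schedule_overdue_grace_minutes` are assumptions, and so are both default values. The plan commits only to a `backup_health` section that holds canonical freshness hours and a schedule overdue grace.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Mapping


@dataclass(frozen=True)
class BackupHealthConfig:
    # Hypothetical mirror of a backup_health section in config/tenantpilot.php.
    # Defaults are placeholders, not values taken from the spec.
    freshness_hours: int = 24
    schedule_overdue_grace_minutes: int = 30

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "BackupHealthConfig":
        """Build a config from raw values, falling back to the class defaults."""
        cfg = cls(
            freshness_hours=int(raw.get("freshness_hours", cls.freshness_hours)),
            schedule_overdue_grace_minutes=int(
                raw.get("schedule_overdue_grace_minutes", cls.schedule_overdue_grace_minutes)
            ),
        )
        if cfg.freshness_hours <= 0 or cfg.schedule_overdue_grace_minutes < 0:
            raise ValueError("freshness_hours must be > 0 and grace must be >= 0")
        return cfg


def is_stale(completed_at: datetime, now: datetime, cfg: BackupHealthConfig) -> bool:
    """A basis is stale once it is older than the configured freshness window."""
    return now - completed_at > timedelta(hours=cfg.freshness_hours)
```

Validating the values once at load time keeps a misconfigured window from silently producing false calmness in the widgets.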

Phase B — Integrate Backup Health Into Primary Tenant Dashboard Surfaces

Goal: Make tenant backup posture visible on the dashboard before the operator has to open deep backup pages.

| Step | File | Change |
|---|---|---|
| B.1 | app/Filament/Widgets/Dashboard/DashboardKpis.php | Add a backup-health stat or card that reflects the derived posture, last relevant backup timing, current reason, color tone, and one reason-driven destination |
| B.2 | app/Filament/Widgets/Dashboard/NeedsAttention.php | Add backup-health attention items for no usable backup basis, stale latest backup, degraded latest backup, and schedule follow-up |
| B.3 | app/Filament/Widgets/Dashboard/NeedsAttention.php | Add "Backups are recent and healthy" to the healthy-check set only when the derived assessment positively supports it and no backup-health attention item, including schedule_follow_up, remains |
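Steps B.2 and B.3 together amount to a small gating rule. Reason-specific items are listed, and the healthy check is added only when positive support exists and no item remains. In this Python sketch every reason code is hypothetical, and so is every title except "Backups are recent and healthy":

```python
from typing import List, Optional, Tuple

# Hypothetical operator-facing titles keyed by hypothetical reason codes.
ATTENTION_TITLES = {
    "no_backup_basis": "No usable backups",
    "latest_backup_stale": "Latest backup is stale",
    "latest_backup_degraded": "Latest backup is degraded",
    "schedule_follow_up": "Backup schedule needs follow-up",
}
HEALTHY_CHECK = "Backups are recent and healthy"


def backup_attention(reasons: List[str],
                     healthy_confirmed: bool) -> Tuple[List[str], Optional[str]]:
    """Return (attention item titles, healthy-check label or None)."""
    # Unknown reason codes are skipped here; a real widget might log them instead.
    items = [ATTENTION_TITLES[r] for r in reasons if r in ATTENTION_TITLES]
    # The healthy check appears only with positive support and no open item.
    healthy = HEALTHY_CHECK if healthy_confirmed and not items else None
    return items, healthy
```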

Phase C — Preserve Drillthrough Continuity On Backup And Schedule Surfaces

Goal: Ensure the dashboard warning or healthy claim can be rediscovered on the destination surface without guesswork.

| Step | File | Change |
|---|---|---|
| C.1 | app/Support/BackupHealth/TenantBackupHealthResolver.php plus app/Support/BackupHealth/BackupHealthActionTarget.php | Centralize reason-driven URL selection in the existing backup-health layer so no-basis goes to backup-set index, stale or degraded latest backup goes to the relevant backup-set detail, and schedule follow-up goes to backup-schedules index |
| C.2 | app/Filament/Resources/BackupSetResource.php | Reuse or slightly harden the backup-set list and detail presentation so the index confirms no usable backup basis and the latest relevant backup-set detail clearly confirms stale or degraded posture on arrival |
| C.3 | app/Filament/Resources/BackupScheduleResource.php | Add one derived schedule-follow-up confirmation signal on the list surface so existing last_run_at, last_run_status, and next_run_at evidence remains scan-fast on arrival |
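The reason-driven URL selection in step C.1 can be sketched as a lookup from reason to destination. In this Python illustration `ActionTarget` stands in for BackupHealthActionTarget. The route names and reason codes are assumptions, not real Filament route names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionTarget:
    # Hypothetical stand-in for BackupHealthActionTarget; route names are assumed.
    route: str
    record_id: Optional[int] = None


def action_target(reason: Optional[str],
                  latest_backup_set_id: Optional[int]) -> Optional[ActionTarget]:
    if reason == "no_backup_basis":
        return ActionTarget("backup-sets.index")
    if reason in ("latest_backup_stale", "latest_backup_degraded"):
        if latest_backup_set_id is None:
            # Defensive fallback: without a record, the list is the honest target.
            return ActionTarget("backup-sets.index")
        return ActionTarget("backup-sets.view", latest_backup_set_id)
    if reason == "schedule_follow_up":
        return ActionTarget("backup-schedules.index")
    return None  # no open reason, so no reason-driven target
```

Keeping this mapping in one place means the KPI card and the NeedsAttention items cannot send the same problem class to different destinations.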

Phase D — Lock Semantics With Focused Regression Coverage

Goal: Protect resolver truth, dashboard truth, continuity, and tenant safety from regression.

| Step | File | Change |
|---|---|---|
| D.1 | New unit tests under tests/Unit/Support/BackupHealth/ | Cover no-backup, stale, degraded, healthy, schedule-follow-up, and latest-history-governs derivation |
| D.2 | tests/Feature/Filament/DashboardKpisWidgetTest.php | Extend KPI payload and URL assertions for backup-health posture and reason-driven drillthrough |
| D.3 | tests/Feature/Filament/NeedsAttentionWidgetTest.php | Extend attention and healthy-check coverage for no-backup, stale-backup, degraded-latest-backup, schedule-follow-up, and healthy-backup scenarios |
| D.4 | tests/Feature/Filament/TenantDashboardTruthAlignmentTest.php | Ensure backup-health calmness and caution align with the rest of the tenant dashboard and do not reintroduce calmness leakage |
| D.5 | tests/Feature/Filament/BackupSetListContinuityTest.php, tests/Feature/Filament/BackupSetEnterpriseDetailPageTest.php, and tests/Feature/BackupScheduling/BackupScheduleLifecycleTest.php | Prove that no-basis, stale, degraded, and schedule-follow-up drillthrough destinations confirm the same problem class the dashboard named |
| D.6 | tests/Feature/Filament/TenantDashboardTenantScopeTest.php or a new RBAC-safe visibility test | Preserve tenant-scope truth and non-member-safe behavior for dashboard summary and backup follow-up routes |
| D.7 | vendor/bin/sail bin pint --dirty --format agent and focused Pest runs | Required formatting and targeted verification before implementation is considered complete |

Key Design Decisions

D-001 — Tenant backup health is derived, not stored

The product already stores the facts this slice needs: completed backup sets, backup-item quality metadata, and backup schedule timing. The missing piece is a tenant-level interpretation layer for overview truth, not a new persistence model.

D-002 — The latest relevant completed backup set governs posture

Older healthy history cannot calm the dashboard if the latest relevant completed backup is stale or degraded. This keeps the overview aligned with the operator's current recovery starting point.

D-003 — Stale and degraded remain distinct, with deterministic precedence

The four primary posture states (absent, stale, degraded, and healthy) are mutually exclusive. When the latest relevant backup is both old and degraded, stale becomes the primary posture, and degradation remains visible as supporting detail rather than disappearing.
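One way to keep degradation visible under a stale primary posture is to build the dashboard description from both facts. This is a hedged Python sketch. The wording, the hour-based age, and the degradation-family names are all hypothetical.

```python
from typing import List


def describe(posture: str, age_hours: int, degraded_families: List[str]) -> str:
    """Compose a dashboard description; degradation survives a stale posture."""
    if posture == "absent":
        return "No usable backup basis"
    parts = [f"Latest backup completed {age_hours}h ago"]
    if posture == "stale":
        # Stale owns the primary slot...
        parts[0] += " (stale)"
    if degraded_families:
        # ...but degradation is still reported as supporting detail.
        parts.append("degraded: " + ", ".join(degraded_families))
    return "; ".join(parts)
```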

D-004 — Schedule timing is follow-up truth, not health proof

An enabled schedule can support the operator's diagnosis, but it cannot prove healthy backup posture. Overdue or never-successful schedules add schedule_follow_up; they do not substitute for a recent healthy completed backup basis. If the backup basis is otherwise healthy, posture may stay healthy, but schedule_follow_up becomes the active reason and suppresses calm confirmation until the schedule concern clears.
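Here is a minimal sketch of the schedule follow-up check. It assumes the backup_schedules columns this plan names (next_run_at, last_run_at, last_run_status) plus an enabled flag whose name, `is_enabled`, is also an assumption. Approximating "never successful" by a non-success last_run_status is likewise an assumption. The functions only report a follow-up need; nothing in them can make a tenant healthy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScheduleRow:
    # Hypothetical projection of a backup_schedules row.
    is_enabled: bool                  # column name assumed
    next_run_at: Optional[datetime]
    last_run_at: Optional[datetime]
    last_run_status: Optional[str]    # "success" is an assumed value


def needs_follow_up(s: ScheduleRow, now: datetime, grace: timedelta) -> bool:
    """True when an enabled schedule looks overdue or its last run did not succeed."""
    if not s.is_enabled:
        return False
    # Overdue: the planned run time passed more than `grace` ago.
    if s.next_run_at is not None and now - s.next_run_at > grace:
        return True
    # Assumption: "never successful" is approximated by a non-success last run.
    if s.last_run_at is not None and s.last_run_status != "success":
        return True
    return False


def tenant_needs_schedule_follow_up(schedules: Iterable[ScheduleRow],
                                    now: datetime, grace: timedelta) -> bool:
    # Adds the schedule_follow_up reason only; it never upgrades posture to healthy.
    return any(needs_follow_up(s, now, grace) for s in schedules)
```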

D-005 — Healthy wording is stricter than mere backup existence

The wording "Backups are recent and healthy" is reserved for tenants whose latest relevant completed backup exists, meets the freshness window, and carries no material degradation under existing backup-quality truth. Missing or incomplete evidence must suppress calmness.
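The healthy gate can be written so that unknown evidence counts against calmness. In this Python sketch `None` means "not known", which is an assumed representation. The healthy wording passes only with an explicit `True` for freshness and an explicit `False` for material degradation.

```python
from typing import Optional


def healthy_claim_allowed(has_basis: bool,
                          is_fresh: Optional[bool],
                          has_material_degradation: Optional[bool],
                          schedule_follow_up: bool) -> bool:
    """Allow the healthy wording only on positive evidence; None means unknown."""
    return (
        has_basis
        and is_fresh is True                       # unknown freshness is not fresh
        and has_material_degradation is False      # unknown quality is not clean
        and not schedule_follow_up                 # open schedule concern blocks calm
    )
```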

D-006 — Existing Filament seams are sufficient

The current DashboardKpis, NeedsAttention, BackupSetResource, and BackupScheduleResource surfaces already provide the right seams. This slice does not need a new page shell, a new dashboard module, or a new front-end state layer.

D-007 — Keep the claim boundary below recovery confidence

The feature can say that backups are absent, stale, degraded, or healthy as backup inputs. It cannot say that the tenant is recoverable, that restore will succeed, or that recovery posture is proven.

Risk Assessment

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Latest-basis selection drifts from operator expectation and lets older history calm the dashboard | High | Medium | Make latest relevant completed backup selection explicit in the resolver and cover mixed-history precedence with unit tests |
| Dashboard calmness returns because schedule presence is treated as a proxy for health | High | Medium | Keep schedule follow-up secondary in the resolver and test that schedules never make a tenant healthy on their own |
| Backup health duplicates or contradicts existing backup-quality truth | High | Medium | Reuse BackupQualityResolver and existing degradation families rather than adding a second backup-quality mapping |
| Schedule drillthrough lands on a surface that does not clearly confirm the warning | Medium | Medium | Use the schedule list as the primary follow-up destination and add one scan-fast confirmation signal if timestamps alone are insufficient |
| Poorly tuned stale thresholds create noise (too tight) or false calmness (too loose) over time | Medium | Medium | Externalize fallback freshness and schedule grace in config and pin the semantics with unit and feature tests |

Test Strategy

  • Add unit tests for the narrow backup-health resolver so latest-basis selection, stale precedence, degraded detection reuse, healthy-gate logic, and schedule-follow-up derivation remain deterministic.
  • Extend DashboardKpisWidgetTest to assert the backup-health stat label, value, description, color, and destination across absent, stale, degraded, and healthy scenarios.
  • Extend NeedsAttentionWidgetTest to assert backup-health attention items, healthy-check inclusion or suppression, and safe degraded-link behavior when appropriate.
  • Extend TenantDashboardTruthAlignmentTest so backup-health calmness or caution cannot contradict the rest of the dashboard's operator truth.
  • Extend backup-set and schedule surface tests so dashboard drillthroughs recover the same problem class on the target page.
  • Extend tenant-scope or RBAC coverage so entitled users see truthful summary state and non-members receive deny-as-not-found semantics without cross-tenant hints.
  • Keep all tests Livewire v4 compatible and run the smallest affected subset through Sail before asking for a full-suite pass.
  • Run vendor/bin/sail bin pint --dirty --format agent before final verification.

Complexity Tracking

No constitution violations or exception-driven complexity were identified. The only added structure is a narrow derived backup-health layer plus a small set of derived posture and reason families, both already justified by the proportionality review.

Proportionality Review

  • Current operator problem: The tenant dashboard can look healthy while backup posture is missing, stale, or degraded, which hides a recovery-relevant truth from the operator's primary overview surface.
  • Existing structure is insufficient because: Existing backup-quality truth lives in backup-set, item, version, and restore-adjacent surfaces, but there is no tenant-level rollup that answers the dashboard question directly.
  • Narrowest correct implementation: Add one narrow derived tenant backup-health layer, wire it into the existing dashboard widgets, and reuse current backup and schedule destinations for continuity without creating new persistence or a broader recovery-confidence system.
  • Ownership cost created: A small amount of resolver logic, a small config-backed freshness policy, limited widget wiring, and focused unit and feature tests.
  • Alternative intentionally rejected: A persisted backup-health table, a workspace-wide recovery rollup, or a recovery-confidence score. Each adds broader truth and maintenance cost than the current tenant-dashboard problem requires.
  • Release truth: Current-release truth. The feature corrects a trust gap on already-shipped tenant overview surfaces.