Ahmed Darrazi 6e76b333ca feat: add tenant backup health signals

2026-04-07 23:33:38 +02:00


Implementation Plan: Tenant Backup Health Signals

Branch: 180-tenant-backup-health | Date: 2026-04-07 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/spec.md

Summary

Harden the tenant dashboard so operators can tell within seconds, without opening deep backup surfaces first, whether a tenant has no usable backup basis, a stale latest backup basis, a degraded latest backup basis, or a healthy recent backup basis. The implementation keeps BackupSet, BackupItem, BackupSchedule, and the existing backup-quality layer as the only sources of truth. It introduces one narrow derived tenant backup-health resolver over those records, adds a config-backed freshness policy with schedule follow-up semantics, and integrates the result into DashboardKpis and NeedsAttention. Reason-driven drillthrough into the existing backup-set and backup-schedule surfaces is preserved, and no new persistence model or recovery-confidence framework is added.

Key approach: work inside the existing TenantDashboard, DashboardKpis, NeedsAttention, BackupSetResource, BackupScheduleResource, and BackupQualityResolver seams; derive tenant posture from the latest relevant completed backup set plus existing backup-quality truth and enabled-schedule timing; keep the feature Filament v5 and Livewire v4 compliant; avoid new tables, Graph calls, jobs, or asset registration; validate the result with focused Pest, Livewire, truth-alignment, and RBAC coverage.

Technical Context

Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing DashboardKpis, NeedsAttention, BackupSetResource, BackupScheduleResource, BackupQualityResolver, BackupQualitySummary, ScheduleTimeService, shared badge infrastructure, and existing RBAC helpers
Storage: PostgreSQL with existing tenant-owned backup_sets, backup_items, and backup_schedules records plus existing JSON-backed backup metadata; no schema change planned
Testing: Pest feature tests, Livewire widget and resource tests, and unit tests for the narrow backup-health derivation layer, all run through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep tenant-dashboard rendering DB-only and query-bounded, avoid new N+1 query hotspots while deriving the latest relevant backup basis, and preserve 5 to 10 second operator scanability on tenant dashboard and drillthrough destinations
Constraints: No new backup-health table, no recovery-confidence score, no new Graph contract path, no new queue or OperationRun, no RBAC drift, no calmness leakage beyond evidence, no ad-hoc badge mappings, and no new global Filament assets
Scale/Scope: One tenant-scoped dashboard composition, two existing dashboard widgets, one narrow derived backup-health layer, a small backup_health config section in config/tenantpilot.php, and focused regression coverage across resolver, widget, drillthrough, and RBAC behavior

Constitution Check

GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.

Principle	Status	Notes
Inventory-first	Pass	Backups remain immutable snapshot truth; the feature only summarizes existing backup and schedule state on read
Read/write separation	Pass	This is a read-first dashboard hardening slice; existing backup and schedule mutations remain unchanged and separately confirmed or audited
Graph contract path	Pass	No new Microsoft Graph calls or contract-registry changes are introduced
Deterministic capabilities	Pass	Existing capability registry and tenant-scoped authorization remain authoritative; no raw capability strings are introduced
RBAC-UX planes and 404 vs 403	Pass	The feature stays in the tenant/admin plane under `/admin/t/{tenant}/...`; non-members remain `404`, and existing in-scope authorization stays server-side
Workspace isolation	Pass	No workspace-scope broadening or cross-workspace aggregation is added
Tenant isolation	Pass	Backup sets, backup items, schedules, and dashboard summaries stay tenant-owned and tenant-scoped
Dangerous and destructive confirmations	Pass	No new destructive action is introduced. Existing backup and schedule destructive actions remain `->requiresConfirmation()` and capability-gated
Global search safety	Pass	No new globally searchable resource is introduced or changed. `BackupSetResource` already has a view page, `BackupScheduleResource` already has an edit page, and global search configuration remains unchanged
Run observability	Pass	No new long-running work or `OperationRun` usage is introduced
Ops-UX 3-surface feedback	Pass	No new queued action or run feedback surface is added
Ops-UX lifecycle ownership	Pass	`OperationRun.status` and `OperationRun.outcome` are untouched
Ops-UX summary counts	Pass	No new `summary_counts` keys are required
Data minimization	Pass	The feature reuses existing metadata and timestamps only; no new secret or payload exposure is planned
Proportionality (PROP-001)	Pass	Added logic is limited to one narrow tenant backup-health layer plus config-backed freshness semantics
Persisted truth (PERSIST-001)	Pass	No new table, column, or stored backup-health mirror is introduced
Behavioral state (STATE-001)	Pass	New posture and reason families are derived only because they change operator guidance and dashboard calmness behavior
Badge semantics (BADGE-001)	Pass	Existing badge and tag infrastructure remains the semantic source; any new backup-health tone stays inside shared UI primitives rather than local mappings
Filament-native UI (UI-FIL-001)	Pass	Existing Filament widgets, stats, tables, and shared primitives remain the implementation seams
UI naming (UI-NAMING-001)	Pass	Operator-facing vocabulary stays bounded to backup health, last backup, stale, degraded, no backups, and schedule follow-up, without `recoverable` or `proven` claims
Operator surfaces (OPSURF-001)	Pass	Default-visible tenant-dashboard content becomes more operator-first by exposing backup posture before deep diagnostics
Filament Action Surface Contract	Pass	`BackupSetResource` and `BackupScheduleResource` keep existing inspect models and destructive placement; `TenantDashboard` remains under the current dashboard exemption
Filament UX-001	Pass with documented variance	No new create or edit screen is added. Existing backup-set and backup-schedule resources remain the canonical follow-up surfaces, with summary-first truth added where needed
Filament v5 / Livewire v4 compliance	Pass	The implementation stays inside the current Filament v5 and Livewire v4 stack
Provider registration location	Pass	No panel or provider changes are planned; Laravel 11+ provider registration remains in `bootstrap/providers.php`
Asset strategy	Pass	No new panel assets are planned; deployment keeps the existing `php artisan filament:assets` step unchanged

Phase 0 Research

Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/research.md.

Key decisions:

Derive tenant backup health from existing BackupSet, BackupItem, BackupSchedule, and BackupQualityResolver truth instead of introducing persisted backup-health state.
Let the latest relevant completed backup set govern tenant posture rather than allowing older healthier history to calm the dashboard.
Reuse existing backup-quality summaries for degradation truth and add no competing backup-quality taxonomy.
Define backup freshness through one config-backed fallback window on the latest relevant completed backup set, while treating schedule timing as a secondary follow-up signal rather than health proof.
Derive schedule follow-up from enabled schedules whose current next_run_at or last_run_at semantics indicate missed or overdue execution beyond a small grace window.
Integrate backup health into the existing DashboardKpis and NeedsAttention widgets and keep healthy wording suppressed unless the backing evidence is fully supportive.
Route dashboard drillthroughs by problem class: no usable backup basis opens the backup-set list, stale or degraded latest backup opens the latest relevant backup-set detail, and schedule follow-up opens the backup-schedules list.
Extend the current widget, truth-alignment, backup-set, schedule, and tenant-scope Pest coverage instead of creating a browser-first harness.
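The schedule follow-up decision above can be sketched in plain PHP. This is a minimal illustration, not the resolver itself: `ScheduleTiming`, `scheduleFollowUpReason()`, and the reason strings `overdue` and `never_ran` are hypothetical names, and the rule that a schedule without `next_run_at` is not flagged is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical plain-PHP stand-in for the timing fields of a BackupSchedule row.
final class ScheduleTiming
{
    public function __construct(
        public readonly bool $enabled,
        public readonly ?DateTimeImmutable $lastRunAt,
        public readonly ?DateTimeImmutable $nextRunAt,
    ) {}
}

/**
 * Returns a follow-up reason when an enabled schedule looks missed or overdue,
 * or null when no follow-up is needed.
 *
 * Assumption: a schedule is overdue once "now" is later than next_run_at plus
 * the grace window; disabled schedules and schedules without next_run_at are ignored.
 */
function scheduleFollowUpReason(
    ScheduleTiming $schedule,
    DateTimeImmutable $now,
    int $graceMinutes,
): ?string {
    if (! $schedule->enabled || $schedule->nextRunAt === null) {
        return null;
    }

    $deadline = $schedule->nextRunAt->modify("+{$graceMinutes} minutes");

    if ($now <= $deadline) {
        return null; // still inside the grace window
    }

    // Distinguish a schedule that has never run from one that ran before but is late now.
    return $schedule->lastRunAt === null ? 'never_ran' : 'overdue';
}
```

Keeping the grace window as a parameter means the threshold can come from configuration rather than being hard-coded in a widget.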

Phase 1 Design

Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/180-tenant-backup-health/:

research.md: implementation and domain decisions for tenant backup-health derivation
data-model.md: existing entities, config inputs, and derived backup-health models
contracts/tenant-backup-health.openapi.yaml: internal logical contract for dashboard summary, backup-set confirmation, and schedule follow-up surfaces
quickstart.md: focused automated and manual validation workflow for tenant backup-health signals

Design decisions:

No schema migration is required. The design adds only a narrow derived resolver layer and a small config section in config/tenantpilot.php for backup-health freshness semantics.
Tenant backup health is derived at render time from the latest relevant completed backup set, existing BackupQualitySummary, and enabled-schedule timing. No new Tenant field, cache table, or materialized rollup is planned.
Posture precedence is deterministic: absent outranks everything, stale outranks degraded, degraded outranks healthy, and schedule_follow_up remains a secondary reason family. When the latest backup basis is fresh and non-degraded, posture may remain healthy, but an open schedule_follow_up becomes the active reason and suppresses any positive healthy confirmation until it is resolved.
DashboardKpis owns the primary backup-health stat or card, while NeedsAttention owns reason-specific backup follow-up items and the positive healthy backup check.
Backup-set detail remains the confirmation surface for stale and degraded latest-backup posture by combining recency and existing backup-quality summary. Backup-schedules list remains the confirmation surface for schedule-follow-up posture and must foreground one derived follow-up indicator so the missed-run or overdue reason stays scan-fast.
The feature stays Filament v5 and Livewire v4 compliant, introduces no new panel provider, and requires no new asset registration.

Project Structure

Documentation (this feature)

specs/180-tenant-backup-health/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── tenant-backup-health.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md

Source Code (repository root, including planned additions for this feature)

app/
├── Filament/
│   ├── Pages/
│   │   └── TenantDashboard.php
│   ├── Resources/
│   │   ├── BackupScheduleResource.php
│   │   └── BackupSetResource.php
│   └── Widgets/
│       └── Dashboard/
│           ├── DashboardKpis.php
│           └── NeedsAttention.php
├── Models/
│   ├── BackupItem.php
│   ├── BackupSchedule.php
│   ├── BackupSet.php
│   └── Tenant.php
├── Support/
│   ├── BackupHealth/
│   │   ├── TenantBackupHealthAssessment.php
│   │   ├── BackupFreshnessEvaluation.php
│   │   ├── BackupScheduleFollowUpEvaluation.php
│   │   ├── BackupHealthActionTarget.php
│   │   ├── BackupHealthDashboardSignal.php
│   │   └── TenantBackupHealthResolver.php
│   ├── BackupQuality/
│   │   ├── BackupQualityResolver.php
│   │   └── BackupQualitySummary.php
│   └── Badges/
│       └── [existing shared badge seams only if new backup-health tone mapping is needed]

config/
└── tenantpilot.php

tests/
├── Feature/
│   ├── BackupScheduling/
│   │   └── BackupScheduleLifecycleTest.php
│   └── Filament/
│       ├── BackupSetListContinuityTest.php
│       ├── BackupSetEnterpriseDetailPageTest.php
│       ├── DashboardKpisWidgetTest.php
│       ├── NeedsAttentionWidgetTest.php
│       ├── TenantDashboardDbOnlyTest.php
│       ├── TenantDashboardTenantScopeTest.php
│       └── TenantDashboardTruthAlignmentTest.php
└── Unit/
    └── Support/
        └── BackupHealth/
            └── TenantBackupHealthResolverTest.php

Structure Decision: Standard Laravel monolith. The implementation stays inside existing dashboard widgets, backup resources, shared support helpers, and current test structure. Any new helper types and lightweight dashboard-facing value objects live under app/Support/BackupHealth/ as a narrow derived layer shared by the dashboard and drillthrough logic.

Implementation Strategy

Phase A — Introduce Narrow Tenant Backup-Health Derivation

Goal: Create one derived path that can answer absent, stale, degraded, or healthy from existing backup and schedule truth without introducing new persistence.

Step	File	Change
A.1	New narrow helper(s) under `app/Support/BackupHealth/`	Introduce `TenantBackupHealthResolver` plus lightweight `TenantBackupHealthAssessment`, `BackupFreshnessEvaluation`, `BackupScheduleFollowUpEvaluation`, `BackupHealthActionTarget`, and `BackupHealthDashboardSignal` value objects that derive the latest relevant completed backup basis, posture, primary reason, supporting message, drillthrough target, and healthy-claim boundary with query-bounded latest-basis loading
A.2	`app/Support/BackupQuality/BackupQualityResolver.php` plus the new backup-health layer	Explicitly reuse `BackupQualityResolver` and `BackupQualitySummary` output to classify material degradation instead of creating a second backup-quality system
A.3	`config/tenantpilot.php`	Add a small `backup_health` config section for canonical freshness hours and schedule overdue grace so stale logic is explicit, testable, and not hard-coded in widgets
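As an illustration of step A.3, the config section might look like the excerpt below. The key names `freshness_hours` and `schedule_overdue_grace_minutes` and both values are placeholders chosen for this sketch; the plan only states that freshness hours and a schedule overdue grace are configurable.

```php
<?php

// config/tenantpilot.php (excerpt). Key names and values are illustrative assumptions.
return [
    // ... existing tenantpilot configuration ...

    'backup_health' => [
        // Hours after which the latest relevant completed backup set counts as stale.
        'freshness_hours' => 24,

        // Minutes past next_run_at before an enabled schedule is flagged for follow-up.
        'schedule_overdue_grace_minutes' => 30,
    ],
];
```

With this shape, the resolver would read the values through Laravel's `config()` helper, for example `config('tenantpilot.backup_health.freshness_hours')`, so tests can override them per scenario.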

Phase B — Integrate Backup Health Into Primary Tenant Dashboard Surfaces

Goal: Make tenant backup posture visible on the dashboard before the operator has to open deep backup pages.

Step	File	Change
B.1	`app/Filament/Widgets/Dashboard/DashboardKpis.php`	Add a backup-health stat or card that reflects the derived posture, last relevant backup timing, current reason, color tone, and one reason-driven destination
B.2	`app/Filament/Widgets/Dashboard/NeedsAttention.php`	Add backup-health attention items for no usable backup basis, stale latest backup, degraded latest backup, and schedule follow-up
B.3	`app/Filament/Widgets/Dashboard/NeedsAttention.php`	Add `Backups are recent and healthy` to the healthy-check set only when the derived assessment positively supports it and no backup-health attention item, including `schedule_follow_up`, remains

Phase C — Preserve Drillthrough Continuity On Backup And Schedule Surfaces

Goal: Ensure the dashboard warning or healthy claim can be rediscovered on the destination surface without guesswork.

Step	File	Change
C.1	`app/Support/BackupHealth/TenantBackupHealthResolver.php` plus `app/Support/BackupHealth/BackupHealthActionTarget.php`	Centralize reason-driven URL selection in the existing backup-health layer so no-basis goes to backup-set index, stale or degraded latest backup goes to the relevant backup-set detail, and schedule follow-up goes to backup-schedules index
C.2	`app/Filament/Resources/BackupSetResource.php`	Reuse or slightly harden the backup-set list and detail presentation so the index confirms no usable backup basis and the latest relevant backup-set detail clearly confirms stale or degraded posture on arrival
C.3	`app/Filament/Resources/BackupScheduleResource.php`	Add one derived schedule-follow-up confirmation signal on the list surface so existing `last_run_at`, `last_run_status`, and `next_run_at` evidence remains scan-fast on arrival
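The reason-driven routing in step C.1 can be sketched as a single `match` expression. The enum `BackupHealthReason`, the function `drillthroughTarget()`, and the surface names are hypothetical; falling back to the backup-set index when no backup-set id is available is an assumption of this sketch.

```php
<?php

declare(strict_types=1);

// Hypothetical reason family; the real resolver may name these differently.
enum BackupHealthReason: string
{
    case NoBackupBasis = 'no_backup_basis';
    case LatestBackupStale = 'latest_backup_stale';
    case LatestBackupDegraded = 'latest_backup_degraded';
    case ScheduleFollowUp = 'schedule_follow_up';
}

/**
 * Picks the drillthrough destination for a backup-health reason.
 * Returns a surface name plus the backup-set id when a detail page is the target.
 *
 * @return array{surface: string, backupSetId: int|null}
 */
function drillthroughTarget(BackupHealthReason $reason, ?int $latestBackupSetId): array
{
    $index = ['surface' => 'backup-sets.index', 'backupSetId' => null];

    return match ($reason) {
        // No usable basis: the list is the only place that can confirm the absence.
        BackupHealthReason::NoBackupBasis => $index,

        // Stale or degraded: open the latest relevant backup set itself.
        BackupHealthReason::LatestBackupStale,
        BackupHealthReason::LatestBackupDegraded => $latestBackupSetId !== null
            ? ['surface' => 'backup-sets.view', 'backupSetId' => $latestBackupSetId]
            : $index, // assumption: fall back to the list if the id is unknown

        // Schedule concern: the schedules list carries the timing evidence.
        BackupHealthReason::ScheduleFollowUp => ['surface' => 'backup-schedules.index', 'backupSetId' => null],
    };
}
```

Because `match` is exhaustive over the enum, adding a new reason without a destination fails loudly instead of silently routing nowhere.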

Phase D — Lock Semantics With Focused Regression Coverage

Goal: Protect resolver truth, dashboard truth, continuity, and tenant safety from regression.

Step	File	Change
D.1	New unit tests under `tests/Unit/Support/BackupHealth/`	Cover no-backup, stale, degraded, healthy, schedule-follow-up, and latest-history-governs derivation
D.2	`tests/Feature/Filament/DashboardKpisWidgetTest.php`	Extend KPI payload and URL assertions for backup-health posture and reason-driven drillthrough
D.3	`tests/Feature/Filament/NeedsAttentionWidgetTest.php`	Extend attention and healthy-check coverage for no-backup, stale-backup, degraded-latest-backup, schedule-follow-up, and healthy-backup scenarios
D.4	`tests/Feature/Filament/TenantDashboardTruthAlignmentTest.php`	Ensure backup-health calmness and caution align with the rest of the tenant dashboard and do not reintroduce calmness leakage
D.5	`tests/Feature/Filament/BackupSetListContinuityTest.php`, `tests/Feature/Filament/BackupSetEnterpriseDetailPageTest.php`, and `tests/Feature/BackupScheduling/BackupScheduleLifecycleTest.php`	Prove that no-basis, stale, degraded, and schedule-follow-up drillthrough destinations confirm the same problem class the dashboard named
D.6	`tests/Feature/Filament/TenantDashboardTenantScopeTest.php` or a new RBAC-safe visibility test	Preserve tenant-scope truth and non-member-safe behavior for dashboard summary and backup follow-up routes
D.7	`vendor/bin/sail bin pint --dirty --format agent` and focused Pest runs	Required formatting and targeted verification before implementation is considered complete

Key Design Decisions

D-001 — Tenant backup health is derived, not stored

The product already stores the facts this slice needs: completed backup sets, backup-item quality metadata, and backup schedule timing. The missing piece is a tenant-level interpretation layer for overview truth, not a new persistence model.

D-002 — The latest relevant completed backup set governs posture

Older healthy history cannot calm the dashboard if the latest relevant completed backup is stale or degraded. This keeps the overview aligned with the operator's current recovery starting point.
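A minimal sketch of this selection rule over plain rows follows. The row shape (`id`, `status`, `completed_at`) and the simplification that "relevant" means `status === 'completed'` are assumptions made for illustration; the actual BackupSet columns are not specified here.

```php
<?php

declare(strict_types=1);

/**
 * Selects the most recently completed backup set from plain rows.
 * Older sets never win over a newer completed one, whatever their quality.
 *
 * @param list<array{id: int, status: string, completed_at: ?string}> $sets
 * @return array{id: int, status: string, completed_at: ?string}|null
 */
function latestCompletedBackupSet(array $sets): ?array
{
    // Keep only sets that actually finished; in-flight or failed sets are not a basis.
    $completed = array_filter(
        $sets,
        fn (array $s): bool => $s['status'] === 'completed' && $s['completed_at'] !== null,
    );

    if ($completed === []) {
        return null;
    }

    // Newest completion first.
    usort(
        $completed,
        fn (array $a, array $b): int => new DateTimeImmutable($b['completed_at']) <=> new DateTimeImmutable($a['completed_at']),
    );

    return $completed[0];
}
```

In the application this would more plausibly be one ordered, limited query rather than an in-memory sort, which keeps the dashboard read query-bounded.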

D-003 — Stale and degraded remain distinct, with deterministic precedence

The primary posture states absent, stale, degraded, and healthy are mutually exclusive. When the latest relevant backup is both old and degraded, stale becomes the primary posture while degradation remains visible as supporting detail rather than disappearing.
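The precedence can be expressed as a short chain of early returns. In this sketch `BackupPosture` and `resolvePosture()` are hypothetical names, `$isDegraded` stands in for the verdict of the existing backup-quality layer, and treating a backup exactly at the freshness threshold as still fresh is an assumption.

```php
<?php

declare(strict_types=1);

enum BackupPosture: string
{
    case Absent = 'absent';
    case Stale = 'stale';
    case Degraded = 'degraded';
    case Healthy = 'healthy';
}

/**
 * Resolves the primary posture with the precedence absent > stale > degraded > healthy.
 *
 * $completedAt is null when no usable completed backup set exists.
 */
function resolvePosture(
    ?DateTimeImmutable $completedAt,
    bool $isDegraded,
    DateTimeImmutable $now,
    int $freshnessHours,
): BackupPosture {
    if ($completedAt === null) {
        return BackupPosture::Absent;
    }

    $staleAfter = $completedAt->modify("+{$freshnessHours} hours");

    // Stale is checked before degraded, so an old and degraded backup reports stale.
    if ($now > $staleAfter) {
        return BackupPosture::Stale;
    }

    return $isDegraded ? BackupPosture::Degraded : BackupPosture::Healthy;
}
```

The degraded flag would still be carried alongside a stale posture as supporting detail; this function only decides which state is primary.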

D-004 — Schedule timing is follow-up truth, not health proof

An enabled schedule can support the operator's diagnosis, but it cannot prove healthy backup posture. Overdue or never-successful schedules add schedule_follow_up; they do not substitute for a recent healthy completed backup basis. If the backup basis is otherwise healthy, posture may stay healthy, but schedule_follow_up becomes the active reason and suppresses calm confirmation until the schedule concern clears.

D-005 — Healthy wording is stricter than mere backup existence

The wording "Backups are recent and healthy" is reserved for tenants whose latest relevant completed backup exists, meets the freshness window, and carries no material degradation under existing backup-quality truth. Lack of evidence must suppress calmness.
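Taken together with D-004, the gate for the positive check reduces to a conjunction. The function name `mayClaimHealthy()` and the `$evidenceComplete` flag are hypothetical; the flag models the rule that missing evidence suppresses calmness.

```php
<?php

declare(strict_types=1);

/**
 * Decides whether the positive healthy-backup check may be shown.
 *
 * Inputs are assumed to come from the derived assessment:
 * - $posture: one of 'absent', 'stale', 'degraded', 'healthy'
 * - $scheduleFollowUp: true when an enabled schedule looks missed or overdue
 * - $evidenceComplete: false when data needed for the verdict is missing
 */
function mayClaimHealthy(string $posture, bool $scheduleFollowUp, bool $evidenceComplete): bool
{
    // Every condition must hold; any open concern or gap keeps the claim suppressed.
    return $posture === 'healthy' && ! $scheduleFollowUp && $evidenceComplete;
}
```

A tenant with a fresh, non-degraded latest backup and an overdue schedule therefore keeps a healthy posture but does not receive the calm confirmation.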

D-006 — Existing Filament seams are sufficient

The current DashboardKpis, NeedsAttention, BackupSetResource, and BackupScheduleResource surfaces already provide the right seams. This slice does not need a new page shell, a new dashboard module, or a new front-end state layer.

D-007 — Keep the claim boundary below recovery confidence

The feature can say that backups are absent, stale, degraded, or healthy as backup inputs. It cannot say that the tenant is recoverable, that restore will succeed, or that recovery posture is proven.

Risk Assessment

Risk	Impact	Likelihood	Mitigation
Latest-basis selection drifts from operator expectation and lets older history calm the dashboard	High	Medium	Make latest relevant completed backup selection explicit in the resolver and cover mixed-history precedence with unit tests
Dashboard calmness returns because schedule presence is treated as a proxy for health	High	Medium	Keep schedule follow-up secondary in the resolver and test that schedules never make a tenant healthy on their own
Backup health duplicates or contradicts existing backup-quality truth	High	Medium	Reuse `BackupQualityResolver` and existing degradation families rather than adding a second backup-quality mapping
Schedule drillthrough lands on a surface that does not clearly confirm the warning	Medium	Medium	Use the schedule list as the primary follow-up destination and add one scan-fast confirmation signal if timestamps alone are insufficient
Tight stale thresholds create noise or false calmness over time	Medium	Medium	Externalize fallback freshness and schedule grace in config and pin the semantics with unit and feature tests

Test Strategy

Add unit tests for the narrow backup-health resolver so latest-basis selection, stale precedence, degraded detection reuse, healthy-gate logic, and schedule-follow-up derivation remain deterministic.
Extend DashboardKpisWidgetTest to assert the backup-health stat label, value, description, color, and destination across absent, stale, degraded, and healthy scenarios.
Extend NeedsAttentionWidgetTest to assert backup-health attention items, healthy-check inclusion or suppression, and safe degraded-link behavior when appropriate.
Extend TenantDashboardTruthAlignmentTest so backup-health calmness or caution cannot contradict the rest of the dashboard's operator truth.
Extend backup-set and schedule surface tests so dashboard drillthroughs recover the same problem class on the target page.
Extend tenant-scope or RBAC coverage so entitled users see truthful summary state and non-members receive deny-as-not-found semantics without cross-tenant hints.
Keep all tests Livewire v4 compatible and run the smallest affected subset through Sail before asking for a full-suite pass.
Run vendor/bin/sail bin pint --dirty --format agent before final verification.

Complexity Tracking

No constitution violations or exception-driven complexity were identified. The only added structure is a narrow derived backup-health layer and a small derived posture or reason family already justified by the proportionality review.

Proportionality Review

Current operator problem: The tenant dashboard can look healthy while backup posture is missing, stale, or degraded, which hides a recovery-relevant truth from the operator's primary overview surface.
Existing structure is insufficient because: Existing backup-quality truth lives in backup-set, item, version, and restore-adjacent surfaces, but there is no tenant-level rollup that answers the dashboard question directly.
Narrowest correct implementation: Add one narrow derived tenant backup-health layer, wire it into the existing dashboard widgets, and reuse current backup and schedule destinations for continuity without creating new persistence or a broader recovery-confidence system.
Ownership cost created: A small amount of resolver logic, a small config-backed freshness policy, limited widget wiring, and focused unit and feature tests.
Alternatives intentionally rejected: A persisted backup-health table, a workspace-wide recovery rollup, or a recovery-confidence score. Each adds broader truth and maintenance cost than the current tenant-dashboard problem requires.
Release truth: Current-release truth. The feature corrects a trust gap on already-shipped tenant overview surfaces.
