## Summary

- add the Spec 180 tenant backup-health resolver and value objects to derive absent, stale, degraded, healthy, and schedule-follow-up posture from existing backup and schedule truth
- surface backup posture and reason-driven drillthroughs in the tenant dashboard and preserve continuity on backup-set and backup-schedule destinations
- add deterministic local/testing browser-fixture seeding plus a local fixture-login helper for the blocked drillthrough `403` scenario, along with the related spec artifacts and focused regression coverage

## Testing

- `vendor/bin/sail artisan test --compact tests/Feature/Auth/BackupHealthBrowserFixtureLoginTest.php tests/Feature/Console/TenantpilotSeedBackupHealthBrowserFixtureCommandTest.php`
- `vendor/bin/sail artisan test --compact tests/Unit/Support/BackupHealth/TenantBackupHealthResolverTest.php tests/Feature/Filament/DashboardKpisWidgetTest.php tests/Feature/Filament/NeedsAttentionWidgetTest.php tests/Feature/Filament/TenantDashboardTruthAlignmentTest.php tests/Feature/Filament/TenantDashboardTenantScopeTest.php tests/Feature/Filament/TenantDashboardDbOnlyTest.php tests/Feature/Filament/BackupSetListContinuityTest.php tests/Feature/Filament/BackupSetEnterpriseDetailPageTest.php tests/Feature/BackupScheduling/BackupScheduleLifecycleTest.php tests/Feature/Auth/BackupHealthBrowserFixtureLoginTest.php tests/Feature/Console/TenantpilotSeedBackupHealthBrowserFixtureCommandTest.php`

## Notes

- Filament v5 / Livewire v4 compliant; no panel-provider change was needed, so `bootstrap/providers.php` remains unchanged
- no new globally searchable resource was introduced, so global-search behavior is unchanged
- no new destructive action was added; existing destructive actions and confirmation behavior remain unchanged
- no new asset registration was added; the existing deploy-time `php artisan filament:assets` step remains sufficient
- the local fixture login helper route is limited to `local` and `testing` environments
- the focused and broader Spec 180 packs are green; the full suite was not rerun after these changes

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #212
74 lines
7.7 KiB
Markdown
# Research: Tenant Backup Health Signals
## Decision 1: Derive tenant backup health from existing backup and schedule truth instead of creating a persisted health model

- Decision: Build tenant backup health from existing `BackupSet`, `BackupItem`, `BackupSchedule`, and `BackupQualityResolver` facts at render time. Do not add a `tenant_backup_health` table, a cached rollup row, or a recovery-confidence ledger.
- Rationale: The repository already stores the facts this feature needs: completed backup timestamps, backup-quality degradations, and schedule timing. The product gap is missing tenant-level overview truth, not missing persistence.
- Alternatives considered:
  - Persist a backup-health row per tenant. Rejected because it would create a second source of truth for data that is already derivable.
  - Piggyback on `Tenant` model columns. Rejected because this slice does not need a new lifecycle-bearing tenant field.

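As a rough, language-neutral illustration of Decision 1 (written in Python rather than the repository's PHP), posture can be computed on demand from facts that are already stored. The `BackupFacts` type, the `derive_posture` function, and the order in which staleness and degradation are checked are all assumptions made for this sketch; the spec does not state that precedence here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class BackupFacts:
    """Hypothetical bundle of facts that already exist elsewhere; nothing is persisted."""
    latest_completed_at: Optional[datetime]  # None when no completed backup set exists
    degraded_item_count: int                 # as reported by the shared quality summary


def derive_posture(facts: BackupFacts, now: datetime, freshness_window: timedelta) -> str:
    """Compute posture at render time instead of reading a stored health row."""
    if facts.latest_completed_at is None:
        return "absent"
    # Assumed precedence for this sketch: staleness is checked before degradation.
    if now - facts.latest_completed_at > freshness_window:
        return "stale"
    if facts.degraded_item_count > 0:
        return "degraded"
    return "healthy"
```

Because the function is pure, the same inputs always give the same posture, which is what makes a second persisted source of truth unnecessary.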
## Decision 2: Let the latest relevant completed backup set govern posture

- Decision: The latest relevant completed backup set is the primary tenant backup-health basis. Older healthier backup history cannot override a newer stale or degraded latest basis.
- Rationale: The operator's first question is about the current recovery starting point, not whether an older good snapshot exists somewhere in history.
- Alternatives considered:
  - Choose the healthiest recent backup in the tenant. Rejected because it would calm the dashboard while the most recent relevant backup is weak.
  - Aggregate all backup history into a composite score. Rejected because the feature explicitly avoids a scoring engine.

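The "latest completed set governs" rule from Decision 2 can be sketched as a simple selection. This is an illustration in Python, not the repository's PHP; `BackupSetRow` and `latest_completed` are hypothetical names, and "relevant" is narrowed here to "has a completion timestamp", which is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass(frozen=True)
class BackupSetRow:
    """Hypothetical, simplified view of a backup set."""
    id: int
    completed_at: Optional[datetime]  # None while the set has not completed
    degraded_item_count: int


def latest_completed(sets: Iterable[BackupSetRow]) -> Optional[BackupSetRow]:
    """Pick the most recently completed set; older, healthier sets are never preferred."""
    completed = [s for s in sets if s.completed_at is not None]
    return max(completed, key=lambda s: s.completed_at, default=None)


history = [
    BackupSetRow(1, datetime(2024, 1, 1, tzinfo=timezone.utc), 0),  # older and clean
    BackupSetRow(2, datetime(2024, 1, 5, tzinfo=timezone.utc), 4),  # newer and degraded
    BackupSetRow(3, None, 0),                                       # not completed yet
]
basis = latest_completed(history)  # selects set 2 even though set 1 is healthier
```

Selecting by recency alone keeps the dashboard from looking calm when the current recovery starting point is weak.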
## Decision 3: Reuse existing backup-quality truth instead of introducing a second degradation system

- Decision: Material degradation for tenant backup health is derived from existing `BackupQualitySummary` output, especially degraded item counts and existing degradation families. No new competing backup-quality taxonomy is introduced.
- Rationale: Backup-quality truth was already hardened in the dedicated backup-quality work. Re-deriving the same concepts differently at tenant level would create contradiction and extra maintenance.
- Alternatives considered:
  - Add a tenant-specific degradation matrix. Rejected because it would drift from backup-set and item detail truth.
  - Use only raw backup-item metadata in the dashboard resolver. Rejected because the shared quality resolver already exists and should remain authoritative.

## Decision 4: Define one config-backed freshness window for backup posture and keep schedule timing secondary

- Decision: Backup posture freshness is evaluated against the latest relevant completed backup set using one config-backed canonical freshness window in `config/tenantpilot.php`, initially aligned with the repo's existing 24-hour freshness posture for other safety-critical truth. Enabled schedule timing can add follow-up pressure but cannot replace or override that single freshness rule.
- Rationale: There is no current backup-health freshness rule in the codebase. Hard-coding a threshold inside widgets would be brittle, while a small config value keeps the semantics explicit and testable.
- Alternatives considered:
  - Make freshness entirely schedule-driven. Rejected because schedule state is secondary in the spec and cannot replace actual backup existence.
  - Hard-code a stale threshold inside the dashboard widget. Rejected because it would hide an important product rule in presentation code.

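A minimal sketch of the single config-backed freshness rule in Decision 4, again in Python for illustration only. The `CONFIG` dictionary stands in for a value that would live in `config/tenantpilot.php`; the key names `backup_health` and `freshness_hours` are hypothetical, and treating a backup that is exactly at the window boundary as still fresh is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a config value; the real key names are not given in this document.
CONFIG = {"backup_health": {"freshness_hours": 24}}


def freshness_window(config: dict) -> timedelta:
    """Read the one canonical window, falling back to the 24-hour posture."""
    hours = config.get("backup_health", {}).get("freshness_hours", 24)
    return timedelta(hours=hours)


def is_stale(latest_completed_at: datetime, now: datetime, config: dict = CONFIG) -> bool:
    """A backup is stale once its age strictly exceeds the configured window."""
    return now - latest_completed_at > freshness_window(config)
```

Keeping the threshold in one place means a test can swap the window without touching widget code.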
## Decision 5: Detect schedule follow-up from enabled schedules that look overdue or never-successful beyond a grace window

- Decision: `schedule_follow_up` is derived from enabled schedules whose `next_run_at` is overdue beyond a small grace window, whose `last_run_at` is missing after they should have started producing evidence, or whose last schedule status indicates a failed or non-successful recent run. If the latest backup basis is otherwise fresh and non-degraded, posture may remain `healthy`, but `schedule_follow_up` becomes the active reason and suppresses any positive healthy confirmation.
- Rationale: `BackupSchedule` already tracks `last_run_at`, `last_run_status`, and `next_run_at`. Those fields are enough to signal that automation needs attention without conflating it with actual backup health proof.
- Alternatives considered:
  - Ignore schedule state entirely. Rejected because the spec explicitly wants schedule follow-up represented.
  - Treat any enabled schedule as positive backup evidence. Rejected because schedule existence is not backup proof.

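The three follow-up triggers in Decision 5 could be checked roughly as follows. This Python sketch is illustrative only: the 30-minute `GRACE` value, the `first_due_at` field (used to approximate "should have started producing evidence"), and the `"success"` status string are assumptions, while `next_run_at`, `last_run_at`, and `last_run_status` come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

GRACE = timedelta(minutes=30)  # hypothetical grace window


@dataclass(frozen=True)
class ScheduleRow:
    """Hypothetical, simplified view of a backup schedule."""
    enabled: bool
    first_due_at: datetime             # assumed: when the first run was expected
    next_run_at: Optional[datetime]
    last_run_at: Optional[datetime]
    last_run_status: Optional[str]     # assumed: "success" marks a good run


def needs_follow_up(s: ScheduleRow, now: datetime, grace: timedelta = GRACE) -> bool:
    """Return True when an enabled schedule looks overdue, never ran, or last ran badly."""
    if not s.enabled:
        return False
    overdue = s.next_run_at is not None and now - s.next_run_at > grace
    never_ran = s.last_run_at is None and now - s.first_due_at > grace
    unsuccessful = s.last_run_status is not None and s.last_run_status != "success"
    return overdue or never_ran or unsuccessful
```

Disabled schedules are ignored on purpose, since only enabled automation is expected to produce evidence.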
## Decision 6: Surface backup health through the existing dashboard widgets, not a new page or module

- Decision: Add backup health to the current `DashboardKpis` and `NeedsAttention` widgets and extend the existing healthy-check pattern instead of building a dedicated dashboard module or alternate tenant overview page.
- Rationale: The tenant dashboard is already the operator's primary overview surface, and the spec is explicitly a small hardening slice rather than a new product area.
- Alternatives considered:
  - Build a standalone backup-health page. Rejected because it does not solve the tenant-dashboard truth gap.
  - Defer backup health until a full recovery-confidence initiative. Rejected because the current dashboard already needs a narrower truth fix.

## Decision 7: Keep drillthrough reason-driven and use the least surprising existing surface for each reason

- Decision: Use reason-driven drillthroughs. `no_backup_basis` opens the backup-set list, `latest_backup_stale` and `latest_backup_degraded` open the latest relevant backup-set detail, and `schedule_follow_up` opens the backup-schedules list as the primary confirmation surface.
- Rationale: These destinations already exist and are tenant-scoped. The backup-set detail can confirm stale or degraded latest-basis truth through recency and backup-quality summary. The schedule list is safer than an edit screen as the first follow-up destination and already shows last-run and next-run timing.
- Alternatives considered:
  - Send every backup-health state to the backup-set list. Rejected because it weakens reason continuity for degraded latest-backup scenarios.
  - Send schedule follow-up directly to edit pages. Rejected because the first operator need is confirmation, not mutation.

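The reason-to-destination mapping in Decision 7 is small enough to sketch directly. The reason codes below are taken from the decision text; the destination labels (`backup-sets.index` and so on) and the `drillthrough_target` function are placeholders invented for this Python illustration and do not reflect actual route names in the repository.

```python
from typing import Optional


def drillthrough_target(reason: str, latest_backup_set_id: Optional[int] = None) -> str:
    """Map a backup-health reason to a placeholder label for its confirmation surface."""
    if reason == "no_backup_basis":
        return "backup-sets.index"
    if reason in ("latest_backup_stale", "latest_backup_degraded"):
        # Both reasons need the latest relevant set so its detail view can be opened.
        if latest_backup_set_id is None:
            raise ValueError("a latest backup set id is required for this reason")
        return f"backup-sets.view:{latest_backup_set_id}"
    if reason == "schedule_follow_up":
        return "backup-schedules.index"
    raise ValueError(f"unknown reason: {reason}")
```

Raising on an unknown reason, rather than falling back to a default list, is a design choice for the sketch: it keeps a new reason from silently losing its continuity target.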
## Decision 8: Add only the minimum continuity hardening needed on target surfaces

- Decision: Reuse the current backup-set detail and list surfaces for stale or degraded continuity, and add one minimal schedule-follow-up confirmation signal on the backup-schedules list so that operators do not have to infer follow-up from the current timestamp columns alone.
- Rationale: Spec 176 already moved backup quality into backup surfaces. This slice should not rebuild those surfaces again; it should only ensure the dashboard reason can be rediscovered quickly.
- Alternatives considered:
  - Add a banner framework or page-level explanation system. Rejected because it would overgrow the slice.
  - Leave continuity entirely to raw timestamps. Rejected because schedule follow-up may be too implicit at scan speed.

## Decision 9: Extend existing widget and dashboard truth tests before introducing new test harnesses

- Decision: Extend `DashboardKpisWidgetTest`, `NeedsAttentionWidgetTest`, `TenantDashboardTruthAlignmentTest`, and existing backup or schedule feature tests, and add one narrow unit test file for the resolver.
- Rationale: The affected behavior is server-driven widget and resource truth, which the current Pest and Livewire suite already covers well.
- Alternatives considered:
  - Rely only on manual validation. Rejected because the feature is specifically about preventing subtle trust regressions.
  - Add a large browser-only pack. Rejected because the highest-value assertions are deterministic server-side state and rendered truth.