Research: Workspace Recovery Posture Visibility
Decision 1: Keep the slice inside the existing workspace overview instead of creating a new portfolio recovery surface
Decision: Implement Spec 185 inside WorkspaceOverviewBuilder, WorkspaceSummaryStats, WorkspaceNeedsAttention, and the existing workspace overview Blade surface. Do not create a dedicated workspace recovery matrix page or a second portfolio posture shell.
Rationale: The operator gap is on /admin, where the current workspace landing page does not expose backup or recovery-evidence weakness across visible tenants. Extending the existing landing surface closes that gap with the smallest IA change and keeps the operator’s first scan in one place.
Alternatives considered:
- Add a dedicated workspace recovery matrix page. Rejected because the spec explicitly rules that out and because it would split first-scan triage across two pages.
- Add recovery posture to the tenant list instead of /admin. Rejected because the workflow problem is workspace-first triage, not tenant-directory enrichment.
Decision 2: Reuse the existing tenant truth seams rather than invent a workspace recovery mapper
Decision: Reuse TenantBackupHealthResolver for backup posture and RestoreSafetyResolver::dashboardRecoveryEvidence() for recovery evidence as the authoritative truth seams for the workspace overview.
Rationale: Both tenant-level truths already exist and already carry the bounded operator language required by the spec. Reinterpreting those states inside a new workspace-only mapper would create a second truth path and make tenant and workspace semantics drift.
Alternatives considered:
- Recompute backup posture and recovery evidence inside WorkspaceOverviewBuilder from raw tables only. Rejected because it would duplicate tenant logic and increase drift risk.
- Persist a workspace recovery summary table. Rejected because the feature is explicitly derived-first and does not need independent lifecycle truth.
Decision 3: Add narrow batch-friendly derivation to the existing truth helpers to avoid N+1 behavior
Decision: Add narrow visible-tenant batch or prefetch support around the existing backup-health and recovery-evidence derivation seams instead of calling the current single-tenant resolver methods naively in a loop.
Rationale: TenantBackupHealthResolver::assess() currently loads the latest relevant backup set and schedules for one tenant, and RestoreSafetyResolver::dashboardRecoveryEvidence() also resolves backup health and capped restore history for one tenant. Calling both seams per visible tenant inside the workspace overview would create avoidable fanout. A narrow batch-friendly seam preserves the source-of-truth logic while keeping /admin query-bounded.
Alternatives considered:
- Call the existing single-tenant resolver methods once per visible tenant. Rejected because representative workspace render paths already have a DB-only query budget, and per-tenant calls would make the query count grow linearly with the number of visible tenants.
- Add a new generic workspace caching layer or persisted aggregate. Rejected because it introduces architecture and persistence the slice does not need.
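The fanout concern can be illustrated with a minimal, language-neutral sketch in Python. Everything here is hypothetical: FakeStore, BackupSet, the posture labels, and the one-day staleness threshold are stand-ins, not the actual TenantBackupHealthResolver logic. The point is only that the per-tenant path and the batch path share one assessment rule, while the batch path issues a single query regardless of tenant count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupSet:
    tenant_id: int
    completed_at: int  # epoch seconds; hypothetical field


class FakeStore:
    """Stand-in for the database that counts queries (illustrative only)."""

    def __init__(self, backup_sets):
        self._rows = list(backup_sets)
        self.query_count = 0

    def latest_backup_for(self, tenant_id):
        # Per-tenant path: one query per call.
        self.query_count += 1
        rows = [r for r in self._rows if r.tenant_id == tenant_id]
        return max(rows, key=lambda r: r.completed_at, default=None)

    def latest_backups_for_many(self, tenant_ids):
        # Batch path: one query for all requested tenants.
        self.query_count += 1
        wanted = set(tenant_ids)
        latest = {}
        for row in self._rows:
            if row.tenant_id not in wanted:
                continue
            current = latest.get(row.tenant_id)
            if current is None or row.completed_at > current.completed_at:
                latest[row.tenant_id] = row
        return latest


def assess(latest_backup, now, max_age_seconds=86_400):
    """Single posture rule shared by both paths (threshold is an assumption)."""
    if latest_backup is None:
        return "absent"
    if now - latest_backup.completed_at > max_age_seconds:
        return "stale"
    return "healthy"


def assess_per_tenant(store, tenant_ids, now):
    # Naive loop: query count grows with the number of tenants.
    return {t: assess(store.latest_backup_for(t), now) for t in tenant_ids}


def assess_batch(store, tenant_ids, now):
    # Prefetch once, then apply the same rule per tenant in memory.
    latest = store.latest_backups_for_many(tenant_ids)
    return {t: assess(latest.get(t), now) for t in tenant_ids}
```

Both functions return the same posture map for the same input; only the query count differs, which is the property a DB-only query budget would check.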
Decision 4: Count affected visible tenants, not raw issues or raw restore runs
Decision: The new summary metrics count visible tenants with backup attention and visible tenants with recovery-evidence attention.
Rationale: The workspace operator question is “how many tenants need attention,” not “how many backup items failed” or “how many restore runs exist.” Tenant counts preserve triage value and align with how the existing governance metric already works.
Alternatives considered:
- Count raw backup degradations, schedules, or restore runs. Rejected because the counts become noisy and do not answer which tenant to open first.
- Build a single blended posture score. Rejected because the spec explicitly requires backup health and recovery evidence to stay separate.
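A small sketch of the counting rule, written in Python for illustration only. TenantPosture and its fields are hypothetical; in the real slice the two attention flags would come from the existing tenant-level truths. The sketch shows that the metrics count tenants, stay separate per domain, and ignore how many raw issues a tenant has.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantPosture:
    tenant_id: int
    backup_needs_attention: bool      # hypothetical flag from the backup truth
    recovery_needs_attention: bool    # hypothetical flag from the recovery truth
    raw_backup_issues: int = 0        # kept only to contrast with tenant counts


def summarize(visible_postures):
    """Return two separate tenant counts; no blended score is produced."""
    return {
        "backup_attention_tenants": sum(
            1 for p in visible_postures if p.backup_needs_attention
        ),
        "recovery_attention_tenants": sum(
            1 for p in visible_postures if p.recovery_needs_attention
        ),
    }
```

A tenant with seven raw backup issues still contributes exactly one to the backup count, so the number answers "how many tenants need attention" directly.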
Decision 5: Keep backup health and recovery evidence separate on metrics, attention, and calmness copy
Decision: Use one backup-attention metric, one recovery-attention metric, one backup_health attention family, and one recovery_evidence attention family. Calmness copy explicitly mentions both domains.
Rationale: Backup health and recovery evidence answer different operator questions. The product already treats them as separate truths at tenant level, and the workspace overview must preserve that distinction.
Alternatives considered:
- Show one blended “recovery posture” metric. Rejected because it hides whether the weakness is missing backup basis or weak restore evidence.
- Keep only attention items and no new metrics. Rejected because the operator also needs a quick portfolio count before drilling into items.
Decision 6: Use the tenant dashboard as the primary workspace drillthrough for new recovery and backup items
Decision: New workspace backup-health and recovery-evidence items drill into /admin/t/{tenant} as the primary landing. Summary metrics fall back to ChooseTenant when multiple visible tenants are affected and may link directly to the tenant dashboard only when exactly one visible tenant is affected.
Rationale: The tenant dashboard already surfaces backup health and recovery evidence side by side, so it preserves the flagged weakness without forcing the operator immediately into deep backup-set or restore-run pages. This matches the spec’s workspace-first triage flow and keeps drillthrough predictable under RBAC.
Alternatives considered:
- Link new workspace items directly to backup-set or restore-run surfaces. Rejected as the primary contract because it makes the workspace flow more brittle and less consistent, especially when permissions differ across downstream surfaces.
- Use only ChooseTenant for all new items. Rejected because the operator would lose the “open this tenant now” flow that the spec explicitly requires.
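The drillthrough rule can be sketched as follows, in Python for illustration. The function names and the CHOOSE_TENANT_TARGET placeholder are hypothetical; the only path taken from this document is /admin/t/{tenant}. The zero-affected case is not defined above, so returning no link is an assumption.

```python
from typing import Optional, Sequence

# Placeholder label for the ChooseTenant surface; not a real route.
CHOOSE_TENANT_TARGET = "choose-tenant"


def tenant_dashboard_url(tenant_key: str) -> str:
    return f"/admin/t/{tenant_key}"


def attention_item_target(tenant_key: str) -> str:
    # Attention items always land on the flagged tenant's dashboard.
    return tenant_dashboard_url(tenant_key)


def summary_metric_target(affected_tenant_keys: Sequence[str]) -> Optional[str]:
    if len(affected_tenant_keys) == 0:
        # Assumption: with nothing affected, the metric carries no link.
        return None
    if len(affected_tenant_keys) == 1:
        # Exactly one affected tenant: link straight to its dashboard.
        return tenant_dashboard_url(affected_tenant_keys[0])
    # Several affected tenants: fall back to tenant selection.
    return CHOOSE_TENANT_TARGET
```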
Decision 7: Keep deeper tenant backup or restore pages as secondary follow-up only
Decision: The workspace overview does not need to invent new direct reason-specific routes. Existing tenant backup-set and restore-run surfaces remain tenant-local follow-up after the operator lands on the tenant dashboard.
Rationale: The tenant dashboard already preserves why the tenant was flagged. That makes new direct workspace-to-detail routing unnecessary for this slice and avoids adding a second continuity scheme for workspace triage.
Alternatives considered:
- Add workspace-specific reason query parameters to deeper routes immediately. Rejected for now because the tenant dashboard already preserves the context and this slice does not need a new routing contract to be truthful.
- Add a new workspace recovery drawer or modal instead of navigating. Rejected because it would add a new interaction model on the landing page.
Decision 8: Extend calmness by explicit checked domains, not silent implication
Decision: Add backup_health and recovery_evidence to the workspace checked_domains contract and make calmness copy say that those domains were included.
Rationale: The main trust failure is calmness by omission. Simply changing the boolean without naming the checked domains would not let the operator distinguish “calm because checked” from “calm because ignored.”
Alternatives considered:
- Reuse the current calmness boolean with no domain changes. Rejected because it would leave the semantic gap intact.
- Mention backup and recovery in the body copy only. Rejected because the contract also needs machine-readable domain coverage for tests and future slices.
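One way to picture the contract is the Python sketch below. The payload keys other than checked_domains, the body wording, and the helper names are all hypothetical. The sketch shows why a bare boolean is not enough: two payloads can both be calm, yet only one proves that backup health and recovery evidence were checked.

```python
def build_calmness(checked_domains, attention_items):
    """Build a calmness payload that names the domains it covers."""
    domains = list(dict.fromkeys(checked_domains))  # de-duplicate, keep order
    is_calm = len(attention_items) == 0
    if is_calm:
        labels = ", ".join(d.replace("_", " ") for d in domains)
        body = f"Nothing needs attention. Checked: {labels}."
    else:
        body = f"{len(attention_items)} item(s) need attention."
    return {"is_calm": is_calm, "checked_domains": domains, "body": body}


def covers_recovery_posture(calmness):
    """Machine-readable check usable by tests: were both new domains included?"""
    required = {"backup_health", "recovery_evidence"}
    return required <= set(calmness["checked_domains"])
```

A test can then assert on covers_recovery_posture rather than parsing body copy, which keeps the coverage claim stable if the wording changes.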
Decision 9: Insert new recovery and backup items above activity-only signals while preserving current governance-first intent
Decision: Add backup-health and recovery-evidence items above operations and alerts in workspace attention ordering, without displacing the existing governance-first priority scheme.
Rationale: Recovery and backup are now triage-relevant workspace domains, but the current workspace overview already uses governance as a high-signal priority family. The narrowest safe change is to raise backup and recovery above activity-only signals while keeping the current governance-critical ordering semantics intact.
Alternatives considered:
- Put all new backup or recovery items at the very top of the queue. Rejected because it could unintentionally demote stronger existing governance blockers.
- Put all new items below operations and alerts. Rejected because the slice would not solve the portfolio-triage gap.
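As a rough illustration of the intended ordering, the Python sketch below ranks attention families and sorts stably. It is a simplification: the real ordering may weigh severity within governance, and the relative order of backup_health versus recovery_evidence and of operations versus alerts is an assumption here.

```python
# Lower rank sorts first. Only "governance first, backup and recovery above
# operations and alerts" comes from the decision; the rest is assumed.
FAMILY_RANK = {
    "governance": 0,
    "backup_health": 1,
    "recovery_evidence": 2,
    "operations": 3,
    "alerts": 4,
}


def order_attention(items):
    """Sort by family rank; sorted() is stable, so ties keep their input order."""
    return sorted(items, key=lambda item: FAMILY_RANK[item["family"]])
```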
Decision 10: Reuse existing workspace overview tests and DB-only guards instead of introducing a new harness
Decision: Extend the existing workspace overview test pack and DB-only render guard, and keep the upstream tenant recovery performance guard in scope to protect the source truths that workspace aggregation consumes.
Rationale: The repo already has focused tests for summary metrics, drillthrough continuity, permission visibility, calmness, and DB-only rendering. Extending those seams keeps coverage close to the business consequences and avoids building a second test framework for the same landing page.
Alternatives considered:
- Add only manual smoke checks. Rejected because the repo requires programmatic coverage and the slice changes multiple derived contracts.
- Build a browser-only suite for the workspace overview. Rejected because the current workspace overview behavior is already well covered by server-side and Livewire-style tests.
Decision 11: No new assets, provider registration, or global-search changes
Decision: Keep the slice inside existing Filament pages, widgets, and views with no new assets, no provider changes, and no new global-search behavior.
Rationale: The feature is about visibility and triage semantics, not new frontend infrastructure or discovery surfaces. Existing deployment and panel registration rules remain unchanged.
Alternatives considered:
- Add custom assets or a new panel widget type. Rejected because existing Filament widgets already cover the required UI.
- Expand global search to expose workspace recovery posture. Rejected because the spec does not require it and because search is not the first-scan triage surface.