## Summary
Implements and polishes the Platform Ops Runbooks feature (Spec 113) — the operator control plane for safe backfills and data repair from `/system`.
## Changes
### UX Polish (Phase 7 — US4)
- **Filament-native components**: Rewrote `runbooks.blade.php` and `view-run.blade.php` using `<x-filament::section>` instead of raw Tailwind div cards. Cards now render correctly with Filament's built-in borders, shadows and dark mode.
- **System panel theme**: Created `resources/css/filament/system/theme.css` and registered `->viteTheme()` on `SystemPanelProvider`. The system panel previously had no theme CSS registered — Tailwind utility classes weren't compiled for its views, causing the warning icon SVG to expand to full container size.
- **Live scope selector**: Added `->live()` to the scope `Radio` field so "Single tenant" immediately reveals the tenant search dropdown without requiring a Submit first.
### Core Feature (Phases 1–6, previously shipped)
- `/system/ops/runbooks` — runbook catalog, preflight, run with typed confirmation + reason
- `/system/ops/runs` — run history table with status/outcome badges
- `/system/ops/runs/{id}` — run detail view with summary counts, failures, collapsible context
- `FindingsLifecycleBackfillRunbookService` — preflight + execution logic
- AllowedTenantUniverse — scopes tenant picker to non-platform tenants only
- RBAC: `platform.ops.view`, `platform.runbooks.view`, `platform.runbooks.run`, `platform.runbooks.findings.lifecycle_backfill`
- Rate-limited `/system/login` (10/min per IP+username)
- Distinct session cookie for `/system` isolation
## Test Coverage
- 16 tests / 141 assertions — all passing
- Covers: page access, RBAC, preflight, run dispatch, scope selector, run detail, run list
## Checklist
- [x] Filament v5 / Livewire v4 compliant
- [x] Provider registered in `bootstrap/providers.php`
- [x] Destructive actions require confirmation (`->requiresConfirmation()`)
- [x] System panel theme registered (`viteTheme`)
- [x] Pint clean
- [x] Tests pass
Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #137
Research — Spec 113: Platform Ops Runbooks
This file resolves the design unknowns required to produce an implementation plan that fits the existing TenantAtlas codebase.
## Decisions
### 1) Reuse existing backfill pipeline (Command + Job) via a single service
- Decision: Extract a single “runbook service” that is called from:
  - `/system` runbook UI (preflight + start)
  - CLI command (`tenantpilot:findings:backfill-lifecycle`)
  - deploy-time hook
- Rationale: The repo already contains a correct tenant-scoped implementation:
  - Command: `app/Console/Commands/TenantpilotBackfillFindingLifecycle.php`
  - Job: `app/Jobs/BackfillFindingLifecycleJob.php`
  - It uses `OperationRunService` for lifecycle transitions and idempotency, and a cache lock per tenant.
- Alternatives considered:
  - Build a new pipeline from scratch → rejected as it duplicates proven behavior and increases drift risk.
### 2) “All tenants” scope uses a single workspace run updated by many tenant jobs
- Decision: Implement All-tenants as:
  - one workspace-scoped `OperationRun` (`tenant_id = null`) created with `OperationRunService::ensureWorkspaceRunWithIdentity()`
  - fan-out to many queued tenant jobs that all increment the same workspace run’s `summary_counts` and contribute failures
  - completion via `OperationRunService::maybeCompleteBulkRun()` when `processed >= total` (same pattern as workspace backfills)
- Rationale:
  - This matches an existing proven pattern in the repo (`tenantpilot:backfill-workspace-ids` + `BackfillWorkspaceIdsJob`).
  - It yields a single “View run” target with meaningful progress, without needing parent/child run stitching.
  - Tenant isolation remains intact because each job still operates tenant-scoped and holds the existing per-tenant lock.
- Alternatives considered:
  - Separate per-tenant `OperationRun` records + an umbrella run → rejected for v1 due to added coordination complexity.
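
The fan-out and completion rule for the all-tenants scope can be sketched in framework-free PHP. This is a minimal in-memory illustration under stated assumptions, not the repository's code: `BulkRunSketch`, `recordTenantResult()`, the status strings and the exact keys of the counter array are hypothetical stand-ins for the real `OperationRun` record, and the check only mirrors the `processed >= total` condition described for `OperationRunService::maybeCompleteBulkRun()`.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory stand-in for one workspace-scoped run (tenant_id = null).
final class BulkRunSketch
{
    /** @var array{total:int, processed:int, succeeded:int, failed:int} */
    public array $summaryCounts;

    /** @var list<array{tenant_id:int, message:string}> */
    public array $failures = [];

    // Status names are assumptions made for this sketch.
    public string $status = 'running';

    public function __construct(int $totalTenants)
    {
        $this->summaryCounts = [
            'total' => $totalTenants,
            'processed' => 0,
            'succeeded' => 0,
            'failed' => 0,
        ];
    }

    // Called once by each tenant job when it finishes.
    public function recordTenantResult(int $tenantId, ?string $error = null): void
    {
        $this->summaryCounts['processed']++;

        if ($error === null) {
            $this->summaryCounts['succeeded']++;
        } else {
            $this->summaryCounts['failed']++;
            $this->failures[] = ['tenant_id' => $tenantId, 'message' => $error];
        }

        $this->maybeComplete();
    }

    // Mirrors the "processed >= total" rule; does nothing until the last job reports.
    private function maybeComplete(): void
    {
        if ($this->status !== 'running') {
            return;
        }

        if ($this->summaryCounts['processed'] >= $this->summaryCounts['total']) {
            $this->status = $this->summaryCounts['failed'] > 0
                ? 'completed_with_failures'
                : 'completed';
        }
    }
}

$run = new BulkRunSketch(3);
$run->recordTenantResult(101);
$run->recordTenantResult(102, 'lock not acquired');
echo $run->status, PHP_EOL; // running (2 of 3 tenants processed)
$run->recordTenantResult(103);
echo $run->status, PHP_EOL; // completed_with_failures
```

The sketch runs in a single process. With real queue workers, the counter updates would have to be safe under concurrency, which this example does not attempt.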
### 3) Workspace scope for `/system` runbooks (v1)
- Decision: v1 targets the default workspace (same workspace that owns the `platform` Tenant created by `PlatformUserSeeder`).
- Rationale:
  - Platform identity currently has no explicit workspace selector in the System panel.
  - Existing seeder creates `Workspace(slug=default)` and a `Tenant(external_id=platform)` inside it.
- Alternatives considered:
  - Multi-workspace operator selection in `/system` → deferred (not in spec, requires new UX + entitlement model).
### 4) Remove/disable `/admin` maintenance action (FR-001)
- Decision: Remove or feature-flag off the existing `/admin` header action “Backfill findings lifecycle” currently present in `app/Filament/Resources/FindingResource/Pages/ListFindings.php`.
- Rationale: Spec explicitly forbids customer-plane exposure in production-like environments.
- Alternatives considered:
  - Keep the action but hide visually → rejected; it still exists as an affordance and is easy to re-enable by accident.
### 5) Session isolation for `/system` (SR-004)
- Decision: Add a System-panel-only middleware that sets a dedicated session cookie name for `/system/*` before `StartSession` runs.
- Rationale:
  - `SystemPanelProvider` defines its own middleware list; we can insert a middleware at the top.
  - Changing `config(['session.cookie' => ...])` per request is sufficient for cookie separation without introducing a new domain.
- Alternatives considered:
  - Separate subdomain → deferred (explicitly “later”).
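
The cookie-selection step of such a middleware can be illustrated in isolation. The sketch below is framework-free and rests on assumptions: `sessionCookieFor()` and both cookie names are hypothetical, and the real middleware would pass the chosen name to `config(['session.cookie' => ...])` before `StartSession` runs.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: pick the session cookie name for a request path.
// The cookie names used below are illustrative; the source does not state them.
function sessionCookieFor(string $path, string $defaultCookie, string $systemCookie): string
{
    $normalized = '/' . ltrim($path, '/');

    // Match "/system" itself and anything below it, but not e.g. "/systems".
    if ($normalized === '/system' || str_starts_with($normalized, '/system/')) {
        return $systemCookie;
    }

    return $defaultCookie;
}

echo sessionCookieFor('/system/ops/runs', 'app_session', 'system_session'), PHP_EOL; // system_session
echo sessionCookieFor('/admin/findings', 'app_session', 'system_session'), PHP_EOL;  // app_session
echo sessionCookieFor('/systems', 'app_session', 'system_session'), PHP_EOL;         // app_session
```

Matching on the `/system/` prefix rather than the bare string `/system` keeps unrelated routes that merely share the first characters on the default cookie.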
### 6) `/system/login` rate limiting (SR-003)
- Decision: Implement rate limiting inside `app/Filament/System/Pages/Auth/Login.php` (override `authenticate()`) using a combined key: `ip + normalized(email)` at 10/min.
- Rationale:
  - The System login already overrides `authenticate()` to add auditing.
  - Implementing rate limiting here keeps the policy tightly scoped to the System login surface.
- Alternatives considered:
  - Global route middleware throttle → possible, but harder to scope precisely to this Filament auth page.
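
To make the combined key and the 10/min limit concrete, here is a minimal fixed-window sketch in framework-free PHP. Everything in it is an assumption for illustration: `LoginThrottleSketch` is a hypothetical class, "normalized" is taken to mean trimmed and lowercased, and the in-memory array stands in for whatever store the application's rate limiter actually uses.

```php
<?php

declare(strict_types=1);

// Hypothetical fixed-window limiter keyed on IP + normalized email.
final class LoginThrottleSketch
{
    /** @var array<string, array{windowStart:int, attempts:int}> */
    private array $buckets = [];

    public function __construct(
        private int $maxAttempts = 10,
        private int $windowSeconds = 60,
    ) {
    }

    // Assumed normalization: trim whitespace and lowercase the email.
    public static function key(string $ip, string $email): string
    {
        return $ip . '|' . strtolower(trim($email));
    }

    // Returns true and records the attempt if allowed; false once the limit is reached.
    public function attempt(string $ip, string $email, int $now): bool
    {
        $key = self::key($ip, $email);
        $bucket = $this->buckets[$key] ?? null;

        // Start a fresh window if none exists or the previous one has expired.
        if ($bucket === null || $now - $bucket['windowStart'] >= $this->windowSeconds) {
            $bucket = ['windowStart' => $now, 'attempts' => 0];
        }

        if ($bucket['attempts'] >= $this->maxAttempts) {
            $this->buckets[$key] = $bucket;

            return false;
        }

        $bucket['attempts']++;
        $this->buckets[$key] = $bucket;

        return true;
    }
}

$throttle = new LoginThrottleSketch();
$t = 1_000;
for ($i = 0; $i < 10; $i++) {
    $throttle->attempt('203.0.113.7', ' Ops@Example.com ', $t); // attempts 1..10 are allowed
}
var_dump($throttle->attempt('203.0.113.7', 'ops@example.com', $t + 5));  // bool(false): 11th in the window
var_dump($throttle->attempt('203.0.113.7', 'ops@example.com', $t + 60)); // bool(true): new window
var_dump($throttle->attempt('198.51.100.2', 'ops@example.com', $t + 5)); // bool(true): different IP
```

Keying on both values means one address cycling through many emails, or many addresses targeting one email, each get their own budget. That is the precise scoping the decision describes.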
### 7) 404 vs 403 semantics for platform capability checks (SR-002)
- Decision: Keep cross-plane denial as 404 (existing `EnsureCorrectGuard`), but missing platform capability should return 403.
- Rationale:
  - Spec requires: wrong plane → 404; platform lacking capability → 403.
  - Current `EnsurePlatformCapability` aborts with 404, which conflicts with spec.
- Alternatives considered:
  - Return 404 for missing platform capability → rejected because it contradicts the agreed spec.
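
The status-code rule can be written as a small decision function. This is a hedged sketch, not the middleware itself: `systemAccessStatus()` is hypothetical, and the guard name `platform` is an assumption about how the platform plane is identified. The capability strings are the ones listed in this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical decision function for the SR-002 status-code rule.
// $guard is the guard the request authenticated with (name assumed);
// $capabilities are the capability strings held by the platform user.
function systemAccessStatus(string $guard, array $capabilities, string $required): int
{
    // Wrong plane (e.g. a tenant user hitting /system): hide the route entirely.
    if ($guard !== 'platform') {
        return 404;
    }

    // Right plane, missing capability: the route exists but is forbidden.
    if (!in_array($required, $capabilities, true)) {
        return 403;
    }

    return 200;
}

echo systemAccessStatus('web', [], 'platform.runbooks.run'), PHP_EOL;                             // 404
echo systemAccessStatus('platform', ['platform.ops.view'], 'platform.runbooks.run'), PHP_EOL;     // 403
echo systemAccessStatus('platform', ['platform.runbooks.run'], 'platform.runbooks.run'), PHP_EOL; // 200
```

The plane check runs first, so a user from the wrong plane gets 404 regardless of capabilities and cannot learn which `/system` routes exist.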
### 8) Failure notifications (FR-009)
- Decision: On run failure, emit:
  - the canonical terminal DB notification (`OperationRunCompleted`) to the initiating platform operator (in-app)
  - an Alerts event (Teams / Email) if alert routing is configured
- Rationale:
  - Alerts system already exists (`AlertDispatchService` + queued deliveries). It can route to Teams webhook / Email.
  - `OperationRunCompleted` already formats the correct persistent DB notification payload via `OperationUxPresenter`.
- Alternatives considered:
  - Send Teams webhook directly from job → rejected; bypasses alert rules/cooldowns/quiet hours.
## Notes for implementation
- Platform capabilities must be defined in the registry (`app/Support/Auth/PlatformCapabilities.php`) and referenced via constants.
- The System panel currently does not call `->databaseNotifications()`. If we want in-app notifications for platform operators, add it.
- `OperationRun.user_id` cannot point to `platform_users`; use `context` fields to record platform initiator metadata.