TenantAtlas/specs/113-platform-ops-runbooks/plan.md
ahmido 200498fa8e feat(113): Platform Ops Runbooks — UX Polish (Filament-native, system theme, live scope) (#137)
## Summary

Implements and polishes the Platform Ops Runbooks feature (Spec 113) — the operator control plane for safe backfills and data repair from `/system`.

## Changes

### UX Polish (Phase 7 — US4)
- **Filament-native components**: Rewrote `runbooks.blade.php` and `view-run.blade.php` using `<x-filament::section>` instead of raw Tailwind div cards. Cards now render correctly with Filament's built-in borders, shadows and dark mode.
- **System panel theme**: Created `resources/css/filament/system/theme.css` and registered `->viteTheme()` on `SystemPanelProvider`. The system panel previously had no theme CSS registered — Tailwind utility classes weren't compiled for its views, causing the warning icon SVG to expand to full container size.
- **Live scope selector**: Added `->live()` to the scope `Radio` field so "Single tenant" immediately reveals the tenant search dropdown without requiring a Submit first.

### Core Feature (Phases 1–6, previously shipped)
- `/system/ops/runbooks` — runbook catalog, preflight, run with typed confirmation + reason
- `/system/ops/runs` — run history table with status/outcome badges
- `/system/ops/runs/{id}` — run detail view with summary counts, failures, collapsible context
- `FindingsLifecycleBackfillRunbookService` — preflight + execution logic
- AllowedTenantUniverse — scopes tenant picker to non-platform tenants only
- RBAC: `platform.ops.view`, `platform.runbooks.view`, `platform.runbooks.run`, `platform.runbooks.findings.lifecycle_backfill`
- Rate-limited `/system/login` (10/min per IP+username)
- Distinct session cookie for `/system` isolation

## Test Coverage
- 16 tests / 141 assertions — all passing
- Covers: page access, RBAC, preflight, run dispatch, scope selector, run detail, run list

## Checklist
- [x] Filament v5 / Livewire v4 compliant
- [x] Provider registered in `bootstrap/providers.php`
- [x] Destructive actions require confirmation (`->requiresConfirmation()`)
- [x] System panel theme registered (`viteTheme`)
- [x] Pint clean
- [x] Tests pass

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #137
2026-02-27 01:11:25 +00:00


Implementation Plan: Platform Ops Runbooks (Spec 113)

Branch: [113-platform-ops-runbooks] | Date: 2026-02-26
Spec: specs/113-platform-ops-runbooks/spec.md
Input: Feature specification + design artifacts in specs/113-platform-ops-runbooks/

Note: This file is generated/maintained via Spec Kit (/speckit.plan). Keep it concise and free of placeholders/duplicates.

Summary

Introduce a /system operator control plane for safe backfills/data repair.

v1 delivers one runbook: Rebuild Findings Lifecycle. It must:

  • preflight (read-only)
  • require explicit confirmation (typed confirmation for all-tenants) + reason capture
  • execute as a tracked OperationRun with audit events + locking + idempotency
  • never be exposed in the customer /admin plane
  • reuse one shared code path across System UI + CLI + deploy hook
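
The confirmation and reason rules above can be illustrated with a small, framework-free sketch. The function name, the scope array shape, and the confirmation phrase are all hypothetical; the plan only states that all-tenants runs need a typed confirmation and that a reason is always captured.

```php
<?php

// Hypothetical phrase: the plan does not specify the text an operator must type.
const ALL_TENANTS_CONFIRMATION = 'REBUILD ALL TENANTS';

/**
 * Validate a request to start the runbook (assumed input shape).
 * Returns a list of validation errors; an empty list means the run may start.
 *
 * @param array{mode?: string, tenant_id?: int|null} $scope
 * @return list<string>
 */
function validateStartRequest(array $scope, string $typedConfirmation, string $reason): array
{
    $errors = [];
    $mode = $scope['mode'] ?? null;

    if (! in_array($mode, ['all_tenants', 'single_tenant'], true)) {
        $errors[] = 'Unknown scope mode.';
    }

    // Single-tenant runs need a concrete target.
    if ($mode === 'single_tenant' && empty($scope['tenant_id'])) {
        $errors[] = 'A tenant must be selected for single-tenant scope.';
    }

    // All-tenants runs need the exact typed confirmation.
    if ($mode === 'all_tenants' && $typedConfirmation !== ALL_TENANTS_CONFIRMATION) {
        $errors[] = 'Typed confirmation does not match.';
    }

    // A reason is captured for every run, regardless of scope.
    if (trim($reason) === '') {
        $errors[] = 'A reason is required.';
    }

    return $errors;
}
```

In the real feature these checks would presumably live in the shared runbook service so that the System UI, CLI, and deploy hook all enforce them identically.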

Technical Context

  • Language/Runtime: PHP 8.4, Laravel 12
  • Admin UI: Filament v5 (Livewire v4)
  • Storage: PostgreSQL
  • Testing: Pest v4 (required for runtime behavior changes)
  • Ops primitives: OperationRun + OperationRunService (service owns status/outcome transitions)

Non-negotiables (Constitution / Spec constraints)

  • Cross-plane access (between /admin and /system) must be deny-as-not-found (404).
  • A platform user missing a required capability must receive a 403.
  • /system session cookie must be isolated (distinct cookie name) and applied before StartSession.
  • /system/login throttling: 10/min per IP + username key; failed login attempts are audited.
  • Any destructive-like action uses Filament ->action(...) and ->requiresConfirmation().
  • Ops-UX contract: toast intent-only; progress in run detail; terminal DB notification is OperationRunCompleted (initiator-only); no queued/running DB notifications.
  • Audit writes are fail-safe (audit failure must not crash the runbook).
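
The 404 versus 403 distinction in the list above is easy to get backwards, so here is a minimal sketch of the decision order. The function and the guard names ('platform', 'web') are assumptions for illustration; in the actual code this logic would be spread across middleware such as EnsureCorrectGuard and EnsurePlatformCapability.

```php
<?php

/**
 * Decide the HTTP status for a request to a /system route (illustrative only).
 *
 * @param string       $guard        Guard the user authenticated with (names assumed).
 * @param list<string> $capabilities Capabilities held by the user.
 * @param string       $required     Capability the route requires.
 */
function systemAccessStatus(string $guard, array $capabilities, string $required): int
{
    // Wrong plane: behave as if the route does not exist.
    if ($guard !== 'platform') {
        return 404;
    }

    // Right plane, missing capability: the route exists but is forbidden.
    if (! in_array($required, $capabilities, true)) {
        return 403;
    }

    return 200;
}
```

The plane check runs first, so a customer-plane user never learns which capabilities a /system route requires.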

Scope decisions (v1)

  • Canonical run viewing for this spec is the System panel:
    • Runbooks: /system/ops/runbooks
    • Runs: /system/ops/runs
  • Allowed tenant universe (v1): all non-platform tenants present in the database (tenants.external_id != 'platform'). The System UI must not allow selecting or targeting the platform tenant.
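
As a rough illustration of the allowed tenant universe rule, the sketch below filters plain arrays instead of querying the tenants table. Both function names are hypothetical; only the `external_id != 'platform'` condition comes from the plan.

```php
<?php

/**
 * Keep only tenants that may be targeted by a runbook.
 *
 * @param list<array{id: int, external_id: string}> $tenants
 * @return list<array{id: int, external_id: string}>
 */
function allowedTenantUniverse(array $tenants): array
{
    return array_values(array_filter(
        $tenants,
        static fn (array $tenant): bool => $tenant['external_id'] !== 'platform'
    ));
}

/**
 * Reject any tenant ID outside the allowed universe, including the platform tenant.
 *
 * @param list<array{id: int, external_id: string}> $tenants
 */
function assertTenantIsTargetable(array $tenants, int $tenantId): void
{
    $allowedIds = array_column(allowedTenantUniverse($tenants), 'id');

    if (! in_array($tenantId, $allowedIds, true)) {
        throw new InvalidArgumentException("Tenant {$tenantId} is outside the allowed tenant universe.");
    }
}
```

Checking the target again at start time, and not only when populating the picker, guards against a tampered form submission.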

Project Structure

Documentation

specs/113-platform-ops-runbooks/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── tasks.md
└── contracts/
    └── system-ops-runbooks.openapi.yaml

Source code (planned touch points)

app/
├── Console/Commands/
│   ├── TenantpilotBackfillFindingLifecycle.php
│   └── TenantpilotRunDeployRunbooks.php
├── Filament/System/Pages/
│   └── Ops/
│       ├── Runbooks.php
│       ├── Runs.php
│       └── ViewRun.php
├── Http/Middleware/
│   ├── EnsureCorrectGuard.php
│   ├── EnsurePlatformCapability.php
│   └── UseSystemSessionCookie.php
├── Jobs/
│   ├── BackfillFindingLifecycleJob.php
│   ├── BackfillFindingLifecycleWorkspaceJob.php
│   └── BackfillFindingLifecycleTenantIntoWorkspaceRunJob.php
├── Providers/Filament/
│   └── SystemPanelProvider.php
├── Services/
│   ├── Alerts/AlertDispatchService.php
│   ├── OperationRunService.php
│   └── Runbooks/FindingsLifecycleBackfillRunbookService.php
└── Support/Auth/
    └── PlatformCapabilities.php

resources/views/filament/system/pages/ops/
├── runbooks.blade.php
├── runs.blade.php
└── view-run.blade.php

tests/Feature/System/
├── Spec113/
└── OpsRunbooks/

Implementation Phases

  1. Foundational security hardening

    • Capability registry additions.
    • 404 vs 403 semantics correctness.
    • System session cookie isolation.
    • System login throttling.
  2. Runbook core service (single source of truth)

    • preflight(scope) + start(scope, initiator, reason, source).
    • Audit events (fail-safe).
    • Locking + idempotency.
  3. Execution pipeline

    • All-tenants orchestration as a workspace-scoped bulk run.
    • Fan-out tenant jobs update shared run counts and completion.
  4. System UI surfaces

    • /system/ops/runbooks (preflight + confirm + start).
    • /system/ops/runs list + /system/ops/runs/{run} detail.
  5. Remove customer-plane exposure

    • Remove/disable /admin maintenance trigger (feature flag default-off) + regression test.
  6. Shared entry points

    • Refactor existing CLI command to call the shared service.
    • Add deploy hook command that calls the same service.
    • Run focused tests + formatting (vendor/bin/sail artisan test --compact + vendor/bin/sail bin pint --dirty).
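
Phase 2's idempotency and fail-safe audit requirements can be sketched in isolation. This is an in-memory stand-in under stated assumptions: the class, the run identity scheme, and the 'queued' status are hypothetical, and a real implementation would rely on OperationRunService and database-level locking instead of an array.

```php
<?php

/**
 * In-memory stand-in for the run store (illustrative only).
 */
final class InMemoryRunRegistry
{
    /** @var array<string, array{id: int, status: string, reason: string}> */
    private array $activeRuns = [];

    private int $nextId = 1;

    /** @var callable */
    private $auditWriter;

    public function __construct(callable $auditWriter)
    {
        $this->auditWriter = $auditWriter;
    }

    /**
     * Start a run, or return the already-active run for the same identity.
     */
    public function start(string $runbook, string $scopeKey, string $reason): array
    {
        $identity = $this->identity($runbook, $scopeKey);

        // Idempotency: a second start for the same runbook + scope reuses the active run.
        if (isset($this->activeRuns[$identity])) {
            return $this->activeRuns[$identity];
        }

        $run = ['id' => $this->nextId++, 'status' => 'queued', 'reason' => $reason];
        $this->activeRuns[$identity] = $run;

        $this->audit('runbook.started', ['run_id' => $run['id'], 'scope' => $scopeKey]);

        return $run;
    }

    /**
     * Release the identity so a later start creates a new run.
     */
    public function complete(string $runbook, string $scopeKey): void
    {
        unset($this->activeRuns[$this->identity($runbook, $scopeKey)]);
    }

    private function identity(string $runbook, string $scopeKey): string
    {
        return hash('sha256', $runbook . '|' . $scopeKey);
    }

    /**
     * Fail-safe audit: a failing audit writer is logged, never rethrown.
     */
    private function audit(string $event, array $payload): void
    {
        try {
            ($this->auditWriter)($event, $payload);
        } catch (Throwable $e) {
            error_log('Audit write failed: ' . $e->getMessage());
        }
    }
}
```

Because the System UI, the CLI command, and the deploy hook would all go through the same `start()` path, a double click or an overlapping deploy would resolve to one run instead of two.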