# Implementation Plan: Platform Ops Runbooks (Spec 113)

**Branch**: `[113-platform-ops-runbooks]` | **Date**: 2026-02-26

**Spec**: `specs/113-platform-ops-runbooks/spec.md`

**Input**: Feature specification + design artifacts in `specs/113-platform-ops-runbooks/`

**Note**: This file is generated/maintained via Spec Kit (`/speckit.plan`). Keep it concise and free of placeholders/duplicates.

## Summary

Introduce a `/system` operator control plane for safe backfills/data repair.

v1 delivers one runbook: **Rebuild Findings Lifecycle**. It must:

- preflight (read-only)
- require explicit confirmation (typed confirmation for all-tenants) + reason capture
- execute as a tracked `OperationRun` with audit events + locking + idempotency
- be **never exposed** in the customer `/admin` plane
- reuse one shared code path across System UI + CLI + deploy hook

## Technical Context

- **Language/Runtime**: PHP 8.4, Laravel 12
- **Admin UI**: Filament v5 (Livewire v4)
- **Storage**: PostgreSQL
- **Testing**: Pest v4 (required for runtime behavior changes)
- **Ops primitives**: `OperationRun` + `OperationRunService` (service owns status/outcome transitions)

## Non-negotiables (Constitution / Spec constraints)

- Cross-plane access (`/admin` → `/system`) must be deny-as-not-found (**404**).
- Platform user missing a required capability must be **403**.
- `/system` session cookie must be isolated (distinct cookie name) and applied **before** `StartSession`.
- `/system/login` throttling: **10/min** per **IP + username** key; failed login attempts are audited.
- Any destructive-like action uses Filament `->action(...)` and `->requiresConfirmation()`.
- Ops-UX contract: toast intent-only; progress in run detail; terminal DB notification is `OperationRunCompleted` (initiator-only); no queued/running DB notifications.
- Audit writes are fail-safe (audit failure must not crash the runbook).
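
The 404-vs-403 rule and the login throttle key can be illustrated with two pure functions. This is a minimal sketch rather than the planned middleware: the function names, the guard names (`platform`, `web`), the capability string, and the key format are all assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: maps a request for a /system page onto an HTTP status.
 *
 * @param string   $guard        Guard the user authenticated with (assumed: 'platform' or 'web').
 * @param string[] $capabilities Capabilities held by the platform user.
 * @param string   $required     Capability the requested page needs.
 */
function systemAccessStatus(string $guard, array $capabilities, string $required): int
{
    // Cross-plane access: a customer-plane session must not learn that /system exists.
    if ($guard !== 'platform') {
        return 404;
    }

    // Correct plane but missing capability: an explicit 403.
    if (!in_array($required, $capabilities, true)) {
        return 403;
    }

    return 200;
}

/**
 * Hypothetical throttle key for /system/login: one bucket per IP + username pair.
 */
function systemLoginThrottleKey(string $ip, string $username): string
{
    // Normalise the username so "Admin" and " admin " share one bucket.
    return 'system-login:' . $ip . '|' . strtolower(trim($username));
}
```

Checking the guard before the capability means a customer-plane user never receives a 403 that would confirm the route exists.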

## Scope decisions (v1)

- **Canonical run viewing** for this spec is the **System panel**:
  - Runbooks: `/system/ops/runbooks`
  - Runs: `/system/ops/runs`
- **Allowed tenant universe (v1)**: all non-platform tenants present in the database (`tenants.external_id != 'platform'`). The System UI must not allow selecting or targeting the platform tenant.

## Project Structure

### Documentation

```text
specs/113-platform-ops-runbooks/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── tasks.md
└── contracts/
    └── system-ops-runbooks.openapi.yaml
```

### Source code (planned touch points)

```text
app/
├── Console/Commands/
│   ├── TenantpilotBackfillFindingLifecycle.php
│   └── TenantpilotRunDeployRunbooks.php
├── Filament/System/Pages/
│   └── Ops/
│       ├── Runbooks.php
│       ├── Runs.php
│       └── ViewRun.php
├── Http/Middleware/
│   ├── EnsureCorrectGuard.php
│   ├── EnsurePlatformCapability.php
│   └── UseSystemSessionCookie.php
├── Jobs/
│   ├── BackfillFindingLifecycleJob.php
│   ├── BackfillFindingLifecycleWorkspaceJob.php
│   └── BackfillFindingLifecycleTenantIntoWorkspaceRunJob.php
├── Providers/Filament/
│   └── SystemPanelProvider.php
├── Services/
│   ├── Alerts/AlertDispatchService.php
│   ├── OperationRunService.php
│   └── Runbooks/FindingsLifecycleBackfillRunbookService.php
└── Support/Auth/
    └── PlatformCapabilities.php

resources/views/filament/system/pages/ops/
├── runbooks.blade.php
├── runs.blade.php
└── view-run.blade.php

tests/Feature/System/
├── Spec113/
└── OpsRunbooks/
```

## Implementation Phases

1) **Foundational security hardening**
   - Capability registry additions.
   - 404 vs 403 semantics correctness.
   - System session cookie isolation.
   - System login throttling.

2) **Runbook core service (single source of truth)**
   - `preflight(scope)` + `start(scope, initiator, reason, source)`.
   - Audit events (fail-safe).
   - Locking + idempotency.

3) **Execution pipeline**
   - All-tenants orchestration as a workspace-scoped bulk run.
   - Fan-out tenant jobs update shared run counts and completion.

4) **System UI surfaces**
   - `/system/ops/runbooks` (preflight + confirm + start).
   - `/system/ops/runs` list + `/system/ops/runs/{run}` detail.

5) **Remove customer-plane exposure**
   - Remove/disable `/admin` maintenance trigger (feature flag default-off) + regression test.

6) **Shared entry points**
   - Refactor existing CLI command to call the shared service.
   - Add deploy hook command that calls the same service.
   - Run focused tests + formatting (`vendor/bin/sail artisan test --compact` + `vendor/bin/sail bin pint --dirty`).
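
The phase 2 contract (`preflight` + `start`, fail-safe audit, locking + idempotency) might look roughly like the in-memory sketch below. It is not the planned `FindingsLifecycleBackfillRunbookService`: the class name, the scope strings, the array shapes, and the audit event name are all hypothetical. The real service would persist `OperationRun` records and leave status transitions to `OperationRunService`.

```php
<?php

declare(strict_types=1);

/**
 * In-memory sketch of a shared runbook entry point (all names hypothetical).
 */
final class RunbookSketch
{
    /** @var array<string, array<string, mixed>> active runs keyed by scope */
    private array $activeRuns = [];

    private int $nextId = 1;

    /** @param \Closure $audit audit writer taking (string $event, array $context); may throw */
    public function __construct(private \Closure $audit)
    {
    }

    /** Read-only: reports what a run would touch without changing any state. */
    public function preflight(string $scope, array $tenantIds): array
    {
        return [
            'scope' => $scope,
            'tenant_count' => count($tenantIds),
            'already_running' => isset($this->activeRuns[$scope]),
        ];
    }

    public function start(string $scope, string $initiator, string $reason, string $source): array
    {
        if (trim($reason) === '') {
            throw new InvalidArgumentException('A reason is required.');
        }

        // Locking + idempotency: a second start for the same scope returns the active run.
        if (isset($this->activeRuns[$scope])) {
            return $this->activeRuns[$scope];
        }

        $run = [
            'id' => $this->nextId++,
            'scope' => $scope,
            'initiator' => $initiator,
            'reason' => $reason,
            'source' => $source,
        ];
        $this->activeRuns[$scope] = $run;

        $this->auditSafely('runbook.started', $run);

        return $run;
    }

    private function auditSafely(string $event, array $context): void
    {
        try {
            ($this->audit)($event, $context);
        } catch (Throwable) {
            // Fail-safe: an audit failure must not crash the runbook.
        }
    }
}

// Usage: the System UI and the deploy hook call the same service instance.
$events = [];
$service = new RunbookSketch(function (string $event, array $context) use (&$events): void {
    $events[] = $event;
});

$fromUi = $service->start('all_tenants', 'operator@example.test', 'Repair after import bug', 'system_ui');
$fromHook = $service->start('all_tenants', 'deploy', 'Post-deploy backfill', 'deploy_hook');
// $fromHook is the run created by the first call; no second run is started.
```

Keying the lock on the scope is one possible choice; a database-backed variant would need a unique constraint or advisory lock to give the same guarantee across processes.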