# Implementation Plan: Platform Ops Runbooks (Spec 113)

**Branch**: `[113-platform-ops-runbooks]` | **Date**: 2026-02-26
**Spec**: `specs/113-platform-ops-runbooks/spec.md`
**Input**: Feature specification + design artifacts in `specs/113-platform-ops-runbooks/`

**Note**: This file is generated/maintained via Spec Kit (`/speckit.plan`). Keep it concise and free of placeholders/duplicates.

## Summary

Introduce a `/system` operator control plane for safe backfills/data repair.

v1 delivers one runbook: **Rebuild Findings Lifecycle**. It must:
- preflight (read-only)
- require explicit confirmation (typed confirmation for all-tenants) + reason capture
- execute as a tracked `OperationRun` with audit events + locking + idempotency
- **never** be exposed in the customer `/admin` plane
- reuse one shared code path across System UI + CLI + deploy hook

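The confirmation and reason requirements above can be sketched as a small validation step that runs before a run is started. This is a minimal illustration in plain PHP, not the spec's implementation: `RunbookScope`, `validateStartRequest()`, and the confirmation phrase `REBUILD ALL TENANTS` are hypothetical names chosen for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical scope value object: a null tenant ID stands for "all tenants".
final class RunbookScope
{
    public function __construct(public readonly ?int $tenantId = null)
    {
    }

    public function isAllTenants(): bool
    {
        return $this->tenantId === null;
    }
}

/**
 * Validate a start request before any run is created.
 * Returns a list of error messages; an empty list means the request may proceed.
 * The confirmation phrase is an assumption made for this sketch.
 *
 * @return string[]
 */
function validateStartRequest(RunbookScope $scope, string $reason, ?string $typedConfirmation): array
{
    $errors = [];

    // Reason capture is mandatory for every scope.
    if (trim($reason) === '') {
        $errors[] = 'A reason is required.';
    }

    // All-tenants runs additionally require the operator to type a phrase.
    if ($scope->isAllTenants() && $typedConfirmation !== 'REBUILD ALL TENANTS') {
        $errors[] = 'Typed confirmation is required for all-tenants runs.';
    }

    return $errors;
}
```

Keeping this check in the shared service, rather than in the UI form alone, would let the CLI and the deploy hook enforce the same rules.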
## Technical Context

- **Language/Runtime**: PHP 8.4, Laravel 12
- **Admin UI**: Filament v5 (Livewire v4)
- **Storage**: PostgreSQL
- **Testing**: Pest v4 (required for runtime behavior changes)
- **Ops primitives**: `OperationRun` + `OperationRunService` (service owns status/outcome transitions)

## Non-negotiables (Constitution / Spec constraints)

- Cross-plane access (`/admin` → `/system`) must be deny-as-not-found (**404**).
- Platform user missing a required capability must be **403**.
- `/system` session cookie must be isolated (distinct cookie name) and applied **before** `StartSession`.
- `/system/login` throttling: **10/min** per **IP + username** key; failed login attempts are audited.
- Any destructive-like action uses Filament `->action(...)` and `->requiresConfirmation()`.
- Ops-UX contract: toast intent-only; progress in run detail; terminal DB notification is `OperationRunCompleted` (initiator-only); no queued/running DB notifications.
- Audit writes are fail-safe (audit failure must not crash the runbook).

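The 404 vs 403 rule and the throttle key can be illustrated with two pure functions. This is a sketch under stated assumptions: the guard name `platform`, the capability string used in the usage note, and the key format are hypothetical, and guest (unauthenticated) requests are out of scope here.

```php
<?php

declare(strict_types=1);

// Guard name is an illustrative assumption, not taken from the spec.
const PLATFORM_GUARD = 'platform';

/**
 * Status code for an authenticated request to a /system route.
 *
 * @param string   $guard        guard that authenticated the caller
 * @param string[] $capabilities capabilities held by the caller
 * @param string   $required     capability the route requires
 */
function systemAccessStatus(string $guard, array $capabilities, string $required): int
{
    // Cross-plane caller (e.g. a customer /admin user): deny as not found.
    if ($guard !== PLATFORM_GUARD) {
        return 404;
    }

    // Platform user on the right plane but without the capability: forbidden.
    if (!in_array($required, $capabilities, true)) {
        return 403;
    }

    return 200;
}

/** Throttle key combining IP and normalised username, intended for a 10/min login limit. */
function systemLoginThrottleKey(string $ip, string $username): string
{
    return 'system-login|' . $ip . '|' . strtolower(trim($username));
}
```

For example, `systemAccessStatus('web', [], 'platform.runbooks.run')` yields 404, while a platform user without the capability gets 403. In Laravel, a key like this would typically be passed to a rate limiter such as `Limit::perMinute(10)->by(...)`; the exact wiring is left to the implementation.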
## Scope decisions (v1)

- **Canonical run viewing** for this spec is the **System panel**:
  - Runbooks: `/system/ops/runbooks`
  - Runs: `/system/ops/runs`
- **Allowed tenant universe (v1)**: all non-platform tenants present in the database (`tenants.external_id != 'platform'`). The System UI must not allow selecting or targeting the platform tenant.

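The tenant-universe rule applies both to listing (what the UI offers) and to targeting (what the server accepts). A minimal sketch in plain PHP follows, using arrays in place of Eloquent models; `targetableTenants()` and `assertTargetable()` are hypothetical helper names.

```php
<?php

declare(strict_types=1);

/**
 * Keep only tenants a runbook may target (mirrors external_id != 'platform').
 *
 * @param array<int, array{id: int, external_id: string}> $tenants
 * @return array<int, array{id: int, external_id: string}>
 */
function targetableTenants(array $tenants): array
{
    return array_values(array_filter(
        $tenants,
        static fn (array $tenant): bool => $tenant['external_id'] !== 'platform',
    ));
}

/**
 * Server-side guard for a single-tenant scope: reject the platform tenant
 * even if a crafted request bypasses the UI selection list.
 *
 * @param array{id: int, external_id: string} $tenant
 */
function assertTargetable(array $tenant): void
{
    if ($tenant['external_id'] === 'platform') {
        throw new InvalidArgumentException('The platform tenant cannot be targeted by a runbook.');
    }
}
```

Filtering the selection list alone would not be enough, which is why the sketch pairs it with a check at the point where a scope is accepted.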
## Project Structure

### Documentation

```text
specs/113-platform-ops-runbooks/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── tasks.md
└── contracts/
    └── system-ops-runbooks.openapi.yaml
```

### Source code (planned touch points)

```text
app/
├── Console/Commands/
│   ├── TenantpilotBackfillFindingLifecycle.php
│   └── TenantpilotRunDeployRunbooks.php
├── Filament/System/Pages/
│   └── Ops/
│       ├── Runbooks.php
│       ├── Runs.php
│       └── ViewRun.php
├── Http/Middleware/
│   ├── EnsureCorrectGuard.php
│   ├── EnsurePlatformCapability.php
│   └── UseSystemSessionCookie.php
├── Jobs/
│   ├── BackfillFindingLifecycleJob.php
│   ├── BackfillFindingLifecycleWorkspaceJob.php
│   └── BackfillFindingLifecycleTenantIntoWorkspaceRunJob.php
├── Providers/Filament/
│   └── SystemPanelProvider.php
├── Services/
│   ├── Alerts/AlertDispatchService.php
│   ├── OperationRunService.php
│   └── Runbooks/FindingsLifecycleBackfillRunbookService.php
└── Support/Auth/
    └── PlatformCapabilities.php

resources/views/filament/system/pages/ops/
├── runbooks.blade.php
├── runs.blade.php
└── view-run.blade.php

tests/Feature/System/
├── Spec113/
└── OpsRunbooks/
```

## Implementation Phases

1) **Foundational security hardening**
   - Capability registry additions.
   - 404 vs 403 semantics correctness.
   - System session cookie isolation.
   - System login throttling.

2) **Runbook core service (single source of truth)**
   - `preflight(scope)` + `start(scope, initiator, reason, source)`.
   - Audit events (fail-safe).
   - Locking + idempotency.

3) **Execution pipeline**
   - All-tenants orchestration as a workspace-scoped bulk run.
   - Fan-out tenant jobs update shared run counts and completion.

4) **System UI surfaces**
   - `/system/ops/runbooks` (preflight + confirm + start).
   - `/system/ops/runs` list + `/system/ops/runs/{run}` detail.

5) **Remove customer-plane exposure**
   - Remove/disable `/admin` maintenance trigger (feature flag default-off) + regression test.

6) **Shared entry points**
   - Refactor existing CLI command to call the shared service.
   - Add deploy hook command that calls the same service.

- Run focused tests + formatting (`vendor/bin/sail artisan test --compact` + `vendor/bin/sail bin pint --dirty`).
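
The shape of the phase 2 core service can be sketched with an in-memory stand-in. It shows one possible reading of "locking + idempotency" (a second `start` for a scope with an active run returns that run instead of creating another) together with a fail-safe audit wrapper. The class name `RunbookServiceSketch`, the `complete()` method, the event name, and the array-based scope are assumptions for illustration; the real service would delegate run state to `OperationRunService`.

```php
<?php

declare(strict_types=1);

// In-memory stand-in for the runbook core service. All names are hypothetical.
final class RunbookServiceSketch
{
    /** @var array<string, int> active run ID per scope key (acts as the lock) */
    private array $activeRuns = [];

    private int $nextRunId = 1;

    /** @param \Closure $audit audit sink receiving (event, context); it may throw */
    public function __construct(private \Closure $audit)
    {
    }

    /**
     * Read-only summary of what a run would touch; changes nothing.
     *
     * @param array<string, int> $findingCountsByTenant
     * @return array{scope: string, tenants: int, findings: int}
     */
    public function preflight(string $scopeKey, array $findingCountsByTenant): array
    {
        return [
            'scope' => $scopeKey,
            'tenants' => count($findingCountsByTenant),
            'findings' => array_sum($findingCountsByTenant),
        ];
    }

    /** Start a run, or return the already-active run for the same scope. */
    public function start(string $scopeKey, string $initiator, string $reason, string $source): int
    {
        if (isset($this->activeRuns[$scopeKey])) {
            return $this->activeRuns[$scopeKey];
        }

        $runId = $this->nextRunId++;
        $this->activeRuns[$scopeKey] = $runId;

        $this->auditSafely('runbook.started', compact('runId', 'scopeKey', 'initiator', 'reason', 'source'));

        return $runId;
    }

    /** Release the scope once the run reaches a terminal state. */
    public function complete(string $scopeKey): void
    {
        unset($this->activeRuns[$scopeKey]);
    }

    private function auditSafely(string $event, array $context): void
    {
        try {
            ($this->audit)($event, $context);
        } catch (\Throwable $e) {
            // Fail-safe: record the failure, never rethrow into the runbook.
            error_log('audit write failed: ' . $e->getMessage());
        }
    }
}
```

Because the System UI, the CLI command, and the deploy hook would all call the same `start()`, the `source` argument is the only thing that distinguishes them in the audit trail.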