## Summary
Implements and polishes the Platform Ops Runbooks feature (Spec 113) — the operator control plane for safe backfills and data repair from `/system`.
## Changes
### UX Polish (Phase 7 — US4)
- **Filament-native components**: Rewrote `runbooks.blade.php` and `view-run.blade.php` using `<x-filament::section>` instead of raw Tailwind div cards. Cards now render correctly with Filament's built-in borders, shadows and dark mode.
- **System panel theme**: Created `resources/css/filament/system/theme.css` and registered `->viteTheme()` on `SystemPanelProvider`. The system panel previously had no theme CSS registered — Tailwind utility classes weren't compiled for its views, causing the warning icon SVG to expand to full container size.
- **Live scope selector**: Added `->live()` to the scope `Radio` field so "Single tenant" immediately reveals the tenant search dropdown without requiring a Submit first.
### Core Feature (Phases 1–6, previously shipped)
- `/system/ops/runbooks` — runbook catalog, preflight, run with typed confirmation + reason
- `/system/ops/runs` — run history table with status/outcome badges
- `/system/ops/runs/{id}` — run detail view with summary counts, failures, collapsible context
- `FindingsLifecycleBackfillRunbookService` — preflight + execution logic
- `AllowedTenantUniverse` — scopes tenant picker to non-platform tenants only
- RBAC: `platform.ops.view`, `platform.runbooks.view`, `platform.runbooks.run`, `platform.runbooks.findings.lifecycle_backfill`
- Rate-limited `/system/login` (10/min per IP+username)
- Distinct session cookie for `/system` isolation
## Test Coverage
- 16 tests / 141 assertions — all passing
- Covers: page access, RBAC, preflight, run dispatch, scope selector, run detail, run list
## Checklist
- [x] Filament v5 / Livewire v4 compliant
- [x] Provider registered in `bootstrap/providers.php`
- [x] Destructive actions require confirmation (`->requiresConfirmation()`)
- [x] System panel theme registered (`viteTheme`)
- [x] Pint clean
- [x] Tests pass
Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #137
# Data Model — Spec 113: Platform Ops Runbooks

This design describes the data we will read/write to implement the `/system` operator runbooks, grounded in the existing schema.

## Core persisted entities

### OperationRun (existing)

- Table: `operation_runs`
- Ownership:
  - Workspace-owned (always has `workspace_id`)
  - Tenant association is optional (`tenant_id` nullable) to support workspace/canonical runs
- Fields (existing):
  - `id`
  - `workspace_id` (FK, NOT NULL)
  - `tenant_id` (FK, nullable)
  - `user_id` (FK to `users`, nullable)
  - `initiator_name` (string)
  - `type` (string; for this feature: `findings.lifecycle.backfill`)
  - `status` (`queued|running|completed`)
  - `outcome` (`pending|succeeded|failed|blocked|...`)
  - `run_identity_hash` (string; active-run idempotency)
  - `summary_counts` (json)
  - `failure_summary` (json)
  - `context` (json)
  - `started_at`, `completed_at`

#### Summary counts contract

- Must only use keys from `App\Support\OpsUx\OperationSummaryKeys::all()`.
- v1 keys for this runbook:
  - `total` (findings scanned)
  - `processed` (findings processed)
  - `updated` (findings updated + duplicate consolidations)
  - `skipped` (findings unchanged)
  - `failed` (per-tenant job failures)
  - `tenants` (for all-tenants orchestrator: tenants targeted)

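
The key contract above could be enforced at the point where counts are written. The following is a minimal sketch in plain PHP, not the project's implementation: `allowedSummaryKeys()` is a hypothetical stand-in for `OperationSummaryKeys::all()` and lists only the v1 keys named above, and `sanitizeSummaryCounts()` is an illustrative helper name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for App\Support\OpsUx\OperationSummaryKeys::all().
 * Only the v1 keys listed in this document are assumed here.
 *
 * @return list<string>
 */
function allowedSummaryKeys(): array
{
    return ['total', 'processed', 'updated', 'skipped', 'failed', 'tenants'];
}

/**
 * Reject keys outside the contract and coerce values to non-negative ints.
 *
 * @param array<string, mixed> $counts
 * @return array<string, int>
 */
function sanitizeSummaryCounts(array $counts): array
{
    $unknown = array_diff(array_keys($counts), allowedSummaryKeys());

    if ($unknown !== []) {
        throw new InvalidArgumentException('Unknown summary keys: ' . implode(', ', $unknown));
    }

    // array_map keeps the string keys when given a single array.
    return array_map(static fn ($value): int => max(0, (int) $value), $counts);
}
```

Failing loudly on an unknown key, rather than silently dropping it, is a design choice of this sketch; it surfaces contract drift during development.
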
#### Context shape (for this feature)

Store these values in `operation_runs.context`:

- `runbook`:
  - `key`: `findings.lifecycle.backfill`
  - `scope`: `all_tenants` | `single_tenant`
  - `target_tenant_id`: int|null
  - `source`: `system_ui` | `cli` | `deploy_hook`
- `preflight`:
  - `affected_count`: int (findings that would change)
  - `total_count`: int (findings scanned)
  - `estimated_tenants`: int|null (for all tenants)
- `reason` (required for all-tenants and break-glass):
  - `reason_code`: `DATA_REPAIR|INCIDENT|SUPPORT|SECURITY`
  - `reason_text`: string
- `platform_initiator` (when started from `/system`):
  - `platform_user_id`: int
  - `email`: string
  - `name`: string
  - `is_break_glass`: bool

Notes:

- We intentionally do not store secrets/PII beyond operator email/name already used in auditing.
- `failure_summary` should store sanitized messages + stable reason codes, as already done by `RunFailureSanitizer`.

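
To make the context shape concrete, here is a hedged sketch of a builder for it. `buildRunbookContext()` and its parameters are illustrative names, not existing code. It assumes that `scope`, `target_tenant_id` and `source` sit under `runbook`, and that a `single_tenant` run must name a tenant; neither assumption is confirmed by the schema.

```php
<?php

declare(strict_types=1);

/**
 * Build the `context` payload for a findings lifecycle backfill run.
 * Illustrative only: names and validation rules are assumptions.
 *
 * @param array<string, int|null>  $preflight
 * @param array<string, string>|null $reason
 * @param array<string, mixed>|null  $platformInitiator
 * @return array<string, array<string, mixed>>
 */
function buildRunbookContext(
    string $scope,
    ?int $targetTenantId,
    string $source,
    array $preflight,
    ?array $reason,
    ?array $platformInitiator,
): array {
    if (! in_array($scope, ['all_tenants', 'single_tenant'], true)) {
        throw new InvalidArgumentException("Unsupported scope: {$scope}");
    }

    // Assumption: a single-tenant run always names its tenant.
    if ($scope === 'single_tenant' && $targetTenantId === null) {
        throw new InvalidArgumentException('single_tenant scope needs a target tenant id.');
    }

    // The reason is required for all-tenants and break-glass runs.
    $isBreakGlass = (bool) ($platformInitiator['is_break_glass'] ?? false);
    if (($scope === 'all_tenants' || $isBreakGlass) && $reason === null) {
        throw new InvalidArgumentException('A reason is required for all-tenants and break-glass runs.');
    }

    // Optional sections that were not supplied are left out entirely.
    return array_filter([
        'runbook' => [
            'key' => 'findings.lifecycle.backfill',
            'scope' => $scope,
            'target_tenant_id' => $scope === 'single_tenant' ? $targetTenantId : null,
            'source' => $source,
        ],
        'preflight' => $preflight,
        'reason' => $reason,
        'platform_initiator' => $platformInitiator,
    ], static fn ($section): bool => $section !== null);
}
```

A CLI-initiated single-tenant run would pass `null` for both `$reason` and `$platformInitiator`, and the resulting array would then contain only `runbook` and `preflight`.
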
#### All-tenants run modeling (v1)

- All-tenants executes as a single **workspace-scoped** run (`tenant_id = null`).
- Implementation fans out to multiple tenant jobs, but they all update the same workspace run via:
  - `OperationRunService::incrementSummaryCounts()`
  - `OperationRunService::appendFailures()`
  - `OperationRunService::maybeCompleteBulkRun()`
- Per-tenant `OperationRun` rows are not required for v1 (avoids parent/child coordination).

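
The bookkeeping behind this single-run model can be illustrated with an in-memory stand-in. This is a sketch only: `WorkspaceRunSketch` is a hypothetical class, it does not call `OperationRunService`, and it ignores the atomicity that concurrent queue jobs would need against a real database row. It also assumes that one failed tenant marks the whole run as `failed`, which the design above does not specify.

```php
<?php

declare(strict_types=1);

/**
 * In-memory stand-in for one workspace-scoped run that many tenant jobs update.
 * Hypothetical; the real updates go through OperationRunService.
 */
final class WorkspaceRunSketch
{
    /** @var array<string, int> */
    public array $summaryCounts = [];

    /** @var list<array{tenant_id: int, reason_code: string}> */
    public array $failures = [];

    public string $status = 'running';
    public string $outcome = 'pending';

    private int $finishedTenants = 0;

    public function __construct(private readonly int $targetedTenants)
    {
        $this->summaryCounts['tenants'] = $targetedTenants;
    }

    /**
     * Called once per tenant job when it finishes, successfully or not.
     *
     * @param array<string, int> $counts
     */
    public function recordTenantResult(int $tenantId, array $counts, ?string $failureReasonCode = null): void
    {
        // Counterpart of incrementSummaryCounts(): add this tenant's deltas.
        foreach ($counts as $key => $delta) {
            $this->summaryCounts[$key] = ($this->summaryCounts[$key] ?? 0) + $delta;
        }

        // Counterpart of appendFailures(): keep a sanitized reason code only.
        if ($failureReasonCode !== null) {
            $this->failures[] = ['tenant_id' => $tenantId, 'reason_code' => $failureReasonCode];
            $this->summaryCounts['failed'] = ($this->summaryCounts['failed'] ?? 0) + 1;
        }

        $this->finishedTenants++;
        $this->maybeComplete();
    }

    /** Counterpart of maybeCompleteBulkRun(): close the run after the last tenant. */
    private function maybeComplete(): void
    {
        if ($this->finishedTenants < $this->targetedTenants) {
            return;
        }

        $this->status = 'completed';
        // Assumption: any tenant failure fails the whole run.
        $this->outcome = $this->failures === [] ? 'succeeded' : 'failed';
    }
}
```

Because every tenant job reports into the same object, no parent/child reconciliation step is needed, which is the trade-off the v1 model accepts in exchange for losing per-tenant run rows.
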
### Audit log (existing infrastructure)

- Existing: `App\Services\Intune\AuditLogger` is already used for System login auditing.
- New audit actions (stable action IDs):
  - `platform.ops.runbooks.preflight`
  - `platform.ops.runbooks.start`
  - `platform.ops.runbooks.completed`
  - `platform.ops.runbooks.failed`
- Audit context should include:
  - runbook key, scope, affected_count, operation_run_id, platform_user_id/email, ip/user_agent.

### Alerts (existing infrastructure)

- Use `AlertDispatchService` to create `alert_deliveries` for operators.
- New alert event:
  - `event_type`: `operations.run.failed`
  - `tenant_id`: platform tenant id (to route via workspace rules)
  - `metadata`: run id, run type, scope, view-run URL

## Derived / non-persisted
### Runbook catalog

- Implemented as a PHP catalog (no DB table) with:
  - key, label, description, capability required, estimated duration (can reuse `OperationCatalog`).

## State transitions

- `OperationRun.status/outcome` transitions are owned by `OperationRunService`.
- Expected transitions (per run):
  - `queued` → `running` → `completed(succeeded|failed|blocked)`
- Locks:
  - Tenant runs: already implemented via `Cache::lock('tenantpilot:findings:lifecycle_backfill:tenant:{id}', 900)`
  - All-tenants orchestration: add a scope-level lock to prevent duplicate fan-out.
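
The expected transitions can be written down as a small lookup table. The sketch below is plain PHP with hypothetical names (`ALLOWED_STATUS_TRANSITIONS`, `canTransition()`, `assertValidCompletion()`); it models only the path listed above and is not how `OperationRunService` is known to enforce it.

```php
<?php

declare(strict_types=1);

/** Allowed status moves, taken from the transition list above. */
const ALLOWED_STATUS_TRANSITIONS = [
    'queued' => ['running'],
    'running' => ['completed'],
    'completed' => [],
];

/** Outcomes a completed run may carry, per the transition list above. */
const TERMINAL_OUTCOMES = ['succeeded', 'failed', 'blocked'];

/** True when a run may move from one status to the other. */
function canTransition(string $from, string $to): bool
{
    return in_array($to, ALLOWED_STATUS_TRANSITIONS[$from] ?? [], true);
}

/** Guard for the final step: only a running run may complete, and only with a terminal outcome. */
function assertValidCompletion(string $fromStatus, string $outcome): void
{
    if (! canTransition($fromStatus, 'completed')) {
        throw new LogicException("Cannot complete a run that is {$fromStatus}.");
    }

    if (! in_array($outcome, TERMINAL_OUTCOMES, true)) {
        throw new LogicException("Outcome {$outcome} is not terminal.");
    }
}
```

Keeping the table as data makes it easy to extend if a later version allows, for example, a run to be blocked before it ever starts; that case is deliberately not modeled here.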