TenantAtlas/specs/113-platform-ops-runbooks/data-model.md
ahmido 200498fa8e feat(113): Platform Ops Runbooks — UX Polish (Filament-native, system theme, live scope) (#137)
## Summary

Implements and polishes the Platform Ops Runbooks feature (Spec 113) — the operator control plane for safe backfills and data repair from `/system`.

## Changes

### UX Polish (Phase 7 — US4)
- **Filament-native components**: Rewrote `runbooks.blade.php` and `view-run.blade.php` using `<x-filament::section>` instead of raw Tailwind div cards. Cards now render correctly with Filament's built-in borders, shadows and dark mode.
- **System panel theme**: Created `resources/css/filament/system/theme.css` and registered `->viteTheme()` on `SystemPanelProvider`. The system panel previously had no theme CSS registered — Tailwind utility classes weren't compiled for its views, causing the warning icon SVG to expand to full container size.
- **Live scope selector**: Added `->live()` to the scope `Radio` field so "Single tenant" immediately reveals the tenant search dropdown without requiring a Submit first.

### Core Feature (Phases 1–6, previously shipped)
- `/system/ops/runbooks` — runbook catalog, preflight, run with typed confirmation + reason
- `/system/ops/runs` — run history table with status/outcome badges
- `/system/ops/runs/{id}` — run detail view with summary counts, failures, collapsible context
- `FindingsLifecycleBackfillRunbookService` — preflight + execution logic
- `AllowedTenantUniverse` — scopes tenant picker to non-platform tenants only
- RBAC: `platform.ops.view`, `platform.runbooks.view`, `platform.runbooks.run`, `platform.runbooks.findings.lifecycle_backfill`
- Rate-limited `/system/login` (10/min per IP+username)
- Distinct session cookie for `/system` isolation

## Test Coverage
- 16 tests / 141 assertions — all passing
- Covers: page access, RBAC, preflight, run dispatch, scope selector, run detail, run list

## Checklist
- [x] Filament v5 / Livewire v4 compliant
- [x] Provider registered in `bootstrap/providers.php`
- [x] Destructive actions require confirmation (`->requiresConfirmation()`)
- [x] System panel theme registered (`viteTheme`)
- [x] Pint clean
- [x] Tests pass

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #137
2026-02-27 01:11:25 +00:00


Data Model — Spec 113: Platform Ops Runbooks

This design describes the data we will read/write to implement the /system operator runbooks, grounded in the existing schema.

Core persisted entities

OperationRun (existing)

  • Table: operation_runs
  • Ownership:
    • Workspace-owned (always has workspace_id)
    • Tenant association is optional (tenant_id nullable) to support workspace/canonical runs
  • Fields (existing):
    • id
    • workspace_id (FK, NOT NULL)
    • tenant_id (FK, nullable)
    • user_id (FK to users, nullable)
    • initiator_name (string)
    • type (string; for this feature: findings.lifecycle.backfill)
    • status (queued|running|completed)
    • outcome (pending|succeeded|failed|blocked|...)
    • run_identity_hash (string; active-run idempotency)
    • summary_counts (json)
    • failure_summary (json)
    • context (json)
    • started_at, completed_at

Summary counts contract

  • Must only use keys from App\Support\OpsUx\OperationSummaryKeys::all().
  • v1 keys for this runbook:
    • total (findings scanned)
    • processed (findings processed)
    • updated (findings updated + duplicate consolidations)
    • skipped (findings unchanged)
    • failed (per-tenant job failures)
    • tenants (for all-tenants orchestrator: tenants targeted)
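The key contract above could be enforced before persisting counts. The following is a minimal sketch, not the project's implementation: `ALLOWED_SUMMARY_KEYS` is a hypothetical stand-in for `App\Support\OpsUx\OperationSummaryKeys::all()` that lists only the v1 keys named here, and `sanitizeSummaryCounts()` is an illustrative helper name. Dropping unknown keys (rather than rejecting the whole payload) is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for OperationSummaryKeys::all(); only the v1 keys
// named in this spec are listed.
const ALLOWED_SUMMARY_KEYS = ['total', 'processed', 'updated', 'skipped', 'failed', 'tenants'];

/**
 * Keep only allowed keys and coerce values to non-negative integers.
 *
 * @param array<string, mixed> $counts
 * @return array<string, int>
 */
function sanitizeSummaryCounts(array $counts): array
{
    $clean = [];
    foreach ($counts as $key => $value) {
        if (!in_array($key, ALLOWED_SUMMARY_KEYS, true)) {
            continue; // assumption: unknown keys are dropped, not persisted
        }
        $clean[$key] = max(0, (int) $value);
    }

    return $clean;
}
```

Under these assumptions, a payload containing a key outside the list is reduced to its allowed keys before it reaches summary_counts.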

Context shape (for this feature)

Store these values in operation_runs.context:

  • runbook:
    • key: findings.lifecycle.backfill
    • scope: all_tenants | single_tenant
    • target_tenant_id: int|null
    • source: system_ui | cli | deploy_hook
  • preflight:
    • affected_count: int (findings that would change)
    • total_count: int (findings scanned)
    • estimated_tenants: int|null (for all tenants)
  • reason (required for all-tenants and break-glass):
    • reason_code: DATA_REPAIR|INCIDENT|SUPPORT|SECURITY
    • reason_text: string
  • platform_initiator (when started from /system):
    • platform_user_id: int
    • email: string
    • name: string
    • is_break_glass: bool
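As an illustration of the shape above, the sketch below assembles the runbook, preflight, and reason parts of the context as a plain PHP array. `buildRunbookContext()` is a hypothetical helper, and its validation rules (a target tenant for single_tenant, a reason for all_tenants) are assumptions read off the list, not confirmed behavior. The platform_initiator part is omitted for brevity.

```php
<?php

declare(strict_types=1);

/**
 * Build part of operation_runs.context for findings.lifecycle.backfill.
 * Hypothetical helper: name and validation rules are assumptions.
 *
 * @param array{reason_code: string, reason_text: string}|null $reason
 * @return array<string, mixed>
 */
function buildRunbookContext(
    string $scope,
    ?int $targetTenantId,
    string $source,
    int $affectedCount,
    int $totalCount,
    ?int $estimatedTenants = null,
    ?array $reason = null
): array {
    if (!in_array($scope, ['all_tenants', 'single_tenant'], true)) {
        throw new InvalidArgumentException("Unknown scope: {$scope}");
    }
    if ($scope === 'single_tenant' && $targetTenantId === null) {
        throw new InvalidArgumentException('single_tenant requires target_tenant_id');
    }
    if ($scope === 'all_tenants' && $reason === null) {
        throw new InvalidArgumentException('all_tenants requires a reason');
    }

    $context = [
        'runbook' => [
            'key' => 'findings.lifecycle.backfill',
            'scope' => $scope,
            // target_tenant_id is only meaningful for single-tenant runs
            'target_tenant_id' => $scope === 'single_tenant' ? $targetTenantId : null,
            'source' => $source,
        ],
        'preflight' => [
            'affected_count' => $affectedCount,
            'total_count' => $totalCount,
            'estimated_tenants' => $estimatedTenants,
        ],
    ];

    if ($reason !== null) {
        $context['reason'] = [
            'reason_code' => $reason['reason_code'],
            'reason_text' => $reason['reason_text'],
        ];
    }

    return $context;
}
```

A single-tenant call such as `buildRunbookContext('single_tenant', 7, 'system_ui', 3, 10)` yields a context without a reason entry, while an all-tenants call without a reason throws.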

Notes:

  • We intentionally do not store secrets/PII beyond operator email/name already used in auditing.
  • failure_summary should store sanitized messages + stable reason codes, as already done by RunFailureSanitizer.

All-tenants run modeling (v1)

  • All-tenants executes as a single workspace-scoped run (tenant_id = null).
  • Implementation fans out to multiple tenant jobs, but they all update the same workspace run via:
    • OperationRunService::incrementSummaryCounts()
    • OperationRunService::appendFailures()
    • OperationRunService::maybeCompleteBulkRun()
  • Per-tenant OperationRun rows are not required for v1 (avoids parent/child coordination).
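The single-run fan-out described above can be pictured with an in-memory model. This is only a sketch: `InMemoryWorkspaceRun` is hypothetical, its methods merely echo the names of the `OperationRunService` methods without claiming their real signatures, and the rule "any failure makes the outcome failed" is an assumption.

```php
<?php

declare(strict_types=1);

// Hypothetical in-memory stand-in for one workspace-scoped run (tenant_id = null).
final class InMemoryWorkspaceRun
{
    public string $status = 'running';
    public string $outcome = 'pending';
    /** @var array<string, int> */
    public array $summaryCounts = [];
    /** @var list<array{tenant_id: int, reason_code: string}> */
    public array $failures = [];
    private int $finishedTenants = 0;
    private int $targetedTenants;

    public function __construct(int $targetedTenants)
    {
        $this->targetedTenants = $targetedTenants;
        $this->summaryCounts['tenants'] = $targetedTenants;
    }

    /** @param array<string, int> $delta */
    public function incrementSummaryCounts(array $delta): void
    {
        foreach ($delta as $key => $amount) {
            $this->summaryCounts[$key] = ($this->summaryCounts[$key] ?? 0) + $amount;
        }
    }

    public function appendFailure(int $tenantId, string $reasonCode): void
    {
        $this->failures[] = ['tenant_id' => $tenantId, 'reason_code' => $reasonCode];
        $this->incrementSummaryCounts(['failed' => 1]);
    }

    // Called once per finished tenant job; completes the run after the last one.
    public function maybeComplete(): void
    {
        $this->finishedTenants++;
        if ($this->finishedTenants < $this->targetedTenants) {
            return;
        }
        $this->status = 'completed';
        // Assumption: a single tenant failure marks the whole run as failed.
        $this->outcome = $this->failures === [] ? 'succeeded' : 'failed';
    }
}
```

In a real queue the tenant jobs run concurrently, so the increments and the completion check would need to be atomic at the database level; the sketch ignores that and only shows how several jobs can share one run row.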

Audit log (existing infrastructure)

  • Existing: App\Services\Intune\AuditLogger is already used for System login auditing.
  • New audit actions (stable action IDs):
    • platform.ops.runbooks.preflight
    • platform.ops.runbooks.start
    • platform.ops.runbooks.completed
    • platform.ops.runbooks.failed
  • Audit context should include:
    • runbook key, scope, affected_count, operation_run_id, platform_user_id/email, ip/user_agent.

Alerts (existing infrastructure)

  • Use AlertDispatchService to create alert_deliveries for operators.
  • New alert event:
    • event_type: operations.run.failed
    • tenant_id: platform tenant id (to route via workspace rules)
    • metadata: run id, run type, scope, view-run URL

Derived / non-persisted

Runbook catalog

  • Implemented as a PHP catalog (no DB table) with:
    • key, label, description, capability required, estimated duration (can reuse OperationCatalog).
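A code-only catalog of this kind might look like the sketch below. The array shape, the function names, and the label and description strings are hypothetical placeholders; the capability string is the one listed for this runbook in the RBAC notes. The estimated duration is left as null because the spec does not state one.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical catalog shape, keyed by runbook key.
 *
 * @return array<string, array{label: string, description: string, capability: string, estimated_duration_seconds: ?int}>
 */
function runbookCatalog(): array
{
    return [
        'findings.lifecycle.backfill' => [
            'label' => 'Findings lifecycle backfill',              // placeholder text
            'description' => 'Backfill findings lifecycle data.',   // placeholder text
            'capability' => 'platform.runbooks.findings.lifecycle_backfill',
            'estimated_duration_seconds' => null,                   // not specified in this spec
        ],
    ];
}

/**
 * Return only the runbooks whose required capability the operator holds.
 *
 * @param list<string> $capabilities
 */
function runbooksVisibleTo(array $capabilities): array
{
    return array_filter(
        runbookCatalog(),
        fn (array $entry): bool => in_array($entry['capability'], $capabilities, true)
    );
}
```

Keeping the catalog in code means a new runbook ships with the deploy that contains its service, with no migration or seed step.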

State transitions

  • OperationRun.status/outcome transitions are owned by OperationRunService.
  • Expected transitions (per run):
    • queued → running → completed (succeeded|failed|blocked)
  • Locks:
    • Tenant runs: already implemented via Cache::lock('tenantpilot:findings:lifecycle_backfill:tenant:{id}', 900)
    • All-tenants orchestration: add a scope-level lock to prevent duplicate fan-out.
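The expected transitions and the tenant lock key can be sketched as a small guard. Everything here is illustrative: `assertTransition()` and `tenantLockKey()` are hypothetical helpers, the strictly linear status map is an assumption taken from the list above, and the real checks live in `OperationRunService`.

```php
<?php

declare(strict_types=1);

// Assumed linear status map: queued, then running, then completed.
const STATUS_TRANSITIONS = [
    'queued' => ['running'],
    'running' => ['completed'],
    'completed' => [],
];

// Outcomes the spec lists for a completed run.
const TERMINAL_OUTCOMES = ['succeeded', 'failed', 'blocked'];

function assertTransition(string $from, string $to, string $outcome = 'pending'): void
{
    if (!in_array($to, STATUS_TRANSITIONS[$from] ?? [], true)) {
        throw new LogicException("Illegal status transition {$from} -> {$to}");
    }
    if ($to === 'completed' && !in_array($outcome, TERMINAL_OUTCOMES, true)) {
        throw new LogicException("completed requires a terminal outcome, got {$outcome}");
    }
}

// Builds the per-tenant lock key quoted above for a concrete tenant id.
function tenantLockKey(int $tenantId): string
{
    return "tenantpilot:findings:lifecycle_backfill:tenant:{$tenantId}";
}
```

For example, `assertTransition('running', 'completed', 'succeeded')` passes, while jumping from queued straight to completed throws under this assumed map.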