# Data Model — Spec 113: Platform Ops Runbooks

This design describes the data we will read/write to implement the `/system` operator runbooks, grounded in the existing schema.

## Core persisted entities

### OperationRun (existing)

- Table: `operation_runs`
- Ownership:
  - Workspace-owned (always has `workspace_id`)
  - Tenant association is optional (`tenant_id` nullable) to support workspace/canonical runs
- Fields (existing):
  - `id`
  - `workspace_id` (FK, NOT NULL)
  - `tenant_id` (FK, nullable)
  - `user_id` (FK to `users`, nullable)
  - `initiator_name` (string)
  - `type` (string; for this feature: `findings.lifecycle.backfill`)
  - `status` (`queued|running|completed`)
  - `outcome` (`pending|succeeded|failed|blocked|...`)
  - `run_identity_hash` (string; active-run idempotency)
  - `summary_counts` (json)
  - `failure_summary` (json)
  - `context` (json)
  - `started_at`, `completed_at`

#### Summary counts contract

- Must only use keys from `App\Support\OpsUx\OperationSummaryKeys::all()`.
- v1 keys for this runbook:
  - `total` (findings scanned)
  - `processed` (findings processed)
  - `updated` (findings updated + duplicate consolidations)
  - `skipped` (findings unchanged)
  - `failed` (per-tenant job failures)
  - `tenants` (for all-tenants orchestrator: tenants targeted)

#### Context shape (for this feature)

Store these values in `operation_runs.context`:

- `runbook`:
  - `key`: `findings.lifecycle.backfill`
  - `scope`: `all_tenants` | `single_tenant`
  - `target_tenant_id`: int|null
  - `source`: `system_ui` | `cli` | `deploy_hook`
- `preflight`:
  - `affected_count`: int (findings that would change)
  - `total_count`: int (findings scanned)
  - `estimated_tenants`: int|null (for all tenants)
- `reason` (required for all-tenants and break-glass):
  - `reason_code`: `DATA_REPAIR|INCIDENT|SUPPORT|SECURITY`
  - `reason_text`: string
- `platform_initiator` (when started from `/system`):
  - `platform_user_id`: int
  - `email`: string
  - `name`: string
  - `is_break_glass`: bool

Notes:

- We intentionally do not store secrets/PII beyond operator email/name already used in auditing.
- `failure_summary` should store sanitized messages + stable reason codes, as already done by `RunFailureSanitizer`.

#### All-tenants run modeling (v1)

- All-tenants executes as a single **workspace-scoped** run (`tenant_id = null`).
- Implementation fans out to multiple tenant jobs, but they all update the same workspace run via:
  - `OperationRunService::incrementSummaryCounts()`
  - `OperationRunService::appendFailures()`
  - `OperationRunService::maybeCompleteBulkRun()`
- Per-tenant `OperationRun` rows are not required for v1 (avoids parent/child coordination).

### Audit log (existing infrastructure)

- Existing: `App\Services\Intune\AuditLogger` is already used for System login auditing.
- New audit actions (stable action IDs):
  - `platform.ops.runbooks.preflight`
  - `platform.ops.runbooks.start`
  - `platform.ops.runbooks.completed`
  - `platform.ops.runbooks.failed`
- Audit context should include:
  - runbook key, scope, affected_count, operation_run_id, platform_user_id/email, ip/user_agent.

### Alerts (existing infrastructure)

- Use `AlertDispatchService` to create `alert_deliveries` for operators.
- New alert event:
  - `event_type`: `operations.run.failed`
  - `tenant_id`: platform tenant id (to route via workspace rules)
  - `metadata`: run id, run type, scope, view-run URL

## Derived / non-persisted

### Runbook catalog

- Implementation as a PHP catalog (no DB table) with:
  - key, label, description, capability required, estimated duration (can reuse `OperationCatalog`).

## State transitions

- `OperationRun.status/outcome` transitions are owned by `OperationRunService`.
- Expected transitions (per run):
  - `queued` → `running` → `completed(succeeded|failed|blocked)`
- Locks:
  - Tenant runs: already implemented via `Cache::lock('tenantpilot:findings:lifecycle_backfill:tenant:{id}', 900)`
  - All-tenants orchestration: add a scope-level lock to prevent duplicate fan-out.
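
## Appendix: example `context` payload (illustrative)

To make the context shape described above concrete, the following is a sketch of what `operation_runs.context` might contain for an all-tenants run started from `/system`. It assumes the fields are grouped under `runbook`, `preflight`, `reason`, and `platform_initiator`. All values (counts, user id, email, name, reason text) are invented for illustration and are not taken from real data.

```json
{
  "runbook": {
    "key": "findings.lifecycle.backfill",
    "scope": "all_tenants",
    "target_tenant_id": null,
    "source": "system_ui"
  },
  "preflight": {
    "affected_count": 120,
    "total_count": 4500,
    "estimated_tenants": 12
  },
  "reason": {
    "reason_code": "DATA_REPAIR",
    "reason_text": "Example reason entered by the operator"
  },
  "platform_initiator": {
    "platform_user_id": 1,
    "email": "operator@example.com",
    "name": "Example Operator",
    "is_break_glass": false
  }
}
```

For a `single_tenant` run, `target_tenant_id` would presumably carry the tenant's id and `estimated_tenants` would be `null`. `reason` could be omitted unless the run is break-glass.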