# Data Model — Spec 113: Platform Ops Runbooks

This design describes the data we will read and write to implement the `/system` operator runbooks. It is grounded in the existing schema.

## Core persisted entities

### OperationRun (existing)
- Table: `operation_runs`
- Ownership:
  - Workspace-owned (always has `workspace_id`)
  - Tenant association is optional (`tenant_id` nullable) to support workspace/canonical runs
- Fields (existing):
  - `id`
  - `workspace_id` (FK, NOT NULL)
  - `tenant_id` (FK, nullable)
  - `user_id` (FK to `users`, nullable)
  - `initiator_name` (string)
  - `type` (string; for this feature: `findings.lifecycle.backfill`)
  - `status` (`queued|running|completed`)
  - `outcome` (`pending|succeeded|failed|blocked|...`)
  - `run_identity_hash` (string; active-run idempotency)
  - `summary_counts` (json)
  - `failure_summary` (json)
  - `context` (json)
  - `started_at`, `completed_at`

#### Summary counts contract
- Must only use keys from `App\Support\OpsUx\OperationSummaryKeys::all()`.
- v1 keys for this runbook:
  - `total` (findings scanned)
  - `processed` (findings processed)
  - `updated` (findings updated + duplicate consolidations)
  - `skipped` (findings unchanged)
  - `failed` (per-tenant job failures)
  - `tenants` (for all-tenants orchestrator: tenants targeted)
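
The contract above can be sketched as a small validator. This is a language-neutral illustration in Python, not the PHP implementation: `ALLOWED_SUMMARY_KEYS` is a hypothetical stand-in for `OperationSummaryKeys::all()` that contains only the six v1 keys listed above, and the non-negative integer check is an assumption rather than something the spec states.

```python
# Hypothetical stand-in for App\Support\OpsUx\OperationSummaryKeys::all().
# Only the six v1 keys named in this spec are included.
ALLOWED_SUMMARY_KEYS = frozenset(
    {"total", "processed", "updated", "skipped", "failed", "tenants"}
)


def validate_summary_counts(counts: dict) -> dict:
    """Return counts unchanged if every key is allowed; raise ValueError otherwise."""
    unknown = sorted(set(counts) - ALLOWED_SUMMARY_KEYS)
    if unknown:
        raise ValueError(f"unknown summary keys: {unknown}")
    for key, value in counts.items():
        # Assumption: counts are non-negative integers (bool is rejected explicitly
        # because it is a subclass of int in Python).
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"summary count for {key!r} must be a non-negative int")
    return counts
```

Rejecting unknown keys at write time, rather than silently dropping them, would make a typo in a job surface as an error instead of a missing number in the UI.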

#### Context shape (for this feature)
Store these values in `operation_runs.context`:

- `runbook`:
  - `key`: `findings.lifecycle.backfill`
  - `scope`: `all_tenants` | `single_tenant`
  - `target_tenant_id`: int|null
  - `source`: `system_ui` | `cli` | `deploy_hook`
- `preflight`:
  - `affected_count`: int (findings that would change)
  - `total_count`: int (findings scanned)
  - `estimated_tenants`: int|null (for all tenants)
- `reason` (required for all-tenants runs and break-glass runs):
  - `reason_code`: `DATA_REPAIR|INCIDENT|SUPPORT|SECURITY`
  - `reason_text`: string
- `platform_initiator` (when started from `/system`):
  - `platform_user_id`: int
  - `email`: string
  - `name`: string
  - `is_break_glass`: bool

Notes:
- We intentionally do not store secrets/PII beyond operator email/name already used in auditing.
- `failure_summary` should store sanitized messages + stable reason codes, as already done by `RunFailureSanitizer`.
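
To make the context shape concrete, here is a minimal sketch of a builder that assembles the structure and enforces the "reason required" rule. It is written in Python purely for illustration; `build_context` and its parameter names are hypothetical, and the real implementation would live in the PHP codebase.

```python
from typing import Optional

SCOPES = {"all_tenants", "single_tenant"}
SOURCES = {"system_ui", "cli", "deploy_hook"}
REASON_CODES = {"DATA_REPAIR", "INCIDENT", "SUPPORT", "SECURITY"}


def build_context(
    scope: str,
    source: str,
    affected_count: int,
    total_count: int,
    target_tenant_id: Optional[int] = None,
    estimated_tenants: Optional[int] = None,
    reason: Optional[dict] = None,
    platform_initiator: Optional[dict] = None,
) -> dict:
    """Assemble the operation_runs.context payload described above (hypothetical helper)."""
    if scope not in SCOPES:
        raise ValueError(f"invalid scope: {scope}")
    if source not in SOURCES:
        raise ValueError(f"invalid source: {source}")

    is_break_glass = bool(platform_initiator and platform_initiator.get("is_break_glass"))
    if (scope == "all_tenants" or is_break_glass) and reason is None:
        raise ValueError("reason is required for all-tenants and break-glass runs")
    if reason is not None and reason.get("reason_code") not in REASON_CODES:
        raise ValueError("invalid reason_code")

    context = {
        "runbook": {
            "key": "findings.lifecycle.backfill",
            "scope": scope,
            "target_tenant_id": target_tenant_id,
            "source": source,
        },
        "preflight": {
            "affected_count": affected_count,
            "total_count": total_count,
            "estimated_tenants": estimated_tenants,
        },
    }
    # Optional blocks are only stored when supplied.
    if reason is not None:
        context["reason"] = dict(reason)
    if platform_initiator is not None:
        context["platform_initiator"] = dict(platform_initiator)
    return context
```

A single-tenant CLI run would call `build_context("single_tenant", "cli", 4, 20, target_tenant_id=7)`, while an all-tenants run without a `reason` is rejected before anything is persisted.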

#### All-tenants run modeling (v1)
- An all-tenants run executes as a single **workspace-scoped** run (`tenant_id = null`).
- Implementation fans out to multiple tenant jobs, but they all update the same workspace run via:
  - `OperationRunService::incrementSummaryCounts()`
  - `OperationRunService::appendFailures()`
  - `OperationRunService::maybeCompleteBulkRun()`
- Per-tenant `OperationRun` rows are not required for v1 (avoids parent/child coordination).
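
The fan-out model can be illustrated with a small in-memory simulation. This sketch is in Python and does not reproduce the real `OperationRunService`: the function names only mirror the three methods listed above, `tenants_finished` is hypothetical bookkeeping, and both the completion rule (complete once every targeted tenant has reported) and the outcome rule (failed if any failure was recorded) are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class WorkspaceRun:
    tenants_targeted: int
    status: str = "running"
    outcome: str = "pending"
    summary_counts: dict = field(default_factory=dict)
    failures: list = field(default_factory=list)
    tenants_finished: int = 0  # hypothetical bookkeeping, not a spec field


def increment_summary_counts(run: WorkspaceRun, delta: dict) -> None:
    """Add each tenant job's counts onto the shared run."""
    for key, value in delta.items():
        run.summary_counts[key] = run.summary_counts.get(key, 0) + value


def append_failures(run: WorkspaceRun, failures: list) -> None:
    run.failures.extend(failures)


def maybe_complete_bulk_run(run: WorkspaceRun) -> bool:
    """Complete the run once all tenants have reported (assumed rule)."""
    if run.status == "completed" or run.tenants_finished < run.tenants_targeted:
        return False
    run.status = "completed"
    run.outcome = "failed" if run.failures else "succeeded"  # assumed outcome rule
    return True


def finish_tenant_job(run: WorkspaceRun, counts: dict, failures: list = ()) -> bool:
    """What one tenant job does when it ends; returns True if it completed the run."""
    increment_summary_counts(run, counts)
    append_failures(run, list(failures))
    run.tenants_finished += 1
    return maybe_complete_bulk_run(run)
```

The simulation is single-threaded. With real queue workers, the increments and the completion check would need to be atomic so that two jobs finishing at the same time neither lose counts nor both complete the run.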

### Audit log (existing infrastructure)
- Existing: `App\Services\Intune\AuditLogger` is already used for System login auditing.
- New audit actions (stable action IDs):
  - `platform.ops.runbooks.preflight`
  - `platform.ops.runbooks.start`
  - `platform.ops.runbooks.completed`
  - `platform.ops.runbooks.failed`
- Audit context should include:
  - runbook key, scope, affected_count, operation_run_id, platform_user_id/email, ip/user_agent.

### Alerts (existing infrastructure)
- Use `AlertDispatchService` to create `alert_deliveries` for operators.
- New alert event:
  - `event_type`: `operations.run.failed`
  - `tenant_id`: platform tenant id (to route via workspace rules)
  - `metadata`: run id, run type, scope, view-run URL
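
As an illustration of the alert event above, the payload might be assembled like this. The sketch is in Python and the helper is hypothetical; the metadata key names (`run_id`, `run_type`, `scope`, `view_run_url`) are assumptions derived from the list, and the actual `AlertDispatchService` input format is not shown in this spec.

```python
def build_run_failed_alert(
    platform_tenant_id: int,
    run_id: int,
    run_type: str,
    scope: str,
    view_run_url: str,
) -> dict:
    """Build an operations.run.failed event payload (hypothetical shape)."""
    return {
        "event_type": "operations.run.failed",
        # Platform tenant id, so the event routes via workspace rules.
        "tenant_id": platform_tenant_id,
        "metadata": {
            "run_id": run_id,
            "run_type": run_type,
            "scope": scope,
            "view_run_url": view_run_url,
        },
    }
```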

## Derived / non-persisted

### Runbook catalog
- Implemented as a PHP catalog (no DB table) with:
  - key, label, description, required capability, estimated duration (can reuse `OperationCatalog`).

## State transitions
- `OperationRun.status/outcome` transitions are owned by `OperationRunService`.
- Expected transitions (per run):
  - `status`: `queued` → `running` → `completed`, with `outcome` set to `succeeded|failed|blocked` on completion
- Locks:
  - Tenant runs: already implemented via `Cache::lock('tenantpilot:findings:lifecycle_backfill:tenant:{id}', 900)` (lock expires after 900 seconds)
  - All-tenants orchestration: add a scope-level lock to prevent duplicate fan-out.
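
The transition rules and the scope-level lock can be sketched as follows. This is a Python illustration, not the `OperationRunService` or Laravel cache implementation: it assumes `outcome` stays `pending` until completion, `TtlLock` is a hypothetical in-memory stand-in for a cache lock with a TTL, and the all-tenants lock key shown in the usage note is invented because the spec does not name one.

```python
import time

ALLOWED_TRANSITIONS = {
    "queued": {"running"},
    "running": {"completed"},
    "completed": set(),
}
TERMINAL_OUTCOMES = {"succeeded", "failed", "blocked"}


def transition(status: str, new_status: str, outcome: str = "pending") -> tuple:
    """Validate a status change and return the new (status, outcome) pair."""
    if new_status not in ALLOWED_TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition: {status} -> {new_status}")
    if new_status == "completed" and outcome not in TERMINAL_OUTCOMES:
        raise ValueError("a completed run needs a terminal outcome")
    if new_status != "completed" and outcome != "pending":
        # Assumption: the outcome stays pending until the run completes.
        raise ValueError("outcome must stay pending until completion")
    return new_status, outcome


class TtlLock:
    """Non-blocking in-memory lock with expiry (stand-in for a cache lock)."""

    def __init__(self, clock=time.monotonic):
        self._expires = {}
        self._clock = clock

    def acquire(self, key: str, ttl_seconds: float) -> bool:
        now = self._clock()
        expires = self._expires.get(key)
        if expires is not None and expires > now:
            return False  # someone else still holds the lock
        self._expires[key] = now + ttl_seconds
        return True

    def release(self, key: str) -> None:
        self._expires.pop(key, None)
```

An orchestrator would call something like `locks.acquire("tenantpilot:findings:lifecycle_backfill:all_tenants", 900)` (hypothetical key) before fanning out and skip the fan-out when it returns `False`, which is what prevents a duplicate all-tenants run.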