TenantAtlas/specs/053-unify-runs-monitoring/data-model.md
ahmido 30ad57baab feat/053-unify-runs-monitoring (#60)
Summary

This PR introduces Unified Operations Runs + Monitoring Hub (053).

Goal: Standardize how long-running operations are tracked and monitored using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, and surface it in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).

Phase 1 adoption scope (per spec):
	•	Drift generation (drift.generate)
	•	Backup Set “Add Policies” (backup_set.add_policies)

Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.

⸻

What changed

Monitoring / Operations hub
	•	Moved/organized run monitoring under Monitoring → Operations
	•	Added:
	•	status buckets (queued / running / succeeded / partially succeeded / failed)
	•	filters (run type, status bucket, time range)
	•	run detail “Related” links (e.g. Drift findings, Backup Set context)
	•	All hub pages are DB-only and view-only (no rerun/cancel/delete actions)

Canonical run semantics
	•	Added canonical helpers on BulkOperationRun:
	•	runType() (resource.action)
	•	statusBucket() derived from status + counts (testable semantics)

Drift integration (Phase 1)
	•	Drift generation start behavior now:
	•	creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
	•	dispatches generation job
	•	emits DB notifications including “View run” link
	•	On generation failure: stores sanitized failure entries + sends failure notification

Permissions / tenant isolation
	•	Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
	•	Readonly can view runs but cannot start drift generation

⸻

Tests

Added/updated Pest coverage:
	•	BulkOperationRunStatusBucketTest.php
	•	DriftGenerationDispatchTest.php
	•	GenerateDriftFindingsJobNotificationTest.php
	•	RunAuthorizationTenantIsolationTest.php

Validation run locally:
	•	./vendor/bin/pint --dirty
	•	targeted tests from feature quickstart / drift monitoring tests

⸻

Manual QA
	1.	Go to Monitoring → Operations
	•	verify filters (run type / status / time range)
	•	verify run detail shows counts + sanitized failures + “Related” links
	2.	Open Drift Landing
	•	with >=2 successful inventory runs for scope: should queue drift generation + show notification with “View run”
	•	as readonly: should not start generation
	3.	Run detail
	•	drift.generate runs show “Drift findings” related link
	•	failure entries are sanitized (no secrets/tokens/raw payload dumps)

⸻

Notes / Ops
	•	Queue workers must be restarted after deploy so they load the new code:
	•	php artisan queue:restart (or Sail equivalent)
	•	This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.

⸻

Spec / Docs
	•	SpecKit artifacts added under specs/053-unify-runs-monitoring/
	•	Checklists are complete:
	•	requirements checklist PASS
	•	writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60
2026-01-16 15:10:31 +00:00

# Data Model: Unified Operations Runs + Monitoring Hub (053)
This feature primarily standardizes and surfaces existing run records for long-running operations, and links operators to the underlying business artifacts (e.g., drift findings).
## Entities
## 1) BulkOperationRun (`bulk_operation_runs`)

**Purpose:** Canonical tenant-scoped run record for long-running operations (Phase 1).

**Model:** `App\Models\BulkOperationRun`

**Key fields (existing):**

- `id` (PK)
- `tenant_id` (FK → tenants)
- `user_id` (FK → users)
- `resource` (string) — e.g. `drift`, `backup_set`
- `action` (string) — e.g. `generate`, `add_policies`
- `idempotency_key` (string|null)
- `status` (string) — `pending`, `running`, `completed`, `completed_with_errors`, `failed`, `aborted`
- counters: `total_items`, `processed_items`, `succeeded`, `failed`, `skipped`
- `item_ids` (jsonb) — stable identifiers for the items/scope of the run
  - Example (`drift.generate`): `{ scope_key, baseline_run_id, current_run_id }`
  - Example (`backup_set.add_policies`): `{ backup_set_id, policy_ids, options }`
- `failures` (jsonb|null) — sanitized failure details (including per-item failures for itemized operations)
- `audit_log_id` (FK → audit_logs|null)
- `created_at`, `updated_at`
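
The field list above can be read as a plain record. The following is a minimal sketch in Python (the real model is the PHP class `App\Models\BulkOperationRun`); the defaults and the `run_type` method name are assumptions made for illustration, and the timestamps are omitted for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class BulkOperationRun:
    """Illustrative mirror of the bulk_operation_runs columns (not the real model)."""

    id: int
    tenant_id: int
    user_id: int
    resource: str                      # e.g. "drift", "backup_set"
    action: str                        # e.g. "generate", "add_policies"
    status: str = "pending"            # default is an assumption
    idempotency_key: Optional[str] = None
    total_items: int = 0
    processed_items: int = 0
    succeeded: int = 0
    failed: int = 0
    skipped: int = 0
    item_ids: dict[str, Any] = field(default_factory=dict)
    failures: Optional[list[dict[str, Any]]] = None
    audit_log_id: Optional[int] = None

    def run_type(self) -> str:
        """Canonical run type in "resource.action" form."""
        return f"{self.resource}.{self.action}"
```

For example, a record with `resource="drift"` and `action="generate"` yields the run type `drift.generate`.
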

**Relationships:**

- `BulkOperationRun belongsTo Tenant`
- `BulkOperationRun belongsTo User`
- `BulkOperationRun belongsTo AuditLog` (nullable)

**Uniqueness / idempotency:**

- Active-run uniqueness enforced via a partial unique index on `(tenant_id, idempotency_key)` for active statuses.
- Idempotency keys are deterministic and stable per tenant + operation type + scope.
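
One way to obtain a deterministic key is to hash a canonical serialization of tenant, operation type, and scope. The sketch below is an assumption about how that could work, not the project's actual derivation: the function names, the SHA-256 choice, and the set of "active" statuses are all hypothetical. `find_active_run` imitates in memory what the partial unique index guarantees in the database.

```python
from __future__ import annotations

import hashlib
import json

# Statuses treated as "active" for uniqueness. This is an assumption;
# the source does not list which statuses the partial index covers.
ACTIVE_STATUSES = {"pending", "running"}


def build_idempotency_key(tenant_id: int, run_type: str, scope: dict) -> str:
    """Derive a stable key from tenant + operation type + scope."""
    # sort_keys and fixed separators make the JSON independent of
    # dict insertion order, so equal scopes always hash identically.
    canonical = json.dumps(
        {"tenant_id": tenant_id, "run_type": run_type, "scope": scope},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def find_active_run(runs: list[dict], tenant_id: int, key: str) -> dict | None:
    """Return the active run for (tenant_id, key), or None if there is none."""
    for run in runs:
        if (
            run["tenant_id"] == tenant_id
            and run["idempotency_key"] == key
            and run["status"] in ACTIVE_STATUSES
        ):
            return run
    return None
```

A caller would compute the key first, reuse the run returned by `find_active_run` when one exists, and create a new run otherwise. Finished runs do not block a new run with the same key.
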

**State transitions (storage):**

- `pending → running → completed | completed_with_errors | failed | aborted`
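
The transition chain can be expressed as a lookup table. This sketch takes the strictest reading of the line above (a run must pass through `running` before reaching any terminal status); whether the real code also allows `pending` to go directly to `failed` or `aborted` is not stated, so treat that restriction as an assumption. The function name is hypothetical.

```python
# Allowed storage-level transitions, read literally from the chain above.
ALLOWED_TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "completed_with_errors", "failed", "aborted"},
    # Terminal statuses have no outgoing transitions.
    "completed": set(),
    "completed_with_errors": set(),
    "failed": set(),
    "aborted": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```
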

**Status mapping (operator UI semantics):**

- `pending` → `queued`
- `running` → `running`
- `completed` → `succeeded`
- `completed_with_errors` → `partially succeeded`
- `failed`/`aborted` → `failed`
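
The mapping above is a pure function of the stored status, which makes it easy to test. The sketch below shows only that table lookup; the function name is an assumption, and any additional use of the counters by the real helper is not modeled here.

```python
# Storage status -> operator-facing bucket, copied from the mapping above.
STATUS_TO_BUCKET = {
    "pending": "queued",
    "running": "running",
    "completed": "succeeded",
    "completed_with_errors": "partially succeeded",
    "failed": "failed",
    "aborted": "failed",
}


def status_bucket(status: str) -> str:
    """Map a stored run status to its UI bucket; reject unknown statuses."""
    try:
        return STATUS_TO_BUCKET[status]
    except KeyError:
        raise ValueError(f"unknown run status: {status!r}") from None
```

Raising on an unknown status, rather than falling back to a default bucket, is a design choice in this sketch: it surfaces a new storage status in tests instead of silently misreporting it in the hub.
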

**Failure entry shape (sanitized):**

- `reason_code` (string, stable) + `reason` (short sanitized message)
- for itemized runs: `item_id` per failure entry (and optional `type=skipped` for non-failure outcomes)
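
A builder for entries of this shape might look like the sketch below. The function name, the length cap, and the example reason code are hypothetical. Collapsing whitespace and truncating the message is only a stand-in for sanitization: the actual redaction rules are not described in this document.

```python
from __future__ import annotations

MAX_REASON_LENGTH = 200  # assumed cap; the source does not specify one


def failure_entry(
    reason_code: str,
    reason: str,
    item_id: str | None = None,
    skipped: bool = False,
) -> dict:
    """Build one failure entry in the shape described above."""
    entry = {
        "reason_code": reason_code,
        # Collapse whitespace and cap the length so the stored message
        # stays short. Real secret redaction would happen before this.
        "reason": " ".join(reason.split())[:MAX_REASON_LENGTH],
    }
    if item_id is not None:
        entry["item_id"] = item_id    # itemized runs only
    if skipped:
        entry["type"] = "skipped"     # non-failure outcome
    return entry
```
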
---
## 2) Finding (`findings`) — Drift results

**Purpose:** Persisted analytic findings; drift findings are the primary “related artifact” for Drift generation runs.

**Model:** `App\Models\Finding`

**Key fields (existing, drift-related):**

- `id` (PK)
- `tenant_id` (FK → tenants)
- `finding_type` (`drift`)
- `scope_key` (string)
- `baseline_run_id` (FK → inventory_sync_runs|null)
- `current_run_id` (FK → inventory_sync_runs|null)
- `fingerprint` (string; deterministic; unique per tenant)
- `subject_type`, `subject_external_id`
- `status` (`new|acknowledged`)
- `evidence_jsonb` (jsonb; sanitized allowlist)
- `created_at`, `updated_at`

**Relationships:**

- `Finding belongsTo Tenant`
- `Finding belongsTo InventorySyncRun` via `baseline_run_id` and `current_run_id` (nullable)

**Notes:**

- Phase 1 can link operators from the drift run to findings through scope/baseline/current identifiers without introducing a new DB foreign key.
- If later needed, introduce an explicit link (e.g., `findings.bulk_operation_run_id`) to make navigation and reporting easier.
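
Without a foreign key, the "Related" link can be resolved by matching the identifiers stored in the run's `item_ids` against the finding columns. The following is a minimal in-memory sketch of that match, with a hypothetical function name; a real implementation would presumably be a tenant-scoped database query.

```python
MATCH_KEYS = ("scope_key", "baseline_run_id", "current_run_id")


def related_drift_findings(
    tenant_id: int, run_item_ids: dict, findings: list[dict]
) -> list[dict]:
    """Return drift findings that share the run's tenant and scope identifiers."""
    return [
        f
        for f in findings
        if f.get("tenant_id") == tenant_id
        and f.get("finding_type") == "drift"
        and all(f.get(k) == run_item_ids.get(k) for k in MATCH_KEYS)
    ]
```

Filtering on `tenant_id` first keeps the lookup tenant-scoped even if two tenants happen to share a `scope_key`.
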
---
## 3) InventorySyncRun (`inventory_sync_runs`) — Drift inputs

**Purpose:** “Last observed” inventory run records used as baseline/current inputs for drift comparisons.

**Model:** `App\Models\InventorySyncRun`

**Relevant fields (existing):**

- `tenant_id`
- `status`
- `selection_hash` (used as `scope_key`)
- `finished_at`
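
As an illustration of how these fields could feed a drift comparison, the sketch below picks the two most recent successful runs for a scope, taking the older one as baseline and the newer one as current. That selection rule, the function name, and the `"success"` status value are assumptions; the document only states that these runs serve as baseline/current inputs.

```python
from __future__ import annotations


def pick_baseline_and_current(
    runs: list[dict],
    tenant_id: int,
    scope_key: str,
    success_status: str = "success",  # placeholder; the real value is not given
) -> tuple[dict, dict] | None:
    """Return (baseline, current) for a scope, or None if fewer than two runs qualify."""
    candidates = sorted(
        (
            r
            for r in runs
            if r["tenant_id"] == tenant_id
            and r["selection_hash"] == scope_key
            and r["status"] == success_status
            and r["finished_at"] is not None
        ),
        key=lambda r: r["finished_at"],
    )
    if len(candidates) < 2:
        return None
    # Older of the last two runs is the baseline, the newest is current.
    return candidates[-2], candidates[-1]
```
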
---
## 4) Notification Event (DB notifications)

**Purpose:** Persist run lifecycle notifications (queued/completed) linking operators to the run detail page.

**Storage:** Laravel Notifications (DB channel).

**Payload (target):**

- tenant identifier
- run identifier + type (`bulk_operation_run`)
- status bucket (queued/running/succeeded/partial/failed)
- summary counts and a safe error summary (when applicable)
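
The target payload can be sketched as a small builder over a run record. All key names below are hypothetical, as is the function name; the bucket labels follow the list above. The error summary uses only the count and the first `reason_code`, which is one conservative way to keep it safe.

```python
# Storage status -> payload bucket, using the labels from the payload list.
PAYLOAD_BUCKETS = {
    "pending": "queued",
    "running": "running",
    "completed": "succeeded",
    "completed_with_errors": "partial",
    "failed": "failed",
    "aborted": "failed",
}

COUNTERS = ("total_items", "processed_items", "succeeded", "failed", "skipped")


def build_run_notification(run: dict) -> dict:
    """Build a notification payload for a run (key names are illustrative)."""
    payload = {
        "tenant_id": run["tenant_id"],
        "run_id": run["id"],
        "run_type": "bulk_operation_run",
        "status_bucket": PAYLOAD_BUCKETS[run["status"]],
        "counts": {name: run.get(name, 0) for name in COUNTERS},
    }
    failures = run.get("failures") or []
    if failures:
        # Only stable codes go into the summary, never raw messages.
        first = failures[0]["reason_code"]
        payload["error_summary"] = f"{len(failures)} failure(s); first: {first}"
    return payload
```
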