ahmido 30ad57baab feat/053-unify-runs-monitoring (#60)

Summary

This PR introduces the Unified Operations Runs + Monitoring Hub (053).

Goal: Standardize how long-running operations are tracked and monitored using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, and surface it in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).

Phase 1 adoption scope (per spec):
	•	Drift generation (drift.generate)
	•	Backup Set “Add Policies” (backup_set.add_policies)

Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.

⸻

What changed

Monitoring / Operations hub
	•	Moved/organized run monitoring under Monitoring → Operations
	•	Added:
		•	status buckets (queued / running / succeeded / partially succeeded / failed)
		•	filters (run type, status bucket, time range)
		•	run detail “Related” links (e.g. Drift findings, Backup Set context)
	•	All hub pages are DB-only and view-only (no rerun/cancel/delete actions)

Canonical run semantics
	•	Added canonical helpers on BulkOperationRun:
		•	runType() (resource.action)
		•	statusBucket() derived from status + counts (testable semantics)
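As an illustration of the runType() semantics above, here is a minimal sketch in Python (not the project's PHP). The `Run` class and `run_type` function are hypothetical stand-ins; only the `resource.action` format comes from this PR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for the two BulkOperationRun fields used here."""
    resource: str  # e.g. "drift", "backup_set"
    action: str    # e.g. "generate", "add_policies"


def run_type(run: Run) -> str:
    """Join resource and action into the dotted run type (resource.action)."""
    return f"{run.resource}.{run.action}"


print(run_type(Run("drift", "generate")))  # drift.generate
```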

Drift integration (Phase 1)
	•	Drift generation start behavior now:
		•	creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
		•	dispatches generation job
		•	emits DB notifications including “View run” link
	•	On generation failure: stores sanitized failure entries + sends failure notification

Permissions / tenant isolation
	•	Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
	•	Readonly users can view runs but cannot start drift generation
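The two permission rules above can be sketched as follows. This is an illustrative Python sketch, not the project's policy code: the `Forbidden` exception, the `User` and `Run` classes, and the role name `"readonly"` as a string value are assumptions made for the example.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Hypothetical exception that a web layer would map to HTTP 403."""
    status_code = 403


@dataclass(frozen=True)
class User:
    tenant_id: int
    role: str  # assumed role names, e.g. "readonly" or "operator"


@dataclass(frozen=True)
class Run:
    id: int
    tenant_id: int


def authorize_view(user: User, run: Run) -> None:
    """Allow viewing only within the user's own tenant; otherwise raise 403."""
    if user.tenant_id != run.tenant_id:
        raise Forbidden("run belongs to another tenant")


def can_start_drift_generation(user: User) -> bool:
    """Readonly users may view runs but may not start drift generation."""
    return user.role != "readonly"
```

The point of the sketch is that viewing and starting are separate checks: tenant membership gates every read, while the role only gates the action that creates a run.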

⸻

Tests

Added/updated Pest coverage:
	•	BulkOperationRunStatusBucketTest.php
	•	DriftGenerationDispatchTest.php
	•	GenerateDriftFindingsJobNotificationTest.php
	•	RunAuthorizationTenantIsolationTest.php

Validation run locally:
	•	./vendor/bin/pint --dirty
	•	targeted tests from feature quickstart / drift monitoring tests

⸻

Manual QA
	1.	Go to Monitoring → Operations
		•	verify filters (run type / status / time range)
		•	verify run detail shows counts + sanitized failures + “Related” links
	2.	Open Drift Landing
		•	with at least two successful inventory runs for the scope: drift generation should be queued and a notification with a “View run” link shown
		•	as readonly: generation should not start
	3.	Run detail
		•	drift.generate runs show “Drift findings” related link
		•	failure entries are sanitized (no secrets/tokens/raw payload dumps)

⸻

Notes / Ops
	•	Queue workers must be restarted after deploy so they load the new code:
		•	php artisan queue:restart (or Sail equivalent)
	•	This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.

⸻

Spec / Docs
	•	SpecKit artifacts added under specs/053-unify-runs-monitoring/
	•	Checklists are complete:
		•	requirements checklist PASS
		•	writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60

2026-01-16 15:10:31 +00:00


Data Model: Unified Operations Runs + Monitoring Hub (053)

This feature primarily standardizes and surfaces existing run records for long-running operations, and links operators to the underlying business artifacts (e.g., drift findings).

Entities

1) BulkOperationRun (`bulk_operation_runs`)

Purpose: Canonical tenant-scoped run record for long-running operations (Phase 1).

Model: App\Models\BulkOperationRun

Key fields (existing):

id (PK)
tenant_id (FK → tenants)
user_id (FK → users)
resource (string) — e.g. drift, backup_set
action (string) — e.g. generate, add_policies
idempotency_key (string|null)
status (string) — pending, running, completed, completed_with_errors, failed, aborted
counters: total_items, processed_items, succeeded, failed, skipped
item_ids (jsonb) — stable identifiers for the items/scope of the run
- Example (drift.generate): { scope_key, baseline_run_id, current_run_id }
- Example (backup_set.add_policies): { backup_set_id, policy_ids, options }
failures (jsonb|null) — sanitized failure details (including per-item failures for itemized operations)
audit_log_id (FK → audit_logs|null)
created_at, updated_at

Relationships:

BulkOperationRun belongsTo Tenant
BulkOperationRun belongsTo User
BulkOperationRun belongsTo AuditLog (nullable)

Uniqueness / idempotency:

Active-run uniqueness is enforced via a partial unique index on (tenant_id, idempotency_key) for active statuses.
Idempotency keys are deterministic and stable per tenant + operation type + scope.
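
The idempotency rules above can be illustrated with a small in-memory sketch in Python (not the project's PHP). Everything named here is hypothetical: the hashing scheme, the `create_or_reuse_run` helper, and in particular the assumption that "active" means `pending` or `running`, which the spec does not spell out. In the real system the database index, not application code, is what enforces uniqueness.

```python
import hashlib
import json

# Assumption: "active" means not yet terminal; the spec does not list these.
ACTIVE_STATUSES = {"pending", "running"}


def idempotency_key(tenant_id: int, run_type: str, scope: dict) -> str:
    """Derive a deterministic key from tenant + operation type + scope."""
    # Sorting keys makes the result independent of dict insertion order.
    canonical_scope = json.dumps(scope, sort_keys=True, separators=(",", ":"))
    raw = f"{tenant_id}|{run_type}|{canonical_scope}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def create_or_reuse_run(runs: list, tenant_id: int, run_type: str, scope: dict) -> dict:
    """Return the active run for this key if one exists, else create one."""
    key = idempotency_key(tenant_id, run_type, scope)
    for run in runs:
        same_key = (run["tenant_id"], run["idempotency_key"]) == (tenant_id, key)
        if same_key and run["status"] in ACTIVE_STATUSES:
            return run
    run = {
        "tenant_id": tenant_id,
        "idempotency_key": key,
        "status": "pending",
        "item_ids": scope,
    }
    runs.append(run)
    return run
```

Because only active runs participate in the uniqueness check, the same scope can be run again once the earlier run has reached a terminal status.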

State transitions (storage):

pending → running → completed | completed_with_errors | failed | aborted
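
Read literally, the transition line above allows exactly two steps. A minimal Python sketch of a guard for it follows; the `can_transition` helper is hypothetical, and whether a run may go straight from `pending` to `failed` or `aborted` is not stated in the spec, so the sketch does not allow it.

```python
TERMINAL_STATUSES = {"completed", "completed_with_errors", "failed", "aborted"}

# Assumption: only the transitions written in the spec line are allowed.
ALLOWED_TRANSITIONS = {
    "pending": {"running"},
    "running": TERMINAL_STATUSES,
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the storage status may move from current to target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```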

Status mapping (operator UI semantics):

pending → queued
running → running
completed → succeeded
completed_with_errors → partially succeeded
failed/aborted → failed
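
The mapping above is small enough to express as a lookup table. The following Python sketch is illustrative only; the `status_bucket` function name mirrors the statusBucket() helper but is not its implementation. The PR summary says the real helper derives the bucket from status plus counts, and this sketch covers only the status-to-bucket table listed here.

```python
STATUS_TO_BUCKET = {
    "pending": "queued",
    "running": "running",
    "completed": "succeeded",
    "completed_with_errors": "partially succeeded",
    "failed": "failed",
    "aborted": "failed",
}


def status_bucket(status: str) -> str:
    """Map a storage status to the operator-facing bucket."""
    # An unknown status raises KeyError rather than guessing a bucket.
    return STATUS_TO_BUCKET[status]
```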

Failure entry shape (sanitized):

reason_code (string, stable) + reason (short sanitized message)
for itemized runs: item_id per failure entry (and optional type=skipped for non-failure outcomes)
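
To make the entry shape concrete, here is a hedged Python sketch of building one sanitized failure entry. The field names `reason_code`, `reason`, `item_id`, and `type` come from the shape above; the redaction pattern, the length cap, and the `failure_entry` helper are assumptions for illustration and are not the project's sanitizer.

```python
import re

# Assumption: a simple pattern for obvious credentials; real rules may differ.
SECRET_PATTERN = re.compile(
    r"\b(?:bearer\s+\S+|(?:token|secret|password)=\S+)", re.IGNORECASE
)
MAX_REASON_LENGTH = 200  # assumed cap to keep the message short


def failure_entry(reason_code, message, item_id=None, skipped=False):
    """Build one failure entry with a stable code and a redacted message."""
    reason = SECRET_PATTERN.sub("[redacted]", message)[:MAX_REASON_LENGTH]
    entry = {"reason_code": reason_code, "reason": reason}
    if item_id is not None:
        entry["item_id"] = item_id  # only present for itemized runs
    if skipped:
        entry["type"] = "skipped"  # marks a non-failure outcome
    return entry
```

Keeping `reason_code` stable and separate from the free-text `reason` lets the UI group failures without parsing messages.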

2) Finding (`findings`) — Drift results

Purpose: Persisted analytic findings; drift findings are the primary “related artifact” for Drift generation runs.

Model: App\Models\Finding

Key fields (existing, drift-related):

id (PK)
tenant_id (FK → tenants)
finding_type (drift)
scope_key (string)
baseline_run_id (FK → inventory_sync_runs|null)
current_run_id (FK → inventory_sync_runs|null)
fingerprint (string; deterministic; unique per tenant)
subject_type, subject_external_id
status (new|acknowledged)
evidence_jsonb (jsonb; sanitized allowlist)
created_at, updated_at

Relationships:

Finding belongsTo Tenant
Finding belongsTo InventorySyncRun via baseline_run_id and current_run_id (nullable)

Notes:

Phase 1 can link operators from the drift run to findings through scope/baseline/current identifiers without introducing a new DB foreign key.
If later needed, introduce an explicit link (e.g., findings.bulk_operation_run_id) to make navigation and reporting easier.
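
The identifier-based link described in the notes can be sketched as a filter over findings. This Python sketch is illustrative: the `related_drift_findings` helper and the dict-based records are hypothetical, and it assumes the run's `item_ids` payload carries `scope_key`, `baseline_run_id`, and `current_run_id` as in the drift.generate example above.

```python
def related_drift_findings(run: dict, findings: list) -> list:
    """Select drift findings matching the run's tenant and drift context."""
    context = run["item_ids"]
    return [
        finding
        for finding in findings
        if finding["tenant_id"] == run["tenant_id"]
        and finding["finding_type"] == "drift"
        and finding["scope_key"] == context["scope_key"]
        and finding["baseline_run_id"] == context["baseline_run_id"]
        and finding["current_run_id"] == context["current_run_id"]
    ]
```

Matching on all three identifiers plus the tenant keeps the link unambiguous without a foreign key, at the cost of a wider query than a direct `bulk_operation_run_id` lookup would need.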

3) InventorySyncRun (`inventory_sync_runs`) — Drift inputs

Purpose: “Last observed” inventory run records used as baseline/current inputs for drift comparisons.

Model: App\Models\InventorySyncRun

Relevant fields (existing):

tenant_id
status
selection_hash (used as scope_key)
finished_at
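
One way to pick the baseline and current inputs from these fields is sketched below in Python. This is an assumption-heavy illustration: the status value `"success"`, the choice of the two most recently finished runs, and the rule that the older one is the baseline are not stated in this document, and `pick_baseline_and_current` is a hypothetical helper.

```python
def pick_baseline_and_current(runs, tenant_id, scope_key, success_status="success"):
    """Return (baseline, current) as the two latest successful runs, or None."""
    candidates = sorted(
        (
            run
            for run in runs
            if run["tenant_id"] == tenant_id
            and run["selection_hash"] == scope_key  # selection_hash is the scope_key
            and run["status"] == success_status
            and run["finished_at"] is not None
        ),
        key=lambda run: run["finished_at"],
    )
    if len(candidates) < 2:
        return None  # not enough inputs for a drift comparison
    return candidates[-2], candidates[-1]
```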

4) Notification Event (DB notifications)

Purpose: Persist run lifecycle notifications (queued/completed) linking operators to the run detail page.

Storage: Laravel Notifications (DB channel).

Payload (target):

tenant identifier
run identifier + type (bulk_operation_run)
status bucket (queued/running/succeeded/partially succeeded/failed)
summary counts and a safe error summary (when applicable)
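
A possible shape for the target payload is sketched below in Python. The key names (`tenant_id`, `run_id`, `run_type`, `status_bucket`, `counts`, `error_summary`) and the `notification_payload` helper are hypothetical; the spec lists only which pieces of information the payload should carry.

```python
COUNTER_FIELDS = ("total_items", "processed_items", "succeeded", "failed", "skipped")


def notification_payload(run: dict, bucket: str, error_summary=None) -> dict:
    """Assemble the notification data for one run lifecycle event."""
    payload = {
        "tenant_id": run["tenant_id"],
        "run_id": run["id"],
        "run_type": "bulk_operation_run",  # record type named in the spec
        "status_bucket": bucket,
        "counts": {field: run[field] for field in COUNTER_FIELDS},
    }
    if error_summary:
        # Expected to be already sanitized by the caller.
        payload["error_summary"] = error_summary
    return payload
```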
