ahmido 30ad57baab feat/053-unify-runs-monitoring (#60)

Summary

This PR introduces Unified Operations Runs + Monitoring Hub (053).

Goal: Standardize how long-running operations are tracked and monitored using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, and surface it in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).

Phase 1 adoption scope (per spec):
	•	Drift generation (drift.generate)
	•	Backup Set “Add Policies” (backup_set.add_policies)

Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.

⸻

What changed

Monitoring / Operations hub
	•	Moved/organized run monitoring under Monitoring → Operations
	•	Added:
		•	status buckets (queued / running / succeeded / partially succeeded / failed)
		•	filters (run type, status bucket, time range)
		•	run detail “Related” links (e.g. Drift findings, Backup Set context)
	•	All hub pages are DB-only and view-only (no rerun/cancel/delete actions)

Canonical run semantics
	•	Added canonical helpers on BulkOperationRun:
		•	runType() (resource.action)
		•	statusBucket() derived from status + counts (testable semantics)
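
As an illustration of the runType() helper, here is a minimal sketch. It assumes the run record exposes `resource` and `action` string fields; the class `RunTypeExample` is a hypothetical stand-in, not the actual BulkOperationRun model.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the run model; only the two fields
// needed to build the run type are modelled here.
final class RunTypeExample
{
    public function __construct(
        public readonly string $resource,
        public readonly string $action,
    ) {}

    // Joins resource and action into the "resource.action" run type.
    public function runType(): string
    {
        return $this->resource . '.' . $this->action;
    }
}

$run = new RunTypeExample('drift', 'generate');
echo $run->runType(), PHP_EOL; // drift.generate
```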

Drift integration (Phase 1)
	•	Drift generation start behavior now:
		•	creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
		•	dispatches generation job
		•	emits DB notifications including “View run” link
	•	On generation failure: stores sanitized failure entries + sends failure notification

Permissions / tenant isolation
	•	Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
	•	Readonly can view runs but cannot start drift generation
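
The two rules above can be sketched in plain PHP. This is not the project's policy class: the function names and lowercase role strings are hypothetical, and the assumption that every non-readonly role may start drift generation is not stated in this PR.

```php
<?php
declare(strict_types=1);

// Roles that may view runs. The start list is an assumption: the PR only
// states that Readonly cannot start drift generation.
const VIEW_ROLES_EXAMPLE  = ['owner', 'manager', 'operator', 'readonly'];
const START_ROLES_EXAMPLE = ['owner', 'manager', 'operator'];

// Returns an HTTP-style status code for viewing a run.
function viewRunStatusExample(int $userTenantId, string $role, int $runTenantId): int
{
    if ($userTenantId !== $runTenantId) {
        return 403; // cross-tenant access is forbidden
    }

    return in_array($role, VIEW_ROLES_EXAMPLE, true) ? 200 : 403;
}

// Readonly users can view runs but cannot start drift generation.
function canStartDriftGenerationExample(string $role): bool
{
    return in_array($role, START_ROLES_EXAMPLE, true);
}
```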

⸻

Tests

Added/updated Pest coverage:
	•	BulkOperationRunStatusBucketTest.php
	•	DriftGenerationDispatchTest.php
	•	GenerateDriftFindingsJobNotificationTest.php
	•	RunAuthorizationTenantIsolationTest.php

Validation run locally:
	•	./vendor/bin/pint --dirty
	•	targeted tests from feature quickstart / drift monitoring tests

⸻

Manual QA
	1.	Go to Monitoring → Operations
		•	verify filters (run type / status / time range)
		•	verify run detail shows counts + sanitized failures + “Related” links
	2.	Open Drift Landing
		•	with at least 2 successful inventory runs for the scope: should queue drift generation and show a notification with a “View run” link
		•	as readonly: should not start generation
	3.	Run detail
		•	drift.generate runs show “Drift findings” related link
		•	failure entries are sanitized (no secrets/tokens/raw payload dumps)

⸻

Notes / Ops
	•	Queue workers must be restarted after deploy so they load the new code:
		•	php artisan queue:restart (or Sail equivalent)
	•	This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.

⸻

Spec / Docs
	•	SpecKit artifacts added under specs/053-unify-runs-monitoring/
	•	Checklists are complete:
		•	requirements checklist PASS
		•	writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60

2026-01-16 15:10:31 +00:00

Implementation Plan: Unified Operations Runs + Monitoring Hub (053)

Branch: feat/053-unify-runs-monitoring | Date: 2026-01-16 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md

Note: This plan is filled in by the /speckit.plan command. See /Users/ahmeddarrazi/Documents/projects/TenantAtlas/.specify/scripts/ for helper scripts.

Summary

Unify how long-running operations are tracked and monitored by using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, surfacing it in a single Monitoring/Operations hub (view-only, tenant-scoped, role-aware), and standardizing status semantics, notifications, failure detail minimization, and idempotent de-duplication.

Phase 1 adoption scope (per clarifications): Drift generation + Backup Set “Add Policies”.

Technical Context

Language/Version: PHP 8.4.15 (Laravel 12)
Primary Dependencies: Filament v4, Livewire v3
Storage: PostgreSQL (JSONB for run item_ids and failures)
Testing: Pest v4 (PHPUnit v12)
Target Platform: Web application (Sail-first locally, Dokploy-first deploy)
Project Type: web
Performance Goals: Monitoring/Operations index renders in <1s for the most recent ~250 runs; start actions confirm and provide a “View run” link within 2 seconds (aligns with SC-002).
Constraints: Monitoring pages are DB-only and view-only; strict tenant isolation; no secrets/tokens stored; run failures use stable reason codes + short sanitized messages; itemized runs store per-item failures (sanitized).
Scale/Scope: Tenant-scoped run history across multiple modules; Phase 1 covers drift generation + backup set “add policies”, with more run producers added in later phases.

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Inventory-first: PASS. Monitoring uses persisted run records; drift generation is based on inventory sync “last observed” state and stores findings (not raw snapshots).
Read/write separation: PASS. Monitoring/Operations is view-only (no start/rerun/cancel/delete). Write actions remain in their feature UIs and already use queued jobs + auditability.
Graph contract path: PASS (no new Graph calls introduced by Monitoring). Existing Graph calls remain behind existing abstractions and must not occur during Monitoring page render.
Deterministic capabilities: N/A for this feature (no new capability resolver). Existing idempotency key builder remains deterministic.
Tenant isolation: PASS. Run list/view/start remain tenant-scoped; cross-tenant access is forbidden (403).
Automation: PASS. Active-run de-duplication uses deterministic idempotency keys + partial unique indexes; runs remain observable with status + counts + safe errors.
Data minimization: PASS. Failures are sanitized/minimized; no secrets/tokens/raw external payload dumps stored or displayed.

Gate status (pre-Phase 0): PASS (no violations).

Gate status (post-Phase 1): PASS (design artifacts present: research.md, data-model.md, contracts/*, quickstart.md).

Project Structure

Documentation (this feature)

/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/
├── plan.md                     # This file (/speckit.plan command output)
├── spec.md                     # Feature specification (input)
├── checklists/
│   └── requirements.md         # Spec quality checklist
├── research.md                 # Phase 0 output
├── data-model.md               # Phase 1 output
├── quickstart.md               # Phase 1 output
├── contracts/                  # Phase 1 output
└── tasks.md                    # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)

Source Code (repository root)

app/
├── Filament/
│   ├── Pages/
│   └── Resources/
├── Jobs/
├── Models/
├── Policies/
├── Services/
└── Support/

config/

database/
└── migrations/

routes/

tests/
├── Feature/
└── Unit/

Structure Decision: Laravel web application. Implement Monitoring/Operations primarily via Filament (Resources/Pages) and reuse existing run-record primitives (bulk_operation_runs) with tenant-scoped policies and Pest tests.

Key Implementation Decisions (Pinned)

Phase 1 scope: Monitoring/Operations hub + Drift generation + Backup Set “Add Policies”.
Monitoring permissions: Owner, Manager, Operator, Readonly can view. Readonly is strictly view-only.
Monitoring guardrail: Monitoring/Operations is view-only in Phase 1 (no start/rerun/cancel/delete).
Status semantics: UI-level statuses are consistent and testable:
- partially succeeded = at least one success and at least one failure
- failed = zero successes (or the run could not proceed)
Failure detail: Stable reason codes + short sanitized messages; itemized operations include per-item failures (sanitized).
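
The partial-versus-failed rule can be written as a small pure function, which is what makes it testable. The sketch below is one reading of the rule for a finished run; the function name and count parameters are hypothetical.

```php
<?php
declare(strict_types=1);

// Classifies a finished run from its success and failure counts.
// Zero successes is treated as "failed", even when nothing failed,
// following the "zero successes" wording of the rule.
function outcomeFromCountsExample(int $succeeded, int $failed): string
{
    if ($succeeded > 0 && $failed > 0) {
        return 'partially succeeded';
    }

    if ($succeeded === 0) {
        return 'failed';
    }

    return 'succeeded';
}

echo outcomeFromCountsExample(3, 1), PHP_EOL; // partially succeeded
```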

Execution Model

Run record primitive

Canonical run record: App\Models\BulkOperationRun (tenant-scoped) for Phase 1.
Producers in Phase 1:
- Drift generation: resource=drift, action=generate
- Backup Set “Add Policies”: resource=backup_set, action=add_policies (or existing canonical action naming)

Status mapping (storage ↔ UI semantics)

The UI MUST present consistent meanings while allowing storage to keep existing vocabulary:

pending → queued
running → running
completed → succeeded
completed_with_errors → partially succeeded
failed / aborted → failed
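
The mapping above translates directly into a `match` expression. This is a sketch only: the function name is hypothetical, and throwing on an unknown stored status is an assumption rather than documented behavior.

```php
<?php
declare(strict_types=1);

// Maps a stored run status to the bucket shown in the UI.
function uiBucketExample(string $storedStatus): string
{
    return match ($storedStatus) {
        'pending' => 'queued',
        'running' => 'running',
        'completed' => 'succeeded',
        'completed_with_errors' => 'partially succeeded',
        'failed', 'aborted' => 'failed',
        // Assumption: unknown values are rejected rather than guessed.
        default => throw new InvalidArgumentException("Unknown run status: {$storedStatus}"),
    };
}
```

Keeping the translation in one place lets storage retain its existing vocabulary while every UI surface shows the same bucket names.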

Idempotency & de-duplication

Deterministic idempotency key per tenant + operation type + scope via App\Support\RunIdempotency.
Active-run reuse: if an identical run is pending or running, reuse it (return the existing run and link to it).
Race reduction: rely on the existing partial unique index for active runs and handle collisions by “find existing and reuse”.
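
The following sketch illustrates the idea with an in-memory stand-in for the runs table. It does not reproduce App\Support\RunIdempotency: the key format, function name, and class name are all hypothetical. In the real system the partial unique index would reject a duplicate insert, and the caller would then look up and reuse the existing run.

```php
<?php
declare(strict_types=1);

// Builds a deterministic key from tenant, operation type and scope.
function idempotencyKeyExample(int $tenantId, string $runType, array $scope): string
{
    ksort($scope); // top-level key order must not change the hash

    return hash('sha256', $tenantId . '|' . $runType . '|' . json_encode($scope, JSON_THROW_ON_ERROR));
}

// In-memory stand-in for the runs table plus its unique index on active runs.
final class RunStoreExample
{
    /** @var array<string, array{id: int, status: string}> active runs by key */
    private array $active = [];
    private int $nextId = 1;

    // Returns the active run for the key, creating one only if none exists.
    public function startOrReuse(string $key): array
    {
        if (isset($this->active[$key])) {
            return $this->active[$key] + ['reused' => true];
        }

        $run = ['id' => $this->nextId++, 'status' => 'pending'];
        $this->active[$key] = $run;

        return $run + ['reused' => false];
    }

    // A finished run no longer blocks a new run with the same key.
    public function finish(string $key): void
    {
        unset($this->active[$key]);
    }
}
```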

Notifications

Use DB notifications for “queued” and “completed” lifecycle events, linking to the run detail page.
Notifications and persisted run failures must remain sanitized (no secrets/tokens/raw payloads).
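
A minimal sketch of what sanitizing a failure entry might look like follows. The redaction patterns, the 120-character limit, and the function name are illustrative assumptions, not the project's sanitizer, and a real implementation would need a reviewed pattern list.

```php
<?php
declare(strict_types=1);

// Reduces a raw error message to a stable reason code plus a short, redacted line.
function sanitizeFailureExample(string $reasonCode, string $rawMessage, int $maxLength = 120): array
{
    // Redact bearer tokens and key=value style secrets (illustrative patterns only).
    $message = preg_replace('/Bearer\s+[A-Za-z0-9._~+\/=-]+/i', 'Bearer [redacted]', $rawMessage);
    $message = preg_replace('/\b(token|secret|password|api_key)=\S+/i', '$1=[redacted]', $message);

    // Collapse whitespace so multi-line dumps become a single line.
    $message = trim(preg_replace('/\s+/', ' ', $message));

    // Truncate to the maximum length, ending with an ellipsis.
    if (mb_strlen($message) > $maxLength) {
        $message = mb_substr($message, 0, $maxLength - 1) . '…';
    }

    return ['reason_code' => $reasonCode, 'message' => $message];
}
```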

Monitoring/Operations hub

Central list + filters for the active tenant:
- filter by resource/action, status bucket (queued/running/succeeded/partially succeeded/failed), and time range
- drill-down to run detail (status + counts + sanitized failures + item identifiers)
View-only: no hub actions to start, rerun, cancel, or delete runs.
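
To show how the three filters combine, here is an in-memory sketch in which every supplied filter must match. The actual hub queries the database through Filament; the array keys `run_type`, `bucket`, and `created_at` and the function name are hypothetical.

```php
<?php
declare(strict_types=1);

// Keeps only runs matching every filter that is not null.
function filterRunsExample(array $runs, ?string $runType, ?string $bucket, ?DateTimeImmutable $since): array
{
    return array_values(array_filter(
        $runs,
        function (array $run) use ($runType, $bucket, $since): bool {
            if ($runType !== null && $run['run_type'] !== $runType) {
                return false;
            }
            if ($bucket !== null && $run['bucket'] !== $bucket) {
                return false;
            }
            // Time range: keep runs created at or after $since.
            if ($since !== null && $run['created_at'] < $since) {
                return false;
            }

            return true;
        }
    ));
}
```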

Definition of Done (ends at Phase 2 planning)

Phase 2 (MVP implementation readiness)

Monitoring/Operations navigation exists and lists tenant runs with the required filters and drill-down.
Role guardrail enforced: Readonly can view list + detail but has no action controls.
Status bucket semantics are consistent and testable (including partial vs failed).
Drift generation and Backup Set “Add Policies” runs are visible and linkable from their feature entry points and from Monitoring/Operations.
Design artifacts exist and are referenced by this plan:
- /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/research.md
- /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/data-model.md
- /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/contracts/
- /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/quickstart.md

Complexity Tracking

Fill ONLY if Constitution Check has violations that must be justified

N/A (no constitution violations anticipated)
