ahmido 30ad57baab feat/053-unify-runs-monitoring (#60)

Summary

This PR introduces Unified Operations Runs + Monitoring Hub (053).

Goal: Standardize how long-running operations are tracked and monitored. The existing tenant-scoped run record (BulkOperationRun) becomes the canonical “operation run” and is surfaced in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).

Phase 1 adoption scope (per spec):
	•	Drift generation (drift.generate)
	•	Backup Set “Add Policies” (backup_set.add_policies)

Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.

⸻

What changed

Monitoring / Operations hub
	•	Moved/organized run monitoring under Monitoring → Operations
	•	Added:
		•	status buckets (queued / running / succeeded / partially succeeded / failed)
		•	filters (run type, status bucket, time range)
		•	run detail “Related” links (e.g. Drift findings, Backup Set context)
	•	All hub pages are DB-only and view-only (no rerun/cancel/delete actions)

Canonical run semantics
	•	Added canonical helpers on BulkOperationRun:
		•	runType() (resource.action)
		•	statusBucket() derived from status + counts (testable semantics)
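
To illustrate how a bucket could be derived from status + counts, here is a minimal sketch in plain PHP. The class name `RunSketch`, the property names, the raw status values (`pending`, `running`, `completed`, `failed`) and the bucket string `partially_succeeded` are assumptions for illustration only; the actual helpers on BulkOperationRun may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the model. Property names and raw status values
// are assumptions, not the real BulkOperationRun schema.
final class RunSketch
{
    public function __construct(
        public string $resource,
        public string $action,
        public string $status,      // assumed raw values: pending, running, completed, failed
        public int $succeeded = 0,
        public int $failed = 0,
    ) {
    }

    // Canonical run type in "resource.action" form, e.g. "drift.generate".
    public function runType(): string
    {
        return $this->resource . '.' . $this->action;
    }

    // Derive the display bucket from the raw status plus item counts.
    public function statusBucket(): string
    {
        if ($this->status === 'pending') {
            return 'queued';
        }

        if ($this->status === 'running') {
            return 'running';
        }

        // A finished run where nothing succeeded is treated as failed.
        if ($this->status === 'failed' || ($this->failed > 0 && $this->succeeded === 0)) {
            return 'failed';
        }

        return $this->failed > 0 ? 'partially_succeeded' : 'succeeded';
    }
}
```

Keeping the derivation in one pure method is what makes the semantics testable: the hub badges and filters can call the same method instead of re-interpreting raw status values.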

Drift integration (Phase 1)
	•	Drift generation start behavior now:
		•	creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
		•	dispatches generation job
		•	emits DB notifications including “View run” link
	•	On generation failure: stores sanitized failure entries + sends failure notification
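
The create/reuse step can be sketched as follows. This is an in-memory illustration, not the PR's code: `RunStore`, `firstOrCreateDriftRun()`, the array keys, and the rule that a run is reused only while it is still `pending` or `running` are all assumptions. Only the payload fields (scope_key, baseline/current run ids) come from the description above.

```php
<?php

declare(strict_types=1);

// In-memory stand-in for the runs table; every name here is hypothetical.
final class RunStore
{
    /** @var array<int, array<string, mixed>> */
    private array $runs = [];

    private int $nextId = 1;

    /**
     * Return the active drift.generate run for this tenant and scope, or create one.
     * Treating "active" (pending or running) as the reuse condition is an assumption.
     *
     * @return array<string, mixed>
     */
    public function firstOrCreateDriftRun(
        int $tenantId,
        string $scopeKey,
        int $baselineRunId,
        int $currentRunId
    ): array {
        foreach ($this->runs as $run) {
            if ($run['tenant_id'] === $tenantId
                && $run['run_type'] === 'drift.generate'
                && $run['payload']['scope_key'] === $scopeKey
                && in_array($run['status'], ['pending', 'running'], true)
            ) {
                return $run; // Reuse: no second run for the same scope.
            }
        }

        $run = [
            'id' => $this->nextId++,
            'tenant_id' => $tenantId,
            'run_type' => 'drift.generate',
            'status' => 'pending',
            // Drift context stored on the run so the hub can link to results.
            'payload' => [
                'scope_key' => $scopeKey,
                'baseline_run_id' => $baselineRunId,
                'current_run_id' => $currentRunId,
            ],
        ];

        $this->runs[$run['id']] = $run;

        return $run;
    }
}
```

In a real application the lookup and insert would need to be safe against two concurrent requests, for example through a transaction or a unique constraint; the sketch ignores that.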

Permissions / tenant isolation
	•	Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
	•	Readonly can view runs but cannot start drift generation
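
The two rules above can be expressed as a pair of small checks. The sketch below uses plain arrays and free functions; the function names, the `tenant_id` and `role` keys, and the role string `readonly` are assumptions, and the real logic lives in the policy and page classes touched by this PR.

```php
<?php

declare(strict_types=1);

// Hypothetical helpers; array keys and the 'readonly' role name are assumptions.

// Viewing is allowed for any role, but only within the user's own tenant.
function canViewRun(array $user, array $run): bool
{
    return $user['tenant_id'] === $run['tenant_id'];
}

// Starting drift generation is a write action, so Readonly is excluded.
function canStartDriftGeneration(array $user): bool
{
    return $user['role'] !== 'readonly';
}

// Map the view check to the HTTP status the hub is described as returning.
function runViewStatusCode(array $user, array $run): int
{
    return canViewRun($user, $run) ? 200 : 403;
}
```

For example, a Readonly user of tenant 1 gets 200 for a tenant 1 run and 403 for a tenant 2 run, and `canStartDriftGeneration()` returns false for that user in both cases.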

⸻

Tests

Added/updated Pest coverage:
	•	BulkOperationRunStatusBucketTest.php
	•	DriftGenerationDispatchTest.php
	•	GenerateDriftFindingsJobNotificationTest.php
	•	RunAuthorizationTenantIsolationTest.php

Validation run locally:
	•	./vendor/bin/pint --dirty
	•	targeted tests from feature quickstart / drift monitoring tests

⸻

Manual QA
	1.	Go to Monitoring → Operations
		•	verify filters (run type / status / time range)
		•	verify run detail shows counts + sanitized failures + “Related” links
	2.	Open Drift Landing
		•	with >=2 successful inventory runs for scope: should queue drift generation + show notification with “View run”
		•	as readonly: should not start generation
	3.	Run detail
		•	drift.generate runs show “Drift findings” related link
		•	failure entries are sanitized (no secrets/tokens/raw payload dumps)

⸻

Notes / Ops
	•	Queue workers must be restarted after deploy so they load the new code:
		•	php artisan queue:restart (or Sail equivalent)
	•	This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.

⸻

Spec / Docs
	•	SpecKit artifacts added under specs/053-unify-runs-monitoring/
	•	Checklists are complete:
		•	requirements checklist PASS
		•	writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60

2026-01-16 15:10:31 +00:00


description: Task list for implementing Unified Operations Runs + Monitoring Hub (053)

Tasks: Unified Operations Runs + Monitoring Hub (053)

Input: Design documents from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/
Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

Tests: Not explicitly requested in spec.md. Add/adjust Pest tests as needed during implementation; validate with the existing test suite.

Organization: Tasks are grouped by user story so each story can be implemented and validated independently.

Format: `- [ ] T### [P?] [US#?] Description with file path`

[P]: Can run in parallel (different files, no dependencies)
[US#]: User story mapping (US1/US2/US3). Setup/Foundational/Polish tasks have no story label.

Path Conventions (Laravel)

App code: app/
Filament admin: app/Filament/
Livewire: app/Livewire/
Jobs: app/Jobs/
DB: database/migrations/
Views: resources/views/
Tests (Pest): tests/Feature/, tests/Unit/

Phase 1: Setup (Shared Infrastructure)

Purpose: Confirm baseline assumptions and align documentation artifacts with the codebase.

T001 [P] Confirm “Monitoring/Operations hub = evolve BulkOperationRunResource” decision remains correct and update notes if needed in specs/053-unify-runs-monitoring/research.md
T002 [P] Verify Filament URLs match contracts (index/view) and update specs/053-unify-runs-monitoring/contracts/admin-pages.openapi.yaml if paths differ

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Shared building blocks required by all user stories.

⚠️ CRITICAL: No user story work should begin until this phase is complete.

T003 Add runType() and statusBucket() accessors (queued/running/succeeded/partial/failed) to app/Models/BulkOperationRun.php
T004 [P] Confirm Readonly users can view run list/detail tenant-scoped (and only view) by reviewing/updating app/Policies/BulkOperationRunPolicy.php

Checkpoint: Foundation ready — Monitoring UI and run producers can reuse consistent status semantics.

Phase 3: User Story 1 - Monitor operations in one place (Priority: P1) 🎯 MVP

Goal: Provide a single Monitoring/Operations area to list and drill into tenant runs with consistent status semantics and safe failure visibility.

Independent Test: Visit Monitoring → Operations for a tenant with runs; filter by type/status; open a run and confirm counts + sanitized failures are visible; verify Readonly sees view-only UI.

Implementation

T005 [US1] Move Operations runs into “Monitoring” navigation and label it “Operations” in app/Filament/Resources/BulkOperationRunResource.php
T006 [US1] Render status badges using statusBucket() (not raw status) in app/Filament/Resources/BulkOperationRunResource.php
T007 [US1] Add filters for run type (resource.action) and status bucket in app/Filament/Resources/BulkOperationRunResource.php
T008 [US1] Add time range filter (created_at from/to) in app/Filament/Resources/BulkOperationRunResource.php
T009 [US1] Add a “Related” section on the run detail view linking to the relevant feature context (e.g., Backup Set for backup_set.add_policies) in app/Filament/Resources/BulkOperationRunResource.php
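
The three filters from T007 and T008 combine as a simple conjunction. The sketch below shows that logic over plain arrays rather than a Filament table query; `filterRuns()` and the keys `run_type`, `status_bucket` and `created_at` are hypothetical names chosen for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical filter over in-memory runs. A null argument means "no filter".
 * The created_at range is inclusive on both ends (an assumption).
 *
 * @param array<int, array<string, mixed>> $runs
 * @return array<int, array<string, mixed>>
 */
function filterRuns(
    array $runs,
    ?string $runType = null,
    ?string $statusBucket = null,
    ?DateTimeImmutable $from = null,
    ?DateTimeImmutable $to = null
): array {
    $matches = static function (array $run) use ($runType, $statusBucket, $from, $to): bool {
        if ($runType !== null && $run['run_type'] !== $runType) {
            return false;
        }

        if ($statusBucket !== null && $run['status_bucket'] !== $statusBucket) {
            return false;
        }

        if ($from !== null && $run['created_at'] < $from) {
            return false;
        }

        if ($to !== null && $run['created_at'] > $to) {
            return false;
        }

        return true;
    };

    // array_values() re-indexes the result so callers get a clean list.
    return array_values(array_filter($runs, $matches));
}
```

Filtering on the derived bucket rather than the raw status keeps the list filter consistent with the badges rendered in T006.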

Checkpoint: US1 complete — operators can monitor and drill into runs in one place.

Phase 4: User Story 2 - Start long-running actions without waiting (Priority: P2)

Goal: Starting a supported long-running operation is non-blocking and provides immediate confirmation + “View run” link; unauthorized users cannot start.

Independent Test: Trigger Drift generation and Backup Set “Add Policies”; confirm immediate feedback with “View run” link; confirm Readonly cannot start drift generation and no run is created.

Implementation

T010 [US2] Prevent drift generation from being started by Readonly users (blocked state + message) in app/Filament/Pages/DriftLanding.php
T011 [US2] Emit a queued DB notification with “View run” link when Drift generation is queued in app/Filament/Pages/DriftLanding.php
T012 [P] [US2] Emit Drift completion and failure DB notifications with “View run” link in app/Jobs/GenerateDriftFindingsJob.php

Checkpoint: US2 complete — start UX is consistent and permission-gated.

Phase 5: User Story 3 - Drift generation is observable like other operations (Priority: P3)

Goal: Drift generation creates/reuses a run, surfaces safe failure details, and links operators to results.

Independent Test: Trigger Drift generation; observe it in Monitoring → Operations; open the run and follow a link to Drift findings; simulate failure and confirm safe failure reason is visible on the run.

Implementation

T013 [US3] Store Drift context (scope_key, baseline_run_id, current_run_id) inside the run payload so Monitoring can link to results in app/Filament/Pages/DriftLanding.php
T014 [P] [US3] Record a sanitized failure entry (reason_code + short message) into BulkOperationRun.failures when Drift generation fails in app/Jobs/GenerateDriftFindingsJob.php
T015 [US3] Add a “Drift findings” link for drift.generate runs in the run detail “Related” section in app/Filament/Resources/BulkOperationRunResource.php
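
As an illustration of T014, a sanitized failure entry (reason_code + short message) could be built roughly like this. The function name, the redaction patterns and the 200-character limit are assumptions, not the PR's implementation; the sketch also assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: reduce a raw error message to a short, safe entry.
 * The patterns below cover only a few obvious secret shapes (an assumption);
 * a real sanitizer would need a list matched to the actual error sources.
 *
 * @return array{reason_code: string, message: string}
 */
function sanitizedFailureEntry(string $reasonCode, string $rawMessage, int $maxLength = 200): array
{
    // Redact bearer tokens and simple key=value secrets.
    $message = preg_replace('/Bearer\s+\S+/i', 'Bearer [redacted]', $rawMessage) ?? '';
    $message = preg_replace('/\b(token|secret|password|api_key)=\S+/i', '$1=[redacted]', $message) ?? '';

    // Collapse whitespace so multi-line dumps become a single short line.
    $message = trim(preg_replace('/\s+/', ' ', $message) ?? '');

    // Truncate long messages instead of storing raw payload dumps.
    if (mb_strlen($message) > $maxLength) {
        $message = mb_substr($message, 0, $maxLength - 3) . '...';
    }

    return ['reason_code' => $reasonCode, 'message' => $message];
}
```

Storing a stable reason_code next to the trimmed message lets the run detail view group failures by code while showing operators a message that contains no tokens or payload bodies.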

Checkpoint: US3 complete — drift runs are actionable and consistent with other operations.

Phase 6: Polish & Cross-Cutting Concerns

Purpose: Final alignment, validation, and guardrails.

T016 [P] Update operator-facing notes and validation commands in specs/053-unify-runs-monitoring/quickstart.md (only if implementation changes)
T017 [P] Update docs to match implementation if needed: specs/053-unify-runs-monitoring/spec.md and specs/053-unify-runs-monitoring/data-model.md
T018 Run formatting on changed PHP files with ./vendor/bin/pint --dirty (reference: specs/053-unify-runs-monitoring/quickstart.md)
T019 Run targeted validation commands from specs/053-unify-runs-monitoring/quickstart.md (queue worker optional; run relevant Pest tests)
T020 [P] Re-verify contracts match real URLs and access behavior in specs/053-unify-runs-monitoring/contracts/admin-pages.openapi.yaml

Dependencies & Execution Order

Dependency Graph (User Stories)

Phase 1 (Setup) ─┬─> Phase 2 (Foundational) ─┬─> US1 (P1)  ─┬─> Polish
                  │                           ├─> US2 (P2)  │
                  │                           └─> US3 (P3)  ┘
                  └────────────────────────────────────────────

User Story Dependencies

US1 depends on Phase 2 (Foundational); independent of US2/US3.
US2 depends on Phase 2 (Foundational); independent of US1/US3.
US3 depends on Phase 2 (Foundational) and benefits from US1 (Monitoring visibility) but can be implemented independently.

Parallel Execution Examples

US1 (Monitoring UI)

After Phase 2 is complete, one developer can focus on:
- app/Filament/Resources/BulkOperationRunResource.php (T005–T009)

US2 (Start UX / Notifications)

These can be done in parallel after Phase 2:
- app/Filament/Pages/DriftLanding.php (T010–T011)
- app/Jobs/GenerateDriftFindingsJob.php (T012)

US3 (Drift observability)

These can be done in parallel after Phase 2:
- app/Filament/Pages/DriftLanding.php (T013)
- app/Jobs/GenerateDriftFindingsJob.php (T014)
