Summary
This PR introduces Unified Operations Runs + Monitoring Hub (053).
Goal: Standardize how long-running operations are tracked and monitored using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, and surface it in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).
Phase 1 adoption scope (per spec):
• Drift generation (drift.generate)
• Backup Set “Add Policies” (backup_set.add_policies)
Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.
⸻
What changed
Monitoring / Operations hub
• Moved/organized run monitoring under Monitoring → Operations
• Added:
  • status buckets (queued / running / succeeded / partially succeeded / failed)
  • filters (run type, status bucket, time range)
  • run detail “Related” links (e.g. Drift findings, Backup Set context)
• All hub pages are DB-only and view-only (no rerun/cancel/delete actions)
Canonical run semantics
• Added canonical helpers on BulkOperationRun:
  • runType() (resource.action)
  • statusBucket() derived from status + counts (testable semantics)
Drift integration (Phase 1)
• Drift generation start behavior now:
  • creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
  • dispatches generation job
  • emits DB notifications including “View run” link
• On generation failure: stores sanitized failure entries + sends failure notification
Permissions / tenant isolation
• Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
• Readonly can view runs but cannot start drift generation
⸻
Tests
Added/updated Pest coverage:
• BulkOperationRunStatusBucketTest.php
• DriftGenerationDispatchTest.php
• GenerateDriftFindingsJobNotificationTest.php
• RunAuthorizationTenantIsolationTest.php
Validation run locally:
• ./vendor/bin/pint --dirty
• targeted tests from feature quickstart / drift monitoring tests
⸻
Manual QA
1. Go to Monitoring → Operations
  • verify filters (run type / status / time range)
  • verify run detail shows counts + sanitized failures + “Related” links
2. Open Drift Landing
  • with >=2 successful inventory runs for scope: should queue drift generation + show notification with “View run”
  • as readonly: should not start generation
3. Run detail
  • drift.generate runs show “Drift findings” related link
  • failure entries are sanitized (no secrets/tokens/raw payload dumps)
⸻
Notes / Ops
• Queue workers must be restarted after deploy so they load the new code:
  • php artisan queue:restart (or Sail equivalent)
• This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.
⸻
Spec / Docs
• SpecKit artifacts added under specs/053-unify-runs-monitoring/
• Checklists are complete:
  • requirements checklist PASS
  • writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60
| description |
|---|
| Task list for implementing Unified Operations Runs + Monitoring Hub (053) |
Tasks: Unified Operations Runs + Monitoring Hub (053)
Input: Design documents from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/
Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md
Tests: Not explicitly requested in spec.md. Add/adjust Pest tests as needed during implementation; validate with the existing test suite.
Organization: Tasks are grouped by user story so each story can be implemented and validated independently.
Format: - [ ] T### [P?] [US#?] Description with file path
- [P]: Can run in parallel (different files, no dependencies)
- [US#]: User story mapping (US1/US2/US3). Setup/Foundational/Polish tasks have no story label.
Path Conventions (Laravel)
- App code: app/
- Filament admin: app/Filament/
- Livewire: app/Livewire/
- Jobs: app/Jobs/
- DB: database/migrations/
- Views: resources/views/
- Tests (Pest): tests/Feature/, tests/Unit/
Phase 1: Setup (Shared Infrastructure)
Purpose: Confirm baseline assumptions and align documentation artifacts with the codebase.
- T001 [P] Confirm “Monitoring/Operations hub = evolve BulkOperationRunResource” decision remains correct and update notes if needed in specs/053-unify-runs-monitoring/research.md
- T002 [P] Verify Filament URLs match contracts (index/view) and update specs/053-unify-runs-monitoring/contracts/admin-pages.openapi.yaml if paths differ
Phase 2: Foundational (Blocking Prerequisites)
Purpose: Shared building blocks required by all user stories.
⚠️ CRITICAL: No user story work should begin until this phase is complete.
- T003 Add runType() and statusBucket() accessors (queued/running/succeeded/partial/failed) to app/Models/BulkOperationRun.php
- T004 [P] Confirm Readonly users can view run list/detail tenant-scoped (and only view) by reviewing/updating app/Policies/BulkOperationRunPolicy.php
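A minimal sketch of the two T003 accessors, written as plain PHP so it runs without Laravel. The class name RunSnapshot, the raw status values (pending/running/completed/failed) and the succeeded/failed item counters are assumptions for illustration; the real BulkOperationRun is an Eloquent model and its columns may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for BulkOperationRun; the real model is Eloquent
// and its column names may differ.
final class RunSnapshot
{
    public function __construct(
        public readonly string $resource,   // e.g. "drift"
        public readonly string $action,     // e.g. "generate"
        public readonly string $status,     // assumed raw values: pending|running|completed|failed
        public readonly int $succeeded = 0, // items processed successfully
        public readonly int $failed = 0,    // items that failed
    ) {
    }

    // Canonical run type in "resource.action" form, e.g. "drift.generate".
    public function runType(): string
    {
        return $this->resource . '.' . $this->action;
    }

    // Collapse the raw status plus item counts into one of the five hub buckets.
    public function statusBucket(): string
    {
        return match ($this->status) {
            'pending' => 'queued',
            'running' => 'running',
            'completed' => match (true) {
                $this->failed === 0 => 'succeeded', // nothing failed
                $this->succeeded > 0 => 'partial',  // some items ok, some failed
                default => 'failed',                // every item failed
            },
            // 'failed' and any unrecognized raw status are shown as failed
            // rather than silently hidden from the hub.
            default => 'failed',
        };
    }
}
```

Deriving the bucket in one place lets the badges (T006), the status-bucket filter (T007) and the tests share a single rule instead of each re-interpreting raw status values.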
Checkpoint: Foundation ready — Monitoring UI and run producers can reuse consistent status semantics.
Phase 3: User Story 1 - Monitor operations in one place (Priority: P1) 🎯 MVP
Goal: Provide a single Monitoring/Operations area to list and drill into tenant runs with consistent status semantics and safe failure visibility.
Independent Test: Visit Monitoring → Operations for a tenant with runs; filter by type/status; open a run and confirm counts + sanitized failures are visible; verify Readonly sees view-only UI.
Implementation
- T005 [US1] Move Operations runs into “Monitoring” navigation and label it “Operations” in app/Filament/Resources/BulkOperationRunResource.php
- T006 [US1] Render status badges using statusBucket() (not raw status) in app/Filament/Resources/BulkOperationRunResource.php
- T007 [US1] Add filters for run type (resource.action) and status bucket in app/Filament/Resources/BulkOperationRunResource.php
- T008 [US1] Add time range filter (created_at from/to) in app/Filament/Resources/BulkOperationRunResource.php
- T009 [US1] Add a “Related” section on the run detail view linking to the relevant feature context (e.g., Backup Set for backup_set.add_policies) in app/Filament/Resources/BulkOperationRunResource.php
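One way to keep the “Related” section (T009, and later T015) data-driven is a small resolver keyed on run type that reads context from the run payload. This is a hypothetical sketch: the function name relatedLinks(), the payload keys backup_set_id and scope_key, and the /admin/... paths are placeholders, not the project's actual routes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical resolver for the run detail "Related" section.
 * Payload keys and URL paths are placeholders, not real routes.
 *
 * @param array<string, mixed> $payload run payload stored on the run record
 * @return list<array{label: string, url: string}>
 */
function relatedLinks(string $runType, array $payload): array
{
    return match ($runType) {
        // Backup Set context for "Add Policies" runs.
        'backup_set.add_policies' => isset($payload['backup_set_id'])
            ? [[
                'label' => 'Backup Set',
                'url' => '/admin/backup-sets/' . rawurlencode((string) $payload['backup_set_id']),
            ]]
            : [],
        // Drift findings filtered to the scope the run generated.
        'drift.generate' => isset($payload['scope_key'])
            ? [[
                'label' => 'Drift findings',
                'url' => '/admin/drift-findings?' . http_build_query(['scope_key' => $payload['scope_key']]),
            ]]
            : [],
        // Run types without a known context show no related links.
        default => [],
    };
}
```

Returning an empty list for missing context keeps the detail view view-only and safe: a run without the expected payload simply renders no links instead of a broken one.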
Checkpoint: US1 complete — operators can monitor and drill into runs in one place.
Phase 4: User Story 2 - Start long-running actions without waiting (Priority: P2)
Goal: Starting a supported long-running operation is non-blocking and provides immediate confirmation + “View run” link; unauthorized users cannot start.
Independent Test: Trigger Drift generation and Backup Set “Add Policies”; confirm immediate feedback with “View run” link; confirm Readonly cannot start drift generation and no run is created.
Implementation
- T010 [US2] Prevent drift generation from being started by Readonly users (blocked state + message) in app/Filament/Pages/DriftLanding.php
- T011 [US2] Emit a queued DB notification with “View run” link when Drift generation is queued in app/Filament/Pages/DriftLanding.php
- T012 [P] [US2] Emit Drift completion and failure DB notifications with “View run” link in app/Jobs/GenerateDriftFindingsJob.php
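The start flow in T010 and T011 can be sketched as a permission guard that runs before any run record is created, followed by a notification carrying a “View run” action. Everything here is hypothetical plain PHP: the allowed role names other than readonly, the /admin/operations/{id} URL and the notification array shape stand in for the real policy, Filament route and DB notification.

```php
<?php

declare(strict_types=1);

// Hypothetical role names; only "readonly" comes from the spec.
const DRIFT_START_ROLES = ['owner', 'manager', 'operator'];

function canStartDriftGeneration(string $role): bool
{
    // Allow-list: any role not listed (including "readonly") is blocked.
    return in_array($role, DRIFT_START_ROLES, true);
}

/**
 * Build data for a DB notification with a "View run" action.
 * The URL pattern is an assumption standing in for the real run view route.
 *
 * @return array{title: string, body: string, actions: list<array{label: string, url: string}>}
 */
function runNotification(int $runId, string $runType, string $event): array
{
    return [
        'title' => sprintf('%s %s', $runType, $event),
        'body' => sprintf('Run #%d is %s.', $runId, $event),
        'actions' => [['label' => 'View run', 'url' => '/admin/operations/' . $runId]],
    ];
}

/**
 * Start flow: refuse before creating a run, so a blocked user leaves no run behind.
 *
 * @param callable(): int $createRunAndDispatch creates the run, queues the job, returns the run id
 * @return array|null notification data, or null when the user may not start
 */
function startDriftGeneration(string $role, callable $createRunAndDispatch): ?array
{
    if (! canStartDriftGeneration($role)) {
        return null;
    }

    $runId = $createRunAndDispatch();

    return runNotification($runId, 'drift.generate', 'queued');
}
```

Checking permission before calling the run factory is what keeps the Independent Test's “no run is created” condition true for Readonly users; the same runNotification() shape could be reused by the job for completion and failure (T012).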
Checkpoint: US2 complete — start UX is consistent and permission-gated.
Phase 5: User Story 3 - Drift generation is observable like other operations (Priority: P3)
Goal: Drift generation creates/reuses a run, surfaces safe failure details, and links operators to results.
Independent Test: Trigger Drift generation; observe it in Monitoring → Operations; open the run and follow a link to Drift findings; simulate failure and confirm safe failure reason is visible on the run.
Implementation
- T013 [US3] Store Drift context (scope_key, baseline_run_id, current_run_id) inside the run payload so Monitoring can link to results in app/Filament/Pages/DriftLanding.php
- T014 [P] [US3] Record a sanitized failure entry (reason_code + short message) into BulkOperationRun.failures when Drift generation fails in app/Jobs/GenerateDriftFindingsJob.php
- T015 [US3] Add a “Drift findings” link for drift.generate runs in the run detail “Related” section in app/Filament/Resources/BulkOperationRunResource.php
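For T014, a sanitizer can reduce an exception to a reason code plus a short, redacted first line before it is stored in BulkOperationRun.failures. This is a sketch under assumptions: the function name sanitizedFailure(), the reason codes, the redaction patterns and the 200-character limit are illustrative choices, and pattern-based redaction is a best-effort filter rather than a guarantee that no secret slips through.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sanitizer for entries stored in BulkOperationRun.failures.
 * Keeps a reason code and a short, redacted first line; never the stack
 * trace, raw payload or response body.
 *
 * @return array{reason_code: string, message: string}
 */
function sanitizedFailure(Throwable $e, int $maxLength = 200): array
{
    // Illustrative reason codes keyed on the exception type.
    $reasonCode = match (true) {
        $e instanceof InvalidArgumentException => 'invalid_input',
        $e instanceof RuntimeException => 'runtime_error',
        default => 'unexpected_error',
    };

    // Keep only the first line; multi-line messages often embed raw bodies.
    $message = trim(explode("\n", $e->getMessage(), 2)[0]);

    // Best-effort redaction of bearer tokens and key=value style secrets.
    $message = preg_replace(
        [
            '/Bearer\s+[A-Za-z0-9\-._~+\/]+=*/i',
            '/\b(password|secret|token|api[_-]?key|client_secret)\s*[=:]\s*\S+/i',
        ],
        ['Bearer [REDACTED]', '$1=[REDACTED]'],
        $message,
    ) ?? '';

    // Cap the length so a failure entry stays a short, readable reason.
    if (mb_strlen($message) > $maxLength) {
        $message = mb_substr($message, 0, $maxLength - 1) . '…';
    }

    return ['reason_code' => $reasonCode, 'message' => $message];
}
```

Sanitizing at write time, rather than when rendering, means the Monitoring detail view (and anything else reading failures) never sees the unredacted text.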
Checkpoint: US3 complete — drift runs are actionable and consistent with other operations.
Phase 6: Polish & Cross-Cutting Concerns
Purpose: Final alignment, validation, and guardrails.
- T016 [P] Update operator-facing notes and validation commands in specs/053-unify-runs-monitoring/quickstart.md (only if implementation changes)
- T017 [P] Update docs to match implementation if needed: specs/053-unify-runs-monitoring/spec.md and specs/053-unify-runs-monitoring/data-model.md
- T018 Run formatting on changed PHP files with ./vendor/bin/pint --dirty (reference: specs/053-unify-runs-monitoring/quickstart.md)
- T019 Run targeted validation commands from specs/053-unify-runs-monitoring/quickstart.md (queue worker optional; run relevant Pest tests)
- T020 [P] Re-verify contracts match real URLs and access behavior in specs/053-unify-runs-monitoring/contracts/admin-pages.openapi.yaml
Dependencies & Execution Order
Dependency Graph (User Stories)
Phase 1 (Setup) ─┬─> Phase 2 (Foundational) ─┬─> US1 (P1) ─┬─> Polish
                 │                           ├─> US2 (P2) ─┤
                 │                           └─> US3 (P3) ─┘
                 └────────────────────────────────────────────
User Story Dependencies
- US1 depends on Phase 2 (Foundational); independent of US2/US3.
- US2 depends on Phase 2 (Foundational); independent of US1/US3.
- US3 depends on Phase 2 (Foundational) and benefits from US1 (Monitoring visibility) but can be implemented independently.
Parallel Execution Examples
US1 (Monitoring UI)
After Phase 2 is complete, one developer can focus on:
- app/Filament/Resources/BulkOperationRunResource.php (T005–T009)
US2 (Start UX / Notifications)
These can be done in parallel after Phase 2:
- app/Filament/Pages/DriftLanding.php (T010–T011)
- app/Jobs/GenerateDriftFindingsJob.php (T012)
US3 (Drift observability)
These can be done in parallel after Phase 2:
- app/Filament/Pages/DriftLanding.php (T013)
- app/Jobs/GenerateDriftFindingsJob.php (T014)
Implementation Strategy
MVP First (US1 Only)
- Complete Phase 1 + Phase 2
- Complete US1 (Phase 3) and validate Monitoring/Operations end-to-end
- Ship/demonstrate Monitoring value before expanding run producer behavior
Incremental Delivery
- US1 (Monitoring hub) → validates visibility/auditability
- US2 (start guardrails + notifications) → standardizes operator feedback
- US3 (drift linking + safe failure detail) → makes drift runs fully actionable