Summary
This PR introduces the Unified Operations Runs + Monitoring Hub (Spec 053).
Goal: Standardize how long-running operations are tracked and monitored. The existing tenant-scoped run record (BulkOperationRun) becomes the canonical “operation run” and is surfaced in a single Monitoring → Operations hub that is view-only, tenant-scoped, and role-aware.
Phase 1 adoption scope (per spec):
• Drift generation (drift.generate)
• Backup Set “Add Policies” (backup_set.add_policies)
Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.
⸻
What changed
Monitoring / Operations hub
• Moved/organized run monitoring under Monitoring → Operations
• Added:
  • status buckets (queued / running / succeeded / partially succeeded / failed)
  • filters (run type, status bucket, time range)
  • run detail “Related” links (e.g. Drift findings, Backup Set context)
• All hub pages are DB-only and view-only (no rerun/cancel/delete actions)
Canonical run semantics
• Added canonical helpers on BulkOperationRun:
  • runType(): the run type in resource.action form (e.g. drift.generate)
  • statusBucket(): derived from status + counts (testable semantics)
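The derivation described above can be sketched as follows. This is a language-neutral illustration in Python, not the PHP code on BulkOperationRun. The raw status values ("pending", "running", "completed", "failed"), the field names, and the exact rule for splitting "completed" runs by counts are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    # Hypothetical stand-in for a run record; field names are assumptions.
    resource: str
    action: str
    status: str  # assumed raw values: "pending", "running", "completed", "failed"
    succeeded: int = 0
    failed: int = 0


def run_type(run: Run) -> str:
    """Return the run type in resource.action form."""
    return f"{run.resource}.{run.action}"


def status_bucket(run: Run) -> str:
    """Map raw status + item counts onto one of the five hub buckets."""
    if run.status == "pending":
        return "queued"
    if run.status == "running":
        return "running"
    if run.status == "failed":
        return "failed"
    # Assumed rule for completed runs: the counts decide the bucket.
    if run.failed == 0:
        return "succeeded"
    if run.succeeded > 0:
        return "partially_succeeded"
    return "failed"
```

Keeping the bucket a pure function of status and counts is what makes the semantics easy to cover with unit tests.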
Drift integration (Phase 1)
• Drift generation start behavior now:
  • creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
  • dispatches generation job
  • emits DB notifications including “View run” link
• On generation failure: stores sanitized failure entries + sends failure notification
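To illustrate what “sanitized failure entries” could mean in practice, here is a minimal sketch in Python. It is not the sanitizer used in this PR: the list of sensitive keys, the bearer-token pattern, the length cap, and the function names are all hypothetical choices for the example.

```python
import re

# Assumed set of context keys that must never be persisted verbatim.
SENSITIVE_KEYS = {"authorization", "access_token", "refresh_token",
                  "client_secret", "password"}
# Matches "Bearer <token>" fragments embedded in error messages.
BEARER = re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE)
MAX_MESSAGE_LENGTH = 200  # assumed cap to avoid storing raw payload dumps


def sanitize_message(message: str) -> str:
    """Redact bearer tokens and truncate overly long messages."""
    return BEARER.sub("Bearer [REDACTED]", message)[:MAX_MESSAGE_LENGTH]


def sanitize_failure(item_id: str, message: str, context=None) -> dict:
    """Build a failure entry that is safe to store on the run record."""
    safe_context = {}
    for key, value in (context or {}).items():
        if key.lower() in SENSITIVE_KEYS:
            safe_context[key] = "[REDACTED]"
        elif isinstance(value, str):
            safe_context[key] = sanitize_message(value)
        elif value is None or isinstance(value, (int, float, bool)):
            safe_context[key] = value
        # Nested structures (raw payloads) are dropped entirely.
    return {
        "item_id": item_id,
        "message": sanitize_message(message),
        "context": safe_context,
    }
```

An allow-list of scalar context values (rather than a deny-list of secrets alone) keeps unexpected nested payloads out of the stored entry by default.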
Permissions / tenant isolation
• Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
• Readonly users can view runs but cannot start drift generation
⸻
Tests
Added/updated Pest coverage:
• BulkOperationRunStatusBucketTest.php
• DriftGenerationDispatchTest.php
• GenerateDriftFindingsJobNotificationTest.php
• RunAuthorizationTenantIsolationTest.php
Validation run locally:
• ./vendor/bin/pint --dirty
• targeted tests from feature quickstart / drift monitoring tests
⸻
Manual QA
1. Go to Monitoring → Operations
   • verify filters (run type / status / time range)
   • verify run detail shows counts + sanitized failures + “Related” links
2. Open Drift Landing
   • with >=2 successful inventory runs for the scope: drift generation should be queued and a notification with a “View run” link shown
   • as a readonly user: generation should not start
3. Run detail
   • drift.generate runs show “Drift findings” related link
   • failure entries are sanitized (no secrets/tokens/raw payload dumps)
⸻
Notes / Ops
• Queue workers must be restarted after deploy so they load the new code:
  • php artisan queue:restart (or Sail equivalent)
• This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.
⸻
Spec / Docs
• SpecKit artifacts added under specs/053-unify-runs-monitoring/
• Checklists are complete:
  • requirements checklist PASS
  • writing checklist PASS
Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60
Summary
This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.
Why
We want predictable UX and operations at MSP scale:
• no timeouts / long-running requests
• reproducible run state + per-item results
• safe error persistence (no secrets / no token leakage)
• strict tenant isolation + auditability for write paths
What changed
Foundational (Runs + Idempotency + Observability)
• Added a shared RunIdempotency helper (dedupe while queued/running).
• Added a read-only BulkOperationRuns surface (list + view) for status/progress.
• Added DB notifications for run status changes (with “View run” link).
US1 – Policy “Capture snapshot” is job-only
• Policy detail “Capture snapshot” now:
  • creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
  • dispatches a queued job
  • returns immediately with notification + link to run detail
• Graph capture work moved fully into the job; request path stays Graph-free.
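The create/reuse behavior can be sketched as below. This is an in-memory illustration in Python, not the RunIdempotency helper itself: the class and function names, the key format, and the status strings are assumptions. A real implementation would also need a database-level guard (for example a lock or unique constraint) against concurrent requests, which this sketch omits.

```python
import hashlib
from dataclasses import dataclass

ACTIVE_STATUSES = {"queued", "running"}  # dedupe only applies to these


@dataclass
class Run:
    # Hypothetical stand-in for a run record.
    id: int
    key: str
    status: str = "queued"


def idempotency_key(tenant_id: int, run_type: str, target_id: int) -> str:
    """Derive a stable key from tenant + run type + target DB id."""
    raw = f"{tenant_id}:{run_type}:{target_id}"
    return hashlib.sha256(raw.encode()).hexdigest()


class RunStore:
    """In-memory stand-in for the run table."""

    def __init__(self):
        self.runs = []

    def find_active(self, key: str):
        for run in self.runs:
            if run.key == key and run.status in ACTIVE_STATUSES:
                return run
        return None

    def create_or_reuse(self, tenant_id: int, run_type: str, target_id: int):
        """Return (run, created). Reuse a queued/running run with the same key."""
        key = idempotency_key(tenant_id, run_type, target_id)
        existing = self.find_active(key)
        if existing is not None:
            return existing, False
        run = Run(id=len(self.runs) + 1, key=key)
        self.runs.append(run)
        return run, True
```

In this sketch the caller would dispatch the queued job only when `created` is true and link the notification to the returned run in either case; once a run leaves the queued/running states, the next request creates a fresh run.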
US3 – Restore runs orchestration is job-only + safe
• Live restore execution is queued and updates RestoreRun status/progress.
• Per-item outcomes are persisted deterministically (per internal DB record).
• Audit logging is written for live restore.
• Preview/dry-run is enforced as read-only (no writes).
Tenant isolation / authorization (non-negotiable)
• Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
• Explicit Pest tests cover cross-tenant denial and start authorization.
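The 403-not-404 rule can be expressed as a small decision function. The sketch below is in Python for illustration only and is not the policy code in this PR; the role names ("readonly", "operator"), the field names, and the 202 response for an accepted start are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    tenant_id: int
    role: str  # assumed role names: "readonly", "operator"


@dataclass(frozen=True)
class Run:
    id: int
    tenant_id: int


def view_run_status(user: User, run: Optional[Run]) -> int:
    """HTTP status for viewing a run."""
    if run is None:
        return 404  # the run does not exist at all
    if run.tenant_id != user.tenant_id:
        return 403  # exists, but belongs to another tenant
    return 200


def start_run_status(user: User, tenant_id: int) -> int:
    """HTTP status for starting a run in the given tenant."""
    if tenant_id != user.tenant_id:
        return 403
    if user.role == "readonly":
        return 403  # assumed: readonly users may view but never start
    return 202  # accepted: the work is queued, not executed inline
```

A cross-tenant test then asserts 403 for an existing run in another tenant and reserves 404 for ids that do not exist.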
Tests / Verification
• ./vendor/bin/pint --dirty
• Targeted suite (examples):
  • policy capture snapshot queued + idempotency tests
  • restore orchestration + audit logging + preview read-only tests
  • run authorization / tenant isolation tests
Notes / Scope boundaries
• Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
• Resilience/backoff is tracked in tasks but can be iterated further after merge.
Review focus
• Dedupe behavior for queued/running runs (reuse vs create-new)
• Tenant scoping & policy gates for all run surfaces
• Restore safety: audit event + preview no-writes
Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56