Summary

This PR introduces Unified Operations Runs + Monitoring Hub (053).

Goal: Standardize how long-running operations are tracked and monitored using the existing tenant-scoped run record (BulkOperationRun) as the canonical “operation run”, and surface it in a single Monitoring → Operations hub (view-only, tenant-scoped, role-aware).

Phase 1 adoption scope (per spec):
• Drift generation (drift.generate)
• Backup Set “Add Policies” (backup_set.add_policies)

Note: This PR does not convert every run type yet (e.g. GroupSyncRuns / InventorySyncRuns remain separate for now). This is intentionally incremental.

⸻

What changed

Monitoring / Operations hub
• Moved/organized run monitoring under Monitoring → Operations
• Added:
  • status buckets (queued / running / succeeded / partially succeeded / failed)
  • filters (run type, status bucket, time range)
  • run detail “Related” links (e.g. Drift findings, Backup Set context)
• All hub pages are DB-only and view-only (no rerun/cancel/delete actions)

Canonical run semantics
• Added canonical helpers on BulkOperationRun:
  • runType() (resource.action)
  • statusBucket() derived from status + counts (testable semantics)

Drift integration (Phase 1)
• Drift generation start behavior now:
  • creates/reuses a BulkOperationRun with drift context payload (scope_key + baseline/current run ids)
  • dispatches generation job
  • emits DB notifications including “View run” link
• On generation failure: stores sanitized failure entries + sends failure notification

Permissions / tenant isolation
• Monitoring run list/view is tenant-scoped and returns 403 for cross-tenant access
• Readonly can view runs but cannot start drift generation

⸻

Tests

Added/updated Pest coverage:
• BulkOperationRunStatusBucketTest.php
• DriftGenerationDispatchTest.php
• GenerateDriftFindingsJobNotificationTest.php
• RunAuthorizationTenantIsolationTest.php

Validation run locally:
• ./vendor/bin/pint --dirty
• targeted tests from feature quickstart / drift monitoring tests

⸻

Manual QA

1. Go to Monitoring → Operations
   • verify filters (run type / status / time range)
   • verify run detail shows counts + sanitized failures + “Related” links
2. Open Drift Landing
   • with >=2 successful inventory runs for scope: should queue drift generation + show notification with “View run”
   • as readonly: should not start generation
3. Run detail
   • drift.generate runs show “Drift findings” related link
   • failure entries are sanitized (no secrets/tokens/raw payload dumps)

⸻

Notes / Ops
• Queue workers must be restarted after deploy so they load the new code:
  • php artisan queue:restart (or Sail equivalent)
• This PR standardizes monitoring for Phase 1 producers only; follow-ups will migrate additional run types into the unified pattern.

⸻

Spec / Docs
• SpecKit artifacts added under specs/053-unify-runs-monitoring/
• Checklists are complete:
  • requirements checklist PASS
  • writing checklist PASS

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #60
151 lines
7.8 KiB
Markdown
# Implementation Plan: Unified Operations Runs + Monitoring Hub (053)

**Branch**: `feat/053-unify-runs-monitoring` | **Date**: 2026-01-16 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md` ([spec.md](spec.md))
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md`

**Note**: This plan is filled in by the `/speckit.plan` command. See `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/.specify/scripts/` for helper scripts.

## Summary

Unify how long-running operations are tracked and monitored by using the existing tenant-scoped run record (`BulkOperationRun`) as the canonical “operation run”, surfacing it in a single Monitoring/Operations hub (view-only, tenant-scoped, role-aware), and standardizing status semantics, notifications, failure detail minimization, and idempotent de-duplication.

Phase 1 adoption scope (per clarifications): Drift generation + Backup Set “Add Policies”.

## Technical Context

**Language/Version**: PHP 8.4.15 (Laravel 12)
**Primary Dependencies**: Filament v4, Livewire v3
**Storage**: PostgreSQL (JSONB for run `item_ids` and `failures`)
**Testing**: Pest v4 (PHPUnit v12)
**Target Platform**: Web application (Sail-first locally, Dokploy-first deploy)
**Project Type**: web
**Performance Goals**: Monitoring/Operations index renders in <1s for the most recent ~250 runs; start actions confirm and provide a “View run” link within 2 seconds (aligns with SC-002).
**Constraints**: Monitoring pages are DB-only and view-only; strict tenant isolation; no secrets/tokens stored; run failures use stable reason codes + short sanitized messages; itemized runs store per-item failures (sanitized).
**Scale/Scope**: Tenant-scoped run history across multiple modules; Phase 1 covers drift generation + backup set “add policies”, with more run producers added in later phases.

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- Inventory-first: PASS. Monitoring uses persisted run records; drift generation is based on inventory sync “last observed” state and stores findings (not raw snapshots).
- Read/write separation: PASS. Monitoring/Operations is view-only (no start/rerun/cancel/delete). Write actions remain in their feature UIs and already use queued jobs + auditability.
- Graph contract path: PASS (no new Graph calls introduced by Monitoring). Existing Graph calls remain behind existing abstractions and must not occur during Monitoring page render.
- Deterministic capabilities: N/A for this feature (no new capability resolver). Existing idempotency key builder remains deterministic.
- Tenant isolation: PASS. Run list/view/start remain tenant-scoped; cross-tenant access is forbidden (403).
- Automation: PASS. Active-run de-duplication uses deterministic idempotency keys + partial unique indexes; runs remain observable with status + counts + safe errors.
- Data minimization: PASS. Failures are sanitized/minimized; no secrets/tokens/raw external payload dumps stored or displayed.

**Gate status (pre-Phase 0)**: PASS (no violations).

**Gate status (post-Phase 1)**: PASS (design artifacts present: research.md, data-model.md, contracts/*, quickstart.md).

## Project Structure

### Documentation (this feature)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/
├── plan.md              # This file (/speckit.plan command output)
├── spec.md              # Feature specification (input)
├── checklists/
│   └── requirements.md  # Spec quality checklist
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```

### Source Code (repository root)
```text
app/
├── Filament/
│   ├── Pages/
│   └── Resources/
├── Jobs/
├── Models/
├── Policies/
├── Services/
└── Support/

config/

database/
└── migrations/

routes/

tests/
├── Feature/
└── Unit/
```

**Structure Decision**: Laravel web application. Implement Monitoring/Operations primarily via Filament (Resources/Pages) and reuse existing run-record primitives (`bulk_operation_runs`) with tenant-scoped policies and Pest tests.

## Key Implementation Decisions (Pinned)

- **Phase 1 scope**: Monitoring/Operations hub + Drift generation + Backup Set “Add Policies”.
- **Monitoring permissions**: `Owner`, `Manager`, `Operator`, `Readonly` can view. `Readonly` is strictly view-only.
- **Monitoring guardrail**: Monitoring/Operations is view-only in Phase 1 (no start/rerun/cancel/delete).
- **Status semantics**: UI-level statuses are consistent and testable:
  - `partially succeeded` = at least one success and at least one failure
  - `failed` = zero successes (or the run could not proceed)
- **Failure detail**: Stable reason codes + short sanitized messages; itemized operations include per-item failures (sanitized).

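The permission and isolation rules pinned above can be sketched as one decision function. This is a minimal illustration, not the project's actual policy class: the function name `authorizeRunAbility`, the ability strings `view` and `start`, the integer tenant IDs, and the assumption that every role other than `Readonly` may start operations are all made up for the example. Only the role names and the 403 behaviour come from this plan.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the HTTP status a request would receive.
 * Role names come from the plan; everything else is illustrative.
 */
function authorizeRunAbility(string $role, int $userTenantId, int $runTenantId, string $ability): int
{
    // Tenant isolation is checked first: cross-tenant access is always forbidden.
    if ($userTenantId !== $runTenantId) {
        return 403;
    }

    $viewers = ['Owner', 'Manager', 'Operator', 'Readonly'];

    return match ($ability) {
        'view' => in_array($role, $viewers, true) ? 200 : 403,
        // Assumption: every viewing role except Readonly may start operations
        // from the feature UI (never from the Monitoring hub itself).
        'start' => in_array($role, ['Owner', 'Manager', 'Operator'], true) ? 200 : 403,
        // Unknown abilities are denied by default.
        default => 403,
    };
}
```

Checking the tenant before the role keeps the cross-tenant case from depending on how roles are configured.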
## Execution Model

### Run record primitive

- Canonical run record: `App\Models\BulkOperationRun` (tenant-scoped) for Phase 1.
- Producers in Phase 1:
  - Drift generation: `resource=drift`, `action=generate`
  - Backup Set “Add Policies”: `resource=backup_set`, `action=add_policies` (or the existing canonical action name, if one is already in use)

### Status mapping (storage ↔ UI semantics)

The UI MUST present consistent meanings while allowing storage to keep existing vocabulary:

- `pending` → `queued`
- `running` → `running`
- `completed` → `succeeded`
- `completed_with_errors` → `partially succeeded`
- `failed` / `aborted` → `failed`

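The mapping above can be written as a single pure function, which is what makes the semantics testable. The sketch below is an assumption about shape, not the real helper: it takes the stored status and a success count as plain arguments, and it applies the pinned rule that zero successes means `failed` even when storage says `completed_with_errors`. Throwing on an unknown status is also an illustrative choice.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical standalone version of a status-bucket helper.
 * Maps the stored status (plus the success count) to the UI bucket.
 */
function statusBucket(string $storedStatus, int $succeeded = 0): string
{
    return match ($storedStatus) {
        'pending' => 'queued',
        'running' => 'running',
        'completed' => 'succeeded',
        // "partially succeeded" needs at least one success; otherwise the run failed.
        'completed_with_errors' => $succeeded > 0 ? 'partially succeeded' : 'failed',
        'failed', 'aborted' => 'failed',
        // Assumption: an unknown stored status is a programming error.
        default => throw new InvalidArgumentException("Unknown run status: {$storedStatus}"),
    };
}
```

Because the function has no database or framework dependency, each row of the mapping can be covered by a one-line unit test.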
### Idempotency & de-duplication

- Deterministic idempotency key per tenant + operation type + scope via `App\Support\RunIdempotency`.
- Active-run reuse: if an identical run is `pending` or `running`, reuse it (return the existing run and link to it).
- Race reduction: rely on the existing partial unique index for active runs and handle collisions by “find existing and reuse”.

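As a rough sketch of the two ideas above, the example below builds a deterministic key and reuses an active run when the key collides. It does not reproduce the real `App\Support\RunIdempotency` API: `buildIdempotencyKey`, `ActiveRunRegistry`, and the flat scope array are hypothetical, and the in-memory registry only stands in for the table guarded by the partial unique index.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical key builder; NOT the real App\Support\RunIdempotency API.
 * Assumes $scope is a flat array of scalar values.
 */
function buildIdempotencyKey(int $tenantId, string $runType, array $scope): string
{
    ksort($scope); // the same scope in a different key order yields the same key

    return hash('sha256', json_encode([$tenantId, $runType, $scope], JSON_THROW_ON_ERROR));
}

/** In-memory stand-in for the table guarded by the partial unique index. */
final class ActiveRunRegistry
{
    /** @var array<string, int> idempotency key => id of the active run */
    private array $activeRunIds = [];

    private int $nextId = 1;

    /** @return array{id: int, reused: bool} */
    public function startOrReuse(string $key): array
    {
        // An identical active run exists: return it instead of creating a duplicate.
        if (isset($this->activeRunIds[$key])) {
            return ['id' => $this->activeRunIds[$key], 'reused' => true];
        }

        $this->activeRunIds[$key] = $this->nextId++;

        return ['id' => $this->activeRunIds[$key], 'reused' => false];
    }

    /** Terminal runs leave the active set, so the next start creates a new run. */
    public function markFinished(string $key): void
    {
        unset($this->activeRunIds[$key]);
    }
}
```

In the database-backed version, the existence check and the insert are not atomic, which is why the plan leans on the unique index and treats a constraint violation as the signal to look up and reuse the existing run.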
### Notifications

- Use DB notifications for “queued” and “completed” lifecycle events, linking to the run detail page.
- Notifications and persisted run failures must remain sanitized (no secrets/tokens/raw payloads).

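One way to keep failure entries and notification text sanitized is to pass every raw error through a single function before it is stored or sent. The following is only a sketch under stated assumptions: `sanitizeFailure`, the redaction patterns, the 200-character cap, and the `reason_code`/`message` keys are illustrative and not taken from the spec. It uses `mb_substr`, so it assumes the mbstring extension is available.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sanitizer: the redaction patterns and the 200-character cap
 * are illustrative assumptions, not values from the spec.
 *
 * @return array{reason_code: string, message: string}
 */
function sanitizeFailure(string $reasonCode, string $rawMessage, int $maxLength = 200): array
{
    // Redact bearer tokens first, then key=value style credentials.
    $message = preg_replace('/Bearer\s+[A-Za-z0-9._~+\/=-]+/i', 'Bearer [redacted]', $rawMessage) ?? '';
    $message = preg_replace(
        '/(password|secret|token|api[_-]?key)\s*[=:]\s*\S+/i',
        '$1=[redacted]',
        $message
    ) ?? '';

    // Collapse whitespace (including newlines from stack traces) into single spaces.
    $message = trim(preg_replace('/\s+/', ' ', $message) ?? '');

    return [
        'reason_code' => $reasonCode,              // stable, machine-readable code
        'message' => mb_substr($message, 0, $maxLength), // short, human-readable text
    ];
}
```

Pattern-based redaction is a safety net rather than a guarantee; the stronger control is to build messages from known-safe fields and never from raw external payloads.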
### Monitoring/Operations hub

- Central list + filters for the active tenant:
  - filter by `resource`/`action`, status bucket (queued/running/succeeded/partially succeeded/failed), and time range
  - drill-down to run detail (status + counts + sanitized failures + item identifiers)
- View-only: no hub actions to start, rerun, cancel, or delete runs.

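The list filters described above can be illustrated with a small in-memory filter. The real hub would express this as a tenant-scoped database query in Filament; here `filterRuns` and the array shape of a run (`tenant_id`, `resource`, `action`, `bucket`, `created_at`) are hypothetical, and the run type is written as `resource.action`.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical in-memory version of the hub's list filters.
 * Each run is an array with the keys tenant_id, resource, action, bucket, created_at.
 */
function filterRuns(
    array $runs,
    int $activeTenantId,
    ?string $runType = null,
    ?string $bucket = null,
    ?DateTimeImmutable $since = null
): array {
    $matches = static function (array $run) use ($activeTenantId, $runType, $bucket, $since): bool {
        // Tenant scoping is unconditional; the remaining filters are optional.
        if ($run['tenant_id'] !== $activeTenantId) {
            return false;
        }
        if ($runType !== null && $run['resource'] . '.' . $run['action'] !== $runType) {
            return false;
        }
        if ($bucket !== null && $run['bucket'] !== $bucket) {
            return false;
        }

        return $since === null || $run['created_at'] >= $since;
    };

    return array_values(array_filter($runs, $matches));
}
```

The function only reads and returns runs, matching the view-only guardrail: nothing in the hub path starts, reruns, cancels, or deletes a run.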
## Definition of Done (ends at Phase 2 planning)

### Phase 2 (MVP implementation readiness)

- Monitoring/Operations navigation exists and lists tenant runs with the required filters and drill-down.
- Role guardrail enforced: `Readonly` can view list + detail but has no action controls.
- Status bucket semantics are consistent and testable (including partial vs failed).
- Drift generation and Backup Set “Add Policies” runs are visible and linkable from their feature entry points and from Monitoring/Operations.
- Design artifacts exist and are referenced by this plan:
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/research.md`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/data-model.md`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/contracts/`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/quickstart.md`

## Complexity Tracking

> **Fill ONLY if Constitution Check has violations that must be justified**

N/A (no constitution violations anticipated)