# Implementation Plan: Unified Operations Runs + Monitoring Hub (053)

**Branch**: `feat/053-unify-runs-monitoring` | **Date**: 2026-01-16 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md` ([spec.md](spec.md))  
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/spec.md`

**Note**: This plan is filled in by the `/speckit.plan` command. See `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/.specify/scripts/` for helper scripts.

## Summary

Unify how long-running operations are tracked and monitored by using the existing tenant-scoped run record (`BulkOperationRun`) as the canonical “operation run”, surfacing it in a single Monitoring/Operations hub (view-only, tenant-scoped, role-aware), and standardizing status semantics, notifications, failure detail minimization, and idempotent de-duplication.

Phase 1 adoption scope (per clarifications): Drift generation + Backup Set “Add Policies”.

## Technical Context

**Language/Version**: PHP 8.4.15 (Laravel 12)  
**Primary Dependencies**: Filament v4, Livewire v3  
**Storage**: PostgreSQL (JSONB for run `item_ids` and `failures`)  
**Testing**: Pest v4 (PHPUnit v12)  
**Target Platform**: Web application (Sail-first locally, Dokploy-first deploy)  
**Project Type**: web  
**Performance Goals**: Monitoring/Operations index renders in <1s for the most recent ~250 runs; start actions confirm and provide a “View run” link within 2 seconds (aligns with SC-002).  
**Constraints**: Monitoring pages are DB-only and view-only; strict tenant isolation; no secrets/tokens stored; run failures use stable reason codes + short sanitized messages; itemized runs store per-item failures (sanitized).  
**Scale/Scope**: Tenant-scoped run history across multiple modules; Phase 1 covers drift generation + backup set “add policies”, with more run producers added in later phases.

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- Inventory-first: PASS. Monitoring uses persisted run records; drift generation is based on inventory sync “last observed” state and stores findings (not raw snapshots).
- Read/write separation: PASS. Monitoring/Operations is view-only (no start/rerun/cancel/delete). Write actions remain in their feature UIs and already use queued jobs + auditability.
- Graph contract path: PASS (no new Graph calls introduced by Monitoring). Existing Graph calls remain behind existing abstractions and must not occur during Monitoring page render.
- Deterministic capabilities: N/A for this feature (no new capability resolver). Existing idempotency key builder remains deterministic.
- Tenant isolation: PASS. Run list/view/start remain tenant-scoped; cross-tenant access is forbidden (403).
- Automation: PASS. Active-run de-duplication uses deterministic idempotency keys + partial unique indexes; runs remain observable with status + counts + safe errors.
- Data minimization: PASS. Failures are sanitized/minimized; no secrets/tokens/raw external payload dumps stored or displayed.

**Gate status (pre-Phase 0)**: PASS (no violations).

**Gate status (post-Phase 1)**: PASS (design artifacts present: research.md, data-model.md, contracts/*, quickstart.md).
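
The Automation gate depends on deterministic idempotency keys combined with active-run reuse. The following is a minimal sketch of that idea, not the project's implementation: `buildIdempotencyKey()` and `startOrReuseRun()` are hypothetical names, the `baseline`/`current` scope fields are invented for illustration, and a plain array stands in for the partial unique index that guards active runs in the database.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derive a deterministic key from tenant + operation + scope.
 * Sorting the top-level scope keys makes the key independent of argument order.
 */
function buildIdempotencyKey(int $tenantId, string $resource, string $action, array $scope): string
{
    ksort($scope);

    return hash('sha256', json_encode([$tenantId, $resource, $action, $scope], JSON_THROW_ON_ERROR));
}

/**
 * Hypothetical in-memory stand-in for "find an active run or create one".
 * The array keyed by idempotency key plays the role of the partial unique
 * index on pending/running runs (an assumption made for this sketch).
 *
 * @param array<string, array{id: int, status: string}> $activeRuns
 * @return array{id: int, status: string, reused: bool}
 */
function startOrReuseRun(array &$activeRuns, string $key, int $nextId): array
{
    $existing = $activeRuns[$key] ?? null;

    // Reuse only while the identical run is still active.
    if ($existing !== null && in_array($existing['status'], ['pending', 'running'], true)) {
        return $existing + ['reused' => true];
    }

    $activeRuns[$key] = ['id' => $nextId, 'status' => 'pending'];

    return $activeRuns[$key] + ['reused' => false];
}

$runs = [];

// Same tenant, operation, and scope; only the order of the scope keys differs.
$keyA = buildIdempotencyKey(7, 'drift', 'generate', ['baseline' => 1, 'current' => 2]);
$keyB = buildIdempotencyKey(7, 'drift', 'generate', ['current' => 2, 'baseline' => 1]);

$first = startOrReuseRun($runs, $keyA, 101);  // creates run 101
$second = startOrReuseRun($runs, $keyB, 102); // returns run 101 with reused = true
```

In a real database the lookup and insert can race, which is why the plan pairs the key with a unique index and a "find existing and reuse" fallback on collision rather than relying on the lookup alone.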
## Project Structure

### Documentation (this feature)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/
├── plan.md              # This file (/speckit.plan command output)
├── spec.md              # Feature specification (input)
├── checklists/
│   └── requirements.md  # Spec quality checklist
├── research.md          # Phase 0 output
├── data-model.md        # Phase 1 output
├── quickstart.md        # Phase 1 output
├── contracts/           # Phase 1 output
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```

### Source Code (repository root)

```text
app/
├── Filament/
│   ├── Pages/
│   └── Resources/
├── Jobs/
├── Models/
├── Policies/
├── Services/
└── Support/

config/
database/
└── migrations/
routes/
tests/
├── Feature/
└── Unit/
```

**Structure Decision**: Laravel web application. Implement Monitoring/Operations primarily via Filament (Resources/Pages) and reuse existing run-record primitives (`bulk_operation_runs`) with tenant-scoped policies and Pest tests.

## Key Implementation Decisions (Pinned)

- **Phase 1 scope**: Monitoring/Operations hub + Drift generation + Backup Set “Add Policies”.
- **Monitoring permissions**: `Owner`, `Manager`, `Operator`, `Readonly` can view. `Readonly` is strictly view-only.
- **Monitoring guardrail**: Monitoring/Operations is view-only in Phase 1 (no start/rerun/cancel/delete).
- **Status semantics**: UI-level statuses are consistent and testable:
  - `partially succeeded` = at least one success and at least one failure
  - `failed` = zero successes (or the run could not proceed)
- **Failure detail**: Stable reason codes + short sanitized messages; itemized operations include per-item failures (sanitized).

## Execution Model

### Run record primitive

- Canonical run record: `App\Models\BulkOperationRun` (tenant-scoped) for Phase 1.
- Producers in Phase 1:
  - Drift generation: `resource=drift`, `action=generate`
  - Backup Set “Add Policies”: `resource=backup_set`, `action=add_policies` (or existing canonical action naming)

### Status mapping (storage ↔ UI semantics)

The UI MUST present consistent meanings while allowing storage to keep existing vocabulary:

- `pending` → `queued`
- `running` → `running`
- `completed` → `succeeded`
- `completed_with_errors` → `partially succeeded`
- `failed` / `aborted` → `failed`

### Idempotency & de-duplication

- Deterministic idempotency key per tenant + operation type + scope via `App\Support\RunIdempotency`.
- Active-run reuse: if an identical run is `pending` or `running`, reuse it (return the existing run and link to it).
- Race reduction: rely on the existing partial unique index for active runs and handle collisions by “find existing and reuse”.

### Notifications

- Use DB notifications for “queued” and “completed” lifecycle events, linking to the run detail page.
- Notifications and persisted run failures must remain sanitized (no secrets/tokens/raw payloads).

### Monitoring/Operations hub

- Central list + filters for the active tenant:
  - filter by `resource`/`action`, status bucket (queued/running/succeeded/partial/failed), and time range
  - drill-down to run detail (status + counts + sanitized failures + item identifiers)
- View-only: no hub actions to start, rerun, cancel, or delete runs.

## Definition of Done (ends at Phase 2 planning)

### Phase 2 (MVP implementation readiness)

- Monitoring/Operations navigation exists and lists tenant runs with the required filters and drill-down.
- Role guardrail enforced: `Readonly` can view list + detail but has no action controls.
- Status bucket semantics are consistent and testable (including partial vs failed).
- Drift generation and Backup Set “Add Policies” runs are visible and linkable from their feature entry points and from Monitoring/Operations.
- Design artifacts exist and are referenced by this plan:
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/research.md`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/data-model.md`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/contracts/`
  - `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/053-unify-runs-monitoring/quickstart.md`

## Complexity Tracking

> **Fill ONLY if Constitution Check has violations that must be justified**

N/A (no constitution violations anticipated)
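
As an illustrative note on the storage-to-UI status mapping and the pinned partial-vs-failed semantics from the Execution Model, the sketch below shows one way the rules could be expressed so they stay testable. The function names `uiStatusFor()` and `terminalBucket()` are hypothetical; only the status vocabulary is taken from this plan.

```php
<?php

declare(strict_types=1);

/**
 * Map a stored run status to the UI status bucket listed in this plan.
 * Unknown values throw so that new storage statuses are not silently mislabeled.
 */
function uiStatusFor(string $storedStatus): string
{
    return match ($storedStatus) {
        'pending' => 'queued',
        'running' => 'running',
        'completed' => 'succeeded',
        'completed_with_errors' => 'partially succeeded',
        'failed', 'aborted' => 'failed',
        default => throw new InvalidArgumentException("Unknown run status: {$storedStatus}"),
    };
}

/**
 * Classify a finished run from its counts, following the pinned semantics:
 * zero successes is "failed"; at least one success plus at least one failure
 * is "partially succeeded"; otherwise the run "succeeded".
 */
function terminalBucket(int $succeeded, int $failed): string
{
    if ($succeeded === 0) {
        return 'failed';
    }

    return $failed > 0 ? 'partially succeeded' : 'succeeded';
}

echo uiStatusFor('completed_with_errors'), PHP_EOL; // partially succeeded
echo terminalBucket(3, 1), PHP_EOL;                 // partially succeeded
echo terminalBucket(0, 4), PHP_EOL;                 // failed
```

Keeping both rules as pure functions would let Pest tests cover each bucket without touching the database.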