diff --git a/specs/052-async-add-policies/checklists/requirements.md b/specs/052-async-add-policies/checklists/requirements.md new file mode 100644 index 0000000..50ebed6 --- /dev/null +++ b/specs/052-async-add-policies/checklists/requirements.md @@ -0,0 +1,32 @@ +# Specification Quality Checklist: Async “Add Policies” to Backup Set (052) + +**Purpose**: Validate specification completeness and quality before proceeding to implementation +**Created**: 2026-01-14 +**Feature**: [specs/052-async-add-policies/spec.md](../spec.md) + +## Content Quality + +- [x] No implementation details leaked into user requirements (spec focuses on behaviors, run observability, and safety rules) +- [x] Focused on user value (fast submit, dedupe, observable progress, safe failures) +- [x] Written in operator language (Backup Sets, Add Policies, runs, failures) +- [x] Mandatory sections present (User Scenarios & Testing, Requirements, Success Criteria) + +## Requirement Completeness + +- [x] Scope is bounded (explicit in-scope/out-of-scope list) +- [x] Requirements are testable and unambiguous (FR-001..FR-005 + NFRs + acceptance scenarios) +- [x] Idempotency and dedupe behavior is explicitly specified +- [x] Tenant isolation and authorization are explicitly specified +- [x] Data minimization and safe logging rules are explicitly specified +- [x] “No UI-time Graph calls” is explicitly specified and test-gated + +## Feature Readiness + +- [x] Plan includes concrete touch points and test strategy (plan.md) +- [x] Tasks are grouped by user story and prioritize tests-first (tasks.md) + +## Notes + +- Items marked incomplete require spec updates before implementation. +- Constitution gate: this checklist must exist for features that change runtime behavior. + diff --git a/specs/052-async-add-policies/plan.md b/specs/052-async-add-policies/plan.md new file mode 100644 index 0000000..c60222f --- /dev/null +++ b/specs/052-async-add-policies/plan.md @@ -0,0 +1,108 @@ +# Implementation Plan: Async “Add Policies” to Backup Set (052) + +**Branch**: `feat/052-async-add-policies` | **Date**: 2026-01-14 | **Spec**: [specs/052-async-add-policies/spec.md](spec.md) +**Input**: Feature specification from `/specs/052-async-add-policies/spec.md` + +## Summary + +Convert Backup Set → Add Policies (“Add selected”) to job-only execution by creating/reusing a `BulkOperationRun`, dispatching a queued job to do Graph/snapshot work, and returning immediately with observable feedback and a “View run” link. + +## Technical Context + +**Language/Version**: PHP 8.4 (Laravel 12) +**Admin UI**: Filament v4 + Livewire v3 +**Storage**: PostgreSQL +**Queue**: Laravel queues (Sail-first locally) +**Testing**: Pest v4 +**Constraints**: No Graph calls during interactive request/response; tenant isolation; idempotent dedupe; safe error persistence. + +## Constitution Check + +- Inventory-first: this feature updates backup storage (explicit snapshot capture), never “inventory on render”. +- Read/write separation: interactive action is “write intent only” (enqueue); job performs DB writes with auditability. +- Graph contract path: any Graph reads happen only inside the job via `GraphClientInterface` (indirectly via existing services). +- Deterministic capabilities: selection hashing and idempotency key are deterministic. +- Tenant isolation: run records and backup set mutations are scoped to the active tenant. +- Automation: job has run record lifecycle + counts + failures; dedupe prevents duplicates. +- Data minimization: failures and notifications store sanitized messages only. + +## Project Structure + +### Documentation (this feature) + +```text +specs/052-async-add-policies/ +├── plan.md +├── spec.md +├── tasks.md +└── checklists/ + └── requirements.md +``` + +### Source Code (planned touch points) + +```text +app/Livewire/BackupSetPolicyPickerTable.php # “Add selected” handler becomes enqueue-only +app/Jobs/AddPoliciesToBackupSetJob.php # new queued job (name may vary) +app/Models/BulkOperationRun.php # reused run record +app/Services/BulkOperationService.php # reused counters + failure sanitization +app/Support/RunIdempotency.php # reused deterministic key builder + active-run lookup +app/Filament/Resources/BulkOperationRunResource.php # existing run UI used for “View run” +tests/Feature/Filament/BackupSetPolicyPickerTableTest.php # updated for async behavior +tests/Feature/* # new/updated tests for idempotency + tenant isolation +``` + +## Execution Model + +### Action Handler (UI request) + +Responsibilities (must remain fast): +1. Resolve tenant + user context and authorize “add policies to backup set”. +2. Convert selected policy records into an ID list (no Graph/snapshot work). +3. Compute a deterministic idempotency key using: + - tenant id + - target backup_set_id + - operation type (e.g., `backup_set.add_policies`) + - context payload containing (sorted) policy IDs + option flags (include assignments/scope tags/foundations) +4. Reuse an active run if one exists; otherwise create a new `BulkOperationRun` with `resource=backup_set`, `action=add_policies`, `item_ids` = selected policy IDs, `total_items` = count, and `idempotency_key` set. +5. Dispatch a queued job (always async; no sync shortcut) with the run id + backup_set_id + option flags. +6. Return immediately with a Filament notification and a “View run” link (DB notification preferred). + +### Queued Job + +Responsibilities (all heavy work): +1. Load the `BulkOperationRun` and ensure it is still active (`pending` → `running`). +2. Validate the backup set exists and belongs to the run tenant. +3. Process the run’s stored policy IDs deterministically (sequentially or chunked): + - Add/capture each policy into the backup set using existing capture services. + - Update counts and record per-item failures with safe reasons. + - Continue through all items (no circuit-breaker abort). +4. Complete the run with: + - `completed` if no failures + - `completed_with_errors` (partial) if some failures + - `failed` if the job cannot proceed (e.g., backup set missing) +5. Emit a DB notification to the initiating user summarizing the outcome and linking to the run. + +### Idempotency & Concurrency + +- Primary dedupe mechanism: `RunIdempotency::findActiveBulkOperationRun(tenant_id, idempotency_key)`. +- Secondary guard: existing partial unique index on `(tenant_id, idempotency_key)` for active statuses. +- Race handling: if concurrent submissions collide, prefer “find existing run and redirect” over throwing. +- Dedupe scope: tenant-wide (idempotency key does not include `user_id`). + +## Testing Strategy (Pest) + +Minimum required tests for 052: +- Action dispatch test: `Add selected` queues the job and creates/reuses a run. +- Fail-hard Graph guard test: binding `GraphClientInterface` (or a capture service) to a mock that must not be called during the action. +- Idempotency test: calling the action twice with the same selection queues only one job and creates only one active run. +- Tenant isolation test: run view under another tenant context is forbidden (403). + +Target commands: +- `./vendor/bin/sail artisan test --filter=BackupSetAddPolicies` +- `./vendor/bin/pint --dirty` + +## Rollout Notes + +- Requires queue workers running for background processing (Sail locally; Dokploy workers in staging/prod). +- No destructive migrations expected; if schema is extended for better observability, it must remain backwards compatible. diff --git a/specs/052-async-add-policies/spec.md b/specs/052-async-add-policies/spec.md new file mode 100644 index 0000000..78d4b73 --- /dev/null +++ b/specs/052-async-add-policies/spec.md @@ -0,0 +1,196 @@ +# Feature Specification: Async “Add Policies” to Backup Set (052) + +**Feature Branch**: `feat/052-async-add-policies` +**Created**: 2026-01-14 +**Status**: Draft (implementation-ready) +**Input**: Make Backup Sets → “Add Policies” (Add selected) non-blocking by moving all Graph/snapshot work into a queued job. The UI action only creates/reuses a Run record and dispatches the job. + +## Purpose + +Make “Add selected” in the Backup Set → Add Policies flow reliable and fast by ensuring heavy work (Graph reads, snapshot capture, per-policy loops) never runs inline in an interactive request. + +Primary outcomes: +- Prevent request timeouts and long waits. +- Prevent duplicate work from double-clicks/retries. +- Provide observable progress and safe failure visibility via an existing Run record type. + +## Clarifications + +### Session 2026-01-15 + +- Q: Who may start “Add selected”? → A: Roles `Owner`, `Manager`, `Operator` may start; `Readonly` may not. +- Q: Is idempotency/dedupe per-user or tenant-wide? → A: Tenant-wide (dedupe key does not include `user_id`). +- Q: Should execution ever run synchronously for small selections? → A: No; always async (never in-request). +- Q: What happens for empty selection? → A: No run/job; show “No policies selected” (current behavior). +- Q: How are run counts defined? → A: `total_items` equals the number of selected policies at submit time; “already in backup set” counts as `skipped`. +- Q: Do we use a circuit breaker (e.g., abort if >50% fails)? → A: No; process all items and surface partial results. +- Q: How should failures be persisted? → A: Stable `reason_code` + sanitized short text; never store secrets/tokens/raw payload dumps. + +## Pinned Decisions (052 defaults) + +- **Authorization**: start allowed for `Owner`, `Manager`, `Operator`; forbidden for `Readonly`. +- **Dedupe scope**: tenant-wide idempotency (no `user_id` in dedupe key). +- **Execution**: always async; action never performs Graph/snapshot work inline. +- **Empty selection**: no run/job; show “No policies selected”. +- **Counts**: `total_items` = selected policies; “already in backup set” = `skipped`. +- **Circuit breaker**: none; job processes all items. +- **Failure format**: `reason_code` + sanitized short text (no secrets). + +## In Scope (052) + +- Convert the “Add selected” action (Backup Sets → Add Policies modal) to job-only execution. +- Ensure an observable Run exists (status, counts, errors) using an existing run record type (prefer `BulkOperationRun`). +- Ensure idempotency: repeated clicks while queued/running reuse the same run for the same tenant + backup set + selection. +- Emit DB notification + optional existing progress widget integration (if already supported by the run type); no new UI framework work required. +- Add guard tests ensuring no Graph calls occur in the action handler/request. + +## Out of Scope (052) + +- UI redesign/polish (layout, navigation reorg, fancy tables, etc.) +- New Monitoring area or unified operations dashboard +- Group browsing/typeahead improvements (directory cache / later) +- New run tables if an existing run record can be reused +- Changing what a “policy version” means or how snapshots are stored + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 - Add selected policies without blocking (Priority: P1) + +As a tenant admin, I can add selected policies to a backup set without the UI hanging or timing out, because the system queues background work and gives me a Run record to monitor. + +**Why this priority**: This is a common workflow and currently risks long requests/timeouts when Graph capture is slow or throttled. + +**Independent Test**: Trigger “Add selected” and assert the request returns quickly, a Run exists, and a job is queued (no Graph work happens inline). + +**Acceptance Scenarios**: + +1. **Given** I am on a Backup Set and select policies in “Add Policies”, **When** I click “Add selected”, **Then** the UI returns quickly and a queued Run is created (or reused) with a link to its detail view. +2. **Given** the job runs, **When** it completes, **Then** the backup set contains the added policies and the Run shows final status + counts. + +--- + +### User Story 2 - Double click / repeated submissions are deduplicated (Priority: P2) + +As a tenant admin, I can click “Add selected” repeatedly (or double click) without creating duplicate work, because identical operations reuse the same active Run. + +**Why this priority**: Prevents accidental duplication, inconsistent outcomes, and unnecessary Graph load. + +**Independent Test**: Call the action twice with the same selection while the first run is active and assert only one Run and one queued job. + +**Acceptance Scenarios**: + +1. **Given** a matching Run for the same tenant + backup set + selection is queued or running, **When** I click “Add selected” again, **Then** the system reuses the existing Run and does not enqueue duplicate work. + +--- + +### User Story 3 - Failures are visible and safe (Priority: P3) + +As a tenant admin, I can see safe failure summaries when some policies cannot be captured from Graph, without secrets or raw payloads being stored or displayed. + +**Why this priority**: Operators must be able to triage issues safely; secret leakage is unacceptable. + +**Independent Test**: Force a simulated Graph failure in the job and assert the Run ends as failed/partial with a sanitized reason code/summary (no secrets). + +**Acceptance Scenarios**: + +1. **Given** Graph returns an error for some policies, **When** the job completes, **Then** the Run is marked partial/completed-with-errors and includes per-item failures with safe reason codes/summaries. +2. **Given** a failure includes sensitive substrings (e.g., “Bearer ”), **When** it is persisted, **Then** stored reasons are redacted/sanitized. + +--- + +### Edge Cases + +- Empty selection (no policies selected) +- Backup set deleted or tenant context missing between queue and execution +- Some selected policies are already in the backup set +- Policies deleted/ignored locally between queue and execution +- Graph throttling/transient failures (429/503) during capture + +## Requirements *(mandatory)* + +**Constitution alignment (required):** This feature performs Graph reads and writes local DB state, so it MUST be idempotent & observable, tenant-scoped, and safe-loggable. Graph calls MUST go through `GraphClientInterface` and MUST NOT occur during UI render or action request handling. + +### Functional Requirements + +- **FR-001 (Job-only action handler)**: The “Add selected” handler MUST: + - validate input + authorization + - create/reuse a Run record + - dispatch a queued job + - return immediately with a notification and a link to the Run + It MUST NOT: + - call `GraphClientInterface` + - call snapshot capture services + - loop over selected policies to do work inline + - run a synchronous “small selection” shortcut + +- **FR-002 (Run record + observability)**: The system MUST persist for each run: + - `tenant_id`, `initiator_user_id` (or equivalent) + - `resource = backup_set`, `action = add_policies` (or existing taxonomy) + - status lifecycle: queued → running → succeeded|failed|partial (map to existing run status semantics where possible, e.g., queued=`pending`, succeeded=`completed`, partial=`completed_with_errors`) + - counts: `total_items` = selected policies at submit time; `processed_items` = succeeded+failed+skipped; “already in backup set” increments `skipped` + - safe error context (summary + per-item outcomes at least for failures) + +- **FR-003 (Idempotency / dedupe)**: While a matching run is queued or running, the action MUST reuse it and MUST NOT enqueue duplicate work. + + Recommended dedupe key: `tenant_id + backup_set_id + operation_type(add_policies) + selection_hash`. + +- **FR-004 (Job execution semantics)**: The queued job MUST: + - load the selection deterministically (IDs or a stored selection payload) + - process items sequentially or in safe batches + - update run progress counts as it goes + - process all items (no circuit-breaker abort) + - record per-item outcomes (at minimum: failure entries with stable `reason_code` + sanitized short text) + + **Reason codes (minimum set):** + - `already_in_backup_set` (skipped) + - `policy_not_found` (failed) + - `policy_ignored` (skipped) + - `backup_set_not_found` (failed) + - `backup_set_archived` (failed) + - `graph_forbidden` (failed) + - `graph_throttled` (failed) + - `graph_transient` (failed) + - `unknown` (failed) + +- **FR-005 (User-visible feedback)**: On action submit, the UI MUST: + - show “Queued” feedback + - provide a “View run” link (DB notification preferred) + +### Non-Functional Requirements + +- **NFR-001 (Tenant isolation and authorization)**: + - Run list/view MUST be tenant-scoped + - Cross-tenant run access MUST be denied (403) + - Only authorized roles can start “Add Policies” + +- **NFR-002 (Data minimization & safe logging)**: + - No access tokens, “Bearer ” strings, or raw Graph payload dumps in notifications, run failures, or logs + - Persist only sanitized/allowlisted error fields and stable identifiers + +- **NFR-003 (Determinism)**: Given the same input selection, results are reproducible and the dedupe key is stable. + +- **NFR-004 (No UI-time Graph calls)**: Rendering the Backup Set UI and Run detail pages MUST not require Graph calls. + +### Key Entities *(include if feature involves data)* + +- **BulkOperationRun**: Existing run record used for long-running operations (status + counts + failures). +- **BackupSet**: Target set to which policies are added. +- **BackupItem**: Persisted item rows representing a policy in a backup set. + +## Success Criteria *(mandatory)* + +### Measurable Outcomes + +- **SC-001 (Fast submit)**: Clicking “Add selected” returns without timing out and does not perform Graph work in-request (proved by guard tests). +- **SC-002 (Observable run)**: A Run is created/reused and is visible in the UI with status and progress counts. +- **SC-003 (No duplicates)**: Double-click does not create duplicate runs/jobs (proved by idempotency tests). +- **SC-004 (Safe failures)**: Failures show safe reason codes/summaries and do not store secrets/tokens (proved by sanitization tests). + +## Rollout / Migration + +- No destructive migrations required for 052. +- If additional idempotency indexing is needed beyond existing run infrastructure, add a minimal migration (backwards compatible). + +## Open Questions (Optional) + +- Should progress appear in an existing global progress widget (only if already supported; new widget work is out of scope)? diff --git a/specs/052-async-add-policies/tasks.md b/specs/052-async-add-policies/tasks.md new file mode 100644 index 0000000..d80bfa4 --- /dev/null +++ b/specs/052-async-add-policies/tasks.md @@ -0,0 +1,104 @@ +--- + +description: "Task list for implementing Async “Add Policies” to Backup Set (052)" +--- + +# Tasks: Async “Add Policies” to Backup Set (052) + +**Input**: Design documents from `/specs/052-async-add-policies/` +**Prerequisites**: plan.md (required), spec.md (required) + +**Tests**: Required (Pest), per spec.md (SC-001..SC-004). + +**Organization**: Tasks are grouped by user story so each story can be implemented and tested independently. + +## Format: `- [ ] T### [P?] [US#] Description (with file path)` + +- **[P]**: Can run in parallel (different files, no dependencies) +- **[US#]**: User story mapping (US1/US2/US3) + +## Path Conventions (Laravel) + +- App code: `app/` +- DB: `database/migrations/` +- Filament admin: `app/Filament/Resources/` +- Livewire tables: `app/Livewire/` +- Tests (Pest): `tests/Feature/`, `tests/Unit/` + +--- + +## Phase 1: Setup (Shared Infrastructure) + +- [ ] T001 [P] Confirm existing run infra supports this operation (BulkOperationRun statuses + unique idempotency index) and document mapping queued↔pending, partial↔completed_with_errors in specs/052-async-add-policies/plan.md +- [ ] T002 [P] Decide and standardize taxonomy strings: operationType (`backup_set.add_policies`), resource (`backup_set`), action (`add_policies`) in specs/052-async-add-policies/spec.md and plan.md + +--- + +## Phase 2: User Story 1 - Add selected policies without blocking (Priority: P1) 🎯 MVP + +**Goal**: “Add selected” returns quickly and queues background work with an observable Run record. + +**Independent Test**: Trigger the action and assert a job is queued and a run exists; no Graph/capture work occurs in-request. + +### Tests (write first) ⚠️ + +- [ ] T010 [P] [US1] Update/replace sync-path assertions in tests/Feature/Filament/BackupSetPolicyPickerTableTest.php to assert job dispatch + run creation +- [ ] T011 [P] [US1] Add fail-hard guard test to ensure no Graph calls occur during the bulk action (mock `App\\Services\\Graph\\GraphClientInterface` and assert `->never()`) + +### Implementation + +- [ ] T020 [US1] Create queued job to add policies to a backup set in app/Jobs/AddPoliciesToBackupSetJob.php (uses run lifecycle + safe failures via BulkOperationService) +- [ ] T021 [US1] Update bulk action handler in app/Livewire/BackupSetPolicyPickerTable.php to create/reuse BulkOperationRun and dispatch AddPoliciesToBackupSetJob (no inline snapshot capture) +- [ ] T022 [US1] Emit “queued” notification with “View run” link on submit (Filament DB notification preferred) from app/Livewire/BackupSetPolicyPickerTable.php +- [ ] T023 [US1] Emit completion/failure DB notification from app/Jobs/AddPoliciesToBackupSetJob.php with a safe summary + +**Checkpoint**: US1 complete — submit is non-blocking, job executes, run is visible. + +--- + +## Phase 3: User Story 2 - Double click / repeated submissions are deduplicated (Priority: P2) + +**Goal**: Matching queued/running operations reuse the same run and do not enqueue duplicates. + +**Independent Test**: Call the action twice for the same tenant + backup set + selection and assert only one run and one job dispatch. + +### Tests ⚠️ + +- [ ] T030 [P] [US2] Add idempotency test in tests/Feature/BackupSets/BackupSetAddPoliciesIdempotencyTest.php (or extend existing picker test) asserting one run + one queued job + +### Implementation + +- [ ] T031 [US2] Implement deterministic selection hashing and idempotency key creation using app/Support/RunIdempotency.php (sorted policy ids + option flags), and reuse active runs via findActiveBulkOperationRun +- [ ] T032 [US2] Handle race conditions safely (unique index collisions) by recovering the existing run rather than failing the request +- [ ] T033 [US2] Ensure the action is always async (no `dispatchSync` path for small selections) in app/Livewire/BackupSetPolicyPickerTable.php + +**Checkpoint**: US2 complete — double clicks are safe and deduped. + +--- + +## Phase 4: User Story 3 - Failures are visible and safe (Priority: P3) + +**Goal**: Failures are persisted safely and tenant isolation is enforced for run visibility. + +**Independent Test**: Force failure paths and confirm safe failures persisted; cross-tenant access is forbidden. + +### Tests ⚠️ + +- [ ] T040 [P] [US3] Add tenant isolation test for the created run (403 cross-tenant) in tests/Feature/BackupSets/BackupSetAddPoliciesTenantIsolationTest.php (or extend tests/Feature/RunAuthorizationTenantIsolationTest.php) +- [ ] T041 [P] [US3] Add sanitization test: failure reason containing token-like content is stored as redacted (exercise BulkOperationService::sanitizeFailureReason) + +### Implementation + +- [ ] T042 [US3] Ensure job records per-item failures with sanitized reasons and does not store raw Graph payloads in run failures or notifications (app/Jobs/AddPoliciesToBackupSetJob.php) +- [ ] T043 [US3] Record stable failure `reason_code` values (per spec.md) alongside sanitized short text in run failures (app/Jobs/AddPoliciesToBackupSetJob.php and/or app/Services/BulkOperationService.php) +- [ ] T044 [US3] Record “already in backup set” as `skipped` (with reason_code `already_in_backup_set`) and ensure counts match spec.md (app/Jobs/AddPoliciesToBackupSetJob.php) +- [ ] T045 [US3] Ensure job processes all items (no circuit breaker abort) and run status reflects partial completion (app/Jobs/AddPoliciesToBackupSetJob.php) + +**Checkpoint**: US3 complete — failures are safe and observable; tenant isolation holds. + +--- + +## Phase 5: Polish & Validation + +- [ ] T050 [P] Run formatting on changed files with ./vendor/bin/pint --dirty +- [ ] T051 Run targeted tests: ./vendor/bin/sail artisan test --filter=BackupSetAddPolicies