spec(052): async add policies

2026-01-15 21:08:14 +01:00 · 2026-01-15 21:08:14 +01:00 · 19361579f2
commit 19361579f2
parent a449ecec5b
4 changed files with 440 additions and 0 deletions
--- a/specs/052-async-add-policies/checklists/requirements.md
+++ b/specs/052-async-add-policies/checklists/requirements.md
@ -0,0 +1,32 @@
+# Specification Quality Checklist: Async “Add Policies” to Backup Set (052)
+
+**Purpose**: Validate specification completeness and quality before proceeding to implementation
+**Created**: 2026-01-14
+**Feature**: [specs/052-async-add-policies/spec.md](../spec.md)
+
+## Content Quality
+
+- [x] No implementation details leaked into user requirements (spec focuses on behaviors, run observability, and safety rules)
+- [x] Focused on user value (fast submit, dedupe, observable progress, safe failures)
+- [x] Written in operator language (Backup Sets, Add Policies, runs, failures)
+- [x] Mandatory sections present (User Scenarios & Testing, Requirements, Success Criteria)
+
+## Requirement Completeness
+
+- [x] Scope is bounded (explicit in-scope/out-of-scope list)
+- [x] Requirements are testable and unambiguous (FR-001..FR-005 + NFRs + acceptance scenarios)
+- [x] Idempotency and dedupe behavior is explicitly specified
+- [x] Tenant isolation and authorization are explicitly specified
+- [x] Data minimization and safe logging rules are explicitly specified
+- [x] “No UI-time Graph calls” is explicitly specified and test-gated
+
+## Feature Readiness
+
+- [x] Plan includes concrete touch points and test strategy (plan.md)
+- [x] Tasks are grouped by user story and prioritize tests-first (tasks.md)
+
+## Notes
+
+- Items marked incomplete require spec updates before implementation.
+- Constitution gate: this checklist must exist for features that change runtime behavior.
+
--- a/specs/052-async-add-policies/plan.md
+++ b/specs/052-async-add-policies/plan.md
@ -0,0 +1,108 @@
+# Implementation Plan: Async “Add Policies” to Backup Set (052)
+
+**Branch**: `feat/052-async-add-policies` | **Date**: 2026-01-14 | **Spec**: [specs/052-async-add-policies/spec.md](spec.md)
+**Input**: Feature specification from `/specs/052-async-add-policies/spec.md`
+
+## Summary
+
+Convert Backup Set → Add Policies (“Add selected”) to job-only execution by creating/reusing a `BulkOperationRun`, dispatching a queued job to do Graph/snapshot work, and returning immediately with observable feedback and a “View run” link.
+
+## Technical Context
+
+**Language/Version**: PHP 8.4 (Laravel 12)  
+**Admin UI**: Filament v4 + Livewire v3  
+**Storage**: PostgreSQL  
+**Queue**: Laravel queues (Sail-first locally)  
+**Testing**: Pest v4  
+**Constraints**: No Graph calls during interactive request/response; tenant isolation; idempotent dedupe; safe error persistence.
+
+## Constitution Check
+
+- Inventory-first: this feature updates backup storage (explicit snapshot capture), never “inventory on render”.
+- Read/write separation: interactive action is “write intent only” (enqueue); job performs DB writes with auditability.
+- Graph contract path: any Graph reads happen only inside the job via `GraphClientInterface` (indirectly via existing services).
+- Deterministic capabilities: selection hashing and idempotency key are deterministic.
+- Tenant isolation: run records and backup set mutations are scoped to the active tenant.
+- Automation: job has run record lifecycle + counts + failures; dedupe prevents duplicates.
+- Data minimization: failures and notifications store sanitized messages only.
+
+## Project Structure
+
+### Documentation (this feature)
+
+```text
+specs/052-async-add-policies/
+├── plan.md
+├── spec.md
+├── tasks.md
+└── checklists/
+    └── requirements.md
+```
+
+### Source Code (planned touch points)
+
+```text
+app/Livewire/BackupSetPolicyPickerTable.php        # “Add selected” handler becomes enqueue-only
+app/Jobs/AddPoliciesToBackupSetJob.php            # new queued job (name may vary)
+app/Models/BulkOperationRun.php                   # reused run record
+app/Services/BulkOperationService.php             # reused counters + failure sanitization
+app/Support/RunIdempotency.php                    # reused deterministic key builder + active-run lookup
+app/Filament/Resources/BulkOperationRunResource.php  # existing run UI used for “View run”
+tests/Feature/Filament/BackupSetPolicyPickerTableTest.php  # updated for async behavior
+tests/Feature/*                                   # new/updated tests for idempotency + tenant isolation
+```
+
+## Execution Model
+
+### Action Handler (UI request)
+
+Responsibilities (must remain fast):
+1. Resolve tenant + user context and authorize “add policies to backup set”.
+2. Convert selected policy records into an ID list (no Graph/snapshot work).
+3. Compute a deterministic idempotency key using:
+	- tenant id
+	- target backup_set_id
+	- operation type (e.g., `backup_set.add_policies`)
+	- context payload containing (sorted) policy IDs + option flags (include assignments/scope tags/foundations)
+4. Reuse an active run if one exists; otherwise create a new `BulkOperationRun` with `resource=backup_set`, `action=add_policies`, `item_ids` = selected policy IDs, `total_items` = count, and `idempotency_key` set.
+5. Dispatch a queued job (always async; no sync shortcut) with the run id + backup_set_id + option flags.
+6. Return immediately with a Filament notification and a “View run” link (DB notification preferred).
+
+### Queued Job
+
+Responsibilities (all heavy work):
+1. Load the `BulkOperationRun` and ensure it is still active (`pending` → `running`).
+2. Validate the backup set exists and belongs to the run tenant.
+3. Process the run’s stored policy IDs deterministically (sequentially or chunked):
+	- Add/capture each policy into the backup set using existing capture services.
+	- Update counts and record per-item failures with safe reasons.
+	- Continue through all items (no circuit-breaker abort).
+4. Complete the run with:
+	- `completed` if no failures
+	- `completed_with_errors` (partial) if some failures
+	- `failed` if the job cannot proceed (e.g., backup set missing)
+5. Emit a DB notification to the initiating user summarizing the outcome and linking to the run.
+
+### Idempotency & Concurrency
+
+- Primary dedupe mechanism: `RunIdempotency::findActiveBulkOperationRun(tenant_id, idempotency_key)`.
+- Secondary guard: existing partial unique index on `(tenant_id, idempotency_key)` for active statuses.
+- Race handling: if concurrent submissions collide, prefer “find existing run and redirect” over throwing.
+- Dedupe scope: tenant-wide (idempotency key does not include `user_id`).
+
+## Testing Strategy (Pest)
+
+Minimum required tests for 052:
+- Action dispatch test: `Add selected` queues the job and creates/reuses a run.
+- Fail-hard Graph guard test: binding `GraphClientInterface` (or a capture service) to a mock that must not be called during the action.
+- Idempotency test: calling the action twice with the same selection queues only one job and creates only one active run.
+- Tenant isolation test: run view under another tenant context is forbidden (403).
+
+Target commands:
+- `./vendor/bin/sail artisan test --filter=BackupSetAddPolicies`
+- `./vendor/bin/pint --dirty`
+
+## Rollout Notes
+
+- Requires queue workers running for background processing (Sail locally; Dokploy workers in staging/prod).
+- No destructive migrations expected; if schema is extended for better observability, it must remain backwards compatible.
--- a/specs/052-async-add-policies/spec.md
+++ b/specs/052-async-add-policies/spec.md
@ -0,0 +1,196 @@
+# Feature Specification: Async “Add Policies” to Backup Set (052)
+
+**Feature Branch**: `feat/052-async-add-policies`  
+**Created**: 2026-01-14  
+**Status**: Draft (implementation-ready)  
+**Input**: Make Backup Sets → “Add Policies” (Add selected) non-blocking by moving all Graph/snapshot work into a queued job. The UI action only creates/reuses a Run record and dispatches the job.
+
+## Purpose
+
+Make “Add selected” in the Backup Set → Add Policies flow reliable and fast by ensuring heavy work (Graph reads, snapshot capture, per-policy loops) never runs inline in an interactive request.
+
+Primary outcomes:
+- Prevent request timeouts and long waits.
+- Prevent duplicate work from double-clicks/retries.
+- Provide observable progress and safe failure visibility via an existing Run record type.
+
+## Clarifications
+
+### Session 2026-01-15
+
+- Q: Who may start “Add selected”? → A: Roles `Owner`, `Manager`, `Operator` may start; `Readonly` may not.
+- Q: Is idempotency/dedupe per-user or tenant-wide? → A: Tenant-wide (dedupe key does not include `user_id`).
+- Q: Should execution ever run synchronously for small selections? → A: No; always async (never in-request).
+- Q: What happens for empty selection? → A: No run/job; show “No policies selected” (current behavior).
+- Q: How are run counts defined? → A: `total_items` equals the number of selected policies at submit time; “already in backup set” counts as `skipped`.
+- Q: Do we use a circuit breaker (e.g., abort if >50% fails)? → A: No; process all items and surface partial results.
+- Q: How should failures be persisted? → A: Stable `reason_code` + sanitized short text; never store secrets/tokens/raw payload dumps.
+
+## Pinned Decisions (052 defaults)
+
+- **Authorization**: start allowed for `Owner`, `Manager`, `Operator`; forbidden for `Readonly`.
+- **Dedupe scope**: tenant-wide idempotency (no `user_id` in dedupe key).
+- **Execution**: always async; action never performs Graph/snapshot work inline.
+- **Empty selection**: no run/job; show “No policies selected”.
+- **Counts**: `total_items` = selected policies; “already in backup set” = `skipped`.
+- **Circuit breaker**: none; job processes all items.
+- **Failure format**: `reason_code` + sanitized short text (no secrets).
+
+## In Scope (052)
+
+- Convert the “Add selected” action (Backup Sets → Add Policies modal) to job-only execution.
+- Ensure an observable Run exists (status, counts, errors) using an existing run record type (prefer `BulkOperationRun`).
+- Ensure idempotency: repeated clicks while queued/running reuse the same run for the same tenant + backup set + selection.
+- Emit DB notification + optional existing progress widget integration (if already supported by the run type); no new UI framework work required.
+- Add guard tests ensuring no Graph calls occur in the action handler/request.
+
+## Out of Scope (052)
+
+- UI redesign/polish (layout, navigation reorg, fancy tables, etc.)
+- New Monitoring area or unified operations dashboard
+- Group browsing/typeahead improvements (directory cache / later)
+- New run tables if an existing run record can be reused
+- Changing what a “policy version” means or how snapshots are stored
+
+## User Scenarios & Testing *(mandatory)*
+
+### User Story 1 - Add selected policies without blocking (Priority: P1)
+
+As a tenant admin, I can add selected policies to a backup set without the UI hanging or timing out, because the system queues background work and gives me a Run record to monitor.
+
+**Why this priority**: This is a common workflow and currently risks long requests/timeouts when Graph capture is slow or throttled.
+
+**Independent Test**: Trigger “Add selected” and assert the request returns quickly, a Run exists, and a job is queued (no Graph work happens inline).
+
+**Acceptance Scenarios**:
+
+1. **Given** I am on a Backup Set and select policies in “Add Policies”, **When** I click “Add selected”, **Then** the UI returns quickly and a queued Run is created (or reused) with a link to its detail view.
+2. **Given** the job runs, **When** it completes, **Then** the backup set contains the added policies and the Run shows final status + counts.
+
+---
+
+### User Story 2 - Double click / repeated submissions are deduplicated (Priority: P2)
+
+As a tenant admin, I can click “Add selected” repeatedly (or double click) without creating duplicate work, because identical operations reuse the same active Run.
+
+**Why this priority**: Prevents accidental duplication, inconsistent outcomes, and unnecessary Graph load.
+
+**Independent Test**: Call the action twice with the same selection while the first run is active and assert only one Run and one queued job.
+
+**Acceptance Scenarios**:
+
+1. **Given** a matching Run for the same tenant + backup set + selection is queued or running, **When** I click “Add selected” again, **Then** the system reuses the existing Run and does not enqueue duplicate work.
+
+---
+
+### User Story 3 - Failures are visible and safe (Priority: P3)
+
+As a tenant admin, I can see safe failure summaries when some policies cannot be captured from Graph, without secrets or raw payloads being stored or displayed.
+
+**Why this priority**: Operators must be able to triage issues safely; secret leakage is unacceptable.
+
+**Independent Test**: Force a simulated Graph failure in the job and assert the Run ends as failed/partial with a sanitized reason code/summary (no secrets).
+
+**Acceptance Scenarios**:
+
+1. **Given** Graph returns an error for some policies, **When** the job completes, **Then** the Run is marked partial/completed-with-errors and includes per-item failures with safe reason codes/summaries.
+2. **Given** a failure includes sensitive substrings (e.g., “Bearer ”), **When** it is persisted, **Then** stored reasons are redacted/sanitized.
+
+---
+
+### Edge Cases
+
+- Empty selection (no policies selected)
+- Backup set deleted or tenant context missing between queue and execution
+- Some selected policies are already in the backup set
+- Policies deleted/ignored locally between queue and execution
+- Graph throttling/transient failures (429/503) during capture
+
+## Requirements *(mandatory)*
+
+**Constitution alignment (required):** This feature performs Graph reads and writes local DB state, so it MUST be idempotent & observable, tenant-scoped, and safe-loggable. Graph calls MUST go through `GraphClientInterface` and MUST NOT occur during UI render or action request handling.
+
+### Functional Requirements
+
+- **FR-001 (Job-only action handler)**: The “Add selected” handler MUST:
+	- validate input + authorization
+	- create/reuse a Run record
+	- dispatch a queued job
+	- return immediately with a notification and a link to the Run
+  It MUST NOT:
+	- call `GraphClientInterface`
+	- call snapshot capture services
+	- loop over selected policies to do work inline
+	- run a synchronous “small selection” shortcut
+
+- **FR-002 (Run record + observability)**: The system MUST persist for each run:
+	- `tenant_id`, `initiator_user_id` (or equivalent)
+	- `resource = backup_set`, `action = add_policies` (or existing taxonomy)
+	- status lifecycle: queued → running → succeeded|failed|partial (map to existing run status semantics where possible, e.g., queued=`pending`, succeeded=`completed`, partial=`completed_with_errors`)
+	- counts: `total_items` = selected policies at submit time; `processed_items` = succeeded+failed+skipped; “already in backup set” increments `skipped`
+	- safe error context (summary + per-item outcomes at least for failures)
+
+- **FR-003 (Idempotency / dedupe)**: While a matching run is queued or running, the action MUST reuse it and MUST NOT enqueue duplicate work.
+
+  Recommended dedupe key: `tenant_id + backup_set_id + operation_type(add_policies) + selection_hash`.
+
+- **FR-004 (Job execution semantics)**: The queued job MUST:
+	- load the selection deterministically (IDs or a stored selection payload)
+	- process items sequentially or in safe batches
+	- update run progress counts as it goes
+	- process all items (no circuit-breaker abort)
+	- record per-item outcomes (at minimum: failure entries with stable `reason_code` + sanitized short text)
+
+  **Reason codes (minimum set):**
+	- `already_in_backup_set` (skipped)
+	- `policy_not_found` (failed)
+	- `policy_ignored` (skipped)
+	- `backup_set_not_found` (failed)
+	- `backup_set_archived` (failed)
+	- `graph_forbidden` (failed)
+	- `graph_throttled` (failed)
+	- `graph_transient` (failed)
+	- `unknown` (failed)
+
+- **FR-005 (User-visible feedback)**: On action submit, the UI MUST:
+	- show “Queued” feedback
+	- provide a “View run” link (DB notification preferred)
+
+### Non-Functional Requirements
+
+- **NFR-001 (Tenant isolation and authorization)**:
+	- Run list/view MUST be tenant-scoped
+	- Cross-tenant run access MUST be denied (403)
+	- Only authorized roles can start “Add Policies”
+
+- **NFR-002 (Data minimization & safe logging)**:
+	- No access tokens, “Bearer ” strings, or raw Graph payload dumps in notifications, run failures, or logs
+	- Persist only sanitized/allowlisted error fields and stable identifiers
+
+- **NFR-003 (Determinism)**: Given the same input selection, results are reproducible and the dedupe key is stable.
+
+- **NFR-004 (No UI-time Graph calls)**: Rendering the Backup Set UI and Run detail pages MUST not require Graph calls.
+
+### Key Entities *(include if feature involves data)*
+
+- **BulkOperationRun**: Existing run record used for long-running operations (status + counts + failures).
+- **BackupSet**: Target set to which policies are added.
+- **BackupItem**: Persisted item rows representing a policy in a backup set.
+
+## Success Criteria *(mandatory)*
+
+### Measurable Outcomes
+
+- **SC-001 (Fast submit)**: Clicking “Add selected” returns without timing out and does not perform Graph work in-request (proved by guard tests).
+- **SC-002 (Observable run)**: A Run is created/reused and is visible in the UI with status and progress counts.
+- **SC-003 (No duplicates)**: Double-click does not create duplicate runs/jobs (proved by idempotency tests).
+- **SC-004 (Safe failures)**: Failures show safe reason codes/summaries and do not store secrets/tokens (proved by sanitization tests).
+
+## Rollout / Migration
+
+- No destructive migrations required for 052.
+- If additional idempotency indexing is needed beyond existing run infrastructure, add a minimal migration (backwards compatible).
+
+## Open Questions (Optional)
+
+- Should progress appear in an existing global progress widget (only if already supported; new widget work is out of scope)?
--- a/specs/052-async-add-policies/tasks.md
+++ b/specs/052-async-add-policies/tasks.md
@ -0,0 +1,104 @@
+---
+
+description: "Task list for implementing Async “Add Policies” to Backup Set (052)"
+---
+
+# Tasks: Async “Add Policies” to Backup Set (052)
+
+**Input**: Design documents from `/specs/052-async-add-policies/`
+**Prerequisites**: plan.md (required), spec.md (required)
+
+**Tests**: Required (Pest), per spec.md (SC-001..SC-004).
+
+**Organization**: Tasks are grouped by user story so each story can be implemented and tested independently.
+
+## Format: `- [ ] T### [P?] [US#] Description (with file path)`
+
+- **[P]**: Can run in parallel (different files, no dependencies)
+- **[US#]**: User story mapping (US1/US2/US3)
+
+## Path Conventions (Laravel)
+
+- App code: `app/`
+- DB: `database/migrations/`
+- Filament admin: `app/Filament/Resources/`
+- Livewire tables: `app/Livewire/`
+- Tests (Pest): `tests/Feature/`, `tests/Unit/`
+
+---
+
+## Phase 1: Setup (Shared Infrastructure)
+
+- [ ] T001 [P] Confirm existing run infra supports this operation (BulkOperationRun statuses + unique idempotency index) and document mapping queued↔pending, partial↔completed_with_errors in specs/052-async-add-policies/plan.md
+- [ ] T002 [P] Decide and standardize taxonomy strings: operationType (`backup_set.add_policies`), resource (`backup_set`), action (`add_policies`) in specs/052-async-add-policies/spec.md and plan.md
+
+---
+
+## Phase 2: User Story 1 - Add selected policies without blocking (Priority: P1) 🎯 MVP
+
+**Goal**: “Add selected” returns quickly and queues background work with an observable Run record.
+
+**Independent Test**: Trigger the action and assert a job is queued and a run exists; no Graph/capture work occurs in-request.
+
+### Tests (write first) ⚠️
+
+- [ ] T010 [P] [US1] Update/replace sync-path assertions in tests/Feature/Filament/BackupSetPolicyPickerTableTest.php to assert job dispatch + run creation
+- [ ] T011 [P] [US1] Add fail-hard guard test to ensure no Graph calls occur during the bulk action (mock `App\\Services\\Graph\\GraphClientInterface` and assert `->never()`)
+
+### Implementation
+
+- [ ] T020 [US1] Create queued job to add policies to a backup set in app/Jobs/AddPoliciesToBackupSetJob.php (uses run lifecycle + safe failures via BulkOperationService)
+- [ ] T021 [US1] Update bulk action handler in app/Livewire/BackupSetPolicyPickerTable.php to create/reuse BulkOperationRun and dispatch AddPoliciesToBackupSetJob (no inline snapshot capture)
+- [ ] T022 [US1] Emit “queued” notification with “View run” link on submit (Filament DB notification preferred) from app/Livewire/BackupSetPolicyPickerTable.php
+- [ ] T023 [US1] Emit completion/failure DB notification from app/Jobs/AddPoliciesToBackupSetJob.php with a safe summary
+
+**Checkpoint**: US1 complete — submit is non-blocking, job executes, run is visible.
+
+---
+
+## Phase 3: User Story 2 - Double click / repeated submissions are deduplicated (Priority: P2)
+
+**Goal**: Matching queued/running operations reuse the same run and do not enqueue duplicates.
+
+**Independent Test**: Call the action twice for the same tenant + backup set + selection and assert only one run and one job dispatch.
+
+### Tests ⚠️
+
+- [ ] T030 [P] [US2] Add idempotency test in tests/Feature/BackupSets/BackupSetAddPoliciesIdempotencyTest.php (or extend existing picker test) asserting one run + one queued job
+
+### Implementation
+
+- [ ] T031 [US2] Implement deterministic selection hashing and idempotency key creation using app/Support/RunIdempotency.php (sorted policy ids + option flags), and reuse active runs via findActiveBulkOperationRun
+- [ ] T032 [US2] Handle race conditions safely (unique index collisions) by recovering the existing run rather than failing the request
+- [ ] T033 [US2] Ensure the action is always async (no `dispatchSync` path for small selections) in app/Livewire/BackupSetPolicyPickerTable.php
+
+**Checkpoint**: US2 complete — double clicks are safe and deduped.
+
+---
+
+## Phase 4: User Story 3 - Failures are visible and safe (Priority: P3)
+
+**Goal**: Failures are persisted safely and tenant isolation is enforced for run visibility.
+
+**Independent Test**: Force failure paths and confirm safe failures persisted; cross-tenant access is forbidden.
+
+### Tests ⚠️
+
+- [ ] T040 [P] [US3] Add tenant isolation test for the created run (403 cross-tenant) in tests/Feature/BackupSets/BackupSetAddPoliciesTenantIsolationTest.php (or extend tests/Feature/RunAuthorizationTenantIsolationTest.php)
+- [ ] T041 [P] [US3] Add sanitization test: failure reason containing token-like content is stored as redacted (exercise BulkOperationService::sanitizeFailureReason)
+
+### Implementation
+
+- [ ] T042 [US3] Ensure job records per-item failures with sanitized reasons and does not store raw Graph payloads in run failures or notifications (app/Jobs/AddPoliciesToBackupSetJob.php)
+- [ ] T043 [US3] Record stable failure `reason_code` values (per spec.md) alongside sanitized short text in run failures (app/Jobs/AddPoliciesToBackupSetJob.php and/or app/Services/BulkOperationService.php)
+- [ ] T044 [US3] Record “already in backup set” as `skipped` (with reason_code `already_in_backup_set`) and ensure counts match spec.md (app/Jobs/AddPoliciesToBackupSetJob.php)
+- [ ] T045 [US3] Ensure job processes all items (no circuit breaker abort) and run status reflects partial completion (app/Jobs/AddPoliciesToBackupSetJob.php)
+
+**Checkpoint**: US3 complete — failures are safe and observable; tenant isolation holds.
+
+---
+
+## Phase 5: Polish & Validation
+
+- [ ] T050 [P] Run formatting on changed files with ./vendor/bin/pint --dirty
+- [ ] T051 Run targeted tests: ./vendor/bin/sail artisan test --filter=BackupSetAddPolicies