Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical backup/restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). For these flows, the UI no longer performs heavy Graph work inside request handlers or Filament actions.

Why

We want predictable UX and operations at MSP scale:
• no timeouts / long-running requests
• reproducible run state + per-item results
• safe error persistence (no secrets / no token leakage)
• strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
• Added a shared RunIdempotency helper (dedupe while queued/running).
• Added a read-only BulkOperationRuns surface (list + view) for status/progress.
• Added DB notifications for run status changes (with a “View run” link).

US1 – Policy “Capture snapshot” is job-only
• Policy detail “Capture snapshot” now:
  • creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
  • dispatches a queued job
  • returns immediately with a notification + link to the run detail
• Graph capture work moved fully into the job; the request path stays Graph-free.

US3 – Restore-run orchestration is job-only + safe
• Live restore execution is queued and updates RestoreRun status/progress.
• Per-item outcomes are persisted deterministically (keyed by internal DB record).
• Audit logging is written for live restore.
• Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
• Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
• Explicit Pest tests cover cross-tenant denial and start authorization.

Tests / Verification
• ./vendor/bin/pint --dirty
• Targeted suite (examples):
  • policy capture snapshot queued + idempotency tests
  • restore orchestration + audit logging + preview read-only tests
  • run authorization / tenant isolation tests

Notes / Scope boundaries
• Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and is not required for merge.
• Resilience/backoff is tracked in tasks and can be iterated on further after merge.

Review focus
• Dedupe behavior for queued/running runs (reuse vs. create-new)
• Tenant scoping & policy gates for all run surfaces
• Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
# Tasks: Backup/Restore Job Orchestration (049)

**Input**: Design documents from `specs/049-backup-restore-job-orchestration/`

**Prerequisites**: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

**Tests**: REQUIRED (Pest) for these runtime behavior changes.

**MVP scope**: Strictly limited to **T001–T016** (Setup, Foundational, and US1 only), plus the Foundational tenant-isolation tasks **T042–T044**. The global progress widget (**T037**, listed under Phase 7) belongs to product **Phase 2** and is explicitly **NOT** part of the MVP.
## Phase 1: Setup (Shared Infrastructure)

- [x] T001 Verify queue + DB notification prerequisites in config/queue.php and `database/migrations/*notifications*` (add the missing migration if needed)
- [x] T002 Confirm the existing run tables and status enums used by RestoreRun in app/Support/RestoreRunStatus.php and database/migrations/2025_12_10_000150_create_restore_runs_table.php
- [x] T003 [P] Add quickstart sanity commands for this feature in specs/049-backup-restore-job-orchestration/quickstart.md

---
## Phase 2: Foundational (Blocking Prerequisites)

**⚠️ CRITICAL**: No user story work should begin until this phase is complete.

- [x] T004 Add idempotency support to bulk_operation_runs via database/migrations/2026_01_11_120001_add_idempotency_key_to_bulk_operation_runs_table.php
- [x] T005 Add idempotency support to restore_runs via database/migrations/2026_01_11_120002_add_idempotency_key_to_restore_runs_table.php
- [x] T006 [P] Add casts/fillables for idempotency + timestamps in app/Models/BulkOperationRun.php and app/Models/RestoreRun.php
- [x] T007 Implement idempotency key helpers in app/Support/RunIdempotency.php (build key, find active run, enforce reuse)
- [x] T008 [P] Add a read-only Filament resource to inspect run details for BulkOperationRun in app/Filament/Resources/BulkOperationRunResource.php
- [x] T009 [P] Add a notification for run status transitions in app/Notifications/RunStatusChangedNotification.php (DB channel)
- [x] T010 Add unit tests for RunIdempotency helpers in tests/Unit/RunIdempotencyTest.php
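T007 describes the helper only as "build key, find active run, enforce reuse". The following is a language-agnostic sketch in Python; the real helper is PHP in app/Support/RunIdempotency.php and its API may differ. It assumes the key joins tenant, operation, and target (matching the PR's `tenant + policy.capture_snapshot + policy DB id`), optionally followed by a SHA-256 hash of a canonical-JSON scope (the restore variant in T019), and that a run is reusable only while `queued` or `running`. `build_idempotency_key`, `InMemoryRunStore`, and `start` are hypothetical names. A database-backed version would also need a lock or unique constraint so two concurrent clicks cannot both create a run.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Statuses during which a run is reused instead of creating a new one.
ACTIVE_STATUSES = {"queued", "running"}


def build_idempotency_key(tenant_id: int, operation: str, target_id: object,
                          scope: Optional[dict] = None) -> str:
    """Join tenant + operation + target (+ optional scope hash) into one dedupe key."""
    parts = [str(tenant_id), operation, str(target_id)]
    if scope is not None:
        # Canonical JSON (sorted keys, no whitespace) so equal scopes hash identically.
        canonical = json.dumps(scope, sort_keys=True, separators=(",", ":"))
        parts.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return ":".join(parts)


@dataclass
class Run:
    id: int
    idempotency_key: str
    status: str = "queued"


class InMemoryRunStore:
    """Hypothetical stand-in for the bulk_operation_runs / restore_runs tables."""

    def __init__(self) -> None:
        self.runs: list = []

    def find_active(self, key: str) -> Optional[Run]:
        for run in self.runs:
            if run.idempotency_key == key and run.status in ACTIVE_STATUSES:
                return run
        return None

    def start(self, key: str) -> tuple:
        """Return (run, created): reuse an active run, otherwise create a queued one."""
        existing = self.find_active(key)
        if existing is not None:
            return existing, False
        run = Run(id=len(self.runs) + 1, idempotency_key=key)
        self.runs.append(run)
        return run, True
```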
**CRITICAL (must-fix before implementing any new run flows): Tenant isolation + authorization**

- [x] T042 Add tenant-scoped authorization for run list/view/start across all run flows (BulkOperationRun + RestoreRun) using policies/resources, and ensure every query is tenant-scoped (e.g., app/Filament/Resources/BulkOperationRunResource.php, app/Filament/Resources/RestoreRunResource.php, and each start action/page that creates runs)
- [x] T043 [P] Add Pest feature tests asserting that run list/view are tenant-scoped (a user cannot list/view another tenant’s runs; cross-tenant access returns 403) in tests/Feature/RunAuthorizationTenantIsolationTest.php
- [x] T044 [P] Add Pest feature tests asserting that unaffiliated users cannot start runs (capture snapshot / restore execute / preview / backup set capture) in tests/Feature/RunStartAuthorizationTest.php
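The 403-versus-404 rule from the PR summary is easy to get backwards, so here is a minimal Python sketch of the intended decisions; it is not the actual Laravel policy code. `User`, `TenantRun`, `visible_runs`, `view_status`, and `can_start` are hypothetical, "affiliated" is assumed to mean the user's tenant memberships include the tenant in question, and returning 404 for a run that does not exist at all is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class User:
    id: int
    tenant_ids: Set[int] = field(default_factory=set)  # tenants the user belongs to


@dataclass
class TenantRun:
    id: int
    tenant_id: int


def visible_runs(user: User, current_tenant_id: int, runs: List[TenantRun]) -> List[TenantRun]:
    """List query: scoped to the current tenant, and empty for unaffiliated users."""
    if current_tenant_id not in user.tenant_ids:
        return []
    return [run for run in runs if run.tenant_id == current_tenant_id]


def view_status(user: User, current_tenant_id: int, run: Optional[TenantRun]) -> int:
    """HTTP status for a view request: 404 if absent, 403 if cross-tenant, else 200."""
    if run is None:
        return 404
    if run.tenant_id != current_tenant_id or run.tenant_id not in user.tenant_ids:
        return 403  # the run exists but belongs to another tenant: deny explicitly
    return 200


def can_start(user: User, tenant_id: int) -> bool:
    """Starting any run (capture, restore, preview, backup set) requires affiliation."""
    return tenant_id in user.tenant_ids
```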
**Checkpoint**: Foundation ready (idempotency + tenant-scoped authorization + run detail view + notifications).

---

## Phase 3: User Story 1 - Capture snapshot runs in the background (Priority: P1) 🎯 MVP

**Goal**: Capturing a policy snapshot never blocks the UI; it creates or reuses a run record, and the capture itself runs in a queued job with visible progress.

**Independent Test**: Trigger “Capture snapshot” on a policy; the request returns quickly and a BulkOperationRun transitions `queued → running → succeeded|failed|partial`, with details viewable.
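The lifecycle named in the Independent Test can be expressed as a small transition table. This Python sketch assumes only the transitions listed there are legal (no `queued → failed` shortcut) and derives the terminal status from item counts: no failures gives `succeeded`, no successes gives `failed`, anything else gives `partial`. Treating an empty run as `succeeded` is an assumption, and `transition` / `terminal_status` are hypothetical names.

```python
TERMINAL = frozenset({"succeeded", "failed", "partial"})
# Legal moves: queued -> running -> one terminal status; terminal states are final.
TRANSITIONS = {"queued": frozenset({"running"}), "running": TERMINAL}


def transition(current: str, new: str) -> str:
    """Validate a status change and return the new status."""
    if new not in TRANSITIONS.get(current, frozenset()):
        raise ValueError(f"illegal run transition {current} -> {new}")
    return new


def terminal_status(succeeded: int, failed: int) -> str:
    """Derive the terminal status from per-item counts (an empty run counts as succeeded)."""
    if failed == 0:
        return "succeeded"
    if succeeded == 0:
        return "failed"
    return "partial"
```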
### Tests (write first)

- [x] T011 [P] [US1] Add Pest feature test asserting that capture snapshot queues a job (no inline capture) in tests/Feature/PolicyCaptureSnapshotQueuedTest.php
- [x] T012 [P] [US1] Add Pest feature test asserting that a double-click reuses the active run (idempotency) in tests/Feature/PolicyCaptureSnapshotIdempotencyTest.php

### Implementation

- [x] T013 [US1] Create a queued job to capture one policy snapshot in app/Jobs/CapturePolicySnapshotJob.php (updates BulkOperationRun counts + failures)
- [x] T014 [US1] Update the UI action to create/reuse a run and dispatch the job in app/Filament/Resources/PolicyResource/Pages/ViewPolicy.php
- [x] T015 [P] [US1] Link UI notifications to the BulkOperationRunResource view page in app/Filament/Resources/BulkOperationRunResource.php
- [x] T016 [US1] Ensure recorded run failures are safe and minimized (no secrets) in app/Services/BulkOperationService.php
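T016 requires recorded failures to be safe and minimized. One way to do that, sketched in Python under stated assumptions: keep a short error code, scrub bearer tokens from free text, replace values under known-sensitive keys, keep only scalar context values (dropping nested payloads such as raw Graph responses), and cap text length. The key list, the 500-character cap, and `safe_failure` itself are illustrative assumptions, not what app/Services/BulkOperationService.php actually does.

```python
import re

# Illustrative assumptions: which keys count as secrets and how long text may be.
SENSITIVE_KEYS = {"authorization", "access_token", "refresh_token", "client_secret", "password"}
BEARER_RE = re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE)
MAX_LEN = 500


def _scrub(text: str) -> str:
    """Replace bearer tokens in free text, then truncate."""
    return BEARER_RE.sub("Bearer [redacted]", text)[:MAX_LEN]


def safe_failure(code: str, message: str, context: dict) -> dict:
    """Build a failure record that is safe to persist and show in the run detail UI."""
    safe_context = {}
    for key, value in context.items():
        if key.lower() in SENSITIVE_KEYS:
            safe_context[key] = "[redacted]"
        elif isinstance(value, str):
            safe_context[key] = _scrub(value)
        elif value is None or isinstance(value, (bool, int, float)):
            safe_context[key] = value
        # Anything else (dicts, lists, response objects) is dropped to minimize the record.
    return {"code": code, "message": _scrub(message), "context": safe_context}
```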
**Checkpoint**: User Story 1 is independently usable and testable.

---

## Phase 4: User Story 3 - Restore runs in the background with per-item results (Priority: P1)

**Goal**: Restore execution and restore re-runs operate exclusively via queued jobs, with persisted per-item outcomes and safe error summaries visible in the run detail UI.

**Independent Test**: Starting a restore creates/reuses a RestoreRun in `queued` state, queues execution, and later shows item outcomes without relying on logs.

### Tests (write first)

- [x] T017 [P] [US3] Add Pest feature test asserting that restore execution reuses the active run for identical (tenant + backup_set + scope) starts in tests/Feature/RestoreRunIdempotencyTest.php
- [x] T018 [P] [US3] Extend the existing restore job test to assert per-item outcome persistence in tests/Feature/ExecuteRestoreRunJobTest.php
- [x] T045 [P] [US3] Add Pest feature test asserting that live restore writes an audit event (linked to the run ID) in tests/Feature/RestoreAuditLoggingTest.php

### Implementation

- [x] T019 [US3] Implement idempotency key computation for restore runs (tenant + operation + target + scope hash) in app/Support/RunIdempotency.php
- [x] T020 [US3] Update the restore run creation/execute flow to reuse active runs (no duplicates) in app/Filament/Resources/RestoreRunResource.php
- [x] T021 [US3] Update app/Jobs/ExecuteRestoreRunJob.php to set started/finished timestamps and emit DB notifications (queued/running/terminal)
- [x] T022 [US3] Persist deterministic per-item outcomes into restore_runs.results (keyed by backup_item_id) in app/Services/Intune/RestoreService.php
- [x] T023 [US3] Derive total/succeeded/failed counts from persisted results and surface them in the RestoreRunResource view/table in app/Filament/Resources/RestoreRunResource.php
- [x] T046 [US3] Ensure live restore execution emits an auditable event linked to the run (e.g., an audit_logs FK or a structured audit record) in app/Jobs/ExecuteRestoreRunJob.php and/or app/Services/Intune/RestoreService.php
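T022 and T023 together imply a simple contract: results are an upsert map keyed by backup_item_id, and counts are always derived from that map rather than incremented separately, so a retried item overwrites its previous outcome instead of being double-counted. A Python sketch under those assumptions follows; string keys mirror how a JSON column would store them, and the `status` / `error_code` fields and both function names are hypothetical.

```python
from typing import Dict, Optional

ITEM_STATUSES = {"succeeded", "failed"}


def record_outcome(results: Dict[str, dict], backup_item_id: int, status: str,
                   error_code: Optional[str] = None) -> Dict[str, dict]:
    """Upsert one item's outcome; re-processing an item replaces its entry."""
    if status not in ITEM_STATUSES:
        raise ValueError(f"unknown item status: {status}")
    results[str(backup_item_id)] = {"status": status, "error_code": error_code}
    return results


def derive_counts(results: Dict[str, dict]) -> Dict[str, int]:
    """Counts are computed from the persisted map, never tracked separately."""
    statuses = [entry["status"] for entry in results.values()]
    return {
        "total": len(statuses),
        "succeeded": statuses.count("succeeded"),
        "failed": statuses.count("failed"),
    }
```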
**Checkpoint**: Restore runs are job-only, idempotent, and observable with item outcomes.

---

## Phase 5: User Story 2 - Backup set create/capture runs in the background (Priority: P2)

**Goal**: Creating a backup set and adding policies to a backup set do not perform Graph-heavy snapshot capture inline; capture occurs in jobs with a run record.

**Independent Test**: Creating a backup set returns quickly and produces a BulkOperationRun showing progress; adding policies via the picker also queues work.

### Tests (write first)

- [ ] T024 [P] [US2] Add Pest feature test asserting that backup set creation does not run capture inline and instead queues a job in tests/Feature/BackupSetCreateCaptureQueuedTest.php
- [ ] T025 [P] [US2] Add Pest feature test asserting that “Add selected” in the policy picker queues background work in tests/Feature/BackupSetPolicyPickerQueuesCaptureTest.php

### Implementation

- [ ] T026 [US2] Refactor capture work out of BackupService::createBackupSet into separate methods in app/Services/Intune/BackupService.php
- [ ] T027 [US2] Create a queued job to capture backup set items in app/Jobs/CaptureBackupSetJob.php (uses BackupService; updates BulkOperationRun)
- [ ] T028 [US2] Update the backup set create flow to create the backup_set record quickly and dispatch CaptureBackupSetJob in app/Filament/Resources/BackupSetResource.php
- [ ] T029 [US2] Create a queued job to add policies to a backup set (and capture foundations if requested) in app/Jobs/AddPoliciesToBackupSetJob.php
- [ ] T030 [US2] Update the bulk action in app/Livewire/BackupSetPolicyPickerTable.php to create/reuse a BulkOperationRun and dispatch AddPoliciesToBackupSetJob

**Checkpoint**: Backup set capture workloads are job-only and observable.

---

## Phase 6: User Story 4 - Dry-run/preview runs in the background (Priority: P2)

**Goal**: Restore preview generation is queued, persisted, and viewable without re-execution.

**Independent Test**: Clicking “Generate preview” returns quickly; a queued RestoreRun performs the diff generation asynchronously and persists preview output that the UI can display.

### Tests (write first)

- [ ] T031 [P] [US4] Add Pest feature test asserting that preview generation queues a job (no inline RestoreDiffGenerator call) in tests/Feature/RestorePreviewQueuedTest.php
- [ ] T032 [P] [US4] Add Pest feature test asserting that preview results persist and are reusable in tests/Feature/RestorePreviewPersistenceTest.php
- [ ] T047 [P] [US4] Add Pest feature test asserting that preview/dry-run never performs writes (must be read-only) in tests/Feature/RestorePreviewReadOnlySafetyTest.php

### Implementation

- [ ] T033 [US4] Create a queued job to generate preview diffs and persist them to restore_runs.preview + metadata in app/Jobs/GenerateRestorePreviewJob.php
- [ ] T034 [US4] Update the preview action in app/Filament/Resources/RestoreRunResource.php to create/reuse a dry-run RestoreRun and dispatch GenerateRestorePreviewJob
- [ ] T035 [US4] Update the restore run view component to read the preview from the persisted run record in resources/views/filament/forms/components/restore-run-preview.blade.php
- [ ] T036 [US4] Emit DB notifications for preview queued/running/completed/failed transitions in app/Jobs/GenerateRestorePreviewJob.php
- [ ] T048 [US4] Enforce preview/dry-run read-only behavior: block write-capable operations and record a safe failure if a write would occur (in app/Jobs/GenerateRestorePreviewJob.php and/or the restore diff generation service)
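For T048, one option is to hand the diff generator a client wrapper that refuses write-capable HTTP methods, so a write attempt becomes a recorded, safe failure instead of a change in the tenant. This Python sketch assumes the underlying Graph client exposes a `request(method, path, **kwargs)` call and that GET/HEAD are the only methods the preview needs; `ReadOnlyGraphClient`, `WriteBlockedError`, `run_preview`, and the `preview_write_blocked` code are hypothetical.

```python
from typing import Any, Callable

READ_ONLY_METHODS = frozenset({"GET", "HEAD"})


class WriteBlockedError(RuntimeError):
    """Raised when preview code attempts a write-capable request."""


class ReadOnlyGraphClient:
    """Wraps a client with request(method, path, **kwargs); only read methods pass through."""

    def __init__(self, inner: Any) -> None:
        self._inner = inner

    def request(self, method: str, path: str, **kwargs: Any) -> Any:
        verb = method.upper()
        if verb not in READ_ONLY_METHODS:
            # Refuse before anything reaches the wrapped client.
            raise WriteBlockedError(f"preview is read-only; blocked {verb} {path}")
        return self._inner.request(verb, path, **kwargs)


def run_preview(client: Any, generate_diff: Callable[[ReadOnlyGraphClient], Any]) -> dict:
    """Run diff generation against the guarded client and return a safe outcome."""
    try:
        return {"status": "succeeded", "preview": generate_diff(ReadOnlyGraphClient(client))}
    except WriteBlockedError:
        # Persist only a code; the blocked request never reached the wrapped client.
        return {"status": "failed", "error_code": "preview_write_blocked"}
```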
**Checkpoint**: Preview is asynchronous, persisted, and visible.

---

## Phase 7: Product Phase 2 - Global Progress Widget (All Run Types)

- [ ] T037 [P] Add a global progress widget for restore runs (product Phase 2 requirement, not MVP) by extending app/Livewire/BulkOperationProgress.php or adding a dedicated Livewire component in app/Livewire/RestoreRunProgress.php

---

## Phase 8: Polish & Cross-Cutting Concerns

- [ ] T038 Ensure Graph throttling/backoff behavior (429/503) is applied inside queued jobs in app/Services/Intune/PolicySnapshotService.php and app/Services/Intune/RestoreService.php
- [ ] T039 [P] Add/extend run status notification formatting to include safe error codes/contexts in app/Notifications/RunStatusChangedNotification.php
- [ ] T040 Run the formatter on modified files: `vendor/bin/pint --dirty`
- [ ] T041 Run targeted tests for affected areas: `tests/Feature/*Restore*`, `tests/Feature/*BackupSet*`, `tests/Feature/*Policy*` (use `php artisan test` with filters)
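T038 does not specify the retry policy. A common approach for Microsoft Graph throttling is to honor the `Retry-After` header when present and otherwise back off exponentially with jitter. The Python sketch below computes only the delay; it assumes `Retry-After` arrives as delta-seconds and uses an illustrative 2-second base and 60-second cap for the computed backoff. Inside a Laravel job the delay would presumably feed the job's release/backoff mechanism. `retry_delay` and its parameters are hypothetical.

```python
import random
from typing import Callable, Optional, Union

RETRYABLE_STATUSES = frozenset({429, 503})


def retry_delay(status: int, attempt: int,
                retry_after: Optional[Union[str, float]] = None,
                base: float = 2.0, cap: float = 60.0,
                jitter: Callable[[], float] = random.random) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (1-based), or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    if retry_after is not None:
        # The server's hint wins as-is (assumed to be delta-seconds, not an HTTP date).
        return float(retry_after)
    backoff = min(cap, base * (2 ** (attempt - 1)))
    # Scale to between 50% and 100% of the backoff so parallel jobs do not retry in lockstep.
    return backoff * (0.5 + 0.5 * jitter())
```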
---

## Dependencies & Execution Order

### Story order

- Phase 1 (Setup) → Phase 2 (Foundational) must complete first.
- After Phase 2:
  - US1 and US3 can proceed in parallel.
  - US4 can proceed in parallel but may be easiest after US3 (shared RestoreRun patterns).
  - US2 can proceed independently.

### Dependency graph

- Without the widget: Setup → Foundational → { US1, US2, US3, US4 } → Polish
- With the product Phase 2 global widget: Setup → Foundational → { US1, US2, US3, US4 } → Global Widget → Polish
- Suggested minimal MVP: Setup → Foundational → US1

---

## Parallel execution examples

### US1

- In parallel: T011 (queue test), T012 (idempotency test)
- In parallel: T013 (job), T014 (UI action update) after the foundational tasks

### US2

- In parallel: T024 (create-queues test), T025 (picker-queues test)
- In parallel: T027 (job) and T029 (job) after the BackupService refactor task T026

### US3

- In parallel: T017 (idempotency test), T018 (job behavior test)
- In parallel: T021 (job notifications) and T023 (UI view enhancements) once the results format is defined

### US4

- In parallel: T031 (queue test), T032 (persistence test)
- In parallel: T033 (job) and T035 (view reads persisted preview) once the run persistence shape is agreed

---

## Implementation strategy

- MVP (fastest value): deliver US1 first (policy snapshot capture becomes queued + idempotent + observable).
- Next: US3 + US4 to fully de-risk restore execution and preview.
- Then: US2 to eliminate inline Graph work from backup set flows.

## Format validation

All tasks above follow the required checklist format (`[P]` marks parallelizable tasks and `[US#]` the user story; both are optional):

`- [ ] T### [P?] [US#?] Description with file path`