TenantAtlas/specs/049-backup-restore-job-orchestration/tasks.md
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts / long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
		•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
		•	dispatches a queued job
		•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
	•	policy capture snapshot queued + idempotency tests
	•	restore orchestration + audit logging + preview read-only tests
	•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00


# Tasks: Backup/Restore Job Orchestration (049)
**Input**: Design documents from `specs/049-backup-restore-job-orchestration/`

**Prerequisites**: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

**Tests**: REQUIRED (Pest) for these runtime behavior changes.

**MVP scope**: Strictly limited to **T001–T016 and T042–T044** (Setup, Foundational including the tenant-isolation tasks, and US1 only). The **Phase 7 global progress widget (T037)** is **Phase 2** and explicitly **NOT** part of the MVP.
## Phase 1: Setup (Shared Infrastructure)
- [x] T001 Verify queue + DB notifications prerequisites in config/queue.php and `database/migrations/*notifications*` (add missing migration if needed)
- [x] T002 Confirm existing run tables and status enums used by RestoreRun in app/Support/RestoreRunStatus.php and database/migrations/2025_12_10_000150_create_restore_runs_table.php
- [x] T003 [P] Add quickstart sanity commands for this feature in specs/049-backup-restore-job-orchestration/quickstart.md
---
## Phase 2: Foundational (Blocking Prerequisites)
**⚠️ CRITICAL**: No user story work should begin until this phase is complete.
- [x] T004 Add idempotency support to bulk_operation_runs via database/migrations/2026_01_11_120001_add_idempotency_key_to_bulk_operation_runs_table.php
- [x] T005 Add idempotency support to restore_runs via database/migrations/2026_01_11_120002_add_idempotency_key_to_restore_runs_table.php
- [x] T006 [P] Add casts/fillables for idempotency + timestamps in app/Models/BulkOperationRun.php and app/Models/RestoreRun.php
- [x] T007 Implement idempotency key helpers in app/Support/RunIdempotency.php (build key, find active run, enforce reuse)
- [x] T008 [P] Add a read-only Filament resource to inspect run details for BulkOperationRun in app/Filament/Resources/BulkOperationRunResource.php
- [x] T009 [P] Add notification for run status transitions in app/Notifications/RunStatusChangedNotification.php (DB channel)
- [x] T010 Add unit tests for RunIdempotency helpers in tests/Unit/RunIdempotencyTest.php
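
A minimal sketch of the T007 contract (build key, find active run, enforce reuse), written in Python for illustration only; the real helper is PHP in app/Support/RunIdempotency.php. The `Run` and `RunStore` classes, the in-memory list standing in for the run tables, and the colon-separated key layout are all hypothetical.

```python
from itertools import count
from typing import List, Optional, Tuple

# Dedupe only while a run is in flight; finished runs never block a new start.
ACTIVE_STATUSES = {"queued", "running"}


class Run:
    def __init__(self, run_id: int, idempotency_key: str):
        self.id = run_id
        self.idempotency_key = idempotency_key
        self.status = "queued"


def build_key(tenant_id: int, operation: str, target_id: int) -> str:
    """Hypothetical key layout: tenant + operation + target."""
    return f"{tenant_id}:{operation}:{target_id}"


class RunStore:
    """In-memory stand-in for the run table (hypothetical)."""

    def __init__(self) -> None:
        self.runs: List[Run] = []
        self._ids = count(1)

    def find_active(self, key: str) -> Optional[Run]:
        for run in self.runs:
            if run.idempotency_key == key and run.status in ACTIVE_STATUSES:
                return run
        return None

    def create_or_reuse(self, key: str) -> Tuple[Run, bool]:
        """Return (run, created): reuse an active run, else create a new one."""
        existing = self.find_active(key)
        if existing is not None:
            return existing, False
        run = Run(next(self._ids), key)
        self.runs.append(run)
        return run, True
```

A double-click calls `create_or_reuse` twice with the same key and gets the same run back (`created` is False the second time). Against a real database the find-then-create step is racy under concurrent requests; a transaction with a row lock, or a unique index limited to active rows where the database supports it, is one way to close that gap (an assumption, not a description of the T004/T005 migrations).
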

**CRITICAL (must-fix before implementing any new run flows): Tenant isolation + authorization**
- [x] T042 Add tenant-scoped authorization for run list/view/start across all run flows (BulkOperationRun + RestoreRun) using policies/resources and ensure every query is tenant-scoped (e.g., app/Filament/Resources/BulkOperationRunResource.php, app/Filament/Resources/RestoreRunResource.php, and each start action/page that creates runs)
- [x] T043 [P] Add Pest feature tests asserting that run list/view are tenant-scoped (cannot list/view another tenant's runs) in tests/Feature/RunAuthorizationTenantIsolationTest.php
- [x] T044 [P] Add Pest feature tests that unaffiliated users cannot start runs (capture snapshot / restore execute / preview / backup set capture) in tests/Feature/RunStartAuthorizationTest.php
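
A minimal sketch of the rules T042–T044 enforce, in illustrative Python rather than the app's Laravel policies and Filament resource queries. `User`, `RunRecord`, `scoped_runs`, `view_status`, and `start_status` are hypothetical names, and returning bare HTTP status codes stands in for the real policy responses.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class User:
    id: int
    tenant_ids: FrozenSet[int]  # tenants this user is affiliated with


@dataclass(frozen=True)
class RunRecord:
    id: int
    tenant_id: int


def scoped_runs(runs: List[RunRecord], current_tenant_id: int) -> List[RunRecord]:
    """Every list query filters by the current tenant before anything else."""
    return [run for run in runs if run.tenant_id == current_tenant_id]


def view_status(user: User, runs_by_id: Dict[int, RunRecord], run_id: int) -> int:
    run = runs_by_id.get(run_id)
    if run is None:
        return 404  # the run genuinely does not exist
    if run.tenant_id not in user.tenant_ids:
        return 403  # cross-tenant access is denied explicitly, not disguised as 404
    return 200


def start_status(user: User, tenant_id: int) -> int:
    """Unaffiliated users cannot start capture, restore, or preview runs."""
    return 200 if tenant_id in user.tenant_ids else 403
```
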

**Checkpoint**: Foundation ready (idempotency + run detail view + notifications).

---
## Phase 3: User Story 1 - Capture snapshot runs in background (Priority: P1) 🎯 MVP
**Goal**: Capturing a policy snapshot never blocks the UI; it creates/reuses a run record and processes in a queued job with visible progress.
**Independent Test**: Trigger “Capture snapshot” on a policy; the request returns quickly and a BulkOperationRun transitions `queued → running → succeeded|failed|partial`, with details viewable.
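
The lifecycle above can be modelled as a small state machine that fires a notification on every transition (the role T009's RunStatusChangedNotification plays). This is an illustrative Python sketch: the transition table, the assumption that terminal states are final, and the `notify` callback are not taken from the codebase.

```python
from typing import Callable, Dict, Set

# Allowed transitions; terminal states have no outgoing edges (assumption).
TRANSITIONS: Dict[str, Set[str]] = {
    "queued": {"running"},
    "running": {"succeeded", "failed", "partial"},
    "succeeded": set(),
    "failed": set(),
    "partial": set(),
}


class InvalidTransition(Exception):
    pass


class RunState:
    def __init__(self, notify: Callable[[str, str], None]) -> None:
        self.status = "queued"
        self._notify = notify  # e.g. a DB notification carrying a "View run" link

    def transition(self, new_status: str) -> None:
        if new_status not in TRANSITIONS[self.status]:
            raise InvalidTransition(f"{self.status} -> {new_status}")
        old_status, self.status = self.status, new_status
        self._notify(old_status, new_status)
```
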
### Tests (write first)
- [x] T011 [P] [US1] Add Pest feature test that capture snapshot queues a job (no inline capture) in tests/Feature/PolicyCaptureSnapshotQueuedTest.php
- [x] T012 [P] [US1] Add Pest feature test that double-click reuses the active run (idempotency) in tests/Feature/PolicyCaptureSnapshotIdempotencyTest.php
### Implementation
- [x] T013 [US1] Create queued job to capture one policy snapshot in app/Jobs/CapturePolicySnapshotJob.php (updates BulkOperationRun counts + failures)
- [x] T014 [US1] Update UI action to create/reuse run and dispatch job in app/Filament/Resources/PolicyResource/Pages/ViewPolicy.php
- [x] T015 [P] [US1] Add linking from UI notifications to BulkOperationRunResource view page in app/Filament/Resources/BulkOperationRunResource.php
- [x] T016 [US1] Ensure failures are safe/minimized (no secrets) when recording run failures in app/Services/BulkOperationService.php
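
One way to satisfy T016 is to build every persisted failure through a single sanitizing function. A minimal Python sketch, assuming failures are stored as a reason code plus a short message; the redaction patterns, the `safe_failure` name, and the 500-character cap are hypothetical choices, not what app/Services/BulkOperationService.php does.

```python
import re

# Hypothetical redaction rules; extend with whatever secrets your HTTP client can echo.
_REDACTIONS = [
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9\-._~+/]+=*"), "Bearer [REDACTED]"),
    (re.compile(r"(?i)\b(client_secret|access_token|refresh_token)=[^&\s]+"), r"\1=[REDACTED]"),
]
MAX_MESSAGE_LENGTH = 500  # keep stored summaries short


def safe_failure(reason_code: str, message: str) -> dict:
    """Build a failure entry that is safe to persist on the run record."""
    for pattern, replacement in _REDACTIONS:
        message = pattern.sub(replacement, message)
    return {"reason_code": reason_code, "message": message[:MAX_MESSAGE_LENGTH]}
```
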

**Checkpoint**: User Story 1 is independently usable and testable.

---
## Phase 4: User Story 3 - Restore runs in background with per-item results (Priority: P1)
**Goal**: Restore execution and re-run restore operate exclusively via queued jobs, with persisted per-item outcomes and safe error summaries visible in the run detail UI.
**Independent Test**: Starting restore creates/reuses a RestoreRun in `queued` state, queues execution, and later shows item outcomes without relying on logs.
### Tests (write first)
- [x] T017 [P] [US3] Add Pest feature test that restore execution reuses active run for identical (tenant+backup_set+scope) starts in tests/Feature/RestoreRunIdempotencyTest.php
- [x] T018 [P] [US3] Extend existing restore job test to assert per-item outcome persistence in tests/Feature/ExecuteRestoreRunJobTest.php
- [x] T045 [P] [US3] Add Pest feature test that live restore writes an audit event (run-id linked) in tests/Feature/RestoreAuditLoggingTest.php
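
For T017/T019, two restore starts should map to the same key whenever they target the same tenant, backup set, and scope, regardless of the order in which items were selected. A Python sketch of a canonical scope hash, assuming the scope is a JSON-serializable dict with a `backup_item_ids` list; the key layout and the `restore.execute` operation name are hypothetical.

```python
import hashlib
import json


def scope_hash(scope: dict) -> str:
    """Canonical SHA-256 of a restore scope so equivalent selections hash equally."""
    canonical = dict(scope)
    # Selection order and duplicate ids should not produce a different key.
    canonical["backup_item_ids"] = sorted(set(scope.get("backup_item_ids", [])))
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def restore_key(tenant_id: int, operation: str, backup_set_id: int, scope: dict) -> str:
    """Hypothetical layout: tenant + operation + target + scope hash."""
    return f"{tenant_id}:{operation}:{backup_set_id}:{scope_hash(scope)}"
```
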
### Implementation
- [x] T019 [US3] Implement idempotency key computation for restore runs (tenant + operation + target + scope hash) in app/Support/RunIdempotency.php
- [x] T020 [US3] Update restore run creation/execute flow to reuse active runs (no duplicates) in app/Filament/Resources/RestoreRunResource.php
- [x] T021 [US3] Update app/Jobs/ExecuteRestoreRunJob.php to set started/finished timestamps and emit DB notifications (queued/running/terminal)
- [x] T022 [US3] Persist deterministic per-item outcomes into restore_runs.results (keyed by backup_item_id) in app/Services/Intune/RestoreService.php
- [x] T023 [US3] Derive total/succeeded/failed counts from persisted results and surface in RestoreRunResource view/table in app/Filament/Resources/RestoreRunResource.php
- [x] T046 [US3] Ensure live restore execution emits an auditable event linked to the run (e.g., audit_logs FK or structured audit record) in app/Jobs/ExecuteRestoreRunJob.php and/or app/Services/Intune/RestoreService.php
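
T022 and T023 together imply that counts are derived from the persisted per-item results rather than tracked separately. A minimal Python sketch, assuming `restore_runs.results` is a JSON object keyed by `backup_item_id`; the entry shape and the succeeded/failed/partial mapping (borrowed from the US1 run lifecycle) are assumptions, and the real RestoreRunStatus values may differ.

```python
from typing import Dict, Optional

Results = Dict[str, dict]


def record_outcome(results: Results, backup_item_id: int, status: str,
                   reason_code: Optional[str] = None) -> None:
    # String keys because JSON object keys are strings once persisted;
    # re-recording the same item overwrites its entry instead of appending.
    results[str(backup_item_id)] = {"status": status, "reason_code": reason_code}


def derive_counts(results: Results) -> Dict[str, int]:
    statuses = [entry["status"] for entry in results.values()]
    return {
        "total": len(statuses),
        "succeeded": statuses.count("succeeded"),
        "failed": statuses.count("failed"),
    }


def terminal_status(counts: Dict[str, int]) -> str:
    if counts["failed"] == 0:
        return "succeeded"
    if counts["succeeded"] == 0:
        return "failed"
    return "partial"
```
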

**Checkpoint**: Restore runs are job-only, idempotent, and observable with item outcomes.

---
## Phase 5: User Story 2 - Backup set create/capture runs in background (Priority: P2)
**Goal**: Creating a backup set and adding policies to a backup set does not perform Graph-heavy snapshot capture inline; capture occurs in jobs with a run record.
**Independent Test**: Creating a backup set returns quickly and produces a BulkOperationRun showing progress; adding policies via the picker also queues work.
### Tests (write first)
- [ ] T024 [P] [US2] Add Pest feature test that backup set create does not run capture inline and instead queues a job in tests/Feature/BackupSetCreateCaptureQueuedTest.php
- [ ] T025 [P] [US2] Add Pest feature test that “Add selected” in policy picker queues background work in tests/Feature/BackupSetPolicyPickerQueuesCaptureTest.php
### Implementation
- [ ] T026 [US2] Refactor capture work out of BackupService::createBackupSet into separate methods in app/Services/Intune/BackupService.php
- [ ] T027 [US2] Create queued job to capture backup set items in app/Jobs/CaptureBackupSetJob.php (uses BackupService; updates BulkOperationRun)
- [ ] T028 [US2] Update backup set create flow to create backup_set record quickly and dispatch CaptureBackupSetJob in app/Filament/Resources/BackupSetResource.php
- [ ] T029 [US2] Create queued job to add policies to a backup set (and capture foundations if requested) in app/Jobs/AddPoliciesToBackupSetJob.php
- [ ] T030 [US2] Update bulk action in app/Livewire/BackupSetPolicyPickerTable.php to create/reuse BulkOperationRun and dispatch AddPoliciesToBackupSetJob
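
The split in T026–T030 is request side versus worker side: the request persists cheap records and enqueues, and only the job touches Graph. A hypothetical Python sketch of that split, where the list-based `queue`, the run dict (modelled on BulkOperationRun counters), and the injected `capture` callable are stand-ins, not the planned CaptureBackupSetJob API.

```python
from typing import Callable, List, Tuple


def new_run(total: int) -> dict:
    # Hypothetical run shape mirroring BulkOperationRun counters.
    return {"status": "queued", "total": total, "processed": 0,
            "succeeded": 0, "failed": 0, "failures": []}


def capture_backup_set_job(run: dict, policy_ids: List[int],
                           capture: Callable[[int], None]) -> dict:
    """Worker side: one failing policy must not abort the whole batch."""
    run["status"] = "running"
    for policy_id in policy_ids:
        try:
            capture(policy_id)  # the Graph-heavy work lives only here
            run["succeeded"] += 1
        except Exception as exc:
            run["failed"] += 1
            # Persist a minimal reason, never the raw exception text.
            run["failures"].append({"policy_id": policy_id, "reason": type(exc).__name__})
        run["processed"] += 1
    if run["failed"] == 0:
        run["status"] = "succeeded"
    elif run["succeeded"] == 0:
        run["status"] = "failed"
    else:
        run["status"] = "partial"
    return run


def create_backup_set(queue: list, name: str, policy_ids: List[int]) -> Tuple[dict, dict]:
    """Request side: persist cheap records, enqueue, and return immediately."""
    backup_set = {"name": name, "item_count": 0}
    run = new_run(len(policy_ids))
    queue.append((capture_backup_set_job, run, list(policy_ids)))
    return backup_set, run
```
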

**Checkpoint**: Backup set capture workloads are job-only and observable.

---
## Phase 6: User Story 4 - Dry-run/preview runs in background (Priority: P2)
**Goal**: Restore preview generation is queued, persisted, and viewable without re-execution.
**Independent Test**: Clicking “Generate preview” returns quickly; a queued RestoreRun performs the diff generation asynchronously and persists preview output that the UI can display.
### Tests (write first)
- [ ] T031 [P] [US4] Add Pest feature test that preview generation queues a job (no inline RestoreDiffGenerator call) in tests/Feature/RestorePreviewQueuedTest.php
- [ ] T032 [P] [US4] Add Pest feature test that preview results persist and are reusable in tests/Feature/RestorePreviewPersistenceTest.php
- [ ] T047 [P] [US4] Add Pest feature test that preview/dry-run never performs writes (must be read-only) in tests/Feature/RestorePreviewReadOnlySafetyTest.php
### Implementation
- [ ] T033 [US4] Create queued job to generate preview diffs and persist to restore_runs.preview + metadata in app/Jobs/GenerateRestorePreviewJob.php
- [ ] T034 [US4] Update preview action in app/Filament/Resources/RestoreRunResource.php to create/reuse a dry-run RestoreRun and dispatch GenerateRestorePreviewJob
- [ ] T035 [US4] Update restore run view component to read preview from the persisted run record in resources/views/filament/forms/components/restore-run-preview.blade.php
- [ ] T036 [US4] Emit DB notifications for preview queued/running/completed/failed transitions in app/Jobs/GenerateRestorePreviewJob.php
- [ ] T048 [US4] Enforce preview/dry-run read-only behavior: block write-capable operations and record a safe failure if a write would occur (in app/Jobs/GenerateRestorePreviewJob.php and/or restore diff generation service)
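
T048 can be enforced structurally by handing the preview code a client that refuses anything except reads, so a write attempt becomes a recorded failure instead of a tenant change. A minimal Python sketch; `ReadOnlyGraphClient`, `run_preview`, the `request(method, path)` interface, and the `preview_write_blocked` reason code are hypothetical.

```python
class ReadOnlyViolation(Exception):
    """Raised when preview code attempts a write-capable request."""


class ReadOnlyGraphClient:
    """Wraps a client so dry-run code can only issue GET requests."""

    SAFE_METHODS = frozenset({"GET"})

    def __init__(self, inner):
        self._inner = inner

    def request(self, method: str, path: str, **kwargs):
        verb = method.upper()
        if verb not in self.SAFE_METHODS:
            raise ReadOnlyViolation(f"{verb} {path} blocked in preview")
        return self._inner.request(verb, path, **kwargs)


def run_preview(run: dict, client, build_diff) -> dict:
    """Generate a preview through the guarded client and persist the outcome."""
    guarded = ReadOnlyGraphClient(client)
    run["status"] = "running"
    try:
        run["preview"] = build_diff(guarded)
        run["status"] = "completed"
    except ReadOnlyViolation as exc:
        run["status"] = "failed"
        run["failure"] = {"reason_code": "preview_write_blocked", "message": str(exc)}
    return run
```

A GET-only allowlist is deliberately strict; if the diff generator ever needs a read-only POST action, that would need an explicit allowlist entry rather than a looser rule.
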

**Checkpoint**: Preview is asynchronous, persisted, and visible.

---
## Phase 7: Global Progress Widget for All Run Types (Phase 2, post-MVP)
- [ ] T037 [P] Add a global progress widget for restore runs (Phase 2 requirement) by extending app/Livewire/BulkOperationProgress.php or adding a dedicated Livewire component in app/Livewire/RestoreRunProgress.php
---
## Phase 8: Polish & Cross-Cutting Concerns
- [ ] T038 Ensure Graph throttling/backoff behavior is applied inside queued jobs (429/503) in app/Services/Intune/PolicySnapshotService.php and app/Services/Intune/RestoreService.php
- [ ] T039 [P] Add/extend run status notification formatting to include safe error codes/contexts in app/Notifications/RunStatusChangedNotification.php
- [ ] T040 Run formatter on modified files: vendor/bin/pint --dirty
- [ ] T041 Run targeted tests for affected areas: `tests/Feature/*Restore*`, `tests/Feature/*BackupSet*`, `tests/Feature/*Policy*` (use `php artisan test` with `--filter`)
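
For T038: Microsoft Graph answers throttled requests with 429 and may include a `Retry-After` header in seconds; T038 also treats 503 as retryable. A minimal Python sketch of a retry loop that honors that header and otherwise backs off exponentially with jitter; the `send` callable returning `(status, headers, body)` and all default limits are assumptions, not the existing PolicySnapshotService or RestoreService code.

```python
import random
import time
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str], Optional[object]]
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[float], float] = lambda d: random.uniform(0, d / 2),
) -> Tuple[int, Optional[object]]:
    """Retry throttled calls; honor Retry-After, else back off exponentially."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        retry_after = headers.get("Retry-After", "")
        if retry_after.isdigit():
            delay = float(retry_after)  # server-provided wait in seconds
        else:
            delay = base_delay * 2 ** (attempt - 1)
            delay += jitter(delay)  # spread retries from parallel workers
        sleep(min(delay, max_delay))
    raise ValueError("max_attempts must be at least 1")
```

Sleeping inside a queued job holds a worker for the whole wait; releasing the job back onto the queue with a delay (Laravel queued jobs offer `release()` for this) is an alternative worth weighing for long waits.
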
---
## Dependencies & Execution Order
### Story order
- Phase 1 → Phase 2 must complete first.
- After Phase 2:
  - US1 and US3 can proceed in parallel.
  - US4 can proceed in parallel but may be easiest after US3 (shared RestoreRun patterns).
  - US2 can proceed independently after Phase 2.
### Dependency graph
- Without the Phase 2 widget: Setup → Foundational → { US1, US2, US3, US4 } → Polish
- With the Phase 2 widget (T037): Setup → Foundational → { US1, US2, US3, US4 } → Phase 2 Global Widget → Polish
- Suggested minimal MVP: Setup → Foundational → US1
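
The graph above can be checked mechanically. A small Python sketch using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to group phases into waves whose members may proceed in parallel; the phase names come from this file, the `execution_waves` helper is hypothetical, and the soft preference to do US4 after US3 is deliberately not encoded.

```python
from graphlib import TopologicalSorter

# Each key lists the phases it depends on (its predecessors).
PHASES = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "US1": {"Foundational"},
    "US2": {"Foundational"},
    "US3": {"Foundational"},
    "US4": {"Foundational"},
    "Polish": {"US1", "US2", "US3", "US4"},
}


def execution_waves(graph):
    """Group phases into waves; everything in one wave may proceed in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

Adding a `"Widget"` node that depends on the four user stories, and making `"Polish"` depend on it, reproduces the second graph.
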
---
## Parallel execution examples
### US1
- In parallel: T011 (queues test), T012 (idempotency test)
- In parallel: T013 (job), T014 (UI action update) after foundational tasks
### US2
- In parallel: T024 (create queues test), T025 (picker queues test)
- In parallel: T027 (job) and T029 (job) after BackupService refactor task T026
### US3
- In parallel: T017 (idempotency test), T018 (job behavior test), T045 (audit logging test)
- In parallel: T021 (job notifications) and T023 (UI view enhancements) once results format is defined
### US4
- In parallel: T031 (queues test), T032 (persistence test), T047 (read-only safety test)
- In parallel: T033 (job) and T035 (view reads persisted preview) once run persistence shape is agreed
---
## Implementation strategy
- MVP (fastest value): deliver US1 first (policy snapshot capture becomes queued + idempotent + observable).
- Next: US3 + US4 to fully de-risk restore execution and preview.
- Then: US2 to eliminate inline Graph work from backup set flows.
## Format validation
All tasks above follow the required checklist format (completed tasks use `[x]` in place of `[ ]`):
`- [ ] T### [P?] [US#?] Description with file path`
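
A checker for the format above, as a Python sketch. The regex mirrors the pattern literally (and also accepts `[x]`); the `has_path` flag is a rough heuristic (any `/` in the description) rather than real path validation, and `parse_task` is a hypothetical helper, not part of the spec tooling.

```python
import re
from typing import Optional

TASK_LINE = re.compile(
    r"^- \[(?P<done>[ x])\] "
    r"(?P<id>T\d{3}) "
    r"(?:\[(?P<parallel>P)\] )?"
    r"(?:\[(?P<story>US\d+)\] )?"
    r"(?P<description>\S.*)$"
)


def parse_task(line: str) -> Optional[dict]:
    """Return the parsed task fields, or None if the line breaks the format."""
    m = TASK_LINE.match(line)
    if m is None:
        return None
    return {
        "id": m["id"],
        "done": m["done"] == "x",
        "parallel": m["parallel"] is not None,
        "story": m["story"],
        "description": m["description"],
        "has_path": "/" in m["description"],  # heuristic only
    }
```
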