Summary
This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.
Why
We want predictable UX and operations at MSP scale:
• no timeouts / long-running requests
• reproducible run state + per-item results
• safe error persistence (no secrets / no token leakage)
• strict tenant isolation + auditability for write paths
What changed
Foundational (Runs + Idempotency + Observability)
• Added a shared RunIdempotency helper (dedupe while queued/running).
• Added a read-only BulkOperationRuns surface (list + view) for status/progress.
• Added DB notifications for run status changes (with “View run” link).
US1 – Policy “Capture snapshot” is job-only
• Policy detail “Capture snapshot” now:
  • creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
  • dispatches a queued job
  • returns immediately with notification + link to run detail
• Graph capture work moved fully into the job; request path stays Graph-free.
US3 – Restore runs orchestration is job-only + safe
• Live restore execution is queued and updates RestoreRun status/progress.
• Per-item outcomes are persisted deterministically (per internal DB record).
• Audit logging is written for live restore.
• Preview/dry-run is enforced as read-only (no writes).
Tenant isolation / authorization (non-negotiable)
• Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
• Explicit Pest tests cover cross-tenant denial and start authorization.
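To illustrate the "403, not 404" rule for cross-tenant access, here is a minimal sketch in plain PHP. The class and function names (`Actor`, `RunRecord`, `viewRunStatus`) are hypothetical stand-ins; the actual check in this PR goes through Laravel policies, which are not reproduced here.

```php
<?php
declare(strict_types=1);

// Hypothetical value objects for illustration only; the real models are not shown in this PR text.
final class RunRecord
{
    public function __construct(
        public readonly int $id,
        public readonly int $tenantId,
    ) {}
}

final class Actor
{
    public function __construct(
        public readonly int $id,
        public readonly int $tenantId,
    ) {}
}

/**
 * Returns the HTTP status a "view run" request would receive.
 * A run that exists but belongs to another tenant yields 403,
 * while a run that does not exist at all yields 404.
 */
function viewRunStatus(Actor $actor, ?RunRecord $run): int
{
    if ($run === null) {
        return 404; // no such run
    }
    if ($run->tenantId !== $actor->tenantId) {
        return 403; // run exists, but in a different tenant
    }
    return 200;
}
```

A cross-tenant denial test would then assert the 403 branch explicitly instead of accepting any non-200 response.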
Tests / Verification
• ./vendor/bin/pint --dirty
• Targeted suite (examples):
  • policy capture snapshot queued + idempotency tests
  • restore orchestration + audit logging + preview read-only tests
  • run authorization / tenant isolation tests
Notes / Scope boundaries
• Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
• Resilience/backoff is tracked in tasks but can be iterated further after merge.
Review focus
• Dedupe behavior for queued/running runs (reuse vs create-new)
• Tenant scoping & policy gates for all run surfaces
• Restore safety: audit event + preview no-writes
Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
Implementation Plan: Backup/Restore Job Orchestration (049)
Branch: feat/049-backup-restore-job-orchestration-session-1768091854 | Date: 2026-01-11 | Spec: specs/049-backup-restore-job-orchestration/spec.md
Input: Feature specification from specs/049-backup-restore-job-orchestration/spec.md
Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.
Summary
Move all backup/restore “start/execute” actions off the interactive request path.
- Interactive actions must only create (or reuse) a tenant-scoped Run Record and enqueue work.
- Background jobs perform Graph calls, capture/restore work, and update run records with status + counts + safe error summaries.
- Idempotency prevents double-click duplicates by reusing an active run for the same (tenant + operation type + target).
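The create-or-reuse behavior described above can be sketched as follows. This is a plain-PHP illustration under stated assumptions: `InMemoryRunStore` is a hypothetical in-memory stand-in for the run table, the `succeeded` status name is invented for the example, and the key format is only one possible choice. It is not the RunIdempotency helper itself.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for the run table (illustration only).
final class InMemoryRunStore
{
    /** @var array<int, array{key: string, status: string}> */
    private array $runs = [];
    private int $nextId = 1;

    /** Build a stable dedupe key from tenant + operation type + target. */
    public static function key(int $tenantId, string $operation, int|string $targetId): string
    {
        return hash('sha256', implode('|', [$tenantId, $operation, $targetId]));
    }

    /** Reuse an active (queued/running) run for the key, or create a new queued run. */
    public function findOrCreate(string $key): int
    {
        foreach ($this->runs as $id => $run) {
            if ($run['key'] === $key && in_array($run['status'], ['queued', 'running'], true)) {
                return $id; // active run found: reuse it
            }
        }
        $id = $this->nextId++;
        $this->runs[$id] = ['key' => $key, 'status' => 'queued'];
        return $id;
    }

    public function setStatus(int $id, string $status): void
    {
        $this->runs[$id]['status'] = $status;
    }
}

// Usage: a double click maps to the same run; a finished run no longer blocks a new one.
$store  = new InMemoryRunStore();
$key    = InMemoryRunStore::key(7, 'policy.capture_snapshot', 42);
$first  = $store->findOrCreate($key);
$second = $store->findOrCreate($key);      // same id as $first
$store->setStatus($first, 'succeeded');    // 'succeeded' is an assumed terminal status
$third  = $store->findOrCreate($key);      // new id, because no active run remains
```

A database-backed version would also need a guard against two requests racing past the lookup at the same time (for example a lock or a uniqueness constraint on active runs); the sketch leaves that out.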
Design choices are captured in specs/049-backup-restore-job-orchestration/research.md.
Phasing
Phase 1 (this spec’s implementation target)
- Ensure all in-scope operations are job-only (no heavy work inline).
- Create/reuse run records with idempotency for active runs.
- Provide Run detail views for progress (status + counts) and DB notifications for state transitions.
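One way to picture "status + counts" plus a notification per state transition is the sketch below. It rests on assumptions: only `queued` and `running` appear in the source, so the `succeeded`/`failed` names, the transition table, the array shape of a run, and the `/runs/{id}` URL are all hypothetical.

```php
<?php
declare(strict_types=1);

// Allowed transitions; status names other than queued/running are assumptions.
const RUN_TRANSITIONS = [
    'queued'    => ['running', 'failed'],
    'running'   => ['succeeded', 'failed'],
    'succeeded' => [],
    'failed'    => [],
];

/**
 * Apply a status transition and build the payload a DB notification could carry.
 * Throws when the transition is not allowed.
 *
 * @param array{id: int, status: string, processed: int, failed: int, total: int} $run
 * @return array{run: array, notification: array{title: string, body: string, url: string}}
 */
function transitionRun(array $run, string $to): array
{
    $allowed = RUN_TRANSITIONS[$run['status']] ?? [];
    if (!in_array($to, $allowed, true)) {
        throw new InvalidArgumentException("Illegal transition {$run['status']} -> {$to}");
    }
    $run['status'] = $to;

    return [
        'run' => $run,
        'notification' => [
            'title' => "Run #{$run['id']} is {$to}",
            'body'  => sprintf('%d/%d processed, %d failed', $run['processed'], $run['total'], $run['failed']),
            'url'   => "/runs/{$run['id']}", // hypothetical target of the "View run" link
        ],
    ];
}
```

Keeping the transition table explicit makes it easy to test that a finished run can never move back to an active state.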
Phase 2 (explicitly out-of-scope for Phase 1)
- Add a global progress widget that surfaces all run types (not just bulk ops) across the admin UI.
Technical Context
Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12, Filament 4, Livewire 3
Storage: PostgreSQL (JSONB used for run payloads/summaries where appropriate)
Testing: Pest 4 (feature tests + job tests)
Target Platform: Containerized web app (Sail for local dev; Dokploy for staging/prod)
Project Type: Web application (Laravel monolith)
Performance Goals: 95% of start actions confirm “queued” within 2 seconds (SC-001)
Constraints: No heavy work during interactive requests; jobs must be idempotent + observable; no secrets in run records
Scale/Scope: Multi-tenant MSP usage; long-running Graph operations; frequent retries/double-click scenarios
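The performance goal (SC-001) is a simple share calculation over measured confirmation latencies. A minimal sketch, assuming the latencies are collected elsewhere as seconds; the function name and the treatment of an empty sample are assumptions, not part of the spec.

```php
<?php
declare(strict_types=1);

/**
 * Returns true when at least $targetShare of the samples are within $limitSeconds.
 *
 * @param list<float> $latenciesSeconds time from start action to "queued" confirmation
 */
function meetsQueuedLatencyGoal(
    array $latenciesSeconds,
    float $limitSeconds = 2.0,
    float $targetShare = 0.95,
): bool {
    if ($latenciesSeconds === []) {
        return false; // assumption: no samples means the goal is not demonstrated
    }
    // Count the samples that confirmed within the limit.
    $within = count(array_filter(
        $latenciesSeconds,
        fn (float $s): bool => $s <= $limitSeconds,
    ));

    return $within / count($latenciesSeconds) >= $targetShare;
}
```

With 20 samples, one slow confirmation still meets the goal (19/20 = 95%), while two slow confirmations do not (18/20 = 90%).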
Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
- Inventory-first: orchestration is run-record centric; inventory stays “last observed”, backups remain explicit actions.
- Read/write separation: preview/dry-run stays read-only; live restore remains behind explicit confirmation + audit + tests.
- Graph contract path: all Graph calls remain behind GraphClientInterface and the contract registry (config/graph_contracts.php).
- Deterministic capabilities: no new capability derivation introduced by this feature (existing resolver remains authoritative).
- Tenant isolation: all run visibility + execution is tenant-scoped; no cross-tenant run access.
- Automation: enforce de-duplication for active runs; jobs use locks/backoff for 429/503 where applicable.
- Data minimization: run records store only safe summaries (error codes + whitelisted context), never secrets/tokens.
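The data-minimization rule (error codes + whitelisted context only) could look roughly like this. It is a sketch under assumptions: the function name, the error code string, and the whitelist keys are hypothetical and not taken from the codebase.

```php
<?php
declare(strict_types=1);

/**
 * Reduce an error to a summary that is safe to persist on a run record.
 * Only whitelisted, scalar (or null) context values are kept; everything else,
 * including raw messages, headers, and tokens, is dropped.
 *
 * @param array<string, mixed> $context
 * @param list<string> $allowedKeys hypothetical whitelist
 * @return array{code: string, context: array<string, scalar|null>}
 */
function safeErrorSummary(
    string $errorCode,
    array $context,
    array $allowedKeys = ['http_status', 'policy_id', 'attempt'],
): array {
    $safe = [];
    foreach ($allowedKeys as $key) {
        if (array_key_exists($key, $context)
            && (is_scalar($context[$key]) || $context[$key] === null)) {
            $safe[$key] = $context[$key];
        }
    }

    return ['code' => $errorCode, 'context' => $safe];
}

// Usage: the authorization header never reaches the stored summary.
$summary = safeErrorSummary('graph.throttled', [
    'http_status'   => 429,
    'authorization' => 'Bearer abc', // not whitelisted, so it is dropped
    'attempt'       => 2,
]);
```

Whitelisting rather than blacklisting means a newly added context field stays out of run records until someone deliberately allows it.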
Project Structure
Documentation (this feature)
specs/049-backup-restore-job-orchestration/
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
Source Code (repository root)
app/
├── Filament/
│ └── Resources/
├── Jobs/
├── Livewire/
├── Models/
├── Services/
└── Support/
database/
└── migrations/
resources/
└── views/
tests/
├── Feature/
└── Unit/
Structure Decision: Laravel monolith; orchestration implemented via queued jobs + run records in existing models/tables.
Complexity Tracking
Fill ONLY if Constitution Check has violations that must be justified
This feature introduces no constitution violations, so nothing needs to be justified here.