TenantAtlas/specs/049-backup-restore-job-orchestration/plan.md
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts or long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
		•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
		•	dispatches a queued job
		•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.
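The start flow for US1 can be sketched roughly as follows. This is an illustration in Python rather than the project's PHP, and every name in it (`RunStore`, `RunRecord`, `start_capture_snapshot`, the in-memory queue, the status strings) is hypothetical; only the three dedupe key parts (tenant, policy.capture_snapshot, policy DB id) come from the description above.

```python
from dataclasses import dataclass, field
from itertools import count

ACTIVE = {"queued", "running"}  # assumed statuses during which a run is reused


@dataclass
class RunRecord:
    id: int
    dedupe_key: str
    status: str = "queued"


@dataclass
class RunStore:
    """Hypothetical in-memory stand-in for the run table and the job queue."""
    runs: list = field(default_factory=list)
    queue: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def find_active(self, key):
        return next(
            (r for r in self.runs if r.dedupe_key == key and r.status in ACTIVE),
            None,
        )


def start_capture_snapshot(store, tenant_id, policy_id):
    """Create or reuse a run, enqueue work at most once, return immediately."""
    key = f"{tenant_id}:policy.capture_snapshot:{policy_id}"
    existing = store.find_active(key)
    if existing is not None:
        return existing, False  # double-click: reuse the run, do not enqueue again
    run = RunRecord(id=next(store._ids), dedupe_key=key)
    store.runs.append(run)
    store.queue.append(run.id)  # the job, not the request, performs the Graph work
    return run, True
```

The sketch ignores concurrency. Against a real database the lookup and insert would need to be atomic, for example through a lock or a uniqueness constraint on active keys.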

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).
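One way to picture the preview guarantee and the deterministic per-item outcomes is the sketch below. It is written in Python for brevity and is not the project's code: `GuardedWriter`, `restore`, and the outcome labels are hypothetical names chosen for illustration.

```python
class ReadOnlyViolation(RuntimeError):
    """Raised if anything tries to write during a preview/dry-run."""


class GuardedWriter:
    """Hypothetical write gateway that refuses writes when dry_run is set."""

    def __init__(self, dry_run):
        self.dry_run = dry_run
        self.written = []

    def write(self, item_id, payload):
        if self.dry_run:
            raise ReadOnlyViolation(f"write attempted during preview: {item_id}")
        self.written.append((item_id, payload))


def restore(items, writer):
    """Return outcomes keyed by internal record id, processed in sorted id order."""
    outcomes = {}
    for item_id in sorted(items):
        if writer.dry_run:
            outcomes[item_id] = "would_restore"  # preview: report only
            continue
        writer.write(item_id, items[item_id])
        outcomes[item_id] = "restored"
    return outcomes
```

Routing every write through one guarded gateway means a preview cannot write by accident, even if a code path forgets to check the flag itself.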

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.
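The 403-not-404 rule can be illustrated with a small decision function. This is a Python sketch under assumptions, not the project's policy classes: `User`, `RunRef`, and `view_run_status` are hypothetical names, and tenant membership is modeled as a plain set of ids.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    tenant_ids: frozenset  # tenants this user may act in (assumed model)


@dataclass(frozen=True)
class RunRef:
    id: int
    tenant_id: int


def view_run_status(user, run):
    """Return the HTTP status a run-detail request would produce."""
    if run is None:
        return 404  # the run does not exist at all
    if run.tenant_id not in user.tenant_ids:
        return 403  # the run exists but belongs to another tenant
    return 200
```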

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
		•	policy capture snapshot queued + idempotency tests
		•	restore orchestration + audit logging + preview read-only tests
		•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00

Implementation Plan: Backup/Restore Job Orchestration (049)

Branch: feat/049-backup-restore-job-orchestration-session-1768091854 | Date: 2026-01-11 | Spec: specs/049-backup-restore-job-orchestration/spec.md
Input: Feature specification from specs/049-backup-restore-job-orchestration/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

Move all backup/restore “start/execute” actions off the interactive request path.

  • Interactive actions must only create (or reuse) a tenant-scoped Run Record and enqueue work.
  • Background jobs perform Graph calls, capture/restore work, and update run records with status + counts + safe error summaries.
  • Idempotency prevents double-click duplicates by reusing an active run for the same (tenant + operation type + target).
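The background-job side of this split, where the job updates the run record with status and counts, might look roughly like the following. It is a Python sketch under assumptions: the run record is a plain dict, and the status names (`running`, `completed`, `partial`, `failed`) and field names are hypothetical rather than taken from the spec.

```python
def execute_run(run, items, process):
    """Process items inside a job, keeping status and counts on the run record."""
    run.update(status="running", total=len(items), succeeded=0, failed=0, failures=[])
    for item in items:
        try:
            process(item)
            run["succeeded"] += 1
        except Exception as exc:
            run["failed"] += 1
            # Persist only the exception class name, never the raw message,
            # because messages may contain tokens or other secrets.
            run["failures"].append({"item": item, "code": type(exc).__name__})
    if run["failed"] == 0:
        run["status"] = "completed"
    elif run["succeeded"] == 0:
        run["status"] = "failed"
    else:
        run["status"] = "partial"
    return run
```

One item failing does not abort the others here; whether the real jobs continue or stop on the first failure is a design decision the plan does not state.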

Design choices are captured in specs/049-backup-restore-job-orchestration/research.md.

Phasing

Phase 1 (this spec's implementation target)

  • Ensure all in-scope operations are job-only (no heavy work inline).
  • Create/reuse run records with idempotency for active runs.
  • Provide Run detail views for progress (status + counts) and DB notifications for state transitions.

Phase 2 (explicitly out-of-scope for Phase 1)

  • Add a global progress widget that surfaces all run types (not just bulk ops) across the admin UI.

Technical Context

Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12, Filament 4, Livewire 3
Storage: PostgreSQL (JSONB used for run payloads/summaries where appropriate)
Testing: Pest 4 (feature tests + job tests)
Target Platform: Containerized web app (Sail for local dev; Dokploy for staging/prod)
Project Type: Web application (Laravel monolith)
Performance Goals: 95% of start actions confirm “queued” within 2 seconds (SC-001)
Constraints: No heavy work during interactive requests; jobs must be idempotent + observable; no secrets in run records
Scale/Scope: Multi-tenant MSP usage; long-running Graph operations; frequent retries/double-click scenarios
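The "no secrets in run records" constraint is often enforced with an allow-list rather than by trying to detect secrets. The Python sketch below shows the idea; the key names in `ALLOWED_CONTEXT_KEYS` and the function name are hypothetical and would need to match whatever context the real jobs collect.

```python
# Hypothetical allow-list: only these context keys may be persisted.
ALLOWED_CONTEXT_KEYS = {"http_status", "graph_error_code", "policy_id"}


def safe_error_summary(error_code, context):
    """Keep whitelisted context keys only; anything else is silently dropped."""
    safe = {k: v for k, v in context.items() if k in ALLOWED_CONTEXT_KEYS}
    return {"code": error_code, "context": safe}
```

An allow-list fails closed: a new, unexpected field such as a token is dropped by default instead of being stored until someone notices it.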

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

  • Inventory-first: orchestration is run-record centric; inventory stays “last observed”, backups remain explicit actions.
  • Read/write separation: preview/dry-run stays read-only; live restore remains behind explicit confirmation + audit + tests.
  • Graph contract path: all Graph calls remain behind GraphClientInterface and contract registry (config/graph_contracts.php).
  • Deterministic capabilities: no new capability derivation introduced by this feature (existing resolver remains authoritative).
  • Tenant isolation: all run visibility + execution is tenant-scoped; no cross-tenant run access.
  • Automation: enforce de-duplication for active runs; jobs use locks/backoff for 429/503 where applicable.
  • Data minimization: run records store only safe summaries (error codes + whitelisted context), never secrets/tokens.
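For the Automation item, a retry policy for 429/503 responses could be sketched as below. This is an assumption about how the jobs might behave, not the project's implementation: the function names, the default base, cap, and attempt limit are hypothetical, and the sketch omits jitter for clarity.

```python
RETRYABLE = {429, 503}  # throttled / service unavailable


def backoff_delay(attempt, retry_after=None, base=2.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after is not None:
        # Prefer the server's Retry-After hint when one is supplied.
        return min(float(retry_after), cap)
    return min(base * 2 ** (attempt - 1), cap)  # exponential: 2, 4, 8, ...


def should_retry(status, attempt, max_attempts=5):
    """Retry only retryable statuses, and only while attempts remain."""
    return status in RETRYABLE and attempt < max_attempts
```

Capping the delay keeps a single throttled run from sleeping indefinitely, while the attempt limit lets the run record reach a terminal failed state with a safe error summary.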

Project Structure

Documentation (this feature)

specs/049-backup-restore-job-orchestration/
├── plan.md              # This file (/speckit.plan command output)
├── research.md          # Phase 0 output (/speckit.plan command)
├── data-model.md        # Phase 1 output (/speckit.plan command)
├── quickstart.md        # Phase 1 output (/speckit.plan command)
├── contracts/           # Phase 1 output (/speckit.plan command)
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)

Source Code (repository root)

app/
├── Filament/
│   └── Resources/
├── Jobs/
├── Livewire/
├── Models/
├── Services/
└── Support/

database/
└── migrations/

resources/
└── views/

tests/
├── Feature/
└── Unit/

Structure Decision: Laravel monolith; orchestration implemented via queued jobs + run records in existing models/tables.

Complexity Tracking

Fill ONLY if Constitution Check has violations that must be justified

Violation | Why Needed | Simpler Alternative Rejected Because
[e.g., 4th project] | [current need] | [why 3 projects insufficient]
[e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient]

No constitution violations are required for this feature.