ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)

Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
• no timeouts / long-running requests
• reproducible run state + per-item results
• safe error persistence (no secrets / no token leakage)
• strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
• Added a shared RunIdempotency helper (dedupe while queued/running).
• Added a read-only BulkOperationRuns surface (list + view) for status/progress.
• Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
• Policy detail “Capture snapshot” now:
  • creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
  • dispatches a queued job
  • returns immediately with notification + link to run detail
• Graph capture work moved fully into the job; request path stays Graph-free.
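
The create-or-reuse behavior described above can be sketched in a language-neutral way. This is a minimal in-memory illustration, not the PR's RunIdempotency helper: the `RunStore` class, its method names, and the status strings are assumptions made for the example, and the real code is PHP/Laravel backed by the database.

```python
from dataclasses import dataclass
from itertools import count

# Statuses during which a second click should reuse the existing run (assumed names).
ACTIVE_STATUSES = {"queued", "running"}


@dataclass
class Run:
    id: int
    tenant_id: int
    operation: str
    target_id: int
    status: str = "queued"


class RunStore:
    """Hypothetical in-memory stand-in for the run table and the queue."""

    def __init__(self):
        self._runs = []
        self._ids = count(1)
        self.dispatched = []  # ids of runs for which a job was enqueued

    def start_or_reuse(self, tenant_id, operation, target_id):
        """Return (run, created). Reuse an active run with the same dedupe key."""
        key = (tenant_id, operation, target_id)
        for run in self._runs:
            same_key = (run.tenant_id, run.operation, run.target_id) == key
            if same_key and run.status in ACTIVE_STATUSES:
                return run, False  # double-click: no new run, no new job
        run = Run(next(self._ids), tenant_id, operation, target_id)
        self._runs.append(run)
        self.dispatched.append(run.id)  # enqueue exactly one job per new run
        return run, True


store = RunStore()
first, created = store.start_or_reuse(1, "policy.capture_snapshot", 42)
again, created_again = store.start_or_reuse(1, "policy.capture_snapshot", 42)
print(created, created_again, first.id == again.id)
```

A database-backed version would presumably also need a lock or a unique constraint so that two concurrent requests cannot both create a run; the description does not say which mechanism is used.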

US3 – Restore runs orchestration is job-only + safe
• Live restore execution is queued and updates RestoreRun status/progress.
• Per-item outcomes are persisted deterministically (per internal DB record).
• Audit logging is written for live restore.
• Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
• Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
• Explicit Pest tests cover cross-tenant denial and start authorization.
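
As a rough illustration of the 403-not-404 rule, the sketch below models only the decision itself. The `User` and `Run` types and the `view_run_status` function are hypothetical; in the PR this decision is made by Laravel policies and is covered by the Pest tests mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    tenant_id: int


@dataclass(frozen=True)
class Run:
    id: int
    tenant_id: int


def view_run_status(user: User, run: Optional[Run]) -> int:
    """Return the HTTP status a run-detail request would produce (illustrative)."""
    if run is None:
        return 404  # the run does not exist at all
    if run.tenant_id != user.tenant_id:
        return 403  # the run exists but belongs to another tenant
    return 200


alice = User(id=1, tenant_id=10)
print(view_run_status(alice, Run(id=7, tenant_id=10)))
print(view_run_status(alice, Run(id=8, tenant_id=20)))
```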

Tests / Verification
• ./vendor/bin/pint --dirty
• Targeted suite (examples):
  • policy capture snapshot queued + idempotency tests
  • restore orchestration + audit logging + preview read-only tests
  • run authorization / tenant isolation tests

Notes / Scope boundaries
• Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
• Resilience/backoff is tracked in tasks but can be iterated further after merge.
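
For reviewers thinking about the resilience follow-up, one common shape for backoff on throttling responses is shown below. It is only a sketch under assumptions: the function name, the base and cap values, and the choice to honor a Retry-After value are illustrative and are not taken from the PR or its tasks.

```python
RETRYABLE_STATUSES = {429, 503}


def retry_delay(status, attempt, retry_after=None, base=2.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (1-based), or None.

    None means the status is not considered retryable in this sketch.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if retry_after is not None:
        # Prefer the server's hint when one is given, but never exceed the cap.
        return min(float(retry_after), cap)
    # Exponential backoff: base, 2*base, 4*base, ... limited by the cap.
    return min(base * (2 ** (attempt - 1)), cap)


print(retry_delay(429, 1))
print(retry_delay(503, 3))
print(retry_delay(500, 1))
```

A production version would usually add random jitter so that many jobs throttled at the same moment do not all retry together.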

Review focus
• Dedupe behavior for queued/running runs (reuse vs create-new)
• Tenant scoping & policy gates for all run surfaces
• Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56

2026-01-11 15:59:06 +00:00


Implementation Plan: Backup/Restore Job Orchestration (049)

Branch: feat/049-backup-restore-job-orchestration-session-1768091854 | Date: 2026-01-11 | Spec: specs/049-backup-restore-job-orchestration/spec.md
Input: Feature specification from specs/049-backup-restore-job-orchestration/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

Move all backup/restore “start/execute” actions off the interactive request path.

Interactive actions must only create (or reuse) a tenant-scoped Run Record and enqueue work.
Background jobs perform Graph calls, capture/restore work, and update run records with status + counts + safe error summaries.
Idempotency prevents double-click duplicates by reusing an active run for the same (tenant + operation type + target).
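
The job side of this split, where a background worker moves a run through its states and records counts plus safe error summaries, might look roughly like the following. The `RunRecord` fields, the status names, and `execute_run` are assumptions for illustration; the actual run models and statuses live in the Laravel code base.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    total: int
    status: str = "queued"
    succeeded: int = 0
    failed: int = 0
    errors: list = field(default_factory=list)  # safe summaries only


def execute_run(run, items, handler):
    """Process items one by one, updating the counts on the run record."""
    run.status = "running"
    for item in items:
        try:
            handler(item)
            run.succeeded += 1
        except Exception as exc:
            run.failed += 1
            # Store only the exception class name, never the message,
            # because messages can contain tokens or other secrets.
            run.errors.append({"item": item, "code": type(exc).__name__})
    if run.failed == 0:
        run.status = "completed"
    elif run.succeeded == 0:
        run.status = "failed"
    else:
        run.status = "partial"  # hypothetical status name
    return run


def demo_handler(item):
    if item == "policy-b":
        raise ValueError("token=abc123")  # simulated failure with sensitive text


result = execute_run(RunRecord(total=3), ["policy-a", "policy-b", "policy-c"], demo_handler)
print(result.status, result.succeeded, result.failed, result.errors)
```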

Design choices are captured in specs/049-backup-restore-job-orchestration/research.md.

Phasing

Phase 1 (this spec’s implementation target)

Ensure all in-scope operations are job-only (no heavy work inline).
Create/reuse run records with idempotency for active runs.
Provide Run detail views for progress (status + counts) and DB notifications for state transitions.

Phase 2 (explicitly out-of-scope for Phase 1)

Add a global progress widget that surfaces all run types (not just bulk ops) across the admin UI.

Technical Context

Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12, Filament 4, Livewire 3
Storage: PostgreSQL (JSONB used for run payloads/summaries where appropriate)
Testing: Pest 4 (feature tests + job tests)
Target Platform: Containerized web app (Sail for local dev; Dokploy for staging/prod)
Project Type: Web application (Laravel monolith)
Performance Goals: 95% of start actions confirm “queued” within 2 seconds (SC-001)
Constraints: No heavy work during interactive requests; jobs must be idempotent + observable; no secrets in run records
Scale/Scope: Multi-tenant MSP usage; long-running Graph operations; frequent retries/double-click scenarios

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Inventory-first: orchestration is run-record centric; inventory stays “last observed”, backups remain explicit actions.
Read/write separation: preview/dry-run stays read-only; live restore remains behind explicit confirmation + audit + tests.
Graph contract path: all Graph calls remain behind GraphClientInterface and contract registry (config/graph_contracts.php).
Deterministic capabilities: no new capability derivation introduced by this feature (existing resolver remains authoritative).
Tenant isolation: all run visibility + execution is tenant-scoped; no cross-tenant run access.
Automation: enforce de-duplication for active runs; jobs use locks/backoff for 429/503 where applicable.
Data minimization: run records store only safe summaries (error codes + whitelisted context), never secrets/tokens.
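
The data-minimization rule (error codes plus whitelisted context only) can be illustrated with a small filter. The allowed keys and the `safe_error_summary` name are invented for this example; the plan does not list the actual whitelist.

```python
# Hypothetical whitelist; the real set of allowed keys is not given in the plan.
ALLOWED_CONTEXT_KEYS = frozenset({"http_status", "graph_error_code", "policy_id"})


def safe_error_summary(code, context):
    """Build a run-record error entry that keeps only whitelisted context keys."""
    safe_context = {k: v for k, v in context.items() if k in ALLOWED_CONTEXT_KEYS}
    return {"code": code, "context": safe_context}


summary = safe_error_summary(
    "graph.throttled",
    {"http_status": 429, "access_token": "eyJ...", "policy_id": 42},
)
print(summary)
```

A whitelist is used here rather than a blacklist so that context keys added later are dropped by default until someone explicitly allows them.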

Project Structure

Documentation (this feature)

specs/049-backup-restore-job-orchestration/
├── plan.md              # This file (/speckit.plan command output)
├── research.md          # Phase 0 output (/speckit.plan command)
├── data-model.md        # Phase 1 output (/speckit.plan command)
├── quickstart.md        # Phase 1 output (/speckit.plan command)
├── contracts/           # Phase 1 output (/speckit.plan command)
└── tasks.md             # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)

Source Code (repository root)

app/
├── Filament/
│   └── Resources/
├── Jobs/
├── Livewire/
├── Models/
├── Services/
└── Support/

database/
└── migrations/

resources/
└── views/

tests/
├── Feature/
└── Unit/

Structure Decision: Laravel monolith; orchestration implemented via queued jobs + run records in existing models/tables.

Complexity Tracking

Fill ONLY if Constitution Check has violations that must be justified

| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |

No constitution violations are required for this feature.
