TenantAtlas/specs/049-backup-restore-job-orchestration/plan.md
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts / long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
	•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
	•	dispatches a queued job
	•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
	•	policy capture snapshot queued + idempotency tests
	•	restore orchestration + audit logging + preview read-only tests
	•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00

103 lines
4.5 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Implementation Plan: Backup/Restore Job Orchestration (049)
**Branch**: `feat/049-backup-restore-job-orchestration-session-1768091854` | **Date**: 2026-01-11 | **Spec**: [specs/049-backup-restore-job-orchestration/spec.md](specs/049-backup-restore-job-orchestration/spec.md)
**Input**: Feature specification from `specs/049-backup-restore-job-orchestration/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/scripts/` for helper scripts.
## Summary
Move all backup/restore “start/execute” actions off the interactive request path.
- Interactive actions must only create (or reuse) a tenant-scoped Run Record and enqueue work.
- Background jobs perform Graph calls, capture/restore work, and update run records with status + counts + safe error summaries.
- Idempotency prevents double-click duplicates by reusing an active run for the same `(tenant + operation type + target)`.
Design choices are captured in [specs/049-backup-restore-job-orchestration/research.md](specs/049-backup-restore-job-orchestration/research.md).
## Phasing
### Phase 1 (this specs implementation target)
- Ensure all in-scope operations are job-only (no heavy work inline).
- Create/reuse run records with idempotency for active runs.
- Provide **Run detail** views for progress (status + counts) and **DB notifications** for state transitions.
### Phase 2 (explicitly out-of-scope for Phase 1)
- Add a **global progress widget** that surfaces all run types (not just bulk ops) across the admin UI.
## Technical Context
**Language/Version**: PHP 8.4.15
**Primary Dependencies**: Laravel 12, Filament 4, Livewire 3
**Storage**: PostgreSQL (JSONB used for run payloads/summaries where appropriate)
**Testing**: Pest 4 (feature tests + job tests)
**Target Platform**: Containerized web app (Sail for local dev; Dokploy for staging/prod)
**Project Type**: Web application (Laravel monolith)
**Performance Goals**: 95% of start actions confirm “queued” within 2 seconds (SC-001)
**Constraints**: No heavy work during interactive requests; jobs must be idempotent + observable; no secrets in run records
**Scale/Scope**: Multi-tenant MSP usage; long-running Graph operations; frequent retries/double-click scenarios
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
- Inventory-first: orchestration is run-record centric; inventory stays “last observed”, backups remain explicit actions.
- Read/write separation: preview/dry-run stays read-only; live restore remains behind explicit confirmation + audit + tests.
- Graph contract path: all Graph calls remain behind `GraphClientInterface` and contract registry (`config/graph_contracts.php`).
- Deterministic capabilities: no new capability derivation introduced by this feature (existing resolver remains authoritative).
- Tenant isolation: all run visibility + execution is tenant-scoped; no cross-tenant run access.
- Automation: enforce de-duplication for active runs; jobs use locks/backoff for 429/503 where applicable.
- Data minimization: run records store only safe summaries (error codes + whitelisted context), never secrets/tokens.
## Project Structure
### Documentation (this feature)
```text
specs/049-backup-restore-job-orchestration/
├── plan.md # This file (/speckit.plan command output)
├── research.md # Phase 0 output (/speckit.plan command)
├── data-model.md # Phase 1 output (/speckit.plan command)
├── quickstart.md # Phase 1 output (/speckit.plan command)
├── contracts/ # Phase 1 output (/speckit.plan command)
└── tasks.md # Phase 2 output (/speckit.tasks command - NOT created by /speckit.plan)
```
### Source Code (repository root)
```text
app/
├── Filament/
│ └── Resources/
├── Jobs/
├── Livewire/
├── Models/
├── Services/
└── Support/
database/
└── migrations/
resources/
└── views/
tests/
├── Feature/
└── Unit/
```
**Structure Decision**: Laravel monolith; orchestration implemented via queued jobs + run records in existing models/tables.
## Complexity Tracking
> **Fill ONLY if Constitution Check has violations that must be justified**
| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| [e.g., 4th project] | [current need] | [why 3 projects insufficient] |
| [e.g., Repository pattern] | [specific problem] | [why direct DB access insufficient] |
No constitution violations are required for this feature.