TenantAtlas/specs/049-backup-restore-job-orchestration/data-model.md
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts / long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
		•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
		•	dispatches a queued job
		•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
		•	policy capture snapshot queued + idempotency tests
		•	restore orchestration + audit logging + preview read-only tests
		•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00


# Data Model: Backup/Restore Job Orchestration (049)

This feature relies on existing “run record” models/tables and (optionally) extends them to meet the orchestration requirements.

## Entities

## 1) RestoreRun (`restore_runs`)

**Purpose:** Run record for restore executions and dry-run/preview workflows.

**Model:** `App\Models\RestoreRun`

**Key fields (existing):**
- `id` (PK)
- `tenant_id` (FK → tenants)
- `backup_set_id` (FK → backup_sets)
- `requested_by` (string|null)
- `is_dry_run` (bool)
- `status` (string)
- `requested_items` (json|null)
- `preview` (json|null) — persisted preview output
- `results` (json|null) — persisted execution output (may include per-item outcomes)
- `failure_reason` (text|null)
- `started_at` / `completed_at` (timestamp|null)
- `metadata` (json|null)

**Relationships:**
- `RestoreRun belongsTo Tenant`
- `RestoreRun belongsTo BackupSet`

**State transitions (target):**
- `queued → running → succeeded|failed|partial`

**Validation constraints (creation/dispatch):**
- tenant-scoped access required
- `backup_set_id` must belong to tenant
- preview/dry-run must not perform writes (constitution Read/Write Separation)
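
The creation check and the target state transitions for `RestoreRun` can be sketched as follows. This is a minimal, framework-free illustration, not the project's implementation: the `backup_set_owner` lookup, the exception names, and the helper functions are hypothetical, and the real code would live in the Laravel model/policy layer.

```python
from dataclasses import dataclass

# Target state machine: queued -> running -> succeeded | failed | partial.
# Terminal statuses have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    "queued": {"running"},
    "running": {"succeeded", "failed", "partial"},
    "succeeded": set(),
    "failed": set(),
    "partial": set(),
}


class InvalidTransition(Exception):
    """Raised when a status change is not part of the target state machine."""


class CrossTenantBackupSet(Exception):
    """Raised when backup_set_id does not belong to the requesting tenant."""


@dataclass
class RestoreRun:
    tenant_id: int
    backup_set_id: int
    is_dry_run: bool
    status: str = "queued"


def create_restore_run(tenant_id, backup_set_id, backup_set_owner, is_dry_run):
    # backup_set_owner is a hypothetical mapping of backup_set_id -> tenant_id,
    # standing in for a tenant-scoped database lookup.
    if backup_set_owner.get(backup_set_id) != tenant_id:
        raise CrossTenantBackupSet(backup_set_id)
    return RestoreRun(tenant_id, backup_set_id, is_dry_run)


def transition(run, new_status):
    # Reject anything that is not an edge of the target state machine.
    if new_status not in ALLOWED_TRANSITIONS[run.status]:
        raise InvalidTransition(f"{run.status} -> {new_status}")
    run.status = new_status
    return run
```
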
---
## 2) BulkOperationRun (`bulk_operation_runs`)

**Purpose:** Run record for background operations that process many internal items, including capture-style actions on backup sets.

**Model:** `App\Models\BulkOperationRun`

**Key fields (existing):**
- `id` (PK)
- `tenant_id` (FK → tenants)
- `user_id` (FK → users)
- `resource` (string) — e.g. `policy`, `backup_set`
- `action` (string) — e.g. `export`, `add_policies`
- `status` (string) — `pending`, `running`, `completed`, `completed_with_errors`, `failed`, `aborted`
- `total_items`, `processed_items`, `succeeded`, `failed`, `skipped`
- `item_ids` (jsonb)
- `failures` (jsonb|null) — safe per-item error summaries
- `audit_log_id` (FK → audit_logs|null)

**Relationships:**
- `BulkOperationRun belongsTo Tenant`
- `BulkOperationRun belongsTo User`

**Recommended additions (to satisfy FR-002/FR-004 cleanly):**
- `idempotency_key` (string, indexed; uniqueness enforced for active statuses via partial index)
- `started_at` / `finished_at` (timestampTz)
- `error_code` (string|null)
- `error_context` (jsonb|null)

**State transitions (target):**
- `queued → running → succeeded|failed|partial`
- `pending` maps to `queued`
- `completed_with_errors` maps to `partial`
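
The recommended `idempotency_key` with a partial unique index can be illustrated with a small, runnable sketch. It uses Python's built-in SQLite purely as a stand-in for the production database; the index name, the key format, and the `start_or_reuse` helper are assumptions made for illustration, and the active statuses use the target names (`queued`, `running`).

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE bulk_operation_runs (
               id INTEGER PRIMARY KEY,
               tenant_id INTEGER NOT NULL,
               status TEXT NOT NULL,
               idempotency_key TEXT NOT NULL)"""
    )
    # Partial unique index: uniqueness applies only while a run is active,
    # so a finished run never blocks a new run with the same key.
    conn.execute(
        """CREATE UNIQUE INDEX runs_active_idempotency_key
               ON bulk_operation_runs (tenant_id, idempotency_key)
               WHERE status IN ('queued', 'running')"""
    )
    return conn


def build_key(tenant_id, operation, target_id):
    # Hypothetical key format: tenant + operation + internal DB id.
    return f"{tenant_id}:{operation}:{target_id}"


def start_or_reuse(conn, tenant_id, operation, target_id):
    """Return (run_id, created). Reuses the active run if one exists."""
    key = build_key(tenant_id, operation, target_id)
    try:
        cur = conn.execute(
            "INSERT INTO bulk_operation_runs (tenant_id, status, idempotency_key) "
            "VALUES (?, 'queued', ?)",
            (tenant_id, key),
        )
        conn.commit()
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The index rejected a duplicate active run: hand back the existing one.
        row = conn.execute(
            "SELECT id FROM bulk_operation_runs "
            "WHERE tenant_id = ? AND idempotency_key = ? "
            "AND status IN ('queued', 'running')",
            (tenant_id, key),
        ).fetchone()
        return row[0], False
```
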
---
## 3) Notification Event (DB notifications)

**Purpose:** Persist state transitions and completion notices for the initiating user.

**Storage:** Laravel Notifications (DB channel).

**Payload shape (target):**
- `tenant_id`
- `run_type` (restore_run / bulk_operation_run)
- `run_id`
- `status` (queued/running/succeeded/failed/partial)
- `counts` (optional)
- `safe_error_code` + `safe_error_context` (optional)
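
As a rough sketch of the payload shape above, the builder below assembles the fields and normalizes existing `BulkOperationRun` statuses to the target vocabulary. Only the `pending` and `completed_with_errors` mappings come from this document; mapping `completed` to `succeeded` is an assumption, and `aborted` is deliberately left unmapped because no target status is defined for it. The function names are hypothetical.

```python
TARGET_STATUSES = {"queued", "running", "succeeded", "failed", "partial"}
RUN_TYPES = {"restore_run", "bulk_operation_run"}

LEGACY_TO_TARGET = {
    "pending": "queued",                 # stated in the data model
    "completed_with_errors": "partial",  # stated in the data model
    "completed": "succeeded",            # ASSUMPTION: not stated in the source
}


def normalize_status(status):
    """Translate an existing status into the target vocabulary."""
    mapped = LEGACY_TO_TARGET.get(status, status)
    if mapped not in TARGET_STATUSES:
        raise ValueError(f"no target status defined for {status!r}")
    return mapped


def build_payload(tenant_id, run_type, run_id, status,
                  counts=None, safe_error_code=None, safe_error_context=None):
    if run_type not in RUN_TYPES:
        raise ValueError(f"unknown run_type {run_type!r}")
    payload = {
        "tenant_id": tenant_id,
        "run_type": run_type,
        "run_id": run_id,
        "status": normalize_status(status),
    }
    # Optional fields are included only when they carry information.
    if counts is not None:
        payload["counts"] = counts
    if safe_error_code is not None:
        payload["safe_error_code"] = safe_error_code
        payload["safe_error_context"] = safe_error_context or {}
    return payload
```

Keeping the normalization in one place means the notification consumer only ever sees the five target statuses, whichever run type produced the event.
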
## Notes on “per-item outcomes” (FR-005)
- For restore workflows, per-item outcomes can initially be stored in `restore_runs.results` as a structured JSON array/object keyed by internal item identifiers.
- For bulk operations, per-item outcomes are already persisted as `bulk_operation_runs.failures` plus the counter columns.
- If per-item outcomes must be queried or filtered relationally, introduce a dedicated “run item results” table per run type; this is preferably deferred to Phase 2+ rather than done in Phase 1.
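
One possible shape for `restore_runs.results`, keyed by internal item identifier, is sketched below together with a helper that derives counters and an overall status. The key format (`policy:17`), the per-item field names, and the rule for deriving the run status are all assumptions for illustration; the spec does not fix them.

```python
import json


def summarize(results):
    """Count per-item outcomes. `results` maps item id -> outcome dict."""
    counts = {"succeeded": 0, "failed": 0, "skipped": 0}
    for outcome in results.values():
        counts[outcome["status"]] += 1
    return counts


def derive_run_status(counts):
    # ASSUMPTION: no failures -> succeeded, no successes -> failed,
    # anything mixed -> partial. Skipped items do not count as failures.
    if counts["failed"] == 0:
        return "succeeded"
    if counts["succeeded"] == 0:
        return "failed"
    return "partial"


def serialize(results):
    # Sorted keys keep the persisted JSON stable across repeated runs.
    return json.dumps(results, sort_keys=True)


example_results = {
    "policy:18": {"status": "failed", "error_code": "graph_throttled"},
    "policy:17": {"status": "succeeded"},
    "policy:19": {"status": "skipped"},
}
```

With `example_results`, `summarize` reports one item in each bucket and `derive_run_status` yields `partial`, which lines up with the `partial` target status used by both run types.
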