TenantAtlas/specs/049-backup-restore-job-orchestration/data-model.md
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts / long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
		•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
		•	dispatches a queued job
		•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.
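
A minimal sketch of the create-or-reuse flow described above, in Python for illustration only (the application itself is Laravel). The names `RunStore`, `build_key`, and `start_capture_snapshot`, the key format, and the in-memory store are assumptions, not the actual RunIdempotency helper:

```python
from dataclasses import dataclass, field
from itertools import count

# Dedupe applies only while a run is still active (queued or running).
ACTIVE_STATUSES = {"queued", "running"}


@dataclass
class Run:
    id: int
    idempotency_key: str
    status: str = "queued"


@dataclass
class RunStore:
    """Hypothetical in-memory stand-in for the run table."""
    runs: list = field(default_factory=list)
    _ids: count = field(default_factory=lambda: count(1))

    def find_active(self, key):
        return next(
            (r for r in self.runs
             if r.idempotency_key == key and r.status in ACTIVE_STATUSES),
            None,
        )

    def create(self, key):
        run = Run(id=next(self._ids), idempotency_key=key)
        self.runs.append(run)
        return run


def build_key(tenant_id, operation, target_id):
    # Assumed key format: tenant + operation name + internal DB id.
    return f"{tenant_id}:{operation}:{target_id}"


def start_capture_snapshot(store, dispatched, tenant_id, policy_id):
    """Return (run, created). Reuses an active run instead of creating a new one."""
    key = build_key(tenant_id, "policy.capture_snapshot", policy_id)
    existing = store.find_active(key)
    if existing is not None:
        return existing, False
    run = store.create(key)
    dispatched.append(run.id)  # stand-in for dispatching the queued job
    return run, True
```

A second click while the first run is queued or running returns the same run and dispatches nothing; once the run reaches a terminal status, the next click creates a new run.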

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.
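
The "403, not 404" rule can be sketched as follows. This is illustrative Python, not the actual Laravel policy code; the `User`, `Run`, `Forbidden`, and `NotFound` types are hypothetical, and a single-tenant user is assumed for brevity:

```python
from dataclasses import dataclass


class Forbidden(Exception):
    status_code = 403


class NotFound(Exception):
    status_code = 404


@dataclass(frozen=True)
class User:
    id: int
    tenant_id: int  # assumption: one tenant per user in this sketch


@dataclass(frozen=True)
class Run:
    id: int
    tenant_id: int


def view_run(user, runs, run_id):
    """Return the run if the user may see it, otherwise raise."""
    run = runs.get(run_id)
    if run is None:
        raise NotFound(f"run {run_id} does not exist")
    if run.tenant_id != user.tenant_id:
        # An existing run in another tenant is denied explicitly (403),
        # rather than being reported as missing (404).
        raise Forbidden(f"run {run_id} belongs to another tenant")
    return run
```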

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
		•	policy capture snapshot queued + idempotency tests
		•	restore orchestration + audit logging + preview read-only tests
		•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00


Data Model: Backup/Restore Job Orchestration (049)

This feature relies on existing “run record” models/tables and (optionally) extends them to meet the orchestration requirements.

Entities

1) RestoreRun (restore_runs)

Purpose: Run record for restore executions and dry-run/preview workflows.

Model: App\Models\RestoreRun

Key fields (existing):

  • id (PK)
  • tenant_id (FK → tenants)
  • backup_set_id (FK → backup_sets)
  • requested_by (string|null)
  • is_dry_run (bool)
  • status (string)
  • requested_items (json|null)
  • preview (json|null) — persisted preview output
  • results (json|null) — persisted execution output (may include per-item outcomes)
  • failure_reason (text|null)
  • started_at / completed_at (timestamp|null)
  • metadata (json|null)

Relationships:

  • RestoreRun belongsTo Tenant
  • RestoreRun belongsTo BackupSet

State transitions (target):

  • queued → running → succeeded|failed|partial
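
The target transitions can be expressed as a small lookup table. The sketch below is illustrative Python (the `ALLOWED` table and `transition` function are hypothetical names); it admits only the transitions listed above and treats succeeded, failed, and partial as terminal:

```python
# Allowed target-state transitions: queued -> running -> succeeded|failed|partial.
ALLOWED = {
    "queued": {"running"},
    "running": {"succeeded", "failed", "partial"},
    "succeeded": set(),  # terminal
    "failed": set(),     # terminal
    "partial": set(),    # terminal
}


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the transition is not allowed."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Whether a queued run may fail directly (for example, when dispatch fails) is not specified here; the sketch rejects it.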

Validation constraints (creation/dispatch):

  • tenant-scoped access required
  • backup_set_id must belong to tenant
  • preview/dry-run must not perform writes (constitution Read/Write Separation)
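
A minimal sketch of how two of these constraints could be enforced, in illustrative Python. `WriteGateway`, `ReadOnlyViolation`, and `assert_backup_set_in_tenant` are hypothetical names, not part of the codebase:

```python
class ReadOnlyViolation(RuntimeError):
    """Raised when a dry-run/preview attempts a write."""


class WriteGateway:
    """Hypothetical single choke point for all restore writes."""

    def __init__(self, is_dry_run: bool):
        self.is_dry_run = is_dry_run
        self.applied = []  # records (item_id, payload) for real writes

    def apply(self, item_id, payload):
        if self.is_dry_run:
            raise ReadOnlyViolation(f"dry-run may not write item {item_id}")
        self.applied.append((item_id, payload))


def assert_backup_set_in_tenant(run_tenant_id, backup_set_tenant_id):
    """Reject a restore whose backup set belongs to a different tenant."""
    if run_tenant_id != backup_set_tenant_id:
        raise PermissionError("backup_set_id does not belong to the tenant")
```

Routing every write through one gateway means the dry-run guarantee is checked in one place instead of in each restore handler.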

2) BulkOperationRun (bulk_operation_runs)

Purpose: Run record for background operations that process many internal items, including capture-style actions on backup sets.

Model: App\Models\BulkOperationRun

Key fields (existing):

  • id (PK)
  • tenant_id (FK → tenants)
  • user_id (FK → users)
  • resource (string) — e.g. policy, backup_set
  • action (string) — e.g. export, add_policies
  • status (string) — pending, running, completed, completed_with_errors, failed, aborted
  • total_items, processed_items, succeeded, failed, skipped
  • item_ids (jsonb)
  • failures (jsonb|null) — safe per-item error summaries
  • audit_log_id (FK → audit_logs|null)

Relationships:

  • BulkOperationRun belongsTo Tenant
  • BulkOperationRun belongsTo User

Recommended additions (to satisfy FR-002/FR-004 cleanly):

  • idempotency_key (string, indexed; uniqueness enforced for active statuses via partial index)
  • started_at / finished_at (timestampTz)
  • error_code (string|null)
  • error_context (jsonb|null)
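
The partial-index idea can be demonstrated with SQLite, which supports partial unique indexes much as PostgreSQL does. This is a sketch under assumptions: the index name, the choice of (tenant_id, idempotency_key) as indexed columns, the reduced column set, and treating pending and running as the active statuses are all illustrative, not the actual migration:

```python
import sqlite3


def make_db():
    """Create an in-memory table with a partial unique index on active runs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE bulk_operation_runs (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER NOT NULL,
            status TEXT NOT NULL,
            idempotency_key TEXT
        )
        """
    )
    # Uniqueness is enforced only for rows whose status is still active.
    conn.execute(
        """
        CREATE UNIQUE INDEX bulk_runs_active_idempotency
        ON bulk_operation_runs (tenant_id, idempotency_key)
        WHERE status IN ('pending', 'running')
        """
    )
    return conn


def insert_run(conn, tenant_id, key, status="pending"):
    cur = conn.execute(
        "INSERT INTO bulk_operation_runs (tenant_id, status, idempotency_key) "
        "VALUES (?, ?, ?)",
        (tenant_id, status, key),
    )
    return cur.lastrowid
```

A second insert with the same tenant and key raises `sqlite3.IntegrityError` while the first run is pending or running; after the first run moves to a terminal status, the same key can be inserted again.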

State transitions (target):

  • queued → running → succeeded|failed|partial
    • the existing status pending maps to the target status queued
    • the existing status completed_with_errors maps to the target status partial
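
A sketch of the status mapping in illustrative Python. Only the pending and completed_with_errors mappings are stated above; running and failed share a name in both vocabularies, completed to succeeded is an assumption, and aborted is left unmapped because no target is specified for it:

```python
# Existing bulk_operation_runs.status value -> target vocabulary.
EXISTING_TO_TARGET = {
    "pending": "queued",                 # stated in this data model
    "completed_with_errors": "partial",  # stated in this data model
    "running": "running",                # same name in both vocabularies
    "failed": "failed",                  # same name in both vocabularies
    "completed": "succeeded",            # ASSUMPTION: not stated in this document
}


def to_target_status(stored: str) -> str:
    """Translate a stored status; raise for statuses with no defined target."""
    try:
        return EXISTING_TO_TARGET[stored]
    except KeyError:
        raise ValueError(f"no target mapping defined for status {stored!r}") from None
```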

3) Notification Event (DB notifications)

Purpose: Persist state transitions and completion notices for the initiating user.

Storage: Laravel Notifications (DB channel).

Payload shape (target):

  • tenant_id
  • run_type (restore_run / bulk_operation_run)
  • run_id
  • status (queued/running/succeeded/failed/partial)
  • counts (optional)
  • safe_error_code + safe_error_context (optional)
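
The payload shape could be assembled as in the following illustrative Python sketch. `build_notification_payload` is a hypothetical helper, and omitting unset optional keys (rather than sending nulls) is an assumption:

```python
from typing import Optional

RUN_TYPES = {"restore_run", "bulk_operation_run"}
STATUSES = {"queued", "running", "succeeded", "failed", "partial"}


def build_notification_payload(
    tenant_id: int,
    run_type: str,
    run_id: int,
    status: str,
    counts: Optional[dict] = None,
    safe_error_code: Optional[str] = None,
    safe_error_context: Optional[dict] = None,
) -> dict:
    """Build the DB-notification payload; optional keys are omitted when unset."""
    if run_type not in RUN_TYPES:
        raise ValueError(f"unknown run_type: {run_type!r}")
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    payload = {
        "tenant_id": tenant_id,
        "run_type": run_type,
        "run_id": run_id,
        "status": status,
    }
    optional = {
        "counts": counts,
        "safe_error_code": safe_error_code,
        "safe_error_context": safe_error_context,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```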

Notes on “per-item outcomes” (FR-005)

  • For restore workflows, per-item outcomes can initially be stored in restore_runs.results as a structured JSON array/object keyed by internal item identifiers.
  • For bulk operations, per-item outcomes are already persisted as bulk_operation_runs.failures plus the counter columns.
  • If relational per-item tables are needed for querying/filtering, introduce a dedicated “run item results” table per run type, preferably in Phase 2 or later rather than in Phase 1.
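
One possible shape for restore_runs.results, sketched in illustrative Python: an object keyed by internal item identifier, reduced to counts and an overall status. The per-item field names and the rule for deriving the run status (any mix of succeeded and failed items yields partial) are assumptions:

```python
from collections import Counter


def summarize(results: dict) -> dict:
    """Reduce per-item outcomes (keyed by internal item id) to counts + status."""
    counts = Counter(outcome["status"] for outcome in results.values())
    failed = counts.get("failed", 0)
    succeeded = counts.get("succeeded", 0)
    if failed == 0:
        status = "succeeded"  # note: an empty result set also lands here
    elif succeeded == 0:
        status = "failed"
    else:
        status = "partial"
    return {"status": status, "counts": dict(counts)}


# Example results object; ids and error code are made up for illustration.
example_results = {
    "12": {"status": "succeeded"},
    "13": {"status": "failed", "error_code": "graph_conflict"},
}
```

Here `summarize(example_results)` reports a partial run with one succeeded and one failed item.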