Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
• no timeouts / long-running requests
• reproducible run state + per-item results
• safe error persistence (no secrets / no token leakage)
• strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
• Added a shared RunIdempotency helper (dedupe while queued/running).
• Added a read-only BulkOperationRuns surface (list + view) for status/progress.
• Added DB notifications for run status changes (with “View run” link).

US1 – Policy “Capture snapshot” is job-only
• Policy detail “Capture snapshot” now:
  • creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
  • dispatches a queued job
  • returns immediately with notification + link to run detail
• Graph capture work moved fully into the job; request path stays Graph-free.

US3 – Restore runs orchestration is job-only + safe
• Live restore execution is queued and updates RestoreRun status/progress.
• Per-item outcomes are persisted deterministically (per internal DB record).
• Audit logging is written for live restore.
• Preview/dry-run is enforced as read-only (no writes).

Tenant isolation / authorization (non-negotiable)
• Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
• Explicit Pest tests cover cross-tenant denial and start authorization.

Tests / Verification
• ./vendor/bin/pint --dirty
• Targeted suite (examples):
  • policy capture snapshot queued + idempotency tests
  • restore orchestration + audit logging + preview read-only tests
  • run authorization / tenant isolation tests

Notes / Scope boundaries
• Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
• Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
• Dedupe behavior for queued/running runs (reuse vs create-new)
• Tenant scoping & policy gates for all run surfaces
• Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56

# Data Model: Backup/Restore Job Orchestration (049)

This feature relies on existing “run record” models/tables and (optionally) extends them to meet the orchestration requirements.

## Entities

## 1) RestoreRun (`restore_runs`)

**Purpose:** Run record for restore executions and dry-run/preview workflows.

**Model:** `App\Models\RestoreRun`

**Key fields (existing):**
- `id` (PK)
- `tenant_id` (FK → tenants)
- `backup_set_id` (FK → backup_sets)
- `requested_by` (string|null)
- `is_dry_run` (bool)
- `status` (string)
- `requested_items` (json|null)
- `preview` (json|null): persisted preview output
- `results` (json|null): persisted execution output (may include per-item outcomes)
- `failure_reason` (text|null)
- `started_at` / `completed_at` (timestamp|null)
- `metadata` (json|null)

**Relationships:**
- `RestoreRun belongsTo Tenant`
- `RestoreRun belongsTo BackupSet`

**State transitions (target):**
- `queued → running → succeeded|failed|partial`

**Validation constraints (creation/dispatch):**
- tenant-scoped access required
- `backup_set_id` must belong to tenant
- preview/dry-run must not perform writes (constitution Read/Write Separation)

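
The target state transitions listed for `RestoreRun` can be sketched as a small lookup table. This is an illustrative Python sketch, not the project's Laravel code: the names `RunStatus`, `ALLOWED_TRANSITIONS`, `can_transition`, and `transition` are hypothetical, and only the arrows stated above are encoded (whether a queued run may move straight to `failed` is not specified here, so it is rejected).

```python
from enum import Enum


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    PARTIAL = "partial"


# Forward transitions taken from: queued -> running -> succeeded|failed|partial.
# Terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    RunStatus.QUEUED: {RunStatus.RUNNING},
    RunStatus.RUNNING: {RunStatus.SUCCEEDED, RunStatus.FAILED, RunStatus.PARTIAL},
    RunStatus.SUCCEEDED: set(),
    RunStatus.FAILED: set(),
    RunStatus.PARTIAL: set(),
}


def can_transition(current: RunStatus, target: RunStatus) -> bool:
    """Return True if the target state is reachable in one step."""
    return target in ALLOWED_TRANSITIONS[current]


def transition(current: RunStatus, target: RunStatus) -> RunStatus:
    """Return the new state, or raise if the step is not allowed."""
    if not can_transition(current, target):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target
```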
---

## 2) BulkOperationRun (`bulk_operation_runs`)

**Purpose:** Run record for background operations that process many internal items, including backup-set capture-like actions.

**Model:** `App\Models\BulkOperationRun`

**Key fields (existing):**
- `id` (PK)
- `tenant_id` (FK → tenants)
- `user_id` (FK → users)
- `resource` (string): e.g. `policy`, `backup_set`
- `action` (string): e.g. `export`, `add_policies`
- `status` (string): `pending`, `running`, `completed`, `completed_with_errors`, `failed`, `aborted`
- `total_items`, `processed_items`, `succeeded`, `failed`, `skipped`
- `item_ids` (jsonb)
- `failures` (jsonb|null): safe per-item error summaries
- `audit_log_id` (FK → audit_logs|null)

**Relationships:**
- `BulkOperationRun belongsTo Tenant`
- `BulkOperationRun belongsTo User`

**Recommended additions (to satisfy FR-002/FR-004 cleanly):**
- `idempotency_key` (string, indexed; uniqueness enforced for active statuses via partial index)
- `started_at` / `finished_at` (timestampTz)
- `error_code` (string|null)
- `error_context` (jsonb|null)

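
The recommended `idempotency_key` with a partial unique index can be illustrated with a minimal sketch. Several things here are assumptions: SQLite (via Python's `sqlite3`) stands in for the real database, the index name, the `start_or_reuse_run` helper, and the `tenant:action:id` key format are hypothetical, and `queued`/`running` are taken to be the "active" statuses. The real migration and the `RunIdempotency` helper may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bulk_operation_runs (
        id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        idempotency_key TEXT NOT NULL
    )
    """
)
# Partial unique index: the key must be unique only while a run is active,
# so a finished run never blocks a new run with the same key.
conn.execute(
    """
    CREATE UNIQUE INDEX bulk_runs_active_idempotency
    ON bulk_operation_runs (idempotency_key)
    WHERE status IN ('queued', 'running')
    """
)

FIND_ACTIVE = (
    "SELECT id FROM bulk_operation_runs "
    "WHERE idempotency_key = ? AND status IN ('queued', 'running')"
)


def start_or_reuse_run(conn, tenant_id, key):
    """Return (run_id, created): reuse an active run or create a queued one."""
    row = conn.execute(FIND_ACTIVE, (key,)).fetchone()
    if row is not None:
        return row[0], False
    try:
        cur = conn.execute(
            "INSERT INTO bulk_operation_runs (tenant_id, status, idempotency_key) "
            "VALUES (?, 'queued', ?)",
            (tenant_id, key),
        )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer created the active run first; the index rejected
        # our insert, so reuse the existing run instead.
        row = conn.execute(FIND_ACTIVE, (key,)).fetchone()
        return row[0], False


# Hypothetical key format: tenant + action + internal policy id.
key = "1:policy.capture_snapshot:42"
first_id, first_created = start_or_reuse_run(conn, 1, key)
second_id, second_created = start_or_reuse_run(conn, 1, key)
```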

**State transitions (target):**
- `queued → running → succeeded|failed|partial`
- `pending` maps to `queued`
- `completed_with_errors` maps to `partial`

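
The mapping from stored `bulk_operation_runs.status` values to the target vocabulary can be sketched as follows. This is an illustrative Python sketch with hypothetical names (`LEGACY_TO_TARGET`, `to_target_status`). Only the two mappings stated above are encoded; `running` and `failed` pass through because they appear in both vocabularies, and values without a documented mapping (such as `completed` or `aborted`) are rejected rather than guessed.

```python
TARGET_STATUSES = {"queued", "running", "succeeded", "failed", "partial"}

# Only the mappings documented in this data model.
LEGACY_TO_TARGET = {
    "pending": "queued",
    "completed_with_errors": "partial",
}


def to_target_status(stored: str) -> str:
    """Translate a stored status into the target state vocabulary."""
    if stored in LEGACY_TO_TARGET:
        return LEGACY_TO_TARGET[stored]
    if stored in TARGET_STATUSES:
        # Same name in both vocabularies; no translation needed.
        return stored
    raise ValueError(f"no documented mapping for status {stored!r}")
```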
---

## 3) Notification Event (DB notifications)

**Purpose:** Persist state transitions and completion notices for the initiating user.

**Storage:** Laravel Notifications (DB channel).

**Payload shape (target):**
- `tenant_id`
- `run_type` (restore_run / bulk_operation_run)
- `run_id`
- `status` (queued/running/succeeded/failed/partial)
- `counts` (optional)
- `safe_error_code` + `safe_error_context` (optional)

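
As a rough illustration of the payload shape above, the following Python sketch builds and validates a payload before it would be stored. The class name `RunNotificationPayload`, the field types, and the choice to omit unset optional fields are assumptions; the actual payload is produced by Laravel notification classes not shown here.

```python
from dataclasses import asdict, dataclass
from typing import Optional

RUN_TYPES = {"restore_run", "bulk_operation_run"}
STATUSES = {"queued", "running", "succeeded", "failed", "partial"}


@dataclass(frozen=True)
class RunNotificationPayload:
    tenant_id: int
    run_type: str
    run_id: int
    status: str
    counts: Optional[dict] = None              # optional progress counters
    safe_error_code: Optional[str] = None      # optional, must not contain secrets
    safe_error_context: Optional[dict] = None  # optional, must not contain secrets

    def __post_init__(self):
        # Reject values outside the documented vocabularies.
        if self.run_type not in RUN_TYPES:
            raise ValueError(f"unknown run_type: {self.run_type!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")

    def to_dict(self) -> dict:
        """Serialize, leaving out optional fields that were not set."""
        return {k: v for k, v in asdict(self).items() if v is not None}


payload = RunNotificationPayload(
    tenant_id=1, run_type="restore_run", run_id=7, status="queued"
).to_dict()
```

Validating `run_type` and `status` at construction time keeps malformed notifications out of storage, which is one possible design choice rather than a stated requirement.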
## Notes on “per-item outcomes” (FR-005)

- For restore workflows, per-item outcomes can initially be stored in `restore_runs.results` as a structured JSON array/object keyed by internal item identifiers.
- For bulk operations, per-item outcomes are already persisted as `bulk_operation_runs.failures` plus the counter columns.
- If Phase 1 needs relational per-item tables for querying/filtering, introduce a dedicated “run item results” table per run type (Phase 2+ preferred).
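
One possible shape for per-item outcomes in `restore_runs.results`, keyed by internal item identifiers, is sketched below in Python. The key names (`items`, `status`, `error_code`), the sample error code, and the rule for deriving an overall run status are all assumptions made for illustration; the spec only requires a structured JSON array/object.

```python
import json
from collections import Counter

# Hypothetical results document, keyed by internal DB record id.
results = {
    "items": {
        "101": {"status": "succeeded"},
        "102": {"status": "failed", "error_code": "graph_throttled"},
        "103": {"status": "skipped"},
    }
}


def summarize(results: dict) -> dict:
    """Count per-item outcomes, mirroring the bulk-run counter columns."""
    counts = Counter(item["status"] for item in results["items"].values())
    return {
        "total": len(results["items"]),
        "succeeded": counts["succeeded"],
        "failed": counts["failed"],
        "skipped": counts["skipped"],
    }


def derive_run_status(summary: dict) -> str:
    """Assumed rule: no failures -> succeeded, no successes -> failed, else partial."""
    if summary["failed"] == 0:
        return "succeeded"
    if summary["succeeded"] == 0:
        return "failed"
    return "partial"


summary = summarize(results)
stored = json.dumps(results)  # what would be written to the JSON column
```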