TenantAtlas/specs/049-backup-restore-job-orchestration/contracts/admin-runs.openapi.yaml
ahmido bcf4996a1e feat/049-backup-restore-job-orchestration (#56)
Summary

This PR implements Spec 049 – Backup/Restore Job Orchestration: all critical Backup/Restore execution paths are job-only, idempotent, tenant-scoped, and observable via run records + DB notifications (Phase 1). The UI no longer performs heavy Graph work inside request/Filament actions for these flows.

Why

We want predictable UX and operations at MSP scale:
	•	no timeouts / long-running requests
	•	reproducible run state + per-item results
	•	safe error persistence (no secrets / no token leakage)
	•	strict tenant isolation + auditability for write paths

What changed

Foundational (Runs + Idempotency + Observability)
	•	Added a shared RunIdempotency helper (dedupe while queued/running).
	•	Added a read-only BulkOperationRuns surface (list + view) for status/progress.
	•	Added DB notifications for run status changes (with “View run” link).
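
The reuse-vs-create rule behind the dedupe helper can be sketched as follows. This is a minimal in-memory illustration, not the project's RunIdempotency code: the names `InMemoryRunStore` and `start_or_reuse` are hypothetical, and the only behavior taken from the description above is that a run is reused while a matching one is still queued or running.

```python
from dataclasses import dataclass
from itertools import count

# Statuses during which a matching run is reused instead of duplicated.
ACTIVE_STATUSES = {"queued", "running"}


@dataclass
class Run:
    id: int
    dedupe_key: str
    status: str = "queued"


class InMemoryRunStore:
    """Hypothetical stand-in for the run table (assumption, not the real model)."""

    def __init__(self):
        self._runs = []
        self._ids = count(1)

    def find_active(self, dedupe_key):
        # Return the first queued/running run with this key, or None.
        for run in self._runs:
            if run.dedupe_key == dedupe_key and run.status in ACTIVE_STATUSES:
                return run
        return None

    def create(self, dedupe_key):
        run = Run(id=next(self._ids), dedupe_key=dedupe_key)
        self._runs.append(run)
        return run


def start_or_reuse(store, dedupe_key):
    """Return (run, reused): reuse an active run, otherwise create a new one."""
    existing = store.find_active(dedupe_key)
    if existing is not None:
        return existing, True
    return store.create(dedupe_key), False
```

A finished run (succeeded, failed, partial) no longer blocks a new one with the same key. In a real database this lookup-then-create step would likely need a lock or a unique constraint to stay race-free; the sketch ignores concurrency.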

US1 – Policy “Capture snapshot” is job-only
	•	Policy detail “Capture snapshot” now:
		•	creates/reuses a run (dedupe key: tenant + policy.capture_snapshot + policy DB id)
		•	dispatches a queued job
		•	returns immediately with notification + link to run detail
	•	Graph capture work moved fully into the job; request path stays Graph-free.
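
One way to derive a stable dedupe key from the three components named above (tenant, operation name, policy DB id) is sketched below. Hashing the parts with SHA-256 and the `build_dedupe_key` function itself are assumptions made for illustration; the description only states which components go into the key, not how they are combined.

```python
import hashlib

# Unit separator: keeps ("1", "23") and ("12", "3") from colliding when joined.
_SEPARATOR = "\x1f"


def build_dedupe_key(tenant_id, operation, target_id, payload_hash=None):
    """Combine the key components into one fixed-length string (hypothetical helper)."""
    parts = [str(tenant_id), operation, str(target_id)]
    if payload_hash:
        # Optional extra component, mirroring the contract's payloadHash field.
        parts.append(payload_hash)
    canonical = _SEPARATOR.join(parts)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Example: the same tenant, operation, and policy always map to the same key.
key = build_dedupe_key("tenant-a", "policy.capture_snapshot", 42)
```

Because the tenant is part of the key, two tenants capturing a snapshot of a policy with the same DB id never share a run.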

US3 – Restore runs orchestration is job-only + safe
	•	Live restore execution is queued and updates RestoreRun status/progress.
	•	Per-item outcomes are persisted deterministically (per internal DB record).
	•	Audit logging is written for live restore.
	•	Preview/dry-run is enforced as read-only (no writes).
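
Per-item outcomes can be rolled up into the run-level status and counts exposed by the contract (status enum `succeeded`, `failed`, `partial`; counts `total`, `succeeded`, `failed`). The sketch below assumes that `partial` means "at least one item succeeded and at least one failed" and that an empty run counts as succeeded; neither rule is stated in the source, and `summarize_outcomes` is a hypothetical name.

```python
def summarize_outcomes(outcomes):
    """Roll per-item results up into (status, counts).

    `outcomes` maps an internal item id to True (succeeded) or False (failed).
    """
    succeeded = sum(1 for ok in outcomes.values() if ok)
    failed = len(outcomes) - succeeded
    counts = {"total": len(outcomes), "succeeded": succeeded, "failed": failed}

    if failed == 0:
        status = "succeeded"  # assumption: also covers a run with zero items
    elif succeeded == 0:
        status = "failed"
    else:
        status = "partial"  # assumption: mixed results
    return status, counts
```

Keying outcomes by internal DB id means that re-processing an item overwrites its earlier result instead of adding a second entry, which keeps the counts deterministic.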

Tenant isolation / authorization (non-negotiable)
	•	Run list/view/start are tenant-scoped and policy-guarded (cross-tenant access => 403, not 404).
	•	Explicit Pest tests cover cross-tenant denial and start authorization.
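
The tenant-scoping rule (cross-tenant access is denied with 403 rather than hidden behind 404) can be sketched as a small decision function. The types `User` and `RunRef`, the function `authorize_run_view`, and the order of the checks are assumptions for illustration; the actual policies live in the application and are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    tenant_ids: frozenset  # tenants this user may act on (hypothetical model)


@dataclass(frozen=True)
class RunRef:
    id: int
    tenant_id: str


def authorize_run_view(user: User, route_tenant_id: str, run: Optional[RunRef]) -> int:
    """Return the HTTP status a run detail request should produce."""
    # 1. The user must belong to the tenant named in the route.
    if route_tenant_id not in user.tenant_ids:
        return 403
    # 2. Within an authorized tenant, an unknown run id is a plain 404.
    if run is None:
        return 404
    # 3. A run owned by a different tenant is denied, not hidden.
    if run.tenant_id != route_tenant_id:
        return 403
    return 200
```

Checking tenant membership before looking at the run keeps the response for an unauthorized tenant the same whether or not the run id exists.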

Tests / Verification
	•	./vendor/bin/pint --dirty
	•	Targeted suite (examples):
		•	policy capture snapshot queued + idempotency tests
		•	restore orchestration + audit logging + preview read-only tests
		•	run authorization / tenant isolation tests

Notes / Scope boundaries
	•	Phase 1 UX = DB notifications + run detail page. A global “progress widget” is tracked as Phase 2 and not required for merge.
	•	Resilience/backoff is tracked in tasks but can be iterated further after merge.

Review focus
	•	Dedupe behavior for queued/running runs (reuse vs create-new)
	•	Tenant scoping & policy gates for all run surfaces
	•	Restore safety: audit event + preview no-writes

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #56
2026-01-11 15:59:06 +00:00

170 lines
4.2 KiB
YAML

openapi: 3.0.3
info:
  title: TenantPilot Admin Run Orchestration (049)
  version: 0.1.0
  description: |
    Internal admin contracts for starting long-running backup/restore operations
    and reading run status/progress. These endpoints are tenant-scoped.
servers:
  - url: /admin
paths:
  /t/{tenantExternalId}/runs/{runType}:
    post:
      operationId: startRun
      summary: Start a background run
      description: |
        Starts an operation by creating (or reusing) a Run Record and enqueueing
        background work. Must return quickly.
      parameters:
        - in: path
          name: tenantExternalId
          required: true
          schema:
            type: string
        - in: path
          name: runType
          required: true
          schema:
            type: string
            enum:
              - backup_set_add_policies
              - restore_execute
              - restore_preview
              - snapshot_capture
      requestBody:
        required: false
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/RunStartRequest'
      responses:
        '201':
          description: Run created and queued
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RunStartResponse'
        '200':
          description: Existing active run reused
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RunStartResponse'
        '403':
          description: Forbidden
  /t/{tenantExternalId}/runs/{runType}/{runId}:
    get:
      operationId: getRun
      summary: Get run status and progress
      parameters:
        - in: path
          name: tenantExternalId
          required: true
          schema:
            type: string
        - in: path
          name: runType
          required: true
          schema:
            type: string
        - in: path
          name: runId
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Run record
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/RunRecord'
        '404':
          description: Not found
components:
  schemas:
    RunStartRequest:
      type: object
      additionalProperties: false
      properties:
        targetObjectId:
          type: string
          nullable: true
          description: Operation target used for de-duplication.
        payloadHash:
          type: string
          nullable: true
          description: Optional stable hash of relevant payload to strengthen idempotency.
        itemIds:
          type: array
          items:
            type: string
          nullable: true
          description: Optional internal item ids to process.
    RunStartResponse:
      type: object
      required: [run]
      properties:
        reused:
          type: boolean
          default: false
        run:
          $ref: '#/components/schemas/RunRecord'
    RunRecord:
      type: object
      required:
        - id
        - tenantExternalId
        - type
        - status
      properties:
        id:
          type: string
        tenantExternalId:
          type: string
        type:
          type: string
        status:
          type: string
          enum: [queued, running, succeeded, failed, partial]
        createdAt:
          type: string
          format: date-time
        startedAt:
          type: string
          format: date-time
          nullable: true
        finishedAt:
          type: string
          format: date-time
          nullable: true
        counts:
          type: object
          additionalProperties: false
          properties:
            total:
              type: integer
              minimum: 0
            succeeded:
              type: integer
              minimum: 0
            failed:
              type: integer
              minimum: 0
        safeError:
          type: object
          nullable: true
          additionalProperties: false
          properties:
            code:
              type: string
            context:
              type: object
              additionalProperties: true
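
A caller of `startRun` has to build the tenant-scoped path and tell a newly created run (201) apart from a reused one (200). The sketch below shows that handling without doing any network I/O. The helper names `start_run_path` and `interpret_start_response` are hypothetical, and treating the status code (rather than the `reused` field) as the source of truth is an assumption.

```python
from urllib.parse import quote

# Run types listed in the contract's runType enum.
RUN_TYPES = {
    "backup_set_add_policies",
    "restore_execute",
    "restore_preview",
    "snapshot_capture",
}


def start_run_path(tenant_external_id, run_type):
    """Build the startRun path under the /admin server prefix."""
    if run_type not in RUN_TYPES:
        raise ValueError(f"unknown run type: {run_type}")
    # Percent-encode the tenant id so it stays a single path segment.
    tenant = quote(tenant_external_id, safe="")
    return f"/admin/t/{tenant}/runs/{run_type}"


def interpret_start_response(status_code, body):
    """Map a startRun response to (run, reused), raising on errors."""
    if status_code == 201:
        return body["run"], False  # run created and queued
    if status_code == 200:
        return body["run"], True  # existing active run reused
    if status_code == 403:
        raise PermissionError("not allowed to start runs for this tenant")
    raise RuntimeError(f"unexpected status from startRun: {status_code}")


# Example: a reused run comes back with the same shape as a new one.
run, reused = interpret_start_response(
    200,
    {"reused": True, "run": {"id": "7", "tenantExternalId": "t1",
                             "type": "restore_execute", "status": "running"}},
)
```

Since both success responses carry the same `RunStartResponse` shape, the caller can link to the run detail page in either case and use `reused` only to adjust the notification text.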