Research: Backup/Restore Job Orchestration (049)
This document resolves Phase 0 open questions and records design choices.
Decisions
1) Run Record storage strategy
Decision: Reuse existing run-record primitives instead of introducing a brand-new “unified run” subsystem in Phase 1.
- Restore + re-run restore + dry-run/preview: use the existing `restore_runs` table / `App\Models\RestoreRun`.
- Backup set capture-like operations (e.g., “add policies and capture”): reuse `bulk_operation_runs` / `App\Models\BulkOperationRun` (already used for long-running background work like bulk exports) and, if needed, extend it to satisfy FR-002 fields.
Rationale:
- The codebase already has multiple proven “run tables” (`restore_runs`, `inventory_sync_runs`, `backup_schedule_runs`, `bulk_operation_runs`).
- Minimizes migration risk and avoids broad refactors.
- Lets Phase 1 focus on eliminating inline heavy work while keeping UX consistent.
Alternatives considered:
- Create a new generic `operation_runs` + `operation_run_items` data model for all queued automation.
  - Rejected (Phase 1): higher migration + backfill cost; high coordination risk across many features.
2) Status lifecycle mapping
Decision: Standardize at the UI + plan level on queued → running → (succeeded | failed | partial) while allowing underlying storage to keep its existing status vocabulary.
- `BulkOperationRun.status` mapping: `pending` → `queued`, `running` → `running`, `completed` → `succeeded`, `completed_with_errors` → `partial`, `failed` / `aborted` → `failed`.
- `RestoreRun.status` mapping will be aligned (e.g., `pending` → `queued`, `running` → `running`, etc.) as part of implementation.
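
The mapping above can be expressed as a small lookup table. The following is a minimal, language-neutral sketch in Python rather than the project's actual PHP code; the names `BULK_OPERATION_RUN_STATUS_MAP`, `canonical_status`, and `is_terminal` are hypothetical and chosen only for illustration.

```python
# Hypothetical sketch: the keys and values mirror the BulkOperationRun.status
# mapping listed above; the function and constant names are assumptions.
BULK_OPERATION_RUN_STATUS_MAP = {
    "pending": "queued",
    "running": "running",
    "completed": "succeeded",
    "completed_with_errors": "partial",
    "failed": "failed",
    "aborted": "failed",
}

TERMINAL_STATUSES = {"succeeded", "failed", "partial"}


def canonical_status(stored_status: str) -> str:
    """Translate a stored status into the UI-level lifecycle vocabulary."""
    try:
        return BULK_OPERATION_RUN_STATUS_MAP[stored_status]
    except KeyError:
        # Fail loudly so an unmapped storage status is noticed early.
        raise ValueError(f"unmapped status: {stored_status!r}") from None


def is_terminal(stored_status: str) -> bool:
    """True once the run has reached succeeded, failed, or partial."""
    return canonical_status(stored_status) in TERMINAL_STATUSES
```

Keeping the translation in one place lets the storage layer retain its existing vocabulary while the UI only ever sees the standardized lifecycle.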
Rationale:
- Keeps the spec’s lifecycle consistent without forcing an immediate cross-table refactor.
Alternatives considered:
- Rename and normalize all run statuses across all run tables.
  - Rejected (Phase 1): touches many workflows and tests.
3) Idempotency & de-duplication
Decision: Enforce de-duplication for active runs via a deterministic key and a DB query gate, with an optional lock for race reduction.
- Dedupe key format: `tenant_id + operation_type + target_object_id` (plus a stable hash of relevant payload if needed).
- Behavior: if an identical run is `queued`/`running`, reuse it and return/link to it; allow a new run only after the existing one reaches a terminal state.
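
A deterministic key plus an "is there an active run?" check might look like the sketch below. It is written in Python for brevity and uses an in-memory list as a stand-in for the run table; `dedupe_key`, `Run`, `RunStore`, and the `:` separator are all assumptions for illustration, not the project's API.

```python
import hashlib
import json
from dataclasses import dataclass, field

ACTIVE_STATUSES = {"queued", "running"}


def dedupe_key(tenant_id, operation_type, target_object_id, payload=None) -> str:
    """Build a deterministic key; the payload hash is optional."""
    parts = [str(tenant_id), operation_type, str(target_object_id)]
    if payload is not None:
        # sort_keys makes the hash independent of dict insertion order.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        parts.append(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return ":".join(parts)


@dataclass
class Run:
    id: int
    dedupe_key: str
    status: str = "queued"


@dataclass
class RunStore:
    """In-memory stand-in for the run table (assumption for this sketch)."""
    runs: list = field(default_factory=list)

    def find_active(self, key):
        # Equivalent of the DB query gate: same key, still queued or running.
        for run in self.runs:
            if run.dedupe_key == key and run.status in ACTIVE_STATUSES:
                return run
        return None

    def start_or_reuse(self, key):
        """Return (run, created). Reuses an active run with the same key."""
        existing = self.find_active(key)
        if existing is not None:
            return existing, False
        run = Run(id=len(self.runs) + 1, dedupe_key=key)
        self.runs.append(run)
        return run, True
```

In a real database the check and the insert are two separate steps, so two concurrent requests could both pass the gate; that window is what the optional lock is meant to narrow.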
Rationale:
- Matches the constitution (“Operations / Run Observability Standard”) and aligns with existing patterns (inventory selection hash + schedule locks).
Alternatives considered:
- Cache-only locks (`Cache::lock(...)`) without persisted keys.
  - Rejected: harder to reason about after restarts; less observable.
4) Restore preview must be asynchronous
Decision: Move restore preview generation (“Generate preview” in the wizard) into a queued job which persists preview outputs to the run record.
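
The shape of this decision can be sketched as "enqueue, return immediately, let a worker persist the result". The Python sketch below uses a thread and an in-memory queue purely as stand-ins for the real queue and run record; `request_preview`, `build_preview`, and the record fields are hypothetical.

```python
import queue

runs = {}             # run_id -> run record (stand-in for the run table)
jobs = queue.Queue()  # stand-in for the job queue


def request_preview(run_id, items):
    """Called from the interactive request: enqueue only, no heavy work."""
    runs[run_id] = {"status": "queued", "preview": None, "error": None}
    jobs.put((run_id, items))
    return run_id


def build_preview(items):
    # Placeholder for the slow part (remote calls, normalization).
    return [{"item": item, "action": "update"} for item in items]


def worker():
    """Background worker: runs the job and persists output on the record."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel used to stop the worker
            jobs.task_done()
            break
        run_id, items = job
        record = runs[run_id]
        record["status"] = "running"
        try:
            record["preview"] = build_preview(items)
            record["status"] = "succeeded"
        except Exception as exc:
            record["status"] = "failed"
            record["error"] = str(exc)
        finally:
            jobs.task_done()
```

Because the preview lives on the run record, the wizard can render it from stored data whenever the run reaches a terminal status, regardless of whether the original request is still open.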
Rationale:
- Preview can require Graph calls and normalization work; it should never block an interactive request.
Alternatives considered:
- Keep preview synchronous and increase timeouts.
  - Rejected: timeouts, poor UX, and violates FR-001.
5) Notifications for progress visibility
Decision: Use DB notifications for state transitions (queued/running/terminal) and keep a Run detail view as the primary progress surface in Phase 1.
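
One way to picture this is a transition function that validates the lifecycle step and writes one persisted notification per state change. The sketch below is in Python with a list standing in for the notifications table; `RunNotifier`, its fields, and the strict transition table are assumptions for illustration only.

```python
from datetime import datetime, timezone

# Allowed lifecycle steps: queued -> running -> (succeeded | failed | partial).
# None represents a run that does not exist yet.
ALLOWED_TRANSITIONS = {
    None: {"queued"},
    "queued": {"running"},
    "running": {"succeeded", "failed", "partial"},
}


class RunNotifier:
    def __init__(self):
        self.status = {}         # run_id -> current status
        self.notifications = []  # stand-in for the DB notifications table

    def transition(self, run_id, user_id, new_status):
        """Apply a state change and record a notification for the user."""
        current = self.status.get(run_id)
        if new_status not in ALLOWED_TRANSITIONS.get(current, set()):
            raise ValueError(f"invalid transition: {current!r} -> {new_status!r}")
        self.status[run_id] = new_status
        self.notifications.append({
            "run_id": run_id,
            "user_id": user_id,
            "status": new_status,
            "created_at": datetime.now(timezone.utc).isoformat(),
        })
```

Since each transition leaves a stored row, a user who reloads or closes the page can still see what happened to the run afterwards.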
Rationale:
- Inventory sync + backup schedule runs already use this pattern.
- Survives page reloads and doesn’t require the user to keep the page open.
Alternatives considered:
- Frontend polling only (no DB notifications).
  - Rejected: weaker UX and weaker observability.
Clarifications resolved
- SC-003 includes “canceled” while Phase 1 explicitly has “no cancel”.
  - Resolution for Phase 1 planning: treat “canceled” as out-of-scope (Phase 2+) and map “aborted” (if present) into the `failed` bucket for SC accounting.