From 11f7209783a7636c610e37f8bf1fda2d00a67e27 Mon Sep 17 00:00:00 2001 From: Ahmed Darrazi Date: Tue, 10 Feb 2026 01:35:24 +0100 Subject: [PATCH] spec(086): retire legacy runs into operation runs --- .github/agents/copilot-instructions.md | 4 +- .../checklists/requirements.md | 34 ++++ .../contracts/README.md | 41 +++++ .../data-model.md | 131 ++++++++++++++ .../plan.md | 109 ++++++++++++ .../quickstart.md | 42 +++++ .../research.md | 87 ++++++++++ .../spec.md | 160 ++++++++++++++++++ .../tasks.md | 141 +++++++++++++++ 9 files changed, 747 insertions(+), 2 deletions(-) create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/checklists/requirements.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/contracts/README.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/data-model.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/plan.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/quickstart.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/research.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/spec.md create mode 100644 specs/086-retire-legacy-runs-into-operation-runs/tasks.md diff --git a/.github/agents/copilot-instructions.md b/.github/agents/copilot-instructions.md index 1af0594..cdbdd4f 100644 --- a/.github/agents/copilot-instructions.md +++ b/.github/agents/copilot-instructions.md @@ -43,9 +43,9 @@ ## Code Style PHP 8.4.15: Follow standard conventions ## Recent Changes +- 086-retire-legacy-runs-into-operation-runs: Spec docs updated (PHP 8.4.15 + Laravel 12, Filament v5, Livewire v4) - 084-verification-surfaces-unification: Added PHP 8.4 (Laravel 12) + Filament v5 (Livewire v4), Queue/Jobs (Laravel), Microsoft Graph via `GraphClientInterface` - 082-action-surface-contract: Added PHP 8.4.x + Laravel 12, Filament v5, Livewire v4 - - + diff --git a/specs/086-retire-legacy-runs-into-operation-runs/checklists/requirements.md b/specs/086-retire-legacy-runs-into-operation-runs/checklists/requirements.md new file mode 100644 index 0000000..6e4ec74 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/checklists/requirements.md @@ -0,0 +1,34 @@ +# Specification Quality Checklist: Retire Legacy Runs Into Operation Runs + +**Purpose**: Validate specification completeness and quality before proceeding to planning +**Created**: 2026-02-09 +**Feature**: [../spec.md](../spec.md) + +## Content Quality + +- [x] No implementation details (languages, frameworks, APIs) +- [x] Focused on user value and business needs +- [x] Written for non-technical stakeholders +- [x] All mandatory sections completed + +## Requirement Completeness + +- [x] No [NEEDS CLARIFICATION] markers remain +- [x] Requirements are testable and unambiguous +- [x] Success criteria are measurable +- [x] Success criteria are technology-agnostic (no implementation details) +- [x] All acceptance scenarios are defined +- [x] Edge cases are identified +- [x] Scope is clearly bounded +- [x] Dependencies and assumptions identified + +## Feature Readiness + +- [x] All functional requirements have clear acceptance criteria +- [x] User scenarios cover primary flows +- [x] Feature meets measurable outcomes defined in Success Criteria +- [x] No implementation details leak into specification + +## Notes + +- The spec uses product-specific terms (e.g., 404 vs 403 semantics) to make authorization behavior testable, but avoids naming specific frameworks or code-level implementation choices. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/contracts/README.md b/specs/086-retire-legacy-runs-into-operation-runs/contracts/README.md new file mode 100644 index 0000000..e5f4ee2 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/contracts/README.md @@ -0,0 +1,41 @@ +# Contracts (Spec 086) + +This spec does not introduce a new public HTTP API surface. + +## Canonical OperationRun contract (internal) + +Spec 086 tightens and standardizes the internal contract for how operations are created, identified, and displayed. + +### Run creation contract + +- Start surfaces must create the `operation_runs` row **before** dispatching asynchronous work. +- Jobs must receive the `OperationRun` (or its id) and must **not** attempt a fallback-create. + +### Identity / idempotency contract + +Operation run identity is enforced by a partial unique index for active states. + +Planned identity rules by type: +- `inventory.sync` and `directory_groups.sync`: deterministic identity (while-active dedupe) +- `backup_schedule.run_now` and `backup_schedule.retry`: unique-per-click identity (nonce) +- `backup_schedule.scheduled`: deterministic identity by `(backup_schedule_id, scheduled_for)` (strict) + +### Context contract (selected keys) + +The `operation_runs.context` JSON is used for: +- “Target” display (via `target_scope`) +- “Related” deep links (via `OperationRunLinks::related`) +- provenance (trigger source, schedule id, initiating user) + +Keys referenced in existing UI code: +- `provider_connection_id` +- `backup_schedule_id` +- `backup_schedule_run_id` +- `restore_run_id` +- `target_scope` + +## Graph Contract Registry + +All Microsoft Graph calls remain required to go through `GraphClientInterface` and be modeled in `config/graph_contracts.php`. + +Spec 086 removes Graph calls from Filament render/search/label callbacks (DB-only rendering), and moves those lookups behind cached tables + asynchronous sync operations. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/data-model.md b/specs/086-retire-legacy-runs-into-operation-runs/data-model.md new file mode 100644 index 0000000..9084d5f --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/data-model.md @@ -0,0 +1,131 @@ +# Data Model (Spec 086) + +This feature consolidates execution tracking into `operation_runs` while keeping legacy run tables as read-only history. + +## Entities + +### 1) OperationRun (canonical) + +**Table:** `operation_runs` + +**Purpose:** Single source of truth for execution tracking: status/progress, results (counts), failures, provenance/context. + +**Fields (current):** +- `id` +- `workspace_id` (FK, required) +- `tenant_id` (FK, nullable) +- `user_id` (FK, nullable) +- `initiator_name` (string) +- `type` (string; see OperationRunType registry) +- `status` (string; queued|running|completed) +- `outcome` (string; pending|succeeded|partially_succeeded|failed|blocked…) +- `run_identity_hash` (string; deterministic hash for idempotency) +- `summary_counts` (json/array; normalized counts + key metadata) +- `failure_summary` (json/array; structured failures, sanitized) +- `context` (json/array; provenance + inputs + target scope) +- `started_at`, `completed_at`, `created_at`, `updated_at` + +**Indexes / constraints (current):** +- `(workspace_id, type, created_at)` and `(workspace_id, created_at)` +- `(tenant_id, type, created_at)` and `(tenant_id, created_at)` +- Partial unique indexes for active runs: + - tenant-scoped: unique `(tenant_id, run_identity_hash)` where `tenant_id IS NOT NULL` and `status IN ('queued','running')` + - workspace-scoped: unique `(workspace_id, run_identity_hash)` where `tenant_id IS NULL` and `status IN ('queued','running')` + +**Context contract (current patterns):** +The `context` JSON is used for “related links” and display. Existing keys include (non-exhaustive): +- `provider_connection_id` +- `backup_schedule_id` +- `backup_schedule_run_id` +- `backup_set_id` +- `policy_id` +- `restore_run_id` +- `target_scope` (nested object) +- `selection` and `idempotency` objects for bulk operations + +**Required additions for Spec 086 (planned):** +- New `type` values: + - `backup_schedule.scheduled` + - `directory_role_definitions.sync` +- Scheduled backup context keys: + - `backup_schedule_id` + - `scheduled_for` (UTC timestamp/minute) + - Optional `backup_schedule_run_id` if the legacy table remains for history during transition + +### 2) InventorySyncRun (legacy) + +**Table:** `inventory_sync_runs` + +**Purpose:** Historical record (read-only) for pre-cutover tracking. + +**Key fields:** +- `tenant_id` +- `selection_hash` +- `selection_payload` (nullable) +- status + timestamps + counters + +**Planned optional mapping:** +- Add nullable `operation_run_id` FK to enable deterministic redirect to canonical viewer when present. No backfill required. + +### 3) EntraGroupSyncRun (legacy) + +**Table:** `entra_group_sync_runs` + +**Purpose:** Historical record (read-only) for pre-cutover group sync tracking. + +**Key fields:** +- `tenant_id` +- `selection_key`, `slot_key` +- status + error fields + counters + +**Planned optional mapping:** +- Add nullable `operation_run_id` FK to enable deterministic redirect when present. + +### 4) BackupScheduleRun (legacy) + +**Table:** `backup_schedule_runs` + +**Purpose:** Historical record of backup schedule executions. + +**Planned behavior change:** +- Distinguish scheduled fires vs manual/retry at the OperationRun level by introducing `backup_schedule.scheduled` type. + +**Planned optional mapping:** +- Add nullable `operation_run_id` FK to enable deterministic redirect when present. + +### 5) RestoreRun (domain) + +**Table:** `restore_runs` + +**Purpose:** Domain workflow record (requested items, dry-run, preview/results). Execution tracking and “View run” uses `operation_runs`. + +**Current linkage approach:** +- Canonical runs store `restore_run_id` in `operation_runs.context` (used by `OperationRunLinks::related`). + +## Enumerations / Registries + +### OperationRunType + +**Location:** `app/Support/OperationRunType.php` + +**Planned additions:** +- `BackupScheduleScheduled = 'backup_schedule.scheduled'` +- `DirectoryRoleDefinitionsSync = 'directory_role_definitions.sync'` + +### OperationCatalog + +**Location:** `app/Support/OperationCatalog.php` + +**Planned additions:** +- Human label for `backup_schedule.scheduled` +- Human label for `directory_role_definitions.sync` +- Optional expected durations (if known) + +## State transitions + +### OperationRun + +- `queued` → `running` → `completed` +- `outcome` starts as `pending`, transitions to one of: `succeeded`, `partially_succeeded`, `failed`, `blocked`. + +The canonical update surface is `OperationRunService` (`dispatchOrFail`, `updateRun`, `appendFailures`, `incrementSummaryCounts`, etc.). diff --git a/specs/086-retire-legacy-runs-into-operation-runs/plan.md b/specs/086-retire-legacy-runs-into-operation-runs/plan.md new file mode 100644 index 0000000..2730ac7 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/plan.md @@ -0,0 +1,109 @@ +# Implementation Plan: Retire Legacy Runs Into Operation Runs + +**Branch**: `086-retire-legacy-runs-into-operation-runs` | **Date**: 2026-02-10 | **Spec**: `specs/086-retire-legacy-runs-into-operation-runs/spec.md` +**Input**: Feature specification from `specs/086-retire-legacy-runs-into-operation-runs/spec.md` + +## Summary + +Retire legacy “run tracking” tables as the primary execution tracker for in-scope operations (inventory sync, directory groups sync, backup schedule runs, restore execution, and directory role definitions sync) and make `operation_runs` the canonical source of truth. + +Key implementation approach: +- Use the existing tenantless canonical viewer `/admin/operations/{run}` (Filament page `TenantlessOperationRunViewer`) and ensure it remains DB-only at render time. +- Enforce the clarified 404/403 semantics for run viewing: non-members 404, members missing capability 403, where the view capability equals the start capability. +- Enforce dispatch-time OperationRun creation for every start surface; jobs never fallback-create. +- Apply explicit run identity rules per operation type (dedupe vs unique-per-click vs strict schedule dedupe), including strict scheduled backup idempotency: at most one canonical run ever per (schedule_id, intended fire-time). +- Remove Graph calls from UI render/search/label callbacks by using cached directory data (groups + role definitions) and “Sync now” operations. + +## Technical Context + +**Language/Version**: PHP 8.4.15 +**Primary Dependencies**: Laravel 12, Filament v5, Livewire v4 +**Storage**: PostgreSQL (via Sail) +**Testing**: Pest v4 (PHPUnit v12 runner) +**Target Platform**: Web application (Laravel + Filament admin panel) +**Project Type**: web +**Performance Goals**: Operations viewer + Monitoring pages render from DB state only; canonical viewer loads in ~2s under normal conditions +**Constraints**: No outbound HTTP in Monitoring/Operations rendering/search/label callbacks (OPS-EX-AUTH-001); dispatch-time OperationRun creation; jobs must never fallback-create; strict 404/403 isolation semantics +**Scale/Scope**: TenantPilot admin workflows; multiple operation families; staged cutover with legacy history preserved + +## Constitution Check + +*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.* + +- Inventory-first: No change to the meaning of inventory vs snapshots/backups; this spec only changes execution tracking. +- Read/write separation: Start surfaces remain enqueue-only; destructive-like actions are not added here; audit logging remains required for mutations. +- Graph contract path: UI render/search/label callbacks must become DB-only; any remaining Graph calls stay behind `GraphClientInterface` + `config/graph_contracts.php`. +- Deterministic capabilities: Run viewing must be capability-gated using the existing capability registry (no raw strings). +- RBAC-UX: Enforce clarified semantics for run viewing: non-members 404, members missing capability 403; authorization enforced server-side via Policy/Gate. +- Workspace isolation: Canonical tenantless `/admin/operations/{run}` continues to enforce workspace membership (deny-as-not-found). +- Global search: `OperationRunResource` stays non-globally-searchable; no new global-search surfaces introduced. +- Run observability: All in-scope long-running/scheduled/remote operations are tracked via `OperationRun`; Monitoring pages remain DB-only. +- Automation: Scheduled backup run creation uses strict idempotency per schedule + intended fire-time. +- Badge semantics (BADGE-001): Run status/outcome badges already use `BadgeRenderer`; do not introduce ad-hoc mappings. +- Filament UI Action Surface Contract: Legacy resources remain read-only; canonical operations pages already define inspection affordances. + +## Project Structure + +### Documentation (this feature) + +```text +specs/086-retire-legacy-runs-into-operation-runs/ +├── plan.md +├── research.md +├── data-model.md +├── quickstart.md +├── contracts/ +│ └── README.md +└── tasks.md # To be created by /speckit.tasks +``` + +### Source Code (repository root) + +```text +app/ +├── Filament/ +│ ├── Pages/ +│ │ ├── Monitoring/ +│ │ └── Operations/ +│ └── Resources/ +├── Http/ +│ └── Middleware/ +├── Jobs/ +├── Models/ +├── Policies/ +├── Services/ +└── Support/ + +config/ +├── graph.php +└── graph_contracts.php + +database/ +└── migrations/ + +tests/ +├── Feature/ +└── Unit/ +``` + +**Structure Decision**: Laravel web application (monolith) with Filament admin panel. + +## Complexity Tracking + +No constitution violations are required for this feature. + +## Phase Plan + +Phase 0/1 deliverables are already captured in: +- `specs/086-retire-legacy-runs-into-operation-runs/research.md` +- `specs/086-retire-legacy-runs-into-operation-runs/data-model.md` +- `specs/086-retire-legacy-runs-into-operation-runs/contracts/README.md` +- `specs/086-retire-legacy-runs-into-operation-runs/quickstart.md` + +Phase 2 (tasks) will be produced via `/speckit.tasks` and should slice work by operation family: +1) Authorization: capability-gate canonical run viewing (404 vs 403 semantics). +2) Backup schedules: add `backup_schedule.scheduled` + strict idempotency; make manual runs unique-per-click. +3) Directory groups: stop writing legacy rows; keep legacy pages read-only; ensure dispatch-time OperationRun creation. +4) Inventory sync: stop writing legacy rows; ensure dispatch-time OperationRun creation and no UI Graph calls. +5) Tenant configuration: remove Graph calls from render/search/labels; add role definitions cache + “Sync now” operation. +6) Restore: ensure execution tracking uses OperationRun only; legacy restore domain records remain as domain entities. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/quickstart.md b/specs/086-retire-legacy-runs-into-operation-runs/quickstart.md new file mode 100644 index 0000000..89b6e92 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/quickstart.md @@ -0,0 +1,42 @@ +# Quickstart (Spec 086) + +This quickstart is for validating Spec 086 changes locally using Sail. + +## Prereqs + +- `vendor/bin/sail up -d` + +## Run formatting + +- `vendor/bin/sail bin pint --dirty` + +## Run targeted tests + +Use the minimal test subset relevant to the PR slice you are working on: + +- `vendor/bin/sail artisan test --compact --filter=OperationRun` +- `vendor/bin/sail artisan test --compact tests/Feature` (narrow further to the new/changed files) + +## Manual verification checklist + +### Canonical viewer + +- Trigger an operation that creates an `OperationRun`. +- Open the canonical URL (from notification action): `/admin/operations/{runId}`. +- Confirm the viewer renders from persisted DB state only. + +### Authorization semantics + +- As a non-workspace-member user, opening `/admin/operations/{runId}` returns 404. +- As a workspace member without the required capability for that run type, opening the viewer returns 403. + +### Dedupe semantics + +- Inventory sync / directory group sync: attempting to start while active reuses the existing active run and links to it. +- Manual backup schedule run now/retry: each click produces a distinct `OperationRun`. +- Scheduled backup: double-fire for the same schedule + intended minute produces at most one `OperationRun`. + +### DB-only forms + +- Tenant configuration selectors (directory groups, role definitions) render and search without outbound HTTP calls. +- “Sync now” actions enqueue operations and provide “View run” link. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/research.md b/specs/086-retire-legacy-runs-into-operation-runs/research.md new file mode 100644 index 0000000..9473199 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/research.md @@ -0,0 +1,87 @@ +# Research (Spec 086) + +This document resolves the unknowns needed to write an implementation plan for “Retire Legacy Runs Into Operation Runs”. It is based on repository inspection (no new external dependencies). + +## Decisions + +### 1) Canonical run viewer is already implemented; keep the route shape + +- **Decision:** Use the existing tenantless canonical viewer route `admin.operations.view` (path pattern `/admin/operations/{run}`) implemented by the Filament page `TenantlessOperationRunViewer`. +- **Rationale:** This already enforces “tenantless deep link” while still doing workspace / tenant entitlement checks server-side through `Gate::authorize('view', $run)`. +- **Alternatives considered:** Create a second viewer page or route. Rejected because it would introduce duplicate UX and increase the chance of policy drift. + +Repository anchors: +- Canonical viewer page: `app/Filament/Pages/Operations/TenantlessOperationRunViewer.php` +- Link helper: `app/Support/OperationRunLinks.php` +- Workspace selection middleware explicitly treats `/admin/operations/{id}` as workspace-optional: `app/Http/Middleware/EnsureWorkspaceSelected.php` + +### 2) OperationRun is persisted and DB-rendered; schema supports workspace-tenant and workspace-only runs + +- **Decision:** Treat `operation_runs` as the canonical persistence format for status/progress/results. +- **Rationale:** The schema already includes `workspace_id` (required) and `tenant_id` (nullable), enabling both tenant-plane and workspace-plane operations. +- **Alternatives considered:** Separate tables per operation family. Rejected because it breaks the Monitoring → Operations single source of truth principle. + +Repository anchors: +- Migrations: `database/migrations/2026_01_16_180642_create_operation_runs_table.php`, `database/migrations/2026_02_04_090030_add_workspace_id_to_operation_runs_table.php` +- Model: `app/Models/OperationRun.php` + +### 3) View authorization must be capability-gated per operation type (in addition to membership) + +- **Decision:** Extend run viewing authorization to require the same capability used to start the operation type. +- **Rationale:** Spec 086 clarifications require: non-members get 404; members without capability get 403; and the “view” capability equals the “start” capability. +- **Implementation approach (planned):** Update `OperationRunPolicy::view()` to: + 1) Keep existing workspace membership and tenant entitlement checks (deny-as-not-found). + 2) Resolve required capability from `OperationRun->type` using a centralized mapping helper. + 3) If capability is known and tenant-scoped, enforce `403` when the member lacks it. + +Repository anchors: +- Current policy (membership + tenant entitlement only): `app/Policies/OperationRunPolicy.php` +- Existing capability enforcement in start surfaces (examples): + - Inventory sync start: `Capabilities::TENANT_INVENTORY_SYNC_RUN` in `app/Filament/Resources/InventoryItemResource/Pages/ListInventoryItems.php` + - Directory groups sync start: `Capabilities::TENANT_SYNC` in `app/Filament/Resources/EntraGroupResource/Pages/ListEntraGroups.php` + - Backup schedule run/retry: `Capabilities::TENANT_BACKUP_SCHEDULES_RUN` in `app/Filament/Resources/BackupScheduleResource.php` + +### 4) Run identity / dedupe strategy varies by operation type + +- **Decision:** Use existing `OperationRunService` helpers but apply type-specific identity rules: + - `inventory.sync` and `directory_groups.sync`: **while-active dedupe** based on deterministic inputs (continue using `ensureRun(...)`-style identity). + - `backup_schedule.run_now` and `backup_schedule.retry`: **unique per click** (no dedupe). Create a new run each time by including a nonce in identity inputs (e.g., UUID). + - `backup_schedule.scheduled`: **strict dedupe** per `(backup_schedule_id, scheduled_for)`; create a new operation type `backup_schedule.scheduled` and use `ensureRunWithIdentity(...)` keyed by schedule + intended fire-time. +- **Rationale:** Matches explicit spec clarifications and protects against scheduler double-fire. +- **Alternatives considered:** + - Keep using `ensureRun(...)` for manual runs → rejected (dedupes while active). + - Use legacy table unique constraints as idempotency → rejected (spec requires OperationRun is canonical). + +Repository anchors: +- `ensureRun(...)` and `ensureRunWithIdentity(...)`: `app/Services/OperationRunService.php` +- Existing partial unique index for active runs: `operation_runs_active_unique_*` in the migrations above. + +### 5) Legacy run tables are real and currently written to; deterministic redirect requires an explicit mapping field + +- **Decision:** Legacy tables remain viewable and read-only, but should not be relied on for current execution tracking. +- **Rationale:** Spec requires “no new legacy rows” for in-scope operations. Today, some start surfaces still create legacy rows (e.g., inventory/group sync, backup schedule runs). +- **Planned design:** + - Stop creating new legacy rows as part of the cutover PRs. + - Implement legacy “view” redirect behavior only when a record has a canonical mapping. + - To make redirects deterministic without a backfill, add an optional `operation_run_id` FK column to legacy tables that we intend to redirect (only populated for rows created after the migration; older rows remain legacy-only view). +- **Alternatives considered:** Derive mapping by recomputing hashes and searching by time window. Rejected as non-deterministic and likely to pick the wrong run when identities collide historically. + +Repository anchors (legacy tables): +- Inventory sync runs: `database/migrations/2026_01_07_142719_create_inventory_sync_runs_table.php` +- Directory group sync runs: `database/migrations/2026_01_11_120004_create_entra_group_sync_runs_table.php` +- Backup schedule runs: `database/migrations/**create_backup_schedule_runs**` (used in `BackupScheduleResource`) +- Restore runs (domain): `database/migrations/2025_12_10_000150_create_restore_runs_table.php` + +### 6) DB-only rendering constraint is already enforced in Monitoring pages, but Tenant configuration forms still call Graph + +- **Decision:** Remove outbound Graph calls from configuration-form search/labels by introducing cached directory role definitions and using cached directory groups. +- **Rationale:** Constitution OPS-EX-AUTH-001 + Spec 086 FR-006/FR-015 require render/search/label resolution to be DB-only. +- **Repository finding:** `TenantResource` currently queries Graph for role definitions in selector callbacks. + +Repository anchors: +- Graph call sites inside UI callbacks: `app/Filament/Resources/TenantResource.php` (roleDefinitions search/label methods) + +## Open items (resolved enough for planning) + +- Exact schema for the new role definition cache tables and the sync job contract will be specified in `data-model.md` and implemented in Phase PR(s). +- The capability mapping for run viewing will be implemented via a centralized helper; the plan will enumerate required capabilities per in-scope operation type. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/spec.md b/specs/086-retire-legacy-runs-into-operation-runs/spec.md new file mode 100644 index 0000000..95ec441 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/spec.md @@ -0,0 +1,160 @@ +# Feature Specification: Retire Legacy Runs Into Operation Runs + +**Feature Branch**: `086-retire-legacy-runs-into-operation-runs` +**Created**: 2026-02-09 +**Status**: Draft +**Input**: User description: "Retire legacy run tracking into canonical operation runs, with DB-only rendering and dispatch-time run creation. Legacy run tables remain read-only history." + +## Clarifications + +### Session 2026-02-10 + +- Q: For manual backup schedule runs (`backup_schedule.run_now`) and retries (`backup_schedule.retry`), should the system dedupe while a run is active, or always create a new run per click? → A: Always create a new run per click (no dedupe). +- Q: Who may view the canonical run detail page (“View run”)? → A: Workspace members may view runs only if they also have the required capability for that operation type; non-members get 404, members without capability get 403. +- Q: Which capability should be required to view a run (“View run”)? → A: Use the same capability as starting that operation type. +- Q: For `backup_schedule.scheduled`, how should dedupe work? → A: Strict dedupe per schedule and intended fire-time (at most one run). +- Q: For the role definitions cache “Sync now” operation, should it use a new dedicated operation type or reuse an existing one? → A: Use a new dedicated operation type. + +## User Scenarios & Testing *(mandatory)* + +### User Story 1 - Start an operation with an immediate canonical run link (Priority: P1) + +As a workspace member, I can start long-running operations (inventory sync, directory groups sync, scheduled backups, restore execution, directory role definitions sync) and immediately receive a stable “View run” link that I can open and share. + +**Why this priority**: This removes the “run link appears later / changes” ambiguity, improves auditability, and prevents duplicate tracking paths. + +**Independent Test**: Trigger each supported operation start surface and verify a canonical run record exists before work begins, and that the canonical viewer loads from persisted state. + +**Acceptance Scenarios**: + +1. **Given** a workspace member with the required capability, **When** they start an inventory sync, **Then** a canonical run exists immediately and the UI shows a stable “View run” link. +2. **Given** a scheduled backup fire event, **When** the scheduler dispatches work, **Then** a canonical run exists immediately and the same fire event cannot create duplicates. +3. **Given** a workspace member without the required capability, **When** they attempt to start the operation, **Then** the request is rejected with a capability error (403) and no run is created. + +--- + +### User Story 2 - Monitor executions from a single canonical viewer (Priority: P2) + +As a workspace member, I can open an operations viewer link for any run and see status, progress, results, and errors without the page triggering outbound calls. + +Legacy “run history” pages remain available for older historical rows but cannot start or retry anything. + +**Why this priority**: A single viewer reduces support load, enables consistent deep linking, and avoids UI latency and rate-limiting from outbound calls. + +**Independent Test**: Load the canonical viewer and legacy history pages using outbound client fakes/mocks and assert no outbound calls occur during rendering/search. + +**Acceptance Scenarios**: + +1. **Given** a run exists, **When** a user opens its canonical operations link, **Then** the page renders only from persisted state and performs no outbound calls. +2. **Given** a legacy run history record that has a known canonical mapping, **When** a user opens the legacy “view” page, **Then** they are redirected to the canonical operations viewer. +3. **Given** a legacy run history record without a canonical mapping, **When** a user opens the legacy “view” page, **Then** they see a read-only historical record and no new canonical run is created. + +--- + +### User Story 3 - Use cached directory data in forms without blocking calls (Priority: P3) + +As a workspace member configuring tenant-related settings, I can search/select directory groups and role definitions using cached data. If cached data is missing or stale, I can trigger an asynchronous sync (“Sync now”) without the form making outbound calls. + +**Why this priority**: Prevents slow, flaky UI and rate-limits from inline lookups, while keeping the configuration flow usable. + +**Independent Test**: Render the configuration form and exercise search/label rendering while asserting outbound clients are not called. + +**Acceptance Scenarios**: + +1. **Given** cached directory groups exist, **When** the user searches for groups, **Then** results and labels come from cached data. +2. **Given** cached role definitions are missing, **When** the user opens the role definition selector, **Then** the UI indicates “data not available yet” and offers a non-destructive “Sync now” action. +3. **Given** the user triggers “Sync now”, **When** the sync starts, **Then** a canonical run is created immediately and the user can open its canonical “View run” link. + +### Edge Cases + +- A scheduler fires the same scheduled backup more than once for the same intended time. +- A user triggers the same sync while an identical sync is still active (dedupe/while-active semantics). +- A job fails before writing progress; the canonical run still exists and shows a clear failure state. +- A legacy history row exists but has no canonical mapping; it must remain viewable without creating new canonical runs. +- A non-member attempts to access a canonical operations link; response must be deny-as-not-found (404). +- A member lacks capability: start surfaces must reject (403) and the UI must reflect disabled affordances. +- Cached directory data is empty or stale; UI must not block on outbound calls and must provide a safe way to sync. + +## Requirements *(mandatory)* + +**Constitution alignment (required):** This feature includes long-running/queued/scheduled work. The spec MUST describe tenant isolation, run observability (type/identity/visibility), and tests. + +**Constitution alignment (RBAC-UX):** This feature changes authorization behavior and navigation paths. It MUST define 404 vs 403 semantics and ensure server-side enforcement for operation-start flows. + +**Constitution alignment (OPS-EX-AUTH-001):** Outbound HTTP without a canonical run is not allowed on Monitoring/Operations pages. + +**Constitution alignment (BADGE-001):** Any new/changed status presentation for runs MUST remain centralized and covered by tests. + +**Constitution alignment (Admin UI Action Surfaces):** This feature changes multiple admin UI surfaces and MUST satisfy the UI Action Surface Contract (see matrix below). + +### Functional Requirements + +- **FR-001 (Canonical tracking)**: The system MUST treat the canonical run record as the single source of truth for execution tracking (status, progress, results, errors) for the in-scope operations. +- **FR-002 (Dispatch-time creation)**: Every start surface (UI action, console command, scheduler, internal service) MUST create the canonical run record before dispatching any asynchronous work. +- **FR-003 (No job fallback-create)**: Background workers MUST NOT create canonical run records as a fallback; missing run identifiers are treated as a fatal contract violation. +- **FR-004 (Canonical deep-link)**: The system MUST support exactly one canonical deep-link format for viewing runs which is tenantless and stable. + +- **FR-005 (Membership + capability rules)**: Access to operation runs MUST follow these rules: + - Non-members of the workspace scope MUST receive deny-as-not-found (404). + - Workspace members who lack the required capability for the operation type MUST receive 403. +- **FR-005a (View capability mapping)**: “View run” MUST require the same capability as “Start” for the corresponding operation type. +- **FR-006 (DB-only rendering)**: Operations/monitoring and run viewer pages MUST render solely from persisted data and MUST NOT perform outbound calls during rendering/search/label resolution. + +- **FR-007 (Legacy history read-only)**: Legacy run history records MUST remain viewable as historical data, but MUST be strictly read-only (no start/retry/execute actions). +- **FR-008 (Legacy redirects)**: If a legacy history record includes a canonical mapping, the legacy “view” page MUST redirect deterministically to the canonical viewer; otherwise it MUST display legacy-only history. +- **FR-009 (No new legacy rows)**: For the in-scope operations, the system MUST stop writing new legacy run history rows. Existing legacy history remains unchanged. + +- **FR-010 (Scheduled backup classification)**: Scheduled backup executions MUST be represented with a distinct operation type (not conflated with manual runs). +- **FR-011 (Run identity & dedupe)**: The system MUST compute deterministic run identities for dedupe and scheduler double-fire protection, and MUST define whether each type dedupes “while active” or is strictly unique. +- **FR-011b (Scheduled backups are strict)**: Scheduled backup executions MUST use strict dedupe per schedule and intended fire-time (at most one canonical run ever per schedule per intended fire-time). +- **FR-011a (Backup manual runs are unique)**: Manual backup schedule runs (“run now”) and retries MUST be unique per user action (no while-active dedupe). +- **FR-012 (Inputs & provenance)**: The system MUST store operation inputs and provenance (target tenant/schedule, trigger source, optional initiating user) on the canonical run record. + +- **FR-013 (Structured results)**: The system MUST store a standard, structured summary of results (counts) and failures (structured error entries) on the canonical run record. +- **FR-014 (Restore domain vs execution)**: Restore workflow domain records may remain as domain entities, but execution tracking and “View run” affordances MUST use the canonical run record exclusively. + +- **FR-015 (Cached directory data)**: The system MUST provide cached directory group data and cached role definition data to support search and label rendering in configuration forms without outbound calls. +- **FR-015a (Role definitions sync type)**: The role definitions cache sync MUST use a dedicated operation type (e.g., `directory_role_definitions.sync`) to keep identities, results, and auditability distinct from other sync operations. +- **FR-016 (Safe “Sync now”)**: When cached directory data is missing, the UI MUST provide a non-destructive “Sync now” action that starts an asynchronous sync and immediately exposes the canonical run link. + +#### Assumptions + +- A canonical run model/viewer already exists and is suitable for monitoring long-running operations. +- Outbound calls to external services are permitted only in asynchronous execution paths and are observable via the canonical run record. + +#### Out of Scope + +- Backfilling legacy history into canonical runs. +- Dropping/removing legacy run history tables. +- Introducing new cross-workspace analytics. + +## UI Action Matrix *(mandatory when admin UI is changed)* + +| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions | +|---|---|---|---|---|---|---|---|---|---|---| +| Operations viewer | Canonical run viewer route | None | Open by canonical link | None | None | None | None | N/A | Yes (canonical run record metadata) | Must be DB-only rendering; non-member is 404 | +| Inventory sync start | Inventory admin UI | Start sync | View run link appears after start | View run | None | None | N/A | N/A | Yes | Capability-gated; creates canonical run before dispatch | +| Directory groups sync start | Directory groups admin UI & console | Sync now | View run link appears after start | View run | None | Sync now (when cache empty) | N/A | N/A | Yes | Single dispatcher entry; legacy start actions removed | +| Backup schedule runs list | Backup schedule detail | None | List links open canonical viewer | View run | None | None | N/A | N/A | Yes | Includes scheduled/manual/retry runs; scheduled has distinct type | +| Tenant configuration selectors | Tenant settings forms | Sync now (when cache empty) | Search from cached data | None | None | Sync now | N/A | Save/Cancel | Yes | No outbound calls in search/label resolution | +| Legacy run history pages | Archive/history areas | None | View (read-only) | View only | None | None | None | N/A | Yes (historical) | No Start/Retry; redirect only if canonical mapping exists | + +### Key Entities *(include if feature involves data)* + +- **Canonical Run**: A single, shareable execution record containing type, identity, provenance, status, progress, results, and errors. +- **Legacy Run History Record**: A historical record for prior run-tracking paths; viewable but not mutable. +- **Managed Tenant**: The tenant context targeted by operations. +- **Backup Schedule**: A schedule configuration that can trigger executions automatically. +- **Restore Run (Domain Record)**: The domain workflow record for restore; links to canonical execution runs. +- **Directory Group Cache**: Cached group metadata used for searching/label rendering in forms. +- **Role Definition Cache**: Cached role definition metadata used for searching/label rendering in forms. + +## Success Criteria *(mandatory)* + +### Measurable Outcomes + +- **SC-001**: 100% of newly started in-scope operations create a canonical run record before any asynchronous work is dispatched. +- **SC-002**: Over a 30-day staging observation window, 0 new legacy run history rows are created for in-scope operations. +- **SC-003**: Operations viewer and monitoring pages perform 0 outbound calls during rendering/search/label resolution (verified by automated tests). +- **SC-004**: For scheduled backups, duplicate scheduler fires for the same schedule and intended fire-time result in at most 1 canonical run. +- **SC-005**: Users can open a canonical “View run” link and see status/progress within 2 seconds in typical conditions. diff --git a/specs/086-retire-legacy-runs-into-operation-runs/tasks.md b/specs/086-retire-legacy-runs-into-operation-runs/tasks.md new file mode 100644 index 0000000..0a86968 --- /dev/null +++ b/specs/086-retire-legacy-runs-into-operation-runs/tasks.md @@ -0,0 +1,141 @@ +--- + +description: "Task list for Spec 086 implementation" +--- + +# Tasks: Retire Legacy Runs Into Operation Runs (086) + +**Input**: Design documents from `specs/086-retire-legacy-runs-into-operation-runs/` +**Prerequisites**: plan.md (required), spec.md (required for user stories), research.md, data-model.md, contracts/, quickstart.md + +**Tests**: REQUIRED (Pest) — runtime behavior changes must be covered. + +## Phase 1: Setup (Shared Infrastructure) + +- [ ] T001 Confirm baseline green test subset via `tests/Feature/Operations/TenantlessOperationRunViewerTest.php`, `tests/Feature/Monitoring/OperationsDbOnlyTest.php`, and `tests/Feature/Monitoring/OperationsCanonicalUrlsTest.php` +- [ ] T002 Confirm Filament v5 + Livewire v4 constraints are respected for any touched pages/resources in `app/Filament/**` + +--- + +## Phase 2: Foundational (Blocking Prerequisites) + +**Purpose**: Shared primitives required by all stories. + +- [ ] T003 Add centralized “run type → required capability” resolver in `app/Support/Operations/OperationRunCapabilityResolver.php` +- [ ] T004 Update `app/Policies/OperationRunPolicy.php` to enforce clarified 404/403 semantics (non-member 404; member missing capability 403) using T003 +- [ ] T005 [P] Add/extend operation type registry for new types in `app/Support/OperationRunType.php` +- [ ] T006 [P] Add/extend operation labels/catalog entries in `app/Support/OperationCatalog.php` +- [ ] T007 Add tests covering view authorization semantics in `tests/Feature/Operations/TenantlessOperationRunViewerTest.php` (404 vs 403 + capability-gated view) + +**Checkpoint**: Canonical viewer authorization matches spec; new run types exist in registries. + +--- + +## Phase 3: User Story 1 — Start an operation with an immediate canonical run link (Priority: P1) + +**Goal**: All start surfaces create an `operation_runs` record at dispatch time; no job fallback-create; “View run” link is stable. + +**Independent Test**: Start each in-scope operation and assert the `operation_runs` row exists before work begins, with correct type/identity/context and a stable tenantless view URL. + +### Tests (US1) + +- [ ] T008 [P] [US1] Add/extend tests for OperationRun dispatch-time creation in `tests/Feature/OperationRunServiceTest.php` +- [ ] T009 [P] [US1] Add/extend tests for start-surface authorization (403 prevents run creation) in `tests/Feature/RunStartAuthorizationTest.php` + +### Implementation (US1) + +- [ ] T010 [US1] Ensure inventory sync start surface creates OperationRun before dispatch and uses canonical link in `app/Filament/Resources/InventoryItemResource/Pages/ListInventoryItems.php` +- [ ] T011 [US1] Ensure directory groups sync start surface creates OperationRun before dispatch and uses canonical link in `app/Filament/Resources/EntraGroupResource/Pages/ListEntraGroups.php` +- [ ] T012 [US1] Ensure backup schedule manual run-now creates OperationRun before dispatch with unique-per-click identity (nonce) in `app/Filament/Resources/BackupScheduleResource.php` +- [ ] T013 [US1] Ensure backup schedule retry creates OperationRun before dispatch with unique-per-click identity (nonce) in `app/Filament/Resources/BackupScheduleResource.php` +- [ ] T014 [US1] Ensure scheduled backup dispatcher creates OperationRun before dispatch with strict identity by (schedule_id, scheduled_for) and type `backup_schedule.scheduled` in `app/Services/BackupScheduling/BackupScheduleDispatcher.php` +- [ ] T014a [US1] Enforce strict scheduled backup idempotency (at most one canonical run ever per schedule_id + intended fire-time), using an explicit DB constraint and/or lock strategy aligned with `OperationRunService` identities +- [ ] T015 [US1] Enforce “no job fallback-create” by validating required OperationRun context is present; fail fast if missing in `app/Jobs/RunInventorySyncJob.php`, `app/Jobs/EntraGroupSyncJob.php`, and `app/Jobs/RunBackupScheduleJob.php` + +### Restore (US1) + +- [ ] T015a [P] [US1] Add/extend tests that starting a restore execution creates an OperationRun at dispatch time (target existing restore execution tests under `tests/Feature/RestoreRunWizardExecuteTest.php` and/or `tests/Feature/ExecuteRestoreRunJobTest.php`) +- [ ] T015b [US1] Ensure the restore execution start surface creates OperationRun before dispatch and surfaces the stable canonical “View run” link (adjust the Filament restore execution action/page used in the wizard flow) +- [ ] T015c [US2] Ensure restore domain records link to canonical OperationRuns for observability (align with FR-014; no legacy fallback-create) + +**Checkpoint**: Starting operations always yields a stable `/admin/operations/{run}` link immediately. + +--- + +## Phase 4: User Story 2 — Monitor executions from a single canonical viewer (Priority: P2) + +**Goal**: Canonical viewer and Monitoring pages remain DB-only; legacy run history pages are read-only and redirect only when a deterministic mapping exists. + +**Independent Test**: Load canonical viewer and legacy view pages while asserting no outbound Graph calls occur during render/search/label callbacks. + +### Tests (US2) + +- [ ] T016 [P] [US2] Add tests asserting Monitoring pages render DB-only (no Graph calls) in `tests/Feature/Monitoring/MonitoringOperationsTest.php` +- [ ] T017 [P] [US2] Add tests for legacy-to-canonical redirect when mapping exists and no redirect when mapping absent in `tests/Feature/Operations/` (new file: `tests/Feature/Operations/LegacyRunRedirectTest.php`) + +### Implementation (US2) + +- [ ] T018 [US2] Add nullable `operation_run_id` mapping column + FK/index to legacy table `inventory_sync_runs` (new migration in `database/migrations/**_add_operation_run_id_to_inventory_sync_runs_table.php`) +- [ ] T019 [US2] Add nullable `operation_run_id` mapping column + FK/index to legacy table `entra_group_sync_runs` (new migration in `database/migrations/**_add_operation_run_id_to_entra_group_sync_runs_table.php`) +- [ ] T020 [US2] Add nullable `operation_run_id` mapping column + FK/index to legacy table `backup_schedule_runs` (new migration in `database/migrations/**_add_operation_run_id_to_backup_schedule_runs_table.php`) +- [ ] T021 [US2] Stop writing NEW legacy run rows for inventory sync and use `operation_runs` only for execution tracking (adjust service + callers in `app/Services/Inventory/InventorySyncService.php` and any start surfaces) +- [ ] T022 [US2] Stop writing NEW legacy run rows for directory group sync and use `operation_runs` only for execution tracking (adjust service + callers in `app/Services/Directory/EntraGroupSyncService.php` and any start surfaces) +- [ ] T023 [US2] Stop writing NEW legacy run rows for backup schedule executions and use `operation_runs` only for execution tracking; keep legacy table strictly read-only history for existing rows (adjust dispatcher and UI surfaces in `app/Services/BackupScheduling/BackupScheduleDispatcher.php` and `app/Filament/Resources/BackupScheduleResource.php`) +- [ ] T023a [US2] Update Backup Schedule UI to show new executions from `operation_runs` (query by type + context like schedule_id) and link to canonical viewer; legacy runs list remains history-only +- [ ] T024 [US2] Implement deterministic redirect on legacy “view” pages when `operation_run_id` exists in `app/Filament/Resources/InventorySyncRunResource/Pages/ViewInventorySyncRun.php` and `app/Filament/Resources/EntraGroupSyncRunResource/Pages/ViewEntraGroupSyncRun.php` +- [ ] T025 [US2] Ensure legacy run history pages remain strictly read-only (remove/disable start/retry actions) in `app/Filament/Resources/InventorySyncRunResource.php`, `app/Filament/Resources/EntraGroupSyncRunResource.php`, and `app/Filament/Resources/BackupScheduleResource/RelationManagers/BackupScheduleRunsRelationManager.php` + +**Checkpoint**: Canonical viewer is the only execution-tracker UI; legacy is view-only and redirects only when mapped. + +--- + +## Phase 5: User Story 3 — Use cached directory data in forms without blocking calls (Priority: P3) + +**Goal**: Tenant configuration selectors use cached directory groups + cached role definitions; “Sync now” triggers async sync with an immediate canonical run link; no outbound calls during render/search/label callbacks. + +**Independent Test**: Render Tenant configuration forms and exercise search/label callbacks while asserting Graph client is not called. + +### Tests (US3) + +- [ ] T026 [P] [US3] Add tests that TenantResource role definition selectors render/search DB-only (no Graph calls) in `tests/Feature/Filament/` (new file: `tests/Feature/Filament/TenantRoleDefinitionsSelectorDbOnlyTest.php`) +- [ ] T027 [P] [US3] Add tests that “Sync now” creates an OperationRun and returns a canonical view link in `tests/Feature/DirectoryGroups/` or `tests/Feature/TenantRBAC/` (choose closest existing folder) +- [ ] T027a [P] [US3] Add tests that directory group selectors render/search DB-only (no Graph calls) and use cached DB tables (new file under `tests/Feature/DirectoryGroups/` or `tests/Feature/Filament/`) + +### Implementation (US3) + +- [ ] T028 [US3] Create cached role definitions table + model + factory (new migration in `database/migrations/**_create_entra_role_definitions_table.php`, model in `app/Models/EntraRoleDefinition.php`, factory in `database/factories/EntraRoleDefinitionFactory.php`) +- [ ] T029 [US3] Add “role definitions sync” operation type `directory_role_definitions.sync` to `app/Support/OperationRunType.php` and label in `app/Support/OperationCatalog.php` (if not already completed in T005/T006) +- [ ] T030 [US3] Implement role definitions sync service + job that updates the cache and records progress/failures on the OperationRun (service in `app/Services/Directory/RoleDefinitionsSyncService.php`, job in `app/Jobs/SyncRoleDefinitionsJob.php`) +- [ ] T030a [US3] Register/verify Graph contract entries required for role definitions sync in `config/graph_contracts.php` and ensure the sync uses `GraphClientInterface` only (no ad-hoc endpoints) +- [ ] T031 [US3] Update `app/Filament/Resources/TenantResource.php` roleDefinitions search/label callbacks to query cached DB tables only (remove Graph calls from callbacks) +- [ ] T032 [US3] Add a non-destructive “Sync now” Filament action that dispatches `directory_role_definitions.sync` and provides a canonical “View run” link (in `app/Filament/Resources/TenantResource.php`) + +**Checkpoint**: Tenant configuration selectors are DB-only; cache sync is async and observable via canonical run. + +--- + +## Phase 6: Polish & Cross-Cutting Concerns + +- [ ] T033 Ensure new/modified destructive-like actions (if any) use `Action::make(...)->action(...)->requiresConfirmation()` and are authorized server-side (audit existing touched Filament actions under `app/Filament/**`) +- [ ] T034 Run Pint on changed files via `vendor/bin/sail bin pint --dirty` +- [ ] T035 Run targeted test subset per quickstart: `vendor/bin/sail artisan test --compact --filter=OperationRun` and the new/changed test files + +--- + +## Dependencies & Execution Order + +### Phase Dependencies + +- Setup (Phase 1) → Foundational (Phase 2) → US1 (Phase 3) → US2 (Phase 4) → US3 (Phase 5) → Polish (Phase 6) + +### User Story Dependencies + +- US1 is the MVP: it enables stable canonical run creation + links. +- US2 depends on Foundational + US1 (viewer/auth semantics), but can be implemented in parallel once viewer auth is stable. +- US3 depends on Foundational + cache primitives, but can proceed after Foundational even if US2 is in progress. + +### Parallel Execution Examples + +- US1 parallelizable: T008 + T009 (tests) can be written in parallel; start-surface patches T010–T014 can be split across different files. +- US2 parallelizable: migrations T018–T020 can be done in parallel; legacy resource updates T024–T025 can be split by resource. +- US3 parallelizable: schema/model/factory T028 can be done while tests T026–T027 are being drafted.