TenantAtlas/specs/086-retire-legacy-runs-into-operation-runs/spec.md

# Feature Specification: Retire Legacy Runs Into Operation Runs

**Feature Branch**: `086-retire-legacy-runs-into-operation-runs`
**Created**: 2026-02-09
**Status**: Draft
**Input**: User description: "Retire legacy run tracking into canonical operation runs, with DB-only rendering and dispatch-time run creation. Legacy run tables remain read-only history."

## Clarifications

### Session 2026-02-10

- Q: For manual backup schedule runs (`backup_schedule.run_now`) and retries (`backup_schedule.retry`), should the system dedupe while a run is active, or always create a new run per click? → A: Always create a new run per click (no dedupe).
- Q: Who may view the canonical run detail page (“View run”)? → A: Workspace members may view runs only if they also have the required capability for that operation type; non-members get 404, members without capability get 403.
- Q: Which capability should be required to view a run (“View run”)? → A: Use the same capability as starting that operation type.
- Q: For `backup_schedule.scheduled`, how should dedupe work? → A: Strict dedupe per schedule and intended fire-time (at most one run).
- Q: For the role definitions cache “Sync now” operation, should it use a new dedicated operation type or reuse an existing one? → A: Use a new dedicated operation type.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Start an operation with an immediate canonical run link (Priority: P1)

As a workspace member, I can start long-running operations (inventory sync, directory groups sync, scheduled backups, restore execution, directory role definitions sync) and immediately receive a stable “View run” link that I can open and share.

**Why this priority**: This removes the “run link appears later / changes” ambiguity, improves auditability, and prevents duplicate tracking paths.

**Independent Test**: Trigger each supported operation start surface and verify a canonical run record exists before work begins, and that the canonical viewer loads from persisted state.

**Acceptance Scenarios**:

1. **Given** a workspace member with the required capability, **When** they start an inventory sync, **Then** a canonical run exists immediately and the UI shows a stable “View run” link.
2. **Given** a scheduled backup fire event, **When** the scheduler dispatches work, **Then** a canonical run exists immediately and the same fire event cannot create duplicates.
3. **Given** a workspace member without the required capability, **When** they attempt to start the operation, **Then** the request is rejected with a capability error (403) and no run is created.

---

### User Story 2 - Monitor executions from a single canonical viewer (Priority: P2)

As a workspace member, I can open an operations viewer link for any run and see status, progress, results, and errors without the page triggering outbound calls.

Legacy “run history” pages remain available for older historical rows but cannot start or retry anything.

**Why this priority**: A single viewer reduces support load, enables consistent deep linking, and avoids UI latency and rate-limiting from outbound calls.

**Independent Test**: Load the canonical viewer and legacy history pages using outbound client fakes/mocks and assert no outbound calls occur during rendering/search.

**Acceptance Scenarios**:

1. **Given** a run exists, **When** a user opens its canonical operations link, **Then** the page renders only from persisted state and performs no outbound calls.
2. **Given** a legacy run history record that has a known canonical mapping, **When** a user opens the legacy “view” page, **Then** they are redirected to the canonical operations viewer.
3. **Given** a legacy run history record without a canonical mapping, **When** a user opens the legacy “view” page, **Then** they see a read-only historical record and no new canonical run is created.
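
The independent test above could be approached with a guard client that fails loudly on any use. The following is a minimal sketch, not the project's test harness: `ForbiddenOutboundClient`, `OutboundCallAttempted`, and `render_run_page` are hypothetical names introduced only for illustration.

```python
class OutboundCallAttempted(RuntimeError):
    """Raised by the guard client when any method is invoked (hypothetical)."""


class ForbiddenOutboundClient:
    """Stand-in for an outbound API client; every method call raises."""

    def __getattr__(self, name):
        def _fail(*args, **kwargs):
            raise OutboundCallAttempted(f"outbound call attempted: {name}")
        return _fail


def render_run_page(run: dict, client) -> str:
    # Hypothetical renderer: it receives the client only so a test can prove
    # the client is never touched. All output comes from persisted run state.
    return f"{run['type']}: {run['status']} ({run['progress']}%)"
```

A test would inject the guard client into the page under test; any rendering, search, or label-resolution path that reaches for the client fails immediately instead of silently making a call.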

---

### User Story 3 - Use cached directory data in forms without blocking calls (Priority: P3)

As a workspace member configuring tenant-related settings, I can search/select directory groups and role definitions using cached data. If cached data is missing or stale, I can trigger an asynchronous sync (“Sync now”) without the form making outbound calls.

**Why this priority**: Prevents a slow, flaky UI and rate limiting caused by inline lookups, while keeping the configuration flow usable.

**Independent Test**: Render the configuration form and exercise search/label rendering while asserting outbound clients are not called.

**Acceptance Scenarios**:

1. **Given** cached directory groups exist, **When** the user searches for groups, **Then** results and labels come from cached data.
2. **Given** cached role definitions are missing, **When** the user opens the role definition selector, **Then** the UI indicates “data not available yet” and offers a non-destructive “Sync now” action.
3. **Given** the user triggers “Sync now”, **When** the sync starts, **Then** a canonical run is created immediately and the user can open its canonical “View run” link.

### Edge Cases

- A scheduler fires the same scheduled backup more than once for the same intended time.
- A user triggers the same sync while an identical sync is still active (dedupe/while-active semantics).
- A job fails before writing progress; the canonical run still exists and shows a clear failure state.
- A legacy history row exists but has no canonical mapping; it must remain viewable without creating new canonical runs.
- A non-member attempts to access a canonical operations link; response must be deny-as-not-found (404).
- A member lacks the required capability; start surfaces must reject the request (403) and the UI must show the corresponding actions as disabled.
- Cached directory data is empty or stale; UI must not block on outbound calls and must provide a safe way to sync.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature includes long-running/queued/scheduled work. The spec MUST describe tenant isolation, run observability (type/identity/visibility), and tests.

**Constitution alignment (RBAC-UX):** This feature changes authorization behavior and navigation paths. It MUST define 404 vs 403 semantics and ensure server-side enforcement for operation-start flows.

**Constitution alignment (OPS-EX-AUTH-001):** Outbound HTTP without a canonical run is not allowed on Monitoring/Operations pages.

**Constitution alignment (BADGE-001):** Any new/changed status presentation for runs MUST remain centralized and covered by tests.

**Constitution alignment (Admin UI Action Surfaces):** This feature changes multiple admin UI surfaces and MUST satisfy the UI Action Surface Contract (see matrix below).

### Functional Requirements

- **FR-001 (Canonical tracking)**: The system MUST treat the canonical run record as the single source of truth for execution tracking (status, progress, results, errors) for the in-scope operations.
- **FR-002 (Dispatch-time creation)**: Every start surface (UI action, console command, scheduler, internal service) MUST create the canonical run record before dispatching any asynchronous work.
- **FR-003 (No job fallback-create)**: Background workers MUST NOT create canonical run records as a fallback; missing run identifiers are treated as a fatal contract violation.
- **FR-004 (Canonical deep-link)**: The system MUST support exactly one canonical deep-link format for viewing runs which is tenantless and stable.
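
As an illustration of FR-002 and FR-003, the ordering contract can be sketched as follows. This is a simplified in-memory model under stated assumptions: `RunStore`, `start_operation`, `handle_job`, and `MissingRunError` are hypothetical names, and a plain list stands in for the job queue.

```python
import uuid
from dataclasses import dataclass, field


class MissingRunError(RuntimeError):
    """Raised when a worker receives a job without a known canonical run."""


@dataclass
class RunStore:
    """In-memory stand-in for the canonical run table (hypothetical)."""
    runs: dict = field(default_factory=dict)

    def create(self, operation_type: str) -> str:
        run_id = str(uuid.uuid4())
        self.runs[run_id] = {"type": operation_type, "status": "queued"}
        return run_id


def start_operation(store: RunStore, queue: list, operation_type: str) -> str:
    # FR-002: persist the run first, then dispatch work that carries its id.
    run_id = store.create(operation_type)
    queue.append({"run_id": run_id, "type": operation_type})
    return run_id  # the caller can show a "View run" link right away


def handle_job(store: RunStore, job: dict) -> None:
    # FR-003: a missing or unknown run id is fatal; the worker never creates one.
    run_id = job.get("run_id")
    if run_id is None or run_id not in store.runs:
        raise MissingRunError("job dispatched without a canonical run")
    store.runs[run_id]["status"] = "running"
```

Because the run id is returned before any job executes, every start surface can expose the link immediately, and a worker that sees no run id surfaces the contract violation rather than masking it.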

- **FR-005 (Membership + capability rules)**: Access to operation runs MUST follow these rules:
	- Non-members of the workspace scope MUST receive deny-as-not-found (404).
	- Workspace members who lack the required capability for the operation type MUST receive 403.
- **FR-005a (View capability mapping)**: “View run” MUST require the same capability as “Start” for the corresponding operation type.
- **FR-006 (DB-only rendering)**: Operations/monitoring and run viewer pages MUST render solely from persisted data and MUST NOT perform outbound calls during rendering/search/label resolution.
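
The 404 versus 403 ordering in FR-005 can be sketched as a single decision function. This is illustrative only: `Access` and `authorize_view_run` are hypothetical names, and the capability strings in the usage note are placeholders.

```python
from enum import Enum


class Access(Enum):
    OK = 200
    FORBIDDEN = 403
    NOT_FOUND = 404


def authorize_view_run(
    is_workspace_member: bool,
    capabilities: set[str],
    required_capability: str,
) -> Access:
    # Membership is checked first so a non-member cannot learn that a run exists.
    if not is_workspace_member:
        return Access.NOT_FOUND
    # FR-005a: the required capability is the one needed to start this operation type.
    if required_capability not in capabilities:
        return Access.FORBIDDEN
    return Access.OK
```

For example, `authorize_view_run(False, {"inventory.sync"}, "inventory.sync")` yields `Access.NOT_FOUND` even though the capability matches, because membership takes precedence.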

- **FR-007 (Legacy history read-only)**: Legacy run history records MUST remain viewable as historical data, but MUST be strictly read-only (no start/retry/execute actions).
- **FR-008 (Legacy redirects)**: If a legacy history record includes a canonical mapping, the legacy “view” page MUST redirect deterministically to the canonical viewer; otherwise it MUST display legacy-only history.
- **FR-009 (No new legacy rows)**: For the in-scope operations, the system MUST stop writing new legacy run history rows. Existing legacy history remains unchanged.

- **FR-010 (Scheduled backup classification)**: Scheduled backup executions MUST be represented with a distinct operation type (not conflated with manual runs).
- **FR-011 (Run identity & dedupe)**: The system MUST compute deterministic run identities for dedupe and scheduler double-fire protection, and MUST define whether each type dedupes “while active” or is strictly unique.
- **FR-011a (Backup manual runs are unique)**: Manual backup schedule runs (“run now”) and retries MUST be unique per user action (no while-active dedupe).
- **FR-011b (Scheduled backups are strict)**: Scheduled backup executions MUST use strict dedupe per schedule and intended fire-time (at most one canonical run ever per schedule per intended fire-time).
- **FR-012 (Inputs & provenance)**: The system MUST store operation inputs and provenance (target tenant/schedule, trigger source, optional initiating user) on the canonical run record.
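
One way to realize the identity rules for backup runs is sketched below. The hashing scheme, the helper names, and `RunRegistry` (an in-memory stand-in for a unique constraint on the identity) are assumptions, not the project's implementation; only the operation type strings come from this spec.

```python
import hashlib
import uuid
from datetime import datetime, timezone


def _digest(*parts: str) -> str:
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def scheduled_backup_identity(schedule_id: int, intended_fire_time: datetime) -> str:
    # Expects a timezone-aware datetime; normalising to UTC means the same
    # instant always produces the same identity.
    fire = intended_fire_time.astimezone(timezone.utc).replace(microsecond=0)
    return _digest("backup_schedule.scheduled", str(schedule_id), fire.isoformat())


def manual_backup_identity(schedule_id: int) -> str:
    # A random nonce per click makes every manual run unique (no dedupe).
    return _digest("backup_schedule.run_now", str(schedule_id), uuid.uuid4().hex)


class RunRegistry:
    """In-memory stand-in for a unique constraint on run identity."""

    def __init__(self) -> None:
        self._runs: dict[str, int] = {}

    def create_or_get(self, identity: str) -> tuple[int, bool]:
        # Returns (run_id, created). A repeated identity never creates a second run.
        if identity in self._runs:
            return self._runs[identity], False
        run_id = len(self._runs) + 1
        self._runs[identity] = run_id
        return run_id, True
```

With this shape, a scheduler double-fire for the same schedule and intended fire-time resolves to the existing run, while two clicks on “run now” always produce two distinct identities.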

- **FR-013 (Structured results)**: The system MUST store a standard, structured summary of results (counts) and failures (structured error entries) on the canonical run record.
- **FR-014 (Restore domain vs execution)**: Restore workflow domain records may remain as domain entities, but execution tracking and “View run” affordances MUST use the canonical run record exclusively.
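
A possible shape for the structured summary in FR-013 is sketched below. The field names (`counts`, `failures`, `code`, `message`, `subject`) are hypothetical; the spec requires counts and structured error entries but does not fix a schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class RunFailure:
    code: str
    message: str
    subject: Optional[str] = None  # e.g. the item that failed, if known


@dataclass
class RunSummary:
    counts: dict[str, int] = field(default_factory=dict)
    failures: list[RunFailure] = field(default_factory=list)

    def increment(self, key: str, by: int = 1) -> None:
        self.counts[key] = self.counts.get(key, 0) + by

    def fail(self, code: str, message: str, subject: Optional[str] = None) -> None:
        # Each failure bumps the "failed" count and keeps a structured entry.
        self.increment("failed")
        self.failures.append(RunFailure(code, message, subject))

    def to_payload(self) -> dict:
        # Plain dict suitable for persisting on the run record.
        return asdict(self)
```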

- **FR-015 (Cached directory data)**: The system MUST provide cached directory group data and cached role definition data to support search and label rendering in configuration forms without outbound calls.
- **FR-015a (Role definitions sync type)**: The role definitions cache sync MUST use a dedicated operation type (e.g., `directory_role_definitions.sync`) to keep identities, results, and auditability distinct from other sync operations.
- **FR-016 (Safe “Sync now”)**: When cached directory data is missing or stale, the UI MUST provide a non-destructive “Sync now” action that starts an asynchronous sync and immediately exposes the canonical run link.
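
A cache-only search for FR-015 and FR-016 could be sketched as follows. `CachedGroup`, `search_groups`, and the `sync_now` action key are hypothetical names; the point is that an empty cache produces a notice and an action instead of an outbound lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CachedGroup:
    id: str
    display_name: str


def search_groups(cache: Optional[list[CachedGroup]], term: str, limit: int = 20) -> dict:
    # No cache (or an empty one): report it and offer "Sync now"; never call out.
    if not cache:
        return {"results": [], "notice": "data not available yet", "actions": ["sync_now"]}
    needle = term.casefold()
    matches = [g for g in cache if needle in g.display_name.casefold()]
    return {"results": matches[:limit], "notice": None, "actions": []}
```

The same pattern would apply to role definitions; staleness detection is left out here because the spec does not define a freshness threshold.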

#### Assumptions

- A canonical run model/viewer already exists and is suitable for monitoring long-running operations.
- Outbound calls to external services are permitted only in asynchronous execution paths and are observable via the canonical run record.

#### Out of Scope

- Backfilling legacy history into canonical runs.
- Dropping/removing legacy run history tables.
- Introducing new cross-workspace analytics.

## UI Action Matrix *(mandatory when admin UI is changed)*

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Operations viewer | Canonical run viewer route | None | Open by canonical link | None | None | None | None | N/A | Yes (canonical run record metadata) | DB-only rendering; non-members get 404, members without the required capability get 403 |
| Inventory sync start | Inventory admin UI | Start sync | View run link appears after start | View run | None | None | N/A | N/A | Yes | Capability-gated; creates canonical run before dispatch |
| Directory groups sync start | Directory groups admin UI & console | Sync now | View run link appears after start | View run | None | Sync now (when cache empty) | N/A | N/A | Yes | Single dispatcher entry; legacy start actions removed |
| Backup schedule runs list | Backup schedule detail | None | List links open canonical viewer | View run | None | None | N/A | N/A | Yes | Includes scheduled/manual/retry runs; scheduled has distinct type |
| Tenant configuration selectors | Tenant settings forms | Sync now (when cache empty) | Search from cached data | None | None | Sync now | N/A | Save/Cancel | Yes | No outbound calls in search/label resolution |
| Legacy run history pages | Archive/history areas | None | View (read-only) | View only | None | None | None | N/A | Yes (historical) | No Start/Retry; redirect only if canonical mapping exists |

### Key Entities *(include if feature involves data)*

- **Canonical Run**: A single, shareable execution record containing type, identity, provenance, status, progress, results, and errors.
- **Legacy Run History Record**: A historical record for prior run-tracking paths; viewable but not mutable.
- **Managed Tenant**: The tenant context targeted by operations.
- **Backup Schedule**: A schedule configuration that can trigger executions automatically.
- **Restore Run (Domain Record)**: The domain workflow record for restore; links to canonical execution runs.
- **Directory Group Cache**: Cached group metadata used for searching/label rendering in forms.
- **Role Definition Cache**: Cached role definition metadata used for searching/label rendering in forms.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of newly started in-scope operations create a canonical run record before any asynchronous work is dispatched.
- **SC-002**: Over a 30-day staging observation window, 0 new legacy run history rows are created for in-scope operations.
- **SC-003**: Operations viewer and monitoring pages perform 0 outbound calls during rendering/search/label resolution (verified by automated tests).
- **SC-004**: For scheduled backups, duplicate scheduler fires for the same schedule and intended fire-time result in at most 1 canonical run.
- **SC-005**: Users can open a canonical “View run” link and see status/progress within 2 seconds in typical conditions.