TenantAtlas/specs/149-queued-execution-reauthorization/spec.md

# Feature Specification: Queued Execution Reauthorization and Scope Continuity

**Feature Branch**: `149-queued-execution-reauthorization`
**Created**: 2026-03-17
**Status**: Draft
**Input**: User description: "Queued Execution Reauthorization and Scope Continuity"

## Spec Scope Fields *(mandatory)*

- **Scope**: workspace
- **Primary Routes**:
  - `/admin/t/{tenant}/...`
  - `/admin/operations`
  - `/admin/operations/{run}`
  - Tenant-context and admin-plane surfaces that start queued tenant-affecting operations
  - Scheduled and system-triggered operation entry paths that enqueue tenant-affecting work
- **Data Ownership**:
  - `OperationRun` remains the canonical workspace-owned observability record.
  - Queued jobs may act on tenant-owned records, but execution legitimacy must be re-evaluated against current workspace and tenant scope at run time.
  - This feature does not change ownership boundaries and does not introduce a second queue identity model.
- **Implementation Slice**:
  - For this first implementation slice, the in-scope queued operation families, each explicitly adopted by this spec, are: provider-backed queued runs, restore or write jobs, inventory or sync jobs, bulk orchestrator or worker families, and scheduled backup runs.
  - Other queued families are out of scope for this slice until they are explicitly adopted into the same canonical execution legitimacy contract.
- **RBAC**:
  - Authorization planes involved: admin `/admin` routes, tenant-context admin surfaces, and queued execution that resumes later from those starts.
  - Workspace or tenant non-members remain deny-as-not-found.
  - In-scope members who lack the required capability at execution time remain forbidden in authorization semantics and must be represented to operators as a blocked execution outcome with a clear denial reason, not as a successful run.
  - Canonical Operations surfaces keep their current filter semantics when tenant context is active; this feature changes blocked-outcome meaning and access continuity, not tenant-context prefilter behavior on `/admin/operations` or `/admin/operations/{run}`.
  - Platform `/system` access is not broadened by this feature.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Stop Invalid Queued Mutations Before They Start (Priority: P1)

As a workspace operator, I need a queued operation to prove it is still legitimate when it actually begins, so that a previously allowed click cannot mutate tenant state after access, tenant operability, or scope conditions have changed.

**Why this priority**: This is the core trust boundary. If queued work can outlive the legitimacy of the actor or tenant scope that started it, every downstream governance feature inherits a backend integrity problem.

**Independent Test**: Can be fully tested by enqueueing a tenant-affecting operation, changing authorization or tenant operability before execution starts, and confirming the job is refused before any remote or local mutation work begins.

**Acceptance Scenarios**:

1. **Given** a user queues a tenant-affecting operation and then loses the required capability before the worker starts, **When** execution begins, **Then** the operation is refused before any mutation work occurs.
2. **Given** a user queues a tenant-affecting operation and the tenant becomes non-operable before execution begins, **When** execution begins, **Then** the operation is refused before any mutation work occurs.
3. **Given** a queued operation starts and the actor remains entitled and the tenant remains operable, **When** execution begins, **Then** the operation proceeds normally.
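
The ordering these scenarios require can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the product's implementation: `LiveState`, `check_legitimacy`, and `run_queued_job` are hypothetical names, and the two boolean inputs stand in for whatever the canonical authorities report when the worker starts. What matters is that the check runs inside the worker, against freshly loaded state, before the first side effect.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LiveState:
    """Hypothetical snapshot of authorization state, read when the worker starts."""
    actor_has_capability: bool
    tenant_operable: bool


@dataclass
class JobResult:
    outcome: str                                  # "succeeded" or "blocked"
    denial_reason: Optional[str] = None
    mutations: list = field(default_factory=list)


def check_legitimacy(state: LiveState) -> Optional[str]:
    """Return a denial reason, or None when execution may begin."""
    if not state.actor_has_capability:
        return "capability_denied"
    if not state.tenant_operable:
        return "tenant_not_operable"
    return None


def run_queued_job(load_live_state: Callable[[], LiveState],
                   mutate: Callable[[], str]) -> JobResult:
    # Re-read state inside the worker; nothing remembered from dispatch is trusted.
    denial = check_legitimacy(load_live_state())
    if denial is not None:
        # Refuse before any local or remote mutation is attempted.
        return JobResult(outcome="blocked", denial_reason=denial)
    return JobResult(outcome="succeeded", mutations=[mutate()])
```

Passing `mutate` as a callable makes the guarantee testable: a blocked path can assert that the callable was never invoked.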

---

### User Story 2 - Understand Why A Queued Operation Was Refused (Priority: P1)

As a workspace operator, I need refused queued work to appear as an explicit blocked outcome with clear audit and operations visibility, so that I can distinguish trust-policy enforcement from ordinary runtime failure.

**Why this priority**: Operators and auditors need to know whether the system protected the tenant intentionally or simply crashed. Without that distinction, monitoring data becomes misleading.

**Independent Test**: Can be fully tested by forcing an execution-time block and verifying that the resulting operation history, audit trail, and operator-facing outcome clearly identify a blocked execution rather than a generic failure.

**Acceptance Scenarios**:

1. **Given** a queued operation is refused at execution time, **When** the operator opens operation history, **Then** the run shows a terminal blocked outcome with a structured denial reason that distinguishes it from ordinary processing failure.
2. **Given** a user-triggered queued operation is refused at execution time, **When** the run becomes terminal, **Then** the initiator receives the blocked outcome through the canonical terminal operation feedback path.
3. **Given** a scheduled or system-triggered queued operation is refused at execution time, **When** the run becomes terminal, **Then** the outcome is visible in Monitoring and audit surfaces without creating an initiator-only terminal database notification.
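
The difference between user-initiated and scheduled or system-triggered blocked runs comes down to a single branch. In this hypothetical Python sketch, the strings in the returned list only label side effects. `BlockedRun` and `terminal_side_effects` are illustrative names, not existing APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BlockedRun:
    run_id: int
    initiator_user_id: Optional[int]   # None for scheduled or system-triggered runs
    denial_reason: str


def terminal_side_effects(run: BlockedRun) -> List[str]:
    """List the side effects emitted when a blocked run becomes terminal."""
    effects = [
        # Every blocked run is visible in Monitoring and audited.
        f"monitoring:run:{run.run_id}:blocked:{run.denial_reason}",
        f"audit:execution_blocked:{run.run_id}",
    ]
    if run.initiator_user_id is not None:
        # Only user-initiated runs get the canonical terminal notification.
        effects.append(f"notify:user:{run.initiator_user_id}:run:{run.run_id}")
    return effects
```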

---

### User Story 3 - Enforce One Trust Contract Across Queued Job Families (Priority: P2)

As a product owner, I need queued execution legitimacy to follow one canonical contract across high-risk job families, so that future features do not keep re-implementing local authorization checks with inconsistent results.

**Why this priority**: A local patch in one job family does not solve the architectural gap. The strategic value comes from a reusable execution contract.

**Independent Test**: Can be fully tested by applying the same blocked-path and allowed-path checks to multiple in-scope queued operation families and confirming they produce the same legitimacy and observability behavior.

**Acceptance Scenarios**:

1. **Given** two different queued tenant-affecting operation types are started under the same actor and tenant conditions, **When** execution-time legitimacy changes before either job starts, **Then** both jobs follow the same blocked-outcome and denial-reason semantics.
2. **Given** an in-scope queued operation is retried after a previous blocked execution, **When** the retry starts, **Then** execution-time legitimacy is checked again instead of inheriting the previous attempt's result.
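
Scenario 2 implies that no attempt may reuse an earlier decision. The following is a minimal Python sketch. It assumes a hypothetical `evaluate` callable that re-reads live state and returns either a denial code or `None`. The loop stands in for successive queue attempts, which in practice the queue worker would drive.

```python
from typing import Callable, List, Optional


def run_with_retries(evaluate: Callable[[], Optional[str]],
                     execute: Callable[[], None],
                     max_attempts: int,
                     retryable: Callable[[str], bool]) -> List[str]:
    """Return a per-attempt log; legitimacy is evaluated fresh on every attempt."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        denial = evaluate()  # never reuse the previous attempt's decision
        if denial is None:
            execute()
            log.append(f"attempt {attempt}: executed")
            return log
        log.append(f"attempt {attempt}: blocked ({denial})")
        if not retryable(denial):
            # Terminal denial: stop without scheduling another attempt.
            return log
    return log
```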

### Edge Cases

- The actor remains a workspace member but loses tenant membership before execution begins.
- The actor remains a tenant member but loses the specific capability required for the queued operation before execution begins.
- The tenant is archived, discarded, or otherwise becomes non-operable after dispatch but before execution.
- The queued run is retried after an earlier blocked attempt and current legitimacy has changed again.
- The queued run was created by a scheduler or system actor with no interactive initiator.
- The target record still exists, but the provider connection or other execution prerequisite is no longer valid at run time.
- A long queue delay means the selected tenant context in the browser is irrelevant or stale by the time execution starts.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature hardens existing queued and long-running tenant-affecting work. It introduces no new Microsoft Graph domain, but every in-scope execution path that can mutate state or call remote mutation endpoints must revalidate legitimacy before work begins. Existing contract-registry discipline remains unchanged: no direct Graph bypasses are allowed. Safety gates become two-phase: dispatch-time acceptance plus execution-time legitimacy recheck. Tenant isolation remains mandatory at execution time, not only at dispatch. In-scope operations continue to use `OperationRun` as the canonical run record, and blocked execution paths must remain auditable and test-covered.

**Constitution alignment (OPS-UX):** This feature reuses existing `OperationRun` types and must comply with the Ops-UX 3-surface feedback contract. Start surfaces remain intent-only and may only show queued feedback. Progress remains visible only in the active-ops widget and run-detail surfaces. Execution-time blocked outcomes are terminal outcomes and must use the canonical terminal notification path for user-initiated runs. `OperationRun.status` and `OperationRun.outcome` transitions remain service-owned through `OperationRunService`. Any summary counts written for blocked runs must continue to use `OperationSummaryKeys::all()` with flat numeric-only values. Scheduled and system-triggered runs continue to have no initiator terminal database notification.
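
A small validator can enforce the summary-count rule. In this hypothetical Python sketch, `allowed_keys` stands in for the result of `OperationSummaryKeys::all()`, whose actual contents this spec does not list. `validate_summary_counts` is an illustrative name.

```python
from numbers import Real
from typing import AbstractSet, Mapping


def validate_summary_counts(counts: Mapping[str, object],
                            allowed_keys: AbstractSet[str]) -> dict:
    """Reject unknown keys, nested structures, and non-numeric values."""
    clean = {}
    for key, value in counts.items():
        if key not in allowed_keys:
            raise ValueError(f"unknown summary key: {key}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"summary value for {key} must be numeric")
        clean[key] = value
    return clean
```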

**Constitution alignment (RBAC-UX):** This feature changes authorization behavior in the admin `/admin` plane and tenant-context admin surfaces by extending authorization continuity from dispatch time to execution time. Cross-plane access remains deny-as-not-found. 404 semantics still mean the actor is not entitled to the workspace or tenant scope. 403 semantics still mean the actor is in scope but lacks the required capability. Execution-time enforcement must be server-side and must rely on the canonical capability registry, current workspace and tenant entitlement, and current tenant operability rather than remembered UI context or dispatch-time assumptions. Global search behavior is unchanged. Existing destructive start actions remain confirmation-protected where already required.
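
The 404 versus 403 distinction for direct resource access can be expressed as an ordered check. This Python sketch is illustrative only. `AccessResult` and `resolve_direct_access` are hypothetical, and the boolean inputs stand in for the canonical membership and capability lookups. The sketch does not model execution-time refusal of a queued job, because that refusal surfaces as a blocked run outcome rather than as an HTTP response.

```python
from enum import IntEnum


class AccessResult(IntEnum):
    ALLOWED = 200
    FORBIDDEN = 403   # in scope, but missing the required capability
    NOT_FOUND = 404   # not entitled to the workspace or tenant scope


def resolve_direct_access(workspace_member: bool,
                          tenant_bound: bool,
                          tenant_member: bool,
                          has_capability: bool) -> AccessResult:
    # Scope entitlement is checked first (deny-as-not-found).
    if not workspace_member or (tenant_bound and not tenant_member):
        return AccessResult.NOT_FOUND
    if not has_capability:
        return AccessResult.FORBIDDEN
    return AccessResult.ALLOWED
```

Checking scope before capability prevents a 403 from revealing to non-members that a resource exists.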

**Constitution alignment (OPS-EX-AUTH-001):** Not applicable. This feature concerns queued tenant operations and Monitoring, not `/auth/*` handshake exceptions.

**Constitution alignment (BADGE-001):** This feature refines blocked execution presentation on operations surfaces so blocked remains the canonical terminal outcome while structured denial reasons explain why execution was refused. Any resulting status or outcome presentation must remain centralized through the existing operation outcome or badge semantics rather than page-local mappings.

**Constitution alignment (UI-NAMING-001):** The target object is the queued operation run. Operator-facing verbs remain `Start`, `View run`, and existing operation-specific verbs such as `Sync`, `Verify`, or `Restore`. New operator-facing outcome and audit language must use consistent domain wording such as `execution blocked`, `authorization changed`, `tenant no longer operable`, or `execution prerequisites no longer valid`. Internal implementation phrases must not become the primary operator vocabulary.

**Constitution alignment (Filament Action Surfaces):** This feature modifies the trust semantics behind existing Filament start actions and Monitoring detail surfaces. The Action Surface Contract remains satisfied because the visible surface change is limited to start behavior and run-outcome semantics; no new destructive action family is introduced.

**Constitution alignment (UX-001 — Layout & Information Architecture):** This feature does not introduce new Create, Edit, or View layouts. Existing start surfaces and operation-detail surfaces keep their current structure. UX impact is limited to clearer blocked-execution outcomes and trust-safe operator messaging.

### Functional Requirements

- **FR-149-001**: The system MUST require execution-time legitimacy revalidation for every in-scope queued operation before the first local mutation, remote mutation, or irreversible external side effect occurs.
- **FR-149-002**: Dispatch-time authorization alone MUST NOT be sufficient to permit later queued execution.
- **FR-149-003**: Execution-time legitimacy revalidation MUST evaluate current workspace entitlement, current tenant entitlement when the run is tenant-bound, current capability requirements for the specific operation type, current tenant operability, and current execution prerequisites required for that operation family.
- **FR-149-004**: If execution-time legitimacy fails, the system MUST refuse the queued operation before any in-scope mutation work begins.
- **FR-149-005**: If execution-time legitimacy fails, the system MUST produce an explicit terminal blocked operation outcome that distinguishes policy enforcement from ordinary runtime failure.
- **FR-149-006**: If execution-time legitimacy fails for a user-initiated run, the system MUST deliver the canonical terminal operation feedback to the initiating user and MUST NOT emit ad-hoc blocked-execution notifications outside the existing operation feedback contract.
- **FR-149-007**: If execution-time legitimacy fails for a scheduled or system-triggered run, the system MUST record the outcome in Monitoring and audit surfaces without creating an initiator-only terminal database notification.
- **FR-149-008**: The system MUST record a structured denial reason that distinguishes at least capability loss, membership or scope loss, tenant non-operability, and execution-prerequisite failure.
- **FR-149-009**: Execution-time legitimacy evaluation MUST use canonical authorization and tenant-context authorities and MUST NOT rely on selected browser tenant context, remembered UI state, or copied dispatch-time role strings.
- **FR-149-010**: In-scope queued job retries MUST perform a fresh execution-time legitimacy recheck for each attempt.
- **FR-149-011**: In-scope queued operations that remain legitimate at execution time MUST continue through the existing operation flow without added operator friction beyond the execution-time recheck.
- **FR-149-012**: This feature MUST define one reusable execution legitimacy contract for in-scope queued operation families instead of requiring job-specific ad-hoc authorization patches.
- **FR-149-013**: The reusable execution legitimacy contract MUST support both actor-initiated runs and scheduled or system-initiated runs without collapsing them into the same identity semantics.
- **FR-149-014**: For actor-initiated runs, execution-time legitimacy MUST remain bound to the current actor's live authorization and scope relationship, not merely to the fact that the actor originally clicked the action.
- **FR-149-015**: For scheduled or system-initiated runs, execution-time legitimacy MUST remain bound to the allowed system execution policy and current tenant operability, even when no interactive actor exists.
- **FR-149-016**: Operation history for blocked execution MUST remain viewable through the canonical operations surfaces and MUST explain enough for operators to understand what changed.
- **FR-149-017**: Audit logging for blocked execution MUST capture the operation type, affected workspace and tenant context when present, denial class, and acting identity category without revealing secrets.
- **FR-149-018**: This feature MUST preserve existing deny-as-not-found versus forbidden semantics for direct resource access while representing execution-time refusal as a run outcome rather than a silent disappearance of the run.
- **FR-149-019**: In-scope implementations MUST NOT allow execution-time authorization continuity gaps to be resolved only in UI code, Filament visibility logic, or dispatch-time controller or Livewire action logic.
- **FR-149-020**: Regression coverage for this feature MUST include at least one allowed execution path, one lost-capability path, one lost-membership or wrong-scope path, one tenant-non-operable path, one scheduled-or-system path, and one retry path.
- **FR-149-021**: Execution-authority and denial metadata for this feature MUST be stored within existing `OperationRun` context and failure payload structures, and the first implementation slice MUST NOT require a schema migration to represent blocked execution decisions.
- **FR-149-022**: The allowed system execution policy for scheduled or initiator-null runs MUST be resolved from one canonical operation-type allowlist owned by the execution legitimacy gate and populated only by trusted scheduler or system entry paths, not by job-local ad-hoc checks.
- **FR-149-023**: Retryability MUST be determined centrally by the execution legitimacy contract using a stable initial rule set: `scope_denied`, `capability_denied`, and `initiator_invalid` are terminal (`retryable=false`), while `tenant_not_operable` and `prerequisite_invalid` are retryable (`retryable=true`) and must be re-evaluated fresh on each attempt.
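
FR-149-003, FR-149-008, FR-149-022, and FR-149-023 together describe a single gate. The following is a minimal Python sketch under stated assumptions:

- The `LiveContext` fields stand in for lookups against the canonical authorities.
- Operation type names such as `scheduled_backup` are hypothetical.
- This sketch maps both a missing actor and a non-allowlisted system run to `initiator_invalid`. That mapping is an interpretation, not something the spec fixes.

Apart from the initiator check, the order of checks follows the order in which FR-149-003 lists its inputs.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Retryability per FR-149-023, decided centrally rather than by individual jobs.
RETRYABLE: Dict[str, bool] = {
    "scope_denied": False,
    "capability_denied": False,
    "initiator_invalid": False,
    "tenant_not_operable": True,
    "prerequisite_invalid": True,
}


@dataclass(frozen=True)
class LiveContext:
    """Hypothetical facts re-read from canonical authorities at execution time."""
    operation_type: str
    actor_exists: bool
    is_system_run: bool
    workspace_member: bool
    tenant_bound: bool
    tenant_member: bool
    has_capability: bool
    tenant_operable: bool
    prerequisites_valid: bool


@dataclass(frozen=True)
class Decision:
    allowed: bool
    denial: Optional[str] = None

    @property
    def retryable(self) -> bool:
        return self.denial is not None and RETRYABLE[self.denial]


def evaluate(ctx: LiveContext, system_allowlist: FrozenSet[str]) -> Decision:
    if ctx.is_system_run:
        # System runs are bound to the allowlist, not to an actor's permissions.
        if ctx.operation_type not in system_allowlist:
            return Decision(False, "initiator_invalid")
    else:
        if not ctx.actor_exists:
            return Decision(False, "initiator_invalid")
        if not ctx.workspace_member or (ctx.tenant_bound and not ctx.tenant_member):
            return Decision(False, "scope_denied")
        if not ctx.has_capability:
            return Decision(False, "capability_denied")
    if ctx.tenant_bound and not ctx.tenant_operable:
        return Decision(False, "tenant_not_operable")
    if not ctx.prerequisites_valid:
        return Decision(False, "prerequisite_invalid")
    return Decision(True)
```

This sketch keeps `RETRYABLE` next to the gate instead of in each job, which matches FR-149-023. A job consults `decision.retryable` and never decides retryability on its own.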

## UI Action Matrix *(mandatory when Filament is changed)*

If this feature adds or modifies any Filament Resource / RelationManager / Page, fill out the matrix below.

For each surface, list the exact action labels, whether they are destructive (confirmation? typed confirmation?),
RBAC gating (capability + enforcement helper), and whether the mutation writes an audit log.

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Tenant and admin start surfaces | Existing admin and tenant-context pages that enqueue tenant-affecting work | Existing surface-specific actions unchanged | Existing links or row inspection unchanged | Existing start verbs such as `Sync`, `Verify`, `Restore`, or equivalent in-scope start actions | Existing grouped bulk actions unchanged in this spec | Existing empty-state CTAs unchanged | Not applicable | Not applicable | Yes for blocked execution and existing sensitive mutations | Start actions remain dispatch-time intent surfaces only. This spec adds execution-time legitimacy semantics behind them rather than a new visible action family. |
| Operations index | `/admin/operations` | Existing scope and filter controls unchanged | Clickable row or primary linked run identifier leading to `View run` detail | None | Existing grouped bulk actions unchanged | Existing empty state unchanged | Not applicable | Not applicable | Existing audit model unchanged | Existing inspect behavior is treated as the canonical affordance for this modified surface; no lone `View run` row action is introduced by this spec. Blocked execution outcomes must be legible and distinct from generic failure. |
| Operation run detail | `/admin/operations/{run}` | Existing navigation actions unchanged | Canonical run detail page | None | None | Not applicable | Existing `View` and related follow-up actions unchanged | Not applicable | Existing audit model unchanged | This spec changes outcome semantics and explanation, not page category or layout. |

### Key Entities *(include if feature involves data)*

- **Queued Operation Request**: A user-initiated or system-initiated instruction that has been accepted for later execution but that does not gain permission to mutate anything merely by having been queued.
- **Execution Legitimacy Decision**: The authoritative run-time answer to whether a queued operation may still begin, based on current scope, capability, operability, and prerequisites.
- **Operation Run**: The canonical workspace-owned observability record that tracks queued intent, execution progress, blocked execution, success, or failure.
- **Execution Denial Audit Event**: The audit-trail representation of a queued operation that was intentionally refused before work began.
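
Building an Execution Denial Audit Event from an explicit field whitelist prevents secrets from leaking by accident. In this hypothetical Python sketch, the fields mirror FR-149-017. The `action` string and the function name are illustrative, not existing identifiers.

```python
from typing import Optional


def build_denial_audit_event(operation_type: str,
                             workspace_id: int,
                             tenant_id: Optional[int],
                             denial_class: str,
                             initiator_user_id: Optional[int]) -> dict:
    """Copy only named, non-secret fields; raw job payloads are never passed in."""
    event = {
        "action": "operation.execution_blocked",
        "operation_type": operation_type,
        "workspace_id": workspace_id,
        "denial_class": denial_class,
        # Record only the category of actor, keeping user and system runs distinct.
        "actor_category": "user" if initiator_user_id is not None else "system",
    }
    if tenant_id is not None:
        event["tenant_id"] = tenant_id
    return event
```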

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-149-001**: In focused regression coverage, 100% of in-scope queued operations that lose legitimacy before execution are refused before any mutation work begins.
- **SC-149-002**: In focused regression coverage, 0 blocked execution cases are reported as generic runtime failures when the real outcome is policy enforcement.
- **SC-149-003**: In focused regression coverage, 100% of still-legitimate queued operations proceed successfully through the existing execution path without false blocking.
- **SC-149-004**: In focused regression coverage, 100% of covered retry attempts perform a fresh execution-time legitimacy check.
- **SC-149-005**: In focused regression coverage, 100% of covered scheduled or system-initiated blocked cases remain visible in Monitoring without producing an initiator-only terminal database notification.
- **SC-149-006**: In focused review of in-scope queued job families, every covered family uses the same canonical execution legitimacy contract rather than a new local authorization pattern.
- **SC-149-007**: In focused RBAC regression coverage, 100% of direct access attempts to canonical operations surfaces preserve deny-as-not-found for non-entitled actors and forbidden for in-scope capability denial.

## Assumptions

- Spec 144 and Spec 148 have already established the broader workspace-trust and tenant-operability direction that this feature extends into queued execution.
- Existing queued operation families already create or reuse canonical `OperationRun` records and do not need a second observability model.
- Existing capability registry and tenant-operability authorities are available and should be reused rather than replaced.
- This feature prioritizes tenant-affecting queued work and does not require every read-only background activity in the product to adopt the same contract immediately.

## Dependencies

- Existing operations semantics and `OperationRun` lifecycle rules
- Audit log foundation
- Canonical tenant context and operability hardening work
- In-scope queued job families that can mutate tenant-affecting state or call remote mutation endpoints

## Risks

- Applying the contract only to one or two jobs would leave the architecture vulnerable while creating a false sense of closure.
- Overloading denial reasons with low-value technical detail could make operator messaging noisier instead of clearer.
- Treating tenant operability as optional at execution time would preserve a key class of stale-legitimacy bugs.
- Treating scheduled runs as if they were actor-initiated runs would blur audit meaning and notification behavior.

## Final Direction

Queued work must be legitimate twice: once when the system accepts the intent, and again when the worker is actually about to act. This feature makes execution-time legitimacy a first-class contract for tenant-affecting queued operations so the platform can safely delay work without delaying trust checks.