TenantAtlas/specs/149-queued-execution-reauthorization/plan.md

# Implementation Plan: Queued Execution Reauthorization and Scope Continuity

**Branch**: `149-queued-execution-reauthorization` | **Date**: 2026-03-17 | **Spec**: [specs/149-queued-execution-reauthorization/spec.md](./spec.md)
**Input**: Feature specification from `/specs/149-queued-execution-reauthorization/spec.md`

**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/scripts/` for helper scripts.

## Summary

Introduce one canonical execution-legitimacy contract for queued tenant-affecting operations so work is re-authorized when the worker is actually about to act, not only when the operator clicked Start. The implementation will reuse the existing `OperationRun` observability model, `OperationRunService` blocked outcome flow, `TenantOperabilityService`, and write-hardening seams, but add a shared run-time execution gate and middleware ordering that can fail closed before any side effects.

This is a support-layer and job-orchestration hardening feature, not a new UI surface or persistence redesign. The plan therefore focuses on extending the queue execution path already present in `app/Jobs`, `app/Jobs/Middleware`, `app/Services/OperationRunService.php`, `app/Services/Providers`, `app/Services/Tenants`, and `app/Services/Hardening`, then migrating representative high-risk job families first: provider-backed queued runs, restore or write jobs, inventory or sync jobs, and bulk orchestrator families.

## Technical Context

**Language/Version**: PHP 8.4.15
**Primary Dependencies**: Laravel 12, Filament 5, Livewire 4, Pest 4, existing `OperationRunService`, `TrackOperationRun`, `ProviderOperationStartGate`, `TenantOperabilityService`, `CapabilityResolver`, and `WriteGateInterface` seams
**Storage**: PostgreSQL-backed application data plus queue-serialized `OperationRun` context; no schema migration planned for the first implementation slice
**Testing**: Pest 4 unit and feature coverage run through Laravel Sail
**Target Platform**: Laravel Sail web application with queue workers processing Filament-started and scheduled tenant-affecting operations
**Project Type**: Laravel monolith web application
**Performance Goals**: Execution legitimacy checks must complete synchronously before side effects, add no render-time remote calls, and keep per-job startup overhead limited to current authoritative DB lookups plus existing support-layer evaluation
**Constraints**: Preserve existing Ops-UX run lifecycle ownership, terminal notification rules, centralized badge semantics, Filament v5 plus Livewire v4 compliance, provider registration in `bootstrap/providers.php`, and current route contracts; no new Graph bypasses, no asset or panel changes, and no weakening of 404 versus 403 semantics
**Scale/Scope**: One shared execution-legitimacy contract, one queue-middleware or execution-gate integration path, representative adoption across provider, restore, inventory or sync, and bulk job families, plus focused regression coverage under `tests/Feature` and `tests/Unit`

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

**Pre-Phase 0 Gate: PASS**

- Inventory-first: PASS. This feature does not change inventory or snapshot ownership and only hardens when queued jobs may act.
- Read/write separation: PASS WITH HARDENING EMPHASIS. In-scope write paths remain queued, auditable, and confirmation-backed where already required. This feature strengthens fail-closed execution rather than broadening write authority.
- Graph contract path: PASS. No new Microsoft Graph domain is introduced, and existing provider or restore flows continue to rely on existing service layers instead of direct endpoint shortcuts.
- Deterministic capabilities: PASS. Capability enforcement remains tied to the canonical capability registry and resolver services; the feature extends when those checks happen.
- RBAC-UX planes: PASS. This feature stays within the admin `/admin` plane and tenant-context admin start actions. Platform `/system` is not broadened. Authorization semantics are unchanged: non-members still receive 404, and members lacking capability still receive 403.
- Workspace isolation: PASS. Execution legitimacy will re-check workspace and tenant scope using authoritative records, not UI memory.
- Tenant isolation: PASS. Tenant-bound queued work must still prove current tenant entitlement before acting.
- Destructive confirmation: PASS. No new destructive Filament action is introduced; existing start actions keep current confirmation rules.
- Global search safety: PASS. No global search behavior changes are planned.
- Run observability and Ops-UX: PASS. `OperationRun` remains the canonical observability record, start surfaces remain enqueue-only, Monitoring remains DB-only, and denied execution paths remain terminal run outcomes rather than ad-hoc notifications.
- Ops-UX lifecycle ownership: PASS. `OperationRun.status` and `OperationRun.outcome` remain service-owned through `OperationRunService`; the implementation must not let middleware or jobs bypass that rule.
- Ops-UX summary counts: PASS. Denied runs will continue using existing normalized summary counts and failure payload rules.
- Ops-UX guards: PASS WITH EXTENSION. Existing guard philosophy remains correct; this feature will add focused regression tests around execution-time denial rather than weaken current service-ownership rules.
- Ops-UX system runs: PASS. Scheduled or initiator-null runs remain visible in Monitoring without initiator-only terminal DB notifications.
- Automation and idempotency: PASS. Existing queue locks, idempotency, stale-queued handling, and dedupe contracts remain in force and become more reliable when legitimacy is rechecked before work starts.
- Data minimization: PASS. Denial reasons and audit entries will remain sanitized and secret-free.
- Badge semantics (BADGE-001): PASS. Existing blocked versus failed outcome semantics remain centralized through operation outcome helpers.
- UI naming (UI-NAMING-001): PASS. Operator-facing text continues to use domain wording such as `blocked`, `failed`, `queued`, and `View run`.
- Filament Action Surface Contract: PASS. Visible action inventories are unchanged; only their backend trust contract is hardened.
- Filament UX-001: PASS. No layout change is planned.
- Asset strategy: PASS. No new Filament or front-end assets are needed, so deployment guidance for `php artisan filament:assets` remains unchanged.

**Post-Phase 1 Re-check: PASS**

- The design extends existing support seams instead of introducing a second operation-run lifecycle model.
- No database migration, Graph-contract registry change, panel registration change, or asset build change is required for the first implementation slice.
- Livewire v4 and Filament v5 compliance remain intact, and provider registration stays in `bootstrap/providers.php`.
- Existing global-search requirements remain satisfied because no resource search contract is changed.

## Project Structure

### Documentation (this feature)

```text
specs/149-queued-execution-reauthorization/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── execution-legitimacy.schema.json
│   └── no-external-api-changes.md
└── tasks.md
```

### Source Code (repository root)

```text
app/
├── Contracts/
│   └── Hardening/
├── Jobs/
│   ├── Middleware/
│   └── Operations/
├── Models/
│   ├── OperationRun.php
│   ├── ProviderConnection.php
│   ├── Tenant.php
│   └── User.php
├── Notifications/
│   └── OperationRunCompleted.php
├── Policies/
├── Services/
│   ├── Hardening/
│   ├── Inventory/
│   ├── OperationRunService.php
│   ├── Providers/
│   ├── Tenants/
│   └── Verification/
└── Support/
    ├── Auth/
    ├── Badges/
    ├── Operation*/
    ├── OpsUx/
    ├── Providers/
    └── Tenants/
tests/
├── Feature/
│   ├── Operations/
│   ├── Rbac/
│   ├── Restore/
│   └── Verification/
└── Unit/
    ├── Jobs/
    ├── Operations/
    └── Tenants/
```

**Structure Decision**: Use the existing Laravel monolith and harden the queue execution boundary in-place. Shared execution legitimacy should live beside the current job, tenant-operability, provider-start, and `OperationRun` seams rather than in a new standalone subsystem.

## Phase 0 Research Summary

- `OperationRunService` already provides the core observability primitives needed for this feature: canonical queued runs, stale-queued failure handling, blocked terminal outcomes, sanitized failure payloads, and terminal audit plus notification emission.
- `ProviderOperationStartGate` is a strong dispatch-time gate for provider-backed operations, but it only validates legitimacy before enqueue and not again inside the worker. The new feature should extend the same contract to execution time rather than replacing it.
- `TrackOperationRun` currently marks runs as `running` before any legitimacy recheck. That ordering is too early for this spec because a denied-at-execution run must fail closed before side effects and ideally before being treated as a real running operation.
- The codebase already distinguishes a provider-blocked outcome through `OperationRunOutcome::Blocked`, `finalizeBlockedRun()`, and provider reason codes. That is the correct existing vocabulary to reuse for denied execution instead of inventing a parallel terminal state.
- Job families are inconsistent today. Provider Gen2 flows use an explicit start gate and connection resolution, restore or write jobs use localized `WriteGateInterface` checks inside the job, and other queued jobs often resolve tenant and user IDs directly without a canonical actor or scope continuity contract.
- `TenantOperabilityService` already centralizes tenant lifecycle and entitlement-aware decisions, but it has no queued-execution lane or execution-specific question. The first implementation slice should reuse its authority while introducing an execution-oriented decision seam rather than reintroducing local lifecycle checks in jobs.
- The audit-derived candidate notes already narrowed the must-answer questions for this feature: execution identity, non-operable tenant handling, retryable versus terminal denials, and `OperationRun` plus `AuditLog` representation. Those questions are fully resolved in `research.md` and carried into the design below.

## Phase 1 Design

### Implementation Approach

1. Add one shared execution-legitimacy contract ahead of side effects.
   - Introduce a support-layer decision boundary that evaluates whether a queued operation may still begin.
   - Keep `OperationRunService` as the lifecycle owner and treat the new decision layer as the gate that decides whether the run may transition from queued to meaningful execution.

2. Separate dispatch-time acceptance from execution-time legitimacy.
   - Keep existing dispatch gates such as `ProviderOperationStartGate` for enqueue-time checks, dedupe, and blocked preflight outcomes.
   - Add a second revalidation stage in the worker path so queue delay cannot bypass current authorization, scope, operability, or prerequisite truth.

3. Reuse existing blocked outcome semantics.
   - Represent execution-time refusal through `OperationRunOutcome::Blocked` plus structured reason codes and sanitized failure payloads.
   - Do not introduce a second terminal state just for execution reauthorization.

4. Distinguish human-bound authority from system authority explicitly.
   - Human-initiated runs remain actor-bound and must re-check the current actor's membership, entitlement, and capability at execution time.
   - Scheduled or initiator-null runs remain system-authority runs. They must re-check that system execution is allowed for the operation type, along with tenant operability and prerequisites, without being treated as user-authorized actions.
   - The allowed system execution policy comes from one canonical operation-type allowlist owned by the execution legitimacy gate and fed only by trusted scheduler or system entry paths.

5. Move the first legitimacy check before `TrackOperationRun` marks a run as running.
   - Either introduce a new queue middleware that executes before `TrackOperationRun` or refactor the existing middleware flow so legitimacy is evaluated first.
   - The queue worker must not mark a denied run as `running` before the legitimacy decision is known.

6. Scope the first implementation slice to representative high-risk job families.
   - Provider-backed runs already using `ProviderOperationStartGate`.
   - Restore or write jobs currently guarded by `WriteGateInterface` inside the worker.
   - Inventory or sync jobs that resolve tenant and user at execution time and already use `TrackOperationRun`.
   - Bulk orchestrator or worker families that fan out destructive tenant-affecting work.

7. Keep the first slice schema-free and asset-free.
   - Store new authority or denial metadata inside existing `OperationRun.context` and failure payloads.
   - Reuse current Monitoring pages, notifications, and badges.
   - Prove the metadata contract with focused regression coverage so blocked execution remains observable without adding persistence fields.
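
The ordering requirement in step 5 can be illustrated with a minimal, framework-free sketch. It assumes the Laravel job-middleware shape (`handle(object $job, Closure $next)`). `RunRecorder`, `ExecutionLegitimacyMiddleware`, the `$isLegitimate` callback, and `runThroughMiddleware()` are hypothetical names introduced only for illustration, and the `TrackOperationRun` class shown here is a simplified stub, not the project's actual middleware.

```php
<?php

// Hypothetical in-memory stand-in for the service-owned run lifecycle.
// In the real design, transitions would go through OperationRunService.
final class RunRecorder
{
    /** @var list<string> */
    public array $transitions = [];

    public function record(string $state): void
    {
        $this->transitions[] = $state;
    }
}

// Hypothetical gate middleware: evaluates legitimacy before anything marks the run as running.
final class ExecutionLegitimacyMiddleware
{
    public function __construct(
        private RunRecorder $runs,
        private Closure $isLegitimate, // assumed callback: fn (object $job): bool
    ) {}

    public function handle(object $job, Closure $next): void
    {
        if (! ($this->isLegitimate)($job)) {
            $this->runs->record('blocked');

            return; // Fail closed: later middleware and the job body never execute.
        }

        $next($job);
    }
}

// Simplified stub of run tracking; it only records the states it would set.
final class TrackOperationRun
{
    public function __construct(private RunRecorder $runs) {}

    public function handle(object $job, Closure $next): void
    {
        $this->runs->record('running');
        $next($job);
        $this->runs->record('completed');
    }
}

/**
 * Runs the middleware in list order, then the job body.
 *
 * @param list<object> $middleware
 */
function runThroughMiddleware(object $job, array $middleware, Closure $body): void
{
    $pipeline = array_reduce(
        array_reverse($middleware),
        fn (Closure $next, object $layer): Closure => fn (object $job) => $layer->handle($job, $next),
        $body,
    );

    $pipeline($job);
}

// Usage: a denied job is recorded as blocked and never reaches 'running'.
$runs = new RunRecorder();
$sideEffects = 0;

runThroughMiddleware(
    new stdClass(),
    [
        new ExecutionLegitimacyMiddleware($runs, fn (object $job): bool => false),
        new TrackOperationRun($runs),
    ],
    function (object $job) use (&$sideEffects): void {
        $sideEffects++;
    },
);

// $runs->transitions is ['blocked'] and $sideEffects is 0.
```

Placing the gate first in the middleware list is one of the two options named in step 5; refactoring the existing middleware so that it evaluates legitimacy internally would produce the same observable ordering.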

### Planned Workstreams

- **Workstream A: Execution legitimacy core model**
  Introduce or extend support-layer types for authority mode, execution context, legitimacy decision, denial classification, reason codes, and the initial retryability mapping. Keep them close to the existing operation and tenant support layers.

- **Workstream B: Queue middleware and lifecycle ordering**
  Refactor queue execution entry so legitimacy is evaluated before `TrackOperationRun` marks a run as `running`, while preserving service-owned run transitions and retry-safe behavior.

- **Workstream C: Representative job-family adoption**
  Apply the shared contract to provider-backed jobs, restore or write jobs, inventory or sync jobs, and at least one bulk orchestrator family so the new contract is proven across different execution shapes.

- **Workstream D: Denial observability and audit semantics**
  Normalize execution-time denial into blocked run outcomes, structured reason codes, Monitoring detail messaging, summary-count-safe payloads, and audit events that clearly separate policy refusal from runtime failure.

- **Workstream E: Regression hardening**
  Add focused Pest coverage for allowed paths, lost capability, lost entitlement, tenant non-operability, system-run behavior, retry behavior, representative job-family adoption, and direct-access 404 versus 403 semantics on canonical operations surfaces.
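
As a rough illustration of the Workstream A types, the sketch below models authority mode, denial reason codes, a retryability mapping, and a decision object that serializes into context metadata. Every name, reason code, context key, and the `inventory.sync` operation type is hypothetical. The retryability mapping in particular is an assumption, since the actual mapping is defined in `research.md`. Facts such as entitlement and capability are passed in as plain booleans here; a real gate would resolve them from authoritative records.

```php
<?php

// All names below are hypothetical illustrations, not existing project classes.
enum AuthorityMode: string
{
    case ActorBound = 'actor_bound';
    case SystemAuthority = 'system_authority';
}

enum DenialReason: string
{
    case EntitlementLost = 'entitlement_lost';
    case CapabilityLost = 'capability_lost';
    case TenantNotOperable = 'tenant_not_operable';
    case SystemOperationNotAllowed = 'system_operation_not_allowed';
    case PrerequisiteUnavailable = 'prerequisite_unavailable';

    // Assumed mapping: only a missing prerequisite is treated as retryable.
    public function isRetryable(): bool
    {
        return $this === self::PrerequisiteUnavailable;
    }
}

final readonly class LegitimacyDecision
{
    public function __construct(
        public AuthorityMode $authority,
        public ?DenialReason $reason = null,
    ) {}

    public function allowed(): bool
    {
        return $this->reason === null;
    }

    /** Metadata shaped for an existing JSON context; the keys are assumptions. */
    public function toContext(): array
    {
        return [
            'execution_authority' => $this->authority->value,
            'execution_allowed' => $this->allowed(),
            'denial_reason' => $this->reason?->value,
            'retryable' => $this->reason?->isRetryable() ?? false,
        ];
    }
}

/**
 * Evaluates legitimacy from pre-resolved facts. Entitlement is checked before
 * capability so a lost membership is never reported as a capability problem.
 *
 * @param list<string> $systemAllowlist operation types permitted without an initiator
 */
function evaluateLegitimacy(
    string $operationType,
    bool $hasInitiator,
    bool $initiatorEntitled,
    bool $initiatorHasCapability,
    bool $tenantOperable,
    bool $prerequisitesMet,
    array $systemAllowlist,
): LegitimacyDecision {
    $authority = $hasInitiator ? AuthorityMode::ActorBound : AuthorityMode::SystemAuthority;

    $reason = match (true) {
        ! $hasInitiator && ! in_array($operationType, $systemAllowlist, true) => DenialReason::SystemOperationNotAllowed,
        $hasInitiator && ! $initiatorEntitled => DenialReason::EntitlementLost,
        $hasInitiator && ! $initiatorHasCapability => DenialReason::CapabilityLost,
        ! $tenantOperable => DenialReason::TenantNotOperable,
        ! $prerequisitesMet => DenialReason::PrerequisiteUnavailable,
        default => null,
    };

    return new LegitimacyDecision($authority, $reason);
}

// Usage: an initiator who lost the capability while the job was queued.
$decision = evaluateLegitimacy(
    operationType: 'inventory.sync',
    hasInitiator: true,
    initiatorEntitled: true,
    initiatorHasCapability: false,
    tenantOperable: true,
    prerequisitesMet: true,
    systemAllowlist: ['inventory.sync'],
);

// $decision->toContext()['denial_reason'] is 'capability_lost'.
```

Returning a single decision object, rather than throwing, keeps the policy refusal distinct from runtime failure and gives the lifecycle owner one value to translate into a blocked outcome.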

### Testing Strategy

- Add unit tests for the execution-legitimacy decision layer, covering actor-bound and system-authority contexts plus structured denial reasons.
- Add unit tests for the canonical system-authority allowlist and the initial retryability mapping so gate decisions stay deterministic across job families.
- Add unit or integration tests for queue-middleware ordering to prove a run is not marked `running` before legitimacy passes.
- Add focused feature tests for representative provider-backed jobs showing dispatch-time acceptance plus execution-time denial when connection or scope truth changes.
- Add focused feature tests for representative provider-backed, restore, and system-authority flows showing still-legitimate execution continues successfully without false denial.
- Add focused feature tests for restore or write jobs showing existing write-gate checks are folded into the canonical execution contract rather than left as isolated local patterns.
- Add focused feature tests for inventory or sync jobs showing lost capability, lost tenant membership, and non-operable tenant outcomes are blocked before execution.
- Add focused feature tests for at least one bulk orchestrator family showing retries perform a fresh legitimacy evaluation and blocked execution remains observable.
- Add focused tests proving initiator-null runs do not emit initiator-only terminal database notifications while still recording blocked terminal outcomes in Monitoring.
- Add or update run-detail, notification, and canonical operations access tests to prove blocked execution remains distinct from generic failure while preserving 404 versus 403 semantics.
- Add focused tests proving blocked-run summary counts remain normalized through the canonical summary key contract.
- Add focused tests proving authority and denial metadata stays inside existing `OperationRun` context and failure payload structures with no schema change.
- Run the minimum focused Pest suite through Sail; no full-suite run is required for planning artifacts.

## Complexity Tracking

No constitution violations or exceptional complexity are planned at this stage.