Phase 0 Research: Queued Execution Reauthorization and Scope Continuity

Decision: Extend the existing `OperationRun` and queue middleware seam instead of creating a second execution framework

Rationale: The repo already has the right core primitives for observability and queue orchestration: `OperationRunService`, `TrackOperationRun`, `ProviderOperationStartGate`, blocked outcome semantics, and sanitized terminal audit handling. The missing piece is not a new framework but a canonical execution-legitimacy check that runs before jobs start doing real work.

Alternatives considered:

Create a separate execution-orchestration subsystem just for reauthorization: rejected because it would duplicate `OperationRun` lifecycle ownership and make the queue path harder to reason about.
Keep adding local checks inside individual jobs: rejected because that is the exact drift pattern this feature is supposed to eliminate.

Decision: Reuse `OperationRunOutcome::Blocked` as the canonical execution-denial outcome

Rationale: `OperationRunOutcome` already includes `blocked`, and `OperationRunService::finalizeBlockedRun()` already writes sanitized blocked outcomes with reason codes, next steps, terminal audit, and normal terminal notification behavior. Reusing that vocabulary keeps Monitoring and operator language consistent.

Alternatives considered:

Add a new denied terminal outcome: rejected because it would fork existing outcome semantics and badge behavior without a strong product need.
Represent execution-time denial as failed: rejected because the spec explicitly requires a clear distinction between intentional trust-policy refusal and ordinary runtime failure.

Decision: Human-initiated queued runs remain actor-bound; scheduled runs remain system-authority runs

Rationale: The architecture audit raised the core identity question directly: under whose authority does a queued job run once it starts? For human-initiated work, the safest and most comprehensible rule is that authority must still belong to the initiating actor when the job begins. For scheduled or initiator-null work, the system must act under explicit system authority and current tenant operability rather than pretending a user still owns the action.

Alternatives considered:

Convert all queued jobs into system-owned authority after dispatch: rejected because that would silently broaden authority and weaken audit meaning.
Freeze the dispatch-time actor snapshot as permanent authority: rejected because that preserves the stale-legitimacy gap this spec is trying to close.
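The actor-bound versus system-authority rule can be sketched as a small resolver that is evaluated when the job starts, not when it is dispatched. This is an illustrative sketch in Python, used here only as neutral notation; `Authority`, `resolve_authority`, and the `actor_still_authorized` callback are hypothetical names, not types from the repo.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Authority:
    kind: str                 # "actor" or "system"
    actor_id: Optional[int]   # set only for actor-bound runs


def resolve_authority(
    initiator_id: Optional[int],
    actor_still_authorized: Callable[[int], bool],
) -> Optional[Authority]:
    """Return the authority a run may execute under, or None to deny.

    Intended to be called at execution time, so a stale dispatch-time
    snapshot is never treated as current authority.
    """
    if initiator_id is None:
        # Scheduled or initiator-null work: explicit system authority.
        # Tenant operability is a separate check and is not modeled here.
        return Authority(kind="system", actor_id=None)
    if actor_still_authorized(initiator_id):
        return Authority(kind="actor", actor_id=initiator_id)
    # The initiator lost authority after dispatch. The run is denied
    # rather than silently promoted to system authority.
    return None
```

The important property is the last branch: a human-initiated run whose actor no longer qualifies gets no authority at all, which matches the rejection of both alternatives above.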

Decision: Put the legitimacy check before `TrackOperationRun` marks runs as `running`

Rationale: `TrackOperationRun` currently transitions the run to `running` before the job body executes. For this feature, that is too late and too optimistic. A blocked-at-execution job should fail closed before side effects and before Monitoring treats it as an active operation.

Alternatives considered:

Leave `TrackOperationRun` as-is and block inside each job body: rejected because jobs would already look like active execution and the ordering would vary by job.
Mark the run `running` first and immediately block it afterward: rejected because it creates misleading transient truth in Monitoring and leaves room for side effects to start too early.
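The ordering constraint can be illustrated with a minimal middleware-style sketch, again in Python as neutral notation. `Run`, `handle`, and `check_legitimacy` are hypothetical, and the `queued` and `completed` status strings are assumptions; only `running` and the blocked outcome come from the text above.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Run:
    status: str = "queued"
    outcome: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def set_status(self, status: str) -> None:
        self.status = status
        self.history.append(status)


def handle(
    run: Run,
    check_legitimacy: Callable[[], Optional[str]],
    job: Callable[[], None],
) -> Run:
    """check_legitimacy returns None to allow, or a reason code to deny."""
    reason = check_legitimacy()
    if reason is not None:
        # Fail closed: the run is never marked running and the job
        # body (and its side effects) never starts.
        run.outcome = "blocked"
        run.set_status("completed")
        return run
    run.set_status("running")
    job()  # error handling for ordinary failures is omitted in this sketch
    run.outcome = "succeeded"
    run.set_status("completed")
    return run
```

Because the check precedes the status transition, a denied run's status history contains no `running` entry, so Monitoring never sees a transient active state.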

Decision: Reuse `TenantOperabilityService` for tenant-state truth, but add an execution-oriented decision seam

Rationale: Tenant operability is already centralized for selector, route, and lifecycle-safe action semantics. The queue execution path should not reintroduce raw lifecycle checks. At the same time, the existing lanes and questions do not directly represent queued execution, so the plan should extend the central seam with an execution-oriented question or adjacent support primitive.

Alternatives considered:

Hardcode tenant lifecycle checks inside jobs: rejected because it recreates the same drift pattern that Specs 143, 144, and 148 are reducing elsewhere.
Ignore tenant operability and only re-check capability: rejected because archived, discarded, or otherwise non-operable tenants are a distinct class of invalid execution.

Decision: Treat execution-prerequisite failures separately from capability or membership loss, but still fail closed before work

Rationale: The feature needs structured denial reasons, not just a boolean. Existing code already distinguishes provider-configuration blocks and write-gate failures. The execution contract should preserve that distinction so operators can tell the difference between authorization loss, tenant non-operability, and prerequisite invalidity.

Alternatives considered:

Collapse every denied start into one generic blocked reason: rejected because the spec requires operator and audit clarity.
Treat prerequisite failures as retryable by default: rejected because some prerequisite failures are deterministic policy blocks and should be terminal until state changes.
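A structured decision, as opposed to a boolean, might look like the following sketch (Python as neutral notation). The three denial classes mirror the distinction drawn in the rationale; the type names, the reason-code strings, and the order in which the checks run are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DenialClass(Enum):
    AUTHORIZATION_LOST = "authorization_lost"
    TENANT_NOT_OPERABLE = "tenant_not_operable"
    PREREQUISITE_INVALID = "prerequisite_invalid"


@dataclass(frozen=True)
class ExecutionDecision:
    allowed: bool
    denial_class: Optional[DenialClass] = None
    reason_code: Optional[str] = None  # hypothetical code strings


def decide(
    actor_authorized: bool,
    tenant_operable: bool,
    prerequisites_ok: bool,
) -> ExecutionDecision:
    # Every denial is returned before any work starts; the class tells
    # operators which kind of refusal occurred.
    if not actor_authorized:
        return ExecutionDecision(
            False, DenialClass.AUTHORIZATION_LOST, "capability_or_membership_lost"
        )
    if not tenant_operable:
        return ExecutionDecision(
            False, DenialClass.TENANT_NOT_OPERABLE, "tenant_not_operable"
        )
    if not prerequisites_ok:
        return ExecutionDecision(
            False, DenialClass.PREREQUISITE_INVALID, "prerequisite_invalid"
        )
    return ExecutionDecision(True)
```

Whether a given denial is retryable is deliberately left out of this sketch; per the second rejected alternative, it should not default to retryable.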

Decision: Scope the first implementation slice to representative queued job families, not every queued job in the repo

Rationale: The repo has dozens of `ShouldQueue` jobs. Planning all of them as day-one adopters would produce a vague plan and stall execution. The feature needs one shared contract plus enough representative adoption to prove it works across provider-backed operations, restore or write jobs, inventory or sync jobs, and bulk orchestrators.

Alternatives considered:

Attempt repo-wide queue adoption in one slice: rejected because it is too large for a focused hardening feature.
Apply the contract to one provider job only: rejected because that would leave the architecture mostly unchanged while claiming closure.

Decision: Preserve existing external routes and keep the first slice schema-free

Rationale: This feature is an internal execution-hardening change. Existing Filament and Monitoring routes remain the same, and the required new metadata can live in `OperationRun.context` and failure payloads. That keeps the first slice focused on behavior, not API or persistence churn.

Alternatives considered:

Introduce new routes or a separate operations API just for execution legitimacy: rejected because the feature does not require a new operator flow.
Add dedicated persistence tables for denial state: rejected because existing `OperationRun` and `AuditLog` structures already provide the right observability foundation.
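As a rough illustration of the schema-free approach, denial metadata can be merged into an existing context payload instead of a new table. The sketch below uses Python as neutral notation and assumes the context is a JSON-serializable mapping; the `execution_denial` key, the allow-list, and `with_denial` are hypothetical and not taken from the repo.

```python
import json
from typing import Any, Dict

# Hypothetical allow-list: only these fields are copied into the context,
# so secrets or raw error details cannot leak into stored metadata.
ALLOWED_KEYS = {"reason_code", "denial_class", "next_steps"}


def with_denial(context: Dict[str, Any], denial: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the run context with a sanitized denial entry."""
    sanitized = {k: v for k, v in denial.items() if k in ALLOWED_KEYS}
    updated = dict(context)
    updated["execution_denial"] = sanitized
    # json.dumps raises TypeError if the payload cannot be serialized.
    json.dumps(updated)
    return updated
```

For example, `with_denial({"tenant_id": 1}, {"reason_code": "x", "token": "secret"})` keeps `reason_code` and drops `token`, and no migration is involved.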
