ahmido 5bcb4f6ab8 feat: harden queued execution legitimacy (#179)

## Summary
- add a canonical queued execution legitimacy contract for actor-bound and system-authority operation runs
- enforce legitimacy before queued jobs transition runs to running across provider, inventory, restore, bulk, sync, and scheduled backup flows
- surface blocked execution outcomes consistently in Monitoring, notifications, audit data, and the tenantless operation viewer
- add Spec 149 artifacts and focused Pest coverage for legitimacy decisions, middleware ordering, blocked presentation, retry behavior, and cross-family adoption

## Testing
- vendor/bin/sail artisan test --compact tests/Unit/Operations/QueuedExecutionLegitimacyGateTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/QueuedExecutionMiddlewareOrderingTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Verification/ProviderExecutionReauthorizationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/RunInventorySyncExecutionReauthorizationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/ExecuteRestoreRunExecutionReauthorizationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/SystemRunBlockedExecutionNotificationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/BulkOperationExecutionReauthorizationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/QueuedExecutionRetryReauthorizationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/QueuedExecutionContractMatrixTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/OperationRunBlockedExecutionPresentationTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/QueuedExecutionAuditTrailTest.php
- vendor/bin/sail artisan test --compact tests/Feature/Operations/TenantlessOperationRunViewerTest.php
- vendor/bin/sail bin pint --dirty --format agent

## Manual validation
- validated queued provider execution blocking for tenant operability drift in the integrated browser on /admin/operations and /admin/operations/{run}
- validated 404 vs 403 route behavior for non-membership vs in-scope capability denial
- validated initiator-null blocked system-run behavior without creating a user terminal notification

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #179

2026-03-17 21:52:40 +00:00


Implementation Plan: Queued Execution Reauthorization and Scope Continuity

Branch: 149-queued-execution-reauthorization | Date: 2026-03-17 | Spec: specs/149-queued-execution-reauthorization/spec.md
Input: Feature specification from /specs/149-queued-execution-reauthorization/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

Introduce one canonical execution-legitimacy contract for queued tenant-affecting operations so work is re-authorized when the worker is actually about to act, not only when the operator clicked Start. The implementation will reuse the existing OperationRun observability model, OperationRunService blocked outcome flow, TenantOperabilityService, and write-hardening seams, but add a shared run-time execution gate and middleware ordering that can fail closed before any side effects.

This is a support-layer and job-orchestration hardening feature, not a new UI surface or persistence redesign. The plan therefore focuses on extending the queue execution path already present in app/Jobs, app/Jobs/Middleware, app/Services/OperationRunService.php, app/Services/Providers, app/Services/Tenants, and app/Services/Hardening, then migrating representative high-risk job families first: provider-backed queued runs, restore or write jobs, inventory or sync jobs, and bulk orchestrator families.

Technical Context

Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12, Filament 5, Livewire 4, Pest 4, existing OperationRunService, TrackOperationRun, ProviderOperationStartGate, TenantOperabilityService, CapabilityResolver, and WriteGateInterface seams
Storage: PostgreSQL-backed application data plus queue-serialized OperationRun context; no schema migration planned for the first implementation slice
Testing: Pest 4 unit and feature coverage run through Laravel Sail
Target Platform: Laravel Sail web application with queue workers processing Filament-started and scheduled tenant-affecting operations
Project Type: Laravel monolith web application
Performance Goals: Execution legitimacy checks must complete synchronously before side effects, add no render-time remote calls, and keep per-job startup overhead limited to current authoritative DB lookups plus existing support-layer evaluation
Constraints: Preserve existing Ops-UX run lifecycle ownership, terminal notification rules, centralized badge semantics, Filament v5 plus Livewire v4 compliance, provider registration in bootstrap/providers.php, and current route contracts; no new Graph bypasses, no asset or panel changes, and no weakening of 404 versus 403 semantics
Scale/Scope: One shared execution-legitimacy contract, one queue-middleware or execution-gate integration path, representative adoption across provider, restore, inventory or sync, and bulk job families, plus focused regression coverage under tests/Feature and tests/Unit

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Pre-Phase 0 Gate: PASS

Inventory-first: PASS. This feature does not change inventory or snapshot ownership and only hardens when queued jobs may act.
Read/write separation: PASS WITH HARDENING EMPHASIS. In-scope write paths remain queued, auditable, and confirmation-backed where already required. This feature strengthens fail-closed execution rather than broadening write authority.
Graph contract path: PASS. No new Microsoft Graph domain is introduced, and existing provider or restore flows continue to rely on existing service layers instead of direct endpoint shortcuts.
Deterministic capabilities: PASS. Capability enforcement remains tied to the canonical capability registry and resolver services; the feature extends when those checks happen.
RBAC-UX planes: PASS. This feature stays within the admin /admin plane and its tenant-context start surfaces. Platform /system is not broadened. Non-members continue to receive 404, and members lacking the required capability continue to receive 403.
Workspace isolation: PASS. Execution legitimacy will re-check workspace and tenant scope using authoritative records, not UI memory.
Tenant isolation: PASS. Tenant-bound queued work must still prove current tenant entitlement before acting.
Destructive confirmation: PASS. No new destructive Filament action is introduced; existing start actions keep current confirmation rules.
Global search safety: PASS. No global search behavior changes are planned.
Run observability and Ops-UX: PASS. OperationRun remains the canonical observability record, start surfaces remain enqueue-only, Monitoring remains DB-only, and denied execution paths remain terminal run outcomes rather than ad-hoc notifications.
Ops-UX lifecycle ownership: PASS. OperationRun.status and OperationRun.outcome remain service-owned through OperationRunService; the implementation must not let middleware or jobs bypass that rule.
Ops-UX summary counts: PASS. Denied runs will continue using existing normalized summary counts and failure payload rules.
Ops-UX guards: PASS WITH EXTENSION. Existing guard philosophy remains correct; this feature will add focused regression tests around execution-time denial rather than weaken current service-ownership rules.
Ops-UX system runs: PASS. Scheduled or initiator-null runs remain visible in Monitoring without initiator-only terminal DB notifications.
Automation and idempotency: PASS. Existing queue locks, idempotency, stale-queued handling, and dedupe contracts remain in force and become more reliable when legitimacy is rechecked before work starts.
Data minimization: PASS. Denial reasons and audit entries will remain sanitized and secret-free.
Badge semantics (BADGE-001): PASS. Existing blocked versus failed outcome semantics remain centralized through operation outcome helpers.
UI naming (UI-NAMING-001): PASS. Operator-facing text continues to use domain wording such as blocked, failed, queued, and View run.
Filament Action Surface Contract: PASS. Visible action inventories are unchanged; only their backend trust contract is hardened.
Filament UX-001: PASS. No layout change is planned.
Asset strategy: PASS. No new Filament or front-end assets are needed, so deployment guidance for php artisan filament:assets remains unchanged.
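
The 404 versus 403 rule referenced in the RBAC-UX planes item can be read as a small decision function. The sketch below is illustrative only: `resolveAccessStatus` and its boolean inputs are hypothetical names chosen for this example, not the project's policy or resolver API. It assumes membership is always checked before capability, so a non-member never learns whether the resource exists.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the codebase): maps membership and
 * capability facts onto the HTTP status the plan requires.
 */
function resolveAccessStatus(bool $isMember, bool $hasCapability): int
{
    // Non-members must not learn that the resource exists at all.
    if (! $isMember) {
        return 404;
    }

    // Members are in scope, so a missing capability is an explicit denial.
    if (! $hasCapability) {
        return 403;
    }

    return 200;
}

echo resolveAccessStatus(false, true), PHP_EOL;  // non-member: capability is never consulted
echo resolveAccessStatus(true, false), PHP_EOL;  // member without the capability
echo resolveAccessStatus(true, true), PHP_EOL;   // member with the capability
```

The ordering of the two checks is the point of the sketch: swapping them would leak a 403 to non-members.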

Post-Phase 1 Re-check: PASS

The design extends existing support seams instead of introducing a second operation-run lifecycle model.
No database migration, Graph-contract registry change, panel registration change, or asset build change is required for the first implementation slice.
Livewire v4 and Filament v5 compliance remain intact, and provider registration stays in bootstrap/providers.php.
Existing global-search requirements remain satisfied because no resource search contract is changed.

Project Structure

Documentation (this feature)

specs/149-queued-execution-reauthorization/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── execution-legitimacy.schema.json
│   └── no-external-api-changes.md
└── tasks.md

Source Code (repository root)

app/
├── Contracts/
│   └── Hardening/
├── Jobs/
│   ├── Middleware/
│   └── Operations/
├── Models/
│   ├── OperationRun.php
│   ├── ProviderConnection.php
│   ├── Tenant.php
│   └── User.php
├── Notifications/
│   └── OperationRunCompleted.php
├── Policies/
├── Services/
│   ├── Hardening/
│   ├── Inventory/
│   ├── OperationRunService.php
│   ├── Providers/
│   ├── Tenants/
│   └── Verification/
└── Support/
    ├── Auth/
    ├── Badges/
    ├── Operation*/
    ├── OpsUx/
    ├── Providers/
    └── Tenants/
tests/
├── Feature/
│   ├── Operations/
│   ├── Rbac/
│   ├── Restore/
│   └── Verification/
└── Unit/
    ├── Jobs/
    ├── Operations/
    └── Tenants/

Structure Decision: Use the existing Laravel monolith and harden the queue execution boundary in-place. Shared execution legitimacy should live beside the current job, tenant-operability, provider-start, and OperationRun seams rather than in a new standalone subsystem.

Phase 0 Research Summary

OperationRunService already provides the core observability primitives needed for this feature: canonical queued runs, stale-queued failure handling, blocked terminal outcomes, sanitized failure payloads, and terminal audit plus notification emission.
ProviderOperationStartGate is a strong dispatch-time gate for provider-backed operations, but it only validates legitimacy before enqueue and not again inside the worker. The new feature should extend the same contract to execution time rather than replacing it.
TrackOperationRun currently marks runs as running before any legitimacy recheck. That ordering is too early for this spec because a denied-at-execution run must fail closed before side effects and ideally before being treated as a real running operation.
The codebase already distinguishes a provider-blocked outcome through OperationRunOutcome::Blocked, finalizeBlockedRun(), and provider reason codes. That is the correct existing vocabulary to reuse for denied execution instead of inventing a parallel terminal state.
Job families are inconsistent today. Provider Gen2 flows use an explicit start gate and connection resolution, restore or write jobs use localized WriteGateInterface checks inside the job, and other queued jobs often resolve tenant and user IDs directly without a canonical actor or scope continuity contract.
TenantOperabilityService already centralizes tenant lifecycle and entitlement-aware decisions, but it has no queued-execution lane or execution-specific question. The first implementation slice should reuse its authority while introducing an execution-oriented decision seam rather than reintroducing local lifecycle checks in jobs.
The audit-derived candidate notes already narrowed the must-answer questions for this feature: execution identity, non-operable tenant handling, retryable versus terminal denials, and OperationRun plus AuditLog representation. Those questions are fully resolved in research.md and carried into the design below.
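
The gap described above, where a dispatch-time gate passes but the underlying truth changes while the job waits in the queue, can be reproduced in a few lines. This is a minimal sketch under stated assumptions: `CapabilityStore` is a hypothetical in-memory stand-in for the authoritative records, and the `inventory.sync` capability name and user ID are invented for illustration.

```php
<?php

declare(strict_types=1);

/** Hypothetical in-memory stand-in for the authoritative capability records. */
final class CapabilityStore
{
    /** @var array<int, list<string>> user ID => granted capabilities */
    private array $grants = [];

    public function grant(int $userId, string $capability): void
    {
        $this->grants[$userId][] = $capability;
    }

    public function revoke(int $userId, string $capability): void
    {
        $this->grants[$userId] = array_values(
            array_diff($this->grants[$userId] ?? [], [$capability])
        );
    }

    public function allows(int $userId, string $capability): bool
    {
        return in_array($capability, $this->grants[$userId] ?? [], true);
    }
}

$store = new CapabilityStore();
$store->grant(7, 'inventory.sync');

// Dispatch time: the operator clicks Start and the check passes.
$acceptedAtDispatch = $store->allows(7, 'inventory.sync');

// While the job waits in the queue, the capability is withdrawn.
$store->revoke(7, 'inventory.sync');

// Execution time: only a fresh lookup notices the change.
$allowedAtExecution = $store->allows(7, 'inventory.sync');

var_dump($acceptedAtDispatch, $allowedAtExecution);
```

A job that trusted only the dispatch-time answer would proceed here, which is why the plan adds a second check inside the worker path.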

Phase 1 Design

Implementation Approach

- Add one shared execution-legitimacy contract ahead of side effects.
  - Introduce a support-layer decision boundary that evaluates whether a queued operation may still begin.
  - Keep OperationRunService as the lifecycle owner and treat the new decision layer as the gate that decides whether the run may transition from queued to meaningful execution.
- Separate dispatch-time acceptance from execution-time legitimacy.
  - Keep existing dispatch gates such as ProviderOperationStartGate for enqueue-time checks, dedupe, and blocked preflight outcomes.
  - Add a second revalidation stage in the worker path so queue delay cannot bypass current authorization, scope, operability, or prerequisite truth.
- Reuse existing blocked outcome semantics.
  - Represent execution-time refusal through OperationRunOutcome::Blocked plus structured reason codes and sanitized failure payloads.
  - Do not introduce a second terminal state just for execution reauthorization.
- Distinguish human-bound authority from system authority explicitly.
  - Human-initiated runs remain actor-bound and must re-check the current actor's membership, entitlement, and capability at execution time.
  - Scheduled or initiator-null runs remain system-authority runs and must re-check allowed system execution plus tenant operability and prerequisites without pretending they are user-authorized actions.
  - The allowed system execution policy comes from one canonical operation-type allowlist owned by the execution legitimacy gate and fed only by trusted scheduler or system entry paths.
- Move the first legitimacy check before TrackOperationRun marks a run as running.
  - Either introduce a new queue middleware that executes before TrackOperationRun or refactor the existing middleware flow so legitimacy is evaluated first.
  - The queue worker must not display a denied run as running before the denial is known.
- Scope the first implementation slice to representative high-risk job families.
  - Provider-backed runs already using ProviderOperationStartGate.
  - Restore or write jobs currently guarded by WriteGateInterface inside the worker.
  - Inventory or sync jobs that resolve tenant and user at execution time and already use TrackOperationRun.
  - Bulk orchestrator or worker families that fan out destructive tenant-affecting work.
- Keep the first slice schema-free and asset-free.
  - Store new authority or denial metadata inside existing OperationRun.context and failure payloads.
  - Reuse current Monitoring pages, notifications, and badges.
  - Prove the metadata contract with focused regression coverage so blocked execution remains observable without adding persistence fields.
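
The approach above could take roughly the following shape. This is a sketch, not the implemented contract: every type name, reason code, and operation type below is hypothetical. It also makes a simplifying assumption that authority mode is derived from a null initiator, whereas the plan requires system authority to come only from trusted scheduler or system entry paths.

```php
<?php

declare(strict_types=1);

// All names below are hypothetical illustrations, not the project's real types.
enum AuthorityMode: string
{
    case ActorBound = 'actor_bound';
    case SystemAuthority = 'system_authority';
}

final class ExecutionContext
{
    public function __construct(
        public readonly string $operationType,
        public readonly ?int $initiatorId, // null for scheduled or system runs
        public readonly bool $tenantOperable,
        public readonly bool $actorIsMember = false,
        public readonly bool $actorHasCapability = false,
    ) {
    }

    public function authorityMode(): AuthorityMode
    {
        // Simplification for this sketch: a missing initiator means system authority.
        return $this->initiatorId === null
            ? AuthorityMode::SystemAuthority
            : AuthorityMode::ActorBound;
    }
}

final class LegitimacyDecision
{
    private function __construct(
        public readonly bool $allowed,
        public readonly ?string $reasonCode = null,
    ) {
    }

    public static function allow(): self
    {
        return new self(true);
    }

    public static function deny(string $reasonCode): self
    {
        return new self(false, $reasonCode);
    }
}

final class ExecutionLegitimacyGate
{
    /** @param list<string> $systemAllowlist operation types that may run without an initiator */
    public function __construct(private readonly array $systemAllowlist)
    {
    }

    public function evaluate(ExecutionContext $context): LegitimacyDecision
    {
        if ($context->authorityMode() === AuthorityMode::SystemAuthority) {
            // System runs are never treated as user-authorized; they need an allowlist entry.
            if (! in_array($context->operationType, $this->systemAllowlist, true)) {
                return LegitimacyDecision::deny('system_execution_not_allowed');
            }
        } else {
            // Actor-bound runs re-check the current actor, membership first.
            if (! $context->actorIsMember) {
                return LegitimacyDecision::deny('actor_not_member');
            }
            if (! $context->actorHasCapability) {
                return LegitimacyDecision::deny('actor_missing_capability');
            }
        }

        // Tenant operability applies to both authority modes.
        if (! $context->tenantOperable) {
            return LegitimacyDecision::deny('tenant_not_operable');
        }

        return LegitimacyDecision::allow();
    }
}

$gate = new ExecutionLegitimacyGate(['backup.scheduled']);
$decision = $gate->evaluate(new ExecutionContext('inventory.sync', 7, true, true, false));
echo $decision->allowed ? 'allowed' : 'blocked: ' . $decision->reasonCode, PHP_EOL;
```

In the real design the boolean inputs would be resolved from authoritative records by existing services such as TenantOperabilityService and CapabilityResolver; they are passed in directly here only to keep the example self-contained.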

Planned Workstreams

Workstream A: Execution legitimacy core model
Introduce or extend support-layer types for authority mode, execution context, legitimacy decision, denial classification, reason codes, and the initial retryability mapping. Keep them close to the existing operation and tenant support layers.
Workstream B: Queue middleware and lifecycle ordering
Refactor queue execution entry so legitimacy is evaluated before TrackOperationRun marks a run as running, while preserving service-owned run transitions and retry-safe behavior.
Workstream C: Representative job-family adoption
Apply the shared contract to provider-backed jobs, restore or write jobs, inventory or sync jobs, and at least one bulk orchestrator family so the new contract is proven across different execution shapes.
Workstream D: Denial observability and audit semantics
Normalize execution-time denial into blocked run outcomes, structured reason codes, Monitoring detail messaging, summary-count-safe payloads, and audit events that clearly separate policy refusal from runtime failure.
Workstream E: Regression hardening
Add focused Pest coverage for allowed paths, lost capability, lost entitlement, tenant non-operability, system-run behavior, retry behavior, representative job-family adoption, and direct-access 404 versus 403 semantics on canonical operations surfaces.
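
The ordering requirement in Workstream B can be illustrated with a framework-free pipeline that mirrors the `handle($job, $next)` shape of Laravel job middleware. Everything here is a hypothetical stand-in: `FakeRun`, `runThrough`, and the two closures are not the project's classes. For brevity the middleware mutates the run directly, whereas the plan keeps status transitions owned by OperationRunService.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a run record; the real OperationRun model is richer.
final class FakeRun
{
    /** @var list<string> every status the run has passed through */
    public array $history = ['queued'];

    public function transition(string $status): void
    {
        // In the planned design this call would be delegated to OperationRunService.
        $this->history[] = $status;
    }
}

/**
 * Runs middleware in list order: the first entry is the outermost layer.
 *
 * @param list<callable> $middleware each layer accepts (FakeRun $run, callable $next)
 */
function runThrough(array $middleware, FakeRun $run, callable $job): void
{
    $pipeline = array_reduce(
        array_reverse($middleware),
        fn (callable $next, callable $layer): callable => fn (FakeRun $run) => $layer($run, $next),
        $job,
    );

    $pipeline($run);
}

// Legitimacy layer: blocks the run and never calls $next when denied.
$ensureLegitimate = fn (bool $legitimate): callable => function (FakeRun $run, callable $next) use ($legitimate): void {
    if (! $legitimate) {
        $run->transition('blocked');

        return;
    }

    $next($run);
};

// Tracking layer: marks the run as running, then lets the job execute.
$trackRun = function (FakeRun $run, callable $next): void {
    $run->transition('running');
    $next($run);
};

$job = fn (FakeRun $run) => $run->transition('succeeded');

$denied = new FakeRun();
runThrough([$ensureLegitimate(false), $trackRun], $denied, $job);

$allowed = new FakeRun();
runThrough([$ensureLegitimate(true), $trackRun], $allowed, $job);

echo implode(' -> ', $denied->history), PHP_EOL;
echo implode(' -> ', $allowed->history), PHP_EOL;
```

Because the legitimacy layer sits in front of the tracking layer, the denied run moves from queued straight to blocked and never records a running status. Reversing the two entries in the list would reproduce the ordering problem noted in the research summary.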

Testing Strategy

Add unit tests for the execution-legitimacy decision layer, covering actor-bound and system-authority contexts plus structured denial reasons.
Add unit tests for the canonical system-authority allowlist and the initial retryability mapping so gate decisions stay deterministic across job families.
Add unit or integration tests for queue-middleware ordering to prove a run is not marked running before legitimacy passes.
Add focused feature tests for representative provider-backed jobs showing dispatch-time acceptance plus execution-time denial when connection or scope truth changes.
Add focused feature tests for representative provider-backed, restore, and system-authority flows showing still-legitimate execution continues successfully without false denial.
Add focused feature tests for restore or write jobs showing existing write-gate checks are folded into the canonical execution contract rather than left as isolated local patterns.
Add focused feature tests for inventory or sync jobs showing lost capability, lost tenant membership, and non-operable tenant outcomes are blocked before execution.
Add focused feature tests for at least one bulk orchestrator family showing retries perform a fresh legitimacy evaluation and blocked execution remains observable.
Add focused tests proving initiator-null runs do not emit initiator-only terminal database notifications while still recording blocked terminal outcomes in Monitoring.
Add or update run-detail, notification, and canonical operations access tests to prove blocked execution remains distinct from generic failure while preserving 404 versus 403 semantics.
Add focused tests proving blocked-run summary counts remain normalized through the canonical summary key contract.
Add focused tests proving authority and denial metadata stays inside existing OperationRun context and failure payload structures with no schema change.
Run the minimum focused Pest suite through Sail; no full-suite run is required for planning artifacts.
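
The retry expectation in the strategy above, that each attempt performs a fresh legitimacy evaluation, can be sketched without the queue. `runWithRetries` and the outcome strings are hypothetical and exist only for this illustration; the real behavior would be driven by the queue worker and the shared gate.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical attempt runner: $isLegitimate is re-invoked on every attempt,
 * so an earlier "allowed" answer is never reused across retries.
 *
 * @param callable(): bool    $isLegitimate fresh legitimacy lookup
 * @param callable(int): bool $work         true on success, false on a retryable failure
 */
function runWithRetries(callable $isLegitimate, callable $work, int $maxAttempts): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // Policy refusal is reported separately from runtime failure.
        if (! $isLegitimate()) {
            return "blocked on attempt {$attempt}";
        }

        if ($work($attempt)) {
            return "succeeded on attempt {$attempt}";
        }
    }

    return 'failed after retries';
}

// Simulated authority that is withdrawn while a retry is pending.
$capabilityGranted = true;

$outcome = runWithRetries(
    function () use (&$capabilityGranted): bool {
        return $capabilityGranted;
    },
    function (int $attempt) use (&$capabilityGranted): bool {
        $capabilityGranted = false; // revoked after the first attempt started
        return false;               // transient failure triggers a retry
    },
    3,
);

echo $outcome, PHP_EOL;
```

The first attempt passes the check and fails transiently; the second attempt is refused before any work runs, so the outcome is a blocked run and not a generic failure.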

Complexity Tracking

No constitution violations or exceptional complexity are planned at this stage.
