# Quickstart: Operation Lifecycle Guarantees & Queue-to-Domain Failure Reconciliation

## Goal

Validate that covered queued `OperationRun` executions always converge to trustworthy terminal truth and that Monitoring surfaces no longer imply indefinite normal activity for orphaned runs.

## Prerequisites

1. Start Sail.
2. Ensure the queue worker is running through Sail.
3. Ensure the database contains at least one workspace with operator-visible operation runs for covered types.
4. Ensure test fixtures or factories can create `OperationRun` records in `queued`, `running`, and `completed` states.

## Implementation Validation Order

### 1. Run focused lifecycle service tests

```bash
vendor/bin/sail artisan test --compact --filter=OperationRunService
```

Expected outcome:

- Stale queued reconciliation still works.
- Stale running reconciliation is added and service-owned.
- Terminal runs are not mutated.

### 2. Run focused reconciler tests

```bash
vendor/bin/sail artisan test --compact --filter=LifecycleReconciler
vendor/bin/sail artisan test --compact --filter=stale
```

Expected outcome:

- Stale queued runs are force-resolved to `completed/failed`.
- Stale running runs are force-resolved to `completed/failed`.
- Fresh runs remain untouched.
- Reconciliation is idempotent across repeated execution.

### 3. Run focused failed-job bridge tests

```bash
vendor/bin/sail artisan test --compact --filter=failed
vendor/bin/sail artisan test --compact --filter=MaxAttempts
vendor/bin/sail artisan test --compact --filter=TimeoutExceeded
```

Expected outcome:

- Covered jobs with direct `failed()` bridges map queue failure truth back to `OperationRun`.
- Queue failures that never complete normal middleware finalization still converge through reconciliation.

### 4. Run the Run-126 regression scenario

```bash
vendor/bin/sail artisan test --compact --filter=Run126
vendor/bin/sail artisan test --compact --filter=orphaned
```

Expected outcome:

- A run left in `running` without `completeRun()` or `failRun()` is marked terminal failed once the stale threshold is exceeded.
- The operator-facing state no longer implies normal active work.

### 5. Run focused Monitoring UX tests

```bash
vendor/bin/sail artisan test --compact tests/Feature/Operations
vendor/bin/sail artisan test --compact --filter=Operations
```

Expected outcome:

- The Operations index distinguishes fresh activity from stale or reconciled failure semantics.
- The run detail distinguishes normal failure from reconciled lifecycle failure.
- Canonical Monitoring authorization semantics remain intact.

### 6. Run runtime timing guard tests

```bash
vendor/bin/sail artisan test --compact --filter=retry_after
vendor/bin/sail artisan test --compact --filter=timeout
```

Expected outcome:

- Covered lifecycle policy timeouts stay safely below effective `retry_after`.
- Misaligned timing assumptions fail validation instead of remaining implicit.

## Runtime notes

- Covered lifecycle jobs now declare explicit `timeout` values and set `failOnTimeout = true`.
- The lifecycle validator expects covered job timeouts and expected runtimes to stay below queue `retry_after` with a safety margin.
- If queue worker settings change during rollout, run `vendor/bin/sail artisan queue:restart` so workers pick up the new lifecycle contract.
- Production and staging stop-wait expectations must stay above the longest covered timeout so workers can exit cleanly instead of orphaning in-flight runs.

### 7. Manual smoke-check in the browser

1. Open `/admin/operations` and inspect a fresh active run.
2. Inspect a deliberately stale or reconciled run and confirm the list no longer presents it as ordinary in-progress work.
3. Open `/admin/operations/{run}` for a reconciled run and confirm the detail page shows operator-safe lifecycle explanation plus secondary diagnostics.
4. Confirm existing `View run` navigation remains canonical and no new destructive action is introduced.

## Non-Goals For This Slice

- No resumable execution or checkpoint recovery.
- No queue backend replacement or Horizon adoption.
- No new manual retry or re-drive UI.
- No new `OperationRun` status enum.
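
## Appendix: timing rule sketch

The two timing rules in the runtime notes (job timeout plus a safety margin below `retry_after`, and stop-wait above the longest covered timeout) can be sketched as a small shell check. This is only an illustration under stated assumptions: the function names, the argument order, and every number used with them are hypothetical and are not taken from the project's lifecycle validator, which may apply different margins or additional rules.

```bash
#!/usr/bin/env bash
# Illustrative timing guard; function names and all numbers are hypothetical.
# All durations are whole seconds.

# Succeeds when timeout plus the safety margin stays strictly below retry_after.
timeout_fits_retry_after() {
  # $1 = job timeout, $2 = queue retry_after, $3 = safety margin
  [ $(( $1 + $3 )) -lt "$2" ]
}

# Succeeds when the worker stop-wait window is longer than the longest timeout.
stop_wait_covers_timeout() {
  # $1 = stop-wait window, $2 = longest covered job timeout
  [ "$1" -gt "$2" ]
}

# Prints "ok" or the first violated rule; returns non-zero on any violation.
check_timing() {
  # $1 = timeout, $2 = retry_after, $3 = margin, $4 = stop-wait
  if ! timeout_fits_retry_after "$1" "$2" "$3"; then
    echo "misaligned: timeout ${1}s + margin ${3}s must stay below retry_after ${2}s"
    return 1
  fi
  if ! stop_wait_covers_timeout "$4" "$1"; then
    echo "misaligned: stop-wait ${4}s must exceed the longest timeout ${1}s"
    return 1
  fi
  echo "ok"
}
```

With these assumed values, `check_timing 300 600 60 360` prints `ok`, while `check_timing 580 600 60 900` reports the `retry_after` violation and returns a non-zero status. Failing loudly on a misaligned value mirrors the expectation in step 6 that timing assumptions are validated rather than left implicit.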