Quickstart: Operation Lifecycle Guarantees & Queue-to-Domain Failure Reconciliation
Goal
Validate that covered queued OperationRun executions always converge to trustworthy terminal truth and that Monitoring surfaces no longer imply indefinite normal activity for orphaned runs.
Prerequisites
- Start Sail.
- Ensure the queue worker is running through Sail.
- Ensure the database contains at least one workspace with operator-visible operation runs for covered types.
- Ensure test fixtures or factories can create OperationRun records in queued, running, and completed states.
Implementation Validation Order
1. Run focused lifecycle service tests
vendor/bin/sail artisan test --compact --filter=OperationRunService
Expected outcome:
- Stale queued reconciliation still works.
- Stale running reconciliation is added and service-owned.
- Terminal runs are not mutated.
2. Run focused reconciler tests
vendor/bin/sail artisan test --compact --filter=LifecycleReconciler
vendor/bin/sail artisan test --compact --filter=stale
Expected outcome:
- Stale queued runs are force-resolved to completed/failed.
- Stale running runs are force-resolved to completed/failed.
- Fresh runs remain untouched.
- Reconciliation is idempotent across repeated execution.
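The expected reconciler behaviour above can be sketched in a few lines. This is a language-neutral illustration in Python, not the project's PHP implementation: the Run shape, the reconciled flag, and the 15-minute STALE_AFTER threshold are all assumptions made for the example, and completed/failed is read here as status completed with outcome failed.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical threshold; the real stale thresholds are not stated in this guide.
STALE_AFTER = timedelta(minutes=15)


@dataclass(frozen=True)
class Run:
    status: str                # "queued", "running", or "completed"
    outcome: Optional[str]     # e.g. "succeeded" or "failed"; None while active
    last_activity_at: datetime
    reconciled: bool = False   # hypothetical marker for lifecycle-reconciled runs


def reconcile(run: Run, now: datetime) -> Run:
    """Return the run unchanged unless it is active and past the stale threshold."""
    if run.status == "completed":
        return run             # terminal runs are never mutated
    if now - run.last_activity_at <= STALE_AFTER:
        return run             # fresh runs remain untouched
    # Stale queued or running run: force-resolve to completed/failed.
    return replace(run, status="completed", outcome="failed", reconciled=True)
```

Because a reconciled run is terminal, a second pass takes the first branch and returns it as-is, which is what makes repeated execution idempotent in this sketch.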
3. Run focused failed-job bridge tests
vendor/bin/sail artisan test --compact --filter=failed
vendor/bin/sail artisan test --compact --filter=MaxAttempts
vendor/bin/sail artisan test --compact --filter=TimeoutExceeded
Expected outcome:
- Covered jobs with direct failed() bridges map queue failure truth back to OperationRun.
- Queue failures that never complete normal middleware finalization still converge through reconciliation.
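As a rough picture of what a direct failed() bridge does, the sketch below models a job whose failure hook writes the queue failure back onto its run. It is a Python illustration under assumptions: RunRecord, fail_run, CoveredJob, and the failure_reason field are hypothetical stand-ins, not the project's classes or the framework's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    status: str = "running"
    outcome: Optional[str] = None
    failure_reason: Optional[str] = None   # hypothetical diagnostic field


def fail_run(run: RunRecord, reason: str) -> None:
    """Move an active run to terminal failed; leave terminal runs alone."""
    if run.status == "completed":
        return
    run.status, run.outcome, run.failure_reason = "completed", "failed", reason


class CoveredJob:
    """Stand-in for a queued job that owns exactly one run."""

    def __init__(self, run: RunRecord) -> None:
        self.run = run

    def failed(self, exception: BaseException) -> None:
        # Invoked when the queue gives up on the job, for example after
        # max attempts or a timeout; the bridge records that on the run.
        fail_run(self.run, type(exception).__name__)
```

Jobs whose failure hook never fires are not covered by this path; those are the cases that the stale-run reconciliation is expected to catch.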
4. Run the Run-126 regression scenario
vendor/bin/sail artisan test --compact --filter=Run126
vendor/bin/sail artisan test --compact --filter=orphaned
Expected outcome:
- A run left in running without completeRun() or failRun() is marked terminal failed once the stale threshold is exceeded.
- The operator-facing state no longer implies normal active work.
5. Run focused Monitoring UX tests
vendor/bin/sail artisan test --compact tests/Feature/Operations
vendor/bin/sail artisan test --compact --filter=Operations
Expected outcome:
- The Operations index distinguishes fresh activity from stale or reconciled failure semantics.
- The run detail distinguishes normal failure from reconciled lifecycle failure.
- Canonical Monitoring authorization semantics remain intact.
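One way to think about the distinction these tests check is a small mapping from run state to an operator-facing label. The following Python sketch is only an illustration: the label strings, the reconciled flag, and the 15-minute threshold are assumptions, not the wording or logic the Monitoring pages actually use.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(minutes=15)  # hypothetical threshold


def display_state(
    status: str,
    outcome: Optional[str],
    reconciled: bool,
    last_activity_at: datetime,
    now: datetime,
) -> str:
    """Pick a label that never shows an orphaned run as ordinary active work."""
    if status == "completed":
        if outcome == "failed":
            # Separate lifecycle-reconciled failures from normal failures.
            return "failed (reconciled)" if reconciled else "failed"
        return "completed"
    if now - last_activity_at > STALE_AFTER:
        return "stale"        # active on paper, but past the threshold
    return "in progress"      # fresh activity
```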
6. Run runtime timing guard tests
vendor/bin/sail artisan test --compact --filter=retry_after
vendor/bin/sail artisan test --compact --filter=timeout
Expected outcome:
- Covered lifecycle policy timeouts stay safely below effective retry_after.
- Misaligned timing assumptions fail validation instead of remaining implicit.
Runtime notes
- Covered lifecycle jobs now declare explicit timeout values and set failOnTimeout = true.
- The lifecycle validator expects covered job timeouts and expected runtimes to stay below queue retry_after with a safety margin.
- If queue worker settings change during rollout, run vendor/bin/sail artisan queue:restart so workers pick up the new lifecycle contract.
- Production and staging stop-wait expectations must stay above the longest covered timeout so workers can exit cleanly instead of orphaning in-flight runs.
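The timing rules in these notes reduce to two inequalities: each covered timeout plus a safety margin must be less than retry_after, and stop-wait must be greater than the longest covered timeout. A minimal Python sketch of such a check follows; the function name, the 30-second default margin, and the job names in the usage are hypothetical, and the project's own validator may apply different rules.

```python
from typing import Dict, List


def timing_violations(
    job_timeouts: Dict[str, int],
    retry_after: int,
    stop_wait: int,
    safety_margin: int = 30,   # hypothetical margin, in seconds
) -> List[str]:
    """Return human-readable problems; an empty list means timings are aligned."""
    problems = []
    for job, timeout in sorted(job_timeouts.items()):
        # A timeout too close to retry_after lets the queue release the job
        # again while the first attempt may still be running.
        if timeout + safety_margin >= retry_after:
            problems.append(
                f"{job}: timeout {timeout}s is not safely below retry_after {retry_after}s"
            )
    longest = max(job_timeouts.values(), default=0)
    if stop_wait <= longest:
        problems.append(
            f"stop-wait {stop_wait}s must exceed the longest timeout {longest}s"
        )
    return problems
```

For example, timing_violations({"sync": 300}, retry_after=600, stop_wait=360) returns an empty list, while raising the timeout to 590 reports both a retry_after violation and a stop-wait violation.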
7. Manual smoke-check in the browser
- Open /admin/operations and inspect a fresh active run.
- Inspect a deliberately stale or reconciled run and confirm the list no longer presents it as ordinary in-progress work.
- Open /admin/operations/{run} for a reconciled run and confirm the detail page shows an operator-safe lifecycle explanation plus secondary diagnostics.
- Confirm the existing View run navigation remains canonical and no new destructive action is introduced.
Non-Goals For This Slice
- No resumable execution or checkpoint recovery.
- No queue backend replacement or Horizon adoption.
- No new manual retry or re-drive UI.
- No new OperationRun status enum.