# Quickstart: Operation Lifecycle Guarantees & Queue-to-Domain Failure Reconciliation

## Goal

Validate that covered queued `OperationRun` executions always converge to trustworthy terminal truth and that Monitoring surfaces no longer imply indefinite normal activity for orphaned runs.

## Prerequisites

1. Start Sail.
2. Ensure the queue worker is running through Sail.
3. Ensure the database contains at least one workspace with operator-visible operation runs for covered types.
4. Ensure test fixtures or factories can create `OperationRun` records in `queued`, `running`, and `completed` states.

## Implementation Validation Order

### 1. Run focused lifecycle service tests

```bash
vendor/bin/sail artisan test --compact --filter=OperationRunService
```

Expected outcome:
- Stale queued reconciliation still works.
- Stale running reconciliation is added and service-owned.
- Terminal runs are not mutated.

### 2. Run focused reconciler tests

```bash
vendor/bin/sail artisan test --compact --filter=LifecycleReconciler
vendor/bin/sail artisan test --compact --filter=stale
```

Expected outcome:
- Stale queued runs are force-resolved to `completed/failed`.
- Stale running runs are force-resolved to `completed/failed`.
- Fresh runs remain untouched.
- Reconciliation is idempotent across repeated execution.

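
The reconciler expectations above can be modeled with a small shell sketch. This is not the project's reconciler: the record layout (`id status outcome age_seconds`), the `pending` placeholder outcome, the single threshold argument, and the `reconcile_runs` name are all assumptions made for illustration. It only shows the properties the tests check: stale `queued` and `running` rows become `completed failed`, fresh and terminal rows are left alone, and a second pass changes nothing.

```bash
# Hypothetical model: one run per line as "id status outcome age_seconds".
# This is NOT the real OperationRun schema or the real reconciler.
reconcile_runs() {
  local threshold="$1"   # assumed single stale threshold, in seconds
  awk -v threshold="$threshold" '
    # Only non-terminal runs older than the threshold are force-resolved.
    ($2 == "queued" || $2 == "running") && ($4 + 0) > (threshold + 0) {
      print $1, "completed", "failed", $4
      next
    }
    # Fresh runs and already-terminal runs pass through unchanged.
    { print $1, $2, $3, $4 }
  '
}

# Usage: run 1 is stale and gets resolved, run 2 is fresh and stays as-is.
printf '%s\n' '1 queued pending 900' '2 running pending 30' | reconcile_runs 600
```

Because a resolved row is already `completed`, piping the output through `reconcile_runs` again returns it unchanged, which is the idempotence the reconciler tests assert.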
### 3. Run focused failed-job bridge tests

```bash
vendor/bin/sail artisan test --compact --filter=failed
vendor/bin/sail artisan test --compact --filter=MaxAttempts
vendor/bin/sail artisan test --compact --filter=TimeoutExceeded
```

Expected outcome:
- Covered jobs with direct `failed()` bridges map queue failure truth back to `OperationRun`.
- Queue failures that never complete normal middleware finalization still converge through reconciliation.

### 4. Run the Run-126 regression scenario

```bash
vendor/bin/sail artisan test --compact --filter=Run126
vendor/bin/sail artisan test --compact --filter=orphaned
```

Expected outcome:
- A run left in `running` without `completeRun()` or `failRun()` is marked terminal failed once the stale threshold is exceeded.
- The operator-facing state no longer implies normal active work.

### 5. Run focused Monitoring UX tests

```bash
vendor/bin/sail artisan test --compact tests/Feature/Operations
vendor/bin/sail artisan test --compact --filter=Operations
```

Expected outcome:
- The Operations index distinguishes fresh activity from stale or reconciled failure semantics.
- The run detail distinguishes normal failure from reconciled lifecycle failure.
- Canonical Monitoring authorization semantics remain intact.

### 6. Run runtime timing guard tests

```bash
vendor/bin/sail artisan test --compact --filter=retry_after
vendor/bin/sail artisan test --compact --filter=timeout
```

Expected outcome:
- Covered lifecycle policy timeouts stay safely below effective `retry_after`.
- Misaligned timing assumptions fail validation instead of remaining implicit.

## Runtime Notes

- Covered lifecycle jobs now declare explicit `timeout` values and set `failOnTimeout = true`.
- The lifecycle validator expects covered job timeouts and expected runtimes to stay below queue `retry_after` with a safety margin.
- If queue worker settings change during rollout, run `vendor/bin/sail artisan queue:restart` so workers pick up the new lifecycle contract.
- Production and staging stop-wait expectations must stay above the longest covered timeout so workers can exit cleanly instead of orphaning in-flight runs.

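
The two timing relationships in these notes can be checked with plain integer arithmetic. The following is a minimal sketch, not the project's lifecycle validator: the function name, the argument order, and the 30-second default margin are assumptions, and all values are whole seconds.

```bash
# Hypothetical guard; the name, arguments, and default margin are illustrative only.
check_lifecycle_timing() {
  local timeout="$1"       # longest covered job timeout
  local retry_after="$2"   # queue connection retry_after
  local stop_wait="$3"     # how long a worker is given to stop
  local margin="${4:-30}"  # assumed safety margin

  # The job must time out, with margin, before the queue would retry it.
  if [ $((timeout + margin)) -ge "$retry_after" ]; then
    echo "timeout ${timeout}s + margin ${margin}s must stay below retry_after ${retry_after}s" >&2
    return 1
  fi
  # The worker must be allowed to outlive the longest covered timeout.
  if [ "$stop_wait" -le "$timeout" ]; then
    echo "stop-wait ${stop_wait}s must stay above timeout ${timeout}s" >&2
    return 1
  fi
  echo "timing ok"
}
```

For example, `check_lifecycle_timing 300 600 330` passes, while `check_lifecycle_timing 580 600 700` fails because the timeout plus margin reaches past `retry_after`.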
### 7. Manual smoke-check in the browser

1. Open `/admin/operations` and inspect a fresh active run.
2. Inspect a deliberately stale or reconciled run and confirm the list no longer presents it as ordinary in-progress work.
3. Open `/admin/operations/{run}` for a reconciled run and confirm the detail page shows operator-safe lifecycle explanation plus secondary diagnostics.
4. Confirm existing `View run` navigation remains canonical and no new destructive action is introduced.

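
The automated commands from steps 1 to 6 can be chained so that validation stops at the first failing command. The sketch below assumes it is run from the project root; the `SAIL` variable and the `run_lifecycle_validation` name are illustrative and not part of the repository. It replays the documented commands in their documented order.

```bash
# SAIL is an overridable assumption so the sequence can be dry-run with a stub.
SAIL="${SAIL:-vendor/bin/sail}"

run_lifecycle_validation() {
  local args
  # One entry per documented command, in the documented order (steps 1 to 6).
  for args in \
    "--filter=OperationRunService" \
    "--filter=LifecycleReconciler" "--filter=stale" \
    "--filter=failed" "--filter=MaxAttempts" "--filter=TimeoutExceeded" \
    "--filter=Run126" "--filter=orphaned" \
    "tests/Feature/Operations" "--filter=Operations" \
    "--filter=retry_after" "--filter=timeout"
  do
    echo "==> artisan test --compact ${args}"
    # Stop at the first failing command so later steps do not hide it.
    "$SAIL" artisan test --compact "$args" || return 1
  done
}
```

Calling `run_lifecycle_validation` returns a non-zero status as soon as one command fails, which keeps the earliest lifecycle regression at the bottom of the output.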
## Non-Goals For This Slice

- No resumable execution or checkpoint recovery.
- No queue backend replacement or Horizon adoption.
- No new manual retry or re-drive UI.
- No new `OperationRun` status enum.