Phase 1 Data Model: Operation Lifecycle Guarantees & Queue-to-Domain Failure Reconciliation
Overview
This feature does not require a new database table in the first implementation slice. The primary data-model work is the formalization of existing OperationRun persistence plus new derived lifecycle-policy and freshness concepts that make queue truth, domain truth, and operator-visible truth converge deterministically.
Persistent Domain Entities
OperationRun
Purpose: Canonical workspace-scoped operational record for long-running, queued, scheduled, or otherwise operator-visible work.
Key fields:
- `id`
- `workspace_id`
- `tenant_id` nullable
- `user_id` nullable
- `type`
- `status` with canonical values `queued`, `running`, `completed`
- `outcome` with canonical values including `pending`, `succeeded`, `partially_succeeded`, `blocked`, `failed`
- `run_identity_hash`
- `summary_counts` JSONB
- `failure_summary` JSONB array
- `context` JSONB
- `started_at`
- `completed_at`
- `created_at`
- `updated_at`
Relationships:
- Belongs to one workspace
- Optionally belongs to one tenant
- Optionally belongs to one initiating user
Validation rules relevant to this feature:
- `status` and `outcome` transitions remain service-owned via `OperationRunService`.
- Non-terminal active runs remain constrained by the existing active-run unique index semantics.
- `summary_counts` keys remain from the canonical summary key catalog and values remain numeric-only.
- Reconciliation metadata must be stored in a standardized structure inside `context` and `failure_summary` without persisting secrets.
State transitions relevant to this feature:
- `queued/pending` → `running/pending`
- `queued/pending` → `completed/failed` when stale queued reconciliation or direct queue-failure bridging resolves an orphaned queued run
- `running/pending` → `completed/succeeded|partially_succeeded|blocked|failed`
- `running/pending` → `completed/failed` when stale running reconciliation resolves an orphaned active run
- `completed/*` is terminal and must never be mutated by reconciliation
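The transition list above can be read as a small allow-list over `(status, outcome)` pairs. The following is a minimal sketch in Python, not the project's `OperationRunService`; the function name `is_allowed_transition` is hypothetical, and the sketch assumes that any transition not listed above is rejected.

```python
# Hypothetical sketch of the documented (status, outcome) transitions.
# Assumption: transitions that are not listed in this section are rejected.
TERMINAL_OUTCOMES = {"succeeded", "partially_succeeded", "blocked", "failed"}


def is_allowed_transition(current, target):
    """Return True if moving from `current` to `target` matches the documented list."""
    cur_status, cur_outcome = current
    new_status, new_outcome = target

    # completed/* is terminal and must never be mutated.
    if cur_status == "completed":
        return False
    # Every documented source state carries the pending outcome.
    if cur_outcome != "pending":
        return False
    if cur_status == "queued":
        # Either the job starts, or reconciliation/bridging fails the orphaned run.
        return (new_status, new_outcome) in {
            ("running", "pending"),
            ("completed", "failed"),
        }
    if cur_status == "running":
        return new_status == "completed" and new_outcome in TERMINAL_OUTCOMES
    return False
```

A caller would check `is_allowed_transition(("queued", "pending"), ("completed", "failed"))` before writing the terminal state, so that reconciliation can never overwrite a completed run.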
Failed Job Record (failed_jobs)
Purpose: Infrastructure-level evidence that a queued job exhausted attempts, timed out, or otherwise failed.
Key fields used conceptually:
- UUID or failed-job identifier
- connection
- queue
- payload
- exception
- failed_at
Relationships:
- Not directly related through a foreign key to `OperationRun`
- Linked back to `OperationRun` through job-owned identity resolution or reconciliation evidence
Validation rules relevant to this feature:
- A failed-job record is evidence, not operator-facing truth by itself.
- Evidence may inform reconciliation or diagnostics but must not replace the domain transition on `OperationRun`.
Queue Job Definition
Purpose: The queued class that owns or advances a covered OperationRun.
Key lifecycle-relevant properties:
- `operationRun` reference or `getOperationRun()` contract
- optional `$timeout`
- optional `$failOnTimeout`
- optional `$tries` or `retryUntil()`
- `middleware()` including `TrackOperationRun` and other queue middleware
- optional `failed(Throwable $e)` callback
Validation rules relevant to this feature:
- Covered jobs must provide a credible path to terminal truth through direct failure bridging, fallback reconciliation, or both.
- Covered long-running jobs must have intentional timeout behavior and must be compatible with queue timing invariants.
New Derived Domain Objects
OperationLifecyclePolicy
Purpose: Configuration-backed policy describing which operation types are in scope for V1 and how their lifecycle should be evaluated.
Fields:
- `operationType`
- `covered` boolean
- `queuedStaleAfterSeconds`
- `runningStaleAfterSeconds`
- `expectedMaxRuntimeSeconds`
- `requiresDirectFailedBridge` boolean
- `supportsReconciliation` boolean
Validation rules:
- `queuedStaleAfterSeconds` and `runningStaleAfterSeconds` must be positive integers.
- `expectedMaxRuntimeSeconds` must stay below effective queue `retry_after` with safety margin.
- Only covered operation types participate in the generic reconciler for V1.
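The validation rules above could be checked along the following lines. This is an illustrative Python sketch only: the field names are snake_case renderings of the fields listed in this section, while `validate_policy`, the 30-second default safety margin, and the operation type `example.sync` are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationLifecyclePolicy:
    # Mirrors the fields listed in this section (snake_case for Python).
    operation_type: str
    covered: bool
    queued_stale_after_seconds: int
    running_stale_after_seconds: int
    expected_max_runtime_seconds: int
    requires_direct_failed_bridge: bool
    supports_reconciliation: bool


def validate_policy(policy, retry_after_seconds, safety_margin_seconds=30):
    """Return a list of rule violations; an empty list means the policy is valid.

    Assumption: the safety margin is a fixed number of seconds (default 30).
    """
    errors = []
    for name in ("queued_stale_after_seconds", "running_stale_after_seconds"):
        value = getattr(policy, name)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            errors.append(f"{name} must be a positive integer")
    if policy.expected_max_runtime_seconds + safety_margin_seconds >= retry_after_seconds:
        errors.append(
            "expected_max_runtime_seconds must stay below retry_after minus the safety margin"
        )
    return errors


# Usage: a hypothetical covered operation type checked against a 900-second retry_after.
policy = OperationLifecyclePolicy("example.sync", True, 300, 900, 600, True, True)
violations = validate_policy(policy, retry_after_seconds=900)
```

Returning a list of violations rather than raising on the first one lets a runtime invariant check report every misconfigured threshold at once.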
OperationRunFreshnessAssessment
Purpose: Derived classification used by reconcilers and Monitoring surfaces to determine whether a non-terminal run is still trustworthy as active.
Fields:
- `operationRunId`
- `status`
- `freshnessState` with canonical values `fresh`, `likely_stale`, `terminal`, `unknown`
- `evaluatedAt`
- `thresholdSeconds`
- `evidence` key-value map
Behavior:
- For `queued` runs, freshness is typically derived from `created_at`, absence of `started_at`, and policy threshold.
- For `running` runs, freshness is derived from `started_at`, last meaningful update evidence available in persisted state, and policy threshold.
- `completed` runs always assess as terminal and are excluded from stale reconciliation.
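The behavior described above can be sketched as a pure function of the run's timestamps and the policy threshold. This Python sketch is illustrative: `assess_freshness` and the `last_update_at` parameter are hypothetical names, and it assumes that a queued run which already has `started_at`, or a running run with no usable timestamp, is classified as `unknown`.

```python
from datetime import datetime, timedelta, timezone


def assess_freshness(status, created_at, started_at, last_update_at, threshold_seconds, now):
    """Classify a run as fresh, likely_stale, terminal, or unknown."""
    if status == "completed":
        # Completed runs are terminal and excluded from stale reconciliation.
        return "terminal"
    if status == "queued":
        # Assumption: a queued run with started_at set is inconsistent, hence unknown.
        reference = created_at if started_at is None else None
    elif status == "running":
        # Use the most recent evidence that the run was alive.
        candidates = [t for t in (started_at, last_update_at) if t is not None]
        reference = max(candidates) if candidates else None
    else:
        reference = None
    if reference is None:
        return "unknown"
    age_seconds = (now - reference).total_seconds()
    return "likely_stale" if age_seconds > threshold_seconds else "fresh"


# Usage: a run queued ten minutes ago with a five-minute threshold.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
state = assess_freshness("queued", now - timedelta(minutes=10), None, None, 300, now)
```

Passing `now` in explicitly keeps the assessment deterministic, which makes it straightforward to cover in regression tests.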
LifecycleReconciliationRecord
Purpose: Structured reconciliation evidence stored in OperationRun.context and mirrored in failure_summary.
Fields:
- `reconciledAt`
- `reconciliationKind` such as `stale_queued`, `stale_running`, `queue_failure_bridge`, `adapter_sync`
- `reasonCode`
- `reasonMessage`
- `evidence` key-value map
- `source` such as `failed_callback`, `scheduled_reconciler`, `adapter_reconciler`
Validation rules:
- Must only be added when the feature force-resolves or directly bridges a run.
- Must be idempotent; repeat reconciliation must not append conflicting terminal truth.
- Must be operator-safe and sanitized.
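The idempotency rule above can be illustrated with a small sketch that applies a reconciliation record to a run at most once. This is a Python illustration under stated assumptions: the run is modeled as a plain dictionary, and the `reconciliation` key inside `context`, the shape of the `failure_summary` entry, and the function name `apply_reconciliation` are hypothetical.

```python
import copy


def apply_reconciliation(run, record):
    """Return (run, applied). Completed runs are returned unchanged."""
    if run["status"] == "completed":
        # Terminal truth already exists; a repeat reconciliation is a no-op.
        return run, False

    updated = copy.deepcopy(run)  # never mutate the caller's copy
    updated["status"] = "completed"
    updated["outcome"] = "failed"
    # Hypothetical storage location for the structured record.
    updated.setdefault("context", {})["reconciliation"] = dict(record)
    # Mirror a sanitized summary into failure_summary (hypothetical entry shape).
    updated.setdefault("failure_summary", []).append(
        {"code": record["reasonCode"], "message": record["reasonMessage"]}
    )
    return updated, True


# Usage: reconciling the same stale run twice only records evidence once.
run = {"status": "running", "outcome": "pending", "context": {}, "failure_summary": []}
record = {
    "reconciledAt": "2024-01-01T12:00:00Z",
    "reconciliationKind": "stale_running",
    "reasonCode": "run.stale_running",
    "reasonMessage": "Run exceeded its running threshold without progress.",
    "evidence": {"thresholdSeconds": 900},
    "source": "scheduled_reconciler",
}
first, applied = apply_reconciliation(run, record)
second, applied_again = apply_reconciliation(first, record)
```

Guarding on the terminal status, rather than on the presence of the record, also protects runs that completed normally between the freshness check and the write.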
OperationQueueFailureBridge
Purpose: Derived mapping between a queued job failure and the owning OperationRun.
Fields:
- `operationRunId`
- `jobClass`
- `bridgeSource` such as `failed_callback` or `reconciler`
- `exceptionClass`
- `reasonCode`
- `terminalOutcome`
Behavior:
- Exists conceptually as a design contract, not necessarily as a standalone stored table.
- Bridges queue truth into service-owned `OperationRun` terminal transitions.
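As a design contract, the bridge can be pictured as a value object built from the failed job's identity and exception. The Python sketch below is an illustration only: `bridge_failed_job`, the job class name, and the rule that an exception class name containing "Timeout" maps to `run.infrastructure_timeout_or_abandonment` are assumptions, not behavior confirmed by this document.

```python
from dataclasses import dataclass

ALLOWED_BRIDGE_SOURCES = {"failed_callback", "reconciler"}


@dataclass(frozen=True)
class OperationQueueFailureBridge:
    # Mirrors the fields listed in this section (snake_case for Python).
    operation_run_id: int
    job_class: str
    bridge_source: str
    exception_class: str
    reason_code: str
    terminal_outcome: str


def bridge_failed_job(operation_run_id, job_class, exception_class, bridge_source="failed_callback"):
    """Map a queue-level failure onto the owning run's terminal transition."""
    if bridge_source not in ALLOWED_BRIDGE_SOURCES:
        raise ValueError(f"unknown bridge source: {bridge_source}")
    # Assumption: timeouts are recognized by exception class name only.
    if "Timeout" in exception_class:
        reason_code = "run.infrastructure_timeout_or_abandonment"
    else:
        reason_code = "run.queue_failure_bridge"
    return OperationQueueFailureBridge(
        operation_run_id=operation_run_id,
        job_class=job_class,
        bridge_source=bridge_source,
        exception_class=exception_class,
        reason_code=reason_code,
        terminal_outcome="failed",  # a bridged queue failure always ends the run as failed
    )


# Usage with a hypothetical job class and exception name.
bridge = bridge_failed_job(42, "App\\Jobs\\ExampleSyncJob", "TimeoutExceededException")
```

The bridge itself carries no persistence logic; the terminal transition would still be written by the service that owns `OperationRun` state.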
Supporting Catalogs
Reconciliation Reason Codes
Purpose: Stable reason-code catalog for lifecycle healing.
Initial values:
- `run.stale_queued`
- `run.stale_running`
- `run.infrastructure_timeout_or_abandonment`
- `run.queue_failure_bridge`
- `run.adapter_out_of_sync`
Validation rules:
- Operator-facing text must be derived from centralized presenters or reason translation helpers.
- Codes remain stable enough for regression assertions and audit review.
Monitoring Freshness State
Purpose: Derived presentation state for Operations surfaces.
Initial values:
- `fresh_active`
- `likely_stale`
- `reconciled_failed`
- `terminal_normal`
Behavior:
- Not stored as a new top-level database enum in V1.
- Derived centrally so tables, detail pages, and notifications do not drift.
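A single derivation function is one way to keep tables, detail pages, and notifications from drifting. The Python sketch below is illustrative: `monitoring_freshness_state` and the `was_reconciled` flag are hypothetical, and it assumes that a non-terminal run whose freshness is `unknown` is still presented as `fresh_active`.

```python
def monitoring_freshness_state(status, outcome, freshness_state, was_reconciled):
    """Derive the presentation state for Operations surfaces from persisted facts."""
    if status == "completed":
        # A run that was force-resolved to failed is shown distinctly from a normal finish.
        if was_reconciled and outcome == "failed":
            return "reconciled_failed"
        return "terminal_normal"
    if freshness_state == "likely_stale":
        return "likely_stale"
    # Assumption: fresh and unknown non-terminal runs both present as active.
    return "fresh_active"
```

For example, `monitoring_freshness_state("completed", "failed", "terminal", True)` yields `reconciled_failed`, while the same run without reconciliation evidence yields `terminal_normal`.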
Consumer Mapping
| Consumer | Primary data it needs |
|---|---|
| Generic lifecycle reconciler | Covered operation policy, active non-terminal runs, freshness assessment, standardized reconciliation transition |
| Covered queued jobs | Owning OperationRun, timeout behavior, direct failed() bridge path |
| Operations index | Current status, outcome, freshness assessment, reconciliation evidence summary |
| Operation run detail | Full reconciliation record, translated reason, run timing, summary counts, failure details |
| Runtime invariant validation | Queue connection retry_after, effective job timeout, covered operation lifecycle policy |
Migration Notes
- No schema migration is required for the first implementation slice.
- Existing `context` and `failure_summary` structures should be normalized for reconciliation evidence rather than replaced.
- If later observability needs require indexed reconciliation metrics, a follow-up slice can promote reconciliation metadata into first-class columns or projections.