ahmido 2a12729dc5 feat: implement operation run queue truth foundation (spec 358) (#429)

Implements platform feature branch `358-operationrun-queue-truth-foundation`.

Target branch: `platform-dev`.

Follow-up integration path after merge:

`platform-dev` -> `dev`.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #429

2026-06-06 12:03:11 +00:00

Feature Specification: OperationRun Queue Truth Foundation

Feature Branch: 358-operationrun-queue-truth-foundation
Created: 2026-06-06
Status: Draft
Input: User-provided OperationRun queue-truth draft, reconciled against current repo truth

Spec Candidate Check (mandatory — SPEC-GATE-001)

Problem: Current OperationRun UX still allows contradictory generic queue truth. The repo can show stale-attention semantics and ordinary queue/progress reassurance for the same active run, depending on which helper or surface renders first.
Today's failure: A run can surface `Likely stale` lifecycle attention while generic copy still says `Waiting for worker.` or `No action needed yet. The operation is waiting for a worker.` This creates a false-calm operator message on active monitoring surfaces.
User-visible improvement: Queued and running OperationRun records will render one honest generic lifecycle story across shell hints, monitoring rows, and canonical detail without overclaiming domain success or orphaned queue state.
Smallest enterprise-capable version: Align the existing generic lifecycle helpers (OperationRunFreshnessState, OperationRunProgressContract, RunDurationInsights, and OperationUxPresenter) and apply the resulting truth to current monitoring and shell surfaces only.
Explicit non-goals: No new persisted OperationRun statuses or outcomes, no queue schema redesign, no new worker-health subsystem, no new notification family, no new adapter framework, no new domain-success reconciliation, no restore/review/backup auto-completion logic, and no destructive action changes.
Permanent complexity imported: One bounded derived queue-truth path over existing OperationRun state, plus focused tests that lock the contract across shared monitoring surfaces.
Why now: Historical specs already improved stale visibility and shared progress, but current repo truth still contains contradictory generic wording. This is an active operator-trust gap in currently shipped runtime paths.
Why not local: Fixing only one Blade view, one banner, or one detail card would leave the contradiction between progress, freshness, and queue-guidance helpers intact.
Approval class: Core Enterprise
Red flags triggered: shared interaction family, monitoring-state semantics, and possible helper consolidation. Defense: the slice remains derived-only, persistence-free, and narrower than a new reconciliation framework.
Score: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | Total: 11/12
Decision: approve

Repo Truth Reconciliation

The user draft is directionally correct, but current repo truth changes the exact framing:

The repo helper advanced to 1000 because specs/999-seeder-external-id/ already exists, but this package is intentionally finalized as user-requested Spec 358.
Generic stale reconciliation already exists via TenantpilotReconcileOperationRuns and OperationLifecycleReconciler; adapter-backed reconciliation already exists via OpsReconcileAdapterRuns and AdapterRunReconciler.
A shared progress helper already exists via OperationRunProgressContract, and Spec 272 already extended its active-progress precedence for phased/composite truth; this spec aligns stale/fresh queue guidance with that current shared contract instead of inventing a new one.
The current canonical Monitoring routes are workspace-scoped: /admin/workspaces/{workspace}/operations and /admin/workspaces/{workspace}/operations/{run}. OperationRunResource remains an implementation seam, not a route-backed surface.
No follow-up Spec 359 promise is recorded here. Any later adapter or domain-success work must be promoted from fresh repo truth rather than pre-allocating a speculative framework sequence.

Spec Scope Fields (mandatory)

Scope: workspace, tenant, canonical-view
Primary Routes:
- workspace-admin surfaces that host the shared shell activity hint while an entitled managed environment is active, including current /admin/workspaces/{workspace}/environments/{environment}/... starts
- /admin/workspaces/{workspace}/operations
- /admin/workspaces/{workspace}/operations/{run}
- the current canonical Monitoring surfaces App\Filament\Pages\Monitoring\Operations and App\Filament\Pages\Operations\TenantlessOperationRunViewer
Data Ownership:
- operation_runs remain the only lifecycle and freshness source of truth
- OperationRun context remains the only place where legitimacy or reconciliation evidence may already exist
- no new persisted queue-truth mirror, no new audit artifact, and no new worker-health record are introduced
RBAC:
- existing workspace and managed-environment entitlement rules remain authoritative
- non-members and out-of-scope actors remain 404
- this spec changes wording and derived presentation only; it does not widen access or add new capability strings

For canonical-view specs, the spec MUST define:

Default filter behavior when tenant-context is active: Existing environment-prefilter and page-state behavior on /admin/workspaces/{workspace}/operations remains unchanged. This spec changes queue truth, not monitoring state ownership.
Explicit entitlement checks preventing cross-tenant leakage: Derived stale/queued guidance must use only runs the actor is already authorized to view. No new wording may reveal hidden tenant scope or hidden run existence.
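
The entitlement rule above can be sketched as a filter that runs before any stale or queued guidance is derived. This is a minimal illustration in Python, not the repo's implementation; the `Run` shape, `visible_runs`, `shell_hint_counts`, and the assumption that workspace-level runs without a tenant are visible to workspace members are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Run:
    """Hypothetical, reduced view of an operation run for this sketch."""
    run_id: int
    workspace_id: int
    tenant_id: Optional[int]  # None = workspace-level run (assumption)
    likely_stale: bool


def visible_runs(runs: Iterable[Run], workspace_id: int,
                 entitled_tenant_ids: Set[int]) -> List[Run]:
    """Keep only runs the actor is already authorized to view."""
    return [
        run for run in runs
        if run.workspace_id == workspace_id
        and (run.tenant_id is None or run.tenant_id in entitled_tenant_ids)
    ]


def shell_hint_counts(runs: Iterable[Run], workspace_id: int,
                      entitled_tenant_ids: Set[int]) -> Optional[Dict[str, int]]:
    """Derive hint counts from authorized runs only.

    Returns None when nothing is visible, so the hint never reveals
    that hidden runs exist.
    """
    visible = visible_runs(runs, workspace_id, entitled_tenant_ids)
    if not visible:
        return None
    return {
        "active": len(visible),
        "stale": sum(1 for run in visible if run.likely_stale),
    }
```

Filtering first and counting second means a stale run in a non-entitled tenant cannot change the wording or the counts an operator sees.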

UI Surface Impact (mandatory — UI-COV-001)

No UI surface impact
Existing page changed
New page/route added
Navigation changed
Filament panel/provider surface changed
New modal/drawer/wizard/action added
New table/form/state added
Customer-facing surface changed
Dangerous action changed
Status/evidence/review presentation changed
Workspace/environment context presentation changed

UI/Productization Coverage (mandatory when UI Surface Impact is not "No UI surface impact")

Route/page/surface:
- tenant shell activity hint (BulkOperationProgress)
- workspace operations hub (App\Filament\Pages\Monitoring\Operations) at /admin/workspaces/{workspace}/operations
- canonical operation detail (App\Filament\Pages\Operations\TenantlessOperationRunViewer) at /admin/workspaces/{workspace}/operations/{run}
- shared implementation seam only: App\Filament\Resources\OperationRunResource for the reused table/detail payload contract
Current or new page archetype: existing monitoring/workbench family only
Design depth: Domain Pattern Surface
Repo-truth level: repo-verified
Existing pattern reused: current OperationRun monitoring family, current shell activity hint, current monitoring detail banners
New pattern required: none; this is a truth-alignment follow-up within the existing pattern family
Screenshot required: no; the slice corrects shared wording/derived truth inside existing monitoring surfaces
Page audit required: no new page-report identity; existing monitoring family coverage is sufficient
Customer-safe review required: no; operator-only monitoring surfaces
Dangerous-action review required: no; no action hierarchy or destructive behavior changes
Coverage files updated or explicitly not needed: no new coverage file is required because the existing monitoring-family anchors already cover the touched reachable surfaces: docs/ui-ux-enterprise-audit/route-inventory.md (UI-016, UI-017), docs/ui-ux-enterprise-audit/page-reports/ui-003-operations.md, docs/ui-ux-enterprise-audit/strategic-surfaces.md, and specs/313-workspace-environment-context-browser-verification/surface-inventory.md
No-impact rationale when applicable: N/A

Cross-Cutting / Shared Pattern Reuse (mandatory)

Cross-cutting feature?: yes
Interaction class(es): status messaging, queue/progress guidance, monitoring detail guidance, shell active-work hint
Systems touched:
- OperationRunFreshnessState
- OperationRunProgressContract
- RunDurationInsights
- OperationUxPresenter
- current shell banner and canonical monitoring/detail surfaces
Existing pattern(s) to extend: current lifecycle policy, freshness derivation, the Spec-270/272 shared progress contract, monitoring detail banner, and shared run-link/presenter paths
Shared contract / presenter / builder / renderer to reuse: OperationRunProgressContract, OperationUxPresenter, OperationRunFreshnessState, and current monitoring/view renderer paths
Why the existing shared path is sufficient or insufficient: The repo already has the right ownership boundaries, but the generic queue truth is split across multiple helpers that can disagree in wording and emphasis.
Allowed deviation and why: none by default; if one tiny helper is required to keep existing presenters reviewable, it must remain local to current Ops UX truth and must not become an adapter registry
Consistency impact: queued/running/stale wording, progress availability, lifecycle attention, and detail guidance must remain aligned across shell, list, and canonical detail
Review focus: no surface may reassure with ordinary queue copy after freshness has already escalated the same active run to stale attention

OperationRun UX Impact (mandatory)

Touches OperationRun start/completion/link UX?: yes, reuse-only
Shared OperationRun UX contract/layer reused: current OperationRun link, freshness, presenter, and progress helper paths
Delegated start/completion UX behaviors: existing queued toasts, canonical run links, browser events, and terminal notification paths remain unchanged
Local surface-owned behavior that remains: density and placement only
Queued DB-notification policy: unchanged
Terminal notification path: unchanged central lifecycle mechanism
Exception required?: none

Provider Boundary / Platform Core Check (mandatory)

Shared provider/platform boundary touched?: no
Boundary classification: N/A
Seams affected: N/A
Neutral platform terms preserved or introduced: operation, queued, running, lifecycle window, review worker health
Provider-specific semantics retained and why: none
Why this does not deepen provider coupling accidentally: the slice works entirely on platform-owned OperationRun truth and existing shared monitoring helpers
Follow-up path: none

UI / Surface Guardrail Impact (mandatory)

| Surface / Change | Operator-facing surface change? | Native vs Custom | Shared-Family Relevance | State Layers Touched | Exception Needed? | Low-Impact / `N/A` Note |
|---|---|---|---|---|---|---|
| Tenant shell activity hint | yes | Native Filament + existing Livewire view | shared active-work hint family | shell, page | no | no new action or route |
| Operations list / resource detail summary | yes | Native Filament resource/detail | shared monitoring family | page, detail | no | wording-only inside existing collection/detail |
| Canonical tenantless run viewer banners | yes | Native Filament page | shared monitoring detail family | detail | no | no new diagnostics section family |

Decision-First Surface Role (mandatory)

| Surface | Decision Role | Human-in-the-loop Moment | Immediately Visible for First Decision | On-Demand Detail / Evidence | Why This Is Primary or Why Not | Workflow Alignment | Attention-load Reduction |
|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | Secondary Context Surface | Decide whether an active run needs inspection now | active-state truth, one open link, honest progress availability | full diagnostics stay on canonical monitoring/detail pages | secondary because it supports ongoing work rather than owning diagnostics | follows existing start-surface workflow | removes false calmness |
| Operations list | Primary Decision Surface | Decide which active run needs inspection first | lifecycle attention, queue truth, scope, and run identity | full detail after drill-through | primary because it is the canonical monitoring queue | aligns with current monitoring triage | removes open-every-row guesswork |
| Canonical run detail | Tertiary Evidence / Diagnostics Surface | Confirm what the stale or active state really means | one honest lifecycle explanation before deep diagnostics | raw context, history, evidence, and related links | tertiary because inspection already happened | preserves current detail role | removes banner/guidance contradiction |

Audience-Aware Disclosure (mandatory)

| Surface | Audience Modes In Scope | Decision-First Default-Visible Content | Operator Diagnostics | Support / Raw Evidence | One Dominant Next Action | Hidden / Gated By Default | Duplicate-Truth Prevention |
|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | operator-MSP | active-state summary plus one open action | minimal guidance only | raw/support data stays off-surface | `View operation` or current collective review action | raw detail stays on canonical monitoring surfaces | do not repeat stale explanation multiple ways |
| Operations list | operator-MSP | row-level stale/queued truth and scope | detail remains secondary | raw payloads stay on detail | row open | raw and related evidence stay on detail | one row summary, not multiple competing summaries |
| Canonical run detail | operator-MSP, support-platform | one honest lifecycle banner | diagnostics sections below the banner | raw context remains lower-priority | existing return/open actions | support/raw detail remains secondary to the top summary | lifecycle banner and queue guidance must not disagree |

UI/UX Surface Classification (mandatory)

| Surface | Action Surface Class | Surface Type | Likely Next Operator Action | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type / Justification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | Monitoring hint | Activity shell hint | Open the active run if guidance escalates | explicit open link | forbidden | existing shell secondary actions only | none | `/admin/workspaces/{workspace}/operations` with current contextual prefilter rules | `/admin/workspaces/{workspace}/operations/{run}` | current tenant/workspace shell context | Operation | whether the run is ordinary active work or already stale | none |
| Operations list | List / Table / Monitoring | Read-only monitoring registry | Open the run that needs follow-up | full-row open | required | existing table controls only | none | `/admin/workspaces/{workspace}/operations` | `/admin/workspaces/{workspace}/operations/{run}` | workspace scope and current filters | Operation run | honest queued/running lifecycle truth | none |
| Canonical run detail | Record / Detail / Monitoring | Diagnostics-first detail surface | Inspect lifecycle truth before deeper diagnosis | canonical detail page | N/A | existing header/related links only | none | `/admin/workspaces/{workspace}/operations` | `/admin/workspaces/{workspace}/operations/{run}` | workspace scope plus entitled tenant context | Operation run | honest active-state explanation | none |

Operator Surface Contract (mandatory)

| Surface | Primary Persona | Decision / Operator Action Supported | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | tenant operator | Decide whether to inspect active work now | shell hint | Is this active work still ordinary, or is it already stale? | label, status, progress availability, open link | deep diagnostics remain elsewhere | lifecycle, freshness, progress availability | none | current open/review action | none |
| Operations list | workspace operator | Prioritize which active run to inspect | monitoring registry | Which run is actually waiting, progressing, or already stale? | row summary, scope, lifecycle attention | raw payloads on detail | lifecycle, freshness, queue truth | none | open row | none |
| Canonical run detail | workspace operator | Confirm generic lifecycle truth before diagnosing | detail surface | Why does this run read as stale or still-active? | one lifecycle banner and queue guidance | raw context, failure payloads, related artifacts | lifecycle, freshness, legitimacy evidence | none | existing navigation and related links | none |

Proportionality Review (mandatory when structural complexity is introduced)

New source of truth?: no
New persisted entity/table/artifact?: no
New abstraction?: no by default; prefer extending existing helpers
New enum/state/reason family?: no
New cross-domain UI framework/taxonomy?: no
Current operator problem: current generic queue truth can contradict itself across existing shared monitoring surfaces
Existing structure is insufficient because: lifecycle, freshness, progress, and queue guidance currently live in separate helpers that can drift in wording and emphasis
Narrowest correct implementation: align existing shared helpers and current monitoring renderers without adding a new framework, status family, or persistence layer
Ownership cost: bounded shared-helper review burden plus focused tests
Alternative intentionally rejected: a new reconciliation framework or queue-health persistence layer was rejected because the current issue is derived wording drift, not missing stored truth
Release truth: current-release operator-trust correction

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

Test purpose / classification: Unit + Feature
Validation lane(s): fast-feedback, confidence
Why this classification and these lanes are sufficient: the feature changes derived queue truth and current rendered monitoring output; focused unit and feature proof are sufficient without a new browser or heavy-governance lane
New or expanded test families:
- apps/platform/tests/Unit/Support/OpsUx/OperationRunProgressContractTest.php
- apps/platform/tests/Unit/Support/OpsUx/RunDurationInsightsTest.php or adjacent focused unit coverage if the repo keeps these assertions in an existing file
- apps/platform/tests/Feature/OpsUx/ActivityFeedbackSurfaceTest.php
- apps/platform/tests/Feature/OpsUx/BulkOperationProgressDbOnlyTest.php
- apps/platform/tests/Feature/MonitoringOperationsTest.php
- apps/platform/tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php
- apps/platform/tests/Feature/Monitoring/MonitoringOperationsTest.php
- apps/platform/tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php
- apps/platform/tests/Feature/Operations/TenantlessOperationRunViewerTest.php
Fixture / helper cost impact: low to moderate; reuse existing OperationRun factories, workspace membership helpers, and monitoring fixtures only
Heavy-family visibility / justification: none
Special surface test profile: monitoring-state-page plus shared-detail-family
Standard-native relief or required special coverage: feature coverage is sufficient; no browser smoke is required unless implementation later proves a render-only regression that unit/feature coverage cannot catch
Reviewer handoff: reviewers must confirm the same stale run no longer mixes stale attention with ordinary queue/progress reassurance on shell, list, and detail surfaces
Budget / baseline / trend impact: none expected beyond small feature-local coverage growth
Escalation needed: reject-or-split if the slice expands into new persistence, adapter orchestration, or queue runtime redesign
Active feature PR close-out entry: Guardrail / Smoke Coverage
Planned validation commands:
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OpsUx/OperationRunProgressContractTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/OpsUx/ActivityFeedbackSurfaceTest.php tests/Feature/OpsUx/BulkOperationProgressDbOnlyTest.php tests/Feature/MonitoringOperationsTest.php tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php tests/Feature/Monitoring/MonitoringOperationsTest.php tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php tests/Feature/Operations/TenantlessOperationRunViewerTest.php
- git diff --check

User Scenarios & Testing (mandatory)

User Story 1 - See honest queued and stale truth on active monitoring surfaces (Priority: P1)

As an operator, I need active monitoring surfaces to distinguish normal queued/running work from stale lifecycle drift without mixing calm queue reassurance into the same state.

Why this priority: This is the direct trust gap visible in current runtime surfaces.

Independent Test: Seed fresh and stale queued/running runs, open the shell activity hint and monitoring list, and verify that stale-active runs do not mix stale attention with ordinary queue or progress reassurance.

Acceptance Scenarios:

Given a queued or running run is past its lifecycle window, When the shell or monitoring list renders it, Then the visible stale-active state does not also reassure with ordinary `Waiting for worker.`, `Progress details pending.`, or equivalent calm queue/progress copy.
Given a fresh queued or running run is still within its lifecycle window, When the same surfaces render it, Then the copy remains calm and does not falsely escalate it as stale.
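
The two scenarios above reduce to one derivation: classify freshness first, then let that classification pick the copy. The following is a minimal Python sketch under stated assumptions, not the repo's helper code. The function names, the 15-minute window in the usage note, the `"unsupported"` and `"not_active"` labels, and the `Operation is running.` string are hypothetical; only `Waiting for worker.` and the `likely_stale` label come from this spec.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ACTIVE_STATUSES = {"queued", "running"}


def freshness_state(status: str, last_activity_at: datetime, now: datetime,
                    lifecycle_window: Optional[timedelta]) -> str:
    """Classify an active run against its lifecycle window."""
    if status not in ACTIVE_STATUSES:
        return "not_active"
    if lifecycle_window is None:
        # No lifecycle policy for this run type: never overclaim staleness.
        return "unsupported"
    if now - last_activity_at > lifecycle_window:
        return "likely_stale"
    return "fresh"


def guidance(status: str, state: str) -> Optional[str]:
    """Pick exactly one line of generic copy; stale always wins over calm."""
    if state == "not_active":
        return None
    if state == "likely_stale":
        # Cautious wording only; no orphaned or domain-success claim.
        return "Past lifecycle window. Review worker health."
    if status == "queued":
        return "Waiting for worker."
    return "Operation is running."  # placeholder copy (assumption)
```

For example, with `now = datetime(2026, 6, 6, 12, 0, tzinfo=timezone.utc)` and an assumed window of `timedelta(minutes=15)`, a queued run last active two minutes ago classifies as `fresh` and keeps `Waiting for worker.`, while one last active an hour ago classifies as `likely_stale` and receives only the cautious line. Because `guidance` returns a single string, a surface cannot render stale attention and calm queue copy for the same run.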

User Story 2 - Keep canonical run detail aligned with generic queue truth (Priority: P1)

As an operator, I need canonical run detail to confirm the same generic lifecycle meaning that list and shell surfaces already communicate before I inspect deeper diagnostics.

Why this priority: The current contradiction is most damaging when detail and compact surfaces disagree.

Independent Test: Open canonical detail for fresh, stale, and reconciled runs and verify that lifecycle banners, queue guidance, and top summary agree with the generic monitoring truth.

Acceptance Scenarios:

Given a stale queued or stale running run, When the detail page opens, Then the top summary explains that the run is past its lifecycle window before deeper diagnostics render.
Given a reconciled terminal run, When the detail page opens, Then it preserves the existing reconciled truth and does not regress to ordinary active-work guidance.

User Story 3 - Keep proof-backed reconciliation separate from generic queue truth (Priority: P2)

As a product owner, I need generic queue truth to stay separate from domain-success or adapter reconciliation so that stale monitoring copy does not silently become a business-completion engine.

Why this priority: The slice must stay narrow and avoid speculative framework growth.

Independent Test: Compare a stale generic run and an adapter-reconciled run, then verify that generic stale wording remains cautious while existing reconciled evidence paths still drive terminal truth where already implemented.

Acceptance Scenarios:

Given a stale active run without proof-backed queue legitimacy evidence, When the UI renders it, Then the wording stays at `past lifecycle window` or equivalent cautious language and does not claim orphaned or domain-complete truth.
Given a run already completed through scheduled or adapter reconciliation, When the UI renders it, Then the existing reconciled terminal semantics remain intact and are not rewritten as generic queue waiting.
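
The precedence implied by these scenarios can be sketched as a single vocabulary lookup: reconciled terminal truth is checked first, and a stronger stale claim is allowed only when evidence already exists. This is a hedged Python illustration, not the repo's presenter logic. The context keys `reconciled_by` and `queue_legitimacy_evidence` and the returned vocabulary labels are hypothetical; `scheduled_reconciler` and `adapter_reconciler` are the sources named in this spec.

```python
from typing import Any, Mapping

ACTIVE_STATUSES = {"queued", "running"}
RECONCILER_SOURCES = {"scheduled_reconciler", "adapter_reconciler"}


def lifecycle_vocabulary(status: str, likely_stale: bool,
                         context: Mapping[str, Any]) -> str:
    """Map one run to one vocabulary key shared by shell, list, and detail."""
    if status not in ACTIVE_STATUSES:
        # Terminal runs keep their existing truth and are never
        # rewritten as generic queue waiting.
        if context.get("reconciled_by") in RECONCILER_SOURCES:
            return "reconciled_terminal"
        return "terminal"
    if not likely_stale:
        return "active"
    if context.get("queue_legitimacy_evidence") is True:
        # A stronger claim is permitted only with existing trusted evidence.
        return "stale_with_evidence"
    return "past_lifecycle_window"
```

Keeping the function free of any write path reflects the derived-only scope: it reads status, freshness, and context, and never completes or reconciles a run itself.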

Edge Cases

A queued run is stale but has no trustworthy worker correlation or job identifier.
A running run has determinate counts but is still past the lifecycle window.
A run is unsupported by current lifecycle policy and therefore should not receive stale overclaiming.
A run became terminal through scheduled_reconciler or adapter_reconciler.
A system-initiated run has no human initiator but still needs honest lifecycle wording.
Existing phase or composite progress hints exist in context and must not override stale lifecycle truth with false reassurance.

Requirements (mandatory)

FR-358-001: The system MUST keep OperationRun.status and OperationRun.outcome as the only persisted lifecycle fields for this slice.
FR-358-002: Generic queue truth MUST be derived from current lifecycle, freshness, and trusted context only.
FR-358-003: A run that is likely_stale by current lifecycle policy MUST NOT render ordinary queue or ordinary progress reassurance as visible stale-active guidance on shell, list, or canonical detail surfaces, whether that reassurance appears alone or alongside a stale label, banner, or attention cue.
FR-358-004: Fresh queued and fresh running runs MUST remain visibly calm and MUST NOT inherit stale emphasis.
FR-358-005: Existing scheduled or adapter reconciliation paths MUST remain authoritative for terminal truth where they already exist.
FR-358-006: This feature MUST NOT introduce a new queue-orphaned, reconciled, or business-success persistence family.
FR-358-007: This feature MUST NOT claim hard orphaned queue state unless current repo truth already provides trustworthy legitimacy evidence for that specific run.
FR-358-008: Existing OperationRun links, queued toasts, browser events, and terminal notification paths MUST remain unchanged.
FR-358-009: Existing workspace and tenant authorization boundaries MUST remain unchanged.
FR-358-010: The canonical monitoring list, shell hint, and canonical run detail MUST use one aligned generic lifecycle vocabulary for fresh queued/running work, stale active work, and reconciled terminal work.
FR-358-011: The implementation MUST stay within current helper ownership unless one small local helper is required to avoid unreadable presenter logic.
FR-358-012: Focused unit and feature coverage MUST prove both positive and negative cases for fresh vs stale queue truth.

Success Criteria (mandatory)

SC-358-001: In focused regression coverage, stale queued/running runs no longer mix stale attention with ordinary queue/progress reassurance on the affected surfaces.
SC-358-002: In focused regression coverage, fresh queued/running runs remain calm and do not falsely escalate to stale.
SC-358-003: Existing reconciled terminal behavior remains intact for runs already completed by scheduled or adapter truth.
SC-358-004: No new persisted lifecycle field, queue artifact, or adapter framework is introduced.

Assumptions

The current contradiction is a derived UX-truth problem, not a missing persistence problem.
Existing lifecycle policy thresholds remain valid for this slice.
Existing generic and adapter reconciliation commands remain out of scope except as historical context the UI must respect.

Risks

Over-correction: the slice could make all active work sound problematic. Mitigation: explicit fresh vs stale negative assertions.
Framework creep: the slice could drift into a new generic queue-health subsystem. Mitigation: no new persistence, no new adapter registry, and explicit out-of-scope boundaries.
Detail/list drift survives: one renderer could remain on old wording. Mitigation: focused list + detail + shell coverage in the same package.

Out of Scope

Queue worker health automation
New scheduler, command, or job families
New OperationRun statuses or outcomes
Domain-success reconciliation for review, restore, backup, sync, or export runs
Adapter framework expansion
New Filament resources, routes, or destructive actions
