TenantAtlas/specs/358-operationrun-queue-truth-foundation/spec.md
ahmido 2a12729dc5 feat: implement operation run queue truth foundation (spec 358) (#429)
Implements platform feature branch `358-operationrun-queue-truth-foundation`.

Target branch: `platform-dev`.

Follow-up integration path after merge:

`platform-dev` -> `dev`.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #429
2026-06-06 12:03:11 +00:00


Feature Specification: OperationRun Queue Truth Foundation

Feature Branch: 358-operationrun-queue-truth-foundation
Created: 2026-06-06
Status: Draft
Input: User-provided OperationRun queue-truth draft, reconciled against current repo truth

Spec Candidate Check (mandatory — SPEC-GATE-001)

  • Problem: Current OperationRun UX still allows contradictory generic queue truth. The repo can show stale-attention semantics and ordinary queue/progress reassurance for the same active run, depending on which helper or surface renders first.
  • Today's failure: A run can surface "Likely stale" lifecycle attention while generic copy still says "Waiting for worker." or "No action needed yet. The operation is waiting for a worker." This creates a false-calm operator message on active monitoring surfaces.
  • User-visible improvement: Queued and running OperationRun records will render one honest generic lifecycle story across shell hints, monitoring rows, and canonical detail without overclaiming domain success or orphaned queue state.
  • Smallest enterprise-capable version: Align the existing generic lifecycle helpers (OperationRunFreshnessState, OperationRunProgressContract, RunDurationInsights, and OperationUxPresenter) and apply the resulting truth to current monitoring and shell surfaces only.
  • Explicit non-goals: No new persisted OperationRun statuses or outcomes, no queue schema redesign, no new worker-health subsystem, no new notification family, no new adapter framework, no new domain-success reconciliation, no restore/review/backup auto-completion logic, and no destructive action changes.
  • Permanent complexity imported: One bounded derived queue-truth path over existing OperationRun state, plus focused tests that lock the contract across shared monitoring surfaces.
  • Why now: Historical specs already improved stale visibility and shared progress, but current repo truth still contains contradictory generic wording. This is an active operator-trust gap in currently shipped runtime paths.
  • Why not local: Fixing only one Blade view, one banner, or one detail card would leave the contradiction between progress, freshness, and queue-guidance helpers intact.
  • Approval class: Core Enterprise
  • Red flags triggered: shared interaction family, monitoring-state semantics, and possible helper consolidation. Defense: the slice remains derived-only, persistence-free, and narrower than a new reconciliation framework.
  • Score: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | Total: 11/12
  • Decision: approve

Repo Truth Reconciliation

The user draft is directionally correct, but current repo truth changes the exact framing:

  1. The repo's spec-numbering helper advanced to 1000 because specs/999-seeder-external-id/ already exists; this package is nonetheless intentionally finalized as Spec 358, as the user requested.
  2. Generic stale reconciliation already exists via TenantpilotReconcileOperationRuns and OperationLifecycleReconciler; adapter-backed reconciliation already exists via OpsReconcileAdapterRuns and AdapterRunReconciler.
  3. A shared progress helper already exists via OperationRunProgressContract, and Spec 272 already extended its active-progress precedence for phased/composite truth; this spec aligns stale/fresh queue guidance with that current shared contract instead of inventing a new one.
  4. The current canonical Monitoring routes are workspace-scoped: /admin/workspaces/{workspace}/operations and /admin/workspaces/{workspace}/operations/{run}. OperationRunResource remains an implementation seam, not a route-backed surface.
  5. No follow-up Spec 359 promise is recorded here. Any later adapter or domain-success work must be promoted from fresh repo truth rather than pre-allocating a speculative framework sequence.

Spec Scope Fields (mandatory)

  • Scope: workspace, tenant, canonical-view
  • Primary Routes:
    • workspace-admin surfaces that host the shared shell activity hint while an entitled managed environment is active, including current /admin/workspaces/{workspace}/environments/{environment}/... starts
    • /admin/workspaces/{workspace}/operations
    • /admin/workspaces/{workspace}/operations/{run}
    • the current canonical Monitoring surfaces App\Filament\Pages\Monitoring\Operations and App\Filament\Pages\Operations\TenantlessOperationRunViewer
  • Data Ownership:
    • operation_runs remain the only lifecycle and freshness source of truth
    • OperationRun context remains the only place where legitimacy or reconciliation evidence may already exist
    • no new persisted queue-truth mirror, no new audit artifact, and no new worker-health record are introduced
  • RBAC:
    • existing workspace and managed-environment entitlement rules remain authoritative
    • non-members and out-of-scope actors remain 404
    • this spec changes wording and derived presentation only; it does not widen access or add new capability strings

For canonical-view specs, the spec MUST define:

  • Default filter behavior when tenant-context is active: Existing environment-prefilter and page-state behavior on /admin/workspaces/{workspace}/operations remain unchanged. This spec changes queue truth, not monitoring state ownership.
  • Explicit entitlement checks preventing cross-tenant leakage: Derived stale/queued guidance must use only runs the actor is already authorized to view. No new wording may reveal hidden tenant scope or hidden run existence.
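
The entitlement rule above can be sketched as a filter that runs before any guidance is derived. This is a language-neutral model in Python, not the repo's PHP code; `Run`, `Actor`, `can_view`, `visible_runs`, `find_run_or_404`, and `NotFound` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


class NotFound(Exception):
    """Stands in for an HTTP 404 response (hypothetical)."""


@dataclass(frozen=True)
class Run:
    id: int
    workspace_id: int
    tenant_id: int


@dataclass(frozen=True)
class Actor:
    workspace_ids: FrozenSet[int]
    tenant_ids: FrozenSet[int]


def can_view(actor: Actor, run: Run) -> bool:
    # Assumption: both workspace membership and tenant entitlement are required.
    return run.workspace_id in actor.workspace_ids and run.tenant_id in actor.tenant_ids


def visible_runs(actor: Actor, runs: List[Run]) -> List[Run]:
    # Guidance is derived only from this filtered list, never from all runs.
    return [run for run in runs if can_view(actor, run)]


def find_run_or_404(actor: Actor, runs: List[Run], run_id: int) -> Run:
    # A hidden run and a missing run raise the same error, so the response
    # cannot reveal that a run exists outside the actor's scope.
    for run in visible_runs(actor, runs):
        if run.id == run_id:
            return run
    raise NotFound(run_id)
```

Because the lookup goes through `visible_runs`, an out-of-scope run is indistinguishable from a nonexistent one, which is the behavior the "remain 404" rule asks for.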

UI Surface Impact (mandatory — UI-COV-001)

  • No UI surface impact
  • Existing page changed
  • New page/route added
  • Navigation changed
  • Filament panel/provider surface changed
  • New modal/drawer/wizard/action added
  • New table/form/state added
  • Customer-facing surface changed
  • Dangerous action changed
  • Status/evidence/review presentation changed
  • Workspace/environment context presentation changed

UI/Productization Coverage (mandatory when UI Surface Impact is not "No UI surface impact")

  • Route/page/surface:
    • tenant shell activity hint (BulkOperationProgress)
    • workspace operations hub (App\Filament\Pages\Monitoring\Operations) at /admin/workspaces/{workspace}/operations
    • canonical operation detail (App\Filament\Pages\Operations\TenantlessOperationRunViewer) at /admin/workspaces/{workspace}/operations/{run}
    • shared implementation seam only: App\Filament\Resources\OperationRunResource for the reused table/detail payload contract
  • Current or new page archetype: existing monitoring/workbench family only
  • Design depth: Domain Pattern Surface
  • Repo-truth level: repo-verified
  • Existing pattern reused: current OperationRun monitoring family, current shell activity hint, current monitoring detail banners
  • New pattern required: none; this is a truth-alignment follow-up within the existing pattern family
  • Screenshot required: no; the slice corrects shared wording/derived truth inside existing monitoring surfaces
  • Page audit required: no new page-report identity; existing monitoring family coverage is sufficient
  • Customer-safe review required: no; operator-only monitoring surfaces
  • Dangerous-action review required: no; no action hierarchy or destructive behavior changes
  • Coverage files updated or explicitly not needed: no new coverage file is required because the existing monitoring-family anchors already cover the touched reachable surfaces: docs/ui-ux-enterprise-audit/route-inventory.md (UI-016, UI-017), docs/ui-ux-enterprise-audit/page-reports/ui-003-operations.md, docs/ui-ux-enterprise-audit/strategic-surfaces.md, and specs/313-workspace-environment-context-browser-verification/surface-inventory.md
  • No-impact rationale when applicable: N/A

Cross-Cutting / Shared Pattern Reuse (mandatory)

  • Cross-cutting feature?: yes
  • Interaction class(es): status messaging, queue/progress guidance, monitoring detail guidance, shell active-work hint
  • Systems touched:
    • OperationRunFreshnessState
    • OperationRunProgressContract
    • RunDurationInsights
    • OperationUxPresenter
    • current shell banner and canonical monitoring/detail surfaces
  • Existing pattern(s) to extend: current lifecycle policy, freshness derivation, the Spec-270/272 shared progress contract, monitoring detail banner, and shared run-link/presenter paths
  • Shared contract / presenter / builder / renderer to reuse: OperationRunProgressContract, OperationUxPresenter, OperationRunFreshnessState, and current monitoring/view renderer paths
  • Why the existing shared path is sufficient or insufficient: The repo already has the right ownership boundaries, but the generic queue truth is split across multiple helpers that can disagree in wording and emphasis.
  • Allowed deviation and why: none by default; if one tiny helper is required to keep existing presenters reviewable, it must remain local to current Ops UX truth and must not become an adapter registry
  • Consistency impact: queued/running/stale wording, progress availability, lifecycle attention, and detail guidance must remain aligned across shell, list, and canonical detail
  • Review focus: no surface may reassure with ordinary queue copy after freshness has already escalated the same active run to stale attention
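
The review focus above can be sketched as one derived truth object that every surface consumes. This is an illustrative Python model under stated assumptions, not the repo's PHP helpers: `QueueTruth`, `derive_truth`, the three render functions, and the stale guidance string are hypothetical; only the calm copy strings and the "Likely stale" label are quoted from this spec.

```python
from dataclasses import dataclass
from typing import Dict

# Calm copy quoted in this spec; it must never accompany a stale run.
CALM_COPY = ("Waiting for worker.", "Progress details pending.")


@dataclass(frozen=True)
class QueueTruth:
    label: str
    guidance: str


def derive_truth(status: str, likely_stale: bool) -> QueueTruth:
    """Derive the lifecycle story once, before any surface renders."""
    if likely_stale:
        # Hypothetical stale wording; freshness takes precedence over status.
        return QueueTruth("Likely stale", "Past its lifecycle window. Review worker health.")
    if status == "queued":
        return QueueTruth("Queued", "Waiting for worker.")
    return QueueTruth("Running", "Progress details pending.")


def render_shell_hint(truth: QueueTruth) -> str:
    return f"{truth.label}: {truth.guidance}"


def render_list_row(truth: QueueTruth) -> str:
    return f"[{truth.label}] {truth.guidance}"


def render_detail_banner(truth: QueueTruth) -> str:
    return f"{truth.label}\n{truth.guidance}"


def render_all(status: str, likely_stale: bool) -> Dict[str, str]:
    # Every surface receives the same truth object, so wording cannot drift.
    truth = derive_truth(status, likely_stale)
    return {
        "shell": render_shell_hint(truth),
        "list": render_list_row(truth),
        "detail": render_detail_banner(truth),
    }
```

The design point is that the renderers differ only in density and placement; none of them re-derives lifecycle meaning on its own.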

OperationRun UX Impact (mandatory)

  • Touches OperationRun start/completion/link UX?: yes, reuse-only
  • Shared OperationRun UX contract/layer reused: current OperationRun link, freshness, presenter, and progress helper paths
  • Delegated start/completion UX behaviors: existing queued toasts, canonical run links, browser events, and terminal notification paths remain unchanged
  • Local surface-owned behavior that remains: density and placement only
  • Queued DB-notification policy: unchanged
  • Terminal notification path: unchanged central lifecycle mechanism
  • Exception required?: none

Provider Boundary / Platform Core Check (mandatory)

  • Shared provider/platform boundary touched?: no
  • Boundary classification: N/A
  • Seams affected: N/A
  • Neutral platform terms preserved or introduced: operation, queued, running, lifecycle window, review worker health
  • Provider-specific semantics retained and why: none
  • Why this does not deepen provider coupling accidentally: the slice works entirely on platform-owned OperationRun truth and existing shared monitoring helpers
  • Follow-up path: none

UI / Surface Guardrail Impact (mandatory)

| Surface / Change | Operator-facing surface change? | Native vs Custom | Shared-Family Relevance | State Layers Touched | Exception Needed? | Low-Impact / N/A Note |
|---|---|---|---|---|---|---|
| Tenant shell activity hint | yes | Native Filament + existing Livewire view | shared active-work hint family | shell, page | no | no new action or route |
| Operations list / resource detail summary | yes | Native Filament resource/detail | shared monitoring family | page, detail | no | wording-only inside existing collection/detail |
| Canonical tenantless run viewer banners | yes | Native Filament page | shared monitoring detail family | detail | no | no new diagnostics section family |

Decision-First Surface Role (mandatory)

| Surface | Decision Role | Human-in-the-loop Moment | Immediately Visible for First Decision | On-Demand Detail / Evidence | Why This Is Primary or Why Not | Workflow Alignment | Attention-load Reduction |
|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | Secondary Context Surface | Decide whether an active run needs inspection now | active-state truth, one open link, honest progress availability | full diagnostics stay on canonical monitoring/detail pages | secondary because it supports ongoing work rather than owning diagnostics | follows existing start-surface workflow | removes false calmness |
| Operations list | Primary Decision Surface | Decide which active run needs inspection first | lifecycle attention, queue truth, scope, and run identity | full detail after drill-through | primary because it is the canonical monitoring queue | aligns with current monitoring triage | removes open-every-row guesswork |
| Canonical run detail | Tertiary Evidence / Diagnostics Surface | Confirm what the stale or active state really means | one honest lifecycle explanation before deep diagnostics | raw context, history, evidence, and related links | tertiary because inspection already happened | preserves current detail role | removes banner/guidance contradiction |

Audience-Aware Disclosure (mandatory)

| Surface | Audience Modes In Scope | Decision-First Default-Visible Content | Operator Diagnostics | Support / Raw Evidence | One Dominant Next Action | Hidden / Gated By Default | Duplicate-Truth Prevention |
|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | operator-MSP | active-state summary plus one open action | minimal guidance only | raw/support data stays off-surface | View operation or current collective review action | raw detail stays on canonical monitoring surfaces | do not repeat stale explanation multiple ways |
| Operations list | operator-MSP | row-level stale/queued truth and scope | detail remains secondary | raw payloads stay on detail | row open | raw and related evidence stay on detail | one row summary, not multiple competing summaries |
| Canonical run detail | operator-MSP, support-platform | one honest lifecycle banner | diagnostics sections below the banner | raw context remains lower-priority | existing return/open actions | support/raw detail remains secondary to the top summary | lifecycle banner and queue guidance must not disagree |

UI/UX Surface Classification (mandatory)

| Surface | Action Surface Class | Surface Type | Likely Next Operator Action | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type / Justification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | Monitoring hint | Activity shell hint | Open the active run if guidance escalates | explicit open link | forbidden | existing shell secondary actions only | none | /admin/workspaces/{workspace}/operations with current contextual prefilter rules | /admin/workspaces/{workspace}/operations/{run} | current tenant/workspace shell context | Operation | whether the run is ordinary active work or already stale | none |
| Operations list | List / Table / Monitoring | Read-only monitoring registry | Open the run that needs follow-up | full-row open | required | existing table controls only | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | workspace scope and current filters | Operation run | honest queued/running lifecycle truth | none |
| Canonical run detail | Record / Detail / Monitoring | Diagnostics-first detail surface | Inspect lifecycle truth before deeper diagnosis | canonical detail page | N/A | existing header/related links only | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | workspace scope plus entitled tenant context | Operation run | honest active-state explanation | none |

Operator Surface Contract (mandatory)

| Surface | Primary Persona | Decision / Operator Action Supported | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|---|
| Tenant shell activity hint | tenant operator | Decide whether to inspect active work now | shell hint | Is this active work still ordinary, or is it already stale? | label, status, progress availability, open link | deep diagnostics remain elsewhere | lifecycle, freshness, progress availability | none | current open/review action | none |
| Operations list | workspace operator | Prioritize which active run to inspect | monitoring registry | Which run is actually waiting, progressing, or already stale? | row summary, scope, lifecycle attention | raw payloads on detail | lifecycle, freshness, queue truth | none | open row | none |
| Canonical run detail | workspace operator | Confirm generic lifecycle truth before diagnosing | detail surface | Why does this run read as stale or still-active? | one lifecycle banner and queue guidance | raw context, failure payloads, related artifacts | lifecycle, freshness, legitimacy evidence | none | existing navigation and related links | none |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: no
  • New persisted entity/table/artifact?: no
  • New abstraction?: no by default; prefer extending existing helpers
  • New enum/state/reason family?: no
  • New cross-domain UI framework/taxonomy?: no
  • Current operator problem: current generic queue truth can contradict itself across existing shared monitoring surfaces
  • Existing structure is insufficient because: lifecycle, freshness, progress, and queue guidance currently live in separate helpers that can drift in wording and emphasis
  • Narrowest correct implementation: align existing shared helpers and current monitoring renderers without adding a new framework, status family, or persistence layer
  • Ownership cost: bounded shared-helper review burden plus focused tests
  • Alternative intentionally rejected: a new reconciliation framework or queue-health persistence layer was rejected because the current issue is derived wording drift, not missing stored truth
  • Release truth: current-release operator-trust correction

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

  • Test purpose / classification: Unit + Feature
  • Validation lane(s): fast-feedback, confidence
  • Why this classification and these lanes are sufficient: the feature changes derived queue truth and current rendered monitoring output; focused unit and feature proof are sufficient without a new browser or heavy-governance lane
  • New or expanded test families:
    • apps/platform/tests/Unit/Support/OpsUx/OperationRunProgressContractTest.php
    • apps/platform/tests/Unit/Support/OpsUx/RunDurationInsightsTest.php or adjacent focused unit coverage if the repo keeps these assertions in an existing file
    • apps/platform/tests/Feature/OpsUx/ActivityFeedbackSurfaceTest.php
    • apps/platform/tests/Feature/OpsUx/BulkOperationProgressDbOnlyTest.php
    • apps/platform/tests/Feature/MonitoringOperationsTest.php
    • apps/platform/tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php
    • apps/platform/tests/Feature/Monitoring/MonitoringOperationsTest.php
    • apps/platform/tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php
    • apps/platform/tests/Feature/Operations/TenantlessOperationRunViewerTest.php
  • Fixture / helper cost impact: low to moderate; reuse existing OperationRun factories, workspace membership helpers, and monitoring fixtures only
  • Heavy-family visibility / justification: none
  • Special surface test profile: monitoring-state-page plus shared-detail-family
  • Standard-native relief or required special coverage: feature coverage is sufficient; no browser smoke is required unless implementation later proves a render-only regression that unit/feature coverage cannot catch
  • Reviewer handoff: reviewers must confirm the same stale run no longer mixes stale attention with ordinary queue/progress reassurance on shell, list, and detail surfaces
  • Budget / baseline / trend impact: none expected beyond small feature-local coverage growth
  • Escalation needed: reject-or-split if the slice expands into new persistence, adapter orchestration, or queue runtime redesign
  • Active feature PR close-out entry: Guardrail / Smoke Coverage
  • Planned validation commands:
    • cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OpsUx/OperationRunProgressContractTest.php
    • cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/OpsUx/ActivityFeedbackSurfaceTest.php tests/Feature/OpsUx/BulkOperationProgressDbOnlyTest.php tests/Feature/MonitoringOperationsTest.php tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php tests/Feature/Monitoring/MonitoringOperationsTest.php tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php tests/Feature/Operations/TenantlessOperationRunViewerTest.php
    • git diff --check

User Scenarios & Testing (mandatory)

User Story 1 - See honest queued and stale truth on active monitoring surfaces (Priority: P1)

As an operator, I need active monitoring surfaces to distinguish normal queued/running work from stale lifecycle drift without mixing calm queue reassurance into the same state.

Why this priority: This is the direct trust gap visible in current runtime surfaces.

Independent Test: Seed fresh and stale queued/running runs, open the shell activity hint and monitoring list, and verify that stale-active runs do not mix stale attention with ordinary queue or progress reassurance.

Acceptance Scenarios:

  1. Given a queued or running run is past its lifecycle window, When the shell or monitoring list renders it, Then the visible stale-active state does not also reassure with ordinary "Waiting for worker.", "Progress details pending.", or equivalent calm queue/progress copy.
  2. Given a fresh queued or running run is still within its lifecycle window, When the same surfaces render it, Then the copy remains calm and does not falsely escalate it as stale.
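
The two scenarios above hinge on a time comparison against the lifecycle window. A minimal Python sketch of that comparison follows; it is not the repo's `OperationRunFreshnessState` implementation. The window durations, `freshness`, and `list_row_copy` are hypothetical, and the real thresholds live in the existing lifecycle policy.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lifecycle windows; this spec does not restate the real values.
LIFECYCLE_WINDOWS = {
    "queued": timedelta(minutes=10),
    "running": timedelta(minutes=30),
}


def freshness(status: str, last_activity_at: datetime, now: datetime) -> str:
    """Return 'fresh' or 'likely_stale' for an active (queued/running) run."""
    if status not in LIFECYCLE_WINDOWS:
        raise ValueError("only queued and running runs are modeled here")
    elapsed = now - last_activity_at
    return "likely_stale" if elapsed > LIFECYCLE_WINDOWS[status] else "fresh"


def list_row_copy(status: str, last_activity_at: datetime, now: datetime) -> str:
    # Stale is checked first so calm copy can never be appended to it.
    if freshness(status, last_activity_at, now) == "likely_stale":
        return "Likely stale: past its lifecycle window."
    if status == "queued":
        return "Waiting for worker."
    return "Progress details pending."
```

Passing `now` explicitly keeps the function deterministic, which makes the positive (stale) and negative (fresh) cases easy to assert side by side.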

User Story 2 - Keep canonical run detail aligned with generic queue truth (Priority: P1)

As an operator, I need canonical run detail to confirm the same generic lifecycle meaning that list and shell surfaces already communicate before I inspect deeper diagnostics.

Why this priority: The current contradiction is most damaging when detail and compact surfaces disagree.

Independent Test: Open canonical detail for fresh, stale, and reconciled runs and verify that lifecycle banners, queue guidance, and top summary agree with the generic monitoring truth.

Acceptance Scenarios:

  1. Given a stale queued or stale running run, When the detail page opens, Then the top summary explains that the run is past its lifecycle window before deeper diagnostics render.
  2. Given a reconciled terminal run, When the detail page opens, Then it preserves the existing reconciled truth and does not regress to ordinary active-work guidance.

User Story 3 - Keep proof-backed reconciliation separate from generic queue truth (Priority: P2)

As a product owner, I need generic queue truth to stay separate from domain-success or adapter reconciliation so that stale monitoring copy does not silently become a business-completion engine.

Why this priority: The slice must stay narrow and avoid speculative framework growth.

Independent Test: Compare a stale generic run and an adapter-reconciled run, then verify that generic stale wording remains cautious while existing reconciled evidence paths still drive terminal truth where already implemented.

Acceptance Scenarios:

  1. Given a stale active run without proof-backed queue legitimacy evidence, When the UI renders it, Then the wording stays at "past lifecycle window" or equivalent cautious language and does not claim orphaned or domain-complete truth.
  2. Given a run already completed through scheduled or adapter reconciliation, When the UI renders it, Then the existing reconciled terminal semantics remain intact and are not rewritten as generic queue waiting.
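
The separation described in these scenarios amounts to a precedence order: reconciled terminal truth first, cautious stale wording second, calm active copy last. The Python sketch below illustrates that order under stated assumptions. `RunView`, `lifecycle_copy`, the `"completed"` status value, and the rendered strings are hypothetical; only the reconciler names `scheduled_reconciler` and `adapter_reconciler` come from this spec.

```python
from dataclasses import dataclass
from typing import Optional

RECONCILERS = ("scheduled_reconciler", "adapter_reconciler")


@dataclass(frozen=True)
class RunView:
    status: str  # "queued", "running", or "completed" (hypothetical terminal value)
    likely_stale: bool = False
    reconciled_by: Optional[str] = None


def lifecycle_copy(run: RunView) -> str:
    # 1. Terminal truth wins and is never rewritten as generic queue waiting.
    if run.status == "completed":
        if run.reconciled_by in RECONCILERS:
            return f"Completed via {run.reconciled_by}."
        return "Completed."
    # 2. Stale active runs get cautious wording only: no orphaned claim and
    #    no claim about domain success.
    if run.likely_stale:
        return "Past lifecycle window."
    # 3. Fresh active work stays calm.
    return "Waiting for worker." if run.status == "queued" else "Running."
```

Note that the stale branch reads nothing about business outcome; it only reports that the lifecycle window has elapsed.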

Edge Cases

  • A queued run is stale but has no trustworthy worker correlation or job identifier.
  • A running run has determinate counts but is still past the lifecycle window.
  • A run is unsupported by current lifecycle policy and therefore should not receive stale overclaiming.
  • A run became terminal through scheduled_reconciler or adapter_reconciler.
  • A system-initiated run has no human initiator but still needs honest lifecycle wording.
  • Existing phase or composite progress hints exist in context and must not override stale lifecycle truth with false reassurance.
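
Two of the edge cases above (determinate counts on a stale run, and runs whose type the lifecycle policy does not support) can be sketched together. This is a hedged Python illustration, not the repo's `OperationRunProgressContract`; `ActiveRun`, `is_likely_stale`, `progress_line`, and the operation type names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical operation types covered by the lifecycle policy.
SUPPORTED_TYPES = frozenset({"policy.sync", "backup.run"})


@dataclass(frozen=True)
class ActiveRun:
    operation_type: str
    past_window: bool
    processed: Optional[int] = None
    total: Optional[int] = None
    phase_hint: Optional[str] = None


def is_likely_stale(run: ActiveRun) -> bool:
    # Unsupported types are never escalated, which avoids stale overclaiming.
    return run.operation_type in SUPPORTED_TYPES and run.past_window


def progress_line(run: ActiveRun) -> Optional[str]:
    if is_likely_stale(run):
        # Counts and phase hints must not reassure once the run is stale.
        return None
    if run.processed is not None and run.total:
        return f"{run.processed} of {run.total} processed"
    return run.phase_hint
```

Returning `None` for stale runs leaves the surface free to show only the stale explanation, rather than pairing it with progress copy.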

Requirements (mandatory)

  • FR-358-001: The system MUST keep OperationRun.status and OperationRun.outcome as the only persisted lifecycle fields for this slice.
  • FR-358-002: Generic queue truth MUST be derived from current lifecycle, freshness, and trusted context only.
  • FR-358-003: A run that is likely_stale by current lifecycle policy MUST NOT render ordinary queue or ordinary progress reassurance as visible stale-active guidance on shell, list, or canonical detail surfaces, whether that reassurance appears alone or alongside a stale label, banner, or attention cue.
  • FR-358-004: Fresh queued and fresh running runs MUST remain visibly calm and MUST NOT inherit stale emphasis.
  • FR-358-005: Existing scheduled or adapter reconciliation paths MUST remain authoritative for terminal truth where they already exist.
  • FR-358-006: This feature MUST NOT introduce a new queue-orphaned, reconciled, or business-success persistence family.
  • FR-358-007: This feature MUST NOT claim hard orphaned queue state unless current repo truth already provides trustworthy legitimacy evidence for that specific run.
  • FR-358-008: Existing OperationRun links, queued toasts, browser events, and terminal notification paths MUST remain unchanged.
  • FR-358-009: Existing workspace and tenant authorization boundaries MUST remain unchanged.
  • FR-358-010: The canonical monitoring list, shell hint, and canonical run detail MUST use one aligned generic lifecycle vocabulary for fresh queued/running work, stale active work, and reconciled terminal work.
  • FR-358-011: The implementation MUST stay within current helper ownership unless one small local helper is required to avoid unreadable presenter logic.
  • FR-358-012: Focused unit and feature coverage MUST prove both positive and negative cases for fresh vs stale queue truth.

Success Criteria (mandatory)

  • SC-358-001: In focused regression coverage, stale queued/running runs no longer mix stale attention with ordinary queue/progress reassurance on the affected surfaces.
  • SC-358-002: In focused regression coverage, fresh queued/running runs remain calm and do not falsely escalate to stale.
  • SC-358-003: Existing reconciled terminal behavior remains intact for runs already completed by scheduled or adapter truth.
  • SC-358-004: No new persisted lifecycle field, queue artifact, or adapter framework is introduced.

Assumptions

  • The current contradiction is a derived UX-truth problem, not a missing persistence problem.
  • Existing lifecycle policy thresholds remain valid for this slice.
  • Existing generic and adapter reconciliation commands remain out of scope except as historical context the UI must respect.

Risks

  • Over-correction: the slice could make all active work sound problematic. Mitigation: explicit fresh vs stale negative assertions.
  • Framework creep: the slice could drift into a new generic queue-health subsystem. Mitigation: no new persistence, no new adapter registry, and explicit out-of-scope boundaries.
  • Detail/list drift survives: one renderer could remain on old wording. Mitigation: focused list + detail + shell coverage in the same package.

Out of Scope

  • Queue worker health automation
  • New scheduler, command, or job families
  • New OperationRun statuses or outcomes
  • Domain-success reconciliation for review, restore, backup, sync, or export runs
  • Adapter framework expansion
  • New Filament resources, routes, or destructive actions