TenantAtlas/specs/178-ops-truth-alignment/spec.md
ahmido 1142d283eb feat: Spec 178 — Operations Lifecycle Alignment & Cross-Surface Truth Consistency (#209)
## Spec 178 — Operations Lifecycle Alignment & Cross-Surface Truth Consistency

Hardens run-lifecycle truth and cross-surface consistency across all central operator surfaces.

### Core changes

**Lifecycle Truth Alignment**
- Unified stale/stuck semantics across tenant, workspace, admin, and system surfaces
- `OperationRunFreshnessState` is propagated consistently across all widgets and pages
- Shared problem-class separation: `terminal_follow_up` vs. `active_stale_attention`

**BulkOperationProgress Freshness**
- The overlay now shows only `healthyActive()` runs instead of all active runs
- Likely-stale runs no longer keep polling artificially active
- Terminal runs disappear promptly from the progress overlay

**Decision Zone in Run Detail**
- Stale/reconciled attention sits in the primary decision hierarchy
- Clear answers: active? stale? reconciled? next step?
- Artifact-rich runs keep lifecycle truth ahead of deep diagnostics

**Cross-Surface Link Continuity**
- Dashboard → Operations Hub → Run Detail tell the same story
- Notifications reference the correct problem class
- Workspace/tenant attention links are routed by problem class

**System-Plane Fixes**
- Fixed the `/system/ops/failures` 500 error (panel-safe artifact URLs)
- System stuck/failures surfaces show reconciled stale lineage

### Further fixes
- Inventory auth guard cleaned up (Gate instead of ad-hoc facades)
- Browser smoke tests stabilized (DOM assertions instead of fragile clicks)
- Test-assertion drift for verification/lifecycle texts corrected

### Test result
Full suite: **3269 passed**, 8 skipped, 0 failed

### Spec artifacts
- `specs/178-ops-truth-alignment/spec.md`
- `specs/178-ops-truth-alignment/plan.md`
- `specs/178-ops-truth-alignment/tasks.md`
- `specs/178-ops-truth-alignment/research.md`
- `specs/178-ops-truth-alignment/data-model.md`
- `specs/178-ops-truth-alignment/quickstart.md`
- `specs/178-ops-truth-alignment/contracts/operations-truth-alignment.openapi.yaml`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #209
2026-04-05 22:42:24 +00:00

Feature Specification: Operations Lifecycle Alignment & Cross-Surface Truth Consistency

Feature Branch: 178-ops-truth-alignment
Created: 2026-04-05
Status: Proposed
Input: User description: "Spec 178 - Operations Lifecycle Alignment & Cross-Surface Truth Consistency"

Spec Scope Fields (mandatory)

  • Scope: tenant + workspace + canonical-view + platform
  • Primary Routes:
    • /admin as the workspace overview surface where workspace attention and workspace recent operations appear
    • /admin/t/{tenant} as the tenant dashboard surface where tenant attention, recent operations, and active progress affordances appear
    • /admin/operations as the canonical monitoring hub and drill-through destination from admin-plane summaries
    • /admin/operations/{run} as the canonical run detail surface
    • /system/ops/runs, /system/ops/failures, and /system/ops/stuck as the platform-plane monitoring registry surfaces
    • /system/ops/runs/{run} as the platform-plane operation detail surface
  • Data Ownership:
    • Existing OperationRun records remain the only canonical lifecycle source of truth for queued, running, completed, stale, and automatically reconciled runs
    • Existing workspace-owned monitoring truth with optional tenant linkage remains in place; the feature does not add a second summary record, mirror lifecycle store, or notification-specific state model
    • Freshness interpretation, stale or reconciled visibility, terminal follow-up grouping, and cross-surface drill-through continuity remain derived views over existing OperationRun truth
    • No schema migration, no new persisted lifecycle state, and no enum rewrite are introduced
  • RBAC:
    • Admin-plane summary and canonical-view surfaces continue to require workspace membership, and any tenant-bound summary or run detail continues to require tenant entitlement for the referenced tenant
    • Platform-plane system surfaces continue to rely on existing system operations view and manage capabilities without broadening /system access
    • Non-members or users outside the relevant workspace or tenant scope remain 404; in-scope users lacking a capability for a guarded follow-up affordance remain 403
    • Cross-plane navigation must remain explicit and must not leak tenant truth from admin surfaces into system surfaces or vice versa
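The 404-versus-403 rule above can be illustrated with a small sketch. It is written in Python purely for illustration and is not the project's code; the `User` and `Run` shapes, the `resolve_status` helper, and the capability strings are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    workspace_ids: FrozenSet[int]
    tenant_ids: FrozenSet[int]
    capabilities: FrozenSet[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Run:
    workspace_id: int
    tenant_id: Optional[int]  # None models a run without tenant linkage


def resolve_status(user: User, run: Run, capability: Optional[str] = None) -> int:
    """Return the HTTP status an admin-plane request for `run` would produce."""
    # Scope checks run first, so out-of-scope users never learn the run exists.
    if run.workspace_id not in user.workspace_ids:
        return 404
    if run.tenant_id is not None and run.tenant_id not in user.tenant_ids:
        return 404
    # Only in-scope users can reach the capability check and receive a 403.
    if capability is not None and capability not in user.capabilities:
        return 403
    return 200
```

Ordering matters in this sketch: because the scope checks precede the capability check, a user outside the workspace or tenant gets 404 even when requesting a guarded affordance, which keeps the deny-as-not-found behavior intact.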

For canonical-view specs, the spec MUST define:

  • Default filter behavior when tenant-context is active: /admin/operations may continue to prefilter to the active tenant, but dashboard, attention, recent-operations, and notification drill-throughs MUST also preserve the originating problem class so operators land on the same issue family they clicked. Operators may broaden filters only within already entitled scope.
  • Explicit entitlement checks preventing cross-tenant leakage: Every admin-plane summary claim, pre-applied filter, run detail page, and related drill-through MUST resolve only after workspace membership and tenant entitlement checks against the referenced run. Reconciled, stale, terminal-failure, and follow-up states must not reveal another tenant's existence or activity to unauthorized users.
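One way to picture problem-class-preserving drill-through is a link builder that always carries the problem class and only adds the tenant prefilter when the operator is entitled to it. This is a hedged Python sketch, not the project's routing code; the query parameter names `problem_class` and `tenant` and both helper functions are assumptions made for the example.

```python
from typing import FrozenSet, Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Problem-class identifiers as named in the change summary for this spec.
PROBLEM_CLASSES = {"terminal_follow_up", "active_stale_attention"}


def operations_hub_url(
    problem_class: str,
    tenant_id: Optional[int] = None,
    entitled_tenant_ids: FrozenSet[int] = frozenset(),
) -> str:
    """Build a drill-through link into /admin/operations (hypothetical params)."""
    if problem_class not in PROBLEM_CLASSES:
        raise ValueError(f"unknown problem class: {problem_class}")
    params = {"problem_class": problem_class}
    # Keep the tenant prefilter only for entitled tenants; otherwise fall back
    # to workspace scope without changing the problem class.
    if tenant_id is not None and tenant_id in entitled_tenant_ids:
        params["tenant"] = str(tenant_id)
    return "/admin/operations?" + urlencode(params)


def read_problem_class(url: str) -> Optional[str]:
    """What the destination would read back to confirm the originating framing."""
    values = parse_qs(urlsplit(url).query).get("problem_class", [])
    return values[0] if values and values[0] in PROBLEM_CLASSES else None
```

For example, `operations_hub_url("active_stale_attention", 7, frozenset({7}))` keeps both the tenant prefilter and the problem class, while the same call with an empty entitlement set drops only the tenant filter.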

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard operations attention | Embedded attention summary | One explicit problem-class CTA per summary bucket | forbidden | none | none | /admin/operations | /admin/operations/{run} | Active tenant context and tenant-preserving destination state | Operations / Operation | Separate terminal issues from stale active issues | Multi-bucket summary surface |
| Tenant dashboard recent operations | Diagnostic recency table | Row open to canonical operation detail | required | header link only | none | /admin/operations with tenant prefilter | /admin/operations/{run} | Active tenant context and tenant-scoped recent activity | Operations / Operation | Fresh active, likely stale, and terminal follow-up states remain distinguishable per row | none |
| Bulk operation progress | Live progress indicator | Compact item open to canonical operation detail plus collection fallback | compact item link only | collection link only | none | /admin/operations with tenant prefilter | /admin/operations/{run} | Active tenant context and active-run-only framing | Operations / Operation | Only truly active or still-problematic runs remain visible | Compact progress surface |
| Workspace operations attention | Embedded attention summary | One explicit problem-class CTA per summary bucket | forbidden | none | none | /admin/operations | /admin/operations/{run} | Workspace scope plus tenant counts where relevant | Operations / Operation | Separate terminal issues from stale active issues across the workspace | Multi-bucket summary surface |
| Workspace recent operations | Diagnostic recency table | Row open to canonical operation detail | required | header link only | none | /admin/operations | /admin/operations/{run} | Workspace scope with tenant identity per row | Operations / Operation | Recent operations do not hide stale or reconciled truth behind generic recency language | none |
| Operations hub | Read-only Registry / Report | Full-row click to canonical operation detail | required | filters, tabs, and header-level context only | none on the list | /admin/operations | /admin/operations/{run} | Workspace scope, tenant filter state, and problem-class filter state | Operations / Operation | Lifecycle truth, freshness truth, and problem class are visible before opening detail | none |
| Canonical operation detail | Detail-first operational surface | Dedicated detail page | forbidden | detail header links only | none introduced by this spec | /admin/operations | /admin/operations/{run} | Workspace context, tenant context when relevant, and run identity | Operations / Operation | Decision-zone lifecycle truth and next step are visible without opening diagnostics | none |
| System failed operations | Read-only Registry / Report | Full-row click to system operation detail | required | header CTA only | none | /system/ops/failures | /system/ops/runs/{run} | Platform scope only | Operations / Operation | Terminal-problem truth remains aligned with admin-plane canonical truth | none |
| System stuck operations | Read-only Registry / Report | Full-row click to system operation detail | required | header CTA only | none | /system/ops/stuck | /system/ops/runs/{run} | Platform scope only | Operations / Operation | Active stale or stuck truth and reconciled visibility remain operator-legible | none |

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard operations attention | Tenant operator | Embedded attention summary | Do I have a terminal issue to follow up or an active run that is likely stale? | Separate counts or labels for terminal follow-up and stale active attention, with one matching destination each | Detailed failure payloads, count internals, and infrastructure evidence | problem class, urgency, tenant scope | none | Open terminal issues, open stale active issues | none |
| Tenant dashboard recent operations | Tenant operator | Diagnostic recency table | Which recent tenant operation should I inspect next? | Operation label, lifecycle truth, outcome, freshness truth, and recency | Failure internals, raw summary counts, extended diagnostics | execution status, execution outcome, freshness | none | Open operation detail, open operations list | none |
| Bulk operation progress | Tenant operator | Live progress indicator | Is this run really still active, or has its truth changed since enqueue time? | Active run identity, current visible lifecycle truth, and quick path to detail | Low-level progress internals and failure metadata | execution status, freshness, active visibility | none | Open operation detail, open operations list | none |
| Workspace operations attention | Workspace operator | Embedded attention summary | Which operation problem class needs workspace-level follow-up first? | Separate terminal issues from stale active issues, with workspace-safe destination semantics | Deep diagnostics remain on operations views | problem class, urgency, workspace spread | none | Open terminal issues, open stale active issues | none |
| Workspace recent operations | Workspace operator | Diagnostic recency table | Which operation across the workspace changed meaning recently? | Run identity, tenant, lifecycle truth, and recency | Deeper failure and reconciliation detail remain secondary | execution status, freshness, tenant scope | none | Open operation detail, open operations list | none |
| Operations hub | Workspace operator | Read-only Registry / Report | Is this run fresh active, likely stale, reconciled, or a terminal issue, and which bucket am I looking at? | Explicit problem-class framing, lifecycle truth, freshness truth, outcome, tenant or workspace scope | Queue internals, raw context, and extended traces | execution status, execution outcome, freshness, problem class | none | Open operation detail, adjust filter or tab | none |
| Canonical operation detail | Workspace operator | Detail-first operational surface | What happened, is the run still active, was it automatically reconciled, and what do I do next? | Primary decision zone with lifecycle assessment, active or not-active answer, reconciliation state, and one primary next step | Raw payloads, detailed failure arrays, and artifact-deep diagnostics | execution status, execution outcome, freshness, operator next action | none | Return to operations, open related artifact or follow-up destination | none introduced by this spec |
| System failed operations | Platform operator | Read-only Registry / Report | Which terminal operation issue needs platform investigation first? | Terminal problem class, operation identity, workspace, tenant, and recency | Deep diagnostics remain on system detail | execution outcome, terminal problem class, recency | none | Open operation detail, show all operations | none |
| System stuck operations | Platform operator | Read-only Registry / Report | Which active run crossed the stuck threshold or was recently auto-reconciled for that reason? | Stuck or stale class, operation identity, workspace, tenant, and recency | Deep diagnostics remain on system detail | freshness, lifecycle stall state, recency | none | Open operation detail, show all operations | none |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: No
  • New persisted entity/table/artifact?: No
  • New abstraction?: No
  • New enum/state/reason family?: No
  • New cross-domain UI framework/taxonomy?: Yes, but only as a narrow derived monitoring split between terminal follow-up and active stale/stuck attention, built from existing lifecycle truth rather than new stored state
  • Current operator problem: Operators can currently see the same run framed as normal active progress on one surface, terminal or reconciled on another, invisible on system stuck surfaces, and mixed into a generic follow-up bucket elsewhere
  • Existing structure is insufficient because: Existing surfaces already have valid local logic, but their aggregation, drill-through, and attention language do not consistently tell the same operator story for stale, reconciled, and terminal problem runs
  • Narrowest correct implementation: Reuse the current OperationRun, status, outcome, freshness, and reconciliation model, then align summary buckets, filters, drill-throughs, and decision-zone emphasis across existing surfaces without adding persistence or a new lifecycle engine
  • Ownership cost: The codebase takes on shared cross-surface classification rules, copy alignment, and regression coverage to keep dashboard, recent, bulk, admin monitoring, and system monitoring semantics locked together
  • Alternatives intentionally rejected: A new persisted problem-state model, an enum rewrite, a notification redesign, and a full operations architecture refactor were all rejected because the present issue is truth drift between existing surfaces, not missing core domain structure
  • Release truth: Current-release truth. The feature hardens already shipped lifecycle semantics before more triage or monitoring slices depend on them

User Scenarios & Testing (mandatory)

User Story 1 - Recover The Same Truth From Every Entry Point (Priority: P1)

As an operator, I want dashboard, attention, recent-operations, monitoring, and system surfaces to describe the same run with the same problem class, so that I do not have to guess which screen is telling the truth.

Why this priority: Cross-surface truth drift is the core trust problem. If the same run reads differently across entry points, every later triage decision becomes suspect.

Independent Test: Can be fully tested by seeding fresh active, likely stale, reconciled-failed, and terminal problem runs, then verifying that tenant, workspace, canonical, and system surfaces classify the same run consistently and drill through into matching destinations.

Acceptance Scenarios:

  1. Given a run is canonically likely_stale, When an operator sees it on tenant attention, workspace attention, recent operations, the operations hub, and canonical detail, Then none of those surfaces frame it as an unremarkable normal active run.
  2. Given a run is terminal with a blocked, partial, or failed outcome, When an operator reaches it from dashboard or monitoring summaries, Then the destination confirms a terminal follow-up problem rather than an active stale issue.
  3. Given a run was automatically reconciled after becoming stale, When an operator checks admin monitoring and system monitoring surfaces, Then the stale or reconciled history remains discoverable instead of disappearing from the truth chain.

User Story 2 - Trust Live Progress Without Waiting For A New Event (Priority: P1)

As a tenant operator, I want local progress and recent-activity surfaces to stop implying that a finished or reconciled run is still active, even when no new enqueue event occurs, so that I can trust what is on screen.

Why this priority: Bulk progress and recent activity are the most immediate trust surfaces. If they lag behind canonical truth, operators see false liveness first.

Independent Test: Can be fully tested by opening active-progress and recent-operations surfaces, changing the underlying run to terminal or reconciled truth without dispatching a new enqueue event, and verifying that local surfaces update within the allowed refresh window and then stop behaving like live active surfaces.

Acceptance Scenarios:

  1. Given BulkOperationProgress is open for an active run, When the run completes or is automatically reconciled, Then the surface stops presenting it as active within the next refresh cycle even if no new enqueue event fires.
  2. Given Recent Operations is visible on a tenant or workspace surface, When a displayed run becomes likely stale or terminal, Then the row updates to the new truth instead of continuing to imply healthy progress.
  3. Given no relevant active runs remain, When the surface reaches that state, Then live refresh stops or becomes inactive instead of polling indefinitely.
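The refresh behavior in these scenarios can be sketched as a pure function over the latest run snapshots: the surface re-reads canonical truth on every cycle and keeps polling only while something is still healthily active. This Python sketch is illustrative only. The status strings, the `RunSnapshot` shape, and the choice to exclude likely-stale runs from the polling decision are assumptions, not the widget's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

ACTIVE_STATUSES = {"queued", "running"}  # assumed status values


@dataclass(frozen=True)
class RunSnapshot:
    run_id: int
    status: str      # "queued" | "running" | "completed" (assumed)
    freshness: str   # "fresh_active" | "likely_stale" | "reconciled_failed" | "terminal_normal"


def healthy_active(runs: List[RunSnapshot]) -> List[RunSnapshot]:
    """Runs that are non-terminal and still fresh according to canonical truth."""
    return [
        r for r in runs
        if r.status in ACTIVE_STATUSES and r.freshness == "fresh_active"
    ]


def refresh(runs: List[RunSnapshot]) -> Dict[str, object]:
    """One refresh cycle: decide what to show and whether to keep polling."""
    visible = healthy_active(runs)
    return {
        "visible_ids": [r.run_id for r in visible],
        # No enqueue event is needed to stop: an empty list ends polling.
        "keep_polling": bool(visible),
    }
```

Because `refresh` depends only on the current snapshots, a run that completes or is reconciled between two cycles drops out on the next cycle without any new event being dispatched.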

User Story 3 - Decide What To Do From The Canonical Detail Surface (Priority: P2)

As an operator opening canonical run detail, I want the primary decision zone to tell me immediately whether the run is still active, likely stale, already reconciled, or terminal and what the next step is, so that I do not have to derive action from scattered diagnostics.

Why this priority: Canonical detail is the highest-trust surface. If it makes lifecycle attention secondary, summary surfaces cannot reliably inherit the right operator interpretation.

Independent Test: Can be fully tested by opening stale, reconciled, partial, failed, and healthy active runs and verifying that the decision zone makes lifecycle truth and next action visible without relying on banners or secondary panels alone.

Acceptance Scenarios:

  1. Given a run is likely stale but not yet reconciled, When the canonical detail page loads, Then the primary decision zone states that the run is still non-terminal but likely unhealthy and names the next investigation step.
  2. Given a run has already been automatically reconciled, When the canonical detail page loads, Then the primary decision zone states that the run is no longer active, that reconciliation already happened, and what follow-up is appropriate.
  3. Given a run type has deeper artifact truth, When the canonical detail page loads, Then lifecycle truth and next action remain visible before artifact-deep diagnostics.

User Story 4 - Preserve Problem-Class Continuity In System And Notification Entry Points (Priority: P3)

As a system or workspace operator, I want notifications and platform monitoring entry points to confirm the same problem class that brought me there, so that I never land on a calmer or differently framed destination than the one I clicked.

Why this priority: Link continuity is where trust drift becomes obvious. If the destination tells a different story, operators stop trusting the product's routing and labels.

Independent Test: Can be fully tested by navigating from dashboard KPIs, attention items, recent operations, and operation notifications into admin and system monitoring destinations, then verifying that the originating problem class is visible and recoverable on arrival.

Acceptance Scenarios:

  1. Given a notification frames a run as needing terminal follow-up, When the operator opens the linked destination, Then the destination visibly confirms that terminal-problem framing.
  2. Given a dashboard or workspace attention link frames a run as stale active attention, When the operator opens the monitoring destination, Then the destination visibly confirms the stale active problem class instead of a generic mixed bucket.

Edge Cases

  • A run may move from likely_stale to reconciled_failed while an operator keeps a local progress surface open; the UI must not continue showing healthy activity after reconciliation.
  • A run may be removed from the active stuck list after reconciliation; the system truth chain must still expose that it was recently stale or auto-reconciled rather than making the issue disappear.
  • A run may be terminal with a poor outcome and also belong to an artifact-heavy domain; the page must not bury the lifecycle answer behind artifact diagnostics.
  • A tenant-scoped summary may link into the canonical operations hub while tenant context is stale or absent; the destination must preserve the correct tenant-safe problem filter or fall back to workspace-safe scope without changing the run's problem class.
  • Notifications may be generated before a stale run is later reconciled; entry-point language and destinations must not stay calmer than the current run truth.
  • Run types may differ in artifact richness, but none may diverge on the base question of fresh active, likely stale, reconciled, or terminal follow-up.

Requirements (mandatory)

Constitution alignment (required): This feature introduces no new Microsoft Graph calls, no new write workflow, no new queued operation type, and no new persisted operations record. It hardens the truth alignment of existing operations and monitoring surfaces over existing OperationRun, freshness, and reconciliation semantics.

Constitution alignment (PROP-001 / ABSTR-001 / PERSIST-001 / STATE-001 / BLOAT-001): This feature stays deliberately narrow. It adds no new persistence, no new lifecycle table, no new orchestration layer, and no new enum family. The only new semantic split is a derived operator-facing distinction between terminal follow-up and active stale or stuck attention, built from existing status, outcome, freshness, and reconciliation truth.

Constitution alignment (OPS-UX): Existing OperationRun records remain subject to the three-surface feedback contract. Toasts remain intent-only. Active awareness remains on allowed progress and monitoring surfaces only. Terminal state transitions remain service-owned. This feature may change how active progress surfaces refresh and how summaries classify runs, but it must not add ad-hoc status mutation or a second terminal lifecycle model. Summary counts remain numeric-only and scheduled or system-run notification rules remain unchanged. Regression coverage MUST prove progress freshness, truth alignment, and reconciled visibility without reintroducing direct state mutation on render surfaces.

Constitution alignment (RBAC-UX): This feature spans the admin plane and the platform plane. Admin-plane tenant and workspace surfaces continue to use deny-as-not-found for non-members or non-entitled users, and canonical operation routes continue to authorize from workspace and tenant entitlement before revealing run truth. Platform-plane system monitoring continues to rely on platform capability checks. The feature adds no new mutation, no new destructive action, and no cross-plane bypass. Any in-scope destination affordance that is visible but capability-gated must remain helper-texted or disabled rather than turning into a misleading dead-end link.

Constitution alignment (OPS-EX-AUTH-001): Not applicable. Authentication handshake exceptions remain unrelated to operations monitoring and cannot be used to justify stale or reconciled truth drift.

Constitution alignment (BADGE-001): Existing centralized semantics for operation status, outcome, freshness, and related attention labels remain authoritative. The feature MUST NOT allow dashboard widgets, recent-operation surfaces, operations hub rows, or notifications to invent page-local meanings for stale, reconciled, blocked, partial, or failed states.

Constitution alignment (UI-FIL-001): The feature reuses existing Filament widgets, tables, detail sections, alerts, tabs, and shared UI primitives. It should strengthen semantic emphasis through existing components and shared mappings, not through page-local markup or a new local status language.

Constitution alignment (UI-NAMING-001): The target objects are operations summary buckets, operation rows, run detail labels, and notification or entry-point copy. "Needs follow-up" may remain as an umbrella concept, but operator-facing copy MUST differentiate the two problem classes it currently mixes: terminal follow-up and active stale or stuck attention. Copy MUST NOT use a generic "blocked" or "needs follow-up" label for a mixed bucket unless the visible sub-class is also made explicit.

Constitution alignment (UI-CONST-001 / UI-SURF-001 / UI-HARD-001 / UI-EX-001 / UI-REVIEW-001): Each changed surface keeps one primary inspect or drill-through model. Attention summaries use explicit problem-class destinations. Recent-operation tables keep row-click inspection. The operations hub remains a scan-first registry with explicit problem-class filtering. The canonical detail page remains the highest-trust detail surface. System failed and system stuck lists remain row-click-only registry surfaces. No new destructive action is introduced, and no exception to the action-surface contract is required.

Constitution alignment (OPSURF-001): Default-visible content must stay operator-first. Summary surfaces answer whether the operator is dealing with terminal follow-up or active stale attention. The operations hub answers which bucket the operator is in and what it means. Canonical detail answers what happened, whether the run is still active, and what to do next before showing diagnostics. System surfaces answer which platform-visible failure or stuck class is being surfaced without requiring the operator to infer it from raw context.

Constitution alignment (UI-SEM-001 / LAYER-001 / TEST-TRUTH-001): Direct mapping from canonical run truth to UI remains preferred. The feature may add a thin derived problem-class split, but it must not create redundant truth across persisted records, presenters, summaries, notifications, and system surfaces. Tests MUST focus on operator-visible consequences: whether the same run tells the same story across surfaces and whether drill-through preserves that story.

Constitution alignment (Filament Action Surfaces): The Action Surface Contract remains satisfied. No new View actions, no empty action groups, and no list-level destructive controls are introduced. Changed dashboard and monitoring surfaces remain inspection or drill-through surfaces only. UI-FIL-001 remains satisfied with no exemption.

Constitution alignment (UX-001 — Layout & Information Architecture): The canonical run detail page keeps one primary decision zone and must elevate stale or reconciled lifecycle truth inside that decision zone rather than only in side banners or lower sections. Summary surfaces keep operator priority order: problem class first, recency and diagnostics second. Existing tables continue to support search, sort, and filtering on core lifecycle dimensions.

Functional Requirements

  • FR-178-001: The system MUST treat canonical OperationRun lifecycle and freshness truth as authoritative for every summary, list, detail, and notification surface covered by this feature.
  • FR-178-002: The same run MUST NOT appear as fresh normal activity on one covered surface and as likely stale, reconciled, or terminal problem truth on another covered surface at the same time.
  • FR-178-003: Covered admin and system monitoring surfaces MUST use one shared derived lifecycle interpretation that distinguishes at least fresh_active, likely_stale, reconciled_failed, and terminal_normal without introducing a new persisted state model.
  • FR-178-004: Reconciliation behavior and system stuck monitoring MUST remain semantically aligned so stale runs do not disappear from operator truth once they are auto-reconciled.
  • FR-178-005: Automatically reconciled stale runs MUST remain semantically discoverable for operators on admin monitoring or system monitoring surfaces within one navigation step.
  • FR-178-006: Bulk operation progress surfaces MUST refresh while relevant active runs exist and MUST stop presenting a run as active once canonical truth becomes terminal or reconciled.
  • FR-178-007: Bulk operation progress surfaces MUST remove or reclassify terminal or reconciled runs within one refresh cycle even when no new enqueue event occurs.
  • FR-178-008: Recent Operations surfaces on tenant and workspace pages MUST distinguish fresh active runs, likely stale active runs, and terminal follow-up runs rather than flattening them into generic recency.
  • FR-178-009: Tenant and workspace attention surfaces MUST separate terminal follow-up from active stale or stuck attention instead of mixing them into one undifferentiated bucket.
  • FR-178-010: The operations hub MUST expose an explicit monitoring view, filter, or tab for active but likely stale runs and an explicit view, filter, or tab for terminal follow-up runs.
  • FR-178-011: Dashboard, attention, KPI, and recent-operation drill-throughs into the operations hub MUST preserve the originating problem class in visible destination framing.
  • FR-178-012: The canonical run detail page MUST present stale, reconciled, and terminal-problem lifecycle truth inside the primary decision zone rather than only in secondary banners, side panels, or lower diagnostic sections.
  • FR-178-013: For likely stale and reconciled runs, the primary decision zone MUST answer whether the run is still active, whether automatic reconciliation already happened, and what the primary next step is.
  • FR-178-014: Local summary and progress surfaces MUST reuse centralized status, outcome, freshness, and problem-class semantics rather than page-local mappings.
  • FR-178-015: Notification and in-app entry-point language MUST NOT frame a run more calmly than its current lifecycle or freshness truth.
  • FR-178-016: Cross-links from dashboard KPIs, attention surfaces, recent operations, and notifications MUST land on destination surfaces that visibly confirm the same problem class that initiated the navigation.
  • FR-178-017: The feature MUST use the existing OperationRun, status, outcome, freshness, and reconciliation model without introducing a schema migration, a new persisted lifecycle artifact, or an enum rewrite.
  • FR-178-018: Run-type differences MAY preserve deeper artifact truth, but they MUST NOT change the base lifecycle answers of fresh active, likely stale, reconciled, or terminal follow-up.
  • FR-178-019: Regression coverage MUST prove that the same seeded runs are classified consistently across tenant dashboard, workspace overview, operations hub, canonical run detail, and system failed or stuck surfaces.
  • FR-178-020: Regression coverage MUST prove bulk-progress freshness, reconciliation visibility, drill-through continuity, and decision-zone emphasis for stale or reconciled runs.
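The derived problem-class split behind FR-178-009 and FR-178-014 can be sketched as a single shared classifier that every summary surface calls, so that no surface carries its own mapping. This is a Python illustration under stated assumptions: the status and outcome strings, the treatment of reconciled runs as terminal follow-up, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Outcomes that make a terminal run a follow-up problem (per User Story 1).
FOLLOW_UP_OUTCOMES = {"blocked", "partial", "failed"}

RunTruth = Tuple[str, Optional[str], str]  # (status, outcome, freshness)


def problem_class(status: str, outcome: Optional[str], freshness: str) -> Optional[str]:
    """Derive the operator-facing problem class from existing run truth."""
    if freshness == "likely_stale":
        return "active_stale_attention"
    # Assumption: an auto-reconciled run is terminal with a failed outcome,
    # so it lands in terminal follow-up rather than stale attention.
    if status == "completed" and outcome in FOLLOW_UP_OUTCOMES:
        return "terminal_follow_up"
    return None  # healthy active or cleanly finished: no attention needed


def attention_buckets(runs: Iterable[RunTruth]) -> Dict[str, int]:
    """Numeric-only counts per bucket; the two classes are never merged."""
    counts = Counter(problem_class(*run) for run in runs)
    return {
        "terminal_follow_up": counts["terminal_follow_up"],
        "active_stale_attention": counts["active_stale_attention"],
    }
```

A regression test in the spirit of FR-178-019 could seed one list of runs and assert that each covered surface reports the same `problem_class` for every run.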

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard operations attention | /admin/t/{tenant} dashboard | none | Explicit problem-class CTA per bucket | none | none | Existing healthy fallback remains read-only reassurance only when no operations issue exists | n/a | n/a | no new audit behavior | Summary surface only; must not render mixed problem buckets |
| Tenant dashboard recent operations | /admin/t/{tenant} dashboard | Open operations | Row click to canonical operation detail | none | none | Existing diagnostic empty state remains non-primary | n/a | n/a | no new audit behavior | Recency surface; no destructive actions |
| Workspace operations attention | /admin workspace overview | none | Explicit problem-class CTA per bucket | none | none | Existing healthy fallback remains read-only reassurance only when no operations issue exists | n/a | n/a | no new audit behavior | Summary surface only; must not render mixed problem buckets |
| Workspace recent operations | /admin workspace overview | Open operations | Row click to canonical operation detail | none | none | Existing diagnostic empty state remains non-primary | n/a | n/a | no new audit behavior | Recency surface; no destructive actions |
| Operations hub | /admin/operations | Filter or tab controls only; no new destructive actions | Full-row click to canonical operation detail | none | none | Existing empty state remains explanatory and filter-aware | n/a | n/a | no new audit behavior | Scan-first registry surface; problem-class filters must align with summary entry points |
| Canonical operation detail | /admin/operations/{run} | Back to operations plus existing related navigation only | n/a | n/a | n/a | n/a | Existing related navigation only; no new destructive action introduced by this spec | n/a | no new audit behavior | Decision-zone truth is the hardening target |
| System failed operations | /system/ops/failures | Show all operations | Full-row click to system operation detail | none | none | Show all operations | n/a | n/a | no new audit behavior | Must confirm terminal-problem semantics, not generic follow-up |
| System stuck operations | /system/ops/stuck | Show all operations | Full-row click to system operation detail | none | none | Show all operations | n/a | n/a | no new audit behavior | Must preserve stale or reconciled visibility for platform operators |

Key Entities (include if feature involves data)

  • Operation Run: The canonical operational record whose status, outcome, freshness, and reconciliation context define the authoritative lifecycle truth.
  • Freshness State: The derived lifecycle interpretation that distinguishes fresh active work, likely stale work, reconciled failure, and normal terminal completion without adding new persisted state.
  • Problem Class: The operator-facing split between terminal follow-up and active stale or stuck attention, derived from existing lifecycle truth and used to align summary surfaces and drill-throughs.
  • Drill-through Contract: The promise that a summary count, notification, or attention label can be visibly rediscovered on the destination surface it opens.
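
As a rough illustration of how Freshness State and Problem Class can both be derived without new persisted state, the sketch below computes them from fields an operation run is assumed to already carry. The field names, the status and outcome values, the 15-minute threshold, and the decision to route reconciled failures into terminal follow-up are all assumptions made for illustration; the actual model and thresholds live in the existing lifecycle code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class FreshnessState(Enum):
    # Names mirror the four interpretations listed above; exact cases are assumed.
    FRESH_ACTIVE = "fresh_active"
    LIKELY_STALE = "likely_stale"
    RECONCILED_FAILED = "reconciled_failed"
    TERMINAL_NORMAL = "terminal_normal"


@dataclass(frozen=True)
class OperationRun:
    status: str                 # assumed values: "queued", "running", "completed"
    outcome: str                # assumed values: "pending", "succeeded", "failed"
    last_activity_at: datetime  # hypothetical field name
    reconciled: bool = False    # hypothetical flag: closed by stale reconciliation


STALE_AFTER = timedelta(minutes=15)  # assumed threshold, not defined by this spec


def freshness_state(run: OperationRun, now: datetime) -> FreshnessState:
    """Derive freshness from existing fields; nothing new is persisted."""
    if run.status == "completed":
        if run.reconciled:
            return FreshnessState.RECONCILED_FAILED
        return FreshnessState.TERMINAL_NORMAL
    if now - run.last_activity_at > STALE_AFTER:
        return FreshnessState.LIKELY_STALE
    return FreshnessState.FRESH_ACTIVE


def problem_class(run: OperationRun, now: datetime) -> Optional[str]:
    """Map lifecycle truth to the operator-facing split, or None if healthy."""
    state = freshness_state(run, now)
    if state is FreshnessState.LIKELY_STALE:
        return "active_stale_attention"
    if state is FreshnessState.RECONCILED_FAILED:
        return "terminal_follow_up"
    if state is FreshnessState.TERMINAL_NORMAL and run.outcome == "failed":
        return "terminal_follow_up"
    return None
```

Because both functions are pure and take `now` as an argument, every surface can call the same derivation and seeded regression scenarios can pin the clock.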

Success Criteria (mandatory)

Measurable Outcomes

  • SC-178-001: In covered regression scenarios, 100% of runs seeded as likely_stale are shown as stale or otherwise problematic on every covered summary and monitoring surface, and 0 are shown as unremarkable fresh activity.
  • SC-178-002: In covered regression scenarios, 100% of automatically reconciled stale runs remain semantically recoverable for operators through admin monitoring or system monitoring within one navigation step.
  • SC-178-003: In covered freshness regression scenarios, local progress surfaces stop showing terminal or reconciled runs as active within one refresh cycle and without requiring a new enqueue event.
  • SC-178-004: In covered navigation regression scenarios, 100% of dashboard, attention, recent-operation, and notification drill-throughs land on destinations whose visible framing matches the originating problem class.
  • SC-178-005: In operator review on seeded scenarios, an operator can determine within 10 seconds whether the run is fresh active, likely stale, reconciled, or terminal follow-up from every covered entry surface.
  • SC-178-006: The feature ships without a schema migration, a new persisted lifecycle artifact, or a new status or outcome family.
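
SC-178-003 can be read as a property of a single refresh cycle. The sketch below is one way to picture it, assuming a progress surface that shows only healthy active runs and keeps polling only while at least one such run is visible. `RunSnapshot`, `healthy_active`, and `refresh_overlay` are hypothetical names, and the freshness strings are assumed labels rather than confirmed enum values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RunSnapshot:
    run_id: int
    # Assumed labels: "fresh_active", "likely_stale",
    # "reconciled_failed", "terminal_normal".
    freshness: str


def healthy_active(runs: List[RunSnapshot]) -> List[RunSnapshot]:
    """Keep only runs that are still genuinely in progress."""
    return [run for run in runs if run.freshness == "fresh_active"]


def refresh_overlay(runs: List[RunSnapshot]) -> Tuple[List[RunSnapshot], bool]:
    """One refresh cycle: return the visible runs and whether to keep polling.

    Likely-stale, reconciled, and terminal runs are dropped immediately,
    so they cannot keep the overlay or its polling alive.
    """
    visible = healthy_active(runs)
    return visible, bool(visible)
```

With this shape, a run that turns terminal or reconciled between two refreshes disappears on the next cycle without any new enqueue event, and polling stops once nothing healthy remains.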

Assumptions

  • The lifecycle freshness and reconciliation semantics established by the earlier operation lifecycle guarantees work remain the authoritative base truth for this hardening slice.
  • Existing run-detail decision-zone structure remains the correct place to elevate stale and reconciled lifecycle truth.
  • Existing tenant and workspace dashboard truth alignment work remains the baseline grammar for admin-plane summary surfaces.
  • Existing system operations surface alignment remains the baseline interaction model for /system/ops/failures and /system/ops/stuck.

Non-Goals

  • Introducing tenant-admin retry or cancel capabilities
  • Rebuilding the operations domain, run schema, or lifecycle engine
  • Adding a new persisted problem-state model, enum rewrite, or schema migration
  • Redesigning all notification behavior across the product
  • Performing deep non-governance result-quality analysis for every run type
  • Replacing run-type-specific artifact truth with a uniform artifact model

Dependencies

  • Existing operations auto-refresh behavior and active-run polling patterns
  • Existing operation lifecycle guarantees, freshness thresholds, and reconciliation behavior
  • Existing canonical run detail hierarchy and decision-zone structure
  • Existing tenant dashboard and workspace overview truth-alignment semantics
  • Existing system operations surface alignment for row-click-only platform monitoring pages

Risks

  • If stale thresholds are too aggressive, legitimate long-running work could be surfaced as stale too early.
  • If summary and monitoring surfaces share labels but not the same underlying filter meaning, operators will continue to mistrust drill-throughs.
  • If reconciled stale visibility is over-corrected without hierarchy, system surfaces could become noisy instead of trustworthy.
  • If local progress polling is too eager, the product could gain freshness at the cost of unnecessary load.

Definition of Done

Spec 178 is complete when:

  • BulkOperationProgress no longer leaves trust-damaging stale residue that keeps terminal or reconciled runs looking active.
  • stale or stuck semantics are consistent between lifecycle reconciliation, tenant and workspace summaries, the operations hub, canonical run detail, and system stuck or failure surfaces.
  • tenant and workspace summary surfaces visibly separate terminal problem runs from active stale or stuck runs.
  • the operations hub no longer distorts dashboard semantics through mixed or misleading tabs, filters, or bucket names.
  • the canonical run detail page prioritizes stale or reconciled lifecycle truth inside the primary decision zone.
  • cross-surface links preserve the same operator-visible problem class from origin to destination.
  • focused regression coverage proves truth alignment, stale visibility, drill-through continuity, and progress freshness.
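
The cross-surface consistency condition above could be checked along the following lines. This is a hedged sketch, not the project's test suite: each surface is modelled as a function that returns the problem class it would display for a seeded run, and the run shape, surface names, and `find_disagreements` helper are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Run = Dict[str, str]
Surface = Callable[[Run], Optional[str]]


def central_problem_class(run: Run) -> Optional[str]:
    """Shared mapping from freshness state to problem class (state names assumed)."""
    return {
        "likely_stale": "active_stale_attention",
        "reconciled_failed": "terminal_follow_up",
    }.get(run["freshness"])


def page_local_mapping(run: Run) -> Optional[str]:
    """A drifting surface that forgets likely-stale runs (the failure mode to catch)."""
    if run["freshness"] == "reconciled_failed":
        return "terminal_follow_up"
    return None


def find_disagreements(runs: List[Run], surfaces: Dict[str, Surface]) -> List[str]:
    """Return ids of seeded runs that the surfaces do not classify identically."""
    drifted = []
    for run in runs:
        shown = {name: classify(run) for name, classify in surfaces.items()}
        if len(set(shown.values())) > 1:
            drifted.append(run["id"])
    return drifted


SEEDED_RUNS: List[Run] = [
    {"id": "run-1", "freshness": "fresh_active"},
    {"id": "run-2", "freshness": "likely_stale"},
    {"id": "run-3", "freshness": "reconciled_failed"},
]
```

Passing surfaces that all reuse `central_problem_class` yields an empty list, while swapping one of them for `page_local_mapping` reports `run-2`, which is the kind of drift the regression coverage is meant to catch.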

Summary

This feature is a late-foundation hardening slice for the operations domain. The underlying lifecycle model is already strong: OperationRun is canonical, status and outcome are separated, stale reconciliation exists, system stuck surfaces exist, and canonical run detail already owns the deepest operational truth. The remaining problem is not missing architecture; it is trust drift between surfaces that summarize or relabel that truth.

Spec 178 closes that gap by making every covered surface tell the same story about whether a run is still active, likely stale, already reconciled, or terminal and in need of follow-up. It keeps the model narrow by reusing existing lifecycle and freshness truth, then aligning summaries, live progress, drill-throughs, and decision-zone emphasis so operators do not have to reconcile conflicting screens by hand.