ahmido 1142d283eb feat: Spec 178 — Operations Lifecycle Alignment & Cross-Surface Truth Consistency (#209 )

## Spec 178 — Operations Lifecycle Alignment & Cross-Surface Truth Consistency

Hardens run-lifecycle truth and cross-surface consistency across all central operator surfaces.

### Core changes

**Lifecycle Truth Alignment**
- Unified stale/stuck semantics across tenant, workspace, admin, and system surfaces
- `OperationRunFreshnessState` is propagated consistently across all widgets and pages
- Shared problem-class split: `terminal_follow_up` vs. `active_stale_attention`

**BulkOperationProgress Freshness**
- Overlay now shows only `healthyActive()` runs instead of all active runs
- Likely-stale runs no longer keep polling artificially alive
- Terminal runs disappear promptly from the progress overlay

**Decision Zone in Run Detail**
- Stale/reconciled attention sits in the primary decision hierarchy
- Clear answers: active? stale? reconciled? next step?
- Artifact-rich runs keep lifecycle truth ahead of deep diagnostics

**Cross-Surface Link-Continuity**
- Dashboard → Operations Hub → Run Detail tell the same story
- Notifications reference the correct problem class
- Workspace/tenant attention links route by problem class

**System-Plane Fixes**
- `/system/ops/failures` 500 error fixed (panel-safe artifact URLs)
- System Stuck/Failures show reconciled stale lineage

### Further fixes
- Inventory auth guard cleaned up (Gate instead of ad-hoc facades)
- Browser smoke tests stabilized (DOM assertions instead of fragile clicks)
- Test assertion drift for verification/lifecycle texts corrected

### Test result
Full Suite: **3269 passed**, 8 skipped, 0 failed

### Spec artifacts
- `specs/178-ops-truth-alignment/spec.md`
- `specs/178-ops-truth-alignment/plan.md`
- `specs/178-ops-truth-alignment/tasks.md`
- `specs/178-ops-truth-alignment/research.md`
- `specs/178-ops-truth-alignment/data-model.md`
- `specs/178-ops-truth-alignment/quickstart.md`
- `specs/178-ops-truth-alignment/contracts/operations-truth-alignment.openapi.yaml`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #209

2026-04-05 22:42:24 +00:00

# Phase 1 Data Model: Operations Lifecycle Alignment & Cross-Surface Truth Consistency

## Overview

This feature does not add a table, persisted summary entity, or new lifecycle state machine. It aligns existing OperationRun truth and existing freshness/reconciliation semantics across multiple operator surfaces by introducing a small set of derived cross-surface contracts.

The central rule is unchanged: OperationRun is the only canonical lifecycle source of truth. Everything else in this slice is derived from status, outcome, freshness, and reconciliation metadata already present in the repo.

## Persistent Source Truths

### OperationRun

Purpose: Canonical operational record for queued, running, completed, stale, and automatically reconciled work shown across tenant, workspace, canonical admin, and system monitoring surfaces.

Key fields:

- `id`
- `workspace_id`
- `tenant_id`
- `type`
- `status`
- `outcome`
- `initiator_name`
- `summary_counts`
- `failure_summary`
- `context`
- `created_at`
- `started_at`
- `completed_at`

Validation rules:

- `status` and `outcome` remain service-owned; this feature must not introduce page-local lifecycle mutation.
- Every covered summary, list, detail, and notification surface must derive from the same run record and the same underlying lifecycle fields.
- The feature must not add a second persisted problem-state field or a new lifecycle table.

### Reconciliation Context (within `OperationRun.context`)

Purpose: Existing stored lineage that indicates a run was automatically reconciled after lifecycle drift.

Expected fields:

- `reconciled_at`
- `reason`
- `reason_code`
- `source`

Validation rules:

- Reconciliation context remains nested under `context.reconciliation`; this feature does not move it or normalize it into a new table.
- If reconciliation context exists, surfaces must preserve that stale/reconciled lineage within one navigation step of the canonical run truth.
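
The nesting rule above can be illustrated with a minimal sketch. It assumes the run's `context` is available as a plain dictionary; the helper names `reconciliation_lineage` and `has_stale_lineage` are hypothetical and not taken from the repo.

```python
from __future__ import annotations

from typing import Any, Optional

# Field names come from the "Expected fields" list above.
RECONCILIATION_FIELDS = ("reconciled_at", "reason", "reason_code", "source")


def reconciliation_lineage(context: Optional[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Return the block nested under context['reconciliation'], or None if absent."""
    if not isinstance(context, dict):
        return None
    block = context.get("reconciliation")
    if not isinstance(block, dict) or not block:
        return None
    # Keep only the documented fields; missing ones are reported as None.
    return {name: block.get(name) for name in RECONCILIATION_FIELDS}


def has_stale_lineage(context: Optional[dict[str, Any]]) -> bool:
    """True when the run carries stored reconciliation lineage."""
    return reconciliation_lineage(context) is not None
```

Reading the lineage in place, rather than copying it into a separate structure, is in line with the rule that no second persisted problem-state field is introduced.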

## Existing Runtime Source Objects

### OperationRunFreshnessState

Purpose: Existing derived lifecycle interpretation for one run.

Cases:

- `fresh_active`
- `likely_stale`
- `reconciled_failed`
- `terminal_normal`
- `unknown`

Validation rules:

- This remains the canonical freshness interpretation for the slice.
- Spec 178 must not introduce a parallel freshness-state family.

### OperationLifecyclePolicy

Purpose: Existing threshold policy for deciding when queued/running work becomes stale.

Consumed fields:

- covered operation types
- queued stale threshold
- running stale threshold

Validation rules:

- Tenant, workspace, admin, and system stale or stuck semantics must all derive from the same underlying lifecycle-policy thresholds.
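
As an illustration of a single shared threshold check, here is a minimal sketch. The class `LifecyclePolicy`, the function `is_likely_stale`, the threshold values, and the operation type `inventory_sync` are all hypothetical; the status strings `queued` and `running` are assumed from the wording above, and the real thresholds live in the existing policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical stand-in for the three consumed policy fields."""
    queued_stale_after: timedelta
    running_stale_after: timedelta
    covered_types: frozenset


def is_likely_stale(policy: LifecyclePolicy, run_type: str, status: str,
                    created_at: datetime, started_at: Optional[datetime],
                    now: datetime) -> bool:
    """Apply one threshold check that every surface would share."""
    if run_type not in policy.covered_types:
        return False  # types outside the policy are never classified as stale
    if status == "queued":
        return now - created_at > policy.queued_stale_after
    if status == "running":
        reference = started_at or created_at  # fall back if start time is missing
        return now - reference > policy.running_stale_after
    return False  # terminal statuses are not "active stale"


# Usage with illustrative thresholds only.
policy = LifecyclePolicy(timedelta(minutes=5), timedelta(minutes=30),
                         frozenset({"inventory_sync"}))
now = datetime(2026, 4, 5, 22, 0, tzinfo=timezone.utc)
stale = is_likely_stale(policy, "inventory_sync", "queued",
                        now - timedelta(minutes=6), None, now)
```

Passing `now` explicitly keeps the check deterministic, so tenant, workspace, admin, and system surfaces evaluating the same run at the same instant reach the same answer.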

### StuckRunClassifier

Purpose: Existing system-panel classifier for active queued/running runs that crossed the stuck threshold.

Validation rules:

- Stuck remains an active-stale registry surface.
- System visibility for reconciled stale runs must be preserved through adjacent system surfaces rather than a new classifier or new stored state.

### OperationUxPresenter

Purpose: Existing operator-facing seam for guidance, notification wording, and run presentation.

Validation rules:

- New stale/reconciled wording should flow through this seam or a compatible existing presentation seam rather than widget-local strings.

## Derived Cross-Surface Contracts

### ProblemClassContract

Purpose: The thin operator-facing split used to align summary buckets, monitoring filters, local progress removal rules, and entry-point wording.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `freshnessState` | enum | yes | Existing `OperationRunFreshnessState` value |
| `problemClass` | enum | yes | `none`, `active_stale_attention`, or `terminal_follow_up` |
| `staleLineage` | boolean | yes | Whether the run is terminal but carries stale/reconciled history |
| `isCurrentlyActive` | boolean | yes | Whether the run should still be treated as actively executing |
| `requiresOperatorReview` | boolean | yes | Whether the surface should escalate the run into an attention/follow-up bucket |

Validation rules:

- `freshnessState = fresh_active` ⇒ `problemClass = none`, `isCurrentlyActive = true`
- `freshnessState = likely_stale` ⇒ `problemClass = active_stale_attention`, `isCurrentlyActive = true`, `requiresOperatorReview = true`
- `freshnessState = reconciled_failed` ⇒ `problemClass = terminal_follow_up`, `staleLineage = true`, `isCurrentlyActive = false`, `requiresOperatorReview = true`
- `freshnessState = terminal_normal` and `outcome` in {`blocked`, `partially_succeeded`, `failed`} ⇒ `problemClass = terminal_follow_up`, `staleLineage = false`
- `freshnessState = terminal_normal` and healthy terminal outcome ⇒ `problemClass = none`
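
The validation rules above amount to a small pure function. The following is a sketch, not the repo's implementation: the Python names are hypothetical, and three values are assumptions because the rules leave them open (`requiresOperatorReview` for `fresh_active`, `requiresOperatorReview = true` for problematic `terminal_normal` outcomes, and the whole mapping for `unknown`).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FreshnessState(str, Enum):
    FRESH_ACTIVE = "fresh_active"
    LIKELY_STALE = "likely_stale"
    RECONCILED_FAILED = "reconciled_failed"
    TERMINAL_NORMAL = "terminal_normal"
    UNKNOWN = "unknown"


class ProblemClass(str, Enum):
    NONE = "none"
    ACTIVE_STALE_ATTENTION = "active_stale_attention"
    TERMINAL_FOLLOW_UP = "terminal_follow_up"


FOLLOW_UP_OUTCOMES = frozenset({"blocked", "partially_succeeded", "failed"})


@dataclass(frozen=True)
class ProblemClassContract:
    freshness_state: FreshnessState
    problem_class: ProblemClass
    stale_lineage: bool
    is_currently_active: bool
    requires_operator_review: bool


def derive_problem_class(freshness: FreshnessState,
                         outcome: Optional[str] = None) -> ProblemClassContract:
    """Derive the contract from freshness plus terminal outcome; nothing is stored."""
    if freshness is FreshnessState.FRESH_ACTIVE:
        # Assumption: healthy active runs need no operator review.
        return ProblemClassContract(freshness, ProblemClass.NONE, False, True, False)
    if freshness is FreshnessState.LIKELY_STALE:
        return ProblemClassContract(freshness, ProblemClass.ACTIVE_STALE_ATTENTION,
                                    False, True, True)
    if freshness is FreshnessState.RECONCILED_FAILED:
        return ProblemClassContract(freshness, ProblemClass.TERMINAL_FOLLOW_UP,
                                    True, False, True)
    if freshness is FreshnessState.TERMINAL_NORMAL and outcome in FOLLOW_UP_OUTCOMES:
        # Assumption: a follow-up bucket implies operator review.
        return ProblemClassContract(freshness, ProblemClass.TERMINAL_FOLLOW_UP,
                                    False, False, True)
    # Healthy terminal outcome, or `unknown` (assumed to map to no problem class).
    return ProblemClassContract(freshness, ProblemClass.NONE, False, False, False)
```

Because the function depends only on freshness and outcome, every surface that calls it for the same run gets the same problem class.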

### OperationsAttentionBucket

Purpose: Shared contract for tenant/workspace operations attention summaries.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `problemClass` | enum | yes | `terminal_follow_up` or `active_stale_attention` |
| `count` | integer | yes | Bucket size |
| `label` | string | yes | Operator-facing bucket label |
| `destination` | object | yes | Canonical operations route plus tenant/workspace-safe filter state |
| `emptyAllowed` | boolean | yes | Whether the bucket may be hidden when count is zero |

Validation rules:

- Attention surfaces must not mix both problem classes into one undifferentiated bucket.
- Each bucket exposes one destination only.
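
A minimal sketch of how separate buckets could be assembled from already-derived problem classes. The function name, the labels, and the shape of `destination` are hypothetical; only the two problem-class values and the field names come from the table above.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical operator-facing labels; real wording would come from the presenter seam.
BUCKET_LABELS = {
    "terminal_follow_up": "Needs follow-up",
    "active_stale_attention": "Likely stale",
}


def build_attention_buckets(problem_classes: Iterable[str], workspace_id: int,
                            tenant_id: Optional[int] = None,
                            hide_empty: bool = True) -> list:
    """Count runs per problem class without ever merging the two classes."""
    counts = Counter(pc for pc in problem_classes if pc in BUCKET_LABELS)
    buckets = []
    for problem_class, label in BUCKET_LABELS.items():
        count = counts[problem_class]
        if count == 0 and hide_empty:
            continue
        buckets.append({
            "problemClass": problem_class,
            "count": count,
            "label": label,
            # Exactly one destination per bucket (shape is an assumption).
            "destination": {
                "route": "/admin/operations",
                "filters": {"workspace_id": workspace_id,
                            "tenant_id": tenant_id,
                            "problemClass": problem_class},
            },
            "emptyAllowed": hide_empty,
        })
    return buckets
```

Runs with `problemClass = none` are simply ignored here, so healthy work never inflates an attention count.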

### OperationsHubFilterState

Purpose: Structured state needed to keep `/admin/operations` semantically continuous when opened from a summary or notification.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `workspace_id` | integer | yes | Existing workspace scope |
| `tenant_id` | integer nullable | no | Active tenant filter when applicable |
| `problemClass` | enum | yes | `terminal_follow_up`, `active_stale_attention`, or `all` |
| `activeTab` | string nullable | no | Existing or extended visible tab/filter state |
| `navigationContext` | string nullable | no | Existing canonical back-link or page-context state |

Validation rules:

- Links opened because of stale active attention must land in a visibly stale-active view, not a mixed generic bucket.
- Links opened because of terminal follow-up must land in a visibly terminal-problem view.
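
One way to carry this filter state in a link is as query parameters, sketched below. Encoding the state in the query string is an assumption, as is the helper name `operations_hub_url`; how the real hub persists filter state is not specified here.

```python
from typing import Optional
from urllib.parse import urlencode

ALLOWED_PROBLEM_CLASSES = ("terminal_follow_up", "active_stale_attention", "all")


def operations_hub_url(workspace_id: int, problem_class: str,
                       tenant_id: Optional[int] = None,
                       active_tab: Optional[str] = None,
                       navigation_context: Optional[str] = None) -> str:
    """Build a hub link that keeps the problem class visible as filter state."""
    if problem_class not in ALLOWED_PROBLEM_CLASSES:
        raise ValueError(f"unsupported problemClass: {problem_class!r}")
    params = {"workspace_id": workspace_id, "problemClass": problem_class}
    optional = {"tenant_id": tenant_id, "activeTab": active_tab,
                "navigationContext": navigation_context}
    # Nullable fields are omitted rather than sent as empty values.
    params.update({key: value for key, value in optional.items() if value is not None})
    return "/admin/operations?" + urlencode(params)
```

Rejecting unknown problem classes up front means a link can never silently fall back to a mixed generic view.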

### RecentOperationRowTruth

Purpose: Shared row-level contract for tenant/workspace recent-operation tables and admin/system list rendering.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `runId` | integer | yes | Canonical run identifier |
| `status` | string | yes | Existing lifecycle status |
| `outcome` | string | yes | Existing execution outcome |
| `freshnessState` | enum | yes | Existing freshness interpretation |
| `problemClass` | enum | yes | Derived attention class |
| `staleLineage` | boolean | yes | Whether the row should visibly indicate reconciled stale history |
| `guidance` | string nullable | no | Short operator-facing row hint |
| `destination` | object | yes | Canonical detail URL plus optional collection URL |

Validation rules:

- A recent-operations row must not visually imply healthy active progress when `problemClass = active_stale_attention` or `staleLineage = true`.
- Terminal/reconciled rows must remain distinguishable from healthy completed rows.

### BulkOperationProgressSnapshot

Purpose: Active-only overlay contract for local progress rendering.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `runId` | integer | yes | Run being shown |
| `tenantId` | integer | yes | Tenant scope for the overlay |
| `freshnessState` | enum | yes | Existing freshness interpretation |
| `displayAsActive` | boolean | yes | Whether the run should remain visible in the overlay |
| `shouldPoll` | boolean | yes | Whether the component should continue polling |
| `overflowCount` | integer | yes | Existing overflow behavior |

Validation rules:

- Only currently active runs may remain in the overlay.
- If a run becomes terminal or reconciled, `displayAsActive` must become false within one refresh cycle.
- `shouldPoll` must become false when no relevant active runs remain.
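
A sketch of the overlay rules, recomputed on each refresh. It assumes, following the change summary's `healthyActive()` wording, that only `fresh_active` runs count as relevant active runs for the overlay; the function name, the result shape, and `max_visible` are hypothetical.

```python
from typing import Iterable, Tuple

HEALTHY_ACTIVE = "fresh_active"  # assumed to be the only overlay-eligible state


def build_progress_overlay(runs: Iterable[Tuple[int, str]], tenant_id: int,
                           max_visible: int = 3) -> dict:
    """runs is an iterable of (run_id, freshness_state) pairs for one tenant."""
    healthy = [run_id for run_id, freshness in runs if freshness == HEALTHY_ACTIVE]
    visible = healthy[:max_visible]
    return {
        "tenantId": tenant_id,
        "items": [
            {"runId": run_id, "freshnessState": HEALTHY_ACTIVE, "displayAsActive": True}
            for run_id in visible
        ],
        # Stale, terminal, and reconciled runs never keep polling alive.
        "shouldPoll": len(healthy) > 0,
        "overflowCount": len(healthy) - len(visible),
    }
```

Because the snapshot is rebuilt from current freshness on every refresh, a run that turns terminal or reconciled drops out on the next cycle without extra removal logic.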

### RunDecisionZoneTruth

Purpose: Canonical detail contract for what the operator should learn first.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `freshnessState` | enum | yes | Existing freshness interpretation |
| `problemClass` | enum | yes | Derived attention class |
| `isCurrentlyActive` | boolean | yes | Plain answer to “is the run still active?” |
| `isReconciled` | boolean | yes | Whether automatic reconciliation already happened |
| `staleLineageNote` | string nullable | no | Visible explanation when terminal truth came from stale reconciliation |
| `primaryNextAction` | string | yes | First follow-up step the operator should take |

Validation rules:

- For `likely_stale` and `reconciled_failed`, this contract must be visible in the primary decision hierarchy rather than only in diagnostics.
- `primaryNextAction` must differ by problem class: infrastructure investigation for stale-active, follow-up/retry/artifact review for terminal problems.
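
The decision-zone fields could be assembled as in the sketch below. All wording strings and the function name `decision_zone` are hypothetical placeholders; real copy would flow through the existing presenter seam.

```python
# Hypothetical wording, one next action per problem class.
NEXT_ACTIONS = {
    "active_stale_attention": "Investigate queue and worker infrastructure",
    "terminal_follow_up": "Review artifacts, then follow up or retry",
    "none": "No action required",
}

ACTIVE_STATES = frozenset({"fresh_active", "likely_stale"})


def decision_zone(freshness_state: str, problem_class: str) -> dict:
    """Answer: active? reconciled? why? what next?"""
    is_reconciled = freshness_state == "reconciled_failed"
    note = None
    if is_reconciled:
        # Placeholder explanation of where the terminal truth came from.
        note = "This run was automatically reconciled after it went stale."
    return {
        "freshnessState": freshness_state,
        "problemClass": problem_class,
        "isCurrentlyActive": freshness_state in ACTIVE_STATES,
        "isReconciled": is_reconciled,
        "staleLineageNote": note,
        "primaryNextAction": NEXT_ACTIONS[problem_class],
    }
```

Keeping the next action in a lookup keyed by problem class makes it hard for stale-active and terminal problems to end up with the same guidance by accident.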

### NotificationTruthPayload

Purpose: Minimal contract for completed notification wording and link continuity.

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `title` | string | yes | Operator-facing summary title |
| `problemClass` | enum | yes | Derived attention class |
| `staleLineage` | boolean | yes | Whether stale/reconciled history must be visible in wording |
| `runUrl` | string | yes | Canonical run destination |

Validation rules:

- Notification wording must not be calmer than the current run truth.
- If `staleLineage = true`, the notification must preserve that lineage in title or body without requiring the operator to infer it later.
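
A sketch of a payload builder that keeps wording at least as serious as the run truth. The title templates, the lineage suffix, the function name, and the example URL are hypothetical.

```python
def notification_payload(operation_label: str, problem_class: str,
                         stale_lineage: bool, run_url: str) -> dict:
    """Choose a title from the problem class, then append lineage if present."""
    if problem_class == "terminal_follow_up":
        title = f"{operation_label} needs follow-up"
    elif problem_class == "active_stale_attention":
        title = f"{operation_label} is likely stale"
    else:
        title = f"{operation_label} completed"
    if stale_lineage:
        # Lineage is stated in the title so the operator does not have to infer it.
        title += " (automatically reconciled after going stale)"
    return {
        "title": title,
        "problemClass": problem_class,
        "staleLineage": stale_lineage,
        "runUrl": run_url,
    }


# Usage with a made-up label and URL.
payload = notification_payload("Inventory sync", "terminal_follow_up", True,
                               "/admin/operations/123")
```

The calm "completed" title is reachable only when the problem class is `none`, which is one simple way to satisfy the "not calmer than the run truth" rule.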

## Relationships

- One `OperationRun` yields one `OperationRunFreshnessState` and one derived `ProblemClassContract`.
- Tenant/workspace attention buckets, recent-operation rows, admin monitoring filters, system monitoring lists, canonical detail, and completed notifications all consume the same derived problem-class contract.
- `BulkOperationProgressSnapshot` is a specialized active-only view over the same run truth.
- `RunDecisionZoneTruth` is the highest-trust detailed interpretation of the same run truth and should confirm the same problem class visible on summary surfaces.

## Lifecycle Notes

- `OperationRun` remains the single persisted source of lifecycle truth.
- Freshness is derived by the existing lifecycle policy and freshness-state enum.
- Problem class is derived from freshness plus terminal outcome; it is not stored.
- Summary surfaces consume the derived problem class in separate buckets.
- The canonical operations hub consumes the derived problem class as visible filter state.
- Local progress consumes the same truth but removes terminal/reconciled runs because it is active-only.
- Canonical detail and notifications confirm the same truth with stronger operator guidance.
