# Research: Operation Run Active-State Visibility & Stale Escalation

## Decision 1: Keep lifecycle freshness truth in the existing run model and reconciler

- **Decision**: Use `OperationRunFreshnessState`, `OperationRun::freshnessState()`, `OperationRun::problemClass()`, and `OperationLifecycleReconciler` as the only lifecycle-truth inputs for this feature.
- **Rationale**: The application already computes `fresh_active`, `likely_stale`, `reconciled_failed`, `terminal_normal`, and `unknown` from the run record plus `OperationLifecyclePolicy`. Canonical monitoring surfaces already rely on that truth, so adding a second stale heuristic would immediately recreate the drift this spec is trying to remove.
- **Alternatives considered**:
  - Add new `OperationRun.status` values such as `stale` or `late`: rejected because the distinction is presentation and triage-oriented, not a new persisted lifecycle state.
  - Add page-local thresholds per widget: rejected because it would create conflicting meaning across tenant, workspace, and canonical monitoring surfaces.

## Decision 2: Reuse the existing Ops UX presenter path before introducing a new helper

- **Decision**: Prefer `OperationUxPresenter::decisionZoneTruth()`, `lifecycleAttentionSummary()`, `surfaceGuidance()`, and centralized badge rendering as the presentation backbone.
- **Rationale**: The code already exposes a derived decision-zone payload and shared stale/reconciled copy. `OperationRunStatusBadge` already renders `Likely stale` when queued/running work carries `freshness_state=likely_stale`, and `OperationUxPresenter` already provides compact and diagnostic explanations off the same truth.
- **Alternatives considered**:
  - New dedicated presenter family for active-state visibility: rejected unless the existing presenter path proves insufficient during implementation.
  - Widget-local copy branches: rejected because they would increase semantic spread and regression risk.
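
To make Decisions 1 and 2 concrete, the following is a minimal sketch of one freshness derivation feeding a badge label, so that no surface needs its own threshold. Every class and function here (`SketchFreshnessState`, `SketchLifecyclePolicy`, `SketchRun`, `sketchBadgeLabel()`) is a hypothetical stand-in, not the application's actual code. Only the five state values and the `Likely stale` label come from this document; the status names, the single time-based boundary, and the field names are assumptions.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for OperationRunFreshnessState. The case values are
// the ones named in this document; the enum shape itself is assumed.
enum SketchFreshnessState: string
{
    case FreshActive = 'fresh_active';
    case LikelyStale = 'likely_stale';
    case ReconciledFailed = 'reconciled_failed';
    case TerminalNormal = 'terminal_normal';
    case Unknown = 'unknown';
}

// Hypothetical policy: one stale boundary, in seconds since last activity.
final class SketchLifecyclePolicy
{
    public function __construct(public readonly int $staleAfterSeconds)
    {
    }
}

// Hypothetical run record holding only the fields this sketch needs.
final class SketchRun
{
    public function __construct(
        public readonly string $status,          // assumed: queued, running, completed, failed
        public readonly ?int $lastActivityAt,    // unix timestamp, null if never recorded
        public readonly bool $failedByReconciler = false,
    ) {
    }

    // The only place where freshness is decided in this sketch.
    public function freshnessState(SketchLifecyclePolicy $policy, int $now): SketchFreshnessState
    {
        if ($this->status === 'failed' && $this->failedByReconciler) {
            return SketchFreshnessState::ReconciledFailed;
        }

        if (in_array($this->status, ['completed', 'failed'], true)) {
            return SketchFreshnessState::TerminalNormal;
        }

        if (! in_array($this->status, ['queued', 'running'], true) || $this->lastActivityAt === null) {
            return SketchFreshnessState::Unknown;
        }

        return ($now - $this->lastActivityAt) > $policy->staleAfterSeconds
            ? SketchFreshnessState::LikelyStale
            : SketchFreshnessState::FreshActive;
    }
}

// Presentation reads the derived state; it never re-checks timestamps itself.
function sketchBadgeLabel(SketchRun $run, SketchFreshnessState $state): string
{
    return $state === SketchFreshnessState::LikelyStale
        ? 'Likely stale'
        : ucfirst($run->status);
}

$policy = new SketchLifecyclePolicy(staleAfterSeconds: 600);
$now = time();
$run = new SketchRun(status: 'running', lastActivityAt: $now - 3600);

echo sketchBadgeLabel($run, $run->freshnessState($policy, $now)), PHP_EOL; // prints "Likely stale"
```

The persisted `status` stays `running` throughout; staleness exists only as a derived value, which is the reason the alternatives of new status values and page-local thresholds were rejected.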

## Decision 3: Treat stale-active runs as still active for tenant progress visibility

- **Decision**: Change tenant-local active-progress visibility to include freshness-elevated active runs rather than suppressing them via `healthyActive()`.
- **Rationale**: `BulkOperationProgress` and `ActiveRuns::existForTenantId()` previously used `healthyActive()`, which caused stale queued/running work to disappear from the tenant progress overlay and stopped polling when only stale runs remained. That was the clearest concrete contradiction with the canonical monitoring surfaces.
- **Alternatives considered**:
  - Keep stale runs hidden in the progress overlay and rely on dashboard/list only: rejected because the spec explicitly covers tenant-local active-run cards and progress summaries.
  - Add a separate stale-only overlay: rejected because it would create a second active-work surface family instead of fixing the existing one.

## Decision 4: Preserve current surface roles and drill-through flow

- **Decision**: Keep the current route and surface model: tenant dashboard and tenant progress remain secondary context, `/admin/operations` remains the primary triage list, and `/admin/operations/{run}` remains diagnostic-first.
- **Rationale**: Existing links already converge through `OperationRunLinks`, and current pages/widgets match the constitution's decision-first model. The gap is the honesty of compact active-state messaging, not missing routes.
- **Alternatives considered**:
  - New operations hub or new tenant-local detail page: rejected as unnecessary workflow expansion.
  - New notification channel for stale active work: rejected because the spec explicitly excludes new notification behavior.

## Decision 5: Extend existing focused tests and invert stale-hidden assumptions where necessary

- **Decision**: Update existing monitoring, Filament, and Ops UX tests rather than creating a new broad suite.
- **Rationale**: The repository already has focused coverage for lifecycle presentation and tenant progress behavior. In particular, `BulkOperationProgressDbOnlyTest` and `ProgressWidgetFiltersTest` currently codify the stale-hidden behavior that this feature must deliberately replace.
- **Alternatives considered**:
  - Add a brand-new browser suite: rejected because feature tests already cover the underlying business truth and UI copy.
  - Leave old progress-widget tests untouched and add parallel tests: rejected because the old assertions would preserve the wrong contract.

## Decision 6: Keep “past expected lifecycle” and “likely stale” as density-specific labels over the same stale truth

- **Decision**: Model compact “past expected lifecycle” phrasing and stronger “likely stale” diagnostic phrasing as different density outputs over the same `likely_stale` freshness truth rather than as separate persisted states.
- **Rationale**: The spec allows same meaning, different density. The current code already points in that direction: `OperationUxPresenter::surfaceGuidance()` says the run is “past its lifecycle window,” while `OperationRunStatusBadge` can label the same run `Likely stale`.
- **Alternatives considered**:
  - Create two separate freshness states for “late” and “likely stale”: rejected because existing lifecycle truth has only one stale boundary and no additional behavioral consequence.
  - Collapse all stale-active copy to a single label everywhere: rejected because compact surfaces and canonical detail need different density without changing meaning.
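
As an illustration of Decision 6, the sketch below maps one freshness value to two labels depending on surface density. `SketchDensity` and `sketchStaleCopy()` are hypothetical names invented for this example, and the exact label strings are assumptions modeled on the phrasing quoted above; the real copy lives in `OperationUxPresenter` and `OperationRunStatusBadge`.

```php
<?php

declare(strict_types=1);

// Hypothetical density switch: compact surfaces (tenant cards, progress
// summaries) versus diagnostic surfaces (canonical run detail).
enum SketchDensity
{
    case Compact;
    case Diagnostic;
}

// Returns elevated copy only for the single stale truth. Both labels are
// derived from the same 'likely_stale' value, so there is no second state.
function sketchStaleCopy(string $freshnessState, SketchDensity $density): ?string
{
    if ($freshnessState !== 'likely_stale') {
        return null; // other states keep their regular status copy
    }

    return match ($density) {
        SketchDensity::Compact => 'Past expected lifecycle',
        SketchDensity::Diagnostic => 'Likely stale',
    };
}

echo sketchStaleCopy('likely_stale', SketchDensity::Compact), PHP_EOL;    // prints "Past expected lifecycle"
echo sketchStaleCopy('likely_stale', SketchDensity::Diagnostic), PHP_EOL; // prints "Likely stale"
```

Because the density argument only selects wording, adding or renaming a label never changes which runs count as stale, which keeps compact and diagnostic surfaces from drifting apart.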