TenantAtlas/specs/367-operationrun-actionability-system/spec.md
ahmido 564da05096 feat: implement operation run actionability system (#439)
This PR introduces the Operation Run Actionability System.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #439
2026-06-08 13:34:25 +00:00


Feature Specification: OperationRun Actionability System v1

Feature Branch: 367-operationrun-actionability-system
Created: 2026-06-08
Status: Draft / Ready for implementation
Input: User-provided Spec 367 draft: separate historical terminal OperationRun truth from current UI follow-up truth.

Spec Candidate Check (mandatory - SPEC-GATE-001)

  • Problem: Historical terminal problem runs are currently treated as today's operator follow-up truth. A resolved old provider.connection.check blocker can still produce dashboard warnings and provider CTAs after later success or current healthy provider state.
  • Today's failure: Operators can be sent from Dashboard to Provider Connections for an old provider_consent_missing run even when the Provider Connection page now shows consent_status=granted and verification_status=healthy. This creates a loop between false dashboard attention and correct domain state.
  • User-visible improvement: Dashboard, Operations hub, shell active-run hints, baseline widgets, and primary CTAs will use one current actionability decision instead of raw historical terminal status. Historical failures remain visible in Operations history, but only current actionable runs drive follow-up counts and CTAs.
  • Smallest enterprise-capable version: Add a derived, non-persisted OperationRun actionability layer that classifies known terminal OperationRun types, handles the provider CTA loop, supports superseded repeatable operations, keeps high-risk restore/promotion/purge runs manual-review by default, and migrates current UI follow-up consumers from terminalFollowUp() / dashboardNeedsFollowUp() to actionability.
  • Explicit non-goals: No legacy data migration, no rewriting historical runs, no manual acknowledge/resolve UI, no new Operations tab system for Resolved/Historical, no notification redesign, no alert delivery UX, no new Provider Connection feature, no restore/backup feature expansion, no destructive actions, no global search enablement for OperationRunResource, no panel/provider/asset/theme changes.
  • Permanent complexity imported: One derived status/value object family, one central resolver/registry/policy layer, correlation helpers only where existing context is insufficient, guard tests for operation-type coverage and direct UI use of historical terminal-follow-up scopes, and focused Unit/Feature/Browser coverage.
  • Why now: Specs 358-365 made OperationRun execution, reconciliation, links, and operator actions more mature, but repo truth still exposes raw terminal follow-up via OperationRun::terminalFollowUp(), dashboardNeedsFollowUp(), problemClass(), NeedsAttention, BaselineCompareNow, and Operations filters.
  • Why not local: Fixing the Provider Connections special case in one widget would leave other repeatable runs and other consumers with the same historical-vs-current truth bug. The repo already has multiple concrete consumers, so the boundary must be central and testable.
  • Approval class: Core Enterprise
  • Red flags triggered: New status axis, resolver/registry layer, multiple UI consumers. Defense: the layer is derived-only, persistence-free, anchored to a confirmed operator loop, reuses existing OperationCatalog, OperationRunActionEligibility, OperationRunLinks, and reconciliation helpers, and directly prevents false governance/operations CTAs.
  • Score: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | Total: 11/12
  • Decision: approve as a bounded current-actionability truth slice.

Repo Truth Reconciliation

The user draft is accepted as the candidate, with these repo-based scope corrections:

  1. OperationRunActionEligibility already exists and is consumed by Operations list/detail action surfaces. Spec 367 must extend or feed that path; it must not create a parallel action eligibility framework.
  2. OperationCatalog is the canonical operation-type inventory and alias resolver. Actionability coverage must compare against that catalog and known provider/reconciliation types instead of inventing a second source of operation-type truth.
  3. OperationRunReconciliationRegistry and adapters already resolve stale active run proof for several families. Actionability must separate terminal current-follow-up truth from active stale reconciliation truth while reusing existing reconciliation evidence where useful.
  4. OperationRun::terminalFollowUp(), dashboardNeedsFollowUp(), problemClass(), requiresOperatorReview(), and requiresDashboardFollowUp() are existing historical/problem helpers. This spec may deprecate or constrain them but must migrate UI consumers before any removal.
  5. The current canonical Operations routes are workspace-scoped: /admin/workspaces/{workspace}/operations and /admin/workspaces/{workspace}/operations/{run}. OperationRunResource remains globally non-searchable.
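Correction 2 implies a guard that compares the catalog's canonical types against the types the actionability layer covers. The following is a minimal sketch of that comparison in Python for illustration only; it is not the repository's PHP code. The function name `missing_actionability_coverage` and the sample lists are hypothetical, and the real inventory would come from OperationCatalog.

```python
def missing_actionability_coverage(catalog_types, policy_types):
    """Return canonical operation types that have no actionability policy."""
    return sorted(set(catalog_types) - set(policy_types))


# Hypothetical sample data; real values would come from OperationCatalog.
catalog_types = ["provider.connection.check", "inventory.sync", "restore.execute"]
policy_types = ["provider.connection.check", "inventory.sync"]

# A guard test would fail whenever this list is non-empty.
missing = missing_actionability_coverage(catalog_types, policy_types)
```

Deriving the expected set from the catalog, rather than from a hand-maintained list, is what keeps the guard from becoming a second source of operation-type truth.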

Completed-Spec Guardrail

Related specs are context only and are not modified by this prep package:

  • specs/358-operationrun-queue-truth-foundation/
  • specs/359-operationrun-reconciliation-adapter-framework-review-compose-adapter/
  • specs/360-operationrun-canonical-cutover-cleanup/
  • specs/361-report-evidence-reconciliation/
  • specs/362-sync-capture-backup-operation-semantics/
  • specs/363-explicit-uiactioncontext-contract/ (implemented)
  • specs/364-restore-high-risk-operation-reconciliation/
  • specs/365-operations-ui-operator-actions-regression-gate/ (implementation close-out signals)

Spec 367 is a new package because these predecessors do not fully define current terminal actionability or migrate all dashboard/current-follow-up consumers away from historical terminal status.

Spec Scope Fields (mandatory)

  • Scope: workspace, tenant, canonical-view
  • Primary Routes:
    • /admin/workspaces/{workspace}/operations
    • /admin/workspaces/{workspace}/operations/{run}
    • tenant dashboard surfaces that host NeedsAttention and BaselineCompareNow
    • shell active-work hint surface through BulkOperationProgress / ActiveRuns
  • Data Ownership: Existing OperationRun execution history stays in operation_runs. Tenant-bound runs remain tenant-owned operational artifacts via workspace_id + managed_environment_id and must enforce managed-environment entitlement; workspace-only runs are allowed only for explicitly workspace-owned operation types. No new table, migration, persisted current-actionability mirror, or historical data rewrite is introduced.
  • RBAC: Existing workspace membership, managed-environment entitlement, and OperationRunPolicy checks remain authoritative. Non-members remain deny-as-not-found. Capability denial remains 403 after membership is established.

For canonical-view specs:

  • Default filter behavior when tenant-context is active: Existing Operations workspace route and environment-prefilter behavior remain unchanged. Query filters may add or rename problem/actionability filters only if links remain workspace/environment safe.
  • Explicit entitlement checks preventing cross-tenant leakage: Actionability evaluation and counts must operate only on runs already scoped to the actor's workspace and entitled managed environment. Resolved/superseded references must be same workspace and same managed environment unless the operation type is explicitly workspace-only.
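The same-scope rule for resolved/superseded references can be pictured as a small predicate. This is a hedged sketch in Python, not the project's implementation; `RunScope`, `is_same_scope`, and the `workspace_only_type` flag are hypothetical names, and entitlement checks are assumed to have already narrowed the candidate runs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunScope:
    workspace_id: int
    managed_environment_id: Optional[int]  # None for workspace-only runs


def is_same_scope(candidate: RunScope, reference: RunScope,
                  workspace_only_type: bool = False) -> bool:
    """True when a resolving or superseding reference may be used for the candidate run."""
    if candidate.workspace_id != reference.workspace_id:
        return False  # references never cross workspaces
    if workspace_only_type:
        return True  # explicitly workspace-owned types ignore the environment
    if candidate.managed_environment_id is None:
        return False  # a tenant-bound evaluation needs an environment to compare
    return candidate.managed_environment_id == reference.managed_environment_id
```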

UI Surface Impact (mandatory - UI-COV-001)

Does this spec add, remove, rename, or materially change any reachable UI surface?

  • No UI surface impact
  • Existing page changed
  • New page/route added
  • Navigation changed
  • Filament panel/provider surface changed
  • New modal/drawer/wizard/action added
  • New table/form/state added
  • Customer-facing surface changed
  • Dangerous action changed
  • Status/evidence/review presentation changed
  • Workspace/environment context presentation changed

UI/Productization Coverage (mandatory when UI Surface Impact is not "No UI surface impact")

  • Route/page/surface:
    • App\Filament\Widgets\Dashboard\NeedsAttention
    • App\Filament\Widgets\Dashboard\BaselineCompareNow
    • App\Filament\Pages\Monitoring\Operations
    • App\Filament\Resources\OperationRunResource
    • App\Filament\Pages\Operations\TenantlessOperationRunViewer
    • App\Filament\Widgets\Operations\OperationsWorkbenchStats
    • App\Livewire\BulkOperationProgress
    • App\Support\GovernanceInbox\GovernanceInboxSectionBuilder
    • App\Support\EnvironmentDashboard\EnvironmentDashboardSummaryBuilder
    • App\Support\Workspaces\WorkspaceOverviewBuilder
    • App\Support\OpsUx\OperationUxPresenter
    • shared link/action helpers that derive Operations follow-up links
  • Current or new page archetype: Existing Operations Hub / monitoring-state page, tenant dashboard widgets, and shared active-run shell hint.
  • Design depth: Strategic Surface for Operations Hub; Domain Pattern Surface for dashboard widgets and shell hint.
  • Repo-truth level: repo-verified.
  • Existing pattern reused: Operations Hub page report, OperationRunActionEligibility, OperationRunLinks, OperationCatalog, OperationUxPresenter, ActiveRuns, dashboard widget patterns, existing Spec 365 browser smoke conventions.
  • New pattern required: one derived actionability truth contract; no new page, navigation branch, action modal family, or independent UI framework.
  • Screenshot required: yes during implementation if visible dashboard/operations copy or filter state changes. Store under specs/367-operationrun-actionability-system/artifacts/ if captured.
  • Page audit required: no new page audit required during prep. Implementation must update existing UI coverage artifacts or record a checked no-update rationale if visual structure remains pattern-compatible.
  • Customer-safe review required: no customer-facing surface. Operator-facing copy must still avoid raw provider payload, SQL, stack trace, secret, and debug leakage.
  • Dangerous-action review required: yes by negative proof. Restore, promotion, purge, and destructive-like runs must not become auto-resolved by unrelated later successes and must not gain new destructive UI actions.
  • Coverage files updated or explicitly not needed:
    • docs/ui-ux-enterprise-audit/route-inventory.md
    • docs/ui-ux-enterprise-audit/design-coverage-matrix.md
    • docs/ui-ux-enterprise-audit/page-reports/...
    • docs/ui-ux-enterprise-audit/strategic-surfaces.md
    • docs/ui-ux-enterprise-audit/grouped-follow-up-candidates.md
    • docs/ui-ux-enterprise-audit/unresolved-pages.md
    • N/A - no reachable UI surface impact
  • Coverage artifact decision: Implementation must either update existing Operations/dashboard coverage entries or record why no coverage artifact changed because the surface contract and visual hierarchy stayed unchanged.
  • No-impact rationale when applicable: N/A.

Cross-Cutting / Shared Pattern Reuse (mandatory)

  • Cross-cutting feature?: yes
  • Interaction class(es): status messaging, dashboard signals/cards, action links, Operations filters, shell active-work hints, related-operation CTAs.
  • Systems touched: OperationRun, OperationCatalog, OperationRunLinks, OperationRunActionEligibility, OperationRunReconciliationRegistry, OperationRunCorrelationResolver, OperationUxPresenter, ActiveRuns, dashboard widgets, Operations list/detail/workbench stats, governance inbox, environment dashboard summary, and workspace overview aggregation.
  • Existing pattern(s) to extend: existing OperationRun monitoring family, existing action eligibility path, existing operation catalog and alias resolution, existing reconciliation registry.
  • Shared contract / presenter / builder / renderer to reuse: Reuse OperationCatalog as operation-type inventory, reuse OperationRunActionEligibility for primary-action decisions, and feed Operations links through OperationRunLinks.
  • Why the existing shared path is sufficient or insufficient: Existing paths know execution state, stale active reconciliation, links, and UI action eligibility, but no existing path answers "does this historical terminal problem still require action today?".
  • Allowed deviation and why: Add a bounded derived actionability resolver/registry/policy family. Do not add persisted current-state truth or a parallel action UI system.
  • Consistency impact: Dashboard counts, baseline dashboard calmness, Operations problem filters, shell hints, primary CTAs, and action eligibility must agree on current actionability.
  • Review focus: No UI consumer may count raw terminalFollowUp() rows as current dashboard/action truth after the migration.

OperationRun UX Impact (mandatory)

  • Touches OperationRun start/completion/link UX?: yes, current-follow-up and deep-link semantics only.
  • Shared OperationRun UX contract/layer reused: OperationRunLinks, OperationRunActionEligibility, OperationUxPresenter, OperationRunReconciliationRegistry, OperationRunService lifecycle ownership remains unchanged.
  • Delegated start/completion UX behaviors: No new queued toast, browser event, run-start path, queued DB notification, or terminal notification path.
  • Local surface-owned behavior that remains: Dashboard/widget density and Operations list placement only.
  • Queued DB-notification policy: N/A - no new run-start behavior.
  • Terminal notification path: unchanged central lifecycle mechanism.
  • Exception required?: none.

Provider Boundary / Platform Core Check (mandatory)

  • Shared provider/platform boundary touched?: yes.
  • Boundary classification: mixed. Actionability is platform-core. Provider health and consent reason codes remain provider-owned diagnostics.
  • Seams affected: provider.connection.check actionability, provider connection health/consent state, OperationRun context/correlation, operator CTA copy.
  • Neutral platform terms preserved or introduced: operation, actionability, historical execution truth, current domain truth, current follow-up, superseded, resolved, manual review.
  • Provider-specific semantics retained and why: Provider Connection health and consent are current Microsoft-provider domain truth needed to fix the confirmed CTA loop.
  • Why this does not deepen provider coupling accidentally: Provider-specific logic stays inside the provider-connection policy. The actionability resolver consumes operation type and same-scope domain proof; it does not make Microsoft provider concepts the platform default.
  • Follow-up path: follow-up-spec only if later implementation discovers provider-specific actionability decisions spreading beyond provider-owned policy classes.
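One way to picture this boundary is a type-keyed dispatch in which only the provider-owned policy reads consent and health fields. The sketch below is illustrative Python under stated assumptions, not the repository's PHP: `POLICIES`, `resolve_actionability`, and the dict-shaped run are hypothetical, and the fail-closed fallback is one reading of the requirement that unknown types must not default to non-actionable.

```python
from typing import Callable, Dict

Policy = Callable[[dict], str]


def provider_connection_policy(run: dict) -> str:
    # Provider-owned: only this function reads consent/verification fields.
    healthy = (
        run.get("consent_status") == "granted"
        and run.get("verification_status") == "healthy"
    )
    return "resolved_by_current_state" if healthy else "actionable"


# Hypothetical registry keyed by canonical operation type.
POLICIES: Dict[str, Policy] = {
    "provider.connection.check": provider_connection_policy,
}


def resolve_actionability(run: dict) -> str:
    """Platform-core dispatch: knows operation types, not provider concepts."""
    policy = POLICIES.get(run["type"])
    if policy is None:
        # Unknown types fail closed instead of silently becoming non-actionable.
        return "requires_manual_review"
    return policy(run)
```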

UI / Surface Guardrail Impact (mandatory)

| Surface / Change | Operator-facing surface change? | Native vs Custom | Shared-Family Relevance | State Layers Touched | Exception Needed? | Low-Impact / N/A Note |
|---|---|---|---|---|---|---|
| Dashboard NeedsAttention current Operations card | yes | Existing Filament widget Blade | dashboard signal/status messaging | widget, link query | no | Replace raw terminal count with actionability count |
| Dashboard BaselineCompareNow calmness override | yes | Existing Filament widget Blade | dashboard signal/status messaging | widget | no | Use actionability count for Operations follow-up |
| Operations list problem filters/next action | yes | Native Filament table + existing presenters | monitoring list/action links | page, table, query | no | Preserve row history; current filters use actionability |
| OperationRun detail summary/action decision | yes | Existing Filament page/detail | shared-detail-family | detail, header, diagnostics | no | Historical run still visible; action guidance reflects current truth |
| Operations workbench stats | yes | Existing Filament stats widget | monitoring status messaging | widget, scoped query | no | "Needs attention" count uses current actionability |
| Governance inbox / environment dashboard / workspace overview operation follow-up | yes | Existing summary builders | dashboard and inbox signal/status messaging | aggregate queries, links | no | Current follow-up aggregates use actionability; historical runs remain reachable from Operations history |
| Shell active-work hint | yes | Existing Livewire shell hint | active-run status messaging | shell | no | Terminal follow-up is removed from active progress or shown only through a distinct actionability-backed non-active signal |

Decision-First Surface Role (mandatory)

| Surface | Decision Role | Human-in-the-loop Moment | Immediately Visible for First Decision | On-Demand Detail / Evidence | Why This Is Primary or Why Not | Workflow Alignment | Attention-load Reduction |
|---|---|---|---|---|---|---|---|
| Dashboard attention cards | Primary Decision Surface | Decide whether the environment needs operator follow-up now | current actionable count, direct CTA only when current action exists | Operations history and raw run detail remain on Operations | primary because it starts the operator's daily attention loop | follows environment governance dashboard workflow | removes old resolved blockers from today's work |
| Operations list | Primary Decision Surface | Triage current actionable operations while preserving history | status/outcome, actionability, scope, next action | raw context and reconciliation detail on detail page | primary for operations triage | filters reflect actual work, not storage history | prevents opening resolved historical rows as current tasks |
| OperationRun detail | Tertiary Evidence / Diagnostics Surface | Understand why a run is actionable, superseded, resolved, or manual-review | historical status plus current actionability explanation | full raw/support diagnostics below/gated | tertiary because the run is selected proof | preserves audit history while clarifying current action | removes conflict between history and dashboard |
| Operations workbench / governance inbox / workspace overview operation signals | Primary Decision Surface | Decide whether an operations follow-up family needs attention across workspace or environment context | current actionable/manual-review count and safe Operations CTA | Operations history and run detail | primary where it drives attention queues; otherwise secondary summary | follows existing dashboard/inbox/overview workflows | prevents aggregate false-positive attention |
| Shell active hint | Secondary Context Surface | Know if active work is in progress or stale | active-run state only | detail link to Operations | secondary; terminal actionability is not an active-run hint | supports ongoing work awareness | avoids mixing active progress with historical follow-up |

Audience-Aware Disclosure (mandatory)

| Surface | Audience Modes In Scope | Decision-First Default-Visible Content | Operator Diagnostics | Support / Raw Evidence | One Dominant Next Action | Hidden / Gated By Default | Duplicate-Truth Prevention |
|---|---|---|---|---|---|---|---|
| Dashboard attention cards | operator-MSP | current follow-up count and CTA | none | none | open current actionable operations | raw run data | no historical terminal count appears as current work |
| Operations list | operator-MSP, support-platform | actionability state, operation label, scope, reason summary, next action | reason code and related proof summary | raw context on detail only | open run or related object | raw payloads, stack traces, provider payloads | one actionability label per row |
| OperationRun detail | operator-MSP, support-platform | historical result plus current actionability explanation | superseding run/current-state proof | raw context collapsed/capability-gated | follow actionability recommendation or inspect history | raw/support detail | history and current follow-up are separate sections |
| Aggregate operation signals | operator-MSP, support-platform | current actionability counts and CTA | none by default | Operations history/detail only | open current actionable operations | raw run data | no raw terminal count appears as current work |

UI/UX Surface Classification (mandatory)

| Surface | Action Surface Class | Surface Type | Likely Next Operator Action | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type / Justification |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Dashboard attention cards | Dashboard signal | Governance attention widget | Open current actionable operations | explicit CTA | N/A | none | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | environment/workspace context | Operations | current actionability count | none |
| Operations list | List / Table / Monitoring | Monitoring-state page | Open actionable run or related object | row click opens detail | required | table/link actions | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | workspace/environment filters | Operation run | historical result plus current actionability | none |
| OperationRun detail | Detail / Diagnostics | Shared-detail-family | Understand or act on current actionability | detail page | N/A | More/header group | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | workspace/environment chips | Operation run | execution truth and actionability truth separately | none |
| Aggregate operation signals | Dashboard / Inbox / Overview signal | Existing summary widgets/builders | Open current actionable operations | explicit CTA | N/A | none | none | /admin/workspaces/{workspace}/operations | /admin/workspaces/{workspace}/operations/{run} | workspace/environment context | Operations | current actionability count | none |

Operator Surface Contract (mandatory)

| Surface | Primary Persona | Decision / Operator Action Supported | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|---|
| Dashboard attention cards | tenant/MSP operator | Decide whether to investigate operations now | dashboard | Is there current operational work, or only historical noise? | actionable count, reason, link | none | current actionability, active stale attention | none | Open operations | none |
| Operations list | workspace/operator | Prioritize operation follow-up | monitoring list | Which runs require action today? | operation, scope, status/outcome, actionability, next action | raw context and proof detail | execution status, outcome, actionability, freshness | none | Open run/related object | none |
| OperationRun detail | operator/support | Resolve contradiction between historical failure and current state | detail | Was this run historically problematic, and does it still matter now? | execution truth, actionability result, proof summary | raw context, stack/provider diagnostics | execution, current domain truth, actionability | none | Open related/current action target | none |
| Aggregate operation signals | workspace/operator | Decide whether an operations family needs follow-up across one environment or workspace | dashboard/inbox/overview signal | Is there current operations follow-up, or only old history? | actionable/manual-review count, direct Operations CTA | raw run data | current actionability, active stale attention | none | Open current actionable operations | none |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: no. Historical truth remains operation_runs; current domain truth remains each domain model; actionability is derived at read time.
  • New persisted entity/table/artifact?: no.
  • New abstraction?: yes, a bounded actionability resolver/registry/policy family.
  • New enum/state/reason family?: yes, derived actionability states such as actionable, superseded_by_later_success, resolved_by_current_state, requires_manual_review, informational_only, and not_terminal.
  • New cross-domain UI framework/taxonomy?: no. This feeds existing Operations/dashboard UI and must not become a generic UI framework.
  • Current operator problem: dashboards and CTAs can send users to fix already-resolved historical failures.
  • Existing structure is insufficient because: OperationRun currently exposes historical terminal status as follow-up truth, while existing reconciliation/action eligibility handles active stale runs and UI actions but not terminal current actionability.
  • Narrowest correct implementation: one derived resolver and policy registry over existing operation types, plus consumer migration and guard tests.
  • Ownership cost: actionability policies must be maintained when operation types are added; tests must cover known operation types and high-risk defaults.
  • Alternative intentionally rejected: a provider-only special case was rejected because repeatable sync, baseline, evidence, review, backup, restore, and promotion families need intentionally different current-actionability semantics.
  • Release truth: current-release truth; this directly fixes an observed dashboard/CTA loop and prevents similar false follow-up loops.
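The derived state family and the read-time result can be sketched as an enum plus an immutable value object. This is a hedged illustration in Python rather than the project's PHP; the class and field names are hypothetical, and treating manual-review runs as current follow-up alongside actionable ones is an assumption drawn from the "actionable/manual-review count" wording used for aggregate signals.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Actionability(Enum):
    ACTIONABLE = "actionable"
    SUPERSEDED_BY_LATER_SUCCESS = "superseded_by_later_success"
    RESOLVED_BY_CURRENT_STATE = "resolved_by_current_state"
    REQUIRES_MANUAL_REVIEW = "requires_manual_review"
    INFORMATIONAL_ONLY = "informational_only"
    NOT_TERMINAL = "not_terminal"


# Assumption: manual-review runs count as current follow-up, like actionable ones.
FOLLOW_UP_STATES = frozenset(
    {Actionability.ACTIONABLE, Actionability.REQUIRES_MANUAL_REVIEW}
)


@dataclass(frozen=True)
class ActionabilityResult:
    """Computed at read time; never written back to operation_runs."""
    status: Actionability
    reason_code: str
    explanation: str
    policy: str
    superseding_run_id: Optional[int] = None
    resolving_model: Optional[str] = None

    @property
    def actionable(self) -> bool:
        return self.status in FOLLOW_UP_STATES
```

Because the object is frozen and carries no persistence hooks, it cannot drift into a stored current-state mirror by accident.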

Compatibility posture

This feature assumes pre-production. No legacy aliases, migration shims, historical data rewrite, or dual-read compatibility path is required unless implementation proves an existing write path still emits a legacy operation alias and the spec is amended.

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

  • Test purpose / classification: Unit, Feature, Architecture/guard, Browser smoke.
  • Validation lane(s): fast-feedback for Unit/Feature guards; confidence for Filament/Livewire dashboard and Operations behavior; browser for one bounded dashboard-to-Operations CTA loop smoke if rendered UI changes.
  • Why this classification and these lanes are sufficient: Unit tests prove policy decisions; Feature tests prove dashboard counts, Operations filters, and authorization/isolation; guard tests prevent direct terminal-follow-up UI consumption; browser smoke proves the real user loop is gone.
  • New or expanded test families: one focused OperationRun actionability family under Unit/Feature/Operations and one bounded browser smoke if UI changes.
  • Fixture / helper cost impact: Use explicit factories for workspace, managed environment, provider connection, OperationRun, and related domain proof. Do not widen default test setup.
  • Heavy-family visibility / justification: Browser smoke is explicit and limited to the confirmed Provider Connections dashboard loop if implementation changes rendered UI.
  • Special surface test profile: monitoring-state-page, dashboard-signal, shared-detail-family.
  • Standard-native relief or required special coverage: Required coverage for status/actionability semantics, cross-workspace isolation, high-risk manual-review defaults, and no raw terminal follow-up in dashboard.
  • Reviewer handoff: Verify lane fit, guard coverage, actionability policy coverage, no N+1-prone per-row policy queries in Operations table, and no terminalFollowUp() UI consumer remains.
  • Budget / baseline / trend impact: Bounded Unit/Feature tests plus optional single browser smoke. No new heavy-governance family.
  • Escalation needed: none if actionability stays derived and bounded; follow-up-spec if implementation discovers a need for manual acknowledgement/resolution UI.
  • Active feature PR close-out entry: Guardrail + Smoke Coverage.
  • Planned validation commands:
    • cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Unit/Support/Operations tests/Feature/Operations tests/Feature/Monitoring tests/Feature/Filament
    • cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Feature/Guards
    • cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Browser/Spec367OperationRunActionabilitySmokeTest.php (run only when browser UI changes are implemented)
    • cd apps/platform && ./vendor/bin/sail pint --dirty --test
    • git diff --check

User Scenarios & Testing (mandatory)

User Story 1 - Dashboard stops false Provider Connection CTA loop (Priority: P1)

As an MSP operator, I want an old provider connection blocker to disappear from today's dashboard follow-up when the same connection is now healthy, so I am not sent through a dead-end fix loop.

Why this priority: This is the confirmed root symptom and the clearest operator-trust failure.

Independent Test: Create an old blocked provider.connection.check, a later successful same-scope check or healthy ProviderConnection state, render dashboard attention, and verify no Provider Connection/terminal follow-up CTA appears for the old run.

Acceptance Scenarios:

  1. Given a blocked old provider check and a later successful same-scope provider check, When the dashboard renders, Then the old run is not counted as current follow-up.
  2. Given a blocked old provider check and current ProviderConnection consent_status=granted plus verification_status=healthy, When actionability is evaluated, Then the run is resolved_by_current_state and no current provider CTA is emitted.
  3. Given a blocked provider check with no later success and unhealthy current state, When actionability is evaluated, Then the run remains actionable.
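The three acceptance scenarios reduce to a short decision order. The sketch below is illustrative Python, not the repository's code; the function name and parameters are hypothetical, and returning `superseded_by_later_success` for scenario 1 is an assumption, since the scenario only requires that the old run is not counted as current follow-up.

```python
def provider_check_actionability(has_later_same_scope_success: bool,
                                 consent_status: str,
                                 verification_status: str) -> str:
    """Classify an old blocked provider.connection.check run."""
    if has_later_same_scope_success:
        # Scenario 1 (assumed status): a later same-scope check succeeded.
        return "superseded_by_later_success"
    if consent_status == "granted" and verification_status == "healthy":
        # Scenario 2: current ProviderConnection state proves resolution.
        return "resolved_by_current_state"
    # Scenario 3: no later success and no healthy current state.
    return "actionable"
```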

User Story 2 - Repeatable operations are superseded by later same-scope success (Priority: P1)

As an operator, I want old failed sync/evidence/baseline/review/backup runs to stop driving current follow-up when a later successful same-scope run proves the work is now complete.

Why this priority: Repeatable operation families are common dashboard and Operations noise sources.

Independent Test: Create old failed repeatable runs and later successful same-scope runs for inventory, baseline, evidence/review, and backup families; verify actionability returns superseded_by_later_success.

Acceptance Scenarios:

  1. Given an old failed inventory.sync and a later succeeded same-scope inventory.sync, When Operations follow-up filters run, Then only current actionable runs appear.
  2. Given an old evidence generation failure and a later usable same-scope Evidence Snapshot, When actionability is evaluated, Then the old run does not drive a dashboard CTA.
  3. Given insufficient correlation proof, When an old failed repeatable run is evaluated, Then it remains actionable or manual-review instead of being silently hidden.
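A sketch of the superseding rule for repeatable operations follows, written in Python for illustration only. `Run`, `scope_key`, and `classify_repeatable` are hypothetical names; the single `scope_key` string stands in for whatever combination of workspace, environment, connection, and target scope the real correlation uses. Choosing manual review when proof is missing is one of the two outcomes scenario 3 allows.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Run:
    id: int
    type: str
    outcome: str                # e.g. "failed" or "succeeded"
    scope_key: Optional[str]    # None when correlation proof is missing
    finished_at: int            # any comparable timestamp


def classify_repeatable(old: Run, later_runs: Sequence[Run]) -> str:
    if old.scope_key is None:
        return "requires_manual_review"  # never hide a run on a guess
    for run in later_runs:
        if (run.type == old.type
                and run.outcome == "succeeded"
                and run.scope_key == old.scope_key
                and run.finished_at > old.finished_at):
            return "superseded_by_later_success"
    return "actionable"
```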

User Story 3 - High-risk operations remain manual-review unless explicitly resolved (Priority: P1)

As an operator, I want restore, promotion, purge, and destructive-like operation failures to remain visible for deliberate review unless a type-specific policy can prove resolution, so dangerous operations do not disappear from attention incorrectly.

Why this priority: False calm on high-risk operations is worse than extra review.

Independent Test: Create failed restore/promotion/purge runs plus unrelated later successes and verify actionability remains requires_manual_review.

Acceptance Scenarios:

  1. Given a failed restore.execute and a later successful backup run, When actionability is evaluated, Then the restore remains requires_manual_review.
  2. Given a failed promotion.execute, When Operations filters current follow-up, Then it remains visible unless a future explicit policy says otherwise.
  3. Given high-risk runs, When action eligibility renders, Then no automatic retry/re-execute/destructive action is introduced by this spec.
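The high-risk default can be sketched as a classifier that ignores unrelated later successes and yields to an explicit type-specific policy only when that policy returns a proven status. This is illustrative Python, not the project's implementation; `classify_high_risk` and its parameters are hypothetical, and only the two high-risk type names that appear in this spec are listed.

```python
from typing import Callable, Optional, Sequence

# Only the types named in this spec; purge and destructive-like type names
# are not given here, so they are left out rather than guessed.
HIGH_RISK_TYPES = frozenset({"restore.execute", "promotion.execute"})


def classify_high_risk(
    run_type: str,
    later_successes: Sequence[str] = (),
    explicit_policy: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Default a failed high-risk run to manual review."""
    if run_type not in HIGH_RISK_TYPES:
        raise ValueError(f"{run_type} is not a high-risk type in this sketch")
    if explicit_policy is not None:
        proven = explicit_policy(run_type)
        if proven is not None:
            return proven
    # later_successes is deliberately ignored: unrelated successes prove nothing.
    return "requires_manual_review"
```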

User Story 4 - Operations preserves history while separating current actionability (Priority: P2)

As a support/operator user, I want Operations history to show the historical outcome and the current actionability explanation separately, so audit history remains intact without confusing today's work.

Why this priority: The platform must preserve audit depth while keeping current work queues quiet and truthful.

Independent Test: Render Operations list/detail for actionable, superseded, resolved, manual-review, informational, active-stale, and succeeded runs and assert historical status remains visible while current actionability controls current follow-up.

Acceptance Scenarios:

  1. Given a superseded old failed run, When Operations history renders, Then the row remains visible in history but not in current-follow-up filters.
  2. Given a manual-review high-risk run, When Operations detail renders, Then current actionability explains why review is still needed.
  3. Given an active stale run, When actionability is evaluated, Then active stale truth remains handled by existing freshness/reconciliation paths, not terminal actionability.
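
A small sketch of the separation described above, assuming a hypothetical `Row` shape that carries both the historical outcome and the derived actionability status. The history view returns every row; only the current-follow-up filter consults actionability.

```python
from __future__ import annotations

from dataclasses import dataclass

# Statuses that count as current follow-up, per the status semantics in this spec.
FOLLOW_UP_STATUSES = {"actionable", "requires_manual_review"}


@dataclass(frozen=True)
class Row:
    run_id: int
    historical_outcome: str  # what happened when the run executed
    actionability: str       # derived at read time, never persisted


def history_rows(rows: list[Row]) -> list[Row]:
    """History keeps every run visible, whatever its actionability."""
    return list(rows)


def current_follow_up_rows(rows: list[Row]) -> list[Row]:
    """The current-follow-up filter keeps only rows that still need attention."""
    return [row for row in rows if row.actionability in FOLLOW_UP_STATUSES]
```

A superseded failed run therefore keeps its `failed` outcome in history while dropping out of the current-follow-up list.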

Functional Requirements (mandatory)

  • FR-367-001: The system MUST provide a central derived OperationRun actionability entry point that answers whether a terminal run is current operator follow-up truth.
  • FR-367-002: The resolver MUST distinguish historical execution truth, current domain truth, and UI actionability truth.
  • FR-367-003: The resolver MUST return at least status, actionable boolean, reason code, explanation, optional superseding run id, optional resolving model reference, and policy identifier or equivalent debug metadata.
  • FR-367-004: The actionability registry MUST cover every canonical operation type known to OperationCatalog and any provider/reconciliation operation types discovered by implementation.
  • FR-367-005: Unknown operation types MUST fail guard tests and MUST fail closed at runtime as manual-review/actionable or explicitly unsupported; no silent non-actionable default is allowed.
  • FR-367-006: Provider connection check policy MUST mark old provider blockers non-actionable when a later same-scope successful check exists or current same-scope ProviderConnection state proves consent_status=granted and verification_status=healthy.
  • FR-367-007: Repeatable sync policies MUST supersede old failed/blocked/partial runs only when later same-scope success is proven through canonical type/alias family, workspace, environment, provider/connection, and relevant selection or target scope.
  • FR-367-008: Baseline, evidence, review, review-pack, and backup artifact policies MUST use current repo-backed artifact truth where available and MUST avoid guessing when correlation proof is missing.
  • FR-367-009: Restore, promotion, purge, and destructive-like operation policies MUST default to requires_manual_review for terminal problem outcomes unless an explicit type-specific policy proves otherwise.
  • FR-367-010: Dashboard and current-follow-up consumers MUST stop using raw terminalFollowUp() / dashboardNeedsFollowUp() as current actionability truth.
  • FR-367-011: Operations history MUST keep historical terminal runs visible outside current-follow-up filters.
  • FR-367-012: OperationRunActionEligibility MUST consume or align with actionability so primary actions and disabled reasons do not contradict dashboard/current-follow-up state.
  • FR-367-013: Actionability evaluation MUST be batch-friendly for Operations list/dashboard counts and MUST avoid per-row N+1 domain queries where predictable eager loading or grouped lookup can be used.
  • FR-367-014: Cross-workspace and cross-environment proofs MUST NOT supersede or resolve a run.
  • FR-367-015: No Graph/provider calls may occur during actionability evaluation or UI render.
  • FR-367-016: Guard tests MUST fail when new operation types are introduced without actionability policy coverage.
  • FR-367-017: Guard tests MUST fail when dashboard/current-follow-up UI code directly consumes historical terminal-follow-up scopes or methods after migration.
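
As an illustration of FR-367-003 and FR-367-005, the resolver result and its fail-closed behaviour could look roughly like the sketch below. The field names, the `resolve` function, the dict-based run shape, and the registry structure are assumptions for this example. The spec only requires that equivalent information be returned and that unknown types never default silently to non-actionable.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ActionabilityResult:
    status: str
    actionable: bool
    reason_code: str
    explanation: str
    policy: str  # debug metadata: which policy produced the result
    superseding_run_id: Optional[int] = None
    resolving_model_ref: Optional[str] = None


# A policy maps one run (here a plain dict) to a derived result.
Policy = Callable[[dict], ActionabilityResult]


def resolve(run: dict, registry: dict[str, Policy]) -> ActionabilityResult:
    """Look up the policy for the run's type and fail closed when none exists."""
    policy = registry.get(run["type"])
    if policy is None:
        # Unknown type: surface it for review instead of hiding it.
        return ActionabilityResult(
            status="requires_manual_review",
            actionable=True,
            reason_code="unknown_operation_type",
            explanation="No actionability policy covers this operation type.",
            policy="fail_closed_default",
        )
    return policy(run)
```

Returning a value object rather than a bare boolean keeps the reason code and policy identifier available to Operations detail views and to debugging.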

Non-Functional Requirements

  • NFR-367-001: Actionability is derived, deterministic, and DB-only at render time.
  • NFR-367-002: Evaluation must remain tenant/workspace scoped and RBAC-respecting.
  • NFR-367-003: Copy must be operator-readable and must not expose raw provider payloads, secrets, stack traces, SQL, queue payloads, or internal exception text by default.
  • NFR-367-004: Tests must protect business truth over thin presentation helpers.
  • NFR-367-005: No new package, migration, queue, scheduler, asset registration, panel provider, or env var is required.

Actionability Status Semantics

The exact implementation may be an enum or value object, but the following derived statuses are required:

| Status | Meaning | Counts as current dashboard follow-up? |
| --- | --- | --- |
| actionable | Operator can or must take a current action | yes |
| requires_manual_review | Safe automatic resolution is not possible; deliberate review is required | yes |
| superseded_by_later_success | Later same-scope success proves the old terminal problem is no longer current | no |
| resolved_by_current_state | Current domain state proves the problem is no longer current | no |
| informational_only | Historical/audit information only | no |
| not_terminal | Run is active or not a terminal problem | no |
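
One possible encoding of these statuses, assuming the enum option is chosen. The class and helper names are illustrative only; the status values and the follow-up mapping come from the table above.

```python
from enum import Enum


class Actionability(Enum):
    ACTIONABLE = "actionable"
    REQUIRES_MANUAL_REVIEW = "requires_manual_review"
    SUPERSEDED_BY_LATER_SUCCESS = "superseded_by_later_success"
    RESOLVED_BY_CURRENT_STATE = "resolved_by_current_state"
    INFORMATIONAL_ONLY = "informational_only"
    NOT_TERMINAL = "not_terminal"


# Only these two statuses count as current dashboard follow-up.
FOLLOW_UP = frozenset(
    {Actionability.ACTIONABLE, Actionability.REQUIRES_MANUAL_REVIEW}
)


def counts_as_follow_up(status: Actionability) -> bool:
    """True when the status should contribute to dashboard follow-up."""
    return status in FOLLOW_UP


def follow_up_count(statuses: list) -> int:
    """Count the statuses that drive the dashboard follow-up number."""
    return sum(1 for status in statuses if counts_as_follow_up(status))
```

Keeping the follow-up mapping in one place means the dashboard, the Operations filters, and CTAs cannot each invent their own interpretation of a status.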

Policy Groups

  • Provider connection checks: provider.connection.check; can resolve through later same-scope success or healthy ProviderConnection current state.
  • Repeatable sync operations: inventory.sync, inventory_sync, policy.sync, directory.groups.sync, directory.role_definitions.sync, compliance.snapshot, permission_posture_check; can supersede through later same-scope success.
  • Baseline operations: baseline.capture, baseline.compare; can resolve/supersede only through same-scope later success or current baseline artifact truth.
  • Evidence/review/report artifact operations: environment.review.compose, environment.review_pack.generate, tenant.evidence.snapshot.generate, evidence_snapshot.generate, plus stored report/report delivery types discovered in repo; can resolve through usable current artifact proof.
  • Backup operations: backup_set.update, backup.schedule.execute, backup.schedule.retention, backup.schedule.purge; update/execute may supersede with proof, purge defaults manual-review if safety proof is insufficient.
  • Restore/promotion/mutation operations: restore.execute, promotion.execute, destructive operation families; default manual-review for terminal problems.
  • Alert/notification/delivery operations: classify discovered alert delivery/evaluation types deliberately; no silent default.
  • Informational/historical-only operations: classify explicitly only when current product behavior proves no dashboard action is appropriate.
  • Remaining canonical OperationCatalog types: every canonical type not already listed, including policy snapshot/export/delete/restore, assignment fetch/restore, backup set archive/restore/delete, restore-run delete/restore/force-delete, tenant sync, policy-version prune/restore/force-delete, ops reconciliation, RBAC health check, Entra admin role scan, and any discovered aliases, must receive an explicit actionability policy or explicit informational/manual-review classification.
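
The coverage requirement behind these groups (see FR-367-004 and FR-367-016) can be sketched as a guard check. The function names and the way catalog types and the registry are passed in are assumptions for this example. A real guard test would read the types from OperationCatalog.

```python
from __future__ import annotations


def uncovered_types(catalog_types: list[str], registry: dict[str, str]) -> list[str]:
    """Return catalog operation types that have no registered policy group."""
    return sorted(set(catalog_types) - set(registry))


def assert_full_coverage(catalog_types: list[str], registry: dict[str, str]) -> None:
    """Fail loudly when any operation type lacks an explicit classification."""
    missing = uncovered_types(catalog_types, registry)
    if missing:
        raise AssertionError(
            "Operation types without actionability policy: " + ", ".join(missing)
        )
```

Run as part of the test suite, a check of this kind makes a newly introduced operation type fail the build until someone classifies it deliberately.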

Out of Scope

  • New persisted actionability state, table, or migration.
  • Rewriting historical operation_runs.
  • Manual acknowledge/resolve UI for operations.
  • Full Resolved/Superseded/Historical Operations UX tabs.
  • Notification redesign or alert delivery UX.
  • Provider Connection feature expansion.
  • Restore/backup/promotion behavior expansion.
  • New destructive actions.
  • Global search enablement for OperationRunResource.
  • Filament panel/provider registration changes.
  • Asset or theme registration.

Success Criteria

  • SC-367-001: The known Provider Connection loop is impossible in dashboard and Operations CTAs.
  • SC-367-002: Every known canonical operation type has explicit actionability coverage or an explicit manual-review/informational policy.
  • SC-367-003: Dashboard Operations follow-up counts use current actionability, not raw terminal historical status.
  • SC-367-004: Operations list/detail preserve historical status while clearly separating current actionability.
  • SC-367-005: High-risk operation failures remain manual-review by default.
  • SC-367-006: Guard tests prevent direct UI consumption of historical terminal-follow-up scopes for current follow-up.
  • SC-367-007: No application render path performs Graph calls or permits cross-tenant/current-state leakage.

Risks

  • Over-abstraction risk: A policy registry can become a generic framework. Mitigation: keep policies narrow, derived-only, and limited to known operation groups.
  • False calm risk: Superseding too aggressively can hide real failures. Mitigation: require same-scope proof and default to actionable/manual-review when proof is incomplete.
  • Performance risk: Operations tables could evaluate actionability per row with N+1 queries. Mitigation: require batch evaluation and grouped lookups.
  • UI drift risk: Dashboard, Operations list, and detail may disagree. Mitigation: central resolver plus consumer guard tests.
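
The false-calm and performance mitigations can be combined in one sketch: index the latest success per scope once, then classify every problem run against that index instead of querying per row. The `Run` fields, the `ALIASES` map (treating inventory_sync as an alias of inventory.sync), and the integer timestamps are assumptions for this example. Selection or target scope is omitted for brevity and would be part of the key in practice.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

ALIASES = {"inventory_sync": "inventory.sync"}  # assumed alias family
PROBLEM_OUTCOMES = {"failed", "blocked", "partial"}


@dataclass(frozen=True)
class Run:
    id: int
    type: str
    workspace_id: int
    environment_id: int
    connection_id: int
    outcome: str
    finished_at: int  # larger means later


def scope_key(run: Run) -> tuple:
    """Same-scope proof: canonical type, workspace, environment, connection."""
    canonical = ALIASES.get(run.type, run.type)
    return (canonical, run.workspace_id, run.environment_id, run.connection_id)


def latest_success_by_scope(runs: list[Run]) -> dict[tuple, Run]:
    """One pass over the batch instead of one lookup per row."""
    latest: dict[tuple, Run] = {}
    for run in runs:
        if run.outcome != "succeeded":
            continue
        key = scope_key(run)
        if key not in latest or run.finished_at > latest[key].finished_at:
            latest[key] = run
    return latest


def classify_batch(runs: list[Run]) -> dict[int, tuple[str, Optional[int]]]:
    """Map each problem run id to (status, superseding run id or None)."""
    latest = latest_success_by_scope(runs)
    result: dict[int, tuple[str, Optional[int]]] = {}
    for run in runs:
        if run.outcome not in PROBLEM_OUTCOMES:
            continue
        success = latest.get(scope_key(run))
        if success is not None and success.finished_at > run.finished_at:
            result[run.id] = ("superseded_by_later_success", success.id)
        else:
            # Incomplete proof: stay actionable rather than guess.
            result[run.id] = ("actionable", None)
    return result
```

Because the workspace and environment are part of the key, a later success in another workspace can never supersede a run, which is the behaviour FR-367-014 demands.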

Assumptions

  • The product remains pre-production under LEAN-001.
  • OperationCatalog is the primary operation-type source for this slice.
  • Provider Connection current state has enough persisted timestamps or later successful run proof to avoid guessing.
  • Existing Operations routes, RBAC, and global-search-disabled posture stay unchanged.
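
The Provider Connection assumption can be made concrete with a sketch of the current-state proof. The `verified_at` timestamp is a hypothetical field standing in for whatever proof the implementation finds; the class and function names are likewise illustrative. Only the consent_status=granted and verification_status=healthy conditions come from this spec.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConnectionState:
    workspace_id: int
    connection_id: int
    consent_status: str
    verification_status: str
    verified_at: Optional[int]  # hypothetical proof timestamp; larger means later


@dataclass(frozen=True)
class CheckRun:
    workspace_id: int
    connection_id: int
    finished_at: int


def resolved_by_current_state(
    run: CheckRun, state: Optional[ProviderConnectionState]
) -> bool:
    """True only when same-scope, healthy, and newer-than-run proof all hold."""
    if state is None:
        return False
    same_scope = (
        state.workspace_id == run.workspace_id
        and state.connection_id == run.connection_id
    )
    healthy = (
        state.consent_status == "granted"
        and state.verification_status == "healthy"
    )
    # Without a timestamp newer than the run, do not guess.
    fresh = state.verified_at is not None and state.verified_at > run.finished_at
    return same_scope and healthy and fresh
```

If no usable timestamp exists, the function returns False and the old blocker stays actionable until a later same-scope successful check supplies the proof instead.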

Open Questions

  • None blocking preparation. Implementation must verify the exact healthy ProviderConnection timestamp/proof fields before using current state as resolution proof.

Follow-up Spec Candidates

  • Manual OperationRun acknowledgement / resolve UX.
  • Resolved and Superseded Operations history tabs or filters beyond the minimum current-follow-up filter.
  • Actionability explanation UI polish if operator/support audiences need richer proof detail.
  • Alert-delivery actionability refinement if discovered operation families need more than v1 manual-review/informational defaults.