ahmido 564da05096 feat: implement operation run actionability system (#439)

This PR introduces the Operation Run Actionability System.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #439

2026-06-08 13:34:25 +00:00


Feature Specification: OperationRun Actionability System v1

Feature Branch: 367-operationrun-actionability-system
Created: 2026-06-08
Status: Draft / Ready for implementation
Input: User-provided Spec 367 draft: separate historical terminal OperationRun truth from current UI follow-up truth.

Spec Candidate Check (mandatory - SPEC-GATE-001)

Problem: Historical terminal problem runs are currently treated as today's operator follow-up truth. A resolved old provider.connection.check blocker can still produce dashboard warnings and provider CTAs after later success or current healthy provider state.
Today's failure: Operators can be sent from Dashboard to Provider Connections for an old provider_consent_missing run even when the Provider Connection page now shows consent_status=granted and verification_status=healthy. This creates a loop between false dashboard attention and correct domain state.
User-visible improvement: Dashboard, Operations hub, shell active-run hints, baseline widgets, and primary CTAs will use one current actionability decision instead of raw historical terminal status. Historical failures remain visible in Operations history, but only current actionable runs drive follow-up counts and CTAs.
Smallest enterprise-capable version: Add a derived, non-persisted OperationRun actionability layer that classifies known terminal OperationRun types, handles the provider CTA loop, supports superseded repeatable operations, keeps high-risk restore/promotion/purge runs manual-review by default, and migrates current UI follow-up consumers from terminalFollowUp() / dashboardNeedsFollowUp() to actionability.
Explicit non-goals: No legacy data migration, no rewriting historical runs, no manual acknowledge/resolve UI, no new Operations tab system for Resolved/Historical, no notification redesign, no alert delivery UX, no new Provider Connection feature, no restore/backup feature expansion, no destructive actions, no global search enablement for OperationRunResource, no panel/provider/asset/theme changes.
Permanent complexity imported: One derived status/value object family, one central resolver/registry/policy layer, correlation helpers only where existing context is insufficient, guard tests for operation-type coverage and direct UI use of historical terminal-follow-up scopes, and focused Unit/Feature/Browser coverage.
Why now: Specs 358-365 made OperationRun execution, reconciliation, links, and operator actions more mature, but repo truth still exposes raw terminal follow-up via OperationRun::terminalFollowUp(), dashboardNeedsFollowUp(), problemClass(), NeedsAttention, BaselineCompareNow, and Operations filters.
Why not local: Fixing the Provider Connections special case in one widget would leave other repeatable runs and other consumers with the same historical-vs-current truth bug. The repo already has multiple concrete consumers, so the boundary must be central and testable.
Approval class: Core Enterprise
Red flags triggered: New status axis, resolver/registry layer, multiple UI consumers. Defense: the layer is derived-only, persistence-free, anchored to a confirmed operator loop, reuses existing OperationCatalog, OperationRunActionEligibility, OperationRunLinks, and reconciliation helpers, and directly prevents false governance/operations CTAs.
Score: Value: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | Total: 11/12
Decision: approve as a bounded current-actionability truth slice.

Repo Truth Reconciliation

The user draft is accepted as the candidate, with these repo-based scope corrections:

OperationRunActionEligibility already exists and is consumed by Operations list/detail action surfaces. Spec 367 must extend or feed that path; it must not create a parallel action eligibility framework.
OperationCatalog is the canonical operation-type inventory and alias resolver. Actionability coverage must compare against that catalog and known provider/reconciliation types instead of inventing a second source of operation-type truth.
OperationRunReconciliationRegistry and adapters already resolve stale active run proof for several families. Actionability must separate terminal current-follow-up truth from active stale reconciliation truth while reusing existing reconciliation evidence where useful.
OperationRun::terminalFollowUp(), dashboardNeedsFollowUp(), problemClass(), requiresOperatorReview(), and requiresDashboardFollowUp() are existing historical/problem helpers. This spec may deprecate or constrain them but must migrate UI consumers before any removal.
The current canonical Operations routes are workspace-scoped: /admin/workspaces/{workspace}/operations and /admin/workspaces/{workspace}/operations/{run}. OperationRunResource remains globally non-searchable.

Completed-Spec Guardrail

Related specs are context only and are not modified by this prep package:

specs/358-operationrun-queue-truth-foundation/
specs/359-operationrun-reconciliation-adapter-framework-review-compose-adapter/
specs/360-operationrun-canonical-cutover-cleanup/
specs/361-report-evidence-reconciliation/
specs/362-sync-capture-backup-operation-semantics/
specs/363-explicit-uiactioncontext-contract/ (implemented)
specs/364-restore-high-risk-operation-reconciliation/
specs/365-operations-ui-operator-actions-regression-gate/ (implementation close-out signals)

Spec 367 is a new package because these predecessors do not fully define current terminal actionability or migrate all dashboard/current-follow-up consumers away from historical terminal status.

Spec Scope Fields (mandatory)

Scope: workspace, tenant, canonical-view
Primary Routes:
- /admin/workspaces/{workspace}/operations
- /admin/workspaces/{workspace}/operations/{run}
- tenant dashboard surfaces that host NeedsAttention and BaselineCompareNow
- shell active-work hint surface through BulkOperationProgress / ActiveRuns
Data Ownership: Existing OperationRun execution history stays in operation_runs. Tenant-bound runs remain tenant-owned operational artifacts via workspace_id + managed_environment_id and must enforce managed-environment entitlement; workspace-only runs are allowed only for explicitly workspace-owned operation types. No new table, migration, persisted current-actionability mirror, or historical data rewrite is introduced.
RBAC: Existing workspace membership, managed-environment entitlement, and OperationRunPolicy checks remain authoritative. Non-members remain deny-as-not-found. Capability denial remains 403 after membership is established.

For canonical-view specs:

Default filter behavior when tenant-context is active: Existing Operations workspace route and environment-prefilter behavior remain unchanged. Query filters may add or rename problem/actionability filters only if links remain workspace/environment safe.
Explicit entitlement checks preventing cross-tenant leakage: Actionability evaluation and counts must operate only on runs already scoped to the actor's workspace and entitled managed environment. Resolved/superseded references must be same workspace and same managed environment unless the operation type is explicitly workspace-only.
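
The same-scope rule for resolved/superseded references can be sketched as a small guard. This is a minimal illustration in plain PHP 8.1+, not the repo implementation: the function name, the array shape, and the `$workspaceOnlyType` flag are assumptions made for the example.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative scope guard; the array keys and the function name are assumptions.
 *
 * @param array{workspace_id: int, managed_environment_id: ?int} $run
 * @param array{workspace_id: int, managed_environment_id: ?int} $reference
 */
function referenceIsInScope(array $run, array $reference, bool $workspaceOnlyType): bool
{
    // A reference from another workspace is never acceptable proof.
    if ($run['workspace_id'] !== $reference['workspace_id']) {
        return false;
    }

    // Workspace-only operation types have no managed environment to compare.
    if ($workspaceOnlyType) {
        return true;
    }

    // Tenant-bound runs need a concrete, identical managed environment on both sides.
    return $run['managed_environment_id'] !== null
        && $run['managed_environment_id'] === $reference['managed_environment_id'];
}
```

A resolver built this way would discard any candidate superseding run or resolving model that fails the guard before it is considered as proof.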

UI Surface Impact (mandatory - UI-COV-001)

Does this spec add, remove, rename, or materially change any reachable UI surface?

No UI surface impact
Existing page changed
New page/route added
Navigation changed
Filament panel/provider surface changed
New modal/drawer/wizard/action added
New table/form/state added
Customer-facing surface changed
Dangerous action changed
Status/evidence/review presentation changed
Workspace/environment context presentation changed

UI/Productization Coverage (mandatory when UI Surface Impact is not "No UI surface impact")

Route/page/surface:
- App\Filament\Widgets\Dashboard\NeedsAttention
- App\Filament\Widgets\Dashboard\BaselineCompareNow
- App\Filament\Pages\Monitoring\Operations
- App\Filament\Resources\OperationRunResource
- App\Filament\Pages\Operations\TenantlessOperationRunViewer
- App\Filament\Widgets\Operations\OperationsWorkbenchStats
- App\Livewire\BulkOperationProgress
- App\Support\GovernanceInbox\GovernanceInboxSectionBuilder
- App\Support\EnvironmentDashboard\EnvironmentDashboardSummaryBuilder
- App\Support\Workspaces\WorkspaceOverviewBuilder
- App\Support\OpsUx\OperationUxPresenter
- shared link/action helpers that derive Operations follow-up links
Current or new page archetype: Existing Operations Hub / monitoring-state page, tenant dashboard widgets, and shared active-run shell hint.
Design depth: Strategic Surface for Operations Hub; Domain Pattern Surface for dashboard widgets and shell hint.
Repo-truth level: repo-verified.
Existing pattern reused: Operations Hub page report, OperationRunActionEligibility, OperationRunLinks, OperationCatalog, OperationUxPresenter, ActiveRuns, dashboard widget patterns, existing Spec 365 browser smoke conventions.
New pattern required: one derived actionability truth contract; no new page, navigation branch, action modal family, or independent UI framework.
Screenshot required: yes during implementation if visible dashboard/operations copy or filter state changes. Store under specs/367-operationrun-actionability-system/artifacts/ if captured.
Page audit required: no new page audit required during prep. Implementation must update existing UI coverage artifacts or record a checked no-update rationale if visual structure remains pattern-compatible.
Customer-safe review required: no customer-facing surface. Operator-facing copy must still avoid raw provider payload, SQL, stack trace, secret, and debug leakage.
Dangerous-action review required: yes by negative proof. Restore, promotion, purge, and destructive-like runs must not become auto-resolved by unrelated later successes and must not gain new destructive UI actions.
Coverage files updated or explicitly not needed:
- docs/ui-ux-enterprise-audit/route-inventory.md
- docs/ui-ux-enterprise-audit/design-coverage-matrix.md
- docs/ui-ux-enterprise-audit/page-reports/...
- docs/ui-ux-enterprise-audit/strategic-surfaces.md
- docs/ui-ux-enterprise-audit/grouped-follow-up-candidates.md
- docs/ui-ux-enterprise-audit/unresolved-pages.md
- N/A - no reachable UI surface impact
Coverage artifact decision: Implementation must either update existing Operations/dashboard coverage entries or record why no coverage artifact changed because the surface contract and visual hierarchy stayed unchanged.
No-impact rationale when applicable: N/A.

Cross-Cutting / Shared Pattern Reuse (mandatory)

Cross-cutting feature?: yes
Interaction class(es): status messaging, dashboard signals/cards, action links, Operations filters, shell active-work hints, related-operation CTAs.
Systems touched: OperationRun, OperationCatalog, OperationRunLinks, OperationRunActionEligibility, OperationRunReconciliationRegistry, OperationRunCorrelationResolver, OperationUxPresenter, ActiveRuns, dashboard widgets, Operations list/detail/workbench stats, governance inbox, environment dashboard summary, and workspace overview aggregation.
Existing pattern(s) to extend: existing OperationRun monitoring family, existing action eligibility path, existing operation catalog and alias resolution, existing reconciliation registry.
Shared contract / presenter / builder / renderer to reuse: Reuse OperationCatalog as operation-type inventory, reuse OperationRunActionEligibility for primary-action decisions, and feed Operations links through OperationRunLinks.
Why the existing shared path is sufficient or insufficient: Existing paths know execution state, stale active reconciliation, links, and UI action eligibility, but no existing path answers the question "Does this historical terminal problem still require action today?"
Allowed deviation and why: Add a bounded derived actionability resolver/registry/policy family. Do not add persisted current-state truth or a parallel action UI system.
Consistency impact: Dashboard counts, baseline dashboard calmness, Operations problem filters, shell hints, primary CTAs, and action eligibility must agree on current actionability.
Review focus: No UI consumer may count raw terminalFollowUp() rows as current dashboard/action truth after the migration.
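
The review focus above lends itself to a source-scanning guard. The following is a hedged sketch of the idea in plain PHP: the function name and the in-memory `path => source` input are assumptions, and a real guard test would read the UI consumer files from disk and decide which paths (for example the model that defines the scopes) are exempt.

```php
<?php

declare(strict_types=1);

/**
 * Report files that still call the historical follow-up helpers.
 * Hypothetical helper: the caller decides which files count as UI consumers.
 *
 * @param array<string, string> $sources file path => PHP source text
 * @return array<string, list<string>> file path => forbidden helper names found
 */
function findForbiddenFollowUpCalls(array $sources): array
{
    $violations = [];

    foreach ($sources as $path => $code) {
        // Match a call to either helper, regardless of "->" or "::" in front of it.
        if (preg_match_all('/\b(terminalFollowUp|dashboardNeedsFollowUp)\s*\(/', $code, $matches) > 0) {
            $violations[$path] = array_values(array_unique($matches[1]));
        }
    }

    return $violations;
}
```

A guard test could assert that the returned array is empty for every migrated dashboard, widget, and Operations consumer.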

OperationRun UX Impact (mandatory)

Touches OperationRun start/completion/link UX?: yes, current-follow-up and deep-link semantics only.
Shared OperationRun UX contract/layer reused: OperationRunLinks, OperationRunActionEligibility, OperationUxPresenter, and OperationRunReconciliationRegistry. OperationRunService lifecycle ownership remains unchanged.
Delegated start/completion UX behaviors: No new queued toast, browser event, run-start path, queued DB notification, or terminal notification path.
Local surface-owned behavior that remains: Dashboard/widget density and Operations list placement only.
Queued DB-notification policy: N/A - no new run-start behavior.
Terminal notification path: unchanged central lifecycle mechanism.
Exception required?: none.

Provider Boundary / Platform Core Check (mandatory)

Shared provider/platform boundary touched?: yes.
Boundary classification: mixed. Actionability is platform-core. Provider health and consent reason codes remain provider-owned diagnostics.
Seams affected: provider.connection.check actionability, provider connection health/consent state, OperationRun context/correlation, operator CTA copy.
Neutral platform terms preserved or introduced: operation, actionability, historical execution truth, current domain truth, current follow-up, superseded, resolved, manual review.
Provider-specific semantics retained and why: Provider Connection health and consent are current Microsoft-provider domain truth needed to fix the confirmed CTA loop.
Why this does not deepen provider coupling accidentally: Provider-specific logic stays inside the provider-connection policy. The actionability resolver consumes operation type and same-scope domain proof; it does not make Microsoft provider concepts the platform default.
Follow-up path: follow-up-spec only if later implementation discovers provider-specific actionability decisions spreading beyond provider-owned policy classes.

UI / Surface Guardrail Impact (mandatory)

Surface / Change	Operator-facing surface change?	Native vs Custom	Shared-Family Relevance	State Layers Touched	Exception Needed?	Low-Impact / N/A Note
Dashboard `NeedsAttention` current Operations card	yes	Existing Filament widget Blade	dashboard signal/status messaging	widget, link query	no	Replace raw terminal count with actionability count
Dashboard `BaselineCompareNow` calmness override	yes	Existing Filament widget Blade	dashboard signal/status messaging	widget	no	Use actionability count for Operations follow-up
Operations list problem filters/next action	yes	Native Filament table + existing presenters	monitoring list/action links	page, table, query	no	Preserve row history; current filters use actionability
OperationRun detail summary/action decision	yes	Existing Filament page/detail	shared-detail-family	detail, header, diagnostics	no	Historical run still visible; action guidance reflects current truth
Operations workbench stats	yes	Existing Filament stats widget	monitoring status messaging	widget, scoped query	no	"Needs attention" count uses current actionability
Governance inbox / environment dashboard / workspace overview operation follow-up	yes	Existing summary builders	dashboard and inbox signal/status messaging	aggregate queries, links	no	Current follow-up aggregates use actionability; historical runs remain reachable from Operations history
Shell active-work hint	yes	Existing Livewire shell hint	active-run status messaging	shell	no	Terminal follow-up is removed from active progress or shown only through a distinct actionability-backed non-active signal

Decision-First Surface Role (mandatory)

Surface	Decision Role	Human-in-the-loop Moment	Immediately Visible for First Decision	On-Demand Detail / Evidence	Why This Is Primary or Why Not	Workflow Alignment	Attention-load Reduction
Dashboard attention cards	Primary Decision Surface	Decide whether the environment needs operator follow-up now	current actionable count, direct CTA only when current action exists	Operations history and raw run detail remain on Operations	primary because it starts the operator's daily attention loop	follows environment governance dashboard workflow	removes old resolved blockers from today's work
Operations list	Primary Decision Surface	Triage current actionable operations while preserving history	status/outcome, actionability, scope, next action	raw context and reconciliation detail on detail page	primary for operations triage	filters reflect actual work, not storage history	prevents opening resolved historical rows as current tasks
OperationRun detail	Tertiary Evidence / Diagnostics Surface	Understand why a run is actionable, superseded, resolved, or manual-review	historical status plus current actionability explanation	full raw/support diagnostics below/gated	tertiary because the run is selected proof	preserves audit history while clarifying current action	removes conflict between history and dashboard
Operations workbench / governance inbox / workspace overview operation signals	Primary Decision Surface	Decide whether an operations follow-up family needs attention across workspace or environment context	current actionable/manual-review count and safe Operations CTA	Operations history and run detail	primary where it drives attention queues; otherwise secondary summary	follows existing dashboard/inbox/overview workflows	prevents aggregate false-positive attention
Shell active hint	Secondary Context Surface	Know if active work is in progress or stale	active-run state only	detail link to Operations	secondary; terminal actionability is not an active-run hint	supports ongoing work awareness	avoids mixing active progress with historical follow-up

Audience-Aware Disclosure (mandatory)

Surface	Audience Modes In Scope	Decision-First Default-Visible Content	Operator Diagnostics	Support / Raw Evidence	One Dominant Next Action	Hidden / Gated By Default	Duplicate-Truth Prevention
Dashboard attention cards	operator-MSP	current follow-up count and CTA	none	none	open current actionable operations	raw run data	no historical terminal count appears as current work
Operations list	operator-MSP, support-platform	actionability state, operation label, scope, reason summary, next action	reason code and related proof summary	raw context on detail only	open run or related object	raw payloads, stack traces, provider payloads	one actionability label per row
OperationRun detail	operator-MSP, support-platform	historical result plus current actionability explanation	superseding run/current-state proof	raw context collapsed/capability-gated	follow actionability recommendation or inspect history	raw/support detail	history and current follow-up are separate sections
Aggregate operation signals	operator-MSP, support-platform	current actionability counts and CTA	none by default	Operations history/detail only	open current actionable operations	raw run data	no raw terminal count appears as current work

UI/UX Surface Classification (mandatory)

Surface	Action Surface Class	Surface Type	Likely Next Operator Action	Primary Inspect/Open Model	Row Click	Secondary Actions Placement	Destructive Actions Placement	Canonical Collection Route	Canonical Detail Route	Scope Signals	Canonical Noun	Critical Truth Visible by Default	Exception Type / Justification
Dashboard attention cards	Dashboard signal	Governance attention widget	Open current actionable operations	explicit CTA	N/A	none	none	`/admin/workspaces/{workspace}/operations`	`/admin/workspaces/{workspace}/operations/{run}`	environment/workspace context	Operations	current actionability count	none
Operations list	List / Table / Monitoring	Monitoring-state page	Open actionable run or related object	row click opens detail	required	table/link actions	none	`/admin/workspaces/{workspace}/operations`	`/admin/workspaces/{workspace}/operations/{run}`	workspace/environment filters	Operation run	historical result plus current actionability	none
OperationRun detail	Detail / Diagnostics	Shared-detail-family	Understand or act on current actionability	detail page	N/A	More/header group	none	`/admin/workspaces/{workspace}/operations`	`/admin/workspaces/{workspace}/operations/{run}`	workspace/environment chips	Operation run	execution truth and actionability truth separately	none
Aggregate operation signals	Dashboard / Inbox / Overview signal	Existing summary widgets/builders	Open current actionable operations	explicit CTA	N/A	none	none	`/admin/workspaces/{workspace}/operations`	`/admin/workspaces/{workspace}/operations/{run}`	workspace/environment context	Operations	current actionability count	none

Operator Surface Contract (mandatory)

Surface	Primary Persona	Decision / Operator Action Supported	Surface Type	Primary Operator Question	Default-visible Information	Diagnostics-only Information	Status Dimensions Used	Mutation Scope	Primary Actions	Dangerous Actions
Dashboard attention cards	tenant/MSP operator	Decide whether to investigate operations now	dashboard	Is there current operational work, or only historical noise?	actionable count, reason, link	none	current actionability, active stale attention	none	Open operations	none
Operations list	workspace/operator	Prioritize operation follow-up	monitoring list	Which runs require action today?	operation, scope, status/outcome, actionability, next action	raw context and proof detail	execution status, outcome, actionability, freshness	none	Open run/related object	none
OperationRun detail	operator/support	Resolve contradiction between historical failure and current state	detail	Was this run historically problematic, and does it still matter now?	execution truth, actionability result, proof summary	raw context, stack/provider diagnostics	execution, current domain truth, actionability	none	Open related/current action target	none
Aggregate operation signals	workspace/operator	Decide whether an operations family needs follow-up across one environment or workspace	dashboard/inbox/overview signal	Is there current operations follow-up, or only old history?	actionable/manual-review count, direct Operations CTA	raw run data	current actionability, active stale attention	none	Open current actionable operations	none

Proportionality Review (mandatory when structural complexity is introduced)

New source of truth?: no. Historical truth remains operation_runs; current domain truth remains each domain model; actionability is derived at read time.
New persisted entity/table/artifact?: no.
New abstraction?: yes, a bounded actionability resolver/registry/policy family.
New enum/state/reason family?: yes, derived actionability states such as actionable, superseded_by_later_success, resolved_by_current_state, requires_manual_review, informational_only, and not_terminal.
New cross-domain UI framework/taxonomy?: no. This feeds existing Operations/dashboard UI and must not become a generic UI framework.
Current operator problem: dashboards and CTAs can send users to fix already-resolved historical failures.
Existing structure is insufficient because: OperationRun currently exposes historical terminal status as follow-up truth, while existing reconciliation/action eligibility handles active stale runs and UI actions but not terminal current actionability.
Narrowest correct implementation: one derived resolver and policy registry over existing operation types, plus consumer migration and guard tests.
Ownership cost: actionability policies must be maintained when operation types are added; tests must cover known operation types and high-risk defaults.
Alternative intentionally rejected: a provider-only special case was rejected because repeatable sync, baseline, evidence, review, backup, restore, and promotion families need intentionally different current-actionability semantics.
Release truth: current-release truth; this directly fixes an observed dashboard/CTA loop and prevents similar false follow-up loops.
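
The derived state family named above could be modelled as a backed enum. This is a sketch only: the enum name and the `drivesCurrentFollowUp()` helper are assumptions, while the case values are the states listed in this review. Treating manual-review runs as follow-up drivers reflects the surface contracts above, which count actionable and manual-review runs.

```php
<?php

declare(strict_types=1);

// Hypothetical enum name; the case values mirror the derived states listed in this spec.
enum OperationRunActionabilityStatus: string
{
    case Actionable = 'actionable';
    case SupersededByLaterSuccess = 'superseded_by_later_success';
    case ResolvedByCurrentState = 'resolved_by_current_state';
    case RequiresManualReview = 'requires_manual_review';
    case InformationalOnly = 'informational_only';
    case NotTerminal = 'not_terminal';

    // Assumption: only actionable and manual-review states feed current follow-up counts and CTAs.
    public function drivesCurrentFollowUp(): bool
    {
        return match ($this) {
            self::Actionable, self::RequiresManualReview => true,
            default => false,
        };
    }
}
```

Because the enum is derived at read time and never stored, it adds no persisted state; consumers would ask `drivesCurrentFollowUp()` instead of inspecting raw terminal status.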

Compatibility posture

This feature assumes pre-production. No legacy aliases, migration shims, historical data rewrite, or dual-read compatibility path is required unless implementation proves an existing write path still emits a legacy operation alias and the spec is amended.

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

Test purpose / classification: Unit, Feature, Architecture/guard, Browser smoke.
Validation lane(s): fast-feedback for Unit/Feature guards; confidence for Filament/Livewire dashboard and Operations behavior; browser for one bounded dashboard-to-Operations CTA loop smoke if rendered UI changes.
Why this classification and these lanes are sufficient: Unit tests prove policy decisions; Feature tests prove dashboard counts, Operations filters, and authorization/isolation; guard tests prevent direct terminal-follow-up UI consumption; browser smoke proves the real user loop is gone.
New or expanded test families: one focused OperationRun actionability family under Unit/Feature/Operations and one bounded browser smoke if UI changes.
Fixture / helper cost impact: Use explicit factories for workspace, managed environment, provider connection, OperationRun, and related domain proof. Do not widen default test setup.
Heavy-family visibility / justification: Browser smoke is explicit and limited to the confirmed Provider Connections dashboard loop if implementation changes rendered UI.
Special surface test profile: monitoring-state-page, dashboard-signal, shared-detail-family.
Standard-native relief or required special coverage: Required coverage for status/actionability semantics, cross-workspace isolation, high-risk manual-review defaults, and no raw terminal follow-up in dashboard.
Reviewer handoff: Verify lane fit, guard coverage, actionability policy coverage, no N+1-prone per-row policy queries in Operations table, and no terminalFollowUp() UI consumer remains.
Budget / baseline / trend impact: Bounded Unit/Feature tests plus optional single browser smoke. No new heavy-governance family.
Escalation needed: none if actionability stays derived and bounded; follow-up-spec if implementation discovers a need for manual acknowledgement/resolution UI.
Active feature PR close-out entry: Guardrail + Smoke Coverage.
Planned validation commands:
- cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Unit/Support/Operations tests/Feature/Operations tests/Feature/Monitoring tests/Feature/Filament
- cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Feature/Guards
- cd apps/platform && ./vendor/bin/sail php vendor/bin/pest tests/Browser/Spec367OperationRunActionabilitySmokeTest.php (only when browser UI changes are implemented)
- cd apps/platform && ./vendor/bin/sail pint --dirty --test
- git diff --check
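
The reviewer handoff above asks for no N+1-prone per-row policy queries in the Operations table. One way to meet that, sketched here in plain PHP under assumed names and array shapes, is to build a single grouped lookup of the latest success per scope and then evaluate every problem row against it in memory. The `scope` string stands in for whatever correlation key the implementation derives.

```php
<?php

declare(strict_types=1);

/**
 * Build one lookup of the latest successful completion time per scope key.
 *
 * @param list<array{scope: string, outcome: string, completed_at: int}> $runs
 * @return array<string, int> scope key => latest success timestamp
 */
function latestSuccessByScope(array $runs): array
{
    $latest = [];

    foreach ($runs as $run) {
        if ($run['outcome'] !== 'succeeded') {
            continue;
        }
        $latest[$run['scope']] = max($latest[$run['scope']] ?? 0, $run['completed_at']);
    }

    return $latest;
}

/**
 * Evaluate all problem rows against the prebuilt lookup; no per-row query is needed.
 *
 * @param list<array{id: int, scope: string, outcome: string, completed_at: int}> $problemRuns
 * @param array<string, int> $latestSuccess
 * @return array<int, string> run id => derived status
 */
function evaluateBatch(array $problemRuns, array $latestSuccess): array
{
    $statuses = [];

    foreach ($problemRuns as $run) {
        $successAt = $latestSuccess[$run['scope']] ?? null;
        $statuses[$run['id']] = ($successAt !== null && $successAt > $run['completed_at'])
            ? 'superseded_by_later_success'
            : 'actionable';
    }

    return $statuses;
}
```

In the real application the lookup would presumably come from one grouped database query per operation family rather than from an in-memory list.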

User Scenarios & Testing (mandatory)

User Story 1 - Dashboard stops false Provider Connection CTA loop (Priority: P1)

As an MSP operator, I want an old provider connection blocker to disappear from today's dashboard follow-up when the same connection is now healthy, so I am not sent through a dead-end fix loop.

Why this priority: This is the confirmed root symptom and the clearest operator-trust failure.

Independent Test: Create an old blocked provider.connection.check, a later successful same-scope check or healthy ProviderConnection state, render dashboard attention, and verify no Provider Connection/terminal follow-up CTA appears for the old run.

Acceptance Scenarios:

Given a blocked old provider check and a later successful same-scope provider check, When the dashboard renders, Then the old run is not counted as current follow-up.
Given a blocked old provider check and current ProviderConnection consent_status=granted plus verification_status=healthy, When actionability is evaluated, Then the run is resolved_by_current_state and no current provider CTA is emitted.
Given a blocked provider check with no later success and unhealthy current state, When actionability is evaluated, Then the run remains actionable.
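
The three scenarios above can be read as one decision order for a terminal problem run: later same-scope success first, then current ProviderConnection state, otherwise still actionable. The sketch below is an assumption-laden illustration in plain PHP 8.1+: the two value classes stand in for the real models, the outcome strings are assumed, and the status returned for the first scenario is assumed to match the superseded state used for repeatable operations.

```php
<?php

declare(strict_types=1);

// Hypothetical read model standing in for a provider.connection.check OperationRun.
final class ProviderCheckRun
{
    public function __construct(
        public readonly int $id,
        public readonly int $workspaceId,
        public readonly int $environmentId,
        public readonly int $connectionId,
        public readonly string $outcome,   // assumed values such as 'blocked' or 'succeeded'
        public readonly int $completedAt,  // Unix timestamp
    ) {
    }
}

// Hypothetical snapshot of the current ProviderConnection state.
final class ProviderConnectionState
{
    public function __construct(
        public readonly string $consentStatus,
        public readonly string $verificationStatus,
    ) {
    }
}

/**
 * Classify a terminal problem run; callers are assumed to pass only problem outcomes.
 *
 * @param list<ProviderCheckRun> $otherChecks other provider checks visible to the actor
 */
function resolveProviderCheckActionability(
    ProviderCheckRun $run,
    array $otherChecks,
    ?ProviderConnectionState $current,
): string {
    foreach ($otherChecks as $other) {
        $sameScope = $other->workspaceId === $run->workspaceId
            && $other->environmentId === $run->environmentId
            && $other->connectionId === $run->connectionId;

        if ($sameScope && $other->outcome === 'succeeded' && $other->completedAt > $run->completedAt) {
            return 'superseded_by_later_success';
        }
    }

    if ($current !== null
        && $current->consentStatus === 'granted'
        && $current->verificationStatus === 'healthy') {
        return 'resolved_by_current_state';
    }

    return 'actionable';
}
```

Keeping this logic in a provider-owned policy function is consistent with the provider boundary check above: the platform resolver only sees the returned status.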

User Story 2 - Repeatable operations are superseded by later same-scope success (Priority: P1)

As an operator, I want old failed sync/evidence/baseline/review/backup runs to stop driving current follow-up when a later successful same-scope run proves the work is now complete.

Why this priority: Repeatable operation families are common dashboard and Operations noise sources.

Independent Test: Create old failed repeatable runs and later successful same-scope runs for inventory, baseline, evidence/review, and backup families; verify actionability returns superseded_by_later_success.

Acceptance Scenarios:

Given an old failed inventory.sync and a later succeeded same-scope inventory.sync, When Operations follow-up filters run, Then only current actionable runs appear.
Given an old evidence generation failure and a later usable same-scope Evidence Snapshot, When actionability is evaluated, Then the old run does not drive a dashboard CTA.
Given insufficient correlation proof, When an old failed repeatable run is evaluated, Then it remains actionable or manual-review instead of being silently hidden.
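
The third scenario is the important safety property: missing correlation proof must never hide a run. A minimal sketch of that rule, assuming hypothetical field names for the scope dimensions, builds a correlation key only when every dimension is present and refuses to supersede otherwise.

```php
<?php

declare(strict_types=1);

/**
 * Build a same-scope key, or null when any scope dimension is missing.
 * The field names are assumptions chosen for the example.
 *
 * @param array<string, mixed> $run
 */
function correlationKey(array $run): ?string
{
    $parts = [];

    foreach (['type', 'workspace_id', 'environment_id', 'connection_id', 'target_scope'] as $field) {
        if (!isset($run[$field]) || $run[$field] === '') {
            return null; // insufficient proof: do not guess
        }
        $parts[] = (string) $run[$field];
    }

    return implode('|', $parts);
}

/**
 * @param array<string, mixed> $failedRun
 * @param list<array<string, mixed>> $candidates
 */
function isSupersededByLaterSuccess(array $failedRun, array $candidates): bool
{
    $key = correlationKey($failedRun);

    if ($key === null) {
        return false; // the run stays actionable or manual-review
    }

    foreach ($candidates as $candidate) {
        if (correlationKey($candidate) === $key
            && ($candidate['outcome'] ?? null) === 'succeeded'
            && ($candidate['completed_at'] ?? 0) > ($failedRun['completed_at'] ?? 0)) {
            return true;
        }
    }

    return false;
}
```

Artifact-backed families such as evidence or review would need a different proof source (a usable artifact rather than a later run), which this sketch does not cover.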

User Story 3 - High-risk operations remain manual-review unless explicitly resolved (Priority: P1)

As an operator, I want restore, promotion, purge, and destructive-like operation failures to remain visible for deliberate review unless a type-specific policy can prove resolution, so dangerous operations do not disappear from attention incorrectly.

Why this priority: False calm on high-risk operations is worse than extra review.

Independent Test: Create failed restore/promotion/purge runs plus unrelated later successes and verify actionability remains requires_manual_review.

Acceptance Scenarios:

Given a failed restore.execute and a later successful backup run, When actionability is evaluated, Then the restore remains requires_manual_review.
Given a failed promotion.execute, When Operations filters current follow-up, Then it remains visible unless a future explicit policy says otherwise.
Given high-risk runs, When action eligibility renders, Then no automatic retry/re-execute/destructive action is introduced by this spec.
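
The high-risk default can be illustrated as a check that runs before any supersede logic. This is a sketch under assumptions: the family is taken to be the prefix before the first dot of the operation type, and the `purge` family name is hypothetical since this spec names only `restore.execute` and `promotion.execute` explicitly.

```php
<?php

declare(strict_types=1);

// Assumed family names; only restore.execute and promotion.execute appear verbatim in the spec.
const HIGH_RISK_FAMILIES = ['restore', 'promotion', 'purge'];

// Assumption: the family is the segment before the first dot of the operation type.
function operationFamily(string $operationType): string
{
    return explode('.', $operationType, 2)[0];
}

function isHighRisk(string $operationType): bool
{
    return in_array(operationFamily($operationType), HIGH_RISK_FAMILIES, true);
}

/**
 * Classify a terminal problem run. High-risk types ignore later successes entirely.
 */
function resolveTerminalProblem(string $operationType, bool $laterSameScopeSuccessExists): string
{
    if (isHighRisk($operationType)) {
        return 'requires_manual_review';
    }

    return $laterSameScopeSuccessExists
        ? 'superseded_by_later_success'
        : 'actionable';
}
```

Placing the high-risk check first means an unrelated later success can never calm a failed restore, promotion, or purge run; only a future explicit type-specific policy could change that.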

User Story 4 - Operations preserves history while separating current actionability (Priority: P2)

As a support/operator user, I want Operations history to show the historical outcome and the current actionability explanation separately, so audit history remains intact without confusing today's work.

Why this priority: The platform must preserve audit depth while keeping current work queues quiet and truthful.

Independent Test: Render Operations list/detail for actionable, superseded, resolved, manual-review, informational, active-stale, and succeeded runs and assert historical status remains visible while current actionability controls current follow-up.

Acceptance Scenarios:

Given a superseded old failed run, When Operations history renders, Then the row remains visible in history but not in current-follow-up filters.
Given a manual-review high-risk run, When Operations detail renders, Then current actionability explains why review is still needed.
Given an active stale run, When actionability is evaluated, Then active stale truth remains handled by existing freshness/reconciliation paths, not terminal actionability.
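
The separation between history and current follow-up can be shown as two views over the same rows. The sketch below assumes each row already carries a `terminal` flag and a derived `actionability` string; both keys and the function name are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Split rows into the history view and the current-follow-up view.
 *
 * @param list<array{id: int, terminal: bool, actionability: string}> $rows
 * @return array{history: list<int>, current_follow_up: list<int>}
 */
function partitionOperationsViews(array $rows): array
{
    $history = [];
    $currentFollowUp = [];

    foreach ($rows as $row) {
        // History never hides a run, whatever its current actionability.
        $history[] = $row['id'];

        // Active runs, including stale ones, are left to freshness/reconciliation paths.
        if (!$row['terminal']) {
            continue;
        }

        if (in_array($row['actionability'], ['actionable', 'requires_manual_review'], true)) {
            $currentFollowUp[] = $row['id'];
        }
    }

    return ['history' => $history, 'current_follow_up' => $currentFollowUp];
}
```

With this shape, a superseded failure appears in `history` only, while a manual-review restore appears in both lists.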

Functional Requirements (mandatory)

FR-367-001: The system MUST provide a central derived OperationRun actionability entry point that answers whether a terminal run is current operator follow-up truth.
FR-367-002: The resolver MUST distinguish historical execution truth, current domain truth, and UI actionability truth.
FR-367-003: The resolver MUST return at least status, actionable boolean, reason code, explanation, optional superseding run id, optional resolving model reference, and policy identifier or equivalent debug metadata.
FR-367-004: The actionability registry MUST cover every canonical operation type known to OperationCatalog and any provider/reconciliation operation types discovered by implementation.
FR-367-005: Unknown operation types MUST fail guard tests and MUST fail closed at runtime as manual-review/actionable or explicitly unsupported; no silent non-actionable default is allowed.
FR-367-006: Provider connection check policy MUST mark old provider blockers non-actionable when a later same-scope successful check exists or current same-scope ProviderConnection state proves consent_status=granted and verification_status=healthy.
FR-367-007: Repeatable sync policies MUST supersede old failed/blocked/partial runs only when later same-scope success is proven through canonical type/alias family, workspace, environment, provider/connection, and relevant selection or target scope.
FR-367-008: Baseline, evidence, review, review-pack, and backup artifact policies MUST use current repo-backed artifact truth where available and MUST avoid guessing when correlation proof is missing.
FR-367-009: Restore, promotion, purge, and destructive-like operation policies MUST default to requires_manual_review for terminal problem outcomes unless an explicit type-specific policy proves otherwise.
FR-367-010: Dashboard and current-follow-up consumers MUST stop using raw terminalFollowUp() / dashboardNeedsFollowUp() as current actionability truth.
FR-367-011: Operations history MUST keep historical terminal runs visible outside current-follow-up filters.
FR-367-012: OperationRunActionEligibility MUST consume or align with actionability so primary actions and disabled reasons do not contradict dashboard/current-follow-up state.
FR-367-013: Actionability evaluation MUST be batch-friendly for Operations list/dashboard counts and MUST avoid per-row N+1 domain queries where predictable eager loading or grouped lookup can be used.
FR-367-014: Cross-workspace and cross-environment proofs MUST NOT supersede or resolve a run.
FR-367-015: No Graph/provider calls may occur during actionability evaluation or UI render.
FR-367-016: Guard tests MUST fail when new operation types are introduced without actionability policy coverage.
FR-367-017: Guard tests MUST fail when dashboard/current-follow-up UI code directly consumes historical terminal-follow-up scopes or methods after migration.
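
The same-scope supersession rule (FR-367-007), the batch requirement (FR-367-013), and the cross-workspace guard (FR-367-014) can be sketched together as follows. This is a minimal illustration, not the implementation: the `Run` and `Actionability` shapes, the `target_scope` key, the integer `finished_at` timestamp, and the `evaluate_batch` function are all hypothetical names chosen for the example. Only the status strings and the scope dimensions come from this spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Run:
    # Hypothetical, simplified view of an operation run (not the real model).
    id: int
    type_family: str            # canonical type/alias family
    workspace_id: int
    environment_id: int
    connection_id: Optional[int]
    target_scope: str           # hypothetical normalized selection/target key
    outcome: str                # "succeeded", "failed", "blocked", "partial"
    finished_at: int            # hypothetical comparable timestamp


@dataclass(frozen=True)
class Actionability:
    # Mirrors the minimum result fields listed in FR-367-003.
    status: str
    actionable: bool
    reason_code: str
    explanation: str
    superseding_run_id: Optional[int] = None
    policy: str = "repeatable_sync"


def scope_key(run: Run) -> tuple:
    # Every dimension must match; a different workspace or environment
    # yields a different key and therefore can never supersede.
    return (run.type_family, run.workspace_id, run.environment_id,
            run.connection_id, run.target_scope)


def evaluate_batch(runs: list[Run]) -> dict[int, Actionability]:
    # One grouped pass finds the latest success per scope, so the second
    # pass needs no per-row lookup.
    latest_success: dict[tuple, Run] = {}
    for run in runs:
        if run.outcome == "succeeded":
            best = latest_success.get(scope_key(run))
            if best is None or run.finished_at > best.finished_at:
                latest_success[scope_key(run)] = run

    results: dict[int, Actionability] = {}
    for run in runs:
        if run.outcome == "succeeded":
            results[run.id] = Actionability(
                "not_terminal", False, "succeeded", "Run is not a terminal problem.")
            continue
        later = latest_success.get(scope_key(run))
        if later is not None and later.finished_at > run.finished_at:
            results[run.id] = Actionability(
                "superseded_by_later_success", False, "later_same_scope_success",
                "A later run with the same scope succeeded.", later.id)
        else:
            # Proof is missing or incomplete: stay actionable rather than guess.
            results[run.id] = Actionability(
                "actionable", True, "no_same_scope_proof",
                "No later same-scope success was found.")
    return results
```

In a real consumer, the grouped lookup would presumably be a single scoped database query per page of runs rather than an in-memory loop; the sketch only shows the decision shape.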

Non-Functional Requirements

NFR-367-001: Actionability is derived, deterministic, and DB-only at render time.
NFR-367-002: Evaluation must remain tenant/workspace scoped and RBAC-respecting.
NFR-367-003: Copy must be operator-readable and must not expose raw provider payloads, secrets, stack traces, SQL, queue payloads, or internal exception text by default.
NFR-367-004: Tests must protect business truth over thin presentation helpers.
NFR-367-005: No new package, migration, queue, scheduler, asset registration, panel provider, or env var is required.

Actionability Status Semantics

The exact implementation may be an enum or value object, but the following derived statuses are required:

| Status | Meaning | Counts as current dashboard follow-up? |
| --- | --- | --- |
| `actionable` | Operator can or must take a current action | yes |
| `requires_manual_review` | Safe automatic resolution is not possible; deliberate review is required | yes |
| `superseded_by_later_success` | Later same-scope success proves the old terminal problem is no longer current | no |
| `resolved_by_current_state` | Current domain state proves the problem is no longer current | no |
| `informational_only` | Historical/audit information only | no |
| `not_terminal` | Run is active or not a terminal problem | no |
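
As an illustration of the enum option, the statuses and their follow-up semantics could be modeled as below. The names `ActionabilityStatus`, `counts_as_follow_up`, and `follow_up_count` are assumptions made for this sketch; only the six status values and their yes/no follow-up meaning are taken from the semantics above.

```python
from enum import Enum


class ActionabilityStatus(Enum):
    ACTIONABLE = "actionable"
    REQUIRES_MANUAL_REVIEW = "requires_manual_review"
    SUPERSEDED_BY_LATER_SUCCESS = "superseded_by_later_success"
    RESOLVED_BY_CURRENT_STATE = "resolved_by_current_state"
    INFORMATIONAL_ONLY = "informational_only"
    NOT_TERMINAL = "not_terminal"


# Only these two statuses count as current dashboard follow-up.
FOLLOW_UP_STATUSES = frozenset({
    ActionabilityStatus.ACTIONABLE,
    ActionabilityStatus.REQUIRES_MANUAL_REVIEW,
})


def counts_as_follow_up(status: ActionabilityStatus) -> bool:
    return status in FOLLOW_UP_STATUSES


def follow_up_count(statuses: list[ActionabilityStatus]) -> int:
    # Dashboard-style count derived from actionability, not from the
    # raw historical terminal status of each run.
    return sum(1 for status in statuses if counts_as_follow_up(status))
```

Keeping the follow-up set in one place means the dashboard, the Operations filter, and the shell hints cannot each invent their own interpretation of a status.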

Policy Groups

Provider connection checks: provider.connection.check; can resolve through later same-scope success or healthy ProviderConnection current state.
Repeatable sync operations: inventory.sync, inventory_sync, policy.sync, directory.groups.sync, directory.role_definitions.sync, compliance.snapshot, permission_posture_check; can supersede through later same-scope success.
Baseline operations: baseline.capture, baseline.compare; can resolve/supersede only through same-scope later success or current baseline artifact truth.
Evidence/review/report artifact operations: environment.review.compose, environment.review_pack.generate, tenant.evidence.snapshot.generate, evidence_snapshot.generate, plus stored report/report delivery types discovered in repo; can resolve through usable current artifact proof.
Backup operations: backup_set.update, backup.schedule.execute, backup.schedule.retention, backup.schedule.purge; update/execute may supersede with proof, purge defaults manual-review if safety proof is insufficient.
Restore/promotion/mutation operations: restore.execute, promotion.execute, destructive operation families; default manual-review for terminal problems.
Alert/notification/delivery operations: classify discovered alert delivery/evaluation types deliberately; no silent default.
Informational/historical-only operations: classify explicitly only when current product behavior proves no dashboard action is appropriate.
Remaining canonical OperationCatalog types: every canonical type not already listed, including policy snapshot/export/delete/restore, assignment fetch/restore, backup set archive/restore/delete, restore-run delete/restore/force-delete, tenant sync, policy-version prune/restore/force-delete, ops reconciliation, RBAC health check, Entra admin role scan, and any discovered aliases, must receive an explicit actionability policy or explicit informational/manual-review classification.
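
The policy groups above, combined with the fail-closed rule in FR-367-005 and the coverage guard in FR-367-016, could be expressed as an explicit type-to-policy map. The sketch below is hypothetical: the policy identifiers, the `policy_for` and `missing_policy_coverage` helpers, and the abbreviated map are illustrative, and the map deliberately lists only a few of the operation types named in this section.

```python
# Abbreviated, illustrative map; a real registry must cover every
# canonical operation type, not just the ones shown here.
POLICY_BY_TYPE: dict[str, str] = {
    "provider.connection.check": "provider_connection_check",
    "inventory.sync": "repeatable_sync",
    "inventory_sync": "repeatable_sync",       # alias of the same family
    "policy.sync": "repeatable_sync",
    "baseline.capture": "baseline",
    "baseline.compare": "baseline",
    "backup.schedule.purge": "high_risk_manual_review",
    "restore.execute": "high_risk_manual_review",
    "promotion.execute": "high_risk_manual_review",
}

# Runtime fallback for unknown types: fail closed into manual review.
UNCLASSIFIED_POLICY = "unclassified_manual_review"


def policy_for(operation_type: str) -> str:
    # Never returns a silent non-actionable default.
    return POLICY_BY_TYPE.get(operation_type, UNCLASSIFIED_POLICY)


def missing_policy_coverage(catalog_types: list[str]) -> list[str]:
    # A guard test would assert that this list is empty for the full
    # set of canonical operation types.
    return sorted(set(catalog_types) - set(POLICY_BY_TYPE))
```

A guard test built on `missing_policy_coverage` fails as soon as a new operation type is added to the catalog without a matching entry, while `policy_for` keeps the runtime safe in the window before that test is fixed.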

Out of Scope

New persisted actionability state, table, or migration.
Rewriting historical operation_runs.
Manual acknowledge/resolve UI for operations.
Full Resolved/Superseded/Historical Operations UX tabs.
Notification redesign or alert delivery UX.
Provider Connection feature expansion.
Restore/backup/promotion behavior expansion.
New destructive actions.
Global search enablement for OperationRunResource.
Filament panel/provider registration changes.
Asset or theme registration.

Success Criteria

SC-367-001: The known Provider Connection loop is impossible in dashboard and Operations CTAs.
SC-367-002: Every known canonical operation type has explicit actionability coverage or an explicit manual-review/informational policy.
SC-367-003: Dashboard Operations follow-up counts use current actionability, not raw terminal historical status.
SC-367-004: Operations list/detail preserve historical status while clearly separating current actionability.
SC-367-005: High-risk operation failures remain manual-review by default.
SC-367-006: Guard tests prevent direct UI consumption of historical terminal-follow-up scopes for current follow-up.
SC-367-007: No application render path performs Graph calls or leaks cross-tenant data or current state.

Risks

Over-abstraction risk: A policy registry can become a generic framework. Mitigation: keep policies narrow, derived-only, and limited to known operation groups.
False calm risk: Superseding too aggressively can hide real failures. Mitigation: require same-scope proof and default to actionable/manual-review when proof is incomplete.
Performance risk: Operations tables could evaluate actionability per row with N+1 queries. Mitigation: require batch evaluation and grouped lookups.
UI drift risk: Dashboard, Operations list, and detail may disagree. Mitigation: central resolver plus consumer guard tests.

Assumptions

The product remains pre-production under LEAN-001.
OperationCatalog is the primary operation-type source for this slice.
Provider Connection current state has enough persisted timestamps or later successful run proof to avoid guessing.
Existing Operations routes, RBAC, and global-search-disabled posture stay unchanged.

Open Questions

None blocking preparation. Implementation must verify the exact healthy ProviderConnection timestamp/proof fields before using current state as resolution proof.
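
To make the open verification point concrete, the current-state branch of the provider check policy (FR-367-006) might look like the sketch below. The `last_verified_at` field is a placeholder for whatever timestamp or proof field the implementation confirms; it, the dataclass shapes, and the function name are assumptions, not known model attributes. Only `consent_status=granted` and `verification_status=healthy` come from this spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderConnectionState:
    # Hypothetical snapshot of persisted state; no provider call is made.
    workspace_id: int
    environment_id: int
    connection_id: int
    consent_status: str
    verification_status: str
    last_verified_at: Optional[int]  # placeholder proof field (assumption)


@dataclass(frozen=True)
class CheckRun:
    workspace_id: int
    environment_id: int
    connection_id: int
    finished_at: int


def resolved_by_current_state(run: CheckRun,
                              state: Optional[ProviderConnectionState]) -> bool:
    if state is None:
        return False
    same_scope = (
        state.workspace_id == run.workspace_id
        and state.environment_id == run.environment_id
        and state.connection_id == run.connection_id
    )
    healthy = (state.consent_status == "granted"
               and state.verification_status == "healthy")
    # Without a proof timestamp newer than the run, do not guess.
    proven_after_run = (state.last_verified_at is not None
                        and state.last_verified_at > run.finished_at)
    return same_scope and healthy and proven_after_run
```

When any of the three conditions fails, the old blocker stays in its actionable state, which matches the spec's preference for false attention over false calm.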

Follow-up Spec Candidates

Manual OperationRun acknowledgement / resolve UX.
Resolved and Superseded Operations history tabs or filters beyond the minimum current-follow-up filter.
Actionability explanation UI polish if operator/support audiences need richer proof detail.
Alert-delivery actionability refinement if discovered operation families need more than v1 manual-review/informational defaults.
