ahmido 564da05096 feat: implement operation run actionability system (#439)

This PR introduces the Operation Run Actionability System.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #439

2026-06-08 13:34:25 +00:00


Implementation Plan: OperationRun Actionability System v1

Branch: 367-operationrun-actionability-system | Date: 2026-06-08 | Spec: specs/367-operationrun-actionability-system/spec.md
Input: Feature specification from specs/367-operationrun-actionability-system/spec.md

Summary

Introduce a derived OperationRun actionability layer that separates historical terminal execution truth from current UI follow-up truth. The implementation will keep operation_runs immutable history intact, classify every known operation type deliberately, migrate dashboard/Operations/current-follow-up consumers away from raw terminalFollowUp() semantics, and prove the known Provider Connection CTA loop is closed.

Technical Context

Language/Version: PHP 8.4.15, Laravel 12.52.0
Primary Dependencies: Filament 5.2.1, Livewire 4.1.4, Pest 4.3.1, PostgreSQL
Storage: Existing PostgreSQL tables only; no migration planned
Testing: Pest 4 Unit, Feature, Architecture/guard, optional Browser smoke
Validation Lanes: fast-feedback, confidence, browser only if rendered UI changes
Target Platform: Laravel Sail locally, Dokploy container deployment for staging/production
Project Type: Laravel monolith under apps/platform
Performance Goals: DB-only render-time evaluation; batch-friendly dashboard and Operations table evaluation; no Graph calls during UI render
Constraints: preserve workspace/managed-environment isolation, RBAC, OperationRunService lifecycle ownership, global-search-disabled OperationRun posture, and existing Operations routes
Scale/Scope: Known operation types in OperationCatalog, provider registry, reconciliation registry, jobs, factories, seeders, and current tests

UI / Surface Guardrail Plan

Guardrail scope: changed existing dashboard and Operations status/follow-up surfaces.
Affected routes/pages/actions/states/navigation/panel/provider surfaces:
- /admin/workspaces/{workspace}/operations
- /admin/workspaces/{workspace}/operations/{run}
- tenant dashboard widgets NeedsAttention and BaselineCompareNow
- shell active-work hint via BulkOperationProgress / ActiveRuns
No-impact class, if applicable: N/A.
Native vs custom classification summary: mixed existing native Filament table/detail plus existing dashboard widget Blade/Livewire surfaces.
Shared-family relevance: status messaging, dashboard signals, operation links, current follow-up filters, primary action guidance.
State layers in scope: widget, page, detail, shell hint, URL query for Operations filters.
Audience modes in scope: operator-MSP and support-platform; no customer-facing surface.
Decision/diagnostic/raw hierarchy plan: current actionability is default-visible; historical status remains visible; raw/support diagnostics stay secondary/collapsed/capability-gated.
Raw/support gating plan: unchanged existing Operations detail gating.
One-primary-action / duplicate-truth control: use one actionability result for dashboard count, Operations current-follow-up filters, and OperationRun action eligibility.
Handling modes by drift class or surface: review-mandatory for high-risk operation families and insufficient correlation proof.
Repository-signal treatment: hard-stop-candidate for any UI consumer that still directly treats raw terminal historical status as current follow-up after migration.
Special surface test profiles: monitoring-state-page, dashboard-signal, shared-detail-family.
Required tests or manual smoke: Unit actionability policies, Feature dashboard/Operations regressions, guard tests, optional browser smoke for the provider loop.
Exception path and spread control: none expected.
Active feature PR close-out entry: Guardrail + Smoke Coverage.
UI/Productization coverage decision: implementation must update existing coverage artifacts or document checked no-update rationale.
Coverage artifacts to update: existing Operations/dashboard entries only if visible surface contract materially changes; otherwise record no-update rationale in close-out.
No-impact rationale: N/A.
Navigation / Filament provider-panel handling: no panel/provider changes; providers remain in apps/platform/bootstrap/providers.php.
Screenshot or page-report need: screenshot only if rendered dashboard/Operations layout or copy changes materially.

Shared Pattern & System Fit

Shared pattern touched: OperationRun monitoring, dashboard attention, links, current follow-up, and action eligibility.
Existing shared paths to reuse:
- App\Support\OperationCatalog
- App\Support\OperationRunLinks
- App\Support\Operations\OperationRunActionEligibility
- App\Support\Operations\Reconciliation\OperationRunReconciliationRegistry
- App\Support\Operations\OperationRunCorrelationResolver
- App\Support\OpsUx\OperationUxPresenter
- App\Support\OpsUx\ActiveRuns
New shared path allowed: bounded actionability resolver/registry/policies under App\Support\Operations\Actionability or equivalent local namespace.
Forbidden drift: no parallel UI action framework, no persisted actionability table, no ad-hoc dashboard-only special case, no Graph calls, no broad Operations UX rebuild.
OperationRun UX contract: no run-start or terminal-notification changes. Existing OperationRun lifecycle remains service-owned.

Constitution Check

Inventory-first / snapshots-second: N/A; uses existing OperationRun and domain state only.
Read/write separation: PASS; actionability is read-only and DB-only.
Single Contract Path to Graph: PASS; no Graph calls allowed in evaluation or render.
Proportionality First / BLOAT-001: REQUIRED and satisfied in spec.md; new derived status/resolver/registry is justified by a confirmed false CTA loop and multiple current consumers.
No new persisted truth: PASS; no migration/table.
No new state without behavioral consequence: PASS; actionability states change dashboard counts, CTA routing, filters, and manual-review behavior.
Workspace/Tenant isolation: REQUIRED; same-workspace/same-environment proof only unless operation is explicitly workspace-owned.
RBAC-UX: REQUIRED; existing policies stay authoritative; UI state is not security.
OperationRun standards: REQUIRED; historical execution truth remains OperationRun; current actionability is separate derived truth.
UI-COV-001: REQUIRED; existing reachable UI surfaces change in status/follow-up semantics.
TEST-GOV-001: REQUIRED; test lanes and fixture costs are explicit.
LEAN-001: PASS; no legacy compatibility shims or data migration.

Project Structure

Likely application surfaces for later implementation:

apps/platform/app/Models/OperationRun.php
apps/platform/app/Support/OperationCatalog.php
apps/platform/app/Support/OperationRunLinks.php
apps/platform/app/Support/Operations/Actionability/
apps/platform/app/Support/Operations/OperationRunActionEligibility.php
apps/platform/app/Support/Operations/Reconciliation/
apps/platform/app/Support/OpsUx/ActiveRuns.php
apps/platform/app/Support/OpsUx/OperationUxPresenter.php
apps/platform/app/Support/GovernanceInbox/GovernanceInboxSectionBuilder.php
apps/platform/app/Support/EnvironmentDashboard/EnvironmentDashboardSummaryBuilder.php
apps/platform/app/Support/Workspaces/WorkspaceOverviewBuilder.php
apps/platform/app/Filament/Widgets/Dashboard/NeedsAttention.php
apps/platform/app/Filament/Widgets/Dashboard/BaselineCompareNow.php
apps/platform/app/Filament/Widgets/Operations/OperationsWorkbenchStats.php
apps/platform/app/Filament/Pages/Monitoring/Operations.php
apps/platform/app/Filament/Resources/OperationRunResource.php
apps/platform/app/Filament/Pages/Operations/TenantlessOperationRunViewer.php
apps/platform/app/Livewire/BulkOperationProgress.php
specs/367-operationrun-actionability-system/repo-truth-map.md
apps/platform/lang/en/localization.php
apps/platform/lang/de/localization.php
apps/platform/tests/Unit/Support/Operations/
apps/platform/tests/Feature/Operations/
apps/platform/tests/Feature/Monitoring/
apps/platform/tests/Feature/Guards/
apps/platform/tests/Browser/

Data Model / Persistence Impact

No new tables.
No migration.
No new persisted actionability column.
Tenant-bound OperationRun records remain tenant-owned operational artifacts for authorization purposes even though canonical Operations routes are workspace-scoped; workspace-only runs are valid only for explicitly workspace-owned operation types.
Existing operation_runs.context may be read for correlation proof; implementation must not mutate historical context as part of evaluation.
Related current-state proof may read existing models such as ProviderConnection, BaselineSnapshot, EvidenceSnapshot, EnvironmentReview, ReviewPack, BackupSet, and RestoreRun only where the policy can prove same-scope ownership.

Domain Model / Service Approach

Add a derived actionability result object.
Add a derived actionability status enum/value object.
Add a resolver with single-run and batch evaluation APIs.
Add a registry that maps canonical operation families to policies and fails coverage tests when known types are uncovered.
Implement initial policies:
- provider connection checks
- repeatable sync operations
- baseline operations
- evidence/review/review-pack artifact operations
- backup operations
- restore/promotion/destructive-like operations
- alert/notification/informational fallback decisions discovered during implementation
Explicitly classify every remaining OperationCatalog::canonicalInventory() entry as actionable, manual-review, superseded-capable, resolved-by-current-state-capable, or informational-only.
Reuse OperationCatalog for canonical/alias type resolution.
Reuse existing reconciliation proof where it is already same-scope and safe.
Feed result into existing OperationRunActionEligibility instead of replacing it.
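The resolver, registry, and result object listed above could fit together roughly as follows. This is a minimal sketch, not the planned implementation: every class, method, and operation type name in it is hypothetical, runs are modelled as plain arrays instead of OperationRun models, and the fallback to manual review for unregistered types is an assumption based on the plan's conservative defaults.

```php
<?php

declare(strict_types=1);

// Hypothetical names throughout; only the six status meanings come from the plan.
enum ActionabilityStatus: string
{
    case Actionable = 'actionable';
    case RequiresManualReview = 'requires_manual_review';
    case SupersededByLaterSuccess = 'superseded_by_later_success';
    case ResolvedByCurrentState = 'resolved_by_current_state';
    case InformationalOnly = 'informational_only';
    case NotTerminal = 'not_terminal';
}

final class ActionabilityResult
{
    public function __construct(
        public readonly ActionabilityStatus $status,
        public readonly string $reason,
    ) {}
}

interface ActionabilityPolicy
{
    /** @param array<string, mixed> $run plain-array stand-in for an OperationRun */
    public function evaluate(array $run): ActionabilityResult;
}

final class ActionabilityRegistry
{
    /** @var array<string, ActionabilityPolicy> keyed by canonical operation type */
    private array $policies = [];

    public function register(string $type, ActionabilityPolicy $policy): void
    {
        $this->policies[$type] = $policy;
    }

    public function policyFor(string $type): ?ActionabilityPolicy
    {
        return $this->policies[$type] ?? null;
    }

    /**
     * @param list<string> $knownTypes e.g. the canonical inventory of operation types
     * @return list<string> types that have no registered policy
     */
    public function uncovered(array $knownTypes): array
    {
        return array_values(array_diff($knownTypes, array_keys($this->policies)));
    }
}

final class ActionabilityResolver
{
    public function __construct(private readonly ActionabilityRegistry $registry) {}

    /** @param array<string, mixed> $run */
    public function resolve(array $run): ActionabilityResult
    {
        $policy = $this->registry->policyFor($run['type']);

        // Assumption: an unregistered type stays visible as manual review.
        return $policy?->evaluate($run)
            ?? new ActionabilityResult(ActionabilityStatus::RequiresManualReview, 'no policy registered');
    }

    /**
     * @param list<array<string, mixed>> $runs
     * @return array<int, ActionabilityResult> keyed by run id
     */
    public function resolveMany(array $runs): array
    {
        $results = [];
        foreach ($runs as $run) {
            $results[$run['id']] = $this->resolve($run);
        }

        return $results;
    }
}
```

A coverage test could then assert that uncovered() returns an empty list for every type reported by the catalog, which is one way to make new operation types fail until they are classified.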

Policy Semantics

Actionable: a current operator action exists and the run should count in dashboard and current-follow-up filters.
Requires manual review: a current action remains necessary because safe automatic resolution is not provable.
Superseded by later success: a later same-scope successful run proves the old terminal problem is no longer current.
Resolved by current state: the current domain model proves the old terminal problem is no longer current.
Informational only: historical record only; never a dashboard CTA.
Not terminal: the run is active or not a problem; stale active runs remain owned by the existing freshness/reconciliation path.
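The six states above map naturally onto a backed enum with small helper methods that consumers can query instead of re-deriving the rules. The sketch below is illustrative only: the enum name, case values, and method names are hypothetical, and counting manual-review runs as current follow-up is an assumption drawn from the plan's grouping of actionable and manual-review rows.

```php
<?php

declare(strict_types=1);

// Hypothetical enum; names and string values are not taken from the codebase.
enum ActionabilityStatus: string
{
    case Actionable = 'actionable';
    case RequiresManualReview = 'requires_manual_review';
    case SupersededByLaterSuccess = 'superseded_by_later_success';
    case ResolvedByCurrentState = 'resolved_by_current_state';
    case InformationalOnly = 'informational_only';
    case NotTerminal = 'not_terminal';

    /** True when an operator still has something to do for this run. */
    public function countsAsCurrentFollowUp(): bool
    {
        return match ($this) {
            self::Actionable, self::RequiresManualReview => true,
            default => false,
        };
    }

    /** True when the run should stay visible as history without a CTA. */
    public function isHistoricalOnly(): bool
    {
        return match ($this) {
            self::SupersededByLaterSuccess,
            self::ResolvedByCurrentState,
            self::InformationalOnly => true,
            default => false,
        };
    }
}

// A dashboard count derived from already-resolved statuses.
$statuses = [
    ActionabilityStatus::Actionable,
    ActionabilityStatus::RequiresManualReview,
    ActionabilityStatus::SupersededByLaterSuccess,
    ActionabilityStatus::NotTerminal,
];

$dashboardCount = count(array_filter(
    $statuses,
    fn (ActionabilityStatus $status): bool => $status->countsAsCurrentFollowUp(),
));
```

Keeping both predicates on the enum means the dashboard count and the Operations filter read the same rule, which is the duplicate-truth control the plan asks for.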

UI / Filament / Livewire Implications

Filament v5 / Livewire v4.0+ compliance remains required; current app has Livewire 4.1.4.
Panel providers remain registered in apps/platform/bootstrap/providers.php; no panel provider changes planned.
OperationRunResource keeps protected static bool $isGloballySearchable = false; no global search changes.
No destructive actions are added. Existing high-impact Operations actions continue using existing confirmation/authorization patterns.
Dashboard widgets must show current actionability counts, not raw historical terminal follow-up.
Operations list/current-follow-up filters must distinguish current actionable/manual-review rows from historical resolved/superseded rows.
Operations workbench stats, Governance Inbox, environment dashboard summaries, workspace overview signals, and OperationUxPresenter decision copy must use actionability for current-follow-up truth or explicitly remain historical-only.
Operation detail should show historical status and current actionability separately if implementation touches rendered detail.
Shell active-run hint must not mix terminal actionability with active progress semantics; existing terminal follow-up visibility in ActiveRuns::shellVisibleQueryForTenantId() must be removed or converted into a distinct actionability-backed non-active signal.

RBAC / Policy Implications

Existing OperationRunPolicy and workspace/environment entitlement checks remain authoritative.
Any actionability result that references a superseding run or resolving model must be same workspace and same managed environment or explicitly workspace-owned.
Non-members must not learn that hidden runs or hidden resolving models exist.
Dashboard counts and Operations filters must run inside actor-visible scope.
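The same-scope rule can be expressed as a small value object that a policy consults before accepting any superseding run or resolving model. This is a sketch under assumptions: RunScope and its method are hypothetical, and treating a workspace-owned operation as one where both sides carry no environment id is an interpretation of "explicitly workspace-owned", not something the plan states.

```php
<?php

declare(strict_types=1);

// Hypothetical value object describing where a run or related model lives.
final class RunScope
{
    public function __construct(
        public readonly int $workspaceId,
        public readonly ?int $environmentId, // null stands for workspace-level records
    ) {}

    /**
     * Decide whether $other may be used as resolution proof for this scope.
     */
    public function provesSameScopeAs(RunScope $other, bool $workspaceOwnedType = false): bool
    {
        // A different workspace never counts, whatever the operation type.
        if ($this->workspaceId !== $other->workspaceId) {
            return false;
        }

        if ($workspaceOwnedType) {
            // Assumption: workspace-owned operations carry no environment on either side.
            return $this->environmentId === null && $other->environmentId === null;
        }

        // Environment-bound operations need a known, identical environment.
        return $this->environmentId !== null
            && $this->environmentId === $other->environmentId;
    }
}
```

Because the check returns a plain boolean and never reports why a candidate was rejected, a caller cannot leak the existence of a hidden run or model through the result.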

Audit / Observability / Evidence Implications

No new AuditLog writes for read-only evaluation.
No changes to OperationRun lifecycle transitions.
Historical terminal status remains audit truth.
Current actionability explanation is derived UI truth; it should be testable and explainable but not persisted.

Performance Plan

Use batch evaluation for dashboard/Operations collections.
Group lookups by operation family and scope.
Avoid per-row ProviderConnection, artifact, or later-run queries in table render and in multi-tenant aggregate builders such as workspace overview and governance inbox.
Keep evaluation DB-only and deterministic.
Add tests or code-review tasks to catch obvious N+1 regressions in actionability consumer paths.
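Grouping lookups by operation family and scope can be sketched as below, so that proof is loaded once per group instead of once per table row. The function name, the array shape of a run, and the loader callback are all hypothetical; in the application the callback would presumably wrap a single batched database query.

```php
<?php

declare(strict_types=1);

/**
 * Load resolution proof once per (type, workspace, environment) group.
 *
 * @param list<array{id:int, type:string, workspace_id:int, environment_id:?int}> $runs
 * @param callable(string, int, ?int): mixed $loadProof invoked exactly once per group
 * @return array<int, mixed> proof keyed by run id
 */
function loadProofPerGroup(array $runs, callable $loadProof): array
{
    // Bucket runs that share the same operation type and scope.
    $groups = [];
    foreach ($runs as $run) {
        $key = implode('|', [
            $run['type'],
            $run['workspace_id'],
            $run['environment_id'] ?? 'workspace',
        ]);
        $groups[$key][] = $run;
    }

    // One lookup per bucket, shared by every run in it.
    $proofByRunId = [];
    foreach ($groups as $group) {
        $first = $group[0];
        $proof = $loadProof($first['type'], $first['workspace_id'], $first['environment_id']);

        foreach ($group as $run) {
            $proofByRunId[$run['id']] = $proof;
        }
    }

    return $proofByRunId;
}
```

A regression test can count loader invocations for a fixed set of runs; if the count grows with the number of rows rather than the number of groups, an N+1 path has crept in.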

Test Strategy

Unit:
- resolver/result/status behavior
- provider connection policy
- repeatable sync policy
- baseline/artifact/backup policies
- high-risk manual-review policy
- registry coverage against operation catalog
Feature:
- dashboard provider loop regression
- dashboard/BaselineCompareNow current follow-up counts
- Operations filters and action links
- cross-workspace/cross-environment non-resolution
- action eligibility alignment
Guard / Architecture:
- no direct current-follow-up UI consumption of terminalFollowUp() / dashboardNeedsFollowUp()
- new operation types require actionability coverage
- no Graph calls during actionability evaluation or render, enforced in tests through a fail-hard binding where practical
Browser:
- bounded provider-loop smoke if UI rendering changes: old provider blocker, current healthy state, dashboard no false CTA, Operations history still visible.
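The guard against direct consumption of terminalFollowUp() / dashboardNeedsFollowUp() could be built on a plain source scan like the one below, which a Pest guard test would wrap. It is only a sketch: the function name is hypothetical, the sources are passed in as strings rather than read from disk, and the allow-list entry used in the usage note assumes the methods are defined on the OperationRun model.

```php
<?php

declare(strict_types=1);

/**
 * Find files that still call the raw follow-up helpers.
 *
 * @param array<string, string> $sources file path => file contents
 * @param list<string> $allowedPaths files that may legitimately mention the helpers
 * @return list<string> offending file paths
 */
function findRawFollowUpConsumers(array $sources, array $allowedPaths = []): array
{
    $offenders = [];

    foreach ($sources as $path => $code) {
        if (in_array($path, $allowedPaths, true)) {
            continue;
        }

        // Matches both calls and declarations, hence the allow-list.
        if (preg_match('/\b(?:terminalFollowUp|dashboardNeedsFollowUp)\s*\(/', $code) === 1) {
            $offenders[] = $path;
        }
    }

    return $offenders;
}
```

In a guard test, the file that declares the helpers would go on the allow-list and the assertion would be that the returned list is empty for the dashboard, Operations, and presenter paths.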

Rollout / Deployment Considerations

Env vars: none.
Migrations: none.
Queues/workers: none.
Scheduler: none.
Storage/volumes: none.
Filament assets: no new assets; filament:assets not newly required by this spec.
Staging/production: validate on Staging before Production because dashboard current-follow-up semantics change operator attention routing.
Rollback: code rollback restores old raw terminal-follow-up behavior; no data rollback needed.

Implementation Phases

Phase 1 - Repo Truth Inventory

Confirm all known operation types from OperationCatalog, provider registry, reconciliation registry, jobs, factories, seeders, and tests.
Confirm every current consumer of terminal follow-up / dashboard follow-up / problem class.
Confirm ProviderConnection current-state proof fields.

Phase 2 - Failing Proof

Add Unit/Feature/Guard tests before runtime changes where practical.
Prove the provider CTA loop, repeatable superseded success, high-risk manual review, cross-scope non-resolution, and guard coverage.

Phase 3 - Core Actionability Contract

Add status/result/resolver/registry/policies.
Add batch evaluation APIs.
Register known operation families.

Phase 4 - Consumer Migration

Migrate Dashboard, BaselineCompareNow, Operations filters/list/detail, OperationRunActionEligibility, OperationRunLinks where relevant, and ActiveRuns boundaries.
Keep history visible.

Phase 5 - Validation and Close-Out

Run focused tests and optional browser smoke.
Update or document UI coverage artifacts.
Record Filament/Livewire/global-search/destructive-action/asset/deployment posture in close-out.

Risk Controls

Default to actionable/manual-review when correlation proof is incomplete.
Keep high-risk mutation families manual-review.
Require same-scope proof for superseded/resolved outcomes.
Do not evaluate with Graph calls.
Do not add persistence.
Do not remove terminalFollowUp() until all current consumers are migrated and guard tests exist.
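Taken together, the first three controls suggest a conservative classification order: high-risk families first, then a completeness check on correlation data, and only then a search for a later same-scope success. The sketch below illustrates that order with hypothetical names, plain arrays in place of OperationRun models, and an assumed 'succeeded' outcome value; returning manual review rather than actionable for incomplete data is one of the two defaults the plan allows.

```php
<?php

declare(strict_types=1);

/**
 * Classify a terminal problem run without ever defaulting to "resolved".
 *
 * @param array<string, mixed> $run the terminal problem run
 * @param list<array<string, mixed>> $laterRuns candidate runs, already loaded from the database
 * @param list<string> $highRiskTypes operation types that always need manual review
 */
function classifyTerminalRun(array $run, array $laterRuns, array $highRiskTypes): string
{
    // High-risk mutation families are never auto-resolved.
    if (in_array($run['type'], $highRiskTypes, true)) {
        return 'requires_manual_review';
    }

    // Incomplete correlation data: stay conservative.
    if (! isset($run['workspace_id'], $run['environment_id'], $run['completed_at'])) {
        return 'requires_manual_review';
    }

    foreach ($laterRuns as $candidate) {
        $sameScope = ($candidate['workspace_id'] ?? null) === $run['workspace_id']
            && ($candidate['environment_id'] ?? null) === $run['environment_id'];

        $laterSuccess = ($candidate['type'] ?? null) === $run['type']
            && ($candidate['outcome'] ?? null) === 'succeeded'
            && ($candidate['completed_at'] ?? null) instanceof DateTimeInterface
            && $candidate['completed_at'] > $run['completed_at'];

        if ($sameScope && $laterSuccess) {
            return 'superseded_by_later_success';
        }
    }

    // No proof found: the old problem is still current.
    return 'actionable';
}
```

The function only reads its inputs, so it stays compatible with the no-persistence and no-Graph-call controls.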

Implementation Verification Notes

No product question blocks implementation. The implementation loop must verify which exact ProviderConnection timestamp or verification proof is safe for resolved_by_current_state when no later successful run exists. If no reliable field exists, the provider policy can still resolve via later same-scope successful runs and must otherwise leave the old blocker actionable/manual-review.
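The fallback order described here can be sketched as a small decision function. Everything in it is an assumption for illustration: the function name is hypothetical, and $connectionVerifiedAt stands in for whichever ProviderConnection timestamp or verification proof the implementation loop confirms as reliable. Passing null models the case where no such field exists.

```php
<?php

declare(strict_types=1);

/**
 * Decide whether an old provider blocker is still current.
 *
 * @param DateTimeImmutable $failedAt when the blocking run reached its terminal state
 * @param DateTimeImmutable|null $laterSuccessAt completion time of a later same-scope successful run, if any
 * @param DateTimeImmutable|null $connectionVerifiedAt hypothetical proof timestamp, null when no reliable field exists
 */
function classifyProviderBlocker(
    DateTimeImmutable $failedAt,
    ?DateTimeImmutable $laterSuccessAt,
    ?DateTimeImmutable $connectionVerifiedAt,
): string {
    // Preferred proof: a later successful run in the same scope.
    if ($laterSuccessAt !== null && $laterSuccessAt > $failedAt) {
        return 'superseded_by_later_success';
    }

    // Secondary proof: current connection state verified after the failure.
    if ($connectionVerifiedAt !== null && $connectionVerifiedAt > $failedAt) {
        return 'resolved_by_current_state';
    }

    // No proof: the blocker stays visible to the operator.
    return 'actionable';
}
```

If the proof-field verification concludes that no timestamp is trustworthy, callers would always pass null as the third argument and the function degrades to the later-run check alone.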

Readiness Assessment

Ready for implementation once tasks.md is followed. The only open verification item is ProviderConnection proof-field selection. It is not a product blocker, because the provider policy can rely on later same-scope successful runs if timestamp proof is insufficient.
