# Implementation Plan: Spec 391 - Operations Hub Stability and Debug-Safe Runtime

**Branch**: `391-operations-hub-stability-debug-safe-runtime` | **Date**: 2026-06-20 | **Spec**: `specs/391-operations-hub-stability-debug-safe-runtime/spec.md`
**Input**: Feature specification from `/specs/391-operations-hub-stability-debug-safe-runtime/spec.md`

## Summary

Stabilize the existing admin Operations hub so the environment-filtered route renders quickly and safely, then add focused productization browser-smoke guardrails for the exact debug/runtime leakage observed in BUG-001 and BUG-009. The work stays inside the Operations render/query/runtime-smoke surface and must not change Evidence, Provider, Review Pack, Restore, dashboard, provider mutation, export, or customer delivery semantics.

## Technical Context

**Language/Version**: PHP 8.4.15, Laravel 12.52, Filament 5.2.1, Livewire 4.1.4.
**Primary Dependencies**: Filament v5, Livewire v4, Pest 4, PostgreSQL, existing browser smoke helpers.
**Storage**: Existing PostgreSQL `operation_runs`, `workspaces`, and `managed_environments`; no new storage expected.
**Testing**: Pest 4 feature/Livewire/browser tests.
**Validation Lanes**: fast-feedback/confidence for feature tests; browser for productization smoke; targeted formatting.
**Target Platform**: Laravel admin panel at `/admin`, local Sail/Dokploy-style container runtime.
**Project Type**: Laravel monolith under `apps/platform`.
**Performance Goals**: Operations route under 3 seconds after auth for audited data shape; bounded/paginated index render.
**Constraints**: No migrations unless proven and spec/plan updated first; no seeders; no queues/jobs that mutate provider/customer state; no Graph/provider calls in render; do not increase PHP max execution time.
**Scale/Scope**: Existing Operations hub, environment-filtered route, runtime-smoke checks.
## UI / Surface Guardrail Plan

- **Guardrail scope**: changed existing operator-facing Operations surface plus workflow-only productization browser smoke guardrail.
- **Affected routes/pages/actions/states/navigation/panel/provider surfaces**:
  - `/admin/workspaces/{workspace}/operations`
  - `/admin/workspaces/{workspace}/operations?environment_id={managedEnvironment}`
  - `App\Filament\Pages\Monitoring\Operations`
  - `App\Filament\Resources\OperationRunResource`
  - Existing dashboard/workspace drilldowns that link to Operations
  - Productization-smoke browser route checks
- **No-impact class, if applicable**: N/A.
- **Native vs custom classification summary**: Native Filament page/table/resource plus existing Operations Blade composition; no new visual system.
- **Shared-family relevance**: OperationRun monitoring/detail family, action links, status badges, browser-smoke runtime guard.
- **State layers in scope**: URL-query `environment_id`, page/table filters, session filter state where already used, browser console/network/DOM assertions.
- **Audience modes in scope**: operator-MSP, manager, support-platform.
- **Decision/diagnostic/raw hierarchy plan**: Operations default-visible list/workbench remains decision-first; raw context, stack traces, provider payloads, and source links remain diagnostic-only or absent from productization-smoke output.
- **Raw/support gating plan**: no new raw/support exposure; smoke must fail if debug pages/source links/raw stack traces become visible.
- **One-primary-action / duplicate-truth control**: preserve existing open/detail action as the dominant safe next step; do not add competing retry/export/destructive actions.
- **Handling modes by drift class or surface**: review-mandatory for Operations render path and runtime-smoke guard; report-only for existing UI-016 coverage unless implementation materially changes route/archetype.
- **Repository-signal treatment**: review-mandatory because this touches a strategic monitoring surface and adds Browser lane proof.
- **Special surface test profiles**: `monitoring-state-page` and `global-context-shell`.
- **Required tests or manual smoke**: Feature/Livewire render/scoping/bounded tests plus Browser productization smoke.
- **Exception path and spread control**: none expected.
- **Active feature PR close-out entry**: Guardrail / Exception / Smoke Coverage.
- **UI/Productization coverage decision**: Existing UI-016 coverage remains valid; implementation must update audit registry only if visible archetype/route changes exceed stability-state changes.
- **Coverage artifacts to update**: none by default; screenshots under the spec artifacts folder for final browser verification.
- **No-impact rationale**: N/A.
- **Navigation / Filament provider-panel handling**: no panel provider changes; provider registration remains `apps/platform/bootstrap/providers.php`.
- **Screenshot or page-report need**: screenshot required for final smoke evidence; no full page report unless implementation changes the Operations page archetype.

## Shared Pattern & System Fit

- **Cross-cutting feature marker**: yes, bounded.
- **Systems touched**: Operations hub, OperationRunResource table/list rendering, OperationRun links/presenters, productization browser smoke, Debugbar/Vite asset-smoke controls.
- **Shared abstractions reused**: `OperationRunLinks`, `OperationUxPresenter`, `BadgeCatalog`, `BadgeRenderer`, `TablePaginationProfiles`, `SuppressDebugbarForSmokeRequests`, `PanelThemeAsset`, existing Pest Browser smoke patterns.
- **New abstraction introduced? why?**: none expected. If needed, add only a small test/support helper for productization-smoke runtime assertions.
- **Why the existing abstraction was sufficient or insufficient**: Existing OperationRun UI semantics are sufficient; existing smoke coverage missed BUG-001/BUG-009 under the audited route and runtime mode.
- **Bounded deviation / spread control**: Any new smoke helper must be test/support-local, explicitly opt-in, and must not disable normal local Debugbar/Vite behavior.

## OperationRun UX Impact

- **Touches OperationRun start/completion/link UX?**: yes, link/render path only.
- **Central contract reused**: `OperationRunLinks`, existing tenantless OperationRun detail viewer, OperationRunResource table conventions.
- **Delegated UX behaviors**: `Open operation` / `View run` URL resolution stays delegated to existing helpers; no queued toast or terminal notification change.
- **Surface-owned behavior kept local**: environment filter application, bounded list rendering, controlled empty/error/loading state, browser runtime assertions.
- **Queued DB-notification policy**: N/A.
- **Terminal notification path**: N/A.
- **Exception path**: none.

## Provider Boundary & Portability Fit

- **Shared provider/platform boundary touched?**: no.
- **Provider-owned seams**: none.
- **Platform-core seams**: OperationRun execution truth and Operations monitoring view only.
- **Neutral platform terms / contracts preserved**: workspace, managed environment, operation, OperationRun, execution truth.
- **Retained provider-specific semantics and why**: none added.
- **Bounded extraction or follow-up path**: none.

## Constitution Check

- Inventory-first: N/A, no inventory truth changes.
- Read/write separation: read-only render/smoke work only; no provider/customer mutations.
- Graph contract path: no Graph calls; render path must remain DB-only.
- Deterministic capabilities: existing entitlement/capability paths retained.
- RBAC-UX: admin plane route, workspace membership, environment entitlement, 404 not-found semantics for non-entitled scopes; UI visibility is not authorization.
- Workspace isolation: Operations query and summary/filter options must scope by current workspace before rows render.
- Tenant isolation: tenant-bound runs must be visible only when the actor is entitled to the referenced managed environment.
- Run observability: no new OperationRun creation/status transition; existing OperationRun truth remains the source.
- OperationRun start UX: no start UX change; links reuse central helpers.
- Ops-UX lifecycle: no `OperationRun.status` / `OperationRun.outcome` transitions.
- Ops-UX summary counts: no new keys; default list render must not parse large summary/context payloads unnecessarily.
- Automation: no queues/jobs are triggered by this spec.
- Data minimization: debug pages, stack traces, raw context, provider payloads, `_debugbar`, and source links must not appear in productization-smoke mode.
- Test governance: Feature + Browser lanes are explicit and bounded.
- Proportionality: no new persistence, domain abstraction, status family, taxonomy, or cross-domain framework.
- Filament-native UI: preserve native Filament table/page/resource semantics; no new ad-hoc status styling.
- UI/Productization coverage: existing UI-016 coverage remains valid unless implementation discovers material route/archetype change.

## Test Governance Check

- **Test purpose / classification by changed surface**: Feature/Livewire for render/scoping/bounded query proof; Browser for runtime/debug leakage; Unit only if a helper is introduced.
- **Affected validation lanes**: fast-feedback/confidence and browser.
- **Why this lane mix is the narrowest sufficient proof**: Feature tests catch deterministic server render/scoping/performance issues; Browser test catches JS globals, Vite dev-client, Debugbar/source-link, and visible debug page regressions.
- **Narrowest proving command(s)**:
  - `cd apps/platform && php vendor/bin/pest tests/Feature/Monitoring/Spec391OperationsHubRendersWithEnvironmentFilterTest.php`
  - `cd apps/platform && php vendor/bin/pest tests/Feature/Monitoring/Spec391OperationRunResourceIndexPerformanceTest.php`
  - `cd apps/platform && php artisan test --compact tests/Browser/Spec391OperationsHubProductizationSmokeTest.php`
  - `cd apps/platform && php vendor/bin/pint --test`
  - `git diff --check`
- **Fixture / helper / factory / seed / context cost risks**: Use factories and smoke-login helpers; no seeders; no provider setup; no real Graph; no queue mutation.
- **Expensive defaults or shared helper growth introduced?**: no; any browser helper must be explicit and local.
- **Heavy-family additions, promotions, or visibility changes**: one explicit browser smoke file.
- **Surface-class relief / special coverage rule**: special `monitoring-state-page` / `global-context-shell` coverage required.
- **Closing validation and reviewer handoff**: reviewers should check render timing/query bounds, runtime smoke assertions, and no unrelated semantic changes.
- **Budget / baseline / trend follow-up**: document actual render timing and whether lower-level guard substitutes for CI browser timing.
- **Review-stop questions**: Did implementation fix the expensive path, or merely catch/mask it? Did any helper widen browser/default setup? Did any provider/evidence/review/restore semantics change?
- **Escalation path**: document-in-feature.
- **Active feature PR close-out entry**: Guardrail / Exception / Smoke Coverage.
- **Why no dedicated follow-up spec is needed**: This is a direct audit-regression fix with bounded smoke guardrails; broader BUG-009/system branding follow-up remains separate if needed.
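
As a hedged illustration of the "lower-level guard" mentioned under budget follow-up, the following plain-PHP sketch times a render callable against the 3-second goal from Technical Context. `measureRenderMs()` and `withinRenderBudget()` are hypothetical names, not existing helpers in this repository; in a real Pest feature test the callable would wrap the actual Operations route request.

```php
<?php

declare(strict_types=1);

/**
 * Runs $render once and returns the elapsed wall-clock time in milliseconds.
 * Hypothetical helper for illustration only.
 */
function measureRenderMs(callable $render): float
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $render();

    return (hrtime(true) - $start) / 1_000_000;
}

/**
 * True when $render finishes within the budget.
 * The 3000 ms default mirrors the "under 3 seconds" performance goal.
 */
function withinRenderBudget(callable $render, float $budgetMs = 3000.0): bool
{
    return measureRenderMs($render) <= $budgetMs;
}
```

A wall-clock guard like this is noisy on shared CI runners, so it is best treated as a coarse tripwire; the measured value can still be recorded in `artifacts/verification.md` as the documented render timing.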
## Project Structure

### Documentation (this feature)

```text
specs/391-operations-hub-stability-debug-safe-runtime/
├── spec.md
├── plan.md
├── tasks.md
├── checklists/
│   └── requirements.md
└── artifacts/
    ├── verification.md
    └── screenshots/
```

### Source Code (repository root)

Implementation is expected to remain in existing Laravel app and test paths:

```text
apps/platform/app/Filament/Pages/Monitoring/Operations.php
apps/platform/app/Filament/Resources/OperationRunResource.php
apps/platform/app/Models/OperationRun.php
apps/platform/app/Http/Middleware/SuppressDebugbarForSmokeRequests.php
apps/platform/app/Support/Filament/PanelThemeAsset.php
apps/platform/tests/Feature/Monitoring/
apps/platform/tests/Browser/
apps/platform/tests/Unit/Filament/
```

**Structure Decision**: Existing Laravel/Filament app structure under `apps/platform`; no new base folders and no migrations expected.

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| N/A | No constitution violation planned | N/A |

## Proportionality Review

- **Current operator problem**: A common Operations drilldown fails with timeout/500/debug page and productization browser validation is polluted by debug/runtime leakage.
- **Existing structure is insufficient because**: Existing route/tests did not catch environment-filtered render-path cost or productization-smoke runtime leakage.
- **Narrowest correct implementation**: Stabilize existing query/render path and add focused runtime leak assertions.
- **Ownership cost created**: Small targeted test/browser smoke upkeep.
- **Alternative intentionally rejected**: Increase timeout, hide route, broad catch-all, broad UI redesign, broad productization infrastructure rewrite.
- **Release truth**: Current productization blocker.

## Technical Approach

1. Reproduce or confirm BUG-001 in browser/Playwright or by targeted route render before editing.
2. Inspect the current render path:
   - `Operations::decisionWorkbench()`
   - `Operations::selectedWorkbenchOperation()`
   - `Operations::topOperationFromQuery()`
   - `Operations::summaryCount()`
   - `Operations::table()`
   - `OperationRunResource::table()`
   - OperationRun accessors/casts used by status/outcome/next-action/scope columns.
3. Identify the expensive path rather than masking it. Likely investigation areas:
   - `dashboardNeedsFollowUp()` and current terminal/actionability scopes.
   - `topOperationFromQuery()` fetching up to 50 full rows and sorting with `requiresOperatorReview()` / `problemClass()` in PHP.
   - Table columns invoking `actionDecision()`, `primaryActionUrl()`, `targetScopeDisplay()`, `history*Description()`, or badge renderers for every visible row.
   - `context`, `failure_summary`, and `summary_counts` JSON casts hydrated by `select *`.
   - Filter option queries for type/initiator scanning historical rows.
   - Relationship access for tenant/user/related artifacts.
4. Fix by bounding and scoping:
   - Apply workspace/environment entitlement in base queries.
   - Keep pagination and page-size profile.
   - Use selective eager loading only for relationships actually displayed.
   - Avoid full JSON hydration on index rows where possible.
   - Move heavy proof/diagnostic work to detail or collapsed/support surfaces.
   - Replace PHP sorting over hydrated runs with query-level ordering or a smaller deterministic candidate set when possible.
5. Add controlled states:
   - No-runs empty state for active scope.
   - Productization-safe non-debug failure assertions.
   - No false health claims.
6. Add productization-smoke path:
   - Prefer existing smoke-login and `SuppressDebugbarForSmokeRequests`.
   - Prefer existing `PanelThemeAsset` / built asset fallback behavior.
   - Fail on the exact BUG-009 signatures in smoke mode only.

## Data / Migration Implications

- No migrations are expected.
- If an index becomes necessary to meet the render budget, stop and update `spec.md` and `plan.md` with the proven query plan, migration safety, rollback/forward notes, and PostgreSQL lane coverage before implementing the migration.

## Rollout Considerations

- No environment variables are expected unless implementation proves a narrow productization-smoke-only flag is needed.
- No queue, scheduler, storage, or provider credential changes.
- Normal local Debugbar/Vite developer workflow must remain unchanged outside explicit productization-smoke sessions.
- Deployment asset strategy remains normal Filament/Vite deployment; if assets are registered or changed, include `cd apps/platform && php artisan filament:assets` in deploy notes.

## Risk Controls

- Do not change OperationRun lifecycle/status/outcome semantics.
- Do not add new operation types or summary-count keys.
- Do not add unscoped cache.
- Do not call Graph or remote provider clients from render.
- Do not dispatch provider/restore/export jobs.
- Do not rewrite completed Operations productization specs.
- Use browser as final source of truth for route status/runtime leakage.

## Implementation Phases

### Phase 1 - Baseline and focused regression tests

Confirm current failure or relevant logs, then add failing feature/browser tests around environment-filtered render, scoping, bounded rows, and runtime leakage.

### Phase 2 - Operations render-path stabilization

Optimize only the existing Operations query/table/workbench path. Preserve user-visible workbench semantics while eliminating unbounded scans, heavy per-row JSON/accessor work, and unrelated relationship traversal.

### Phase 3 - Controlled states and safe detail links

Ensure empty/error/loading states are clear and that safe OperationRun detail links still work for authorized records.
### Phase 4 - Productization-smoke runtime guardrail

Make the browser smoke fail on BUG-009 signatures in productization-smoke mode without breaking normal local development.

### Phase 5 - Verification and close-out

Run targeted tests, formatting checks, browser smoke, direct route verification, and complete `artifacts/verification.md`.
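
The Phase 4 guardrail could be backed by a small, test-local scanner over rendered HTML. The sketch below is a minimal plain-PHP illustration under stated assumptions: `SMOKE_LEAK_SIGNATURES` and `findSmokeLeaks()` are hypothetical names, and the four markers are common Debugbar/Vite/stack-trace strings chosen for illustration, not the authoritative BUG-009 signature list, which should come from the audit.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical leak markers for productization-smoke mode.
 * Assumption: the real BUG-009 signatures are defined by the audit.
 */
const SMOKE_LEAK_SIGNATURES = [
    'debugbar_route'  => '/_debugbar/',   // Laravel Debugbar asset/handler routes
    'debugbar_widget' => 'phpdebugbar',   // Debugbar DOM class / JS global (case-insensitive)
    'vite_dev_client' => '/@vite/client', // Vite dev-server client script
    'stack_trace'     => 'Stack trace:',  // raw PHP exception output
];

/**
 * Returns the name of every signature found in the rendered HTML,
 * in the order the signatures are declared.
 *
 * @return list<string>
 */
function findSmokeLeaks(string $html): array
{
    $found = [];

    foreach (SMOKE_LEAK_SIGNATURES as $name => $needle) {
        // Case-insensitive so "PhpDebugBar" and "phpdebugbar" both match.
        if (stripos($html, $needle) !== false) {
            $found[] = $name;
        }
    }

    return $found;
}
```

In a browser smoke test the scanner would presumably receive the page source captured after login and fail the run when the result is non-empty. Keeping it opt-in and test-local matches the bounded-deviation rule that normal local Debugbar/Vite behavior stays untouched.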