TenantAtlas/specs/391-operations-hub-stability-debug-safe-runtime/tasks.md
ahmido 40b866604a feat: add operations hub stability and safety runtime checks (#462)
Automated PR created by Codex via Gitea API.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #462
2026-06-20 14:16:20 +00:00


# Tasks: Spec 391 - Operations Hub Stability and Debug-Safe Runtime

**Input**: Design documents from `/specs/391-operations-hub-stability-debug-safe-runtime/`

**Prerequisites**: `plan.md`, `spec.md`

**Tests**: Required. Use Pest 4 feature/Livewire/browser coverage. No seeders, provider syncs, restore execution, exports, deletes, archives, force-deletes, notifications, or customer-facing delivery actions.

## Test Governance Checklist
- [x] Lane assignment is named and is the narrowest sufficient proof for the changed behavior.
- [x] New or changed tests stay in the smallest honest family, and the browser addition is explicit.
- [x] Shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default; any widening is isolated or documented.
- [x] Planned validation commands cover the change without pulling in unrelated lane cost.
- [x] The declared surface test profile (`monitoring-state-page` plus `global-context-shell`) is explicit.
- [x] Any material budget, baseline, trend, or escalation note is recorded in the active spec or PR.
## Phase 1: Setup and Safety Boundary
- [x] T001 Record initial `git status --short`, current branch, and latest commit in `specs/391-operations-hub-stability-debug-safe-runtime/artifacts/verification.md`.
- [x] T002 Re-read `specs/391-operations-hub-stability-debug-safe-runtime/spec.md`, `plan.md`, `tasks.md`, `specs/browser-productization-bug-audit/browser-bug-report.md`, and completed context-only Specs 328, 361, 362, 364, 367, and 377 before editing runtime code.
- [x] T003 Confirm the implementation scope excludes Evidence, Provider, Review Pack, Restore, dashboard semantics, provider mutations, restore jobs, exports, deletes, archives, force-deletes, notifications, customer-facing delivery actions, migrations, seeders, and `max_execution_time` changes.
- [x] T004 Confirm Filament v5 / Livewire v4.0+ compliance and no Livewire v3/Filament legacy API use in touched code.
- [x] T005 Confirm panel provider registration remains `apps/platform/bootstrap/providers.php` and no panel provider path changes are required.
- [x] T006 Confirm `OperationRunResource` remains non-globally-searchable, or update this spec before changing global-search posture.
- [x] T007 Confirm no new persisted entity, migration, enum/status family, operation type, summary-count key, or domain abstraction is needed; if one appears necessary, stop and update `spec.md` and `plan.md` first.
## Phase 2: Reproduce and Locate Root Cause
- [x] T008 Reproduce or confirm BUG-001 in the browser (Playwright) or with a targeted route request for `/admin/workspaces/3/operations?environment_id=4`, recording HTTP status, elapsed time, and visible/debug output in `artifacts/verification.md`.
- [x] T009 Inspect the latest Laravel error/log context for the audited max-execution failure without mutating data; record whether `HasAttributes.php:1577` still appears.
- [x] T010 Inspect `apps/platform/app/Filament/Pages/Monitoring/Operations.php` render methods, especially `decisionWorkbench()`, `selectedWorkbenchOperation()`, `topOperationFromQuery()`, `summaryCount()`, `table()`, `scopedSummaryQuery()`, filter handling, and environment entitlement helpers.
- [x] T011 Inspect `apps/platform/app/Filament/Resources/OperationRunResource.php` table columns, filters, actions, URL builders, status/outcome descriptions, target-scope helpers, and any helpers used per visible row.
- [x] T012 Inspect `apps/platform/app/Models/OperationRun.php` accessors/casts used by the list and workbench, including `context`, `failure_summary`, `summary_counts`, `problemClass()`, `freshnessState()`, `requiresOperatorReview()`, and actionability-related helpers.
- [x] T013 Identify whether the render cost comes from unbounded row hydration, query option scans, relationship N+1, JSON casts/accessors, PHP sorting over hydrated rows, actionability/freshness evaluation, or table column/action helper work; record the confirmed root cause in `artifacts/verification.md`.
## Phase 3: Automated Regression Tests First
- [x] T014 Add `apps/platform/tests/Feature/Monitoring/Spec391OperationsHubRendersWithEnvironmentFilterTest.php` proving an authenticated admin can open the Operations route with an entitled environment filter, receives a successful response, sees Operations title/context/table or empty state, and does not see Laravel debug-page, stack-trace, or `Maximum execution time` text.
- [x] T015 Add a test in the same feature file proving the environment filter remains scoped: rows/counts/filter context for another environment or workspace do not appear, and non-entitled environment filters fail closed according to the existing 404/filter-discard contract.
- [x] T016 Add a test proving dashboard/workspace links that target Operations with `environment_id` produce the canonical Operations URL and the target route renders.
- [x] T017 Add `apps/platform/tests/Feature/Monitoring/Spec391OperationRunResourceIndexPerformanceTest.php` with more operation runs than a table page and large `context`/`failure_summary` payloads, asserting the index remains bounded and does not require unbounded rows to render.
- [x] T018 Add or extend a no-Graph render guard proving Operations index/workbench rendering never invokes `GraphClientInterface` or provider clients.
- [x] T019 Add a focused empty-state test proving that an entitled environment with no runs displays controlled copy and no false health claim.
- [x] T020 Add a loading-state/context test where feasible, or a browser assertion, proving the Operations route preserves the active workspace/environment filter and does not flash raw framework/debug output while loading.
- [x] T021 Add a safe detail-link test proving at least one authorized row still opens the tenantless OperationRun detail route.
- [x] T022 If a smoke/runtime helper is introduced, add a Unit or Feature test proving it is opt-in and does not disable Debugbar/Vite behavior for normal local requests.
## Phase 4: Browser/Productization Smoke Tests
- [x] T023 Add `apps/platform/tests/Browser/Spec391OperationsHubProductizationSmokeTest.php` using existing browser smoke-login/auth fixture patterns where possible.
- [x] T024 Make the browser test discover or create a safe workspace/environment fixture instead of hardcoding ids, unless the audited workspace 3/environment 4 fixture is explicitly present and safe to use.
- [x] T025 Browser-smoke the authenticated route `/admin/workspaces/{workspace}/operations?environment_id={environment}` and assert the page renders successfully with `Operations`/`Operations Hub`, active environment context, and a bounded table or controlled empty state.
- [x] T026 Add a browser render-time guard targeting under 3 seconds after authentication for the audited local data shape; if too flaky for CI, keep browser timing recorded and rely on a deterministic lower-level render/query guard.
- [x] T027 Add browser assertions that no visible Laravel debug page, stack trace, `Maximum execution time`, `_debugbar`, `phpstorm://open`, raw source links, or debug exception text is visible in productization-smoke mode.
- [x] T028 Add browser console assertions that fail on missing Filament/Livewire/Alpine runtime globals needed by the route, including `filamentSchema is not defined`, `filamentSchemaComponent is not defined`, `filamentTable is not defined`, and `selectFormComponent is not defined`.
- [x] T029 Add browser network/console assertions that fail on Vite dev-client connection failures for `http://localhost:5173/@vite/client` when running in productization-smoke mode.
- [x] T030 Add browser network assertions that fail on Operations HTTP 500s and `_debugbar` requests in productization-smoke mode.
- [x] T031 Capture the final screenshot under `specs/391-operations-hub-stability-debug-safe-runtime/artifacts/screenshots/` or record why screenshot capture is unavailable.
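
The leakage checks in T027 through T030 can be read as one allow-nothing-but-listed classifier: collect page text, console messages, and requests, then fail only on the named signatures. The following is a minimal sketch in plain PHP under that reading. The text signatures are copied from the tasks above; the function name, the input shapes, and the violation labels are hypothetical and are not taken from the project's test helpers.

```php
<?php

declare(strict_types=1);

// Text signatures copied from T027 and T028. Everything else in this file
// (function name, input shapes, violation labels) is hypothetical.
const SPEC391_TEXT_SIGNATURES = [
    'Maximum execution time',
    '_debugbar',
    'phpstorm://open',
    'filamentSchema is not defined',
    'filamentSchemaComponent is not defined',
    'filamentTable is not defined',
    'selectFormComponent is not defined',
];

/**
 * @param list<string> $texts Visible page text and console messages.
 * @param list<array{url: string, status: ?int}> $requests A null status means the connection failed.
 * @return list<string> Distinct violation labels; empty when nothing leaked.
 */
function spec391LeakageViolations(array $texts, array $requests): array
{
    $violations = [];

    foreach ($texts as $text) {
        foreach (SPEC391_TEXT_SIGNATURES as $signature) {
            if (str_contains($text, $signature)) {
                $violations[] = 'text: ' . $signature;
            }
        }
    }

    foreach ($requests as $request) {
        $url = $request['url'];
        $status = $request['status'] ?? null;

        // T030: any Debugbar request counts as leakage.
        if (str_contains($url, '_debugbar')) {
            $violations[] = 'request: _debugbar';
        }

        // T029: the Vite dev client could not be reached.
        if ($status === null && str_contains($url, 'localhost:5173/@vite/client')) {
            $violations[] = 'request: vite dev client unreachable';
        }

        // T030: server errors on the Operations route.
        if ($status !== null && $status >= 500 && str_contains($url, '/operations')) {
            $violations[] = 'request: operations HTTP ' . $status;
        }
    }

    return array_values(array_unique($violations));
}
```

A browser test could assert that the returned list is empty. Because only listed signatures produce a violation, unrelated local warnings pass through untouched.
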
## Phase 5: Operations Render-Path Stabilization
- [x] T032 Update `apps/platform/app/Filament/Pages/Monitoring/Operations.php` so workspace and environment entitlement filters apply at the query level before list rows, summary counts, selected workbench operation, and filter state render.
- [x] T033 Keep the Operations table paginated with `TablePaginationProfiles::resource()` or a narrower documented equivalent.
- [x] T034 Bound `selectedWorkbenchOperation()` / `topOperationFromQuery()` so it does not hydrate unbounded rows or sort expensive accessor-derived state across large result sets.
- [x] T035 Replace or defer expensive per-row work in `OperationRunResource::table()` columns/actions; keep default list columns useful without parsing raw context/failure payloads for every visible row.
- [x] T036 Restrict eager loading to relationships actually rendered on the index (`tenant`, `user`, or narrower selected columns) and avoid N+1 relationship traversal for status/scope/next-action display.
- [x] T037 Avoid default index hydration/presentation of large JSON payloads (`context`, `failure_summary`, `summary_counts`) unless a visible column truly needs them; move heavy diagnostics to detail/collapsed support paths.
- [x] T038 Scope and bound filter option queries for type and initiator so they do not scan unrelated workspaces or unbounded historical rows during normal index render.
- [x] T039 Preserve existing OperationRun status/outcome/actionability semantics; do not change lifecycle truth to make the list faster.
- [x] T040 Preserve existing canonical detail/view links through `OperationRunLinks` and tenantless OperationRun viewer routes.
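
The bounding idea behind T034 can be illustrated without any framework code: walk a lazily produced sequence of lightweight rows, stop after a fixed number of them, and keep only the current best candidate instead of sorting a fully hydrated collection. This is a sketch of the general technique only. `pickTopRow`, `fakeRows`, and the `id`/`priority` keys are hypothetical stand-ins, not real `OperationRun` attributes or the project's implementation, and in the real page the bound would more likely be an ordered, limited database query.

```php
<?php

declare(strict_types=1);

/**
 * Return the highest-priority row while inspecting at most $maxInspected rows.
 * Rows are plain arrays of already-selected scalar columns (hypothetical shape).
 *
 * @param iterable<array{id: int, priority: int}> $rows
 * @return array{id: int, priority: int}|null
 */
function pickTopRow(iterable $rows, int $maxInspected): ?array
{
    if ($maxInspected < 1) {
        return null;
    }

    $best = null;
    $inspected = 0;

    foreach ($rows as $row) {
        // Strict comparison keeps the earliest row when priorities tie.
        if ($best === null || $row['priority'] > $best['priority']) {
            $best = $row;
        }

        // Hard cap: stop before pulling another row from the source.
        if (++$inspected >= $maxInspected) {
            break;
        }
    }

    return $best;
}

/** Lazily yields $count synthetic rows so nothing is materialised up front. */
function fakeRows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'priority' => $i % 7];
    }
}

$top = pickTopRow(fakeRows(1_000_000), 50);
```

Even though the generator could produce a million rows, only the first 50 are ever created, so memory and time stay constant with respect to history size.
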
## Phase 6: Controlled States and Runtime Smoke Mode
- [x] T041 Ensure the Operations empty state is specific to the active workspace/environment scope, customer-ready, and avoids false health claims.
- [x] T042 Ensure loading behavior preserves the active workspace/environment filter and does not expose framework/debug output.
- [x] T043 Add a controlled display-only error/notice state only if implementation proves one is appropriate; do not use a catch-all to hide the expensive path or raw exceptions.
- [x] T044 Reuse `App\Http\Middleware\SuppressDebugbarForSmokeRequests` for smoke-cookie/session suppression where possible.
- [x] T045 Reuse or extend `App\Support\Filament\PanelThemeAsset` behavior so productization-smoke mode can run without requiring the Vite dev client when built assets are available.
- [x] T046 If a new env/config flag is required, name it narrowly for productization/browser smoke, document it in this spec's verification artifact, and ensure normal local developer Debugbar/Vite workflow remains unchanged.
- [x] T047 Ensure productization-smoke assertions do not fail on arbitrary local warnings; fail only on the explicit runtime/debug leakage signatures from this spec.
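
The opt-in requirement in T022 and T046 amounts to a two-key decision: debug tooling is suppressed only when a narrowly named flag is enabled and the request itself is marked as a smoke request. A minimal sketch of that decision, assuming a cookie-based marker, follows. The cookie name, the function name, and the parameter shapes are hypothetical; the actual behaviour of `SuppressDebugbarForSmokeRequests` is not described in this document and may differ.

```php
<?php

declare(strict_types=1);

// Hypothetical cookie name, chosen only for this illustration.
const SMOKE_COOKIE = 'productization_smoke';

/**
 * Decide whether Debugbar and the Vite dev client should be suppressed.
 *
 * @param bool $smokeFlagEnabled Value of a narrowly named config flag (assumed).
 * @param array<string, string> $cookies Request cookies.
 */
function shouldSuppressDebugTooling(bool $smokeFlagEnabled, array $cookies): bool
{
    // Both conditions are required, so a normal local request (no cookie)
    // keeps Debugbar and Vite even when the flag is switched on.
    return $smokeFlagEnabled && ($cookies[SMOKE_COOKIE] ?? null) === '1';
}
```

Requiring both keys means that neither a stray cookie nor a forgotten flag alone changes the normal local developer workflow.
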
## Phase 7: Validation and Formatting
- [x] T048 Run targeted feature tests for Spec 391 render/scoping/bounded behavior.
- [x] T049 Run targeted browser smoke for Spec 391.
- [x] T050 Run targeted formatting for touched PHP files with `php vendor/bin/pint --test <touched php files>` or the project-equivalent narrow formatting command.
- [x] T051 Run `git diff --check` from the repository root.
- [x] T052 Open the Operations route in the browser after implementation and record route, HTTP status, render time, page title/header, table/empty state, workspace/environment context, console errors, network errors, absence of debug page, and absence of Debugbar/source-link leakage in `artifacts/verification.md`.
- [x] T053 Confirm in `artifacts/verification.md` that no provider mutations, restore jobs, exports, deletes, archives, force-deletes, notifications, customer-facing delivery actions, migrations, seeders, or destructive commands were executed.
- [x] T054 Record final `git status --short`, intentionally changed files, pre-existing unrelated dirty files if any, and known limitations in `artifacts/verification.md`.
## Non-Tasks / Guardrails
- [x] NT001 Do not increase PHP `max_execution_time`.
- [x] NT002 Do not hide or remove the Operations route or links.
- [x] NT003 Do not mask the error with a generic catch-all while leaving the expensive render path intact.
- [x] NT004 Do not change Evidence, Provider, Review Pack, Restore, dashboard, or customer-facing artifact semantics.
- [x] NT005 Do not run provider syncs, provider mutations, restore jobs, exports, deletes, archives, force-deletes, seeders, or destructive commands.
- [x] NT006 Do not add migrations unless spec/plan are updated first with proof.
- [x] NT007 Do not add new OperationRun types, statuses, outcomes, summary-count keys, lifecycle semantics, or unscoped caching.
- [x] NT008 Do not rewrite or normalize completed Operations/productization specs.