TenantAtlas/specs/170-system-operations-surface-alignment/research.md
ahmido fdd3a85b64 feat: align system operations surfaces (#201)
## Summary
- align the system-panel Operations, Failed operations, and Stuck operations pages to the read-only registry contract by removing inline row triage and keeping row-click inspection
- keep retry, cancel, and mark-investigated behavior on the canonical system operation detail page while adding the explicit `Show all operations` return path and updated `Operations / Operation` copy
- add and update focused Pest and Livewire coverage for list CTA behavior, detail-owned triage, and view-only versus manage-capable platform access
- add Spec 170 implementation artifacts plus the follow-on Spec 171 and Spec 172 packages

## Testing
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsTriageActionsTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/Guards/ActionSurfaceContractTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsFailuresViewTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsStuckViewTest.php`
- integrated browser smoke on `/system/ops/runs`, `/system/ops/failures`, `/system/ops/stuck`, empty states via search filter, and detail-page retry confirmation visibility

## Notes
- branch pushed from `170-system-operations-surface-alignment`
- latest commit: `64b4d741 feat: align system operations surfaces`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #201
2026-03-30 19:08:56 +00:00

5.3 KiB

Research: System Operations Surface Alignment

Decision 1: Keep the three system list pages as read-only registry surfaces

  • Decision: Runs, Failures, and Stuck stay classified as ReadOnlyRegistryReport surfaces with recordUrl() full-row navigation as the only primary inspect model.
  • Rationale: The pages already declare ActionSurfaceType::ReadOnlyRegistryReport, already use clickable rows, and the constitution requires exactly one primary inspect/open model for registry surfaces. Inline triage on these lists is the specific behavior drift the spec is correcting.
  • Alternatives considered: Reclassify the lists as queue/review surfaces. Rejected because the pages are scan-first status registers split by run state, not in-place decision queues.

Decision 2: Keep triage ownership on the existing ViewRun page

  • Decision: Retry, Cancel, and Mark investigated remain on app/Filament/System/Pages/Ops/ViewRun.php and are removed from the list rows.
  • Rationale: The detail page already exposes all three actions, already loads tenant and workspace, and already provides the surrounding operational context the operator needs before acting. Repo analysis of the operation-run detail surface confirms that the page is already the rich context owner for run truth and next steps.
  • Alternatives considered: Add a new dedicated triage modal or keep one “safe” triage action inline on the lists. Rejected because both options preserve split ownership and weaken the single-entry operational model.
  • Decision: OperationRunTriageService and SystemOperationRunLinks remain the canonical implementation path for retry/cancel/investigate and detail navigation.
  • Rationale: The service already encapsulates retryability/cancelability rules, audit logging, OperationRunService lifecycle transitions, and triage context writes. This spec changes action placement, not domain behavior.
  • Alternatives considered: Introduce a new presenter, coordinator, or list-specific triage wrapper. Rejected under ABSTR-001 and LAYER-001 because there is only one existing triage behavior and the current service already fits it.

Decision 4: Move behavior guards from “direct row triage” to “detail-owned triage”

  • Decision: Update tests/Feature/Guards/ActionSurfaceContractTest.php and tests/Feature/System/Spec114/OpsTriageActionsTest.php so they assert row-click-only lists and detail-header triage ownership.
  • Rationale: The constitution explicitly prioritizes rendered behavior over declaration. The current guard suite still encodes the old hybrid interaction model by expecting retry, cancel, and mark_investigated on all three list pages.
  • Alternatives considered: Leave the existing guards untouched and only rely on declarations, or delete the guards. Rejected because that would preserve declaration-only conformance and remove the regression protection this slice is supposed to strengthen.

Decision 5: Absorb visible Operations naming into Spec 170 while keeping route stability explicit

  • Decision: The changed system surfaces standardize their visible collection and singular nouns to Operations and Operation in navigation labels, headings, empty-state copy, and return links, while existing internal class names and /system/ops/runs route paths may remain stable for compatibility.
  • Rationale: The constitution treats Operations as the canonical collection noun for run records. Leaving the changed system surfaces on Runs would keep the spec itself in conflict with the constitution.
  • Alternatives considered: Defer naming to Spec 171. Rejected because it leaves a known constitution conflict inside the active slice.

Decision 6: Give every changed list surface one explicit CTA and the detail page one explicit return path

  • Decision: The Operations list uses Go to runbooks as its single header and empty-state CTA. Failed operations and Stuck operations use Show all operations as their single header and empty-state CTA. The detail page exposes Show all operations as the canonical return path and keeps Go to runbooks as secondary navigation.
  • Rationale: The constitution requires changed list surfaces to define explicit empty-state CTA behavior and requires a clear canonical navigation model. These CTAs stay navigation-only and do not reintroduce inline row clutter.
  • Alternatives considered: Keep explanation-only empty states or add multiple list CTAs. Rejected because the first violates the constitution and the second weakens scanability.

Decision 7: No new API surface; document the route/UI contract explicitly

  • Decision: Capture the expected route and interaction contract in a small OpenAPI-style YAML file under contracts/ even though the feature does not add a public JSON API.
  • Rationale: The implementation change is route-driven and operator-visible. A route contract that records surface type, inspect affordance, action placement, and capability boundaries gives later tasks and reviews a concrete artifact without inventing a new backend API.
  • Alternatives considered: Skip contracts/ entirely. Rejected because the planning workflow expects a concrete contract artifact, and this feature benefits from a machine-readable record of its operator-visible contract.