TenantAtlas/specs/170-system-operations-surface-alignment/spec.md
ahmido fdd3a85b64 feat: align system operations surfaces (#201)
## Summary
- align the system-panel Operations, Failed operations, and Stuck operations pages to the read-only registry contract by removing inline row triage and keeping row-click inspection
- keep retry, cancel, and mark-investigated behavior on the canonical system operation detail page while adding the explicit `Show all operations` return path and updated `Operations / Operation` copy
- add and update focused Pest and Livewire coverage for list CTA behavior, detail-owned triage, and view-only versus manage-capable platform access
- add Spec 170 implementation artifacts plus the follow-on Spec 171 and Spec 172 packages

## Testing
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsTriageActionsTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/Guards/ActionSurfaceContractTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsFailuresViewTest.php`
- `vendor/bin/sail artisan test --compact tests/Feature/System/Spec114/OpsStuckViewTest.php`
- integrated browser smoke on `/system/ops/runs`, `/system/ops/failures`, `/system/ops/stuck`, empty states via search filter, and detail-page retry confirmation visibility

## Notes
- branch pushed from `170-system-operations-surface-alignment`
- latest commit: `64b4d741 feat: align system operations surfaces`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #201
2026-03-30 19:08:56 +00:00

Feature Specification: System Operations Surface Alignment

Feature Branch: 170-system-operations-surface-alignment
Created: 2026-03-30
Status: Draft
Input: User description: "System Operations Surface Alignment"

Spec Scope Fields (mandatory)

  • Scope: platform
  • Primary Routes:
    • /system/ops/runs
    • /system/ops/failures
    • /system/ops/stuck
    • /system/ops/runs/{run}
  • Data Ownership:
    • No new platform-owned, workspace-owned, or tenant-owned records are introduced
    • Existing OperationRun records remain the only source of truth for system operations list and detail surfaces
    • This feature changes only operator-facing interaction semantics for existing system operations pages
  • RBAC:
    • Platform plane only
    • Existing platform.operations.view and platform.operations.manage capability boundaries remain authoritative
    • Users without system access remain deny-as-not-found by existing platform routing and auth guards
    • Users with view capability but without manage capability remain able to inspect runs but unable to execute triage actions
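The capability boundaries above can be sketched as a small decision function. This is an illustrative model only, not the platform's actual guard code: `resolve_access`, `Access`, and the `has_system_access` flag are hypothetical names, and treating a missing view capability the same as missing system access is an assumption the spec does not state.

```python
from dataclasses import dataclass

# Capability names as written in the RBAC bullets above.
VIEW = "platform.operations.view"
MANAGE = "platform.operations.manage"


@dataclass(frozen=True)
class Access:
    http_status: int   # 404 models deny-as-not-found
    can_inspect: bool  # open list rows and detail pages
    can_triage: bool   # retry, cancel, mark investigated


def resolve_access(has_system_access: bool, capabilities: set) -> Access:
    """Hypothetical helper: map a user's capabilities to surface access."""
    # Assumption: lacking the view capability is handled like lacking
    # system access. The spec only describes the latter case.
    if not has_system_access or VIEW not in capabilities:
        return Access(http_status=404, can_inspect=False, can_triage=False)
    # View-only users inspect; triage additionally requires manage.
    return Access(
        http_status=200,
        can_inspect=True,
        can_triage=MANAGE in capabilities,
    )
```

Under this sketch, a view-only user resolves to `can_inspect=True, can_triage=False`, which matches the last RBAC bullet.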

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

| Surface | Surface Type | Primary Inspect/Open Model | Row Click | Secondary Actions Placement | Destructive Actions Placement | Canonical Collection Route | Canonical Detail Route | Scope Signals | Canonical Noun | Critical Truth Visible by Default | Exception Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| System operations list | Read-only Registry / Report | Full-row click to system operation detail | required | header CTA only | none on the list | /system/ops/runs | /system/ops/runs/{run} | Platform scope only | Operations / Operation | status, outcome, operation type, workspace, tenant, recency | none |
| System failed operations list | Read-only Registry / Report | Full-row click to system operation detail | required | header CTA only | none on the list | /system/ops/failures | /system/ops/runs/{run} | Platform scope only | Operations / Operation | failed outcome, operation type, workspace, tenant, recency | none |
| System stuck operations list | Read-only Registry / Report | Full-row click to system operation detail | required | header CTA only | none on the list | /system/ops/stuck | /system/ops/runs/{run} | Platform scope only | Operations / Operation | queued or running stale state, operation type, workspace, tenant, recency | none |
| System operation detail | Detail-first Operational Surface | Dedicated detail page | forbidden | detail header groups only | detail header only | /system/ops/runs | /system/ops/runs/{run} | Platform scope only | Operations / Operation | operation truth, failure cause, context, related navigation, next actions | none |

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| System operations list | Platform operator | Read-only Registry / Report | Which operation should I open next? | status, outcome, operation label, workspace, tenant, initiator, activity time | raw payloads and deeper traces stay on detail | execution outcome, recency | Read-only list | Open operation, go to runbooks | none |
| System failed operations list | Platform operator | Read-only Registry / Report | Which failed operation needs investigation first? | failed outcome, operation label, workspace, tenant, activity time | raw payloads and deeper traces stay on detail | execution outcome, failure state, recency | Read-only list | Open operation, show all operations | none |
| System stuck operations list | Platform operator | Read-only Registry / Report | Which queued or running operation has crossed the stuck threshold? | stuck class, operation label, workspace, tenant, activity time | raw payloads and deeper traces stay on detail | lifecycle stall state, recency | Read-only list | Open operation, show all operations | none |
| System operation detail | Platform operator | Detail-first Operational Surface | What happened on this operation, and what follow-up is appropriate? | operation identity, status, outcome, related scope, dominant failure or stall context, related links | low-level payloads, internal traces, and extended diagnostics | execution outcome, lifecycle state, operability context | Existing platform triage only | Show all operations, go to runbooks, retry operation, mark investigated | Cancel operation when still cancellable |

Proportionality Review (mandatory when structural complexity is introduced)

  • New source of truth?: No
  • New persisted entity/table/artifact?: No
  • New abstraction?: No
  • New enum/state/reason family?: No
  • New cross-domain UI framework/taxonomy?: No
  • Current operator problem: The three system operations list pages currently behave like scan-first registry surfaces but also expose direct triage actions, leave empty states underdefined, and mix Operations and Runs naming. Together these duplicate or dilute the canonical detail model.
  • Existing structure is insufficient because: The current list surfaces split triage ownership between list and detail, which makes the lists behave like mini control centers instead of scan-first registries.
  • Narrowest correct implementation: Keep the existing system lists and the existing system operation detail page, but move triage ownership fully onto the detail page, align visible naming to Operations / Operation, and give each list one clear navigation CTA without changing persistence or introducing new surfaces.
  • Ownership cost: Existing list and guard tests need to be updated, and operators lose direct row-level triage from the lists in exchange for one consistent detail-first follow-up model.
  • Alternative intentionally rejected: Reclassifying the system lists into a queue or review surface was rejected because the pages are scan-first registries split by status family, not context-preserving queue workflows.
  • Release truth: Current-release truth. The repo already contains the canonical detail page and the duplicated list triage actions; this slice removes the duplication rather than adding new capability.

User Scenarios & Testing (mandatory)

User Story 1 - Scan Lists Without Competing Actions (Priority: P1)

As a platform operator, I want the Operations, Failed operations, and Stuck operations lists to behave as scan-first registries with one obvious open path and one clear navigation CTA, so that I can inspect the right operation without row-level action clutter.

Why this priority: This resolves the direct constitution violation on the current system operations surfaces.

Independent Test: Can be fully tested by loading each system operations list and asserting that rows remain clickable, row-level triage actions are absent, visible labels use Operations / Operation vocabulary, and each list exposes the expected primary CTA in header and empty state.

Acceptance Scenarios:

  1. Given a platform operator opens the system Operations list, When the table renders, Then each row opens the system operation detail page through row click, exposes no row-level triage actions, and keeps Go to runbooks as the single header and empty-state CTA.
  2. Given a platform operator opens the system Failed operations list, When the table renders, Then the row remains clickable, the list does not expose retry, cancel, or investigate actions inline, and Show all operations is the single header and empty-state CTA.
  3. Given a platform operator opens the system Stuck operations list, When the table renders, Then the row remains clickable, the list does not expose retry, cancel, or investigate actions inline, and Show all operations is the single header and empty-state CTA.

User Story 2 - Perform Triage From The Canonical Detail Page (Priority: P1)

As a platform operator with manage capability, I want system operation triage to live on the canonical operation detail page, so that every follow-up action happens in the surface that already owns full context and keeps a clear return path to all operations.

Why this priority: The lists can only become constitution-compliant if the detail page becomes the single triage destination.

Independent Test: Can be fully tested by opening a system operation detail page as a manage-capable operator and asserting that retry, cancel, and mark investigated remain available there with the same audit and queued-run behavior, while Show all operations remains available as the return path.

Acceptance Scenarios:

  1. Given a failed operation and a manage-capable platform operator, When the operator opens the system operation detail page, Then retry remains available on the detail header and still queues a replacement operation.
  2. Given a cancellable operation and a manage-capable platform operator, When the operator opens the system operation detail page, Then cancel remains available on the detail header and still requires confirmation.
  3. Given an operation that needs its investigation documented and a manage-capable platform operator, When the operator opens the system operation detail page, Then mark investigated remains available there, still records the investigation action, and the page keeps Show all operations as the canonical return link.
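The three scenarios above describe which detail-header actions appear for a given operation. A minimal sketch of that decision follows, under stated assumptions: `detail_header_actions` is a hypothetical function, the `"failed"` outcome string is illustrative, and offering Mark investigated to every manage-capable operator is an assumption, since the spec only says it remains available when legitimately applicable.

```python
def detail_header_actions(outcome: str, cancellable: bool, can_manage: bool) -> list:
    """Hypothetical model of the detail-page header for one operation."""
    # Navigation is available to every operator who can open the page.
    actions = ["Show all operations", "Go to runbooks"]
    if not can_manage:
        # View-only operators never see triage controls.
        return actions
    if outcome == "failed":
        # Retry queues a replacement operation for a failed one.
        actions.append("Retry")
    if cancellable:
        # Cancel is offered only while the operation is still cancellable.
        actions.append("Cancel")
    # Assumption: Mark investigated is offered to all manage-capable operators.
    actions.append("Mark investigated")
    return actions
```

For example, a failed and no longer cancellable operation viewed by a manage-capable operator would yield the two navigation actions plus Retry and Mark investigated, while the same operation viewed by a view-only operator would yield navigation only.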

User Story 3 - Preserve View-Only Access Semantics (Priority: P2)

As a platform operator with view-only access, I want to inspect system operations without being offered triage controls, so that the platform plane stays capability-correct while the aligned surfaces remain usable.

Why this priority: Surface alignment must not weaken the existing view/manage separation.

Independent Test: Can be fully tested by rendering list and detail surfaces for a view-only system user and asserting that inspection and navigation remain available while triage actions remain hidden.

Acceptance Scenarios:

  1. Given a platform user with operations view but not operations manage, When the user opens any system operations list, Then the user can inspect rows and use the page CTA but sees no triage actions there.
  2. Given a platform user with operations view but not operations manage, When the user opens a system operation detail page, Then retry, cancel, and mark investigated remain hidden while Show all operations and Go to runbooks remain available.

Edge Cases

  • A failed operation appears on both the all-operations list and the failed-operations list; both surfaces must expose the same single open model and must not diverge in row actions or naming.
  • A queued or running operation later becomes cancellable or non-cancellable; the list remains read-only while the detail page resolves whether cancel is available.
  • The system operation detail page must keep a clear return path to the canonical Operations list even when opened from Failed operations or Stuck operations.
  • Empty Operations, Failed operations, and Stuck operations states must remain explanation-first while still exposing exactly one primary CTA that matches the page contract.
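The first and third edge cases reduce to two routing invariants: every list opens the same detail route, and the detail page always returns to the canonical collection route. A small sketch, assuming hypothetical helpers `detail_url` and `return_url` and an integer run identifier (the spec only shows the `{run}` placeholder):

```python
# Routes as listed under Primary Routes.
COLLECTION_ROUTE = "/system/ops/runs"
LIST_ROUTES = ("/system/ops/runs", "/system/ops/failures", "/system/ops/stuck")


def detail_url(run_id: int) -> str:
    """Single open model: the detail URL does not depend on the source list."""
    return f"{COLLECTION_ROUTE}/{run_id}"


def return_url(opened_from: str) -> str:
    """Show all operations target for a detail page opened from a list."""
    if opened_from not in LIST_ROUTES:
        raise ValueError(f"unknown list route: {opened_from}")
    # The origin is deliberately ignored: the return path is always the
    # canonical Operations list, even from Failed or Stuck operations.
    return COLLECTION_ROUTE
```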

Requirements (mandatory)

Constitution alignment (required): This feature introduces no Microsoft Graph calls, no new write workflow, and no new long-running work. Existing OperationRun and triage services remain the underlying execution model. The feature only realigns where platform operators inspect and triage existing system runs.

Constitution alignment (PROP-001 / ABSTR-001 / PERSIST-001 / STATE-001 / BLOAT-001): This feature adds no new structure, persistence, abstraction, or state family. It reduces surface complexity by removing duplicated triage ownership from the lists.

Constitution alignment (OPS-UX): Existing retry and cancel flows continue to reuse the current queued-run UX, terminal notification rules, and service-owned OperationRun lifecycle. This feature does not introduce a new run type or change lifecycle ownership.

Constitution alignment (RBAC-UX): This feature stays in the platform plane only. Existing OPERATIONS_VIEW and OPERATIONS_MANAGE capability checks remain the server-side source of truth. List alignment must not weaken the current distinction between view-only inspection and manage-capable triage.

Constitution alignment (OPS-EX-AUTH-001): Not applicable.

Constitution alignment (BADGE-001): Existing status and outcome badge semantics remain unchanged and centralized.

Constitution alignment (UI-FIL-001): The feature continues to use Filament-native tables, row navigation, header actions, confirmation modals, and notifications. No custom local action framework or styling language is introduced.

Constitution alignment (UI-NAMING-001): This slice standardizes the changed system-plane surfaces to the canonical visible nouns Operations and Operation. Existing internal PHP class names and route paths may remain stable, but operator-facing labels, headings, and return links MUST stop presenting Runs as the primary noun.

Constitution alignment (UI-CONST-001 / UI-SURF-001 / UI-HARD-001 / UI-EX-001 / UI-REVIEW-001): The system Operations, Failed operations, and Stuck operations pages MUST align to the Read-only Registry / Report surface rules: one-click row open, no competing inline triage controls, and no destructive actions on the list rows. The system operation detail page MUST remain the sole triage surface for retry, cancel, and mark investigated.

Constitution alignment (OPSURF-001): The lists remain operator-first scan surfaces that show only the truth needed to choose the next run to inspect. Full follow-up context and triage remain on the detail page, where diagnostic depth already exists.

Constitution alignment (UI-SEM-001 / LAYER-001 / TEST-TRUTH-001): This feature does not add a new semantic layer. It removes duplicated action ownership and keeps tests focused on operator-visible behavior: list inspect model, detail triage ownership, and manage-vs-view capability behavior.

Constitution alignment (Filament Action Surfaces): The Action Surface Contract is satisfied when Operations, Failed operations, and Stuck operations expose row click only, keep bulk actions absent because there is explicitly no bulk need, provide one header and empty-state CTA each, and leave retry, cancel, and mark investigated to the detail header. No new exemption is introduced.

Constitution alignment (UX-001 - Layout & Information Architecture): List layouts remain scannable registry tables. The system operation detail page remains the operational detail surface and continues to own richer follow-up actions. No create or edit layout changes are introduced.

Functional Requirements

  • FR-170-001: The system Operations list MUST behave as a read-only registry surface with one primary inspect model: full-row click to the canonical system operation detail page.
  • FR-170-002: The system Failed operations list MUST behave as a read-only registry surface with one primary inspect model: full-row click to the canonical system operation detail page.
  • FR-170-003: The system Stuck operations list MUST behave as a read-only registry surface with one primary inspect model: full-row click to the canonical system operation detail page.
  • FR-170-004: The Operations, Failed operations, and Stuck operations lists MUST NOT render retry, cancel, or mark investigated as row actions.
  • FR-170-005: The canonical system operation detail page MUST remain the only system-plane surface that exposes retry, cancel, and mark investigated actions.
  • FR-170-006: Retry on the system operation detail page MUST preserve the current queued-run feedback behavior and the current link back to the newly queued operation.
  • FR-170-007: Cancel on the system operation detail page MUST remain confirmation-gated and MUST stay available only when the current operation is still cancellable.
  • FR-170-008: Mark investigated on the system operation detail page MUST remain confirmation-gated and MUST continue to require an operator-supplied reason.
  • FR-170-009: View-only platform operators MUST remain able to open list rows and detail pages while triage actions remain hidden.
  • FR-170-010: Manage-capable platform operators MUST retain triage capability on the system operation detail page after list row actions are removed.
  • FR-170-011: Existing audit behavior for retry and mark investigated MUST remain unchanged.
  • FR-170-012: The system operation detail page MUST provide a clear Show all operations return path to the canonical collection route while preserving Go to runbooks navigation.
  • FR-170-013: This feature MUST NOT introduce a new page, a new system operations capability, or a new persisted artifact.
  • FR-170-014: Repository guard tests for the system operations surfaces MUST be updated so they assert row-click-only lists, list CTAs, canonical Operations / Operation naming, and detail-owned triage instead of direct row triage.
  • FR-170-015: The changed system surfaces MUST use Operations as the canonical visible collection noun and Operation as the canonical visible singular noun.
  • FR-170-016: Each changed system list surface MUST expose exactly one primary empty-state CTA and the corresponding header action when records exist.
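FR-170-007 and FR-170-008 describe gating rules that can be modeled as a pre-submit check. The sketch below is an assumption-laden illustration, not the behavior of the existing OperationRunTriageService: `validate_triage` and `TriageRejected` are hypothetical names, and retry is left out because the requirements above do not state its gating.

```python
class TriageRejected(Exception):
    """Raised when a triage request does not satisfy its gate."""


def validate_triage(action: str, *, confirmed: bool,
                    cancellable: bool = False, reason: str = "") -> None:
    """Hypothetical gate for cancel and mark investigated on the detail page."""
    if action not in ("cancel", "mark_investigated"):
        raise ValueError(f"unsupported action in this sketch: {action}")
    # Both actions are confirmation-gated (FR-170-007, FR-170-008).
    if not confirmed:
        raise TriageRejected("confirmation required")
    # Cancel stays available only while the operation is cancellable.
    if action == "cancel" and not cancellable:
        raise TriageRejected("operation is no longer cancellable")
    # Mark investigated requires an operator-supplied reason.
    if action == "mark_investigated" and not reason.strip():
        raise TriageRejected("a reason is required")
```

A call such as `validate_triage("mark_investigated", confirmed=True, reason="Upstream outage")` passes silently, while the same call with a blank reason raises `TriageRejected`.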

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| System operations list | app/Filament/System/Pages/Ops/Runs.php | Go to runbooks | recordUrl() full-row click | none | none | Go to runbooks | n/a | n/a | no new audit behavior | Read-only Registry / Report; visible label becomes Operations |
| System failed operations list | app/Filament/System/Pages/Ops/Failures.php | Show all operations | recordUrl() full-row click | none | none | Show all operations | n/a | n/a | no new audit behavior | Read-only Registry / Report; visible label becomes Failed operations |
| System stuck operations list | app/Filament/System/Pages/Ops/Stuck.php | Show all operations | recordUrl() full-row click | none | none | Show all operations | n/a | n/a | no new audit behavior | Read-only Registry / Report; visible label becomes Stuck operations |
| System operation detail | app/Filament/System/Pages/Ops/ViewRun.php, resources/views/filament/system/pages/ops/view-run.blade.php | Show all operations, Go to runbooks | n/a | n/a | n/a | n/a | Retry, Cancel, Mark investigated | n/a | existing retry and mark-investigated audit behavior remains | Detail-first operational owner of triage; heading and return link use Operation |
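The list rows of the matrix can be expressed as data and checked mechanically, in the spirit of the guard tests this spec asks for. This is a language-neutral sketch rather than the repository's Pest coverage: `ListSurface`, `violations`, and the violation messages are hypothetical.

```python
from dataclasses import dataclass

TRIAGE_ACTIONS = frozenset({"Retry", "Cancel", "Mark investigated"})


@dataclass(frozen=True)
class ListSurface:
    label: str
    header_actions: tuple
    row_actions: tuple
    bulk_actions: tuple
    empty_state_ctas: tuple


def violations(surface: ListSurface) -> list:
    """Return the contract rules a list surface breaks (empty means aligned)."""
    found = []
    if surface.row_actions:
        found.append("row actions present")
    if surface.bulk_actions:
        found.append("bulk actions present")
    if len(surface.header_actions) != 1:
        found.append("expected exactly one header CTA")
    if surface.empty_state_ctas != surface.header_actions:
        found.append("empty-state CTA differs from header CTA")
    if TRIAGE_ACTIONS & set(surface.header_actions):
        found.append("triage action in list header")
    return found


# The three list rows of the matrix above.
MATRIX = [
    ListSurface("Operations", ("Go to runbooks",), (), (), ("Go to runbooks",)),
    ListSurface("Failed operations", ("Show all operations",), (), (), ("Show all operations",)),
    ListSurface("Stuck operations", ("Show all operations",), (), (), ("Show all operations",)),
]
```

Every entry in `MATRIX` should produce no violations, whereas a list that still carried inline Retry or Cancel row actions would be flagged with `"row actions present"`.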

Key Entities (include if feature involves data)

  • System operations list surface: The system-panel Operations, Failed operations, and Stuck operations pages that scan existing OperationRun records.
  • System operation detail surface: The canonical system detail page for one selected OperationRun.
  • System operation triage action: Existing follow-up actions for retry, cancel, and mark investigated.

Success Criteria (mandatory)

  • SC-170-001: Automated coverage verifies that Operations, Failed operations, and Stuck operations each expose exactly one primary inspect model: row click to the canonical system operation detail page.
  • SC-170-002: Automated coverage verifies that Operations, Failed operations, and Stuck operations expose zero row-level triage actions.
  • SC-170-003: Automated coverage verifies that the system operation detail page still exposes retry, cancel, and mark investigated for manage-capable operators when each action is legitimately available.
  • SC-170-004: Automated coverage verifies that view-only platform operators can inspect system operations but cannot see manage-only triage actions on the detail page.
  • SC-170-005: Automated coverage verifies that the changed system surfaces use the canonical visible nouns Operations and Operation and expose the expected header or empty-state CTA on each list surface.
  • SC-170-006: The feature ships without adding any new capability, persistence, or UI exemption.

Assumptions

  • The current system operation detail page remains the correct canonical place for triage ownership.
  • Existing audit behavior for retry and mark investigated is correct and does not need redesign in this slice.
  • Existing internal route paths under /system/ops/runs may remain stable while visible system-surface naming is standardized in this slice.

Non-Goals

  • Renaming internal PHP class names or changing existing /system/ops/runs route paths
  • Retrofitting deferred dashboard, onboarding, or landing surfaces
  • Changing OperationRun lifecycle semantics, run creation behavior, or notification taxonomy
  • Introducing a queue/review model for the system Operations, Failed operations, or Stuck operations pages

Dependencies

  • Existing system operations list pages: Runs, Failures, and Stuck
  • Existing system run detail page
  • Existing OperationRunTriageService
  • Existing platform capability and audit behavior
  • Existing action-surface and system operations guard coverage

Definition of Done

Spec 170 is complete when the three changed system list pages are scan-first row-click-only registry surfaces, visible system naming uses Operations / Operation, each list has one matching header and empty-state CTA, the system operation detail page is the sole owner of retry/cancel/investigate triage and exposes Show all operations, existing view/manage capability semantics remain intact, and guard tests reflect the aligned interaction model.