TenantAtlas/specs/110-ops-ux-enforcement/spec.md
ahmido f13a4ce409 feat(110): Ops-UX enterprise start/dedup standard (repo-wide) (#134)
Implements Spec 110 Ops‑UX Enforcement and applies the repo‑wide “enterprise” standard for operation start + dedup surfaces.

Key points
- Start surfaces: only ephemeral queued toast (no DB notifications for started/queued/running).
- Dedup paths: canonical “already queued” toast.
- Progress refresh: dispatch run-enqueued browser event so the global widget updates immediately.
- Completion: exactly-once terminal DB notification on completion (per Ops‑UX contract).

Tests & formatting
- Full suite: 1738 passed, 8 skipped (8477 assertions).
- Pint: `vendor/bin/sail bin pint --dirty --format agent` (pass).

Notable change
- Removed legacy `RunStatusChangedNotification` (replaced by the terminal-only completion notification policy).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #134
2026-02-24 09:30:15 +00:00

275 lines
22 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Feature Specification: Ops-UX Enforcement & Cleanup (Enterprise Standard Rollout)
**Feature Branch**: `110-ops-ux-enforcement`
**Created**: 2026-02-23
**Status**: Draft
**Depends On**: `specs/055-ops-ux-rollout/spec.md`
## Clarifications
### Session 2026-02-23
- Q: For Guard A, what detection strategy should we use to catch forbidden status/outcome transitions while allowing context-only updates? → A: Scan `app/` for `->update([...])` blocks that include `status` or `outcome` keys (same call block, multi-line), excluding `app/Services/OperationRunService.php`.
- Q: Should queued/running DB notifications be allowed (e.g., `OperationRunQueued`) or is the DB notification surface terminal-only? → A: Ban queued/running DB notifications entirely (including `OperationRunQueued`); the only DB notification for operations is the terminal `OperationRunCompleted` notification.
- Q: For Guard B (DB-notification guard), what scanning scope/heuristic should we use? → A: Scan `app/**/*.php` and fail when a file both references an OperationRun (import or variable/property usage) and emits any database notification (`sendToDatabase(` or `->notify(`), except for an explicit allowlist.
## Spec Scope Fields *(mandatory)*
- **Scope**: tenant runtime remediation + repo-wide enforcement guards (CI/static scan)
- **Primary Routes**: No new routes. Affected internal flows: Inventory Sync, Backup Schedule Retention, Backup Schedule Run, Bulk Policy Export, Bulk Restore Run Restore, Bulk Restore Run Force Delete, Bulk Policy Unignore, Entra Group manual sync, Add/Remove Policies to Backup Set, Restore Run execution.
- **Data Ownership**: `operation_runs` (tenant-scoped), `notifications` (tenant user-scoped). No schema changes required.
- **RBAC**: No new RBAC surfaces. Existing capability gates on triggering operations remain unchanged. Notification delivery is limited to the initiator user. System/scheduled runs with no initiator receive no DB notification.
*Canonical-view fields not applicable — no new views introduced.*
## User Scenarios & Testing *(mandatory)*
### User Story 1 — No Silent Completions (Priority: P1)
As a tenant admin, when I trigger any tracked operation (Inventory Sync, Retention, Backup Schedule Run), I always receive exactly one terminal DB notification upon completion, so I can audit the outcome without checking the Monitoring hub manually.
**Why this priority**: Silent completions break auditability — the core promise of the Ops-UX system. Missing terminal notifications mean admins have no persistent outcome record outside the Monitoring hub.
**Independent Test**: Trigger (or simulate) an inventory sync run to terminal state and assert exactly one `OperationRunCompleted` DB notification exists for the initiator.
**Acceptance Scenarios**:
1. **Given** an `inventory_sync` OperationRun with an initiator, **When** `InventorySyncService` transitions it to a terminal outcome, **Then** exactly one `OperationRunCompleted` DB notification is persisted for the initiator user.
2. **Given** an `apply_backup_retention` OperationRun with an initiator, **When** `ApplyBackupScheduleRetentionJob` completes (success or failure), **Then** exactly one `OperationRunCompleted` DB notification is persisted for the initiator user.
3. **Given** an OperationRun with **no initiator** (system/scheduled run), **When** the run transitions to terminal, **Then** zero DB notifications are emitted.
---
### User Story 2 — No Notification Spam (Priority: P1)
As a tenant admin, I never receive duplicate completion DB notifications for a single run, and I never receive queued/running state DB notifications for any operation.
**Why this priority**: Notification spam erodes trust in the notification surface. Even one duplicate makes admins ignore notifications — defeating the entire audit layer.
**Independent Test**: Simulate a `RunBackupScheduleJob` to completion and assert zero queued/running DB notifications exist, and exactly one terminal DB notification.
**Acceptance Scenarios**:
1. **Given** a backup schedule run job enqueued and completed, **When** all job code paths execute, **Then** zero queued/running DB notifications are persisted, and exactly one terminal `OperationRunCompleted` exists for the initiator.
2. **Given** a `BulkPolicyExportJob` that reaches terminal state via any path (success / abort / circuit-break), **Then** exactly one terminal `OperationRunCompleted` DB notification exists and zero notifications from job-level `sendToDatabase()` calls.
3. **Given** any bulk job that previously sent custom completion DB notifications (`BulkRestoreRunForceDeleteJob`, `AddPoliciesToBackupSetJob`, `RemovePoliciesFromBackupSetJob`), **When** the job completes, **Then** no job-level DB notifications are emitted.
---
### User Story 3 — Legacy Notification Removed (Priority: P1)
As a tenant admin running a restore operation, my completion feedback comes exclusively from the canonical `OperationRunCompleted` notification, not from a legacy `RunStatusChangedNotification` with inconsistent copy or link behavior.
**Why this priority**: The legacy class is an out-of-system notification that bypasses canonical delivery, creating inconsistent UX copy and a second notification channel that cannot be centrally controlled.
**Independent Test**: Confirm `RunStatusChangedNotification` class does not exist in `app/` and `ExecuteRestoreRunJob` no longer references it. Restore run completion produces exactly one `OperationRunCompleted`.
**Acceptance Scenarios**:
1. **Given** `ExecuteRestoreRunJob` completes a restore run to terminal state, **Then** no `RunStatusChangedNotification` is dispatched, and exactly one `OperationRunCompleted` DB notification is persisted for the initiator.
2. **Given** a developer searches `app/` and `tests/` for `RunStatusChangedNotification`, **Then** no results are found (class deleted, all references removed).
---
### User Story 4 — Regression Guards Enforce the Constitution (Priority: P1)
As a developer working on the repo, if I accidentally introduce a direct `$operationRun->update(['status' => ...])` outside `OperationRunService`, a CI guard test immediately fails with a clear file + snippet report so I know exactly what to fix.
**Why this priority**: Without automated guards, the enforcement degrades over time as new features are added. Guards are the only scalable way to maintain the constitution without manual code review on every PR.
**Independent Test**: Introduce a synthetic violation in a temp file, run the guard test, confirm it fails with actionable output. Remove the violation, confirm the test passes.
**Acceptance Scenarios**:
1. **Given** the codebase contains a direct `$operationRun->update(['status' => ...])` outside `OperationRunService`, **When** the guard test runs, **Then** it fails and outputs the violating file path and snippet.
2. **Given** a job file contains both an OperationRun reference and a `sendToDatabase()` call (not on the allowlist), **When** the DB-notification guard test runs, **Then** it fails with the file path.
3. **Given** any file in `app/` or `tests/` references `RunStatusChangedNotification`, **When** the legacy-class guard test runs, **Then** it fails.
4. **Given** the codebase has no violations, **When** all guard tests run, **Then** all pass green.
---
### User Story 5 — Canonical "Already Queued" Toast (Priority: P2)
As a tenant admin triggering an operation that is already queued, I receive a consistent, canonical "already queued" toast message rather than ad-hoc copy from individual feature components, so the experience is uniform across all operation types.
**Why this priority**: P2 polish — non-blocking, but addresses the last remaining ad-hoc UX copy point identified in the audit.
**Independent Test**: Trigger the "already queued" dedup path in `BackupSetPolicyPickerTable` and assert the toast uses the canonical presenter output.
**Acceptance Scenarios**:
1. **Given** an operation is already queued, **When** a user attempts to queue it again via the policy picker, **Then** a toast is shown using `OperationUxPresenter::alreadyQueuedToast(...)` with canonical copy.
---
### Edge Cases
- What happens when a job fails with an unhandled exception — does the OperationRun get stuck in a non-terminal state? *(Assumption: existing job `failed()` callback or `finally` block already transitions via service; confirm during implementation per-file.)*
- What happens when `OperationRunService` itself throws during terminal transition? *(Assumption: run stays non-terminal; pre-existing behavior outside scope of this spec.)*
- What if an OperationRun has no initiator and a job previously sent a DB notification — will removing that path cause silence? *(Confirmed acceptable: system runs are auditable via Monitoring hub only.)*
## Requirements *(mandatory)*
**Constitution alignment — OPS-UX-001**: This spec enforces the Ops-UX 3-surface contract. All status/outcome transitions must be routed through `OperationRunService`. Terminal DB notifications are sent exclusively via `OperationRunCompleted` from within the service. No job or feature code may send its own completion DB notification.
**Constitution alignment — Filament scope**: This spec introduces **no new Filament screens**. It DOES include targeted updates to existing Filament start surfaces (Resources/Pages) to remove queued/running DB notifications and ensure start-surface feedback is toast-only.
**UI Action Matrix (mandatory)**
| Surface | User action | Authorization | Start surface feedback | Progress surface | Terminal surface |
|---------|-------------|---------------|------------------------|------------------|-----------------|
| `PolicyResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
| `PolicyVersionResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
| `BackupScheduleResource` (tenant context) | Run / queue backup schedule operation | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
| `TenantResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
| `EntraGroupResource``ListEntraGroups` (tenant context) | Start manual sync operation | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
| Scheduled/system operations | Jobs run without an initiator | Existing schedule/lock semantics (unchanged) | N/A | Monitoring → Operations + run detail | No DB notification (initiator is null); Monitoring remains canonical |
**Constitution alignment — No new Graph calls / OperationRun types**: This spec modifies existing operation flows only; it does not introduce new operation types or Graph calls.
**Constitution alignment — BADGE-001**: No new status badge values introduced.
### Functional Requirements
- **FR-001**: All OperationRun `status` and `outcome` field transitions MUST go through `OperationRunService` canonical transition methods. Direct `$operationRun->update(['status' => ...])`, `$operationRun->status = ...`, `$operationRun->outcome = ...`, or bulk query updates on status/outcome are forbidden outside `OperationRunService`.
- **FR-002**: `OperationRunService` MUST emit exactly one `OperationRunCompleted` DB notification to the initiator when transitioning a run to a terminal outcome (when an initiator exists).
- **FR-003**: No job or service code outside `OperationRunService` MAY emit a DB notification representing operation completion, abort, or terminal state.
- **FR-004**: No code anywhere MAY emit a DB notification for queued or running operation states. This includes `OperationRunQueued` and any other “queued” / “started” / “running” database notifications.
- **FR-005**: `RunStatusChangedNotification` MUST be deleted. No references to it may remain in `app/` or `tests/`.
- **FR-006**: `InventorySyncService` MUST transition to terminal state exclusively via `OperationRunService`, not via direct model updates.
- **FR-007**: `ApplyBackupScheduleRetentionJob` MUST transition to terminal state exclusively via `OperationRunService`.
- **FR-008**: `TenantpilotBackfillWorkspaceIds` console command MUST use the canonical transition if it transitions OperationRun status (initiator may be null; no DB notification emitted in that case).
- **FR-009**: `RunBackupScheduleJob` MUST NOT emit any queued or completion DB notifications; outcome notification is handled by `OperationRunService` terminal transition.
- **FR-010**: `BulkPolicyExportJob`, `BulkRestoreRunForceDeleteJob`, `BulkRestoreRunRestoreJob`, `BulkPolicyUnignoreJob`, `AddPoliciesToBackupSetJob`, and `RemovePoliciesFromBackupSetJob` MUST NOT call `sendToDatabase()` for operation completion, abort, or queued/running feedback.
- **FR-010b**: Filament start surfaces that initiate operation-run-producing flows MUST NOT persist queued/running DB notifications (including any `sendToDatabase()` “queued” notifications). Start feedback is toast-only.
- **FR-011**: Context-only updates (e.g., updating `context`, `message`, `reason_code` fields without touching `status` or `outcome`) are permitted directly on the model outside `OperationRunService`.
- **FR-012**: Three Pest guard tests MUST exist and pass in CI:
- Guard A: Detects direct status/outcome transitions outside `OperationRunService`; reports file + snippet. Implementation MUST scan `app/**/*.php` for forbidden transition patterns, excluding `app/Services/OperationRunService.php`, including:
- `->update(` calls whose update array includes a `status` and/or `outcome` key (multi-line block match allowed)
- direct assignments to `->status` / `->outcome`
- query/builder/bulk `update([...])` calls that set `status` and/or `outcome`
Context-only updates without `status`/`outcome` MUST NOT fail this guard.
- Guard B: Detects DB-notification emissions in operation-flow code; reports file path. Implementation MUST scan `app/**/*.php` and fail when a file contains BOTH (a) an OperationRun signal (`use App\\Models\\OperationRun;` OR `OperationRun` token OR `$this->operationRun` OR `$operationRun`) AND (b) a DB-notification emission (`sendToDatabase(` OR `->notify(`). Allowed exceptions (explicit allowlist): `app/Services/OperationRunService.php`, `app/Notifications/OperationRunCompleted.php`.
- Guard C: Detects any reference to `RunStatusChangedNotification` in `app/` or `tests/`.
- **FR-012b**: Operation enqueue helpers MUST NOT emit queued DB notifications by default. Any helper param like `emitQueuedNotification` MUST default to `false`, and all current call sites MUST comply.
- **FR-013** *(P2)*: `OperationUxPresenter` MUST expose an `alreadyQueuedToast(...)` static helper returning canonical copy + duration (+ optional "View run" action).
- **FR-014** *(P2)*: `BackupSetPolicyPickerTable` dedup toast MUST use `OperationUxPresenter::alreadyQueuedToast(...)`.
## Scope (Known Violations — Remediation Targets)
### Status transition bypass (direct model update — silent completion)
| File | Violation | Priority |
|------|-----------|----------|
| `app/Services/Inventory/InventorySyncService.php` | Direct `update([status/outcome])` — silent completion | P0 |
| `app/Jobs/ApplyBackupScheduleRetentionJob.php` | Direct `update([...])` — silent completion | P0 |
| `app/Console/Commands/TenantpilotBackfillWorkspaceIds.php` | Direct status update | P1 |
### Job-level DB notifications (duplicates / queued spam)
| File | Violation | Priority |
|------|-----------|----------|
| `app/Jobs/RunBackupScheduleJob.php` | Queued DB notification + custom finished notification | P0 |
| `app/Jobs/BulkPolicyExportJob.php` | Multiple `sendToDatabase()` paths | P1 |
| `app/Jobs/BulkRestoreRunForceDeleteJob.php` | Multiple `sendToDatabase()` paths | P1 |
| `app/Jobs/BulkRestoreRunRestoreJob.php` | Multiple `sendToDatabase()` paths | P1 |
| `app/Jobs/BulkPolicyUnignoreJob.php` | Multiple `sendToDatabase()` paths | P1 |
| `app/Jobs/AddPoliciesToBackupSetJob.php` | Custom completion DB notifications | P1 |
| `app/Jobs/RemovePoliciesFromBackupSetJob.php` | Custom completion DB notifications | P1 |
### Start-surface queued DB notifications (forbidden)
| File | Violation | Priority |
|------|-----------|----------|
| `app/Filament/Resources/PolicyResource.php` | Queued/running DB notification emitted from a start surface | P1 |
| `app/Filament/Resources/PolicyVersionResource.php` | Queued/running DB notification emitted from a start surface | P1 |
| `app/Filament/Resources/BackupScheduleResource.php` | Queued/running DB notification emitted from a start surface | P1 |
| `app/Filament/Resources/TenantResource.php` | Queued/running DB notification emitted from a start surface | P1 |
| `app/Filament/Resources/EntraGroupResource/Pages/ListEntraGroups.php` | Queued/running DB notification emitted from a start surface | P1 |
### Legacy notification outside system
| File | Violation | Priority |
|------|-----------|----------|
| `app/Jobs/ExecuteRestoreRunJob.php` | References `RunStatusChangedNotification` | P0 |
| `app/Notifications/RunStatusChangedNotification.php` | Class to delete | P0 |
### Optional polish (P2)
| File | Violation | Priority |
|------|-----------|----------|
| `app/Livewire/BackupSetPolicyPickerTable.php` | Ad-hoc "already queued" toast (non-canonical copy) | P2 |
---
## Execution Plan Reference
The executable work breakdown for this spec lives in [specs/110-ops-ux-enforcement/tasks.md](specs/110-ops-ux-enforcement/tasks.md). The spec intentionally avoids duplicating a task list to prevent drift.
## Testing Plan (Pest)
### Guard tests (mandatory — CI enforcement layer)
- `tests/Feature/OpsUx/Constitution/DirectStatusTransitionGuardTest.php`
- `tests/Feature/OpsUx/Constitution/JobDbNotificationGuardTest.php`
- `tests/Feature/OpsUx/Constitution/LegacyNotificationGuardTest.php`
### Regression tests (mandatory for P0 fixes)
- `tests/Feature/OpsUx/Regression/InventorySyncTerminalNotificationTest.php` — exactly one `OperationRunCompleted` for initiator; none for system run
- `tests/Feature/OpsUx/Regression/BackupRetentionTerminalNotificationTest.php` — exactly one `OperationRunCompleted`; no direct model transitions
- `tests/Feature/OpsUx/Regression/BackupScheduleRunNotificationTest.php` — zero queued DB notifications; exactly one terminal notification
- `tests/Feature/OpsUx/Regression/BulkJobCircuitBreakerTest.php` — circuit-breaker / abort path yields exactly one terminal `OperationRunCompleted` and zero job-level DB notifications (representative bulk job)
### Recommended test locations
- Guards: `tests/Feature/OpsUx/Constitution/`
- Regressions: `tests/Feature/OpsUx/Regression/`
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Every tracked operation with an initiator produces exactly one persistent DB notification upon terminal completion — zero operations complete silently.
- **SC-002**: Zero queued/running DB notifications are emitted across all operation flows.
- **SC-003**: Zero duplicate completion DB notifications for any single operation run across all exit paths (success, failure, partial, circuit-break).
- **SC-004**: `RunStatusChangedNotification` class is fully deleted — zero references remain in the codebase.
- **SC-005**: All three CI guard tests pass green on a clean codebase and fail with actionable output when a synthetic violation is introduced.
- **SC-006**: All existing OpsUx test suites and related test files pass without regression after changes.
- **SC-007** *(P2)*: All "already queued" dedup feedback paths use identical canonical copy, verifiable by a single source-of-truth presenter call.
---
## Definition of Done (DoD)
- [ ] All in-scope files no longer directly update status/outcome outside `OperationRunService`
- [ ] All in-scope jobs no longer emit completion/abort/queued DB notifications
- [ ] `RunStatusChangedNotification` deleted; zero references in `app/` and `tests/`
- [ ] Three guard tests exist and pass in CI (and fail with actionable output on synthetic violations)
- [ ] All OpsUx regression tests pass
- [ ] Full test suite green (no regressions)
- [ ] Pint formatting clean for all touched files
- [ ] `AGENTS.md` / constitution updated to reference the non-negotiable Ops-UX rule if not already present from 055 follow-up
---
## Assumptions
1. `OperationRunService` already exposes a canonical terminal transition method (from Spec 055); this spec calls it, not redeclares it.
2. The `failed()` job lifecycle callback or try/finally blocks in affected jobs already exist (or will be added) to ensure terminal transitions even on unhandled exceptions — confirmed during implementation per-file.
3. Guard tests are static analysis (filesystem grep-based) Pest tests, not runtime tests. They do not require a running application.
4. The allowlist for Guard B (job DB notifications) is intentionally minimal. Any new entry requires justification in a spec comment.
5. P2 tasks (`T026`/`T027`) are optional and do not gate release of P0/P1 work.
---
## Rollout / PR Slicing
PR slicing is phase/story-based. `tasks.md` remains the source of truth for exact task IDs and sequencing.
| PR | Scope Slice | Priority |
|----|-------------|----------|
| PR-A | Phase 13 (Setup + Foundational + US1 “No Silent Completions”) | P0 |
| PR-B | Phase 4 (US2 “No Notification Spam” cleanup: jobs + start surfaces) | P0/P1 |
| PR-C | Phase 5 (US3 legacy notification removal) | P0 |
| PR-D | Phase 6 (US4 constitution guard tests A/B/C) | P0 |
| PR-E | Phase 7 (US5 canonical “already queued” toast polish) | P2 |
| PR-F | Phase 8 (docs alignment + validation execution) | P1/P2 |