275 lines
22 KiB
Markdown
275 lines
22 KiB
Markdown
# Feature Specification: Ops-UX Enforcement & Cleanup (Enterprise Standard Rollout)
|
||
|
||
**Feature Branch**: `110-ops-ux-enforcement`
|
||
**Created**: 2026-02-23
|
||
**Status**: Draft
|
||
**Depends On**: `specs/055-ops-ux-rollout/spec.md`
|
||
|
||
## Clarifications
|
||
|
||
### Session 2026-02-23
|
||
|
||
- Q: For Guard A, what detection strategy should we use to catch forbidden status/outcome transitions while allowing context-only updates? → A: Scan `app/` for `->update([...])` blocks that include `status` or `outcome` keys (same call block, multi-line), excluding `app/Services/OperationRunService.php`.
|
||
- Q: Should queued/running DB notifications be allowed (e.g., `OperationRunQueued`) or is the DB notification surface terminal-only? → A: Ban queued/running DB notifications entirely (including `OperationRunQueued`); the only DB notification for operations is the terminal `OperationRunCompleted` notification.
|
||
- Q: For Guard B (DB-notification guard), what scanning scope/heuristic should we use? → A: Scan `app/**/*.php` and fail when a file both references an OperationRun (import or variable/property usage) and emits any database notification (`sendToDatabase(` or `->notify(`), except for an explicit allowlist.
|
||
|
||
## Spec Scope Fields *(mandatory)*
|
||
|
||
- **Scope**: tenant runtime remediation + repo-wide enforcement guards (CI/static scan)
|
||
- **Primary Routes**: No new routes. Affected internal flows: Inventory Sync, Backup Schedule Retention, Backup Schedule Run, Bulk Policy Export, Bulk Restore Run Restore, Bulk Restore Run Force Delete, Bulk Policy Unignore, Entra Group manual sync, Add/Remove Policies to Backup Set, Restore Run execution.
|
||
- **Data Ownership**: `operation_runs` (tenant-scoped), `notifications` (tenant user-scoped). No schema changes required.
|
||
- **RBAC**: No new RBAC surfaces. Existing capability gates on triggering operations remain unchanged. Notification delivery is limited to the initiator user. System/scheduled runs with no initiator receive no DB notification.
|
||
|
||
*Canonical-view fields not applicable — no new views introduced.*
|
||
|
||
## User Scenarios & Testing *(mandatory)*
|
||
|
||
### User Story 1 — No Silent Completions (Priority: P1)
|
||
|
||
As a tenant admin, when I trigger any tracked operation (Inventory Sync, Retention, Backup Schedule Run), I always receive exactly one terminal DB notification upon completion, so I can audit the outcome without checking the Monitoring hub manually.
|
||
|
||
**Why this priority**: Silent completions break auditability — the core promise of the Ops-UX system. Missing terminal notifications mean admins have no persistent outcome record outside the Monitoring hub.
|
||
|
||
**Independent Test**: Trigger (or simulate) an inventory sync run to terminal state and assert exactly one `OperationRunCompleted` DB notification exists for the initiator.
|
||
|
||
**Acceptance Scenarios**:
|
||
|
||
1. **Given** an `inventory_sync` OperationRun with an initiator, **When** `InventorySyncService` transitions it to a terminal outcome, **Then** exactly one `OperationRunCompleted` DB notification is persisted for the initiator user.
|
||
2. **Given** an `apply_backup_retention` OperationRun with an initiator, **When** `ApplyBackupScheduleRetentionJob` completes (success or failure), **Then** exactly one `OperationRunCompleted` DB notification is persisted for the initiator user.
|
||
3. **Given** an OperationRun with **no initiator** (system/scheduled run), **When** the run transitions to terminal, **Then** zero DB notifications are emitted.
|
||
|
||
---
|
||
|
||
### User Story 2 — No Notification Spam (Priority: P1)
|
||
|
||
As a tenant admin, I never receive duplicate completion DB notifications for a single run, and I never receive queued/running state DB notifications for any operation.
|
||
|
||
**Why this priority**: Notification spam erodes trust in the notification surface. Even one duplicate makes admins ignore notifications — defeating the entire audit layer.
|
||
|
||
**Independent Test**: Simulate a `RunBackupScheduleJob` to completion and assert zero queued/running DB notifications exist, and exactly one terminal DB notification.
|
||
|
||
**Acceptance Scenarios**:
|
||
|
||
1. **Given** a backup schedule run job enqueued and completed, **When** all job code paths execute, **Then** zero queued/running DB notifications are persisted, and exactly one terminal `OperationRunCompleted` exists for the initiator.
|
||
2. **Given** a `BulkPolicyExportJob` that reaches terminal state via any path (success / abort / circuit-break), **Then** exactly one terminal `OperationRunCompleted` DB notification exists and zero notifications from job-level `sendToDatabase()` calls.
|
||
3. **Given** any bulk job that previously sent custom completion DB notifications (`BulkRestoreRunForceDeleteJob`, `AddPoliciesToBackupSetJob`, `RemovePoliciesFromBackupSetJob`), **When** the job completes, **Then** no job-level DB notifications are emitted.
|
||
|
||
---
|
||
|
||
### User Story 3 — Legacy Notification Removed (Priority: P1)
|
||
|
||
As a tenant admin running a restore operation, my completion feedback comes exclusively from the canonical `OperationRunCompleted` notification, not from a legacy `RunStatusChangedNotification` with inconsistent copy or link behavior.
|
||
|
||
**Why this priority**: The legacy class is an out-of-system notification that bypasses canonical delivery, creating inconsistent UX copy and a second notification channel that cannot be centrally controlled.
|
||
|
||
**Independent Test**: Confirm `RunStatusChangedNotification` class does not exist in `app/` and `ExecuteRestoreRunJob` no longer references it. Restore run completion produces exactly one `OperationRunCompleted`.
|
||
|
||
**Acceptance Scenarios**:
|
||
|
||
1. **Given** `ExecuteRestoreRunJob` completes a restore run to terminal state, **Then** no `RunStatusChangedNotification` is dispatched, and exactly one `OperationRunCompleted` DB notification is persisted for the initiator.
|
||
2. **Given** a developer searches `app/` and `tests/` for `RunStatusChangedNotification`, **Then** no results are found (class deleted, all references removed).
|
||
|
||
---
|
||
|
||
### User Story 4 — Regression Guards Enforce the Constitution (Priority: P1)
|
||
|
||
As a developer working on the repo, if I accidentally introduce a direct `$operationRun->update(['status' => ...])` outside `OperationRunService`, a CI guard test immediately fails with a clear file + snippet report so I know exactly what to fix.
|
||
|
||
**Why this priority**: Without automated guards, the enforcement degrades over time as new features are added. Guards are the only scalable way to maintain the constitution without manual code review on every PR.
|
||
|
||
**Independent Test**: Introduce a synthetic violation in a temp file, run the guard test, confirm it fails with actionable output. Remove the violation, confirm the test passes.
|
||
|
||
**Acceptance Scenarios**:
|
||
|
||
1. **Given** the codebase contains a direct `$operationRun->update(['status' => ...])` outside `OperationRunService`, **When** the guard test runs, **Then** it fails and outputs the violating file path and snippet.
|
||
2. **Given** a job file contains both an OperationRun reference and a `sendToDatabase()` call (not on the allowlist), **When** the DB-notification guard test runs, **Then** it fails with the file path.
|
||
3. **Given** any file in `app/` or `tests/` references `RunStatusChangedNotification`, **When** the legacy-class guard test runs, **Then** it fails.
|
||
4. **Given** the codebase has no violations, **When** all guard tests run, **Then** all pass green.
|
||
|
||
---
|
||
|
||
### User Story 5 — Canonical "Already Queued" Toast (Priority: P2)
|
||
|
||
As a tenant admin triggering an operation that is already queued, I receive a consistent, canonical "already queued" toast message rather than ad-hoc copy from individual feature components, so the experience is uniform across all operation types.
|
||
|
||
**Why this priority**: P2 polish — non-blocking, but addresses the last remaining ad-hoc UX copy point identified in the audit.
|
||
|
||
**Independent Test**: Trigger the "already queued" dedup path in `BackupSetPolicyPickerTable` and assert the toast uses the canonical presenter output.
|
||
|
||
**Acceptance Scenarios**:
|
||
|
||
1. **Given** an operation is already queued, **When** a user attempts to queue it again via the policy picker, **Then** a toast is shown using `OperationUxPresenter::alreadyQueuedToast(...)` with canonical copy.
|
||
|
||
---
|
||
|
||
### Edge Cases
|
||
|
||
- What happens when a job fails with an unhandled exception — does the OperationRun get stuck in a non-terminal state? *(Assumption: existing job `failed()` callback or `finally` block already transitions via service; confirm during implementation per-file.)*
|
||
- What happens when `OperationRunService` itself throws during terminal transition? *(Assumption: run stays non-terminal; pre-existing behavior outside scope of this spec.)*
|
||
- What if an OperationRun has no initiator and a job previously sent a DB notification — will removing that path cause silence? *(Confirmed acceptable: system runs are auditable via Monitoring hub only.)*
|
||
|
||
## Requirements *(mandatory)*
|
||
|
||
**Constitution alignment — OPS-UX-001**: This spec enforces the Ops-UX 3-surface contract. All status/outcome transitions must be routed through `OperationRunService`. Terminal DB notifications are sent exclusively via `OperationRunCompleted` from within the service. No job or feature code may send its own completion DB notification.
|
||
|
||
**Constitution alignment — Filament scope**: This spec introduces **no new Filament screens**. It DOES include targeted updates to existing Filament start surfaces (Resources/Pages) to remove queued/running DB notifications and ensure start-surface feedback is toast-only.
|
||
|
||
**UI Action Matrix (mandatory)**
|
||
|
||
| Surface | User action | Authorization | Start surface feedback | Progress surface | Terminal surface |
|
||
|---------|-------------|---------------|------------------------|------------------|-----------------|
|
||
| `PolicyResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
|
||
| `PolicyVersionResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
|
||
| `BackupScheduleResource` (tenant context) | Run / queue backup schedule operation | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
|
||
| `TenantResource` (tenant context) | Trigger an operation that creates/reuses an OperationRun | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
|
||
| `EntraGroupResource` → `ListEntraGroups` (tenant context) | Start manual sync operation | Existing capability checks (unchanged) | Toast-only (no DB notification) + optional “View run” link | Monitoring → Operations + run detail | Exactly one `OperationRunCompleted` DB notification to initiator (service-owned) |
|
||
| Scheduled/system operations | Jobs run without an initiator | Existing schedule/lock semantics (unchanged) | N/A | Monitoring → Operations + run detail | No DB notification (initiator is null); Monitoring remains canonical |
|
||
|
||
**Constitution alignment — No new Graph calls / OperationRun types**: This spec modifies existing operation flows only; it does not introduce new operation types or Graph calls.
|
||
|
||
**Constitution alignment — BADGE-001**: No new status badge values introduced.
|
||
|
||
### Functional Requirements
|
||
|
||
- **FR-001**: All OperationRun `status` and `outcome` field transitions MUST go through `OperationRunService` canonical transition methods. Direct `$operationRun->update(['status' => ...])`, `$operationRun->status = ...`, `$operationRun->outcome = ...`, or bulk query updates on status/outcome are forbidden outside `OperationRunService`.
|
||
- **FR-002**: `OperationRunService` MUST emit exactly one `OperationRunCompleted` DB notification to the initiator when transitioning a run to a terminal outcome (when an initiator exists).
|
||
- **FR-003**: No job or service code outside `OperationRunService` MAY emit a DB notification representing operation completion, abort, or terminal state.
|
||
- **FR-004**: No code anywhere MAY emit a DB notification for queued or running operation states. This includes `OperationRunQueued` and any other “queued” / “started” / “running” database notifications.
|
||
- **FR-005**: `RunStatusChangedNotification` MUST be deleted. No references to it may remain in `app/` or `tests/`.
|
||
- **FR-006**: `InventorySyncService` MUST transition to terminal state exclusively via `OperationRunService`, not via direct model updates.
|
||
- **FR-007**: `ApplyBackupScheduleRetentionJob` MUST transition to terminal state exclusively via `OperationRunService`.
|
||
- **FR-008**: `TenantpilotBackfillWorkspaceIds` console command MUST use the canonical transition if it transitions OperationRun status (initiator may be null; no DB notification emitted in that case).
|
||
- **FR-009**: `RunBackupScheduleJob` MUST NOT emit any queued or completion DB notifications; outcome notification is handled by `OperationRunService` terminal transition.
|
||
- **FR-010**: `BulkPolicyExportJob`, `BulkRestoreRunForceDeleteJob`, `BulkRestoreRunRestoreJob`, `BulkPolicyUnignoreJob`, `AddPoliciesToBackupSetJob`, and `RemovePoliciesFromBackupSetJob` MUST NOT call `sendToDatabase()` for operation completion, abort, or queued/running feedback.
|
||
- **FR-010b**: Filament start surfaces that initiate operation-run-producing flows MUST NOT persist queued/running DB notifications (including any `sendToDatabase()` “queued” notifications). Start feedback is toast-only.
|
||
- **FR-011**: Context-only updates (e.g., updating `context`, `message`, `reason_code` fields without touching `status` or `outcome`) are permitted directly on the model outside `OperationRunService`.
|
||
- **FR-012**: Three Pest guard tests MUST exist and pass in CI:
|
||
- Guard A: Detects direct status/outcome transitions outside `OperationRunService`; reports file + snippet. Implementation MUST scan `app/**/*.php` for forbidden transition patterns, excluding `app/Services/OperationRunService.php`, including:
|
||
- `->update(` calls whose update array includes a `status` and/or `outcome` key (multi-line block match allowed)
|
||
- direct assignments to `->status` / `->outcome`
|
||
- query/builder/bulk `update([...])` calls that set `status` and/or `outcome`
|
||
Context-only updates without `status`/`outcome` MUST NOT fail this guard.
|
||
- Guard B: Detects DB-notification emissions in operation-flow code; reports file path. Implementation MUST scan `app/**/*.php` and fail when a file contains BOTH (a) an OperationRun signal (`use App\\Models\\OperationRun;` OR `OperationRun` token OR `$this->operationRun` OR `$operationRun`) AND (b) a DB-notification emission (`sendToDatabase(` OR `->notify(`). Allowed exceptions (explicit allowlist): `app/Services/OperationRunService.php`, `app/Notifications/OperationRunCompleted.php`.
|
||
- Guard C: Detects any reference to `RunStatusChangedNotification` in `app/` or `tests/`.
|
||
- **FR-012b**: Operation enqueue helpers MUST NOT emit queued DB notifications by default. Any helper param like `emitQueuedNotification` MUST default to `false`, and all current call sites MUST comply.
|
||
- **FR-013** *(P2)*: `OperationUxPresenter` MUST expose an `alreadyQueuedToast(...)` static helper returning canonical copy + duration (+ optional "View run" action).
|
||
- **FR-014** *(P2)*: `BackupSetPolicyPickerTable` dedup toast MUST use `OperationUxPresenter::alreadyQueuedToast(...)`.
|
||
|
||
## Scope (Known Violations — Remediation Targets)
|
||
|
||
### Status transition bypass (direct model update — silent completion)
|
||
|
||
| File | Violation | Priority |
|
||
|------|-----------|----------|
|
||
| `app/Services/Inventory/InventorySyncService.php` | Direct `update([status/outcome])` — silent completion | P0 |
|
||
| `app/Jobs/ApplyBackupScheduleRetentionJob.php` | Direct `update([...])` — silent completion | P0 |
|
||
| `app/Console/Commands/TenantpilotBackfillWorkspaceIds.php` | Direct status update | P1 |
|
||
|
||
### Job-level DB notifications (duplicates / queued spam)
|
||
|
||
| File | Violation | Priority |
|
||
|------|-----------|----------|
|
||
| `app/Jobs/RunBackupScheduleJob.php` | Queued DB notification + custom finished notification | P0 |
|
||
| `app/Jobs/BulkPolicyExportJob.php` | Multiple `sendToDatabase()` paths | P1 |
|
||
| `app/Jobs/BulkRestoreRunForceDeleteJob.php` | Multiple `sendToDatabase()` paths | P1 |
|
||
| `app/Jobs/BulkRestoreRunRestoreJob.php` | Multiple `sendToDatabase()` paths | P1 |
|
||
| `app/Jobs/BulkPolicyUnignoreJob.php` | Multiple `sendToDatabase()` paths | P1 |
|
||
| `app/Jobs/AddPoliciesToBackupSetJob.php` | Custom completion DB notifications | P1 |
|
||
| `app/Jobs/RemovePoliciesFromBackupSetJob.php` | Custom completion DB notifications | P1 |
|
||
|
||
### Start-surface queued DB notifications (forbidden)
|
||
|
||
| File | Violation | Priority |
|
||
|------|-----------|----------|
|
||
| `app/Filament/Resources/PolicyResource.php` | Queued/running DB notification emitted from a start surface | P1 |
|
||
| `app/Filament/Resources/PolicyVersionResource.php` | Queued/running DB notification emitted from a start surface | P1 |
|
||
| `app/Filament/Resources/BackupScheduleResource.php` | Queued/running DB notification emitted from a start surface | P1 |
|
||
| `app/Filament/Resources/TenantResource.php` | Queued/running DB notification emitted from a start surface | P1 |
|
||
| `app/Filament/Resources/EntraGroupResource/Pages/ListEntraGroups.php` | Queued/running DB notification emitted from a start surface | P1 |
|
||
|
||
### Legacy notification outside system
|
||
|
||
| File | Violation | Priority |
|
||
|------|-----------|----------|
|
||
| `app/Jobs/ExecuteRestoreRunJob.php` | References `RunStatusChangedNotification` | P0 |
|
||
| `app/Notifications/RunStatusChangedNotification.php` | Class to delete | P0 |
|
||
|
||
### Optional polish (P2)
|
||
|
||
| File | Violation | Priority |
|
||
|------|-----------|----------|
|
||
| `app/Livewire/BackupSetPolicyPickerTable.php` | Ad-hoc "already queued" toast (non-canonical copy) | P2 |
|
||
|
||
---
|
||
|
||
## Execution Plan Reference
|
||
|
||
The executable work breakdown for this spec lives in [specs/110-ops-ux-enforcement/tasks.md](specs/110-ops-ux-enforcement/tasks.md). The spec intentionally avoids duplicating a task list to prevent drift.
|
||
|
||
## Testing Plan (Pest)
|
||
|
||
### Guard tests (mandatory — CI enforcement layer)
|
||
- `tests/Feature/OpsUx/Constitution/DirectStatusTransitionGuardTest.php`
|
||
- `tests/Feature/OpsUx/Constitution/JobDbNotificationGuardTest.php`
|
||
- `tests/Feature/OpsUx/Constitution/LegacyNotificationGuardTest.php`
|
||
|
||
### Regression tests (mandatory for P0 fixes)
|
||
- `tests/Feature/OpsUx/Regression/InventorySyncTerminalNotificationTest.php` — exactly one `OperationRunCompleted` for initiator; none for system run
|
||
- `tests/Feature/OpsUx/Regression/BackupRetentionTerminalNotificationTest.php` — exactly one `OperationRunCompleted`; no direct model transitions
|
||
- `tests/Feature/OpsUx/Regression/BackupScheduleRunNotificationTest.php` — zero queued DB notifications; exactly one terminal notification
|
||
- `tests/Feature/OpsUx/Regression/BulkJobCircuitBreakerTest.php` — circuit-breaker / abort path yields exactly one terminal `OperationRunCompleted` and zero job-level DB notifications (representative bulk job)
|
||
|
||
### Recommended test locations
|
||
- Guards: `tests/Feature/OpsUx/Constitution/`
|
||
- Regressions: `tests/Feature/OpsUx/Regression/`
|
||
|
||
## Success Criteria *(mandatory)*
|
||
|
||
### Measurable Outcomes
|
||
|
||
- **SC-001**: Every tracked operation with an initiator produces exactly one persistent DB notification upon terminal completion — zero operations complete silently.
|
||
- **SC-002**: Zero queued/running DB notifications are emitted across all operation flows.
|
||
- **SC-003**: Zero duplicate completion DB notifications for any single operation run across all exit paths (success, failure, partial, circuit-break).
|
||
- **SC-004**: `RunStatusChangedNotification` class is fully deleted — zero references remain in the codebase.
|
||
- **SC-005**: All three CI guard tests pass green on a clean codebase and fail with actionable output when a synthetic violation is introduced.
|
||
- **SC-006**: All existing OpsUx test suites and related test files pass without regression after changes.
|
||
- **SC-007** *(P2)*: All "already queued" dedup feedback paths use identical canonical copy, verifiable by a single source-of-truth presenter call.
|
||
|
||
---
|
||
|
||
## Definition of Done (DoD)
|
||
|
||
- [ ] All in-scope files no longer directly update status/outcome outside `OperationRunService`
|
||
- [ ] All in-scope jobs no longer emit completion/abort/queued DB notifications
|
||
- [ ] `RunStatusChangedNotification` deleted; zero references in `app/` and `tests/`
|
||
- [ ] Three guard tests exist and pass in CI (and fail with actionable output on synthetic violations)
|
||
- [ ] All OpsUx regression tests pass
|
||
- [ ] Full test suite green (no regressions)
|
||
- [ ] Pint formatting clean for all touched files
|
||
- [ ] `AGENTS.md` / constitution updated to reference the non-negotiable Ops-UX rule if not already present from 055 follow-up
|
||
|
||
---
|
||
|
||
## Assumptions
|
||
|
||
1. `OperationRunService` already exposes a canonical terminal transition method (from Spec 055); this spec calls it, not redeclares it.
|
||
2. The `failed()` job lifecycle callback or try/finally blocks in affected jobs already exist (or will be added) to ensure terminal transitions even on unhandled exceptions — confirmed during implementation per-file.
|
||
3. Guard tests are static analysis (filesystem grep-based) Pest tests, not runtime tests. They do not require a running application.
|
||
4. The allowlist for Guard B (job DB notifications) is intentionally minimal. Any new entry requires justification in a spec comment.
|
||
5. P2 tasks (`T026`/`T027`) are optional and do not gate release of P0/P1 work.
|
||
|
||
---
|
||
|
||
## Rollout / PR Slicing
|
||
|
||
PR slicing is phase/story-based. `tasks.md` remains the source of truth for exact task IDs and sequencing.
|
||
|
||
| PR | Scope Slice | Priority |
|
||
|----|-------------|----------|
|
||
| PR-A | Phase 1–3 (Setup + Foundational + US1 “No Silent Completions”) | P0 |
|
||
| PR-B | Phase 4 (US2 “No Notification Spam” cleanup: jobs + start surfaces) | P0/P1 |
|
||
| PR-C | Phase 5 (US3 legacy notification removal) | P0 |
|
||
| PR-D | Phase 6 (US4 constitution guard tests A/B/C) | P0 |
|
||
| PR-E | Phase 7 (US5 canonical “already queued” toast polish) | P2 |
|
||
| PR-F | Phase 8 (docs alignment + validation execution) | P1/P2 |
|