# Feature Specification: Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)

**Feature Branch**: `056-remove-legacy-bulkops`
**Created**: 2026-01-18
**Status**: Draft
**Input**: User description: "Feature 056 — Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)"

## Clarifications

### Session 2026-01-18

- Q: What should be the default max concurrency per target scope (entra_tenant_id / directory_context_id) for bulk operations? → A: Config-driven, default=1
- Q: How should Selection Identity be determined for idempotency fingerprinting? → A: Hybrid (IDs-hash for explicit selection; query-hash for “select all via filter/query”)
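
The hybrid answer above can be illustrated with a minimal sketch. The function names, the `ids:` / `query:` prefixes, and the choice of SHA-256 over canonical JSON are assumptions made for illustration; the spec only fixes the hybrid rule itself.

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def selection_identity_from_ids(ids: Iterable[str]) -> str:
    """Explicit selection: hash the sorted, de-duplicated record IDs.

    Sorting and de-duplicating makes the identity independent of click order.
    """
    canonical = json.dumps(sorted(set(ids)), separators=(",", ":"))
    return "ids:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def selection_identity_from_query(query: Mapping[str, Any]) -> str:
    """Select-all via filter/query: hash the normalized query definition.

    sort_keys=True makes the identity independent of key order.
    """
    canonical = json.dumps(query, sort_keys=True, separators=(",", ":"))
    return "query:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The prefix keeps the two kinds of identity from colliding, so an explicit selection and a "select all" that happen to cover the same records are still treated as distinct requests in this sketch.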

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Run-backed bulk actions are always observable (Priority: P1)

An admin performs a bulk action (e.g., apply/ignore/restore/prune across many records). The system records a single canonical run that can be monitored end-to-end, including partial failures, and provides consistent user feedback.

**Why this priority**: Bulk changes are operationally significant and must be traceable, support partial outcomes, and have a consistent mental model for admins.

**Independent Test**: Trigger a representative bulk action and verify that a run record exists, appears in the Monitoring list, has a detail view, and emits the correct feedback surfaces.

**Acceptance Scenarios**:

1. **Given** an admin selects multiple items for a bulk action, **When** the action is confirmed and submitted, **Then** a canonical run record is created or reused and the UI confirms the queued state via a toast.
2. **Given** a bulk run is queued or running, **When** the admin opens Monitoring → Operations, **Then** the run appears in the list and can be opened via a canonical “View run” link.
3. **Given** a bulk run completes with a mix of successes and failures, **When** the run reaches a terminal state, **Then** the initiator receives a terminal notification and the run detail shows a summary of outcomes.

---

### User Story 2 - Monitoring is the single source of run history (Priority: P2)

An admin (or operator) relies on Monitoring → Operations to see the full history of operational work (including bulk). There are no separate legacy run surfaces; links from anywhere in the app point to the canonical run detail.

**Why this priority**: Multiple run systems lead to missed incidents, inconsistent retention, and developer confusion. One canonical surface improves operational clarity and reduces support overhead.

**Independent Test**: Navigate from a bulk action result to “View run” and confirm it lands in Monitoring’s run detail; confirm there is no legacy “bulk runs” navigation or pages.

**Acceptance Scenarios**:

1. **Given** any UI element offers a “View run” link, **When** it is clicked, **Then** it opens the canonical Monitoring → Operations → Run Detail page for that run.
2. **Given** the app navigation, **When** an admin searches for legacy bulk-run screens, **Then** no legacy bulk-run navigation or pages exist.

---

### User Story 3 - Developers can’t accidentally reintroduce legacy patterns (Priority: P3)

A developer adds or modifies an admin action. They can clearly determine whether it is an audit-only action or a run-backed operation, and the repository enforces the single-run model by preventing legacy references and UX drift.

**Why this priority**: Preventing regression is essential for suite readiness and long-term maintainability.

**Independent Test**: Introduce a legacy reference or a bulk action without a run-backed record and confirm CI/automated checks fail.

**Acceptance Scenarios**:

1. **Given** a change introduces any reference to the legacy bulk-run system, **When** tests/CI run, **Then** the pipeline fails with a clear message.
2. **Given** a security-relevant DB-only action that is eligible for audit-only classification, **When** the action runs, **Then** an audit log entry is recorded and no run record is created.
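
A guard for the first acceptance scenario could look roughly like the sketch below. It is a hypothetical helper, not the project's actual check: the pattern list (including the `bulk_operation_runs` table name) and the `suffixes` parameter are assumptions that would need to match whatever the discovery sweep finds.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed legacy identifiers; extend with names from the discovery report.
LEGACY_PATTERNS = [
    re.compile(r"BulkOperationRun"),
    re.compile(r"bulk_operation_runs?"),
]


def find_legacy_references(root: Path, suffixes: Tuple[str, ...]) -> List[str]:
    """Return 'relative/path:line: text' for every line matching a legacy pattern."""
    hits: List[str] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or not path.name.endswith(suffixes):
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in LEGACY_PATTERNS):
                hits.append(f"{path.relative_to(root)}:{lineno}: {line.strip()}")
    return hits
```

A CI step would call this over the source tree and fail with the list of hits as the message, which gives the "clear message" the scenario asks for. Spec and discovery documents that legitimately mention the legacy name would need to be excluded from the scanned roots.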

### Edge Cases

- Bulk selection is empty or resolves to zero items: the system does not start work and provides a clear non-destructive result.
- A bulk selection is very large: the system remains responsive and continues to show progress via run summary metrics.
- Target scope is required but missing: the system fails safely, records a terminal run with a stable reason code, and does not execute remote/bulk mutations.
- Remote calls experience throttling: the system applies bounded retries with jittered backoff and records failures without losing overall run visibility.
- Duplicate submissions (double click / retry / re-run): idempotency prevents duplicate processing and preserves a single canonical outcome per selection identity.
- Tenant isolation: no run, selection, summary, or notifications leak across tenants.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature consolidates operational work onto a single canonical run model and a single monitoring surface. It must preserve the defined user feedback surfaces (queued toast, active widget, terminal notification), ensure tenant-scoped observability, and maintain stable, sanitized messages and reason codes.

### Functional Requirements

- **FR-001 Single run model**: The system MUST use a single canonical run model (`OperationRun`) for all run-backed operations; the legacy bulk-run model MUST NOT exist after this feature.
- **FR-002 Bulk actions are run-backed**: Any bulk action (apply to N records, chunked work, mass ignore/restore/prune/delete) MUST create or reuse an `OperationRun` and MUST be visible in Monitoring → Operations.
- **FR-003 Action taxonomy**: Every admin action MUST be classified as exactly one of:
  - **Audit-only DB action**: DB-only, no remote/external calls, no queued work, and bounded DB work; typically completes within ~2 seconds (guidance, not a hard rule). MUST write an audit log for security/ops-relevant state changes; MUST NOT create an `OperationRun`.
  - **Run-backed operation**: queued/long-running/remote/bulk/scheduled or otherwise operationally significant; MUST create or reuse an `OperationRun`.

  **Decision rule**: If classification is uncertain, default to **Run-backed operation**.
- **FR-004 Canonical UX surfaces**: For run-backed operations, the system MUST use only these feedback surfaces:
  - **Queued**: toast-only
  - **Active**: tenant-wide active widget
  - **Terminal**: database-backed notification to the initiator only
- **FR-005 Canonical routing**: All “View run” links MUST route to Monitoring → Operations → Run Detail.
- **FR-006 Legacy removal**: The system MUST remove legacy bulk-run tables/models/services/routes/widgets/navigation and MUST prevent any new legacy writes.
- **FR-007 Canonical summary metrics**: The run’s summary metrics MUST use a single canonical set of keys and MUST be presented consistently in the run detail view.
- **FR-008 Target scope recording**: For operations targeting a directory/remote tenant, the run context MUST record the target scope (directory identifier) and Monitoring/Run Detail MUST display it in a human-friendly way when available.
- **FR-009 Per-target throttling**: Bulk orchestration MUST enforce concurrency limits per target scope to reduce throttling risk and provide predictable execution; the limit MUST be configuration-driven with a default of 1 per target scope.
- **FR-010 Idempotency for bulk**: Bulk operations MUST be idempotent using a deterministic fingerprint that includes operation type, target scope, and selection identity; retries MUST NOT duplicate work.
- **FR-011 Discovery completeness**: The implementation MUST include a repo-wide discovery sweep of legacy references and bulk-like actions; findings MUST be recorded in a discovery report with classification and migration/deferral decisions.
- **FR-012 Regression guardrails**: Automated checks MUST fail if legacy bulk-run references reappear or if bulk actions bypass the canonical run-backed model.
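
To make FR-002 and FR-010 concrete, here is a minimal sketch of a deterministic fingerprint and a "create or reuse" lookup. `InMemoryRunStore`, its field names, and the decision to reuse any existing run with the same fingerprint regardless of status are assumptions for illustration; a real implementation would sit on the `operation_runs` table and would need a database-level uniqueness guarantee.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


def bulk_fingerprint(operation_type: str, target_scope: str, selection_identity: str) -> str:
    """Deterministic fingerprint over the three inputs named in FR-010."""
    payload = json.dumps(
        [operation_type, target_scope, selection_identity], separators=(",", ":")
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class InMemoryRunStore:
    """Hypothetical stand-in for the run table, keyed by (tenant, fingerprint)."""

    runs: Dict[Tuple[str, str], dict] = field(default_factory=dict)

    def create_or_reuse(
        self, tenant_id: str, operation_type: str, target_scope: str, selection_identity: str
    ) -> Tuple[dict, bool]:
        """Return (run, created). A duplicate submission gets the existing run."""
        fingerprint = bulk_fingerprint(operation_type, target_scope, selection_identity)
        key = (tenant_id, fingerprint)  # tenant in the key keeps runs tenant-scoped
        if key in self.runs:
            return self.runs[key], False
        run = {
            "tenant_id": tenant_id,
            "type": operation_type,
            "fingerprint": fingerprint,
            "status": "queued",
        }
        self.runs[key] = run
        return run, True
```

With this shape a double click resolves to the same run record, and only the caller that sees `created == True` dispatches work.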

### Non-Functional Requirements (NFR)

#### NFR-01 Monitoring is DB-only at render time (Constitution Gate)

All Monitoring → Operations pages (index and run detail) MUST be DB-only at render time:

- No Graph/remote calls during initial render or reactive renders.
- No side-effectful work triggered by view rendering.

**Verification**:

- Add a regression test/guard that mocks the Graph client (or equivalent remote client) and asserts it is not called during Monitoring renders.
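
The verification guard could take roughly the following shape. `render_operations_index` and the repository interface are hypothetical stand-ins for the real Monitoring views; only `unittest.mock.Mock` and its `mock_calls` attribute are real library features.

```python
from unittest.mock import Mock


def render_operations_index(run_repository, graph_client):
    """Hypothetical render function: reads rows from the DB layer only."""
    rows = run_repository.list_runs()
    return [f"{row['type']}: {row['status']}" for row in rows]


def assert_db_only_render(render, run_repository):
    """Render with a mocked remote client and fail if it was called at all."""
    graph_client = Mock(name="graph_client")
    output = render(run_repository, graph_client)
    # mock_calls records calls on the mock and on any of its child attributes.
    if graph_client.mock_calls:
        raise AssertionError(f"remote calls during render: {graph_client.mock_calls}")
    return output
```

The same helper can wrap both the index and the run detail renders, so a single guard covers every Monitoring page.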

#### NFR-02 Failure reason codes and message sanitization

Run-backed operations MUST store failures as stable, machine-readable `reason_code` values plus a sanitized, user-facing message.

**Minimal required reason_code set (baseline)**:

| reason_code | Meaning |
|------------|---------|
| graph_throttled | Remote service throttled (e.g., rate limited) |
| graph_timeout | Remote call timed out |
| permission_denied | Missing/insufficient permissions |
| validation_error | Input/selection validation failure |
| conflict_detected | Conflict detected (concurrency/version/resource state) |
| unknown_error | Fallback when no specific code applies |

**Rules**:

- `reason_code` is stable over time and safe to use in programmatic filters/alerts.
- Failure messages are sanitized and bounded in length; failures/notifications MUST NOT persist secrets/tokens/PII or raw payload dumps.
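
The baseline set and the sanitization rules can be sketched as below. The enum values come from the table above; the 200-character bound, the redaction patterns, and the function names are assumptions, and the patterns are illustrative rather than an exhaustive secret/PII filter.

```python
import re
from enum import Enum


class ReasonCode(str, Enum):
    GRAPH_THROTTLED = "graph_throttled"
    GRAPH_TIMEOUT = "graph_timeout"
    PERMISSION_DENIED = "permission_denied"
    VALIDATION_ERROR = "validation_error"
    CONFLICT_DETECTED = "conflict_detected"
    UNKNOWN_ERROR = "unknown_error"


MAX_MESSAGE_LENGTH = 200  # assumed bound; the spec only requires "bounded"

# Illustrative patterns only: bearer tokens, key=value secrets, email addresses.
_REDACT_PATTERNS = [
    re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"),
    re.compile(r"(?i)\b(password|secret|token|api[_-]?key)\b\s*[=:]\s*\S+"),
    re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+"),
]


def normalize_reason_code(value: str) -> ReasonCode:
    """Map unrecognized codes to the fallback so stored codes stay within the set."""
    try:
        return ReasonCode(value)
    except ValueError:
        return ReasonCode.UNKNOWN_ERROR


def sanitize_message(raw: str) -> str:
    """Redact likely secrets/PII, collapse whitespace, and bound the length."""
    message = raw
    for pattern in _REDACT_PATTERNS:
        message = pattern.sub("[redacted]", message)
    message = " ".join(message.split())
    if len(message) > MAX_MESSAGE_LENGTH:
        message = message[: MAX_MESSAGE_LENGTH - 3] + "..."
    return message
```

Running every failure through one normalization point keeps `reason_code` filters and alerts stable even when new remote error shapes appear.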

#### NFR-03 Retry/backoff/jitter for remote throttling

When worker jobs perform remote calls, they MUST handle transient failures (including 429/503) via a shared policy:

- bounded retries
- exponential backoff with jitter
- no hand-rolled `sleep()` loops or ad-hoc random retry logic in feature code
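
One possible shape for the shared policy is sketched below, using the "full jitter" variant of exponential backoff. `TransientRemoteError`, the default base/cap/attempt values, and the function names are assumptions; the point is that feature code calls the shared helper instead of sleeping on its own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientRemoteError(Exception):
    """Hypothetical error raised by the remote client for 429/503 responses."""

    def __init__(self, status: int):
        super().__init__(f"transient remote failure: HTTP {status}")
        self.status = status


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random,
                    base: float = 0.5, cap: float = 30.0) -> T:
    """Call fn, retrying transient failures at most max_attempts times in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientRemoteError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the failure to the worker
            sleep(backoff_delay(attempt, base, cap, rng))
    raise AssertionError("unreachable")
```

Injecting `sleep` and `rng` keeps the policy testable without real waiting, and a worker that exhausts its retries can record the failure with the `graph_throttled` reason code.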

### Implementation Shape (decision)

**Decision: standard orchestrator + item workers**

- 1 orchestrator job per run:
  - resolves selection deterministically
  - chunks work
  - dispatches item worker jobs (idempotent per item)
- Worker jobs update `operation_runs.summary_counts` via canonical normalization.
- Finalization sets terminal status once.
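
The decision above can be sketched as a synchronous, in-process model. Everything here is an assumption made for illustration: the real design dispatches queued jobs rather than calling workers inline, and the summary keys (`total`, `processed`, `succeeded`, `failed`) are placeholders because this spec does not enumerate the canonical key set.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

TERMINAL_STATUSES = {"succeeded", "partial", "failed"}


@dataclass
class Run:
    status: str = "queued"
    # Placeholder keys; the canonical summary key set is defined elsewhere.
    summary_counts: Dict[str, int] = field(
        default_factory=lambda: {"total": 0, "processed": 0, "succeeded": 0, "failed": 0}
    )


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def record_item_result(run: Run, ok: bool) -> None:
    """Single place where workers update the summary counts."""
    run.summary_counts["processed"] += 1
    run.summary_counts["succeeded" if ok else "failed"] += 1


def finalize(run: Run) -> None:
    """Set the terminal status exactly once; later calls are no-ops."""
    if run.status in TERMINAL_STATUSES:
        return
    counts = run.summary_counts
    if counts["failed"] == 0:
        run.status = "succeeded"
    elif counts["succeeded"] == 0:
        run.status = "failed"
    else:
        run.status = "partial"


def orchestrate(run: Run, selection_ids: Iterable[str],
                worker: Callable[[str], bool], chunk_size: int = 100) -> Run:
    """Resolve the selection deterministically, chunk it, and run item workers."""
    items = sorted(set(selection_ids))  # deterministic order, one entry per item
    if not items:
        raise ValueError("empty selection: no work to start")
    run.status = "running"
    run.summary_counts["total"] = len(items)
    for chunk in chunked(items, chunk_size):
        for item_id in chunk:  # stand-in for dispatching one worker job per item
            record_item_result(run, worker(item_id))
    finalize(run)
    return run
```

Routing every count update through `record_item_result` and every status change through `finalize` is what keeps the summary keys canonical and the terminal status single-assignment in this sketch.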

### Target Scope (canonical keys)

**Canonical context keys**:

- `entra_tenant_id` (Azure AD tenant GUID)
- optional `entra_tenant_name` (human-friendly; if available)
- optional `directory_context_id` (internal directory context identifier, if/when introduced)

For operations targeting a directory/remote tenant, the run context MUST record target scope using the canonical keys above, and Monitoring/Run Detail MUST display the target scope (human-friendly name if available).
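
As an illustration of recording and displaying target scope with the canonical keys, consider the sketch below. The helper names and the label format are assumptions; only the three context keys come from this section.

```python
from typing import Dict, Optional


def build_target_scope(entra_tenant_id: str,
                       entra_tenant_name: Optional[str] = None,
                       directory_context_id: Optional[str] = None) -> Dict[str, str]:
    """Build the run-context fragment, omitting optional keys that are unset."""
    if not entra_tenant_id:
        # Missing scope must fail before any remote/bulk mutation is attempted.
        raise ValueError("entra_tenant_id is required for directory-targeted operations")
    scope = {"entra_tenant_id": entra_tenant_id}
    if entra_tenant_name:
        scope["entra_tenant_name"] = entra_tenant_name
    if directory_context_id:
        scope["directory_context_id"] = directory_context_id
    return scope


def target_scope_label(context: Dict[str, str]) -> str:
    """Prefer the human-friendly name and fall back to the tenant GUID."""
    tenant_id = context["entra_tenant_id"]
    name = context.get("entra_tenant_name")
    return f"{name} ({tenant_id})" if name else tenant_id
```

Because the label is derived from the stored context, rendering it in Monitoring needs no remote lookup, which keeps the run detail page consistent with NFR-01.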

#### Assumptions

- Existing run status semantics remain unchanged (queued/running/succeeded/partial/failed).
- Existing monitoring experience is not redesigned; it is aligned so that all operational work is represented consistently.

#### Dependencies

- Prior consolidation work establishing `OperationRun` as the canonical run model and Monitoring → Operations as the canonical surface.
- Existing audit logging conventions for security/ops-relevant DB-only actions.

#### Legacy History Decision (recorded)

- Default path: legacy bulk-run history is not migrated into the canonical run model. The legacy tables are removed after cutover, relying on database backups/exports if historical investigation is needed.

### Key Entities *(include if feature involves data)*

- **OperationRun**: A tenant-scoped record of operational work with status, timestamps, sanitized user-facing message/reason code, summary metrics, and context.
- **Operation Type**: A stable identifier describing the kind of operation (used for categorization, labeling, and governance).
- **Target Scope**: The directory / remote tenant scope that the operation targets (when applicable).
- **Selection Identity**: The deterministic definition of “what the bulk action applies to” used for idempotency and traceability.
- **Audit Log Entry**: A record of security/ops-relevant state changes for audit-only DB actions.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of bulk actions in the admin UI create or reuse a canonical run record and appear in Monitoring → Operations.
- **SC-002**: Repository contains 0 references to the legacy bulk-run system after completion, enforced by automated checks.
- **SC-003**: For directory-targeted operations, 100% of run records display a target scope in Monitoring/Run Detail.
- **SC-004**: For bulk operations, duplicate submissions do not increase processed item count beyond one idempotent execution per selection identity.
- **SC-005**: Admins can locate a completed bulk run in Monitoring within 30 seconds using standard navigation and filters, without relying on legacy pages.