# Feature Specification: Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)

**Feature Branch**: `056-remove-legacy-bulkops`  
**Created**: 2026-01-18  
**Status**: Draft  
**Input**: User description: "Feature 056 — Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)"

## Clarifications

### Session 2026-01-18

- Q: What should be the default max concurrency per target scope (`entra_tenant_id` / `directory_context_id`) for bulk operations? → A: Config-driven, default=1
- Q: How should Selection Identity be determined for idempotency fingerprinting? → A: Hybrid (IDs-hash for explicit selection; query-hash for “select all via filter/query”)

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Run-backed bulk actions are always observable (Priority: P1)

An admin performs a bulk action (e.g., apply/ignore/restore/prune across many records). The system records a single canonical run that can be monitored end-to-end, including partial failures, and provides consistent user feedback.

**Why this priority**: Bulk changes are operationally significant. They must be traceable, must support partial outcomes, and must present a consistent mental model to admins.

**Independent Test**: Trigger a representative bulk action and verify that a run record exists, appears in the Monitoring list, has a detail view, and emits the correct feedback surfaces.

**Acceptance Scenarios**:

1. **Given** an admin selects multiple items for a bulk action, **When** the action is confirmed and submitted, **Then** a canonical run record is created or reused and the UI confirms the queued state via a toast.
2. **Given** a bulk run is queued or running, **When** the admin opens Monitoring → Operations, **Then** the run appears in the list and can be opened via a canonical “View run” link.
3. **Given** a bulk run completes with a mix of successes and failures, **When** the run reaches a terminal state, **Then** the initiator receives a terminal notification and the run detail shows a summary of outcomes.

---

### User Story 2 - Monitoring is the single source of run history (Priority: P2)

An admin (or operator) relies on Monitoring → Operations to see the full history of operational work, including bulk work. There are no separate legacy run surfaces, and links from anywhere in the app point to the canonical run detail.

**Why this priority**: Multiple run systems lead to missed incidents, inconsistent retention, and developer confusion. One canonical surface improves operational clarity and reduces support overhead.

**Independent Test**: Navigate from a bulk action result to “View run” and confirm that it lands on Monitoring’s run detail. Confirm that no legacy “bulk runs” navigation or pages remain.

**Acceptance Scenarios**:

1. **Given** any UI element offers a “View run” link, **When** it is clicked, **Then** it opens the canonical Monitoring → Operations → Run Detail page for that run.
2. **Given** the app navigation, **When** an admin searches for legacy bulk-run screens, **Then** no legacy bulk-run navigation or pages exist.

---

### User Story 3 - Developers can’t accidentally reintroduce legacy patterns (Priority: P3)

A developer adds or modifies an admin action. They can clearly determine whether it is an audit-only action or a run-backed operation. The repository enforces the single-run model by rejecting legacy references and UX drift.

**Why this priority**: Preventing regression is essential for suite readiness and long-term maintainability.

**Independent Test**: Introduce a legacy reference, or a bulk action without a run-backed record, and confirm that CI/automated checks fail.

**Acceptance Scenarios**:

1. **Given** a change introduces any reference to the legacy bulk-run system, **When** tests/CI run, **Then** the pipeline fails with a clear message.
2. **Given** a security-relevant DB-only action that is eligible for audit-only classification, **When** the action runs, **Then** an audit log entry is recorded and no run record is created.

### Edge Cases

- Bulk selection is empty or resolves to zero items: the system does not start work and provides a clear, non-destructive result.
- A bulk selection is very large: the system remains responsive and continues to show progress via run summary metrics.
- Target scope is required but missing: the system fails safely, records a terminal run with a stable reason code, and does not execute remote/bulk mutations.
- Remote calls experience throttling: the system applies bounded retries with jittered backoff and records failures without losing overall run visibility.
- Duplicate submissions (double click / retry / re-run): idempotency prevents duplicate processing and preserves a single canonical outcome per selection identity.
- Tenant isolation: no runs, selections, summaries, or notifications leak across tenants.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature consolidates operational work onto a single canonical run model and a single monitoring surface. It must preserve the defined user feedback surfaces (queued toast, active widget, terminal notification), ensure tenant-scoped observability, and maintain stable, sanitized messages and reason codes.

### Functional Requirements

- **FR-001 Single run model**: The system MUST use a single canonical run model (`OperationRun`) for all run-backed operations. The legacy bulk-run model MUST NOT exist after this feature.
- **FR-002 Bulk actions are run-backed**: Any bulk action (apply to N records, chunked work, mass ignore/restore/prune/delete) MUST create or reuse an `OperationRun` and MUST be visible in Monitoring → Operations.
- **FR-003 Action taxonomy**: Every admin action MUST be classified as exactly one of:
  - **Audit-only DB action**: DB-only, no remote/external calls, no queued work, and bounded DB work; typically completes within ~2 seconds (guidance, not a hard rule). MUST write an audit log for security/ops-relevant state changes; MUST NOT create an `OperationRun`.
  - **Run-backed operation**: queued/long-running/remote/bulk/scheduled or otherwise operationally significant; MUST create or reuse an `OperationRun`.

  **Decision rule**: If classification is uncertain, default to **Run-backed operation**.
- **FR-004 Canonical UX surfaces**: For run-backed operations, the system MUST use only these feedback surfaces:
  - **Queued**: toast-only
  - **Active**: tenant-wide active widget
  - **Terminal**: database-backed notification to the initiator only
- **FR-005 Canonical routing**: All “View run” links MUST route to Monitoring → Operations → Run Detail.
- **FR-006 Legacy removal**: The system MUST remove legacy bulk-run tables/models/services/routes/widgets/navigation and MUST prevent any new legacy writes.
- **FR-007 Canonical summary metrics**: The run’s summary metrics MUST use a single canonical set of keys and MUST be presented consistently in the run detail view.
- **FR-008 Target scope recording**: For operations targeting a directory/remote tenant, the run context MUST record the target scope (directory identifier), and Monitoring/Run Detail MUST display it in a human-friendly way when available.
- **FR-009 Per-target throttling**: Bulk orchestration MUST enforce concurrency limits per target scope to reduce throttling risk and provide predictable execution. The limit MUST be configuration-driven, with a default of 1 per target scope.
- **FR-010 Idempotency for bulk**: Bulk operations MUST be idempotent, using a deterministic fingerprint that includes operation type, target scope, and selection identity. Retries MUST NOT duplicate work.
- **FR-011 Discovery completeness**: The implementation MUST include a repo-wide discovery sweep of legacy references and bulk-like actions. Findings MUST be recorded in a discovery report with classification and migration/deferral decisions.
- **FR-012 Regression guardrails**: Automated checks MUST fail if legacy bulk-run references reappear or if bulk actions bypass the canonical run-backed model.

### Non-Functional Requirements (NFR)

#### NFR-01 Monitoring is DB-only at render time (Constitution Gate)

All Monitoring → Operations pages (index and run detail) MUST be DB-only at render time:

- No Graph/remote calls during initial render or reactive renders.
- No side-effectful work triggered by view rendering.

**Verification**:

- Add a regression test/guard that mocks the Graph client (or equivalent remote client) and asserts it is not called during Monitoring renders.

#### NFR-02 Failure reason codes and message sanitization

Run-backed operations MUST store failures as stable, machine-readable `reason_code` values plus a sanitized, user-facing message.

**Minimal required reason_code set (baseline)**:

| `reason_code` | Meaning |
|---------------|---------|
| `graph_throttled` | Remote service throttled (e.g., rate limited) |
| `graph_timeout` | Remote call timed out |
| `permission_denied` | Missing/insufficient permissions |
| `validation_error` | Input/selection validation failure |
| `conflict_detected` | Conflict detected (concurrency/version/resource state) |
| `unknown_error` | Fallback when no specific code applies |

**Rules**:

- `reason_code` values are stable over time and safe to use in programmatic filters/alerts.
- Failure messages are sanitized and bounded in length. Failures/notifications MUST NOT persist secrets/tokens/PII or raw payload dumps.
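The NFR-02 rules can be illustrated with a minimal sketch. It is written in Python for illustration only, since the spec does not prescribe a language. It assumes three things the spec does not define: a mapping from remote HTTP status codes to the baseline `reason_code` values, a set of redaction patterns, and a 200-character message bound. All three are hypothetical placeholders that a real implementation would define and test itself.

```python
import re
from enum import Enum
from typing import Optional


class ReasonCode(str, Enum):
    """Baseline reason_code set from NFR-02; values are stable identifiers."""

    GRAPH_THROTTLED = "graph_throttled"
    GRAPH_TIMEOUT = "graph_timeout"
    PERMISSION_DENIED = "permission_denied"
    VALIDATION_ERROR = "validation_error"
    CONFLICT_DETECTED = "conflict_detected"
    UNKNOWN_ERROR = "unknown_error"


# Hypothetical mapping from remote HTTP status to reason_code.
STATUS_TO_REASON = {
    400: ReasonCode.VALIDATION_ERROR,
    401: ReasonCode.PERMISSION_DENIED,
    403: ReasonCode.PERMISSION_DENIED,
    408: ReasonCode.GRAPH_TIMEOUT,
    409: ReasonCode.CONFLICT_DETECTED,
    412: ReasonCode.CONFLICT_DETECTED,
    429: ReasonCode.GRAPH_THROTTLED,
    504: ReasonCode.GRAPH_TIMEOUT,
}

# Hypothetical redaction patterns: bearer tokens, credential params, e-mail addresses.
REDACTIONS = (
    re.compile(r"Bearer\s+[A-Za-z0-9\-._~+/]+=*", re.IGNORECASE),
    re.compile(r"\b(client_secret|access_token|refresh_token|password)=[^&\s]+", re.IGNORECASE),
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
)

MAX_MESSAGE_LENGTH = 200  # hypothetical bound; NFR-02 only says "bounded"


def reason_for_status(status: Optional[int]) -> ReasonCode:
    """Map a remote status to a stable code, falling back to unknown_error."""
    return STATUS_TO_REASON.get(status, ReasonCode.UNKNOWN_ERROR)


def sanitize_message(raw: str, max_length: int = MAX_MESSAGE_LENGTH) -> str:
    """Redact likely secrets/PII, flatten whitespace from payload dumps, and truncate."""
    text = raw
    for pattern in REDACTIONS:
        text = pattern.sub("[redacted]", text)
    text = " ".join(text.split())
    if len(text) > max_length:
        text = text[: max_length - 3] + "..."
    return text


def build_failure(status: Optional[int], raw_message: str) -> dict:
    """Failure entry as a run might persist it (the dict keys are illustrative)."""
    return {
        "reason_code": reason_for_status(status).value,
        "message": sanitize_message(raw_message),
    }
```

Keeping the mapping in one table means an unclassified failure source resolves to `unknown_error` until someone classifies it explicitly. This avoids ad-hoc codes that would break the stability rule.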
#### NFR-03 Retry/backoff/jitter for remote throttling

When worker jobs perform remote calls, they MUST handle transient failures (including HTTP 429/503) via a shared policy:

- bounded retries
- exponential backoff with jitter
- no hand-rolled `sleep()` loops or ad-hoc random retry logic in feature code

### Implementation Shape (decision)

**Decision: standard orchestrator + item workers**

- 1 orchestrator job per run:
  - resolves the selection deterministically
  - chunks the work
  - dispatches item worker jobs (idempotent per item)
- Worker jobs update `operation_runs.summary_counts` via canonical normalization.
- Finalization sets the terminal status exactly once.

### Target Scope (canonical keys)

**Canonical context keys**:

- `entra_tenant_id` (Azure AD tenant GUID)
- optional `entra_tenant_name` (human-friendly; if available)
- optional `directory_context_id` (internal directory context identifier, if/when introduced)

For operations targeting a directory/remote tenant, the run context MUST record the target scope using the canonical keys above. Monitoring/Run Detail MUST display that target scope, using the human-friendly name when available.

### Assumptions

- Existing run status semantics remain unchanged (queued/running/succeeded/partial/failed).
- The existing monitoring experience is not redesigned. It is aligned so that all operational work is represented consistently.

### Dependencies

- Prior consolidation work establishing `OperationRun` as the canonical run model and Monitoring → Operations as the canonical surface.
- Existing audit logging conventions for security/ops-relevant DB-only actions.

### Legacy History Decision (recorded)

- Default path: legacy bulk-run history is not migrated into the canonical run model. The legacy tables are removed after cutover. If historical investigation is needed, it relies on database backups/exports.
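NFR-03's shared retry policy could look like the minimal sketch below. It is written in Python for illustration only. It assumes a hypothetical `RemoteCallError` that carries the HTTP status and an optional `Retry-After` value, and it uses the "full jitter" variant of exponential backoff. The attempt count, base delay, and cap are placeholder values, not figures from the spec.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Statuses NFR-03 names as transient; everything else fails fast.
RETRYABLE_STATUSES = frozenset({429, 503})


class RemoteCallError(Exception):
    """Hypothetical error raised by a remote client, carrying the HTTP status."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"remote call failed with HTTP {status}")
        self.status = status
        self.retry_after = retry_after  # seconds, from a Retry-After header if present


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(fn: Callable[[], T], max_attempts: int = 5, base: float = 1.0,
                    cap: float = 30.0, sleep: Callable[[float], None] = time.sleep,
                    rng: Callable[[], float] = random.random) -> T:
    """Run fn, retrying transient failures with bounded, jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RemoteCallError as exc:
            last_attempt = attempt == max_attempts - 1
            if exc.status not in RETRYABLE_STATUSES or last_attempt:
                raise  # non-transient failure, or the retry budget is exhausted
            delay = backoff_delay(attempt, base, cap, rng)
            if exc.retry_after is not None:
                # Never retry sooner than the server asked.
                delay = max(delay, exc.retry_after)
            sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

`sleep` and `rng` are injectable so the policy can be tested without real delays. In a queue-based worker, the computed delay would more likely be handed to the queue's delayed-retry mechanism than slept in-process. Either way, the policy lives in one shared place rather than in feature code.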
### Key Entities *(include if feature involves data)*

- **OperationRun**: A tenant-scoped record of operational work with status, timestamps, a sanitized user-facing message/reason code, summary metrics, and context.
- **Operation Type**: A stable identifier describing the kind of operation (used for categorization, labeling, and governance).
- **Target Scope**: The directory / remote tenant scope that the operation targets (when applicable).
- **Selection Identity**: The deterministic definition of “what the bulk action applies to”, used for idempotency and traceability.
- **Audit Log Entry**: A record of security/ops-relevant state changes for audit-only DB actions.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of bulk actions in the admin UI create or reuse a canonical run record and appear in Monitoring → Operations.
- **SC-002**: The repository contains 0 references to the legacy bulk-run system after completion, enforced by automated checks.
- **SC-003**: For directory-targeted operations, 100% of run records display a target scope in Monitoring/Run Detail.
- **SC-004**: For bulk operations, duplicate submissions do not increase the processed item count beyond one idempotent execution per selection identity.
- **SC-005**: Admins can locate a completed bulk run in Monitoring within 30 seconds using standard navigation and filters, without relying on legacy pages.
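SC-004 and FR-010 depend on a deterministic fingerprint over operation type, target scope, and selection identity. The selection identity follows the hybrid rule from the Clarifications session. A minimal sketch follows, written in Python for illustration only. The `ids:`/`query:` prefixes, the JSON canonicalization, and the in-memory `RunRegistry` standing in for `operation_runs` are all hypothetical.

```python
import hashlib
import json
from typing import Dict, Iterable, Mapping, Optional, Tuple


def _sha256(payload: object) -> str:
    """Hash a JSON-serializable payload in canonical form (sorted keys, no spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def selection_identity(ids: Optional[Iterable[str]] = None,
                       query: Optional[Mapping[str, object]] = None) -> str:
    """Hybrid identity: IDs-hash for explicit selections, query-hash for select-all."""
    if (ids is None) == (query is None):
        raise ValueError("provide exactly one of ids or query")
    if ids is not None:
        # Order and duplicates must not change the identity (e.g. a double click).
        return "ids:" + _sha256(sorted(set(map(str, ids))))
    # Select-all via filter/query: hash the filter definition, not its current results.
    return "query:" + _sha256(dict(query))


def run_fingerprint(operation_type: str, target_scope: str, selection: str) -> str:
    """Deterministic fingerprint over the three inputs FR-010 names."""
    return _sha256({
        "operation_type": operation_type,
        "target_scope": target_scope,
        "selection": selection,
    })


class RunRegistry:
    """In-memory stand-in for operation_runs with a unique fingerprint constraint."""

    def __init__(self) -> None:
        self._runs: Dict[str, int] = {}

    def create_or_reuse(self, fingerprint: str) -> Tuple[int, bool]:
        """Return (run_id, created); a duplicate submission gets the existing run."""
        if fingerprint in self._runs:
            return self._runs[fingerprint], False
        run_id = len(self._runs) + 1
        self._runs[fingerprint] = run_id
        return run_id, True
```

Sorting and de-duplicating IDs means a double click or a reordered selection hashes identically. A query-hash identifies the filter, not the rows it matched at a given moment, so the orchestrator still has to resolve that query deterministically. In a database, the create-or-reuse check would typically rely on a unique index on the fingerprint, so that two concurrent submissions cannot both insert a run.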