TenantAtlas/specs/056-remove-legacy-bulkops/spec.md

# Feature Specification: Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)

**Feature Branch**: `056-remove-legacy-bulkops`
**Created**: 2026-01-18
**Status**: Draft
**Input**: User description: "Feature 056 — Remove Legacy BulkOperationRun & Canonicalize Operations (v1.0)"

## Clarifications

### Session 2026-01-18

- Q: What should be the default max concurrency per target scope (entra_tenant_id / directory_context_id) for bulk operations? → A: Config-driven, default=1
- Q: How should Selection Identity be determined for idempotency fingerprinting? → A: Hybrid (IDs-hash for explicit selection; query-hash for “select all via filter/query”)

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Run-backed bulk actions are always observable (Priority: P1)

An admin performs a bulk action (e.g., apply/ignore/restore/prune across many records). The system records a single canonical run that can be monitored end-to-end, including partial failures, and provides consistent user feedback.

**Why this priority**: Bulk changes are operationally significant and must be traceable, support partial outcomes, and have a consistent mental model for admins.

**Independent Test**: Trigger a representative bulk action and verify that a run record exists, appears in the Monitoring list, has a detail view, and emits the correct feedback surfaces.

**Acceptance Scenarios**:

1. **Given** an admin selects multiple items for a bulk action, **When** the action is confirmed and submitted, **Then** a canonical run record is created or reused and the UI confirms the queued state via a toast.
2. **Given** a bulk run is queued or running, **When** the admin opens Monitoring → Operations, **Then** the run appears in the list and can be opened via a canonical “View run” link.
3. **Given** a bulk run completes with a mix of successes and failures, **When** the run reaches a terminal state, **Then** the initiator receives a terminal notification and the run detail shows a summary of outcomes.

---

### User Story 2 - Monitoring is the single source of run history (Priority: P2)

An admin (or operator) relies on Monitoring → Operations to see the full history of operational work (including bulk). There are no separate legacy run surfaces; links from anywhere in the app point to the canonical run detail.

**Why this priority**: Multiple run systems lead to missed incidents, inconsistent retention, and developer confusion. One canonical surface improves operational clarity and reduces support overhead.

**Independent Test**: Navigate from a bulk action result to “View run” and confirm it lands in Monitoring’s run detail; confirm there is no legacy “bulk runs” navigation or pages.

**Acceptance Scenarios**:

1. **Given** any UI element offers a “View run” link, **When** it is clicked, **Then** it opens the canonical Monitoring → Operations → Run Detail page for that run.
2. **Given** the app navigation, **When** an admin searches for legacy bulk-run screens, **Then** no legacy bulk-run navigation or pages exist.

---

### User Story 3 - Developers can’t accidentally reintroduce legacy patterns (Priority: P3)

A developer adds or modifies an admin action. They can clearly determine whether it is an audit-only action or a run-backed operation, and the repository enforces the single-run model by preventing legacy references and UX drift.

**Why this priority**: Preventing regression is essential for suite readiness and long-term maintainability.

**Independent Test**: Introduce a legacy reference or a bulk action without a run-backed record and confirm CI/automated checks fail.

**Acceptance Scenarios**:

1. **Given** a change introduces any reference to the legacy bulk-run system, **When** tests/CI run, **Then** the pipeline fails with a clear message.
2. **Given** a security-relevant DB-only action that is eligible for audit-only classification, **When** the action runs, **Then** an audit log entry is recorded and no run record is created.

### Edge Cases

- Bulk selection is empty or resolves to zero items: the system does not start work and provides a clear non-destructive result.
- A bulk selection is very large: the system remains responsive and continues to show progress via run summary metrics.
- Target scope is required but missing: the system fails safely, records a terminal run with a stable reason code, and does not execute remote/bulk mutations.
- Remote calls experience throttling: the system applies bounded retries with jittered backoff and records failures without losing overall run visibility.
- Duplicate submissions (double click / retry / re-run): idempotency prevents duplicate processing and preserves a single canonical outcome per selection identity.
- Tenant isolation: no run, selection, summary, or notifications leak across tenants.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature consolidates operational work onto a single canonical run model and a single monitoring surface. It must preserve the defined user feedback surfaces (queued toast, active widget, terminal notification), ensure tenant-scoped observability, and maintain stable, sanitized messages and reason codes.

### Functional Requirements

- **FR-001 Single run model**: The system MUST use a single canonical run model (`OperationRun`) for all run-backed operations; the legacy bulk-run model MUST NOT exist after this feature.
- **FR-002 Bulk actions are run-backed**: Any bulk action (apply to N records, chunked work, mass ignore/restore/prune/delete) MUST create or reuse an `OperationRun` and MUST be visible in Monitoring → Operations.
- **FR-003 Action taxonomy**: Every admin action MUST be classified as exactly one of:
  - **Audit-only DB action**: DB-only, no remote/external calls, no queued work, and bounded DB work; typically completes within ~2 seconds (guidance, not a hard rule). MUST write an audit log for security/ops-relevant state changes; MUST NOT create an `OperationRun`.
  - **Run-backed operation**: queued/long-running/remote/bulk/scheduled or otherwise operationally significant; MUST create or reuse an `OperationRun`.

  **Decision rule**: If classification is uncertain, default to **Run-backed operation**.
- **FR-004 Canonical UX surfaces**: For run-backed operations, the system MUST use only these feedback surfaces:
  - **Queued**: toast-only
  - **Active**: tenant-wide active widget
  - **Terminal**: database-backed notification to the initiator only
- **FR-005 Canonical routing**: All “View run” links MUST route to Monitoring → Operations → Run Detail.
- **FR-006 Legacy removal**: The system MUST remove legacy bulk-run tables/models/services/routes/widgets/navigation and MUST prevent any new legacy writes.
- **FR-007 Canonical summary metrics**: The run’s summary metrics MUST use a single canonical set of keys and MUST be presented consistently in the run detail view.
- **FR-008 Target scope recording**: For operations targeting a directory/remote tenant, the run context MUST record the target scope (directory identifier) and Monitoring/Run Detail MUST display it in a human-friendly way when available.
- **FR-009 Per-target throttling**: Bulk orchestration MUST enforce concurrency limits per target scope to reduce throttling risk and provide predictable execution; the limit MUST be configuration-driven with a default of 1 per target scope.
- **FR-010 Idempotency for bulk**: Bulk operations MUST be idempotent using a deterministic fingerprint that includes operation type, target scope, and selection identity; retries MUST NOT duplicate work.
- **FR-011 Discovery completeness**: The implementation MUST include a repo-wide discovery sweep of legacy references and bulk-like actions; findings MUST be recorded in a discovery report with classification and migration/deferral decisions.
- **FR-012 Regression guardrails**: Automated checks MUST fail if legacy bulk-run references reappear or if bulk actions bypass the canonical run-backed model.
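
The per-target limit in FR-009 can be pictured as a small slot counter keyed by target scope. This is a minimal in-process sketch in Python, not the project's implementation: `TargetScopeLimiter` and its methods are hypothetical names, and a real multi-worker deployment would presumably need a shared store (cache or database lock) instead of process memory.

```python
import threading
from collections import defaultdict


class TargetScopeLimiter:
    """Caps concurrent work per target scope (hypothetical helper)."""

    def __init__(self, max_concurrency: int = 1) -> None:
        # Default of 1 mirrors the configuration default stated in FR-009.
        if max_concurrency < 1:
            raise ValueError("max_concurrency must be >= 1")
        self._max = max_concurrency
        self._lock = threading.Lock()
        self._active = defaultdict(int)

    def try_acquire(self, target_scope: str) -> bool:
        """Reserve a slot; return False if the scope is already at its limit."""
        with self._lock:
            if self._active[target_scope] >= self._max:
                return False
            self._active[target_scope] += 1
            return True

    def release(self, target_scope: str) -> None:
        """Free a previously reserved slot for the scope."""
        with self._lock:
            if self._active[target_scope] > 0:
                self._active[target_scope] -= 1
```

A worker that fails to acquire a slot would typically be put back on the queue with a delay rather than blocking, so other target scopes keep making progress.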

### Non-Functional Requirements (NFR)

#### NFR-01 Monitoring is DB-only at render time (Constitution Gate)

All Monitoring → Operations pages (index and run detail) MUST be DB-only at render time:

- No Graph/remote calls during initial render or reactive renders.
- No side-effectful work triggered by view rendering.

**Verification**:

- Add a regression test/guard that mocks the Graph client (or equivalent remote client) and asserts it is not called during Monitoring renders.
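
One possible shape for such a guard, sketched in Python with `unittest.mock`. The render function, its signature, and the operation type string are hypothetical stand-ins; the point is only that the mocked remote client records every call, so an empty call log proves the render stayed DB-only.

```python
from unittest.mock import Mock


def render_operations_index(run_rows: list, graph_client) -> list:
    """Hypothetical render function: builds list rows purely from stored run data.

    graph_client is accepted only so the guard can prove it stays unused.
    """
    return [f"{row['type']}: {row['status']}" for row in run_rows]


def assert_render_is_db_only(render, run_rows: list) -> list:
    """Render with a mocked remote client and fail if anything on it is called."""
    graph_client = Mock(name="graph_client")
    output = render(run_rows, graph_client)
    # mock_calls records calls made on the mock and on any of its attributes.
    assert graph_client.mock_calls == [], (
        f"remote calls during render: {graph_client.mock_calls}"
    )
    return output
```

The same helper can wrap both the index and the run detail render, so a single guard covers initial and reactive renders alike.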

#### NFR-02 Failure reason codes and message sanitization

Run-backed operations MUST store failures as stable, machine-readable `reason_code` values plus a sanitized, user-facing message.

**Minimal required `reason_code` set (baseline)**:

| reason_code | Meaning |
|------------|---------|
| graph_throttled | Remote service throttled (e.g., rate limited) |
| graph_timeout | Remote call timed out |
| permission_denied | Missing/insufficient permissions |
| validation_error | Input/selection validation failure |
| conflict_detected | Conflict detected (concurrency/version/resource state) |
| unknown_error | Fallback when no specific code applies |

**Rules**:

- `reason_code` is stable over time and safe to use in programmatic filters/alerts.
- Failure messages are sanitized and bounded in length; failures/notifications MUST NOT persist secrets/tokens/PII or raw payload dumps.
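
To make the rules concrete, here is a hedged Python sketch of the baseline codes plus a sanitizer. The enum values come from the table above; everything else is assumed for illustration: the 200-character bound (the spec only says "bounded"), the two redaction patterns, and the helper names.

```python
import re
from enum import Enum


class ReasonCode(str, Enum):
    GRAPH_THROTTLED = "graph_throttled"
    GRAPH_TIMEOUT = "graph_timeout"
    PERMISSION_DENIED = "permission_denied"
    VALIDATION_ERROR = "validation_error"
    CONFLICT_DETECTED = "conflict_detected"
    UNKNOWN_ERROR = "unknown_error"


MAX_MESSAGE_LENGTH = 200  # assumed bound, not taken from the spec

# Illustrative patterns only; a real sanitizer would need a reviewed list.
_REDACTIONS = [
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [redacted]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[redacted-email]"),
]


def normalize_reason_code(value: str) -> ReasonCode:
    """Map arbitrary input onto the stable set, falling back to unknown_error."""
    try:
        return ReasonCode(value)
    except ValueError:
        return ReasonCode.UNKNOWN_ERROR


def sanitize_message(raw: str) -> str:
    """Collapse whitespace, redact token/email-like substrings, bound the length."""
    text = " ".join(raw.split())
    for pattern, replacement in _REDACTIONS:
        text = pattern.sub(replacement, text)
    if len(text) > MAX_MESSAGE_LENGTH:
        text = text[: MAX_MESSAGE_LENGTH - 3] + "..."
    return text
```

Keeping normalization in one place means alerts and filters can rely on the closed set of codes even when a worker reports something unexpected.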

#### NFR-03 Retry/backoff/jitter for remote throttling

When worker jobs perform remote calls, they MUST handle transient failures (including 429/503) via a shared policy:

- bounded retries
- exponential backoff with jitter
- no hand-rolled `sleep()` loops or ad-hoc random retry logic in feature code
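
A shared policy of this kind might look like the following Python sketch. It uses the "full jitter" variant of exponential backoff (delay drawn uniformly between zero and the capped exponential ceiling); the variant, the default numbers, and names such as `TransientRemoteError` and `call_with_retry` are assumptions, not decisions recorded in this spec.

```python
import random
import time


class TransientRemoteError(Exception):
    """Hypothetical error for retryable responses such as HTTP 429 or 503."""

    def __init__(self, status: int) -> None:
        super().__init__(f"transient remote failure (HTTP {status})")
        self.status = status


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng=random.random) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(operation, max_attempts: int = 4, sleep=time.sleep,
                    rng=random.random):
    """Run operation, retrying transient failures up to max_attempts total tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientRemoteError:
            if attempt == max_attempts - 1:
                raise  # retries exhausted; the caller records the failure
            sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")
```

`sleep` and `rng` are injectable so tests stay deterministic. In a queue-based worker the wait would more likely be expressed as a delayed re-dispatch of the job than as an in-process sleep.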

### Implementation Shape (decision)

**Decision: standard orchestrator + item workers**

- 1 orchestrator job per run:
  - resolves selection deterministically
  - chunks work
  - dispatches item worker jobs (idempotent per item)
- Worker jobs update `operation_runs.summary_counts` via canonical normalization.
- Finalization sets terminal status once.
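
The shape above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions: `Run` stands in for an `OperationRun` row, the summary keys `total`/`succeeded`/`failed` are placeholders (the spec does not list the canonical keys), and item workers run inline where a real system would dispatch queued jobs.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical in-memory stand-in for an OperationRun row."""
    status: str = "queued"
    summary_counts: dict = field(
        default_factory=lambda: {"total": 0, "succeeded": 0, "failed": 0}
    )
    processed_ids: set = field(default_factory=set)


def chunked(items: list, size: int):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_item(run: Run, item_id: str, handler) -> None:
    """Item worker: idempotent per item, so a redelivered job is a no-op."""
    if item_id in run.processed_ids:
        return
    try:
        handler(item_id)
        run.summary_counts["succeeded"] += 1
    except Exception:
        run.summary_counts["failed"] += 1
    run.processed_ids.add(item_id)


def finalize(run: Run) -> None:
    """Set the terminal status exactly once; later calls change nothing."""
    if run.status in {"succeeded", "partial", "failed"}:
        return
    counts = run.summary_counts
    if counts["failed"] == 0:
        run.status = "succeeded"
    elif counts["succeeded"] == 0:
        run.status = "failed"
    else:
        run.status = "partial"


def orchestrate(run: Run, selected_ids, handler, chunk_size: int = 2) -> Run:
    ids = sorted(set(selected_ids))  # deterministic selection resolution
    run.summary_counts["total"] = len(ids)
    run.status = "running"
    for chunk in chunked(ids, chunk_size):
        for item_id in chunk:  # a real system would dispatch one job per item
            process_item(run, item_id, handler)
    finalize(run)
    return run
```

An empty selection dispatches no workers in this sketch; how such a run should be reported to the user is left open here.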

### Target Scope (canonical keys)

**Canonical context keys**:

- `entra_tenant_id` (Azure AD tenant GUID)
- optional `entra_tenant_name` (human-friendly; if available)
- optional `directory_context_id` (internal directory context identifier, if/when introduced)

For operations targeting a directory/remote tenant, the run context MUST record target scope using the canonical keys above, and Monitoring/Run Detail MUST display the target scope (human-friendly name if available).
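
As an illustration, recording and displaying target scope with the canonical keys could look like this Python sketch. The key names come from the list above; the function names, the label format, and the choice to raise on a missing tenant ID are assumptions.

```python
from typing import Optional


def build_target_scope(entra_tenant_id: str,
                       entra_tenant_name: Optional[str] = None,
                       directory_context_id: Optional[str] = None) -> dict:
    """Build run-context entries using the canonical keys; unset optionals are omitted."""
    if not entra_tenant_id:
        # The caller is expected to turn this into a terminal run with a reason code.
        raise ValueError("entra_tenant_id is required for directory-targeted operations")
    scope = {"entra_tenant_id": entra_tenant_id}
    if entra_tenant_name:
        scope["entra_tenant_name"] = entra_tenant_name
    if directory_context_id:
        scope["directory_context_id"] = directory_context_id
    return scope


def target_scope_label(context: dict) -> str:
    """Prefer the human-friendly name and fall back to the tenant GUID."""
    name = context.get("entra_tenant_name")
    tenant_id = context.get("entra_tenant_id", "")
    if name and tenant_id:
        return f"{name} ({tenant_id})"
    return name or tenant_id or "n/a"
```

Because the label is derived only from stored context, Run Detail can show it without any remote lookup at render time.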

#### Assumptions

- Existing run status semantics remain unchanged (queued/running/succeeded/partial/failed).
- Existing monitoring experience is not redesigned; it is aligned so that all operational work is represented consistently.

#### Dependencies

- Prior consolidation work establishing `OperationRun` as the canonical run model and Monitoring → Operations as the canonical surface.
- Existing audit logging conventions for security/ops-relevant DB-only actions.

#### Legacy History Decision (recorded)

- Default path: legacy bulk-run history is not migrated into the canonical run model. The legacy tables are removed after cutover; any later historical investigation relies on database backups/exports.

### Key Entities *(include if feature involves data)*

- **OperationRun**: A tenant-scoped record of operational work with status, timestamps, sanitized user-facing message/reason code, summary metrics, and context.
- **Operation Type**: A stable identifier describing the kind of operation (used for categorization, labeling, and governance).
- **Target Scope**: The directory / remote tenant scope that the operation targets (when applicable).
- **Selection Identity**: The deterministic definition of “what the bulk action applies to” used for idempotency and traceability.
- **Audit Log Entry**: A record of security/ops-relevant state changes for audit-only DB actions.
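
The hybrid Selection Identity from the clarifications, and the fingerprint built from it, can be sketched as follows. This Python example assumes SHA-256 over canonical JSON; the hash choice, the payload layout, and the function names are illustrative, not part of the spec.

```python
import hashlib
import json
from typing import Optional, Sequence


def selection_identity(ids: Optional[Sequence[str]] = None,
                       query: Optional[dict] = None) -> str:
    """Hybrid identity: hash of sorted unique IDs, or hash of the canonicalized query."""
    if (ids is None) == (query is None):
        raise ValueError("provide exactly one of ids or query")
    if ids is not None:
        payload = {"kind": "ids", "value": sorted(set(ids))}
    else:
        payload = {"kind": "query", "value": query}
    # sort_keys makes the hash independent of dict key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_fingerprint(operation_type: str, target_scope: str, selection: str) -> str:
    """Deterministic fingerprint over operation type, target scope, selection identity."""
    canonical = json.dumps([operation_type, target_scope, selection],
                           separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two submissions that pick the same records in a different order, or send the same filter with keys in a different order, produce the same fingerprint and can therefore reuse the existing run.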

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of bulk actions in the admin UI create or reuse a canonical run record and appear in Monitoring → Operations.
- **SC-002**: Repository contains 0 references to the legacy bulk-run system after completion, enforced by automated checks.
- **SC-003**: For directory-targeted operations, 100% of run records display a target scope in Monitoring/Run Detail.
- **SC-004**: For bulk operations, duplicate submissions do not increase processed item count beyond one idempotent execution per selection identity.
- **SC-005**: Admins can locate a completed bulk run in Monitoring within 30 seconds using standard navigation and filters, without relying on legacy pages.
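
The automated check behind SC-002 could be as simple as a text scan that fails the pipeline on any hit. The Python sketch below is an assumption-heavy illustration: `BulkOperationRun` is taken from this spec's title, the snake_case pattern is a guess at related identifiers, and documents that legitimately mention the legacy name (such as this spec) would need to be excluded from the scanned set.

```python
import re

# Assumed pattern list: the class name comes from the spec title,
# the snake_case form is a guess and may not match the real schema.
LEGACY_PATTERNS = [
    re.compile(r"BulkOperationRun"),
    re.compile(r"bulk_operation_runs?"),
]


def find_legacy_references(files: dict) -> list:
    """Return (path, line_number, line) for every line matching a legacy pattern.

    `files` maps a path to its text content, so the scan is testable without disk I/O.
    """
    hits = []
    for path in sorted(files):
        for line_no, line in enumerate(files[path].splitlines(), start=1):
            if any(pattern.search(line) for pattern in LEGACY_PATTERNS):
                hits.append((path, line_no, line.strip()))
    return hits
```

A CI test would assert that the returned list is empty and print the hits otherwise, which gives the "clear message" asked for in User Story 3.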