
# Feature Specification: 096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)

**Feature Branch**: `096-ops-polish-assignment-dedupe-system-tracking`
**Created**: 2026-02-15
**Status**: Draft
**Input**: User description: "096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)"

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant
- **Primary Routes**: None (background operations only)
- **Data Ownership**: tenant-owned operational records (run tracking + run outcomes) and tenant seed data
- **RBAC**: No new permissions; no changes to user-facing authorization behavior (existing gates/policies remain the source of truth for starting operations)

## Clarifications

### Session 2026-02-15

- Q: What is the dedupe identity rule for assignment jobs? → A: Use `operation_run_id` when available; otherwise dedupe by `tenant_id + job_type + stable input fingerprint` (no secrets).
- Q: If an assignment job retries, how should `OperationRun` summary counters be persisted? → A: On completion (success or terminal failure), write/overwrite the run’s counters so they reflect the final attempt.
- Q: What deduplication duration should we enforce for assignment jobs? → A: 15 minutes.
- Q: For seeded tenants, what format should `tenants.external_id` use? → A: UUID string (v4).
- Q: What `OperationRun.type` should we use for `ReconcileAdapterRunsJob` tracking? → A: `ops.reconcile_adapter_runs`.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Assignment runs show durable summaries (Priority: P2)

As an operator, I want assignment-related background runs to persist consistent summary counters so that run outcomes can be audited reliably (especially under retries).

**Why this priority**: These jobs already run in production; missing summaries reduce observability and make incident triage slower.

**Independent Test**: Dispatch one assignment job, then verify that the recorded run summary includes total, processed, and failed counters and that it remains correct after a simulated retry.

**Acceptance Scenarios**:

1. **Given** an assignment fetch run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
2. **Given** an assignment restore run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
3. **Given** an assignment run retries due to transient failure, **When** it ultimately completes, **Then** its summary counters do not double-count work across attempts.
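
The retry scenario above can be illustrated with a minimal, language-neutral sketch in Python. It is not the project's implementation: `RunRecord`, `run_attempt`, and `persist_summary` are hypothetical names, and it assumes that `processed` counts items handled without error. The point it shows is that each attempt computes its own counters from scratch and the run record is overwritten rather than incremented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for a persisted operation run row."""
    summary: Optional[Dict[str, int]] = None


def run_attempt(items: List[str], process: Callable[[str], None]) -> Dict[str, int]:
    """Process every item once and return counters for this attempt only."""
    counters = {"total": len(items), "processed": 0, "failed": 0}
    for item in items:
        try:
            process(item)
            counters["processed"] += 1  # assumption: processed = handled without error
        except Exception:
            counters["failed"] += 1
    return counters


def persist_summary(run: RunRecord, counters: Dict[str, int]) -> None:
    """Overwrite (never add to) the stored summary so retries cannot double-count."""
    run.summary = dict(counters)
```

Calling `persist_summary` once per attempt means a second attempt replaces the first attempt's numbers, so the stored summary always describes the final attempt.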

---

### User Story 2 - Duplicate dispatches do not overlap (Priority: P2)

As an operator, I want assignment jobs to be deduplicated by identity so accidental double dispatch (or queue redelivery) does not cause concurrent overlapping work.

**Why this priority**: Duplicate concurrency increases the risk of conflicting writes, rate limiting, and confusing operational outcomes.

**Independent Test**: Attempt to dispatch the same job identity twice and assert only one execution proceeds while the other is deduped/skipped.

**Acceptance Scenarios**:

1. **Given** an assignment job with a stable identity is dispatched twice in a short window, **When** workers attempt to execute both, **Then** only one execution proceeds (no concurrent overlap) for that identity.
2. **Given** an assignment job with a stable identity completed recently, **When** the same identity is dispatched again within the deduplication duration, **Then** the new execution is skipped/deduped.
3. **Given** the dedupe duration has elapsed since the last terminal completion for an identity, **When** the same identity is legitimately run again, **Then** the new execution is allowed to proceed.
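
One possible reading of these three scenarios is sketched below in Python. It is an illustration under stated assumptions, not the intended implementation: `DedupeGate` is a hypothetical in-memory stand-in for a shared lock store, the identity strings are made up, and the window is assumed to start at dispatch and restart at terminal completion. A real multi-worker setup would need an atomic set-if-absent operation in a shared store, which a plain dictionary does not provide.

```python
from typing import Dict

DEDUPE_SECONDS = 15 * 60  # 15-minute deduplication duration from the spec


class DedupeGate:
    """In-memory stand-in for a shared lock store (hypothetical, single process only)."""

    def __init__(self, window_seconds: int = DEDUPE_SECONDS) -> None:
        self.window = window_seconds
        self._blocked_until: Dict[str, float] = {}

    def try_start(self, identity: str, now: float) -> bool:
        """Return True if this execution may proceed, False if it is deduped."""
        expires = self._blocked_until.get(identity)
        if expires is not None and now < expires:
            return False
        # Claim the identity; the expiry also guards against a stuck claim.
        self._blocked_until[identity] = now + self.window
        return True

    def mark_completed(self, identity: str, now: float) -> None:
        """Restart the window at terminal completion (success or terminal failure)."""
        self._blocked_until[identity] = now + self.window
```

Passing `now` explicitly keeps the sketch deterministic: a test can step through "dispatched twice", "completed recently", and "window elapsed" without sleeping.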

---

### User Story 3 - Housekeeping runs are tracked like everything else (Priority: P3)

As an operator, I want housekeeping/system jobs to produce the same run tracking as other operations so that the operations ledger is complete.

**Why this priority**: This is operational completeness. It is not ship-blocking but improves consistency and reduces blind spots.

**Independent Test**: Execute a housekeeping run and verify a run record is created/updated with success/failure outcome details.

**Acceptance Scenarios**:

1. **Given** the reconcile-adapter-runs job executes successfully, **When** the operations ledger is inspected, **Then** a run record exists with a success outcome.
2. **Given** the reconcile-adapter-runs job fails, **When** the operations ledger is inspected, **Then** the run record includes a stable reason code and a sanitized error message suitable for operators.
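
The success and failure outcomes above could be recorded along the lines of the following Python sketch. Only the run type `ops.reconcile_adapter_runs` comes from this spec; `RunRecord`, `track`, `sanitize`, the outcome strings, the reason codes `timeout` and `unexpected_error`, and the masking rule are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

RUN_TYPE = "ops.reconcile_adapter_runs"  # run type named in this spec


@dataclass
class RunRecord:
    """Hypothetical stand-in for a persisted operation run row."""
    type: str
    outcome: str = "running"
    reason_code: Optional[str] = None
    message: Optional[str] = None


def sanitize(message: str, limit: int = 200) -> str:
    """Mask secret-looking key=value pairs and cap the length (illustrative rule only)."""
    masked = re.sub(r"(?i)\b(token|secret|password)\s*[=:]\s*\S+", r"\1=[redacted]", message)
    return masked[:limit]


def track(job: Callable[[], None]) -> RunRecord:
    """Run the job and always return a record with a terminal outcome."""
    run = RunRecord(type=RUN_TYPE)
    try:
        job()
        run.outcome = "succeeded"
    except TimeoutError as exc:
        run.outcome, run.reason_code, run.message = "failed", "timeout", sanitize(str(exc))
    except Exception as exc:
        run.outcome, run.reason_code, run.message = "failed", "unexpected_error", sanitize(str(exc))
    return run
```

Mapping exception classes to a small fixed set of reason codes is what keeps the code stable for searching and alerting, while the free-text message can vary.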

---

### User Story 4 - Fresh seed flows work without manual intervention (Priority: P3)

As a developer, I want local and CI seed flows to run successfully so that onboarding and test environments are reproducible.

**Why this priority**: Broken seeding slows development and increases setup variability.

**Independent Test**: Run a clean database reset with seeding and confirm it completes without constraint errors.

**Acceptance Scenarios**:

1. **Given** a clean database, **When** a full reset-and-seed workflow is executed, **Then** it succeeds without requiring manual edits.
2. **Given** seeded tenants exist, **When** their records are validated, **Then** required identifiers are populated, and `external_id` is a UUID v4 string.
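
The `external_id` check in the second scenario can be sketched as follows in Python, using only the standard `uuid` module. `seed_tenants` and `is_uuid_v4` are hypothetical helper names, and the tenant shape is reduced to two fields for illustration.

```python
import uuid
from typing import Dict, List


def seed_tenants(names: List[str]) -> List[Dict[str, str]]:
    """Build seed rows with a freshly generated UUID v4 string as external_id."""
    return [{"name": name, "external_id": str(uuid.uuid4())} for name in names]


def is_uuid_v4(value: str) -> bool:
    """Return True only for a canonical, hyphenated UUID string of version 4."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value.lower()
```

Comparing `str(parsed)` with the input rejects alternative spellings (braces, `urn:uuid:` prefixes, missing hyphens) that `uuid.UUID` would otherwise accept.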

### Edge Cases

- Retries: If a job attempt fails and retries, summary counters must remain consistent (no double counting).
- Partial failure: If some items fail, the run must record failures without obscuring successful processing.
- Deduplication window: Dedupe must block concurrent overlap and repeat dispatches inside the window, but must not prevent legitimate runs once the window has expired.
- Error reporting: Failure messages must be sanitized and stable enough to support searching and alerting.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature touches queued/background work and run observability. It must:
- maintain tenant isolation (no cross-tenant leakage in run tracking),
- ensure run observability is complete for the jobs in scope,
- and add automated tests for the changed operational behavior.

**Constitution alignment (RBAC-UX):** No new UI surfaces are added and no authorization behavior is changed. Any existing authorization checks for starting operations remain server-side and unchanged.

**Constitution alignment (Filament Action Surfaces):** Not applicable (no Filament Resources/Pages/RelationManagers are added or modified).

### Functional Requirements

- **FR-001**: The system MUST persist summary counters (total, processed, failed) for assignment fetch runs upon completion.
- **FR-002**: The system MUST persist summary counters (total, processed, failed) for assignment restore runs upon completion.
- **FR-003**: Persisting summary counters MUST be idempotent: on completion (success or terminal failure), the run summary counters MUST be overwritten to reflect the final attempt, and retries MUST NOT double-count totals for the same run identity.
- **FR-004**: Assignment jobs MUST enforce server-side deduplication by a stable, non-secret job identity to prevent concurrent overlap.
- **FR-004a**: The job identity MUST be derived as: `operation_run_id` when available; otherwise `tenant_id + job_type + stable input fingerprint`.
- **FR-004b**: The stable input fingerprint MUST be deterministic and MUST NOT include secrets.
- **FR-005**: The deduplication duration MUST cover expected worst-case runtime while allowing legitimate future executions after it expires.
- **FR-005a**: The deduplication duration MUST be 15 minutes.
- **FR-006**: Deduplication MUST be enforced at execution time (not solely by UI gating or caller-side checks).
- **FR-007**: The reconcile-adapter-runs housekeeping job MUST create/update an operational run record each time it executes.
- **FR-007a**: The reconcile-adapter-runs housekeeping job MUST use `OperationRun.type = ops.reconcile_adapter_runs`.
- **FR-008**: On failure, the housekeeping run record MUST include a stable reason code and a sanitized operator-facing error message.
- **FR-009**: The seed workflow MUST populate required tenant identifiers so database constraints are satisfied.
- **FR-009a**: Seeded tenants MUST have `external_id` populated as a UUID string (v4).
- **FR-010**: A clean reset-and-seed workflow MUST succeed without manual intervention.
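
The identity rule in FR-004a and FR-004b can be sketched in Python as shown below. This is a sketch under assumptions, not the required implementation: the function names, the `run:` prefix, the colon-separated key format, the SHA-256 hash, and the `SECRET_KEYS` deny-list are all hypothetical. The spec only requires that the fingerprint be deterministic and free of secrets.

```python
import hashlib
import json
from typing import Any, Mapping, Optional

# Hypothetical deny-list; the spec only says the fingerprint must not include secrets.
SECRET_KEYS = frozenset({"token", "secret", "password"})


def input_fingerprint(inputs: Mapping[str, Any]) -> str:
    """Hash a canonical JSON form of the non-secret inputs."""
    public = {k: v for k, v in inputs.items() if k not in SECRET_KEYS}
    # sort_keys makes the result independent of argument order.
    canonical = json.dumps(public, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def job_identity(
    tenant_id: int,
    job_type: str,
    inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Prefer operation_run_id; otherwise fall back to tenant + job type + fingerprint."""
    if operation_run_id is not None:
        return f"run:{operation_run_id}"
    return f"{tenant_id}:{job_type}:{input_fingerprint(inputs)}"
```

Because secrets are dropped before hashing and keys are sorted, two dispatches that differ only in credentials or argument order resolve to the same identity, while a different tenant or job type always yields a different one.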

### Assumptions & Dependencies

- The system already has a durable operations ledger concept (run tracking) that can store outcomes and summary counters.
- Assignment fetch/restore jobs already produce item-level totals (or can derive them) without changing business behavior.
- Dedupe is evaluated based on a stable identity that is safe to store and log (no secrets).
- No new UI, routes, or end-user workflows are introduced by this work.

### Key Entities *(include if feature involves data)*

- **Operation Run**: A durable record of a background operation execution, its outcome, and its summary counters.
- **Job Identity**: A stable identifier used to deduplicate concurrent executions of the same logical work.
- **Tenant**: The scope boundary for operational records and seeded data.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of completed assignment fetch and restore runs show persisted summary counters (total/processed/failed) in the operations ledger.
- **SC-002**: Duplicate dispatch attempts for the same assignment job identity result in at most one execution proceeding within the deduplication window; the others are skipped/deduped.
- **SC-003**: 100% of reconcile-adapter-runs executions produce an operational run record with a success or failure outcome.
- **SC-004**: A clean reset-and-seed workflow completes successfully in CI and locally without database constraint failures.