# Feature Specification: 096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)

**Feature Branch**: `096-ops-polish-assignment-dedupe-system-tracking`  
**Created**: 2026-02-15  
**Status**: Draft  
**Input**: User description: "096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)"

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant
- **Primary Routes**: None (background operations only)
- **Data Ownership**: tenant-owned operational records (run tracking + run outcomes) and tenant seed data
- **RBAC**: No new permissions; no changes to user-facing authorization behavior (existing gates/policies remain the source of truth for starting operations)

## Clarifications

### Session 2026-02-15

- Q: What is the dedupe identity rule for assignment jobs? → A: Use `operation_run_id` when available; otherwise dedupe by `tenant_id + job_type + stable input fingerprint` (no secrets).
- Q: If an assignment job retries, how should `OperationRun` summary counters be persisted? → A: On completion (success or terminal failure), write/overwrite the run’s counters so they reflect the final attempt.
- Q: What deduplication duration should we enforce for assignment jobs? → A: 15 minutes.
- Q: For seeded tenants, what format should `tenants.external_id` use? → A: UUID string (v4).
- Q: What `OperationRun.type` should we use for `ReconcileAdapterRunsJob` tracking? → A: `ops.reconcile_adapter_runs`.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Assignment runs show durable summaries (Priority: P2)

As an operator, I want assignment-related background runs to persist consistent summary counters so that run outcomes can be audited reliably (especially under retries).

**Why this priority**: These jobs already run in production; missing summaries reduce observability and make incident triage slower.


**Independent Test**: Dispatch one assignment job run, then verify the recorded run summary includes total/processed/failed and remains correct after retry simulation.

**Acceptance Scenarios**:

1. **Given** an assignment fetch run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
2. **Given** an assignment restore run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
3. **Given** an assignment run retries due to transient failure, **When** it ultimately completes, **Then** its summary counters do not double-count work across attempts.

---

### User Story 2 - Duplicate dispatches do not overlap (Priority: P2)

As an operator, I want assignment jobs to be deduplicated by identity so accidental double dispatch (or queue redelivery) does not cause concurrent overlapping work.

**Why this priority**: Duplicate concurrency increases the risk of conflicting writes, rate limiting, and confusing operational outcomes.

**Independent Test**: Attempt to dispatch the same job identity twice and assert only one execution proceeds while the other is deduped/skipped.

**Acceptance Scenarios**:

1. **Given** an assignment job with a stable identity is dispatched twice in a short window, **When** workers attempt to execute both, **Then** only one execution proceeds (no concurrent overlap) for that identity.
2. **Given** an assignment job with a stable identity completed recently, **When** the same identity is dispatched again within the deduplication duration, **Then** the new execution is skipped/deduped.
3. **Given** the dedupe duration has elapsed since the last terminal completion for an identity, **When** the same identity is legitimately run again, **Then** the new execution is allowed to proceed.
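
The identity rule and the dedupe window described in this story can be sketched in a language-neutral way. The following Python sketch is illustrative only and is not the project's implementation: the names `job_identity`, `InMemoryDedupeStore`, `acquire`, `mark_completed`, and the job type string `assignments.fetch` are hypothetical, and the in-memory store stands in for whatever shared cache or lock store the real workers would use. It assumes the caller has already removed secrets from `inputs` before fingerprinting.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Mapping, Optional

# 15-minute dedupe duration, as stated in the clarifications.
DEDUPE_TTL_SECONDS = 15 * 60


def job_identity(
    tenant_id: str,
    job_type: str,
    inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Derive a stable, non-secret identity for a job (hypothetical helper)."""
    # Prefer the run id when one is available.
    if operation_run_id is not None:
        return f"run:{operation_run_id}"
    # Otherwise fingerprint the inputs deterministically. Sorting keys makes
    # the result independent of dict ordering. Assumption: `inputs` holds
    # no secrets and is JSON-serializable.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant_id}:{job_type}:{fingerprint}"


class InMemoryDedupeStore:
    """Single-process stand-in for a shared dedupe store (assumption)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._expires: Dict[str, float] = {}
        self._clock = clock

    def acquire(self, identity: str, ttl: float = DEDUPE_TTL_SECONDS) -> bool:
        """Return True if this execution may proceed, False if deduped."""
        now = self._clock()
        expiry = self._expires.get(identity)
        if expiry is not None and expiry > now:
            return False
        self._expires[identity] = now + ttl
        return True

    def mark_completed(self, identity: str, ttl: float = DEDUPE_TTL_SECONDS) -> None:
        """Restart the window at terminal completion of the run."""
        self._expires[identity] = self._clock() + ttl
```

In this sketch the check happens inside the worker at execution time, so a second delivery of the same identity is skipped even when the caller did no checking of its own. Restarting the window in `mark_completed` is one reading of "elapsed since the last terminal completion"; a real implementation would also need the claim to be atomic across workers.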

---

### User Story 3 - Housekeeping runs are tracked like everything else (Priority: P3)

As an operator, I want housekeeping/system jobs to produce the same run tracking as other operations so that the operations ledger is complete.

**Why this priority**: This is operational completeness. It is not ship-blocking but improves consistency and reduces blind spots.

**Independent Test**: Execute a housekeeping run and verify a run record is created/updated with success/failure outcome details.

**Acceptance Scenarios**:

1. **Given** the reconcile-adapter-runs job executes successfully, **When** the operations ledger is inspected, **Then** a run record exists with a success outcome.
2. **Given** the reconcile-adapter-runs job fails, **When** the operations ledger is inspected, **Then** the run record includes a stable reason code and a sanitized error message suitable for operators.

---

### User Story 4 - Fresh seed flows work without manual intervention (Priority: P3)

As a developer, I want local and CI seed flows to run successfully so that onboarding and test environments are reproducible.

**Why this priority**: Broken seeding slows development and increases setup variability.

**Independent Test**: Run a clean database reset with seeding and confirm it completes without constraint errors.

**Acceptance Scenarios**:

1. **Given** a clean database, **When** a full reset-and-seed workflow is executed, **Then** it succeeds without requiring manual edits.
2. **Given** seeded tenants exist, **When** their records are validated, **Then** required identifiers are populated, and `external_id` is a UUID v4 string.

### Edge Cases

- Retries: If a job attempt fails and retries, summary counters must remain consistent (no double counting).
- Partial failure: If some items fail, the run must record failures without obscuring successful processing.
- Deduplication window: Dedupe should block concurrent overlap but must not prevent legitimate future runs after the window.
- Error reporting: Failure messages must be sanitized and stable enough to support searching and alerting.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature touches queued/background work and run observability. It must:

- maintain tenant isolation (no cross-tenant leakage in run tracking),
- ensure run observability is complete for the jobs in scope,
- and add automated tests for the changed operational behavior.

**Constitution alignment (RBAC-UX):** No new UI surfaces are added and no authorization behavior is changed. Any existing authorization checks for starting operations remain server-side and unchanged.

**Constitution alignment (Filament Action Surfaces):** Not applicable (no Filament Resources/Pages/RelationManagers are added or modified).

### Functional Requirements

- **FR-001**: The system MUST persist summary counters (total, processed, failed) for assignment fetch runs upon completion.
- **FR-002**: The system MUST persist summary counters (total, processed, failed) for assignment restore runs upon completion.
- **FR-003**: Summary counters MUST be idempotent-friendly: on completion (success or terminal failure), the run summary counters MUST reflect the final attempt and retries MUST NOT double-count totals for the same run identity.
- **FR-004**: Assignment jobs MUST enforce server-side deduplication by a stable, non-secret job identity to prevent concurrent overlap.
  - **FR-004a**: The job identity MUST be derived as: `operation_run_id` when available; otherwise `tenant_id + job_type + stable input fingerprint`.
  - **FR-004b**: The stable input fingerprint MUST be deterministic and MUST NOT include secrets.
- **FR-005**: The deduplication duration MUST cover expected worst-case runtime while allowing legitimate future executions after it expires.
  - **FR-005a**: The deduplication duration MUST be 15 minutes.
- **FR-006**: Deduplication MUST be enforced at execution time (not solely by UI gating or caller-side checks).
- **FR-007**: The reconcile-adapter-runs housekeeping job MUST create/update an operational run record each time it executes.
  - **FR-007a**: The reconcile-adapter-runs housekeeping job MUST use `OperationRun.type = ops.reconcile_adapter_runs`.
- **FR-008**: On failure, the housekeeping run record MUST include a stable reason code and a sanitized operator-facing error message.
- **FR-009**: The seed workflow MUST populate required tenant identifiers so database constraints are satisfied.
  - **FR-009a**: Seeded tenants MUST have `external_id` populated as a UUID string (v4).
- **FR-010**: A clean reset-and-seed workflow MUST succeed without manual intervention.

### Assumptions & Dependencies

- The system already has a durable operations ledger concept (run tracking) that can store outcomes and summary counters.
- Assignment fetch/restore jobs already produce item-level totals (or can derive them) without changing business behavior.
- Dedupe is evaluated based on a stable identity that is safe to store and log (no secrets).
- No new UI, routes, or end-user workflows are introduced by this work.

### Key Entities *(include if feature involves data)*

- **Operation Run**: A durable record of a background operation execution, its outcome, and its summary counters.
- **Job Identity**: A stable identifier used to deduplicate concurrent executions of the same logical work.
- **Tenant**: The scope boundary for operational records and seeded data.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of completed assignment fetch and restore runs show persisted summary counters (total/processed/failed) in the operations ledger.
- **SC-002**: Duplicate dispatch attempts for the same assignment job identity result in at most one concurrent execution within the deduplication window.
- **SC-003**: 100% of reconcile-adapter-runs executions produce an operational run record with a success or failure outcome.
- **SC-004**: A clean reset-and-seed workflow completes successfully in CI and locally without database constraint failures.
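
The overwrite-on-completion rule for summary counters (FR-003) can be illustrated with a small sketch. The Python below is a hedged illustration rather than the actual `OperationRun` model: `RunSummary`, `RunLedger`, `summarize`, and `complete` are hypothetical names, and it assumes that `processed` counts items that succeeded, which the specification does not define.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class RunSummary:
    total: int
    processed: int  # assumption: items that succeeded in this attempt
    failed: int


def summarize(item_results: List[bool]) -> RunSummary:
    """Build counters from one attempt's per-item outcomes (True = success)."""
    succeeded = sum(1 for ok in item_results if ok)
    return RunSummary(
        total=len(item_results),
        processed=succeeded,
        failed=len(item_results) - succeeded,
    )


class RunLedger:
    """Hypothetical in-memory stand-in for the operations ledger."""

    def __init__(self) -> None:
        self._summaries: Dict[str, RunSummary] = {}

    def complete(self, run_id: str, summary: RunSummary) -> None:
        # Overwrite rather than increment: the stored counters always
        # describe the latest attempt, so retries cannot double-count.
        self._summaries[run_id] = summary

    def summary(self, run_id: str) -> Optional[RunSummary]:
        return self._summaries.get(run_id)
```

Under these assumptions, a first attempt that fails two of three items followed by a retry that succeeds on all three leaves the ledger at total 3, processed 3, failed 0, instead of accumulating to a total of 6.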