Implements Spec 096 ops polish bundle:

- Persist durable OperationRun.summary_counts for assignment fetch/restore (final attempt wins)
- Server-side dedupe for assignment jobs (15-minute cooldown + non-canonical skip)
- Track ReconcileAdapterRunsJob via workspace-scoped OperationRun + stable failure codes + overlap prevention
- Seed DX: ensure seeded tenants use UUID v4 external_id and seed satisfies workspace_id NOT NULL constraints

Verification (local / evidence-based):

- `vendor/bin/sail artisan test --compact tests/Feature/Operations/AssignmentRunSummaryCountsTest.php tests/Feature/Operations/AssignmentJobDedupeTest.php tests/Feature/Operations/ReconcileAdapterRunsJobTrackingTest.php tests/Feature/Seed/PoliciesSeederExternalIdTest.php`
- `vendor/bin/sail bin pint --dirty`

Spec artifacts included under `specs/096-ops-polish-assignment-dedupe-system-tracking/` (spec/plan/tasks/checklists).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #115
# Feature Specification: 096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)

**Feature Branch**: `096-ops-polish-assignment-dedupe-system-tracking`
**Created**: 2026-02-15
**Status**: Draft
**Input**: User description: "096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)"

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant
- **Primary Routes**: None (background operations only)
- **Data Ownership**: tenant-owned operational records (run tracking + run outcomes) and tenant seed data
- **RBAC**: No new permissions; no changes to user-facing authorization behavior (existing gates/policies remain the source of truth for starting operations)

## Clarifications

### Session 2026-02-15

- Q: What is the dedupe identity rule for assignment jobs? → A: Use `operation_run_id` when available; otherwise dedupe by `tenant_id + job_type + stable input fingerprint` (no secrets).
- Q: If an assignment job retries, how should `OperationRun` summary counters be persisted? → A: On completion (success or terminal failure), write/overwrite the run’s counters so they reflect the final attempt.
- Q: What deduplication duration should we enforce for assignment jobs? → A: 15 minutes.
- Q: For seeded tenants, what format should `tenants.external_id` use? → A: UUID string (v4).
- Q: What `OperationRun.type` should we use for `ReconcileAdapterRunsJob` tracking? → A: `ops.reconcile_adapter_runs`.

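The dedupe identity rule from the first clarification can be illustrated with a short, language-neutral Python sketch. It is not the project's implementation: the function names, the `SECRET_KEYS` list, the `run:` / `tenant:` key prefixes, the example job type strings, and the choice of SHA-256 over canonical JSON are all assumptions made for illustration. The spec itself only requires that the fingerprint be deterministic and free of secrets.

```python
import hashlib
import json
from typing import Any, Mapping, Optional

# Hypothetical deny-list: input keys that must never reach the fingerprint.
SECRET_KEYS = frozenset({"access_token", "client_secret", "password"})


def input_fingerprint(inputs: Mapping[str, Any]) -> str:
    """Return a deterministic SHA-256 hex digest of the non-secret inputs."""
    public = {key: value for key, value in inputs.items() if key not in SECRET_KEYS}
    # Sorted keys and fixed separators make the JSON independent of dict order.
    canonical = json.dumps(public, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def job_identity(
    job_type: str,
    tenant_id: int,
    inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Prefer operation_run_id; otherwise fall back to tenant + type + fingerprint."""
    if operation_run_id is not None:
        return f"run:{operation_run_id}"
    return f"tenant:{tenant_id}:{job_type}:{input_fingerprint(inputs)}"
```

Because secret keys are dropped before hashing and keys are sorted, two dispatches that differ only in argument order or in credentials map to the same identity, and the identity is safe to store and log.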
## User Scenarios & Testing *(mandatory)*

### User Story 1 - Assignment runs show durable summaries (Priority: P2)

As an operator, I want assignment-related background runs to persist consistent summary counters so that run outcomes can be audited reliably (especially under retries).

**Why this priority**: These jobs already run in production; missing summaries reduce observability and make incident triage slower.

**Independent Test**: Dispatch one assignment job run, then verify the recorded run summary includes total/processed/failed and remains correct after retry simulation.

**Acceptance Scenarios**:

1. **Given** an assignment fetch run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
2. **Given** an assignment restore run completes, **When** its run record is inspected, **Then** it includes non-null summary counters for total, processed, and failed.
3. **Given** an assignment run retries due to transient failure, **When** it ultimately completes, **Then** its summary counters do not double-count work across attempts.

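The "no double counting across attempts" scenario comes down to overwriting the counters instead of incrementing them. The Python sketch below illustrates that idea only; the `OperationRun` dataclass and `persist_summary_counts` helper are hypothetical stand-ins, not the project's model or API, and whether intermediate attempts write at all is an assumption made here to show the overwrite.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class OperationRun:
    """Hypothetical stand-in for a durable run record."""

    id: int
    summary_counts: Optional[Dict[str, int]] = None


def persist_summary_counts(run: OperationRun, total: int, processed: int, failed: int) -> None:
    """Replace the stored counters wholesale so the latest attempt wins."""
    run.summary_counts = {"total": total, "processed": processed, "failed": failed}


run = OperationRun(id=1)

# Attempt 1 is cut short by a transient failure after 4 of 10 items.
persist_summary_counts(run, total=10, processed=4, failed=6)

# Attempt 2 (the final one) processes all 10 items. Overwriting keeps
# total at 10; summing the two attempts would have reported 20.
persist_summary_counts(run, total=10, processed=10, failed=0)
```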
---

### User Story 2 - Duplicate dispatches do not overlap (Priority: P2)

As an operator, I want assignment jobs to be deduplicated by identity so accidental double dispatch (or queue redelivery) does not cause concurrent overlapping work.

**Why this priority**: Duplicate concurrency increases the risk of conflicting writes, rate limiting, and confusing operational outcomes.

**Independent Test**: Attempt to dispatch the same job identity twice and assert only one execution proceeds while the other is deduped/skipped.

**Acceptance Scenarios**:

1. **Given** an assignment job with a stable identity is dispatched twice in a short window, **When** workers attempt to execute both, **Then** only one execution proceeds (no concurrent overlap) for that identity.
2. **Given** an assignment job with a stable identity completed recently, **When** the same identity is dispatched again within the deduplication duration, **Then** the new execution is skipped/deduped.
3. **Given** the dedupe duration has elapsed since the last terminal completion for an identity, **When** the same identity is legitimately run again, **Then** the new execution is allowed to proceed.

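The three scenarios above (block concurrent overlap, skip within the cooldown, allow after the window) can be modelled with a small gate checked at execution time. This is a single-process, in-memory Python sketch under stated assumptions: `DedupeGate`, `try_start`, and `finish` are hypothetical names, and a real deployment would need a shared store with atomic operations across workers, which is not shown. Only the 15-minute duration comes from the spec.

```python
import time
from typing import Callable, Dict, Set

COOLDOWN_SECONDS = 15 * 60  # deduplication duration required by the spec


class DedupeGate:
    """In-memory illustration of execution-time dedupe for one process."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._in_flight: Set[str] = set()
        self._completed_at: Dict[str, float] = {}

    def try_start(self, identity: str) -> bool:
        """Return True if this execution may proceed, False if it is skipped."""
        if identity in self._in_flight:
            return False  # scenario 1: another execution is already running
        completed = self._completed_at.get(identity)
        if completed is not None and self._clock() - completed < COOLDOWN_SECONDS:
            return False  # scenario 2: completed too recently
        self._in_flight.add(identity)
        return True  # scenario 3: first run, or the window has elapsed

    def finish(self, identity: str) -> None:
        """Record terminal completion (success or failure) and start the cooldown."""
        self._in_flight.discard(identity)
        self._completed_at[identity] = self._clock()
```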
---

### User Story 3 - Housekeeping runs are tracked like everything else (Priority: P3)

As an operator, I want housekeeping/system jobs to produce the same run tracking as other operations so that the operations ledger is complete.

**Why this priority**: This is operational completeness. It is not ship-blocking but improves consistency and reduces blind spots.

**Independent Test**: Execute a housekeeping run and verify a run record is created/updated with success/failure outcome details.

**Acceptance Scenarios**:

1. **Given** the reconcile-adapter-runs job executes successfully, **When** the operations ledger is inspected, **Then** a run record exists with a success outcome.
2. **Given** the reconcile-adapter-runs job fails, **When** the operations ledger is inspected, **Then** the run record includes a stable reason code and a sanitized error message suitable for operators.

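Both outcomes above can be pictured as a thin wrapper around the housekeeping job. The Python sketch below is illustrative only: `RunRecord`, `run_tracked`, `sanitize`, the outcome strings, the reason code value, and the masking pattern are hypothetical, and only the run type `ops.reconcile_adapter_runs` is taken from the spec.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

RUN_TYPE = "ops.reconcile_adapter_runs"  # run type named in the spec


@dataclass
class RunRecord:
    """Hypothetical stand-in for an entry in the operations ledger."""

    type: str
    outcome: str = "running"
    reason_code: Optional[str] = None
    message: Optional[str] = None


def sanitize(message: str, limit: int = 200) -> str:
    """Mask credential-like pairs, collapse whitespace, and truncate."""
    masked = re.sub(r"(?i)(token|secret|password)=\S+", r"\1=[redacted]", message)
    return " ".join(masked.split())[:limit]


def run_tracked(job: Callable[[], None]) -> RunRecord:
    """Execute the job and always return a record with a terminal outcome."""
    record = RunRecord(type=RUN_TYPE)
    try:
        job()
    except Exception as exc:
        record.outcome = "failed"
        # Hypothetical code: a fixed string operators can search and alert on.
        record.reason_code = "ops.reconcile_adapter_runs.failed"
        record.message = sanitize(str(exc))
    else:
        record.outcome = "succeeded"
    return record
```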
---

### User Story 4 - Fresh seed flows work without manual intervention (Priority: P3)

As a developer, I want local and CI seed flows to run successfully so that onboarding and test environments are reproducible.

**Why this priority**: Broken seeding slows development and increases setup variability.

**Independent Test**: Run a clean database reset with seeding and confirm it completes without constraint errors.

**Acceptance Scenarios**:

1. **Given** a clean database, **When** a full reset-and-seed workflow is executed, **Then** it succeeds without requiring manual edits.
2. **Given** seeded tenants exist, **When** their records are validated, **Then** required identifiers are populated, and `external_id` is a UUID v4 string.

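The `external_id` check in the second scenario can be expressed as a strict UUID v4 validation. The following is a minimal Python sketch using only the standard `uuid` module; the helper name `is_uuid_v4` and the choice to accept only the canonical hyphenated form are assumptions, not requirements stated in the spec.

```python
import uuid


def is_uuid_v4(value: str) -> bool:
    """Return True only for a canonical, hyphenated UUID version 4 string."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces, a urn: prefix, and un-hyphenated hex;
    # comparing against str(parsed) rejects those non-canonical spellings.
    return parsed.version == 4 and str(parsed) == value.lower()


# A seeder could generate the identifier like this.
seeded_external_id = str(uuid.uuid4())
```

Checking the version as well as the shape matters here, because a value such as a version 1 UUID has the same 36-character layout but would not satisfy the v4 requirement.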
### Edge Cases

- Retries: If a job attempt fails and retries, summary counters must remain consistent (no double counting).
- Partial failure: If some items fail, the run must record failures without obscuring successful processing.
- Deduplication window: Dedupe should block concurrent overlap but must not prevent legitimate future runs after the window.
- Error reporting: Failure messages must be sanitized and stable enough to support searching and alerting.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature touches queued/background work and run observability. It must:
- maintain tenant isolation (no cross-tenant leakage in run tracking),
- ensure run observability is complete for the jobs in scope,
- and add automated tests for the changed operational behavior.

**Constitution alignment (RBAC-UX):** No new UI surfaces are added and no authorization behavior is changed. Any existing authorization checks for starting operations remain server-side and unchanged.

**Constitution alignment (Filament Action Surfaces):** Not applicable (no Filament Resources/Pages/RelationManagers are added or modified).

### Functional Requirements

- **FR-001**: The system MUST persist summary counters (total, processed, failed) for assignment fetch runs upon completion.
- **FR-002**: The system MUST persist summary counters (total, processed, failed) for assignment restore runs upon completion.
- **FR-003**: Summary counters MUST be idempotent-friendly: on completion (success or terminal failure), the run summary counters MUST reflect the final attempt and retries MUST NOT double-count totals for the same run identity.
- **FR-004**: Assignment jobs MUST enforce server-side deduplication by a stable, non-secret job identity to prevent concurrent overlap.
- **FR-004a**: The job identity MUST be derived as: `operation_run_id` when available; otherwise `tenant_id + job_type + stable input fingerprint`.
- **FR-004b**: The stable input fingerprint MUST be deterministic and MUST NOT include secrets.
- **FR-005**: The deduplication duration MUST cover expected worst-case runtime while allowing legitimate future executions after it expires.
- **FR-005a**: The deduplication duration MUST be 15 minutes.
- **FR-006**: Deduplication MUST be enforced at execution time (not solely by UI gating or caller-side checks).
- **FR-007**: The reconcile-adapter-runs housekeeping job MUST create/update an operational run record each time it executes.
- **FR-007a**: The reconcile-adapter-runs housekeeping job MUST use `OperationRun.type = ops.reconcile_adapter_runs`.
- **FR-008**: On failure, the housekeeping run record MUST include a stable reason code and a sanitized operator-facing error message.
- **FR-009**: The seed workflow MUST populate required tenant identifiers so database constraints are satisfied.
- **FR-009a**: Seeded tenants MUST have `external_id` populated as a UUID string (v4).
- **FR-010**: A clean reset-and-seed workflow MUST succeed without manual intervention.

### Assumptions & Dependencies

- The system already has a durable operations ledger concept (run tracking) that can store outcomes and summary counters.
- Assignment fetch/restore jobs already produce item-level totals (or can derive them) without changing business behavior.
- Dedupe is evaluated based on a stable identity that is safe to store and log (no secrets).
- No new UI, routes, or end-user workflows are introduced by this work.

### Key Entities *(include if feature involves data)*

- **Operation Run**: A durable record of a background operation execution, its outcome, and its summary counters.
- **Job Identity**: A stable identifier used to deduplicate concurrent executions of the same logical work.
- **Tenant**: The scope boundary for operational records and seeded data.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: 100% of completed assignment fetch and restore runs show persisted summary counters (total/processed/failed) in the operations ledger.
- **SC-002**: Duplicate dispatch attempts for the same assignment job identity result in at most one concurrent execution within the deduplication window.
- **SC-003**: 100% of reconcile-adapter-runs executions produce an operational run record with a success or failure outcome.
- **SC-004**: A clean reset-and-seed workflow completes successfully in CI and locally without database constraint failures.