TenantAtlas/specs/096-ops-polish-assignment-dedupe-system-tracking/spec.md
ahmido 03127a670b Spec 096: Ops polish (assignment summaries + dedupe + reconcile tracking + seed DX) (#115)
Implements Spec 096 ops polish bundle:

- Persist durable OperationRun.summary_counts for assignment fetch/restore (final attempt wins)
- Server-side dedupe for assignment jobs (15-minute cooldown + non-canonical skip)
- Track ReconcileAdapterRunsJob via workspace-scoped OperationRun + stable failure codes + overlap prevention
- Seed DX: ensure seeded tenants use UUID v4 external_id and seed satisfies workspace_id NOT NULL constraints

Verification (local / evidence-based):
- `vendor/bin/sail artisan test --compact tests/Feature/Operations/AssignmentRunSummaryCountsTest.php tests/Feature/Operations/AssignmentJobDedupeTest.php tests/Feature/Operations/ReconcileAdapterRunsJobTrackingTest.php tests/Feature/Seed/PoliciesSeederExternalIdTest.php`
- `vendor/bin/sail bin pint --dirty`

Spec artifacts included under `specs/096-ops-polish-assignment-dedupe-system-tracking/` (spec/plan/tasks/checklists).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #115
2026-02-15 20:49:38 +00:00

Feature Specification: 096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)

Feature Branch: 096-ops-polish-assignment-dedupe-system-tracking
Created: 2026-02-15
Status: Draft
Input: User description: "096 — Ops Polish Bundle (Assignment job summaries + job dedupe + system job tracking + seeder DX)"

Spec Scope Fields (mandatory)

  • Scope: tenant
  • Primary Routes: None (background operations only)
  • Data Ownership: tenant-owned operational records (run tracking + run outcomes) and tenant seed data
  • RBAC: No new permissions; no changes to user-facing authorization behavior (existing gates/policies remain the source of truth for starting operations)

Clarifications

Session 2026-02-15

  • Q: What is the dedupe identity rule for assignment jobs? → A: Use operation_run_id when available; otherwise dedupe by tenant_id + job_type + stable input fingerprint (no secrets).
  • Q: If an assignment job retries, how should OperationRun summary counters be persisted? → A: On completion (success or terminal failure), write/overwrite the run's counters so they reflect the final attempt.
  • Q: What deduplication duration should we enforce for assignment jobs? → A: 15 minutes.
  • Q: For seeded tenants, what format should tenants.external_id use? → A: UUID string (v4).
  • Q: What OperationRun.type should we use for ReconcileAdapterRunsJob tracking? → A: ops.reconcile_adapter_runs.
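
Illustrative sketch (non-normative) of the dedupe identity rule clarified above: prefer operation_run_id, otherwise combine tenant_id, job_type, and a deterministic input fingerprint. The function names, the `SECRET_KEYS` deny-list, the key prefixes, and the job type string are assumptions made for illustration and are not taken from the codebase. The sketch only filters top-level input keys.

```python
import hashlib
import json
from typing import Any, Mapping, Optional

# Hypothetical deny-list: input keys that must never reach the fingerprint.
SECRET_KEYS = frozenset({"access_token", "client_secret", "password"})


def input_fingerprint(inputs: Mapping[str, Any]) -> str:
    """Deterministic SHA-256 over canonical JSON, with secret keys dropped."""
    safe = {k: v for k, v in inputs.items() if k not in SECRET_KEYS}
    # sort_keys plus fixed separators make the encoding independent of key order.
    canonical = json.dumps(safe, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def job_identity(
    tenant_id: int,
    job_type: str,
    inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Use operation_run_id when available; otherwise fall back to the tuple."""
    if operation_run_id is not None:
        return f"run:{operation_run_id}"
    return f"tenant:{tenant_id}:{job_type}:{input_fingerprint(inputs)}"
```

Because the fingerprint is computed after secrets are removed, the resulting identity should be safe to store and log, and two dispatches that differ only in key order or in credential values map to the same identity.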

User Scenarios & Testing (mandatory)

User Story 1 - Assignment runs show durable summaries (Priority: P2)

As an operator, I want assignment-related background runs to persist consistent summary counters so that run outcomes can be audited reliably (especially under retries).

Why this priority: These jobs already run in production; missing summaries reduce observability and make incident triage slower.

Independent Test: Dispatch one assignment job run, then verify the recorded run summary includes total/processed/failed and remains correct after a simulated retry.

Acceptance Scenarios:

  1. Given an assignment fetch run completes, When its run record is inspected, Then it includes non-null summary counters for total, processed, and failed.
  2. Given an assignment restore run completes, When its run record is inspected, Then it includes non-null summary counters for total, processed, and failed.
  3. Given an assignment run retries due to transient failure, When it ultimately completes, Then its summary counters do not double-count work across attempts.
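
Illustrative sketch (non-normative) of the "final attempt wins" behavior in scenario 3: each attempt builds its own counters from scratch, and completion overwrites the stored summary rather than incrementing it. `OperationRun` here is a hypothetical in-memory stand-in, and the sketch assumes `processed` counts items that succeeded while `failed` counts items that errored; the spec does not define that split.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class OperationRun:
    """Hypothetical in-memory stand-in for a persisted run record."""
    summary_counts: Dict[str, int] = field(default_factory=dict)


def count_attempt(items: Iterable[str], failing: Set[str]) -> Dict[str, int]:
    """Build counters local to ONE attempt; nothing carries over between attempts."""
    items = list(items)
    failed = sum(1 for item in items if item in failing)
    return {"total": len(items), "processed": len(items) - failed, "failed": failed}


def persist_final_counts(run: OperationRun, counts: Dict[str, int]) -> None:
    """Overwrite (never increment) so the stored summary mirrors the final attempt."""
    run.summary_counts = dict(counts)


run = OperationRun()
# An earlier attempt in which two of three items fail.
persist_final_counts(run, count_attempt(["a", "b", "c"], failing={"b", "c"}))
# The final attempt succeeds for every item and replaces the earlier counters.
persist_final_counts(run, count_attempt(["a", "b", "c"], failing=set()))
print(run.summary_counts)  # {'total': 3, 'processed': 3, 'failed': 0}
```

An incrementing write would have reported a total of 6 here, which is the double-counting the scenario rules out.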

User Story 2 - Duplicate dispatches do not overlap (Priority: P2)

As an operator, I want assignment jobs to be deduplicated by identity so accidental double dispatch (or queue redelivery) does not cause concurrent overlapping work.

Why this priority: Duplicate concurrency increases the risk of conflicting writes, rate limiting, and confusing operational outcomes.

Independent Test: Attempt to dispatch the same job identity twice and assert only one execution proceeds while the other is deduped/skipped.

Acceptance Scenarios:

  1. Given an assignment job with a stable identity is dispatched twice in a short window, When workers attempt to execute both, Then only one execution proceeds (no concurrent overlap) for that identity.
  2. Given an assignment job with a stable identity completed recently, When the same identity is dispatched again within the deduplication duration, Then the new execution is skipped/deduped.
  3. Given the dedupe duration has elapsed since the last terminal completion for an identity, When the same identity is legitimately run again, Then the new execution is allowed to proceed.
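
Illustrative sketch (non-normative) of the three scenarios above as an execution-time gate: one check blocks concurrent overlap, a second enforces the 15-minute window after the last terminal completion. `DedupeGate` and its methods are hypothetical names. The state is held in process memory and is not thread-safe; real workers would need a shared store with atomic operations, and a lock expiry so that a crashed worker cannot block an identity indefinitely.

```python
import time
from typing import Callable, Dict, Set

DEDUPE_SECONDS = 15 * 60  # the 15-minute deduplication duration


class DedupeGate:
    """In-memory dedupe gate keyed by job identity (illustration only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so tests can control time
        self._running: Set[str] = set()
        self._completed_at: Dict[str, float] = {}

    def try_start(self, identity: str) -> bool:
        """Return True if execution may proceed, False if it must be skipped."""
        if identity in self._running:
            return False  # an execution for this identity is already in flight
        last = self._completed_at.get(identity)
        if last is not None and self._clock() - last < DEDUPE_SECONDS:
            return False  # completed too recently; still inside the window
        self._running.add(identity)
        return True

    def finish(self, identity: str) -> None:
        """Record terminal completion (success or failure) and release the slot."""
        self._running.discard(identity)
        self._completed_at[identity] = self._clock()
```

A job would call `try_start` at the top of its handler and return early when it yields `False`, which keeps the check at execution time rather than in the dispatching code.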

User Story 3 - Housekeeping runs are tracked like everything else (Priority: P3)

As an operator, I want housekeeping/system jobs to produce the same run tracking as other operations so that the operations ledger is complete.

Why this priority: This is operational completeness. It is not ship-blocking but improves consistency and reduces blind spots.

Independent Test: Execute a housekeeping run and verify a run record is created/updated with success/failure outcome details.

Acceptance Scenarios:

  1. Given the reconcile-adapter-runs job executes successfully, When the operations ledger is inspected, Then a run record exists with a success outcome.
  2. Given the reconcile-adapter-runs job fails, When the operations ledger is inspected, Then the run record includes a stable reason code and a sanitized error message suitable for operators.
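
Illustrative sketch (non-normative) of scenario 2: map an exception to a stable reason code and reduce its message to something safe for operators. The reason code strings, the exception-to-code mapping, and the redaction patterns are all assumptions made for illustration; the spec only requires that codes be stable and messages be sanitized.

```python
import re
from typing import Dict

# Hypothetical reason codes, keyed by exception type.
REASON_CODES = {
    TimeoutError: "ops.reconcile_adapter_runs.timeout",
    ConnectionError: "ops.reconcile_adapter_runs.connection_failed",
}
FALLBACK_CODE = "ops.reconcile_adapter_runs.unknown_error"

# Hypothetical redaction rules for common credential shapes.
_SECRET_PATTERNS = [
    (re.compile(r"(?i)bearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [redacted]"),
    (re.compile(r"(?i)\b(password|secret|token)=\S+"), r"\1=[redacted]"),
]


def reason_code(exc: BaseException) -> str:
    """Return a code that depends on the failure class, not on the message text."""
    for exc_type, code in REASON_CODES.items():
        if isinstance(exc, exc_type):
            return code
    return FALLBACK_CODE


def sanitize(message: str, limit: int = 200) -> str:
    """Redact credential-like fragments, collapse whitespace, and truncate."""
    for pattern, replacement in _SECRET_PATTERNS:
        message = pattern.sub(replacement, message)
    return " ".join(message.split())[:limit]


def failure_payload(exc: BaseException) -> Dict[str, str]:
    """Shape of the failure details that would be stored on the run record."""
    return {"reason_code": reason_code(exc), "message": sanitize(str(exc))}
```

Keeping the code independent of the message text is what makes it usable for searching and alerting, since message wording tends to change between releases.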

User Story 4 - Fresh seed flows work without manual intervention (Priority: P3)

As a developer, I want local and CI seed flows to run successfully so that onboarding and test environments are reproducible.

Why this priority: Broken seeding slows development and increases setup variability.

Independent Test: Run a clean database reset with seeding and confirm it completes without constraint errors.

Acceptance Scenarios:

  1. Given a clean database, When a full reset-and-seed workflow is executed, Then it succeeds without requiring manual edits.
  2. Given seeded tenants exist, When their records are validated, Then required identifiers are populated, and external_id is a UUID v4 string.
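
Illustrative sketch (non-normative) of the external_id check in scenario 2, using the Python standard library. `is_uuid_v4` is a hypothetical helper; it assumes the stored value should be in canonical hyphenated form, which the spec does not state explicitly.

```python
import uuid


def is_uuid_v4(value: object) -> bool:
    """True only for a canonical, hyphenated RFC 4122 version-4 UUID string."""
    if not isinstance(value, str):
        return False
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, a urn: prefix, and bare hex, so compare
    # against the canonical rendering to reject those looser spellings.
    return parsed.version == 4 and str(parsed) == value.lower()
```

A seeder test could assert `is_uuid_v4(tenant.external_id)` for every seeded tenant, so that a placeholder string or a UUID of another version fails the check.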

Edge Cases

  • Retries: If a job attempt fails and retries, summary counters must remain consistent (no double counting).
  • Partial failure: If some items fail, the run must record failures without obscuring successful processing.
  • Deduplication window: Dedupe must block concurrent overlap and repeat dispatches inside the window, but must not prevent legitimate runs once the window has elapsed.
  • Error reporting: Failure messages must be sanitized and stable enough to support searching and alerting.

Requirements (mandatory)

Constitution alignment (required): This feature touches queued/background work and run observability. It must:

  • maintain tenant isolation (no cross-tenant leakage in run tracking),
  • ensure run observability is complete for the jobs in scope,
  • and add automated tests for the changed operational behavior.

Constitution alignment (RBAC-UX): No new UI surfaces are added and no authorization behavior is changed. Any existing authorization checks for starting operations remain server-side and unchanged.

Constitution alignment (Filament Action Surfaces): Not applicable (no Filament Resources/Pages/RelationManagers are added or modified).

Functional Requirements

  • FR-001: The system MUST persist summary counters (total, processed, failed) for assignment fetch runs upon completion.
  • FR-002: The system MUST persist summary counters (total, processed, failed) for assignment restore runs upon completion.
  • FR-003: Summary counter persistence MUST be retry-safe: on completion (success or terminal failure), the run's summary counters MUST reflect the final attempt, and retries MUST NOT double-count totals for the same run identity.
  • FR-004: Assignment jobs MUST enforce server-side deduplication by a stable, non-secret job identity to prevent concurrent overlap.
  • FR-004a: The job identity MUST be derived as: operation_run_id when available; otherwise tenant_id + job_type + stable input fingerprint.
  • FR-004b: The stable input fingerprint MUST be deterministic and MUST NOT include secrets.
  • FR-005: The deduplication duration MUST cover expected worst-case runtime while allowing legitimate future executions after it expires.
  • FR-005a: The deduplication duration MUST be 15 minutes.
  • FR-006: Deduplication MUST be enforced at execution time (not solely by UI gating or caller-side checks).
  • FR-007: The reconcile-adapter-runs housekeeping job MUST create/update an operational run record each time it executes.
  • FR-007a: The reconcile-adapter-runs housekeeping job MUST use OperationRun.type = ops.reconcile_adapter_runs.
  • FR-008: On failure, the housekeeping run record MUST include a stable reason code and a sanitized operator-facing error message.
  • FR-009: The seed workflow MUST populate required tenant identifiers so database constraints are satisfied.
  • FR-009a: Seeded tenants MUST have external_id populated as a UUID string (v4).
  • FR-010: A clean reset-and-seed workflow MUST succeed without manual intervention.

Assumptions & Dependencies

  • The system already has a durable operations ledger concept (run tracking) that can store outcomes and summary counters.
  • Assignment fetch/restore jobs already produce item-level totals (or can derive them) without changing business behavior.
  • Dedupe is evaluated based on a stable identity that is safe to store and log (no secrets).
  • No new UI, routes, or end-user workflows are introduced by this work.

Key Entities (include if feature involves data)

  • Operation Run: A durable record of a background operation execution, its outcome, and its summary counters.
  • Job Identity: A stable identifier used to deduplicate concurrent executions of the same logical work.
  • Tenant: The scope boundary for operational records and seeded data.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: 100% of completed assignment fetch and restore runs show persisted summary counters (total/processed/failed) in the operations ledger.
  • SC-002: Duplicate dispatch attempts for the same assignment job identity result in at most one concurrent execution within the deduplication window.
  • SC-003: 100% of reconcile-adapter-runs executions produce an operational run record with a success or failure outcome.
  • SC-004: A clean reset-and-seed workflow completes successfully in CI and locally without database constraint failures.