TenantAtlas/specs/096-ops-polish-assignment-dedupe-system-tracking/research.md
ahmido 03127a670b Spec 096: Ops polish (assignment summaries + dedupe + reconcile tracking + seed DX) (#115)
Implements Spec 096 ops polish bundle:

- Persist durable OperationRun.summary_counts for assignment fetch/restore (final attempt wins)
- Server-side dedupe for assignment jobs (15-minute cooldown + non-canonical skip)
- Track ReconcileAdapterRunsJob via workspace-scoped OperationRun + stable failure codes + overlap prevention
- Seed DX: ensure seeded tenants use UUID v4 external_id and seed satisfies workspace_id NOT NULL constraints

Verification (local / evidence-based):
- `vendor/bin/sail artisan test --compact tests/Feature/Operations/AssignmentRunSummaryCountsTest.php tests/Feature/Operations/AssignmentJobDedupeTest.php tests/Feature/Operations/ReconcileAdapterRunsJobTrackingTest.php tests/Feature/Seed/PoliciesSeederExternalIdTest.php`
- `vendor/bin/sail bin pint --dirty`

Spec artifacts included under `specs/096-ops-polish-assignment-dedupe-system-tracking/` (spec/plan/tasks/checklists).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #115
2026-02-15 20:49:38 +00:00

Phase 0 — Research (096 Ops Polish Bundle)

This feature is an operations / background-job hardening pass. The design intentionally reuses existing run observability and dedupe primitives already present in the codebase.

Decision 1 — Use OperationRunService + DB unique indexes for dedupe

Decision: Use OperationRunService::ensureRunWithIdentity(...) (tenant-scoped) and OperationRunService::ensureWorkspaceRunWithIdentity(...) (workspace-scoped) as the canonical dedupe mechanism, backed by the existing partial unique indexes for active runs.

Rationale:

  • The repo already requires that “active-run dedupe MUST be enforced at DB level” and implements this via partial unique indexes and the ensureRun* helpers.
  • DB enforcement remains correct under concurrency and across multiple workers.
  • Keeps the single source of truth in operation_runs (Monitoring → Operations) rather than adding a second dedupe store.

Alternatives considered:

  • Laravel job uniqueness (e.g., ShouldBeUnique / cache lock) — rejected because it introduces a second dedupe primitive outside the canonical OperationRun ledger and may behave differently across environments.
  • Scheduler-level overlap prevention only — rejected because the spec requires dedupe at execution time and must handle duplicate dispatch / redelivery.
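
The "one active run per identity" behaviour can be illustrated with a minimal in-memory sketch. Everything here is hypothetical: the class, the method names, the status values, and the identity string are stand-ins for illustration and are not the real OperationRunService API or schema.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for the operation_runs table.
final class InMemoryRunLedger
{
    /** @var array<int, array{id:int, identity:string, status:string}> */
    private array $runs = [];
    private int $nextId = 1;

    /** Return the active run for this identity, creating one only if none exists. */
    public function ensureRun(string $identity): array
    {
        foreach ($this->runs as $run) {
            // Mirrors a partial unique index: uniqueness applies to active rows only.
            if ($run['identity'] === $identity && $run['status'] === 'active') {
                return $run;
            }
        }

        $run = ['id' => $this->nextId++, 'identity' => $identity, 'status' => 'active'];
        $this->runs[$run['id']] = $run;

        return $run;
    }

    /** Mark a run as terminal so it no longer blocks new runs with the same identity. */
    public function complete(int $runId): void
    {
        $this->runs[$runId]['status'] = 'completed';
    }
}

$ledger = new InMemoryRunLedger();
$first  = $ledger->ensureRun('tenant-1:assignments.fetch');
$second = $ledger->ensureRun('tenant-1:assignments.fetch'); // duplicate dispatch reuses the run
$ledger->complete($first['id']);
$third  = $ledger->ensureRun('tenant-1:assignments.fetch'); // a terminal run does not block a new one
```

The lookup-then-insert above is not safe under concurrency. That is the gap the database index closes: with a unique index, two workers racing on the same identity cannot both insert an active row.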

Decision 2 — Dedupe identity rule (per spec clarifications)

Decision: Derive job identity as:

  • Prefer operation_run_id when available.
  • Otherwise tenant_id + job_type + stable input fingerprint.

Rationale:

  • operation_run_id is already stable and non-secret.
  • Fallback fingerprint avoids secrets, stays deterministic, and is suitable for both logging and DB identity hashing.

Alternatives considered:

  • Fingerprinting full payloads — rejected to avoid secrets/PII and to keep dedupe stable even if non-essential context changes.
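
A minimal sketch of the identity rule, assuming integer IDs, SHA-256 as the hash, and a caller that passes only essential, non-secret inputs. The function name, parameter types, and output format are hypothetical and not taken from the codebase.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derives a dedupe identity string.
 * $stableInputs must contain only essential, non-secret fields.
 */
function deriveJobIdentity(?int $operationRunId, int $tenantId, string $jobType, array $stableInputs): string
{
    // Prefer the run ID when the job already carries one.
    if ($operationRunId !== null) {
        return 'run:' . $operationRunId;
    }

    // Sort top-level keys so that key order does not change the fingerprint.
    ksort($stableInputs);
    $fingerprint = hash('sha256', json_encode($stableInputs, JSON_THROW_ON_ERROR));

    return sprintf('tenant:%d|job:%s|fp:%s', $tenantId, $jobType, $fingerprint);
}

$a = deriveJobIdentity(null, 7, 'assignments.fetch', ['policy_id' => 'abc', 'platform' => 'windows']);
$b = deriveJobIdentity(null, 7, 'assignments.fetch', ['platform' => 'windows', 'policy_id' => 'abc']);
$c = deriveJobIdentity(42, 7, 'assignments.fetch', ['policy_id' => 'abc']);
```

Here `$a` and `$b` are equal because only key order differs, while `$c` short-circuits to the run ID. Note that `ksort` only orders the top level; nested arrays would need a recursive sort to stay deterministic.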

Decision 3 — Enforce dedupe at execute time (not just dispatch)

Decision: Add an execution-time guard so a job skips early when it is not the canonical active run for its identity.

Rationale:

  • Covers duplicate job dispatch/redelivery even if a caller fails to reuse the same OperationRun at dispatch time.
  • Aligns with spec FR-006 and the constitution's “queued/scheduled ops use locks + idempotency” requirement.

Alternatives considered:

  • Rely on dispatch-only dedupe — rejected because duplicate jobs can still be enqueued and run concurrently.
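
The execution-time guard can be sketched as follows. This is an illustration under assumptions: the function names, the identity-to-run-ID map, and the returned status strings are hypothetical, and a real job would look up the canonical run in operation_runs rather than in an array.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: a job is canonical only when the run it carries is the
 * active run registered for its identity.
 *
 * @param array<string, int> $activeRunIdByIdentity identity => ID of the active run
 */
function isCanonicalRun(array $activeRunIdByIdentity, string $identity, int $jobRunId): bool
{
    return ($activeRunIdByIdentity[$identity] ?? null) === $jobRunId;
}

/** Runs $work only for the canonical job; duplicates return before doing anything. */
function handleJob(array $activeRunIdByIdentity, string $identity, int $jobRunId, callable $work): string
{
    if (!isCanonicalRun($activeRunIdByIdentity, $identity, $jobRunId)) {
        return 'skipped'; // duplicate or redelivered job
    }

    $work();

    return 'executed';
}

$active   = ['tenant:7|job:assignments.fetch' => 101];
$executed = 0;
$work     = function () use (&$executed): void {
    $executed++;
};

$canonical = handleJob($active, 'tenant:7|job:assignments.fetch', 101, $work);
$duplicate = handleJob($active, 'tenant:7|job:assignments.fetch', 102, $work);
```

Placing the check at the top of the handler means a duplicate costs one lookup and performs no side effects, regardless of how it was enqueued.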

Decision 4 — Summary counters use “final attempt wins” semantics

Decision: Persist OperationRun.summary_counts at terminal completion by overwriting with normalized counts (final attempt reflects truth; retries do not double count).

Rationale:

  • Matches clarified requirement (“final attempt”) and avoids needing cross-attempt reconciliation.
  • Fits existing patterns in OperationRunService::updateRun(...) which sanitizes/normalizes summary keys.

Alternatives considered:

  • Incremental counters (incrementSummaryCounts) across attempts — rejected because it risks double-counting under retries unless attempt IDs are tracked.
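
The difference between overwriting and incrementing can be seen in a small sketch. The normalizer, the allowed key names, and the sample counts are assumptions made for illustration; they do not reproduce OperationRunService::updateRun(...).

```php
<?php
declare(strict_types=1);

/** Hypothetical normalizer: keeps allowed keys only and casts values to non-negative ints. */
function normalizeSummaryCounts(array $counts, array $allowedKeys): array
{
    $normalized = [];
    foreach ($allowedKeys as $key) {
        if (array_key_exists($key, $counts)) {
            $normalized[$key] = max(0, (int) $counts[$key]);
        }
    }

    return $normalized;
}

$allowed = ['total', 'succeeded', 'failed']; // assumed key names
$run     = ['summary_counts' => []];

// Two attempts of the same job: the second is a retry that fully succeeds.
$attempts = [
    ['total' => 10, 'succeeded' => 4, 'failed' => 6],
    ['total' => 10, 'succeeded' => 10, 'failed' => 0, 'debug' => 'dropped'],
];

foreach ($attempts as $attemptCounts) {
    // Overwrite instead of adding, so the last write reflects the final attempt.
    $run['summary_counts'] = normalizeSummaryCounts($attemptCounts, $allowed);
}
```

After the loop, `summary_counts` holds only the second attempt's values. Summing across attempts would instead have reported a total of 20 for 10 items.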

Decision 5 — Housekeeping job tracking is workspace-scoped

Decision: Track ReconcileAdapterRunsJob via a workspace-scoped OperationRun (tenant_id = null) using type = ops.reconcile_adapter_runs.

Rationale:

  • The job is not tenant-specific and reconciles across runs; workspace-scoped runs are explicitly supported by the schema + service.

Alternatives considered:

  • Create one run per tenant — rejected because it would misrepresent the job's actual unit of work and inflate noise.

Decision 6 — Seed tenant external ID must be UUID v4

Decision: Ensure the seed tenant's external_id is generated as UUID v4 regardless of INTUNE_TENANT_ID.

Rationale:

  • Matches clarified requirement and avoids coupling a human-readable env value to a UUID-constrained field.

Alternatives considered:

  • Reuse INTUNE_TENANT_ID for external_id — rejected because it is not guaranteed to be UUID-formatted.
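
For reference, a framework-free sketch of generating and validating a version 4 UUID. Both function names are hypothetical, and the seeder itself may well use a framework helper instead (Laravel's `Str::uuid()` is documented as producing a version 4 UUID).

```php
<?php
declare(strict_types=1);

/** Generates a random (version 4) UUID string. */
function uuidV4(): string
{
    $bytes = random_bytes(16);

    // Set the version nibble to 4 and the variant bits to 10xx.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);

    // 32 hex chars split into 8 groups of 4, laid out as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

/** Checks the canonical lowercase form, including the version and variant positions. */
function isUuidV4(string $value): bool
{
    return preg_match(
        '/^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/',
        $value
    ) === 1;
}

$externalId = uuidV4();
```

A seeder test could assert `isUuidV4($externalId)` and also confirm that an arbitrary non-UUID value, such as a domain-style tenant name, is rejected by the same check.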