# Phase 0 — Research (096 Ops Polish Bundle)

This feature is an operations / background-job hardening pass. The design intentionally reuses the run-observability and dedupe primitives already present in the codebase.

## Decision 1 — Use `OperationRunService` + DB unique indexes for dedupe

**Decision:** Use `OperationRunService::ensureRunWithIdentity(...)` (tenant-scoped) and `OperationRunService::ensureWorkspaceRunWithIdentity(...)` (workspace-scoped) as the canonical dedupe mechanism, backed by the existing partial unique indexes for active runs.

**Rationale:**
- The repo already requires that “active-run dedupe MUST be enforced at DB level”, and implements this via partial unique indexes and the `ensureRun*` helpers.
- DB enforcement remains correct under concurrency and across multiple workers.
- Keeps the single source of truth in `operation_runs` (Monitoring → Operations) rather than adding a second dedupe store.
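
To illustrate why the DB index is the arbiter, here is a minimal Python sketch against SQLite. It assumes a simplified, hypothetical `operation_runs` schema (`tenant_id`, `type`, `identity_hash`, `status`) and a hypothetical helper `ensure_run_with_identity`; it is not the PHP implementation of `OperationRunService::ensureRunWithIdentity(...)`, only a demonstration of the insert-or-reuse pattern a partial unique index enables.

```python
import sqlite3

# Hypothetical, simplified schema; the real operation_runs table has more columns.
SCHEMA = """
CREATE TABLE operation_runs (
    id INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    identity_hash TEXT NOT NULL,
    status TEXT NOT NULL
);
-- Partial unique index: at most one *active* run per (tenant, identity).
CREATE UNIQUE INDEX active_runs_by_identity
    ON operation_runs (tenant_id, identity_hash)
    WHERE status IN ('queued', 'running');
"""


def ensure_run_with_identity(conn, tenant_id, run_type, identity_hash):
    """Create an active run or reuse the existing one; returns (run_id, created)."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO operation_runs (tenant_id, type, identity_hash, status) "
                "VALUES (?, ?, ?, 'queued')",
                (tenant_id, run_type, identity_hash),
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another dispatcher already holds the active run; the index decided.
        # (A production version would retry if that run finished in the meantime.)
        row = conn.execute(
            "SELECT id FROM operation_runs "
            "WHERE tenant_id = ? AND identity_hash = ? AND status IN ('queued', 'running')",
            (tenant_id, identity_hash),
        ).fetchone()
        return row[0], False
```

A second call with the same identity returns the first run's id with `created = False`. Once that run leaves `queued`/`running`, it no longer matches the index predicate, so a new run with the same identity can be created.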

**Alternatives considered:**
- Laravel job uniqueness (e.g., `ShouldBeUnique` / cache lock) — rejected because it introduces a second dedupe primitive outside the canonical `OperationRun` ledger, and its lock behavior depends on the configured cache store, so it may differ across environments.
- Scheduler-level overlap prevention only — rejected because the spec requires dedupe at execution time, including handling of duplicate dispatch / redelivery.

## Decision 2 — Dedupe identity rule (per spec clarifications)

**Decision:** Derive job identity as:
- Prefer `operation_run_id` when available.
- Otherwise `tenant_id + job_type + stable input fingerprint`.
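
A minimal sketch of this identity rule. The helper name `job_identity`, the `run:` / `fp:` prefixes, SHA-256, and canonical JSON with sorted keys are all assumptions (the spec does not name a hash function or serialization), and the caller is assumed to pass only stable, non-secret inputs:

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def job_identity(
    tenant_id: Optional[int],
    job_type: str,
    stable_inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Derive a dedupe identity (hypothetical helper)."""
    if operation_run_id is not None:
        # Preferred: the run id is already stable and non-secret.
        return f"run:{operation_run_id}"
    # Fallback: tenant + job type + fingerprint of stable, non-secret inputs only.
    canonical = json.dumps(
        {"tenant_id": tenant_id, "job_type": job_type, "inputs": stable_inputs},
        sort_keys=True,  # key order must not change the fingerprint
        separators=(",", ":"),
    )
    return "fp:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys makes the fingerprint independent of input ordering, so two dispatches with the same logical inputs produce the same identity.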

**Rationale:**
- `operation_run_id` is already stable and non-secret.
- The fallback fingerprint excludes secrets, is deterministic, and is suitable both for logging and for DB identity hashing.

**Alternatives considered:**
- Fingerprinting full payloads — rejected to avoid secrets/PII and to keep dedupe stable even if non-essential context changes.

## Decision 3 — Enforce dedupe at execute time (not just dispatch)

**Decision:** Add an execution-time guard so a job skips early when it is not the canonical active run for its identity.
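
The guard can be sketched as a check at the top of the job's handler. In this hypothetical Python sketch, `active_run_for` stands in for a lookup of the active `OperationRun` for an identity (for example, via the partial unique index), and `QueuedJob` and `handle` are illustrative names, not existing classes:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QueuedJob:
    run_id: int    # the OperationRun this job was dispatched with
    identity: str  # dedupe identity derived per Decision 2


def handle(
    job: QueuedJob,
    active_run_for: Callable[[str], Optional[int]],
    work: Callable[[], None],
) -> str:
    """Run `work` only if this job's run is the canonical active run."""
    canonical_run_id = active_run_for(job.identity)
    if canonical_run_id != job.run_id:
        # Another run owns this identity, or no run is active any more
        # (e.g. redelivery after completion): skip before doing any work.
        return "skipped"
    work()
    return "executed"
```

Because the check runs inside the worker, it still applies when a caller dispatched a duplicate job with a different `OperationRun`, or when the queue redelivers a job whose run has already finished.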

**Rationale:**
- Covers duplicate job dispatch/redelivery even if a caller fails to reuse the same `OperationRun` at dispatch time.
- Aligns with spec FR-006 and the constitution’s rule that “queued/scheduled ops use locks + idempotency”.

**Alternatives considered:**
- Rely on dispatch-only dedupe — rejected because duplicate jobs can still be enqueued and run concurrently.

## Decision 4 — Summary counters use “final attempt wins” semantics

**Decision:** Persist `OperationRun.summary_counts` at terminal completion by overwriting it with normalized counts (the final attempt reflects the truth; retries do not double-count).

**Rationale:**
- Matches the clarified requirement (“final attempt”) and avoids the need for cross-attempt reconciliation.
- Fits the existing pattern in `OperationRunService::updateRun(...)`, which sanitizes/normalizes summary keys.

**Alternatives considered:**
- Incremental counters (`incrementSummaryCounts`) across attempts — rejected because they risk double-counting under retries unless attempt IDs are tracked.
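
The difference between overwriting and incrementing can be sketched with plain dictionaries. `ALLOWED_KEYS`, `normalize_counts`, `record_final_counts`, and `increment_counts` are hypothetical stand-ins; the real key normalization lives in `OperationRunService::updateRun(...)`:

```python
from typing import Dict, Mapping

# Hypothetical allow-list of summary keys.
ALLOWED_KEYS = ("total", "succeeded", "failed", "skipped")


def normalize_counts(raw: Mapping[str, object]) -> Dict[str, int]:
    """Keep only allowed keys; coerce to non-negative ints (missing -> 0)."""
    normalized: Dict[str, int] = {}
    for key in ALLOWED_KEYS:
        value = raw.get(key, 0)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            value = 0  # drop non-numeric values
        normalized[key] = max(0, int(value))
    return normalized


def record_final_counts(run: dict, final_counts: Mapping[str, object]) -> None:
    # Chosen: overwrite at terminal completion; earlier values are discarded.
    run["summary_counts"] = normalize_counts(final_counts)


def increment_counts(run: dict, attempt_counts: Mapping[str, object]) -> None:
    # Rejected: accumulate per attempt, which double-counts on retries.
    current = run.setdefault("summary_counts", {})
    for key, value in normalize_counts(attempt_counts).items():
        current[key] = current.get(key, 0) + value
```

If a first attempt processes 3 of 5 items and fails, and the retry processes all 5, overwriting records `succeeded = 5`, while incrementing records `succeeded = 8` and `total = 10`.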

## Decision 5 — Housekeeping job tracking is workspace-scoped

**Decision:** Track `ReconcileAdapterRunsJob` via a workspace-scoped `OperationRun` (`tenant_id = null`) using `type = ops.reconcile_adapter_runs`.

**Rationale:**
- The job is not tenant-specific and reconciles across runs; workspace-scoped runs are explicitly supported by the schema + service.
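
One detail worth illustrating: with `tenant_id = null`, a unique index that includes `tenant_id` will not dedupe workspace-scoped runs, because NULLs are treated as distinct in unique indexes (in SQLite, and in PostgreSQL by default). The following SQLite sketch assumes a hypothetical schema with a `workspace_id` column and a separate partial index for `tenant_id IS NULL` rows; the repo's actual index definitions may differ:

```python
import sqlite3

SCHEMA = """
CREATE TABLE operation_runs (
    id INTEGER PRIMARY KEY,
    workspace_id INTEGER NOT NULL,
    tenant_id INTEGER,              -- NULL for workspace-scoped runs
    type TEXT NOT NULL,
    status TEXT NOT NULL
);
-- Tenant-scoped active runs: one per (tenant, type).
CREATE UNIQUE INDEX active_tenant_runs
    ON operation_runs (tenant_id, type)
    WHERE tenant_id IS NOT NULL AND status IN ('queued', 'running');
-- Workspace-scoped active runs: key on workspace_id instead, since
-- (NULL, type) pairs would never collide in the index above.
CREATE UNIQUE INDEX active_workspace_runs
    ON operation_runs (workspace_id, type)
    WHERE tenant_id IS NULL AND status IN ('queued', 'running');
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def start_reconcile_run(conn: sqlite3.Connection, workspace_id: int) -> bool:
    """Insert a workspace-scoped run; return False if one is already active."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO operation_runs (workspace_id, tenant_id, type, status) "
                "VALUES (?, NULL, 'ops.reconcile_adapter_runs', 'queued')",
                (workspace_id,),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```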

**Alternatives considered:**
- Create one run per tenant — rejected because it would misrepresent the job’s actual unit of work and inflate noise.

## Decision 6 — Seed tenant external ID must be UUID v4

**Decision:** Ensure the seed tenant’s `external_id` is generated as UUID v4 regardless of `INTUNE_TENANT_ID`.
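
A minimal sketch of the generation rule and a matching format check, using Python's standard `uuid` module. `seed_tenant_external_id` and `is_uuid_v4` are hypothetical names for illustration, not the repo's seeder:

```python
import uuid


def seed_tenant_external_id() -> str:
    """Always mint a fresh UUID v4 for the seed tenant's external_id.

    INTUNE_TENANT_ID is intentionally not consulted: it is not guaranteed
    to be UUID-formatted.
    """
    return str(uuid.uuid4())


def is_uuid_v4(value: str) -> bool:
    """True only for a canonical (hyphenated) RFC 4122 version-4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == value.lower()
```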

**Rationale:**
- Matches the clarified requirement and avoids coupling a human-readable env value to a UUID-constrained field.

**Alternatives considered:**
- Reuse `INTUNE_TENANT_ID` for `external_id` — rejected because it is not guaranteed to be UUID-formatted.