Phase 0 — Research (096 Ops Polish Bundle)
This feature is an operations / background-job hardening pass. The design intentionally reuses existing run observability and dedupe primitives already present in the codebase.
Decision 1 — Use OperationRunService + DB unique indexes for dedupe
Decision: Use OperationRunService::ensureRunWithIdentity(...) (tenant-scoped) and OperationRunService::ensureWorkspaceRunWithIdentity(...) (workspace-scoped) as the canonical dedupe mechanism, backed by the existing partial unique indexes for active runs.
Rationale:
- The repo already enforces “active-run dedupe MUST be enforced at DB level” via partial unique indexes and the ensureRun* helpers.
- DB enforcement remains correct under concurrency and across multiple workers.
- Keeps the single source of truth in operation_runs (Monitoring → Operations) rather than adding a second dedupe store.
Alternatives considered:
- Laravel job uniqueness (e.g., ShouldBeUnique / cache lock): rejected because it introduces a second dedupe primitive outside the canonical OperationRun ledger and may behave differently across environments.
- Scheduler-level overlap prevention only: rejected because the spec requires dedupe at execution time and must handle duplicate dispatch / redelivery.
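To make the DB-level dedupe idea concrete, here is a minimal sketch of an "ensure run with identity" helper backed by a partial unique index. It is written in Python against an in-memory SQLite database purely for illustration; it is not the OperationRunService implementation. The column names, the status values ('queued', 'running', 'completed'), and the index name are assumptions.

```python
import sqlite3


def open_ledger() -> sqlite3.Connection:
    """Create an illustrative operation_runs table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE operation_runs (
               id INTEGER PRIMARY KEY,
               tenant_id INTEGER NOT NULL,
               type TEXT NOT NULL,
               identity_hash TEXT NOT NULL,
               status TEXT NOT NULL
           )"""
    )
    # Partial unique index: only active rows participate, so a finished run
    # never blocks a new run with the same identity.
    conn.execute(
        """CREATE UNIQUE INDEX operation_runs_active_identity
               ON operation_runs (tenant_id, type, identity_hash)
               WHERE status IN ('queued', 'running')"""
    )
    return conn


def ensure_run_with_identity(conn, tenant_id, run_type, identity_hash) -> int:
    """Return the id of the active run for this identity, creating it if needed."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO operation_runs (tenant_id, type, identity_hash, status) "
                "VALUES (?, ?, ?, 'queued')",
                (tenant_id, run_type, identity_hash),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another caller already holds the active slot; reuse its run.
        row = conn.execute(
            "SELECT id FROM operation_runs "
            "WHERE tenant_id = ? AND type = ? AND identity_hash = ? "
            "AND status IN ('queued', 'running')",
            (tenant_id, run_type, identity_hash),
        ).fetchone()
        return row[0]
```

Because the database rejects the second insert, two workers racing on the same identity converge on one run without any application-level lock.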
Decision 2 — Dedupe identity rule (per spec clarifications)
Decision: Derive job identity as:
- Prefer operation_run_id when available.
- Otherwise tenant_id + job_type + stable input fingerprint.
Rationale:
- operation_run_id is already stable and non-secret.
- Fallback fingerprint avoids secrets, stays deterministic, and is suitable for both logging and DB identity hashing.
Alternatives considered:
- Fingerprinting full payloads — rejected to avoid secrets/PII and to keep dedupe stable even if non-essential context changes.
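The identity rule can be sketched as follows. This is an illustrative Python sketch, not the project's code: the function names, the allowlist parameter, the identity string format, and the choice of SHA-256 over canonical JSON are all assumptions. The point it demonstrates is that only allowlisted, non-secret inputs feed the fingerprint, so key order and unrelated context do not change the identity.

```python
import hashlib
import json


def input_fingerprint(inputs: dict, allowed_keys: tuple) -> str:
    """Hash only the allowlisted inputs, in a canonical (sorted) form."""
    stable = {k: inputs[k] for k in sorted(allowed_keys) if k in inputs}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def job_identity(operation_run_id, tenant_id, job_type, inputs, allowed_keys) -> str:
    """Prefer operation_run_id; otherwise fall back to tenant + type + fingerprint."""
    if operation_run_id is not None:
        return f"run:{operation_run_id}"
    fingerprint = input_fingerprint(inputs, allowed_keys)
    return f"{tenant_id}:{job_type}:{fingerprint}"
```

For example, two payloads that differ only in a non-allowlisted field such as a token produce the same identity, while a different allowlisted value produces a different one.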
Decision 3 — Enforce dedupe at execute time (not just dispatch)
Decision: Add an execution-time guard so a job skips early when it is not the canonical active run for its identity.
Rationale:
- Covers duplicate job dispatch/redelivery even if a caller fails to reuse the same OperationRun at dispatch time.
- Aligns with spec FR-006 and the constitution’s “queued/scheduled ops use locks + idempotency”.
Alternatives considered:
- Rely on dispatch-only dedupe — rejected because duplicate jobs can still be enqueued and run concurrently.
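A minimal sketch of such an execution-time guard is shown here, in Python for illustration only. The Run record, the status values, and the "lowest id wins" tiebreak are assumptions; with the DB-level unique index in place there should be at most one active run per identity, so the tiebreak exists only to keep the sketch self-contained.

```python
from dataclasses import dataclass

ACTIVE_STATUSES = ("queued", "running")  # assumed status values


@dataclass
class Run:
    id: int
    identity: str
    status: str


def canonical_active_run_id(runs, identity):
    """Return the id of the active run that owns this identity, if any."""
    active = [r.id for r in runs if r.identity == identity and r.status in ACTIVE_STATUSES]
    return min(active) if active else None


def handle(job_run_id, identity, runs, work) -> str:
    """Run the work only when this job belongs to the canonical active run."""
    if canonical_active_run_id(runs, identity) != job_run_id:
        return "skipped"  # duplicate dispatch or redelivery: exit early
    work()
    return "executed"
```

The check happens inside the job body, so it still applies when the queue redelivers a message or a caller dispatches the same job twice.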
Decision 4 — Summary counters use “final attempt wins” semantics
Decision: Persist OperationRun.summary_counts at terminal completion by overwriting with normalized counts (final attempt reflects truth; retries do not double count).
Rationale:
- Matches clarified requirement (“final attempt”) and avoids needing cross-attempt reconciliation.
- Fits existing patterns in OperationRunService::updateRun(...), which sanitizes/normalizes summary keys.
Alternatives considered:
- Incremental counters (incrementSummaryCounts) across attempts: rejected because it risks double-counting under retries unless attempt IDs are tracked.
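The difference between overwriting and incrementing can be illustrated with a short Python sketch. The key allowlist and the normalization rule here (unknown keys dropped, missing keys zeroed, negatives clamped) are assumptions for the example and may not match what OperationRunService::updateRun(...) actually does.

```python
ALLOWED_KEYS = ("processed", "succeeded", "failed")  # hypothetical allowlist


def normalize_counts(raw: dict) -> dict:
    """Keep only known keys and coerce values to non-negative integers."""
    return {k: max(0, int(raw.get(k, 0))) for k in ALLOWED_KEYS}


def complete_run(run: dict, attempt_counts: dict) -> dict:
    """Final attempt wins: replace summary_counts instead of adding to them."""
    run["summary_counts"] = normalize_counts(attempt_counts)
    return run
```

If a first attempt reports 10 processed and 3 failed and the retry reports 10 processed and 10 succeeded, the stored summary reflects only the retry, rather than 20 processed.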
Decision 5 — Housekeeping job tracking is workspace-scoped
Decision: Track ReconcileAdapterRunsJob via a workspace-scoped OperationRun (tenant_id = null) using type = ops.reconcile_adapter_runs.
Rationale:
- The job is not tenant-specific and reconciles across runs; workspace-scoped runs are explicitly supported by the schema + service.
Alternatives considered:
- Create one run per tenant — rejected because it would misrepresent the job’s actual unit of work and inflate noise.
Decision 6 — Seed tenant external ID must be UUID v4
Decision: Ensure the seed tenant’s external_id is generated as UUID v4 regardless of INTUNE_TENANT_ID.
Rationale:
- Matches clarified requirement and avoids coupling a human-readable env value to a UUID-constrained field.
Alternatives considered:
- Reuse INTUNE_TENANT_ID for external_id: rejected because it is not guaranteed to be UUID-formatted.
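As a small illustration of this decision, the sketch below (Python, for illustration only; the function names are hypothetical) generates the external ID independently of any environment value and shows why an arbitrary string cannot safely be reused in a UUID-constrained field.

```python
import uuid


def seed_tenant_external_id() -> str:
    """Always generate a fresh UUID v4; no environment value is consulted."""
    return str(uuid.uuid4())


def is_uuid_v4(value: str) -> bool:
    """Check whether a string parses as a version 4 UUID."""
    try:
        return uuid.UUID(value).version == 4
    except ValueError:
        return False
```

A human-readable value such as a tenant slug fails the check, while the generated ID always passes it.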