TenantAtlas/specs/096-ops-polish-assignment-dedupe-system-tracking/research.md
ahmido 03127a670b Spec 096: Ops polish (assignment summaries + dedupe + reconcile tracking + seed DX) (#115)
Implements Spec 096 ops polish bundle:

- Persist durable OperationRun.summary_counts for assignment fetch/restore (final attempt wins)
- Server-side dedupe for assignment jobs (15-minute cooldown + non-canonical skip)
- Track ReconcileAdapterRunsJob via workspace-scoped OperationRun + stable failure codes + overlap prevention
- Seed DX: ensure seeded tenants use UUID v4 external_id and seed satisfies workspace_id NOT NULL constraints

Verification (local / evidence-based):
- `vendor/bin/sail artisan test --compact tests/Feature/Operations/AssignmentRunSummaryCountsTest.php tests/Feature/Operations/AssignmentJobDedupeTest.php tests/Feature/Operations/ReconcileAdapterRunsJobTrackingTest.php tests/Feature/Seed/PoliciesSeederExternalIdTest.php`
- `vendor/bin/sail bin pint --dirty`

Spec artifacts included under `specs/096-ops-polish-assignment-dedupe-system-tracking/` (spec/plan/tasks/checklists).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #115
2026-02-15 20:49:38 +00:00


# Phase 0 — Research (096 Ops Polish Bundle)
This feature is an operations / background-job hardening pass. The design intentionally reuses existing run observability and dedupe primitives already present in the codebase.
## Decision 1 — Use `OperationRunService` + DB unique indexes for dedupe
**Decision:** Use `OperationRunService::ensureRunWithIdentity(...)` (tenant-scoped) and `OperationRunService::ensureWorkspaceRunWithIdentity(...)` (workspace-scoped) as the canonical dedupe mechanism, backed by the existing partial unique indexes for active runs.
**Rationale:**
- The repo already enforces “active-run dedupe MUST be enforced at DB level” via partial unique indexes and the `ensureRun*` helpers.
- DB enforcement remains correct under concurrency and across multiple workers.
- Keeps the single source of truth in `operation_runs` (Monitoring → Operations) rather than adding a second dedupe store.
**Alternatives considered:**
- Laravel job uniqueness (e.g., `ShouldBeUnique` / cache lock) — rejected because it introduces a second dedupe primitive outside the canonical `OperationRun` ledger and may behave differently across environments.
- Scheduler-level overlap prevention only — rejected because the spec requires dedupe at execution time and must handle duplicate dispatch / redelivery.
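
To make the DB-level guarantee concrete, here is a minimal sketch of active-run dedupe using a partial unique index. It is written in Python with SQLite purely for illustration and is not the repository's implementation: the column `identity_hash`, the index name, the status values, the run type `assignments.fetch`, and the helper `ensure_run` are all assumptions standing in for whatever `OperationRunService::ensureRunWithIdentity(...)` and the real schema do.

```python
import sqlite3
from typing import Tuple


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the operation_runs table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE operation_runs (
            id INTEGER PRIMARY KEY,
            tenant_id INTEGER NOT NULL,
            type TEXT NOT NULL,
            identity_hash TEXT NOT NULL,
            status TEXT NOT NULL
        )
        """
    )
    # Partial unique index: at most one *active* run per identity.
    # Finished runs fall outside the WHERE clause and never conflict.
    conn.execute(
        """
        CREATE UNIQUE INDEX operation_runs_active_identity
        ON operation_runs (tenant_id, type, identity_hash)
        WHERE status IN ('queued', 'running')
        """
    )
    return conn


def ensure_run(
    conn: sqlite3.Connection, tenant_id: int, run_type: str, identity_hash: str
) -> Tuple[int, bool]:
    """Return (run_id, created). Insert first; on conflict, reuse the active row."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO operation_runs (tenant_id, type, identity_hash, status) "
                "VALUES (?, ?, ?, 'queued')",
                (tenant_id, run_type, identity_hash),
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # The index rejected the insert, so an active run already exists.
        row = conn.execute(
            "SELECT id FROM operation_runs "
            "WHERE tenant_id = ? AND type = ? AND identity_hash = ? "
            "AND status IN ('queued', 'running')",
            (tenant_id, run_type, identity_hash),
        ).fetchone()
        return row[0], False
```

Because the database rejects the second insert, two workers racing on the same identity cannot both create an active run. Once the first run leaves the active statuses, a new run for the same identity is accepted again.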
## Decision 2 — Dedupe identity rule (per spec clarifications)
**Decision:** Derive job identity as:
- Prefer `operation_run_id` when available.
- Otherwise `tenant_id + job_type + stable input fingerprint`.
**Rationale:**
- `operation_run_id` is already stable and non-secret.
- Fallback fingerprint avoids secrets, stays deterministic, and is suitable for both logging and DB identity hashing.
**Alternatives considered:**
- Fingerprinting full payloads — rejected to avoid secrets/PII and to keep dedupe stable even if non-essential context changes.
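
The identity rule above can be sketched as a small pure function. This is an illustrative Python sketch, not the project's code: the function name `job_identity`, the `run:` / `fp:` prefixes, and the choice of SHA-256 over canonical JSON are assumptions. The caller is expected to pass only the stable, non-secret inputs rather than the full payload.

```python
import hashlib
import json
from typing import Any, Mapping, Optional


def job_identity(
    tenant_id: int,
    job_type: str,
    stable_inputs: Mapping[str, Any],
    operation_run_id: Optional[int] = None,
) -> str:
    """Derive a deterministic, secret-free identity string for a job."""
    # Rule 1: prefer the run id when the job already carries one.
    if operation_run_id is not None:
        return f"run:{operation_run_id}"

    # Rule 2: otherwise hash tenant + job type + the stable inputs.
    # sort_keys and fixed separators make the JSON canonical, so key
    # order in the caller's dict does not change the fingerprint.
    canonical = json.dumps(
        {"tenant_id": tenant_id, "job_type": job_type, "inputs": dict(stable_inputs)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return "fp:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Hashing a canonical form keeps the value safe to log and short enough to index, while leaving volatile context (timestamps, request IDs, tokens) out of the input keeps the identity stable across redeliveries.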
## Decision 3 — Enforce dedupe at execute time (not just dispatch)
**Decision:** Add an execution-time guard so a job skips early when it is not the canonical active run for its identity.
**Rationale:**
- Covers duplicate job dispatch/redelivery even if a caller fails to reuse the same `OperationRun` at dispatch time.
- Aligns with spec FR-006 and the constitution's rule that “queued/scheduled ops use locks + idempotency”.
**Alternatives considered:**
- Rely on dispatch-only dedupe — rejected because duplicate jobs can still be enqueued and run concurrently.
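
A minimal sketch of such an execution-time guard follows, in Python with an in-memory ledger standing in for `operation_runs`. Everything here is hypothetical: `Run`, `RunLedger`, `handle_job`, the status names, and in particular the assumption that the canonical run is the oldest active run for an identity. The sketch also does not cover two concurrent deliveries of the same run, which would need an additional lock or atomic status transition.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

ACTIVE_STATUSES = ("queued", "running")


@dataclass
class Run:
    id: int
    identity: str
    status: str = "queued"


class RunLedger:
    """In-memory stand-in for the operation_runs table (hypothetical)."""

    def __init__(self) -> None:
        self._runs: Dict[int, Run] = {}

    def add(self, run: Run) -> None:
        self._runs[run.id] = run

    def canonical_active(self, identity: str) -> Optional[Run]:
        # Assumption: canonical = the oldest (lowest id) active run.
        active = [
            r for r in self._runs.values()
            if r.identity == identity and r.status in ACTIVE_STATUSES
        ]
        return min(active, key=lambda r: r.id) if active else None


def handle_job(
    ledger: RunLedger, run_id: int, identity: str, work: Callable[[], None]
) -> str:
    """Run `work` only if this job belongs to the canonical active run."""
    canonical = ledger.canonical_active(identity)
    if canonical is None or canonical.id != run_id:
        # Duplicate dispatch or redelivery after completion: skip early.
        return "skipped"
    canonical.status = "running"
    try:
        work()
    except Exception:
        canonical.status = "failed"
        raise
    canonical.status = "completed"
    return "executed"
```

The guard runs inside the job itself, so it still applies when a caller bypasses dispatch-time reuse of the run or when the queue redelivers a message.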
## Decision 4 — Summary counters use “final attempt wins” semantics
**Decision:** Persist `OperationRun.summary_counts` at terminal completion by overwriting with normalized counts (final attempt reflects truth; retries do not double count).
**Rationale:**
- Matches clarified requirement (“final attempt”) and avoids needing cross-attempt reconciliation.
- Fits existing patterns in `OperationRunService::updateRun(...)` which sanitizes/normalizes summary keys.
**Alternatives considered:**
- Incremental counters (`incrementSummaryCounts`) across attempts — rejected because it risks double-counting under retries unless attempt IDs are tracked.
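
The difference between the two strategies shows up as soon as a job is retried. The Python sketch below is illustrative only: the allowed key list and the function names are assumptions, and `normalize_counts` merely imitates the kind of key sanitizing attributed to `OperationRunService::updateRun(...)`.

```python
from typing import Any, Dict, Mapping

# Hypothetical whitelist of summary keys.
ALLOWED_KEYS = ("total", "succeeded", "failed", "skipped")


def normalize_counts(raw: Mapping[str, Any]) -> Dict[str, int]:
    """Keep only known keys, coerce to non-negative ints, default to 0."""
    return {key: max(0, int(raw.get(key, 0))) for key in ALLOWED_KEYS}


def finalize_overwrite(run: Dict[str, Any], attempt_counts: Mapping[str, Any]) -> None:
    """Final attempt wins: replace whatever an earlier attempt stored."""
    run["summary_counts"] = normalize_counts(attempt_counts)


def finalize_increment(run: Dict[str, Any], attempt_counts: Mapping[str, Any]) -> None:
    """Rejected alternative: add this attempt's counts to the stored ones."""
    current = run.get("summary_counts", {})
    run["summary_counts"] = {
        key: current.get(key, 0) + value
        for key, value in normalize_counts(attempt_counts).items()
    }
```

If a first attempt reports 10 items with 6 failures and the retry reports 10 items with none, overwriting leaves `total` at 10, while incrementing reports 20 for the same 10 items.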
## Decision 5 — Housekeeping job tracking is workspace-scoped
**Decision:** Track `ReconcileAdapterRunsJob` via a workspace-scoped `OperationRun` (`tenant_id = null`) using `type = ops.reconcile_adapter_runs`.
**Rationale:**
- The job is not tenant-specific and reconciles across runs; workspace-scoped runs are explicitly supported by the schema + service.
**Alternatives considered:**
- Create one run per tenant — rejected because it would misrepresent the job's actual unit of work and inflate noise.
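
One detail worth illustrating for `tenant_id = null` runs: most SQL databases treat NULLs as distinct in unique indexes by default, so an index that includes `tenant_id` would not prevent two workspace-scoped runs from overlapping. The Python/SQLite sketch below shows one way to get overlap prevention under that constraint. The schema, index name, status values, and `try_start_workspace_run` are assumptions for illustration; how the real schema and `ensureWorkspaceRunWithIdentity(...)` handle this is not shown in this document.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for operation_runs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE operation_runs (
            id INTEGER PRIMARY KEY,
            workspace_id INTEGER NOT NULL,
            tenant_id INTEGER,          -- NULL for workspace-scoped runs
            type TEXT NOT NULL,
            status TEXT NOT NULL
        )
        """
    )
    # Index only workspace-scoped active rows and leave tenant_id out of
    # the key, so NULL-distinctness cannot defeat the uniqueness check.
    conn.execute(
        """
        CREATE UNIQUE INDEX operation_runs_active_workspace
        ON operation_runs (workspace_id, type)
        WHERE tenant_id IS NULL AND status IN ('queued', 'running')
        """
    )
    return conn


def try_start_workspace_run(
    conn: sqlite3.Connection, workspace_id: int, run_type: str
) -> bool:
    """Return True if this caller may run, False if a run is already active."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO operation_runs (workspace_id, tenant_id, type, status) "
                "VALUES (?, NULL, ?, 'running')",
                (workspace_id, run_type),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this shape, a second `ops.reconcile_adapter_runs` run for the same workspace is refused while the first is active, and other workspaces are unaffected.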
## Decision 6 — Seed tenant external ID must be UUID v4
**Decision:** Ensure the seed tenant's `external_id` is generated as UUID v4 regardless of `INTUNE_TENANT_ID`.
**Rationale:**
- Matches clarified requirement and avoids coupling a human-readable env value to a UUID-constrained field.
**Alternatives considered:**
- Reuse `INTUNE_TENANT_ID` for `external_id` — rejected because it is not guaranteed UUID formatted.
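
As a small illustration of the rule, the Python sketch below generates the seed ID without consulting the environment and checks that a value is a canonical UUID v4. The function names are hypothetical, and the real seeder is PHP, so this only mirrors the intent.

```python
import uuid


def seed_tenant_external_id() -> str:
    """Always generate a fresh UUID v4; INTUNE_TENANT_ID is deliberately not read."""
    return str(uuid.uuid4())


def is_uuid_v4(value: str) -> bool:
    """True only for a canonically formatted version-4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # Require version 4 and the canonical hyphenated form.
    return parsed.version == 4 and str(parsed) == value.lower()
```

A check like `is_uuid_v4` is the kind of assertion a seeder test could make, since a human-readable environment value such as `contoso-dev` would fail it.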