TenantAtlas/specs/044-drift-mvp/research.md
2026-01-13 23:28:02 +01:00

Phase 0 Output: Research (044)

Decisions

1) scope_key reuse

  • Decision: Use the existing Inventory selection hash as scope_key.
    • Concretely: scope_key = InventorySyncRun.selection_hash.
  • Rationale:
    • Inventory already normalizes + hashes selection payload deterministically (via InventorySelectionHasher).
    • It is already used for concurrency control and de-duplication of inventory runs, so it's the right stable scope identifier.
  • Alternatives considered:
    • Compute a second hash (a duplicate of selection_hash) → two hashes of the same selection could diverge over time, with no benefit.
    • Store the raw selection payload as the primary key → not stable without strict normalization.
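As an illustration of deterministic selection hashing, here is a minimal Python sketch. The normalization rules that InventorySelectionHasher actually applies are not spelled out here, so the helper names (`_normalize`, `selection_hash`) and the rules themselves (sorted keys, de-duplicated and sorted list values, compact JSON) are assumptions.

```python
import hashlib
import json
from typing import Any


def _normalize(value: Any) -> Any:
    """Recursively normalize a selection payload (hypothetical rules)."""
    if isinstance(value, dict):
        # Key order must not affect the hash.
        return {k: _normalize(v) for k, v in sorted(value.items())}
    if isinstance(value, list):
        # Assumption: selection lists behave like sets, so order and duplicates are irrelevant.
        items = [_normalize(v) for v in value]
        unique = {json.dumps(i, sort_keys=True): i for i in items}
        return [unique[k] for k in sorted(unique)]
    return value


def selection_hash(selection: dict) -> str:
    """Hypothetical stand-in for InventorySelectionHasher: sha256 of canonical JSON."""
    canonical = json.dumps(_normalize(selection), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these rules, two selections that differ only in key order, list order, or duplicate list entries hash to the same scope_key, while any difference in content yields a different one.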

2) Baseline selection (MVP)

  • Decision: Baseline run = previous successful inventory sync run for the same scope_key; comparison run = latest successful inventory sync run for the same scope_key.
  • Rationale:
    • Matches the “run at least twice” scenario.
    • Deterministic and explainable.
  • Alternatives considered:
    • User-pinned baselines → valuable, but deferred (the design must allow adding them later via scope_key).
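The baseline/comparison rule can be sketched as a pure function over run records. `InventoryRun` is a hypothetical projection of an InventorySyncRun row, and the `"success"` status value and the `finished_at` ordering column are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class InventoryRun:
    # Hypothetical projection of an InventorySyncRun row.
    id: int
    scope_key: str
    status: str  # assumed values: "success", "failed", "running"
    finished_at: datetime


def select_runs(runs: list[InventoryRun], scope_key: str) -> Optional[tuple[InventoryRun, InventoryRun]]:
    """Return (baseline, current) for a scope, or None if fewer than two successful runs exist."""
    successful = sorted(
        (r for r in runs if r.scope_key == scope_key and r.status == "success"),
        key=lambda r: (r.finished_at, r.id),  # id breaks timestamp ties deterministically
    )
    if len(successful) < 2:
        return None  # "run at least twice" is not yet satisfied
    # Baseline = previous successful run, current = latest successful run.
    return successful[-2], successful[-1]
```

Failed runs and runs from other scopes are ignored, so the pair is always explainable as "the last two successful syncs of this selection".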

3) Persisted generic Findings

  • Decision: Persist Findings in a generic findings table.
  • Rationale:
    • Enables stable triage (e.g., an acknowledged status) that does not drift when findings are recomputed.
    • Reusable pipeline for Drift now, Audit/Compare later.
  • Alternatives considered:
    • Compute-on-demand and store only acknowledgements by fingerprint → harder operationally and can surprise users when diff rules evolve.
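A minimal in-memory sketch of why persistence keeps triage stable: findings are upserted by fingerprint, so regenerating the same finding returns the stored row with its triage status intact. `Finding`, `FindingStore`, and the `"new"`/`"acknowledged"` status values are hypothetical stand-ins for the generic findings table.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    # Hypothetical subset of a generic findings row.
    fingerprint: str
    finding_type: str  # e.g. "drift" now; "audit" or "compare" later
    status: str = "new"  # assumed triage states: "new", "acknowledged"


class FindingStore:
    """In-memory stand-in for the generic findings table, keyed by fingerprint."""

    def __init__(self) -> None:
        self._rows: dict[str, Finding] = {}

    def upsert(self, finding: Finding) -> Finding:
        existing = self._rows.get(finding.fingerprint)
        if existing is not None:
            # Same fingerprint: keep the persisted triage status instead of resetting it.
            return existing
        self._rows[finding.fingerprint] = finding
        return finding

    def acknowledge(self, fingerprint: str) -> None:
        self._rows[fingerprint].status = "acknowledged"
```

Because the type is just a column, the same store can hold Drift findings now and Audit/Compare findings later.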

4) Generation trigger (MVP)

  • Decision: On opening Drift, if findings for (tenant, scope_key, baseline_run_id, current_run_id) do not exist, dispatch an async job to generate them.
  • Rationale:
    • Avoids long request times.
    • Avoids scheduling complexity in the MVP.
  • Alternatives considered:
    • Generate after every inventory run → may be expensive; can be added later.
    • Nightly schedule → hides immediacy and complicates operations.
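The on-open trigger can be sketched as follows, with a `deque` standing in for the real async job queue. `DriftKey` and `DriftPage` are hypothetical names, and skipping re-dispatch while a job is already queued is an assumed refinement rather than part of the decision above.

```python
from collections import deque
from typing import NamedTuple


class DriftKey(NamedTuple):
    tenant_id: int
    scope_key: str
    baseline_run_id: int
    current_run_id: int


class DriftPage:
    """Hypothetical controller: dispatch generation instead of computing in the request."""

    def __init__(self) -> None:
        self.generated: set[DriftKey] = set()  # keys that already have persisted findings
        self.queue: deque[DriftKey] = deque()  # stand-in for an async job queue

    def open(self, key: DriftKey) -> str:
        if key in self.generated:
            return "ready"
        # Assumption: do not enqueue a second job for a key that is already queued.
        if key not in self.queue:
            self.queue.append(key)
        return "generating"
```

The request path only checks and enqueues, so its cost stays constant regardless of how large the diff is.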

5) Fingerprint and state hashing

  • Decision: Use a deterministic fingerprint that changes when the underlying state changes.
    • Fingerprint = sha256(tenant_id + scope_key + subject_type + subject_external_id + change_type + baseline_hash + current_hash).
    • baseline_hash/current_hash are computed over normalized, sanitized comparison data (exclude volatile fields like timestamps).
  • Rationale:
    • Stable identity for triage and audit.
    • Supports future generators (audit/compare) using same semantics.
  • Alternatives considered:
    • Fingerprint without baseline/current hash → cannot distinguish changed vs unchanged findings.
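A sketch of the fingerprint and state hashes in Python. The spec writes the fingerprint input as a `+` concatenation; encoding the fields as a JSON array is an assumed refinement that keeps field boundaries unambiguous. `VOLATILE_FIELDS` is also an assumption, since the spec names timestamps only as an example, and this sketch strips top-level fields only.

```python
import hashlib
import json
from typing import Any

# Assumption: which fields count as volatile; the spec only says "like timestamps".
VOLATILE_FIELDS = {"createdDateTime", "lastModifiedDateTime"}


def state_hash(data: dict[str, Any]) -> str:
    """sha256 over comparison data with volatile top-level fields removed."""
    stable = {k: v for k, v in data.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fingerprint(tenant_id: int, scope_key: str, subject_type: str,
                subject_external_id: str, change_type: str,
                baseline_hash: str, current_hash: str) -> str:
    # A JSON array keeps field boundaries explicit, so ("ab", "c") and ("a", "bc")
    # produce different inputs, unlike plain string concatenation.
    parts = [str(tenant_id), scope_key, subject_type, subject_external_id,
             change_type, baseline_hash, current_hash]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()
```

A timestamp-only change leaves both state hashes, and therefore the fingerprint, unchanged; a real content change produces a new fingerprint.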

6) Evidence minimization

  • Decision: Store small, sanitized evidence_jsonb with an allowlist shape; no raw payload dumps.
  • Rationale:
    • Aligns with data minimization + safe logging.
    • Avoids storing secrets/tokens.
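An allowlist filter for evidence can be sketched like this. The allowed keys, the `changed_fields` shape (a list of field names), and the `MAX_CHANGED_FIELDS` cap are all hypothetical; the point is that anything not explicitly listed, such as tokens or raw payloads, never reaches evidence_jsonb.

```python
from typing import Any

# Hypothetical allowlist: only these keys may reach evidence_jsonb.
EVIDENCE_ALLOWLIST = ("baseline_hash", "changed_fields", "current_hash", "display_name")
MAX_CHANGED_FIELDS = 20  # assumed cap to keep evidence small


def build_evidence(raw: dict[str, Any]) -> dict[str, Any]:
    """Copy allowlisted keys only; everything else (tokens, raw payloads) is dropped."""
    evidence = {k: raw[k] for k in EVIDENCE_ALLOWLIST if k in raw}
    if "changed_fields" in evidence:
        # Assumes changed_fields is a list of field names; store them sorted and capped.
        names = sorted(str(f) for f in evidence["changed_fields"])
        evidence["changed_fields"] = names[:MAX_CHANGED_FIELDS]
    return evidence
```

An allowlist fails closed: a new sensitive field added upstream is dropped by default rather than leaked.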

7) Name resolution and Graph safety

  • Decision: UI resolves human-readable labels using DB-backed Inventory + Foundations (047) + Groups Cache (051). No render-time Graph calls.
  • Rationale:
    • Works offline and when Graph tokens are broken.
    • Keeps UI safe and predictable.
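Label resolution from local data only can be sketched as a chain of lookups with a placeholder fallback. The function name, the lookup order, and the placeholder text are assumptions; the plain mappings stand in for DB-backed Inventory, Foundations (047), and Groups Cache (051) queries.

```python
from collections.abc import Mapping


def resolve_label(external_id: str,
                  inventory: Mapping[str, str],
                  foundations: Mapping[str, str],
                  groups_cache: Mapping[str, str]) -> str:
    """Resolve a display label from local lookups only; never calls Graph."""
    # Assumed lookup order: Inventory, then Foundations (047), then Groups Cache (051).
    for source in (inventory, foundations, groups_cache):
        label = source.get(external_id)
        if label:
            return label
    # Unknown id: render a stable placeholder instead of fetching at render time.
    return f"Unknown ({external_id})"
```

Since every branch is a local read, rendering cannot fail or stall on an expired token.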

Notes / Follow-ups for Phase 1

  • Define the findings table indexes carefully for tenant-scoped filtering (status, type, scope_key, run_ids).
  • Consider using existing observable run patterns (BulkOperationRun + AuditLogger) for drift generation jobs.