TenantAtlas/specs/119-baseline-drift-engine/research.md

# Research — Drift Golden Master Cutover (Spec 119)

This document resolves planning unknowns and records implementation decisions for making Baseline Compare the single source of truth for drift findings while preserving the existing diff UI.

## Decisions

### 1) Golden-master drift source
- Decision: All drift findings generated by Baseline Compare will use `findings.source = baseline.compare`.
- Rationale: This is the single “origin label” used across the spec and is already set in `app/Jobs/CompareBaselineToTenantJob.php` when upserting findings.
- Alternatives considered:
  - Keep `source` nullable / optional → rejected because it enables mixed states and breaks the single-source contract.

### 2) Drift navigation entry point (post-cutover)
- Decision: The Drift navigation entry point becomes the Baseline Compare landing page (`/admin/t/{tenant}/baseline-compare-landing`).
- Rationale: This preserves a single operational entry point for drift generation and reduces duplicated UI “landing” surfaces.
- Alternatives considered:
  - Keep a separate Drift landing page and repurpose it → rejected (extra surface to maintain and re-explain).

### 3) Evidence contract for diff UX compatibility
- Decision: Baseline Compare drift findings will write `evidence_jsonb` keys required by the existing diff renderer:
  - `summary.kind` with allowed values: `policy_snapshot`, `policy_assignments`, `policy_scope_tags`
  - `baseline.policy_version_id` and `current.policy_version_id` when content evidence exists
  - An explicit fidelity label and explicit compare provenance (baseline profile/snapshot, compare run id, and inventory sync run id when available)
- Rationale: `app/Filament/Resources/FindingResource.php` uses `summary.kind` to decide which diff UI to render and reads the policy version IDs from `baseline.policy_version_id` and `current.policy_version_id`.
- Alternatives considered:
  - Introduce a new diff UI for Baseline Compare evidence → rejected (scope; requires new UI + new contract).
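
The evidence contract above can be sketched as a small payload builder. This is an illustrative Python sketch, not the project's PHP implementation. Only `summary.kind`, `baseline.policy_version_id`, and `current.policy_version_id` come from this document; the function name and the `fidelity` and `provenance` key names are assumptions.

```python
from typing import Any, Optional

# Allowed values for summary.kind, as listed in the decision above.
ALLOWED_KINDS = {"policy_snapshot", "policy_assignments", "policy_scope_tags"}


def build_evidence(
    kind: str,
    baseline_version_id: Optional[int],
    current_version_id: Optional[int],
    fidelity: str,
    provenance: dict[str, Any],
) -> dict[str, Any]:
    """Assemble an evidence payload with the keys the diff renderer reads."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported summary.kind: {kind!r}")
    return {
        "summary": {"kind": kind},
        "baseline": {"policy_version_id": baseline_version_id},
        "current": {"policy_version_id": current_version_id},
        # Hypothetical key names for fidelity labeling and compare provenance.
        "fidelity": fidelity,
        "provenance": provenance,
    }
```

Rejecting unknown `summary.kind` values at write time keeps the renderer from receiving a kind it cannot map to a diff view.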

### 4) Diff renderability rule (avoid misleading empty diffs)
- Decision: Only render a detailed two-sided diff when both `baseline.policy_version_id` and `current.policy_version_id` are present; otherwise show “diff unavailable”. Decision 8 refines this rule for policy presence changes.
- Rationale: The diff builder can otherwise compare empty/null versions and display misleading results; the spec requires an explicit “diff unavailable” explanation.
- Alternatives considered:
  - Render two-sided diffs even when one side is missing → rejected (misleading output; violates clarified rule).

### 8) One-sided diff rendering for policy presence changes
- Decision: Render diffs against an empty side for `missing_policy` (baseline-only reference) and `unexpected_policy` (current-only reference). Keep the stricter two-reference rule only for `different_version`.
- Rationale: Policy presence changes are easier to understand when operators can inspect the captured policy content that exists, instead of receiving a generic “diff unavailable” message.
- Alternatives considered:
  - Keep treating all single-reference findings as non-renderable → rejected (hides useful evidence even when one side is fully captured).
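
Taken together, the renderability rules for `missing_policy`, `unexpected_policy`, and `different_version` can be sketched as one decision function. This is an illustrative Python sketch rather than the renderer's actual code. The function name, the returned mode labels, and the handling of unrecognized change types are assumptions.

```python
from typing import Optional


def diff_mode(
    change_type: str,
    baseline_version_id: Optional[int],
    current_version_id: Optional[int],
) -> str:
    """Return "two_sided", "one_sided", or "unavailable" for a finding."""
    has_baseline = baseline_version_id is not None
    has_current = current_version_id is not None

    if change_type == "missing_policy":
        # Baseline-only reference: diff baseline content against an empty side.
        return "one_sided" if has_baseline else "unavailable"
    if change_type == "unexpected_policy":
        # Current-only reference: diff current content against an empty side.
        return "one_sided" if has_current else "unavailable"
    # different_version keeps the strict two-reference rule. Treating any
    # unrecognized change type the same way is an assumption of this sketch.
    return "two_sided" if has_baseline and has_current else "unavailable"
```

A presence finding whose single expected reference is absent still falls back to “diff unavailable”, so the one-sided path never diffs two empty sides.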

### 9) Baseline capture must ignore stale inventory rows
- Decision: When a latest completed Inventory Sync exists, Baseline Snapshot capture scopes `inventory_items` to that run before deriving `subject_key` matches.
- Rationale: Capture and Compare must agree on the same “current observed state” boundary; otherwise deleted/renamed policies from older syncs can create false `ambiguous_match` gaps and omit valid baseline subjects.
- Alternatives considered:
  - Continue scanning all tenant inventory rows during capture → rejected (nondeterministic snapshot gaps as historical rows accumulate).
  - Hard-fail capture when no completed Inventory Sync exists → deferred (larger product behavior change than this fix; current fallback remains acceptable).
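
A minimal sketch of the scoping step, in Python for illustration only. The `sync_run_id` field linking an inventory row to its sync run, both function names, and the scan-all-rows fallback when no completed sync exists are assumptions, not confirmed repo details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryItem:
    subject_key: str
    sync_run_id: int  # hypothetical column linking a row to its sync run


def scope_to_latest_sync(
    items: list[InventoryItem], latest_completed_run_id: Optional[int]
) -> list[InventoryItem]:
    """Keep only rows from the latest completed sync, if there is one."""
    if latest_completed_run_id is None:
        # Assumed fallback: no completed sync exists, so scan all rows.
        return list(items)
    return [i for i in items if i.sync_run_id == latest_completed_run_id]


def ambiguous_subject_keys(items: list[InventoryItem]) -> set[str]:
    """Subject keys that match more than one row in the given set."""
    seen: set[str] = set()
    dupes: set[str] = set()
    for item in items:
        if item.subject_key in seen:
            dupes.add(item.subject_key)
        else:
            seen.add(item.subject_key)
    return dupes
```

Running the ambiguity check on the scoped rows rather than on all tenant rows is what removes the false `ambiguous_match` gaps caused by stale entries.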

### 10) Full-content compare must reuse same-run deduplicated evidence
- Decision: When the compare-time content capture fetches current policy content successfully but reuses an older identical `policy_version` row instead of inserting a new one, the compare run will consume that returned version directly as current evidence for the run.
- Rationale: The capture step has already validated current Graph content. Re-querying only by `captured_at >= snapshot.captured_at` misclassifies these successful deduplicated captures as `missing_current`, which incorrectly downgrades fidelity and emits `evidence_capture_incomplete`.
- Alternatives considered:
  - Always insert a new `policy_version` row per compare run → rejected (breaks immutable dedupe strategy and inflates storage).
  - Keep relying only on the post-capture `since` query → rejected (produces false partial-success outcomes when content is unchanged).
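
The failure mode and the fix can be illustrated with a small in-memory model, written in Python as a sketch. `PolicyVersion`, `capture_current`, and `find_since` are hypothetical names, and the list stands in for the `policy_version` table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class PolicyVersion:
    id: int
    content_hash: str
    captured_at: datetime


def capture_current(
    existing: list[PolicyVersion], content_hash: str, now: datetime
) -> PolicyVersion:
    """Return the stored version for this content, inserting only if new."""
    for version in existing:
        if version.content_hash == content_hash:
            return version  # deduplicated: reuse the older identical row
    created = PolicyVersion(
        id=len(existing) + 1, content_hash=content_hash, captured_at=now
    )
    existing.append(created)
    return created


def find_since(
    existing: list[PolicyVersion], since: datetime
) -> Optional[PolicyVersion]:
    """The rejected lookup: only rows captured at or after `since`."""
    matches = [v for v in existing if v.captured_at >= since]
    return matches[-1] if matches else None
```

When content is unchanged, `capture_current` returns the older row while `find_since` finds nothing, which is why the compare run should consume the returned version directly.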

### 11) Landing-page duplicate warnings must use the latest sync boundary
- Decision: The Baseline Compare landing-page duplicate-name warning uses the latest completed Inventory Sync run when one exists, matching compare/capture subject selection.
- Rationale: Operators should not keep seeing a duplicate-name warning after the duplicate only survives in stale historical inventory rows; the landing page must reflect the same current boundary as the underlying compare logic.
- Alternatives considered:
  - Keep scanning all tenant inventory rows for the warning → rejected (UI keeps reporting already-resolved duplicates until stale rows are cleaned up out-of-band).

### 12) Compliance noncompliance actions belong in the policy drift signal
- Decision: `deviceCompliancePolicy.scheduledActionsForRule` participates in `policy_snapshot` drift through a canonical semantic projection of each configured action.
- Rationale: A compliance policy’s security effect depends on both the rule and its enforcement timeline/consequences. Changing `gracePeriodHours`, removing `retire`, or swapping notification templates changes governance behavior and must produce drift.
- Alternatives considered:
  - Ignore noncompliance actions entirely → rejected (false negatives on meaningful governance changes).
  - Hash the raw Graph array directly → rejected (opaque IDs and order churn would create false positives).
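
One way to build such a canonical projection is sketched below in Python. The choice of projected fields (`ruleName`, `actionType`, `gracePeriodHours`, `notificationTemplateId`) and the use of SHA-256 over sorted JSON are assumptions for illustration; the actual normalizer may keep a different field set.

```python
import hashlib
import json
from typing import Any


def project_actions(
    scheduled_actions: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Reduce raw scheduled actions to order-independent semantic fields."""
    projected = []
    for rule in scheduled_actions:
        for item in rule.get("scheduledActionConfigurations", []):
            # The per-object "id" values are deliberately not copied,
            # because they are opaque and can churn between reads.
            projected.append({
                "ruleName": rule.get("ruleName"),
                "actionType": item.get("actionType"),
                "gracePeriodHours": item.get("gracePeriodHours"),
                "notificationTemplateId": item.get("notificationTemplateId"),
            })
    # Sort by a stable serialization so array order does not affect the hash.
    return sorted(projected, key=lambda a: json.dumps(a, sort_keys=True))


def actions_hash(scheduled_actions: list[dict[str, Any]]) -> str:
    """Hash the canonical projection of the scheduled actions."""
    canonical = json.dumps(project_actions(scheduled_actions), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this projection, reordering actions or regenerating their IDs leaves the hash unchanged, while changing `gracePeriodHours`, removing `retire`, or swapping a notification template changes it.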

### 13) Expand the drift signal without forcing baseline recapture
- Decision: When baseline content provenance resolves to a tenant `policy_version`, Compare recomputes the effective baseline content hash from that immutable version instead of trusting only the stored snapshot hash.
- Rationale: Existing baseline snapshots were captured under older normalization semantics. Recomputing from the resolved baseline version keeps those snapshots comparable as the canonical drift signal expands, which avoids rollout-time false positives and avoids forcing operators to recapture unchanged baselines.
- Alternatives considered:
  - Require every tenant to recapture their baseline after signal changes → rejected (operationally brittle and easy to miss).
  - Keep comparing only the stored snapshot hash → rejected (old snapshots would flap as soon as the drift signal grows).
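
The recompute-on-compare idea can be sketched as follows, in Python for illustration. The function names, the `Normalizer` type, and the fallback to the stored hash when no baseline `policy_version` resolves are assumptions of this sketch.

```python
import hashlib
import json
from typing import Any, Callable, Optional

Normalizer = Callable[[dict[str, Any]], dict[str, Any]]


def content_hash(content: dict[str, Any], normalize: Normalizer) -> str:
    """Hash content under the given normalization."""
    canonical = json.dumps(normalize(content), sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def effective_baseline_hash(
    stored_snapshot_hash: str,
    baseline_version_content: Optional[dict[str, Any]],
    normalize: Normalizer,
) -> str:
    """Prefer a hash recomputed from the immutable baseline version."""
    if baseline_version_content is None:
        # Assumed fallback: no resolvable policy_version, use the stored hash.
        return stored_snapshot_hash
    return content_hash(baseline_version_content, normalize)
```

Because both sides are hashed with the same current normalizer, an unchanged policy compares equal even if its snapshot hash was stored under older normalization semantics.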

### 5) How policy version references are populated
- Decision:
  - Current-side `policy_version_id`: taken from content evidence (`ResolvedEvidence.meta.policy_version_id`) when content fidelity is used.
  - Baseline-side `policy_version_id`: resolved opportunistically for the same tenant policy when baseline-side evidence is content-based (e.g., via baseline-capture policy versions), otherwise set to null.
- Rationale: Baseline snapshots are workspace-owned and intentionally avoid persisting tenant-owned identifiers; the finding (tenant-owned) is the correct place to attach tenant-specific policy version references.
- Alternatives considered:
  - Persist baseline policy version IDs in baseline snapshots → rejected (violates scope/ownership model for workspace-owned snapshots).

### 6) Legacy drift findings deletion criteria
- Decision: One-time cleanup deletes drift findings where `source` is null or not equal to `baseline.compare` (scoped to `finding_type = drift`), and keeps `source = baseline.compare` rows.
- Rationale: Legacy drift generator rows often have `source = NULL`; this filter removes mixed evidence formats without risking Baseline Compare drift data.
- Alternatives considered:
  - Delete by “old evidence shape” heuristics only → rejected (brittle; source is the canonical differentiator post-cutover).
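
The deletion filter can be checked against a throwaway table. The sketch below uses Python with an in-memory SQLite database purely for illustration; the production database, the migration mechanism, and the sample rows (including the `legacy.generator` source and the `permission` finding type) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings ("
    "id INTEGER PRIMARY KEY, finding_type TEXT, source TEXT)"
)
conn.executemany(
    "INSERT INTO findings (finding_type, source) VALUES (?, ?)",
    [
        ("drift", None),                # legacy row with NULL source: deleted
        ("drift", "legacy.generator"),  # hypothetical other source: deleted
        ("drift", "baseline.compare"),  # Baseline Compare row: kept
        ("permission", None),           # hypothetical non-drift type: untouched
    ],
)

# `source <> 'baseline.compare'` alone never matches NULL rows in SQL,
# so the NULL case has to be spelled out explicitly.
deleted = conn.execute(
    "DELETE FROM findings "
    "WHERE finding_type = 'drift' "
    "AND (source IS NULL OR source <> 'baseline.compare')"
).rowcount

remaining = conn.execute(
    "SELECT finding_type, source FROM findings ORDER BY id"
).fetchall()
```

The explicit `IS NULL` branch matters because legacy rows often have `source = NULL`, and a plain inequality would silently leave them in place.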

### 7) Legacy drift generator removal scope
- Decision: Remove legacy run-to-run drift generation end-to-end:
  - `GenerateDriftFindingsJob` + generator-only services
  - Drift landing UI surface that triggers legacy drift generation
  - Operation run type catalog entries and any related UI/widget/alert producer references
  - Legacy tests that assert drift generation dispatch/notifications
- Rationale: A hard cut means no dual-write and no feature flags; leaving legacy entry points in place risks reintroducing “two truths”.
- Alternatives considered:
  - Leave legacy components present but unreachable → rejected (dead code + drift risk).

## Notes / Repo Facts Used
- Baseline Compare upserts findings and already hard-sets `source = baseline.compare` in `app/Jobs/CompareBaselineToTenantJob.php`.
- The existing diff UI in `app/Filament/Resources/FindingResource.php` reads:
  - `evidence_jsonb.summary.kind`
  - `evidence_jsonb.baseline.policy_version_id`
  - `evidence_jsonb.current.policy_version_id`
- Content evidence already carries `policy_version_id` in `ResolvedEvidence.meta` via `app/Services/Baselines/Evidence/ContentEvidenceProvider.php`.