TenantAtlas/specs/116-baseline-drift-engine/research.md

# Phase 0 — Research (Baseline Drift Engine)

This document resolves the open design/implementation questions needed to produce a concrete implementation plan for Spec 116, grounded in the current codebase.

## Repo Reality Check (What already exists)

- Baseline domain tables exist: `baseline_profiles`, `baseline_snapshots`, `baseline_snapshot_items`, `baseline_tenant_assignments`.
- Baseline ops exist:
  - Capture: `App\Services\Baselines\BaselineCaptureService` → `App\Jobs\CaptureBaselineSnapshotJob`.
  - Compare: `App\Services\Baselines\BaselineCompareService` → `App\Jobs\CompareBaselineToTenantJob`.
- Findings lifecycle primitives exist (`times_seen`, `first_seen`, `last_seen`), as does recurrence support (`findings.recurrence_key`).
- A recurrence-based drift generator already exists: `App\Services\Drift\DriftFindingGenerator`. It upserts findings by `recurrence_key` and also sets `fingerprint = recurrence_key` to satisfy the unique constraint.
- Inventory sync is OperationRun-based and stamps `inventory_items.last_seen_operation_run_id`.

## Decisions

### 1) Finding identity for baseline compare

**Decision:** Baseline compare findings MUST use a stable recurrence key derived from:
- `tenant_id`
- `baseline_snapshot_id` (not baseline profile id)
- `policy_type`
- `subject_external_id`
- `change_type`

This recurrence key is stored in `findings.recurrence_key` and ALSO used as `findings.fingerprint` (to keep the existing unique constraint `unique(tenant_id, fingerprint)` effective).
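The identity rule above can be sketched as follows. This is a minimal Python illustration, not the repository's implementation: the function names, the JSON-array encoding, the `baseline_compare` namespace prefix, and the SHA-256 digest are all assumptions, and `DriftFindingGenerator` may build its keys differently.

```python
import hashlib
import json


def baseline_compare_recurrence_key(
    tenant_id: int,
    baseline_snapshot_id: int,
    policy_type: str,
    subject_external_id: str,
    change_type: str,
) -> str:
    """Derive a stable identity for a baseline compare finding (hypothetical helper).

    Only identity fields are hashed. Baseline/current evidence hashes are
    deliberately excluded, so the identity survives evidence changes.
    """
    # Encoding the fields as a JSON array avoids separator collisions
    # (e.g. an external id that itself contains "|"). The "baseline_compare"
    # namespace prefix is a hypothetical choice.
    payload = json.dumps(
        ["baseline_compare", tenant_id, baseline_snapshot_id,
         policy_type, subject_external_id, change_type],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def finding_identity_columns(recurrence_key: str) -> dict[str, str]:
    # The same value fills both columns, so unique(tenant_id, fingerprint)
    # also deduplicates on the recurrence key.
    return {"recurrence_key": recurrence_key, "fingerprint": recurrence_key}
```

Because `baseline_snapshot_id` is part of the key, the same subject and change type against a re-captured snapshot yields a different key, while a change in evidence alone does not.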

**Rationale:**
- Matches Spec 116 (identity tied to `baseline_snapshot_id` and independent of evidence hashes).
- Aligns with existing, proven pattern in `DriftFindingGenerator` (recurrence_key-based upsert; fingerprint reused).

**Alternatives considered:**
- Keep `DriftHasher::fingerprint(...)` with baseline/current hashes included → rejected because it changes identity when evidence changes (violates FR-116v1-09).
- Add a new unique DB constraint on `(tenant_id, recurrence_key)` → possible later hardening; not required initially because fingerprint uniqueness already enforces dedupe when `fingerprint = recurrence_key`.
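Why `fingerprint = recurrence_key` is enough for deduplication can be shown with an in-memory stand-in for the `findings` table, where a dict keyed on `(tenant_id, fingerprint)` plays the role of the unique constraint. `Finding` and `upsert_finding` are hypothetical names for illustration; the real code writes through the database.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Finding:
    # Hypothetical in-memory mirror of the relevant `findings` columns.
    tenant_id: int
    fingerprint: str
    recurrence_key: str
    times_seen: int
    first_seen: datetime
    last_seen: datetime


def upsert_finding(
    store: dict[tuple[int, str], Finding],
    tenant_id: int,
    recurrence_key: str,
    seen_at: datetime,
) -> Finding:
    # fingerprint reuses the recurrence key, so uniqueness on
    # (tenant_id, fingerprint) is also uniqueness on (tenant_id, recurrence_key).
    fingerprint = recurrence_key
    key = (tenant_id, fingerprint)
    existing = store.get(key)
    if existing is None:
        finding = Finding(tenant_id, fingerprint, recurrence_key, 1, seen_at, seen_at)
        store[key] = finding
        return finding
    # Recurrence: bump lifecycle counters instead of inserting a duplicate row.
    existing.times_seen += 1
    existing.last_seen = seen_at
    return existing
```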

### 2) Scope key for baseline compare findings

**Decision:** Keep findings grouped by baseline profile using `scope_key = baseline_profile:{baselineProfileId}`.

**Rationale:**
- Spec 116 requires snapshot-scoped *identity* (via `baseline_snapshot_id` in the recurrence key), but does not require snapshot-scoped grouping.
- The repository already has UI widgets/stats and auto-close behavior keyed to `baseline_profile:{id}`; keeping scope_key stable minimizes churn and preserves existing semantics.
- Re-captures still create new finding identities because the recurrence key includes `baseline_snapshot_id`.

**Alternatives considered:**
- Snapshot-scoped `scope_key = baseline_snapshot:{id}` → rejected for v1 because it would require larger refactors to stats, widgets, and auto-close queries, without being mandated by the spec.
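The interaction between a stable `scope_key` and snapshot-scoped identity can be sketched under one loud assumption: that auto-close resolves open findings in a scope whose recurrence key was not emitted by the current compare run. That rule, and the names `OpenFinding`, `scope_key_for`, and `auto_close`, are hypothetical and only illustrate the reasoning.

```python
from dataclasses import dataclass


@dataclass
class OpenFinding:
    scope_key: str
    recurrence_key: str
    status: str = "open"


def scope_key_for(baseline_profile_id: int) -> str:
    # Grouping stays per baseline profile, independent of the snapshot.
    return f"baseline_profile:{baseline_profile_id}"


def auto_close(findings: list[OpenFinding], baseline_profile_id: int, emitted_keys: set[str]) -> int:
    """Resolve open findings in the profile scope that this run did not re-emit.

    Hypothetical auto-close rule; returns the number of findings closed.
    """
    scope = scope_key_for(baseline_profile_id)
    closed = 0
    for finding in findings:
        if finding.scope_key == scope and finding.status == "open" and finding.recurrence_key not in emitted_keys:
            finding.status = "resolved"
            closed += 1
    return closed
```

Under this assumption, after a re-capture the next compare emits keys containing the new `baseline_snapshot_id`, so findings tied to the old snapshot fall out of the emitted set and close within the same `baseline_profile:{id}` scope, without any change to existing stats or widgets.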

### 3) Coverage guard (prevent false missing policies)

**Decision:** Coverage MUST be derived from the latest completed `inventory_sync` OperationRun for the tenant:
- Record per-policy-type processing outcomes into that run’s context (coverage payload).
- Baseline compare MUST compute `uncovered_policy_types = effective_scope - covered_policy_types`.
- Baseline compare MUST emit **no findings of any kind** for uncovered policy types.
- The compare OperationRun outcome should be `partially_succeeded` when uncovered types exist ("completed with warnings" in Ops UX), and summary counts should include `errors_recorded = count(uncovered_policy_types)`.
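The guard rules above reduce to a set difference plus a filter. A minimal sketch, assuming findings are plain dicts with a `policy_type` key; `CoverageResult`, `apply_coverage_guard`, the `"succeeded"` outcome name for the fully covered case, and the `"missing"` change type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CoverageResult:
    uncovered_policy_types: list[str]
    findings: list[dict]
    outcome: str
    errors_recorded: int


def apply_coverage_guard(
    effective_scope: set[str],
    covered_policy_types: set[str],
    candidate_findings: list[dict],
) -> CoverageResult:
    # uncovered_policy_types = effective_scope - covered_policy_types
    uncovered = effective_scope - covered_policy_types
    # Emit no findings of any kind (missing, unexpected, drifted) for uncovered types.
    kept = [f for f in candidate_findings if f["policy_type"] not in uncovered]
    # "succeeded" for the fully covered case is an assumed outcome name.
    outcome = "partially_succeeded" if uncovered else "succeeded"
    return CoverageResult(sorted(uncovered), kept, outcome, len(uncovered))
```

For example, with scope `{typeA, typeB, typeC}` and coverage `{typeA, typeB}`, a "missing" finding for `typeB` (synced, but genuinely empty) is kept, while one for `typeC` (never synced) is dropped and counted in `errors_recorded`.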

**Rationale:**
- Spec FR-116v1-07 and SC-116-03.
- The current compare logic filters inventory only by `inventory_items.last_seen_operation_run_id`. Without an explicit coverage list, a policy type that was never synced looks identical to one that is genuinely empty in the tenant.

**Alternatives considered:**
- Infer coverage purely from "were there any inventory items for this policy type in the last sync run" → rejected because a legitimately empty type would be indistinguishable from "not synced".

### 4) Inventory meta contract hashing (v1 fidelity=meta)

**Decision:** Introduce an explicit "Inventory Meta Contract" builder used by BOTH capture and compare in v1.

- Inputs: `policy_type`, `external_id`, and a whitelist of stable signals from inventory/meta (etag, last modified, scope tags, assignment target count, version marker when available).
- Output: a normalized associative array, hashed deterministically.
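A meta contract builder and deterministic hash might look like this. It is a sketch under assumptions: the field names in `CONTRACT_FIELDS` are placeholders for whatever inventory sync actually writes into `meta_jsonb`, and canonical JSON plus SHA-256 is one reasonable hashing choice, not necessarily the one the codebase will use.

```python
import hashlib
import json
from typing import Any

# Hypothetical whitelist of stable signals; real key names depend on
# what inventory sync stores in meta_jsonb.
CONTRACT_FIELDS = ("etag", "last_modified", "scope_tags", "assignment_target_count", "version")


def build_meta_contract(policy_type: str, external_id: str, meta: dict[str, Any]) -> dict[str, Any]:
    contract: dict[str, Any] = {"policy_type": policy_type, "external_id": external_id}
    for field in CONTRACT_FIELDS:
        value = meta.get(field)
        if value is None:
            continue  # absent signals are omitted rather than stored as null
        if field == "scope_tags":
            value = sorted(str(tag) for tag in value)  # tag order must not affect the hash
        contract[field] = value
    # Any meta key outside the whitelist is ignored by construction.
    return contract


def hash_meta_contract(contract: dict[str, Any]) -> str:
    # sort_keys + fixed separators give one canonical serialization per contract.
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Using the same two functions in capture and compare is what keeps baseline and current hashes comparable: adding a new key to inventory metadata changes nothing until it is deliberately added to the whitelist.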

**Rationale:**
- Spec FR-116v1-04: hashing must be based on a stable contract, not arbitrary meta.
- The current `BaselineSnapshotIdentity::hashItemContent()` hashes the entire `meta_jsonb`, including potentially noisy keys such as `etag` and any keys added to inventory metadata over time.

**Alternatives considered:**
- Keep current hashing of `meta_jsonb` → rejected because it is not an explicit contract and may drift as we add inventory metadata.

### 5) Baseline scope + foundations

**Decision:** Extend baseline scope JSON to include:
- `policy_types: []` (empty means default "all supported policy types excluding foundations")
- `foundation_types: []` (empty means default "none")

The foundations list must be derived from the same canonical foundation list that the inventory sync selection logic uses.
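The default semantics for the two scope lists can be sketched as a small resolver. `BaselineScopeSketch` and `resolve_effective_scope` are hypothetical names, the type names in the example are placeholders, and restricting explicit `foundation_types` to the canonical list is an assumed validation rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineScopeSketch:
    # Empty tuple means "use the default" for each list.
    policy_types: tuple[str, ...] = ()
    foundation_types: tuple[str, ...] = ()


def resolve_effective_scope(
    scope: BaselineScopeSketch,
    supported_types: set[str],
    canonical_foundations: set[str],
) -> set[str]:
    # Default policy types: every supported type that is NOT a foundation.
    non_foundation_types = supported_types - canonical_foundations
    policy_types = set(scope.policy_types) if scope.policy_types else non_foundation_types
    # Default foundations: none. Explicit entries are restricted to the
    # canonical list (an assumed validation rule, not stated in the spec).
    foundation_types = set(scope.foundation_types) & canonical_foundations
    return policy_types | foundation_types
```

With supported types `{policyA, policyB, foundationX, foundationY}`, an empty scope resolves to `{policyA, policyB}`; foundations enter only when listed explicitly, which is the opposite of treating empty as "all".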

**Rationale:**
- Spec FR-116v1-01.
- The current `BaselineScope` supports only `policy_types` and treats an empty list as "all" (including foundations), which conflicts with the spec default.

### 6) v2 architecture strategy (content fidelity)

**Decision:** v2 is implemented as an extension of the same pipeline via a provider precedence chain:

`PolicyVersion (if available) → Inventory content (if available) → Meta contract fallback (degraded)`

The baseline compare engine stores dimension flags on the same finding (no additional finding identities).
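The precedence chain is essentially "first provider with an answer wins". A minimal sketch, assuming each provider is a callable that returns evidence or `None`; `Evidence`, `Provider`, and `resolve_evidence` are hypothetical names, and dimension flags are not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Evidence:
    source: str        # which provider produced the evidence
    content_hash: str
    degraded: bool     # expected True only for the meta contract fallback


# A provider returns evidence for (policy_type, external_id), or None if it
# has nothing for that subject.
Provider = Callable[[str, str], Optional[Evidence]]


def resolve_evidence(policy_type: str, external_id: str, providers: list[Provider]) -> Evidence:
    """Return evidence from the first provider, in precedence order, that has any."""
    for provider in providers:
        evidence = provider(policy_type, external_id)
        if evidence is not None:
            return evidence
    raise LookupError(f"no evidence for {policy_type}/{external_id}")
```

The provider list would be ordered PolicyVersion, then inventory content, then the meta contract fallback. Because the chain selects only evidence, the recurrence key (built from identity fields alone) does not depend on which provider answered, so v2 can enrich the same finding instead of creating new identities.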

**Rationale:**
- Spec FR-116v2-01 and FR-116v2-05.
- There is already a content-normalization + hashing stack in `DriftFindingGenerator` (policy snapshot / assignments / scope tags) which can inform the content fidelity provider.

## Notes / Risks

- Existing baseline compare findings are currently keyed by a `fingerprint` that includes the baseline/current hashes, and they use `scope_key = baseline_profile:{id}`. The v1 migration should plan for "old findings become stale" behavior; do not attempt a silent in-place identity rewrite without an explicit migration/backfill plan.
- Coverage data in `summary_counts` must remain numeric-only (per Ops UX). Detailed coverage lists belong in `operation_runs.context`.
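The split between numeric counters and detailed lists can be sketched as below. The `coverage` key layout inside the context and both function names are hypothetical; only `errors_recorded` comes from the coverage decision above.

```python
from numbers import Real
from typing import Any


def ensure_numeric_counts(summary_counts: dict[str, Any]) -> None:
    for key, value in summary_counts.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"summary_counts[{key!r}] must be numeric, got {type(value).__name__}")


def build_compare_run_payload(covered: set[str], uncovered: set[str]) -> tuple[dict[str, int], dict[str, Any]]:
    # Numeric counters only; these feed Ops UX summaries.
    summary_counts = {"errors_recorded": len(uncovered)}
    # Full lists live in the run context (hypothetical key layout).
    context = {
        "coverage": {
            "covered_policy_types": sorted(covered),
            "uncovered_policy_types": sorted(uncovered),
        }
    }
    ensure_numeric_counts(summary_counts)
    return summary_counts, context
```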