# Data Model: Spec 117 Baseline Drift Engine

This document describes the data shapes required to implement deep settings drift via a provider chain and to satisfy provenance requirements (baseline + current).

## Entities (existing)

### `baseline_snapshots`

- Purpose: immutable reference snapshot for a baseline capture.
- Key fields (known from repo):
  - `id`
  - `captured_at` (timestamp; the “since” reference)
  - `baseline_profile_id` (profile reference)

### `baseline_snapshot_items`

- Purpose: per-subject snapshot item, stored without tenant identifiers.
- Fields (known from repo):
  - `baseline_snapshot_id`
  - `subject_type`
  - `subject_id`
  - `baseline_hash` (currently meta contract hash)
  - `meta_jsonb` (currently holds provenance-like info)

### `operation_runs`

- Purpose: operational lifecycle for queued capture/compare.
- Contract: summary counts are numeric-only and key-whitelisted; extended detail goes in `context`.

### `findings`

- Purpose: drift findings produced by compare.
- Current: uses `evidence_jsonb` for drift evidence shape.

## Proposed changes

### 1) Findings: add `evidence_fidelity`

**Add column**: `findings.evidence_fidelity` (string)

- Allowed values: `content`, `meta`
- Index: `index_findings_evidence_fidelity` (and/or composite with tenant/status if common)

**Why**: supports fast filtering and stable semantics, while provenance remains in JSON.

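The allowed-values constraint on `evidence_fidelity` could also be checked in application code before a finding is written. The following is a minimal Python sketch under stated assumptions: the names `ALLOWED_EVIDENCE_FIDELITY` and `validate_evidence_fidelity` are hypothetical, Python is used only for illustration, and the project may instead enforce the rule with a database constraint or its own validation layer.

```python
# Hypothetical helper: the spec only fixes the two allowed values.
ALLOWED_EVIDENCE_FIDELITY = frozenset({"content", "meta"})


def validate_evidence_fidelity(value: str) -> str:
    """Return the value unchanged if it is allowed; raise ValueError otherwise."""
    if value not in ALLOWED_EVIDENCE_FIDELITY:
        allowed = ", ".join(sorted(ALLOWED_EVIDENCE_FIDELITY))
        raise ValueError(
            f"evidence_fidelity must be one of: {allowed}; got {value!r}"
        )
    return value
```

Rejecting unknown values at write time keeps the indexed column consistent with the `fidelity` values stored in the evidence JSON.
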
### 2) Evidence JSON shape: include provenance for both sides

Store under `findings.evidence_jsonb` (existing column) with a stable top-level shape:

```json
{
  "change_type": "created|updated|deleted|unchanged",
  "baseline": {
    "hash": "...",
    "provenance": {
      "fidelity": "content|meta",
      "source": "policy_version|inventory",
      "observed_at": "2026-03-02T10:11:12Z",
      "observed_operation_run_id": "uuid-or-int-or-null"
    }
  },
  "current": {
    "hash": "...",
    "provenance": {
      "fidelity": "content|meta",
      "source": "policy_version|inventory",
      "observed_at": "2026-03-02T10:11:12Z",
      "observed_operation_run_id": "uuid-or-int-or-null"
    }
  }
}
```

Notes:

- `source` is intentionally constrained to the two v1.5 sources.
- `observed_operation_run_id` is optional; include when available for traceability.

### 3) Baseline snapshot item provenance

Baseline capture should persist provenance for the baseline-side evidence:

- Continue storing `baseline_hash` on `baseline_snapshot_items`.
- Store baseline-side provenance in `baseline_snapshot_items.meta_jsonb` (existing) in a stable structure:

```json
{
  "evidence": {
    "fidelity": "content|meta",
    "source": "policy_version|inventory",
    "observed_at": "...",
    "observed_operation_run_id": "..."
  }
}
```

Notes:

- This does not add columns to snapshot items (keeps schema minimal).
- Snapshot items remain tenant-identifier-free.

### 4) Operation run context for compare coverage

Store compare coverage and evidence gaps in `operation_runs.context`:

```json
{
  "baseline_compare": {
    "since": "...baseline captured_at...",
    "coverage": {
      "subjects_total": 500,
      "resolved_total": 480,
      "resolved_content": 120,
      "resolved_meta": 360
    },
    "evidence_gaps": {
      "missing_baseline": 0,
      "missing_current": 20,
      "missing_both": 0
    }
  }
}
```

Notes:

- Keep this out of `summary_counts` due to key restrictions.

## Validation rules

- `evidence_fidelity` must be either `content` or `meta`.
- Findings must include both `baseline.provenance` and `current.provenance`.
- When no evidence exists for a subject (per spec), record an evidence gap in the run context and do not create a finding.

## Migration strategy

- Add a single migration that adds `evidence_fidelity` to `findings` and backfills existing rows to `meta`.
- Keep backward compatibility for older findings by defaulting missing JSON paths to `meta`/`inventory` at render time (until backfill completes).
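
The render-time fallback described in the last bullet might look roughly like the sketch below. It is an illustration under stated assumptions, not the project's implementation: the function name `with_provenance_defaults` is hypothetical, Python is used only for illustration, and only `fidelity` and `source` receive defaults (`observed_at` and `observed_operation_run_id` are left as found).

```python
from copy import deepcopy
from typing import Any, Dict, Optional

# Defaults named in the migration strategy: fidelity -> meta, source -> inventory.
DEFAULT_PROVENANCE = {"fidelity": "meta", "source": "inventory"}


def with_provenance_defaults(evidence: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of the evidence with missing provenance paths defaulted."""
    # Work on a copy so the stored evidence is never mutated at render time.
    result: Dict[str, Any] = deepcopy(evidence) if evidence else {}
    for side in ("baseline", "current"):
        side_data = result.get(side)
        if not isinstance(side_data, dict):
            side_data = {}
        provenance = side_data.get("provenance")
        if not isinstance(provenance, dict):
            provenance = {}
        # Fill only keys that are missing or null; existing values win.
        for key, default in DEFAULT_PROVENANCE.items():
            if provenance.get(key) is None:
                provenance[key] = default
        side_data["provenance"] = provenance
        result[side] = side_data
    return result
```

Applying the defaults on a copy at read time means older rows can be rendered consistently without rewriting `evidence_jsonb` before the backfill has run.
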