TenantAtlas/specs/117-baseline-drift-engine/data-model.md

# Data Model — Spec 117 Baseline Drift Engine

This document describes the data shapes required to implement deep settings drift detection via a provider chain and to record provenance for both sides of each comparison (baseline and current).

## Entities (existing)

### `baseline_snapshots`

- Purpose: immutable reference snapshot for a baseline capture.
- Key fields (known from repo):
  - `id`
  - `captured_at` (timestamp; the “since” reference)
  - `baseline_profile_id` (profile reference)

### `baseline_snapshot_items`

- Purpose: per-subject snapshot item, stored without tenant identifiers.
- Fields (known from repo):
  - `baseline_snapshot_id`
  - `subject_type`
  - `subject_id`
  - `baseline_hash` (currently the meta contract hash)
  - `meta_jsonb` (currently holds provenance-like information)

### `operation_runs`

- Purpose: operational lifecycle for queued capture/compare.
- Contract: summary counts are numeric-only and key-whitelisted; extended detail goes in `context`.

### `findings`

- Purpose: drift findings produced by compare.
- Current state: stores the drift evidence shape in `evidence_jsonb`.

## Proposed changes

### 1) Findings: add `evidence_fidelity`

**Add column**: `findings.evidence_fidelity` (string)
- Allowed values: `content`, `meta`
- Index: `index_findings_evidence_fidelity`, or a composite index with the tenant/status columns if queries commonly filter on them together

**Why**: a dedicated column supports fast filtering and stable semantics, while detailed provenance remains in `evidence_jsonb`.

### 2) Evidence JSON shape: include provenance for both sides

Store under `findings.evidence_jsonb` (existing column) with a stable top-level shape:

```json
{
  "change_type": "created|updated|deleted|unchanged",
  "baseline": {
    "hash": "...",
    "provenance": {
      "fidelity": "content|meta",
      "source": "policy_version|inventory",
      "observed_at": "2026-03-02T10:11:12Z",
      "observed_operation_run_id": "uuid-or-int-or-null"
    }
  },
  "current": {
    "hash": "...",
    "provenance": {
      "fidelity": "content|meta",
      "source": "policy_version|inventory",
      "observed_at": "2026-03-02T10:11:12Z",
      "observed_operation_run_id": "uuid-or-int-or-null"
    }
  }
}
```

Notes:
- `source` is intentionally constrained to the two v1.5 sources: `policy_version` and `inventory`.
- `observed_operation_run_id` is optional; include it when available for traceability, otherwise store `null`.
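The shape above can be sketched as a pair of builder functions. This is a minimal illustration, not the project's implementation: the function names `make_provenance` and `make_evidence` are hypothetical, and the sketch assumes `observed_at` is serialized as a UTC ISO 8601 string like the one in the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Union

# Allowed values, taken from the JSON shape in this document.
ALLOWED_FIDELITY = {"content", "meta"}
ALLOWED_SOURCES = {"policy_version", "inventory"}
ALLOWED_CHANGE_TYPES = {"created", "updated", "deleted", "unchanged"}


def make_provenance(
    fidelity: str,
    source: str,
    observed_at: datetime,
    observed_operation_run_id: Optional[Union[str, int]] = None,
) -> Dict[str, Any]:
    """Build one provenance object (hypothetical helper)."""
    if fidelity not in ALLOWED_FIDELITY:
        raise ValueError(f"unknown fidelity: {fidelity!r}")
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if observed_at.tzinfo is None:
        raise ValueError("observed_at must be timezone-aware")
    return {
        "fidelity": fidelity,
        "source": source,
        # Assumption: timestamps are normalized to UTC with a trailing "Z".
        "observed_at": observed_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "observed_operation_run_id": observed_operation_run_id,
    }


def make_evidence(
    change_type: str,
    baseline_hash: str,
    baseline_provenance: Dict[str, Any],
    current_hash: str,
    current_provenance: Dict[str, Any],
) -> Dict[str, Any]:
    """Assemble the top-level evidence payload with provenance on both sides."""
    if change_type not in ALLOWED_CHANGE_TYPES:
        raise ValueError(f"unknown change_type: {change_type!r}")
    return {
        "change_type": change_type,
        "baseline": {"hash": baseline_hash, "provenance": baseline_provenance},
        "current": {"hash": current_hash, "provenance": current_provenance},
    }
```

Rejecting unknown `fidelity` and `source` values at construction time keeps malformed evidence out of `evidence_jsonb`, so readers do not have to defend against it later.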

### 3) Baseline snapshot item provenance

Baseline capture should persist provenance for the baseline-side evidence:

- Continue storing `baseline_hash` on `baseline_snapshot_items`.
- Store baseline-side provenance in `baseline_snapshot_items.meta_jsonb` (existing) in a stable structure:

```json
{
  "evidence": {
    "fidelity": "content|meta",
    "source": "policy_version|inventory",
    "observed_at": "...",
    "observed_operation_run_id": "..."
  }
}
```

Notes:
- This adds no columns to `baseline_snapshot_items`, which keeps the schema minimal.
- Snapshot items remain tenant-identifier-free.
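Because `meta_jsonb` already holds other data, a capture step would presumably need to add the `evidence` key without disturbing existing keys. The following sketch shows one way to do that; `with_baseline_evidence` is a hypothetical name, and the existing keys in the usage example are placeholders rather than real fields.

```python
import copy
from typing import Any, Dict, Optional, Union


def with_baseline_evidence(
    meta: Optional[Dict[str, Any]],
    fidelity: str,
    source: str,
    observed_at: str,
    observed_operation_run_id: Optional[Union[str, int]] = None,
) -> Dict[str, Any]:
    """Return a copy of meta_jsonb with the `evidence` key set (hypothetical helper).

    The input is copied rather than mutated so the caller's dict stays unchanged.
    """
    updated = copy.deepcopy(meta) if meta else {}
    updated["evidence"] = {
        "fidelity": fidelity,
        "source": source,
        "observed_at": observed_at,
        "observed_operation_run_id": observed_operation_run_id,
    }
    return updated


# Usage: "display_name" is a placeholder key, not a documented field.
existing = {"display_name": "Example policy"}
merged = with_baseline_evidence(existing, "meta", "inventory", "2026-03-02T10:11:12Z")
```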

### 4) Operation run context for compare coverage

Store compare coverage and evidence gaps in `operation_runs.context`:

```json
{
  "baseline_compare": {
    "since": "...baseline captured_at...",
    "coverage": {
      "subjects_total": 500,
      "resolved_total": 480,
      "resolved_content": 120,
      "resolved_meta": 360
    },
    "evidence_gaps": {
      "missing_baseline": 0,
      "missing_current": 20,
      "missing_both": 0
    }
  }
}
```

Notes:
- Keep this out of `summary_counts`, which is numeric-only and key-whitelisted; nested detail like this belongs in `context`.
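The counters above could be derived from per-subject resolution results, as in the sketch below. Several things here are assumptions, not part of the spec: the function name `summarize_compare`, the input representation (one `(baseline_fidelity, current_fidelity)` pair per subject, with `None` meaning no evidence was resolved for that side), and the rule that a subject counts as `resolved_content` only when both sides are `content`.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# "content", "meta", or None when no evidence was resolved for that side.
Fidelity = Optional[str]


def summarize_compare(
    since: str, subjects: Iterable[Tuple[Fidelity, Fidelity]]
) -> Dict[str, Any]:
    """Build the `baseline_compare` context block (hypothetical helper)."""
    coverage = {
        "subjects_total": 0,
        "resolved_total": 0,
        "resolved_content": 0,
        "resolved_meta": 0,
    }
    gaps = {"missing_baseline": 0, "missing_current": 0, "missing_both": 0}

    for baseline, current in subjects:
        coverage["subjects_total"] += 1
        if baseline is None and current is None:
            gaps["missing_both"] += 1
        elif baseline is None:
            gaps["missing_baseline"] += 1
        elif current is None:
            gaps["missing_current"] += 1
        else:
            coverage["resolved_total"] += 1
            # Assumption: a comparison is only as strong as its weaker side.
            if baseline == "content" and current == "content":
                coverage["resolved_content"] += 1
            else:
                coverage["resolved_meta"] += 1

    return {
        "baseline_compare": {
            "since": since,
            "coverage": coverage,
            "evidence_gaps": gaps,
        }
    }
```

Under these assumptions, every subject lands in exactly one bucket, so `resolved_total` plus the three gap counters always equals `subjects_total`, as in the example figures (480 + 20 = 500).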

## Validation rules

- `evidence_fidelity` must be either `content` or `meta`.
- Findings must include both `baseline.provenance` and `current.provenance`.
- When no evidence exists for a subject (per spec), record an evidence gap in the run context and do not create a finding.
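The first two rules can be sketched as a small validator that returns a list of error messages. The function name `validate_finding` and the message wording are hypothetical; the sketch checks only that each provenance object is present, not its inner fields.

```python
from typing import Any, Dict, List

ALLOWED_FIDELITY = ("content", "meta")


def validate_finding(evidence_fidelity: str, evidence: Dict[str, Any]) -> List[str]:
    """Return validation errors for a finding; an empty list means valid."""
    errors: List[str] = []

    if evidence_fidelity not in ALLOWED_FIDELITY:
        errors.append("evidence_fidelity must be 'content' or 'meta'")

    # Both sides must carry a provenance object.
    for side in ("baseline", "current"):
        side_data = evidence.get(side)
        provenance = side_data.get("provenance") if isinstance(side_data, dict) else None
        if not isinstance(provenance, dict):
            errors.append(f"{side}.provenance is required")

    return errors
```

Collecting all errors instead of raising on the first one makes it easier to report every problem with a finding in a single pass.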

## Migration strategy

- Add a single migration that adds `evidence_fidelity` to `findings` and backfills existing rows to `meta`.
- Keep backward compatibility for older findings by defaulting missing JSON paths at render time (fidelity to `meta`, source to `inventory`) until the backfill completes.
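The render-time fallback might look like the sketch below. `provenance_for_render` is a hypothetical name, and leaving `observed_at` and `observed_operation_run_id` as `None` when absent is an assumption, since the document defines defaults only for fidelity and source.

```python
from typing import Any, Dict, Optional


def provenance_for_render(evidence: Optional[Dict[str, Any]], side: str) -> Dict[str, Any]:
    """Read one side's provenance, defaulting missing paths for older findings."""
    side_data = (evidence or {}).get(side) or {}
    provenance = side_data.get("provenance") or {}
    return {
        # Defaults named in the migration strategy: meta / inventory.
        "fidelity": provenance.get("fidelity") or "meta",
        "source": provenance.get("source") or "inventory",
        # Assumption: no default is defined for these, so they stay None.
        "observed_at": provenance.get("observed_at"),
        "observed_operation_run_id": provenance.get("observed_operation_run_id"),
    }


# Usage: an older finding without any provenance still renders.
legacy = {"change_type": "updated", "baseline": {"hash": "abc"}}
baseline_view = provenance_for_render(legacy, "baseline")
current_view = provenance_for_render(legacy, "current")
```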