TenantAtlas/specs/116-baseline-drift-engine/data-model.md

# Phase 1 — Data Model (Baseline Drift Engine)

This document identifies the entities involved in Spec 116 and the minimal schema and config changes needed to implement it in this repository.

## Existing Entities (Confirmed)

### BaselineProfile
Represents a baseline definition.

- Fields (confirmed in migrations): `id`, `workspace_id`, `name`, `description`, `version_label`, `status`, `scope_jsonb` (jsonb), `active_snapshot_id`, `created_by_user_id`, timestamps
- Relationships: has many snapshots; assigned to tenants via `BaselineTenantAssignment`

### BaselineSnapshot
Immutable capture of baseline state at a point in time.

- Fields (confirmed in migrations): `id`, `workspace_id`, `baseline_profile_id`, `snapshot_identity_hash`, `captured_at`, `summary_jsonb` (jsonb), timestamps
- Relationships: has many items; belongs to baseline profile

### BaselineSnapshotItem
One item in a baseline snapshot.

- Fields (confirmed in migrations):
  - `id`, `baseline_snapshot_id`
  - `subject_type`
  - `subject_external_id`
  - `policy_type`
  - `baseline_hash` (string)
  - `meta_jsonb` (jsonb)
  - timestamps

### Finding
Generic drift finding storage.

- Fields (confirmed by usage): `tenant_id`, `fingerprint` (unique per `tenant_id`), `recurrence_key` (nullable), `scope_key`, lifecycle fields (`first_seen_at`, `last_seen_at`, `times_seen`), evidence (jsonb)

### OperationRun
Tracks long-running operations.

- Fields (by convention): `type`, `status/outcome`, `summary_counts` (numeric map), `context` (jsonb)

## New / Adjusted Data Requirements

### 1) Inventory sync coverage context

**Goal:** Baseline compare must know which policy types were actually processed successfully by inventory sync.

**Where:** `operation_runs.context` for the latest inventory sync run.

**Shape (proposed):**

```json
{
  "inventory": {
    "coverage": {
      "policy_types": {
        "deviceConfigurations": {"status": "succeeded", "item_count": 123},
        "compliancePolicies": {"status": "failed", "error": "..."}
      },
      "foundation_types": {
        "securityBaselines": {"status": "succeeded", "item_count": 4}
      }
    }
  }
}
```

**Notes:**
- `summary_counts` must stay a purely numeric map, so detailed per-type coverage belongs in `context`.
- For Spec 116 v1, storing `policy_types` coverage is sufficient. Recording `foundation_types` coverage at the same time keeps parity with the baseline scope schema, which distinguishes the same two groups.
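A minimal Python sketch of reading this context during compare, assuming the shape proposed above. The helper names `covered_types` and `compare_plan` are hypothetical. The skipping policy is also an assumption: one plausible reading of the goal is that compare should skip in-scope types whose latest sync did not succeed, so that missing inventory is not mistaken for drift.

```python
from typing import Any


def covered_types(run_context: dict[str, Any], group: str) -> set[str]:
    """Types in `group` ("policy_types" or "foundation_types") whose
    coverage entry reports status "succeeded"."""
    coverage = run_context.get("inventory", {}).get("coverage", {})
    entries = coverage.get(group, {})
    return {name for name, entry in entries.items() if entry.get("status") == "succeeded"}


def compare_plan(scope_types: list[str], run_context: dict[str, Any],
                 group: str) -> tuple[list[str], list[str]]:
    """Split in-scope types into (comparable, skipped); a type without a
    succeeded coverage entry is skipped instead of compared (assumed policy)."""
    covered = covered_types(run_context, group)
    comparable = [t for t in scope_types if t in covered]
    skipped = [t for t in scope_types if t not in covered]
    return comparable, skipped


# Context taken from the proposed shape above.
context = {
    "inventory": {
        "coverage": {
            "policy_types": {
                "deviceConfigurations": {"status": "succeeded", "item_count": 123},
                "compliancePolicies": {"status": "failed", "error": "..."},
            },
            "foundation_types": {
                "securityBaselines": {"status": "succeeded", "item_count": 4},
            },
        }
    }
}

print(compare_plan(["deviceConfigurations", "compliancePolicies"], context, "policy_types"))
# (['deviceConfigurations'], ['compliancePolicies'])
```

A run whose context has no `inventory.coverage` key yields an empty covered set. Under this sketch, every in-scope type is then skipped rather than reported as missing.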

### 2) Baseline scope schema

**Goal:** Support both policy and foundation scope with correct defaults.

**Current:** `policy_types` only.

**Target:**

```json
{
  "policy_types": ["deviceConfigurations", "compliancePolicies"],
  "foundation_types": ["securityBaselines"]
}
```

**Default semantics:**
- Empty `policy_types` means “all supported policy types excluding foundations”.
- Empty `foundation_types` means “none”.
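The default semantics can be sketched as a small resolver. `SUPPORTED_TYPES`, `FOUNDATION_TYPES`, and `effective_scope` are hypothetical stand-ins for wherever the repository registers supported types. The sketch also assumes that scopes stored under the current, `policy_types`-only schema simply lack the `foundation_types` key.

```python
# Hypothetical registry: stand-ins for the repository's supported-type config.
SUPPORTED_TYPES = ["deviceConfigurations", "compliancePolicies", "securityBaselines"]
FOUNDATION_TYPES = {"securityBaselines"}


def effective_scope(scope: dict) -> dict[str, list[str]]:
    """Resolve a stored scope into the concrete type lists compare will use."""
    policy_types = list(scope.get("policy_types") or [])
    foundation_types = list(scope.get("foundation_types") or [])
    if not policy_types:
        # Empty policy_types: every supported type that is not a foundation.
        policy_types = [t for t in SUPPORTED_TYPES if t not in FOUNDATION_TYPES]
    # Empty (or missing) foundation_types: no foundations are compared.
    return {"policy_types": policy_types, "foundation_types": foundation_types}


print(effective_scope({"policy_types": []}))
# {'policy_types': ['deviceConfigurations', 'compliancePolicies'], 'foundation_types': []}
```

In this sketch, a missing `foundation_types` key is treated like an empty list. Profiles saved under the current schema therefore resolve to "no foundations" without rewriting stored rows.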

### 3) Findings recurrence strategy

**Goal:** Stable identity per snapshot and per subject.

- `findings.recurrence_key`: populated for baseline compare findings.
- `findings.fingerprint`: set to the same recurrence key (to satisfy existing uniqueness constraint).

**Recurrence key inputs:**
- `tenant_id`
- `baseline_snapshot_id`
- `policy_type`
- `subject_external_id`
- `change_type`

**Grouping (scope_key):**
- Keep `findings.scope_key = baseline_profile:{baselineProfileId}` for baseline compare findings.
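A sketch of how the recurrence key, `fingerprint`, and `scope_key` fit together. Several parts are assumptions for illustration, not the repository's implementation:

- the encoding (SHA-256 over a compact JSON array of the five inputs);
- the in-memory `store` standing in for the `findings` table;
- the lifecycle update applied when a finding recurs;
- the `"modified"` change type.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime


def recurrence_key(tenant_id: int, baseline_snapshot_id: int, policy_type: str,
                   subject_external_id: str, change_type: str) -> str:
    """Hash the five recurrence-key inputs. Encoding them as a JSON array
    (assumed here) keeps the boundaries between values unambiguous."""
    payload = json.dumps(
        [tenant_id, baseline_snapshot_id, policy_type, subject_external_id, change_type],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class Finding:
    tenant_id: int
    fingerprint: str
    recurrence_key: str
    scope_key: str
    first_seen_at: datetime
    last_seen_at: datetime
    times_seen: int = 1


def upsert_baseline_finding(store: dict[tuple[int, str], Finding], tenant_id: int,
                            baseline_profile_id: int, baseline_snapshot_id: int,
                            policy_type: str, subject_external_id: str,
                            change_type: str, now: datetime) -> Finding:
    key = recurrence_key(tenant_id, baseline_snapshot_id, policy_type,
                         subject_external_id, change_type)
    # fingerprint mirrors recurrence_key, so the (tenant_id, fingerprint)
    # uniqueness constraint maps one row to one recurrence key.
    existing = store.get((tenant_id, key))
    if existing is not None:
        existing.last_seen_at = now  # recurring finding: bump lifecycle fields
        existing.times_seen += 1
        return existing
    finding = Finding(tenant_id=tenant_id, fingerprint=key, recurrence_key=key,
                      scope_key=f"baseline_profile:{baseline_profile_id}",
                      first_seen_at=now, last_seen_at=now)
    store[(tenant_id, key)] = finding
    return finding


# Hypothetical IDs: tenant 1, baseline profile 7, snapshot 42.
store: dict[tuple[int, str], Finding] = {}
first = upsert_baseline_finding(store, 1, 7, 42, "compliancePolicies", "abc-123",
                                "modified", datetime(2024, 1, 1))
again = upsert_baseline_finding(store, 1, 7, 42, "compliancePolicies", "abc-123",
                                "modified", datetime(2024, 1, 2))
```

Because `baseline_snapshot_id` is part of the key, comparing against a newer snapshot produces new findings instead of extending old ones. `scope_key` still groups both under the same profile.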

### 4) Inventory meta contract

**Goal:** Explicitly define what is hashed for v1 comparisons.

- Implemented as a dedicated builder class (no schema change required).
- Used by baseline capture to compute `baseline_hash` and by compare to compute `current_hash`.
- Persist the exact contract payload used for hashing to `baseline_snapshot_items.meta_jsonb.meta_contract` (versioned) for auditability/reproducibility.
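A minimal sketch of the capture/compare symmetry, assuming SHA-256 over canonical JSON (sorted keys, compact separators). The contract fields (`version`, `policy_type`, `subject_external_id`, `settings`), `META_CONTRACT_VERSION`, and the example values are hypothetical. The real field list is whatever the dedicated builder class defines.

```python
import hashlib
import json
from typing import Any

# Hypothetical: the real contract version and field list belong to the builder class.
META_CONTRACT_VERSION = 1


def build_meta_contract(policy_type: str, subject_external_id: str,
                        settings: dict[str, Any]) -> dict[str, Any]:
    """Assemble the exact, versioned payload that gets hashed."""
    return {
        "version": META_CONTRACT_VERSION,
        "policy_type": policy_type,
        "subject_external_id": subject_external_id,
        "settings": settings,
    }


def contract_hash(contract: dict[str, Any]) -> str:
    """SHA-256 over canonical JSON (keys sorted at every level, no whitespace),
    so key order in the source data cannot change the hash."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def capture_item(policy_type: str, subject_external_id: str,
                 settings: dict[str, Any]) -> dict[str, Any]:
    """Baseline capture: compute baseline_hash and persist the contract next to it."""
    contract = build_meta_contract(policy_type, subject_external_id, settings)
    return {
        "policy_type": policy_type,
        "subject_external_id": subject_external_id,
        "baseline_hash": contract_hash(contract),
        "meta_jsonb": {"meta_contract": contract},
    }


item = capture_item("compliancePolicies", "abc-123", {"passwordRequired": True, "minLength": 8})
# Compare: rebuild the contract from current inventory and hash it the same way.
current_hash = contract_hash(
    build_meta_contract("compliancePolicies", "abc-123", {"minLength": 8, "passwordRequired": True})
)
drifted = current_hash != item["baseline_hash"]  # False: same content, different key order
```

The stored `meta_contract` is the exact hashed payload, so re-hashing it must reproduce `baseline_hash`. That property is what makes the snapshot auditable.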

## Potential Migrations (Likely)

- `baseline_profiles.scope_jsonb` is already jsonb (confirmed above), so adding `foundation_types` needs code support rather than a structural change; a DB migration is optional.
- Storing coverage in `operation_runs.context` is sufficient for v1. Do not add tables for coverage unless persistence beyond the run context proves necessary.

## Index / Performance Notes

- Findings queries commonly filter by `tenant_id` + `scope_key`; ensure there is an index on `(tenant_id, scope_key)`.
- Baseline snapshot items are loaded by `(baseline_snapshot_id, policy_type)`; ensure there is a composite index on those columns.