# Phase 1 — Data Model (Baseline Drift Engine)

This document identifies the data/entities involved in Spec 116 and the minimal schema/config changes needed to implement it in this repository.

## Existing Entities (Confirmed)

### BaselineProfile

Represents a baseline definition.

- Fields (confirmed in migrations): `id`, `workspace_id`, `name`, `description`, `version_label`, `status`, `scope_jsonb` (jsonb), `active_snapshot_id`, `created_by_user_id`, timestamps
- Relationships: has many snapshots; assigned to tenants via `BaselineTenantAssignment`

### BaselineSnapshot

Immutable capture of baseline state at a point in time.

- Fields (confirmed in migrations): `id`, `workspace_id`, `baseline_profile_id`, `snapshot_identity_hash`, `captured_at`, `summary_jsonb` (jsonb), timestamps
- Relationships: has many items; belongs to baseline profile

### BaselineSnapshotItem

One item in a baseline snapshot.

- Fields (confirmed in migrations):
  - `id`, `baseline_snapshot_id`
  - `subject_type`
  - `subject_external_id`
  - `policy_type`
  - `baseline_hash` (string)
  - `meta_jsonb` (jsonb)
  - timestamps

### Finding

Generic drift finding storage.

- Fields (confirmed by usage): `tenant_id`, `fingerprint` (unique with tenant), `recurrence_key` (nullable), `scope_key`, lifecycle fields (`first_seen_at`, `last_seen_at`, `times_seen`), evidence (jsonb)

### OperationRun

Tracks long-running operations.

- Fields (by convention): `type`, `status/outcome`, `summary_counts` (numeric map), `context` (jsonb)

## New / Adjusted Data Requirements

### 1) Inventory sync coverage context

**Goal:** Baseline compare must know which policy types were actually processed successfully by inventory sync.

**Where:** `operation_runs.context` for the latest inventory sync run.


**Shape (proposed):**

```json
{
  "inventory": {
    "coverage": {
      "policy_types": {
        "deviceConfigurations": {"status": "succeeded", "item_count": 123},
        "compliancePolicies": {"status": "failed", "error": "..."}
      },
      "foundation_types": {
        "securityBaselines": {"status": "succeeded", "item_count": 4}
      }
    }
  }
}
```

**Notes:**

- Only `summary_counts` must remain numeric; detailed coverage lists live in `context`.
- For Spec 116 v1, it’s sufficient to store `policy_types` coverage; adding `foundation_types` coverage at the same time keeps parity with scope rules.

### 2) Baseline scope schema

**Goal:** Support both policy and foundation scope with correct defaults.

**Current:** `policy_types` only.

**Target:**

```json
{
  "policy_types": ["deviceConfigurations", "compliancePolicies"],
  "foundation_types": ["securityBaselines"]
}
```

**Default semantics:**

- Empty `policy_types` means “all supported policy types excluding foundations”.
- Empty `foundation_types` means “none”.

### 3) Findings recurrence strategy

**Goal:** Stable identity per snapshot and per subject.

- `findings.recurrence_key`: populated for baseline compare findings.
- `findings.fingerprint`: set to the same recurrence key (to satisfy existing uniqueness constraint).

**Recurrence key inputs:**

- `tenant_id`
- `baseline_snapshot_id`
- `policy_type`
- `subject_external_id`
- `change_type`

**Grouping (scope_key):**

- Keep `findings.scope_key = baseline_profile:{baselineProfileId}` for baseline compare findings.

### 4) Inventory meta contract

**Goal:** Explicitly define what is hashed for v1 comparisons.

- Implemented as a dedicated builder class (no schema change required).
- Used by baseline capture to compute `baseline_hash` and by compare to compute `current_hash`.
- Persist the exact contract payload used for hashing to `baseline_snapshot_items.meta_jsonb.meta_contract` (versioned) for auditability/reproducibility.

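
The contract-then-hash flow in section 4 could look roughly like the sketch below. It is an illustration only, not the repository's builder: the function names `build_meta_contract` and `contract_hash`, the contract fields, the version value, and the choice of SHA-256 over canonical JSON are all assumptions. The spec itself only states that a dedicated builder produces a versioned payload that is hashed at capture and at compare time.

```python
import hashlib
import json

# Hypothetical version marker; the spec only says the payload is versioned.
META_CONTRACT_VERSION = 1


def build_meta_contract(policy_type, subject_external_id, meta):
    """Return the exact payload that gets hashed (field names are assumptions)."""
    return {
        "version": META_CONTRACT_VERSION,
        "policy_type": policy_type,
        "subject_external_id": subject_external_id,
        "meta": meta,
    }


def contract_hash(contract):
    """Hash a canonical JSON encoding so that key order cannot change the result."""
    canonical = json.dumps(
        contract, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Capture: compute baseline_hash and keep the payload for meta_jsonb.meta_contract.
baseline_contract = build_meta_contract(
    "deviceConfigurations", "abc-123", {"display_name": "Wi-Fi", "platform": "windows10"}
)
baseline_hash = contract_hash(baseline_contract)
snapshot_item_meta = {"meta_contract": baseline_contract}

# Compare: the same builder over current inventory data (keys in a different order).
current_contract = build_meta_contract(
    "deviceConfigurations", "abc-123", {"platform": "windows10", "display_name": "Wi-Fi"}
)
current_hash = contract_hash(current_contract)
drifted = current_hash != baseline_hash
```

Because both sides go through the same builder and the same canonical encoding, a hash mismatch reflects a change in contract content rather than incidental differences such as key ordering. Persisting the payload next to the hash lets a later audit recompute the hash from stored data.
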

## Potential Migrations (Likely)

- If `baseline_profiles.scope_jsonb` is not jsonb or does not include foundation types: migrate to adjust the structure. If the column is already jsonb, its type stays the same and foundation support is added in code, so a DB change may be optional.
- If coverage context needs persistence beyond the operation run context: avoid adding tables unless proven necessary; context-based storage is sufficient for v1.

## Index / Performance Notes

- Findings queries commonly filter by `tenant_id` + `scope_key`; ensure there is an index on `(tenant_id, scope_key)`.
- Baseline snapshot items must load efficiently by `(baseline_snapshot_id, policy_type)`.
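
To make the findings fields used in these queries concrete, here is a minimal sketch of how the identity values from the recurrence strategy might be derived. The function names, the SHA-256 digest, the unit-separator join, and the `"changed"` value for `change_type` are assumptions for illustration; the spec only fixes the five recurrence key inputs, the rule that `fingerprint` equals the recurrence key, and the `scope_key` format.

```python
import hashlib


def recurrence_key(tenant_id, baseline_snapshot_id, policy_type,
                   subject_external_id, change_type):
    """Derive a stable key from the five inputs listed in the recurrence strategy."""
    parts = [
        str(tenant_id),
        str(baseline_snapshot_id),
        policy_type,
        subject_external_id,
        change_type,
    ]
    # Join with a control character unlikely to appear in any part, so that
    # ("ab", "c") and ("a", "bc") cannot produce the same input string.
    raw = "\x1f".join(parts)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def finding_identity(tenant_id, baseline_profile_id, baseline_snapshot_id,
                     policy_type, subject_external_id, change_type):
    """Build the identity columns for one baseline compare finding."""
    key = recurrence_key(
        tenant_id, baseline_snapshot_id, policy_type, subject_external_id, change_type
    )
    return {
        "tenant_id": tenant_id,
        "recurrence_key": key,
        "fingerprint": key,  # same value, to satisfy the existing uniqueness constraint
        "scope_key": f"baseline_profile:{baseline_profile_id}",
    }


# "changed" is a hypothetical change_type value.
first = finding_identity(42, 7, 1001, "deviceConfigurations", "abc-123", "changed")
again = finding_identity(42, 7, 1001, "deviceConfigurations", "abc-123", "changed")
other_snapshot = finding_identity(42, 7, 1002, "deviceConfigurations", "abc-123", "changed")
```

Under these assumptions, repeated compares against the same snapshot yield the same key (so lifecycle fields such as `times_seen` can be updated on the existing row), while a new snapshot yields a new key but keeps the same `scope_key` grouping.
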