# Phase 1 — Data Model

This is the proposed data model for **101 — Golden Master / Baseline Governance v1**.

## Ownership Model

- **Workspace-owned**
  - Baseline profiles
  - Baseline snapshots + snapshot items (data-minimized, reusable across tenants)

- **Tenant-owned**
  - Tenant assignments (joins a tenant to the workspace baseline standard)
  - Operation runs for capture/compare
  - Findings produced by compares

This follows constitution SCOPE-001 conventions: tenant-owned tables include `workspace_id` and `tenant_id`, both NOT NULL; workspace-owned tables include `workspace_id` and omit `tenant_id`.

## Entities

## 1) baseline_profiles (workspace-owned)

**Purpose**: Defines the baseline (“what good looks like”) and its scope.

**Fields**
- `id` (pk)
- `workspace_id` (fk workspaces, NOT NULL)
- `name` (string, NOT NULL)
- `description` (text, nullable)
- `version_label` (string, nullable)
- `status` (string enum: `draft|active|archived`, NOT NULL)
- `scope_jsonb` (jsonb, NOT NULL)
  - v1 schema: `{ "policy_types": ["string", ...] }` — array of policy type keys from `InventoryPolicyTypeMeta`
  - An empty array means "all types" (no filtering); each string must be a known policy type key
  - Future versions may add additional filter dimensions (e.g., `platforms`, `tags`)
- `active_snapshot_id` (fk baseline_snapshots, nullable)
- `created_by_user_id` (fk users, nullable)
- timestamps
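
The v1 `scope_jsonb` rules above can be sketched as a small validator. This is a minimal illustration, not the project's implementation: `KNOWN_POLICY_TYPES` is a hypothetical stand-in for the keys exposed by `InventoryPolicyTypeMeta`, and the sketch assumes `policy_types` is required and that unknown extra keys are tolerated.

```python
from typing import Any

# Hypothetical stand-in for the policy type keys from InventoryPolicyTypeMeta.
KNOWN_POLICY_TYPES = frozenset({"deviceConfiguration", "compliancePolicy", "appProtection"})


def validate_scope_v1(scope: Any, known_types: frozenset = KNOWN_POLICY_TYPES) -> list:
    """Return a list of validation errors; an empty list means the scope is valid."""
    if not isinstance(scope, dict):
        return ["scope must be a JSON object"]
    policy_types = scope.get("policy_types")
    if not isinstance(policy_types, list):
        # Assumption: the key is required in v1, even when it is an empty array.
        return ["policy_types must be an array"]
    errors = []
    for key in policy_types:
        if not isinstance(key, str):
            errors.append(f"policy type must be a string, got {key!r}")
        elif key not in known_types:
            errors.append(f"unknown policy type: {key}")
    return errors
```

An empty `policy_types` array passes validation, matching the "all types" meaning described above.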

**Indexes/constraints**
- index: `(workspace_id, status)`
- uniqueness: `(workspace_id, name)` (optional but recommended if UI expects names unique per workspace)

**Validation notes**
- Allowed status transitions: `draft → active`, `active → draft` (deactivate), and `active → archived`. `archived` is terminal in v1.
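
The transition rule can be expressed as a lookup table. The sketch below reads the rule literally, so it assumes a `draft` profile cannot be archived directly; `ALLOWED_TRANSITIONS` and `can_transition` are illustrative names, not existing project code.

```python
# Allowed targets per current status; archived has no outgoing transitions in v1.
ALLOWED_TRANSITIONS = {
    "draft": {"active"},
    "active": {"draft", "archived"},
    "archived": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if a profile may move from `current` to `target` status."""
    return target in ALLOWED_TRANSITIONS.get(current, set())
```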

## 2) baseline_snapshots (workspace-owned)

**Purpose**: Immutable baseline snapshot captured from a tenant.

**Fields**
- `id` (pk)
- `workspace_id` (fk workspaces, NOT NULL)
- `baseline_profile_id` (fk baseline_profiles, NOT NULL)
- `snapshot_identity_hash` (string(64), NOT NULL)
  - sha256 of normalized captured content
- `captured_at` (timestamp tz, NOT NULL)
- `summary_jsonb` (jsonb, nullable)
  - counts/metadata (e.g., total items)
- timestamps

**Indexes/constraints**
- unique: `(workspace_id, baseline_profile_id, snapshot_identity_hash)` (dedupe)
- index: `(workspace_id, baseline_profile_id, captured_at desc)`
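
One possible way to derive `snapshot_identity_hash` so that the dedupe constraint works is sketched below. The normalization shown here is an assumption (the document does not define it): items are reduced to their identifying fields, sorted by subject, and serialized as compact JSON with sorted keys before hashing.

```python
import hashlib
import json


def snapshot_identity_hash(items: list) -> str:
    """Hash captured items so that capture order does not affect the result."""
    normalized = sorted(
        (
            {
                "subject_type": item["subject_type"],
                "subject_external_id": item["subject_external_id"],
                "policy_type": item["policy_type"],
                "baseline_hash": item["baseline_hash"],
            }
            for item in items
        ),
        key=lambda item: (item["subject_type"], item["subject_external_id"]),
    )
    # Compact, key-sorted JSON gives a stable byte representation.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Under this sketch, capturing identical content twice yields the same 64-character hex digest, so the second capture would collide with the unique constraint and could reuse the existing snapshot row.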

## 3) baseline_snapshot_items (workspace-owned)

**Purpose**: Immutable items within a snapshot.

**Fields**
- `id` (pk)
- `baseline_snapshot_id` (fk baseline_snapshots, NOT NULL)
- `subject_type` (string, NOT NULL) — e.g. `policy`
- `subject_external_id` (string, NOT NULL) — stable key for the policy within the tenant inventory
- `policy_type` (string, NOT NULL) — for filtering and summary
- `baseline_hash` (string(64), NOT NULL) — stable content hash for the baseline version
- `meta_jsonb` (jsonb, nullable) — minimized display metadata (no secrets)
- timestamps

**Indexes/constraints**
- unique: `(baseline_snapshot_id, subject_type, subject_external_id)`
- index: `(baseline_snapshot_id, policy_type)`

## 4) baseline_tenant_assignments (tenant-owned)

**Purpose**: Assigns exactly one baseline profile to each tenant (v1), with an optional scope override that can only narrow the profile's scope.

**Fields**
- `id` (pk)
- `workspace_id` (fk workspaces, NOT NULL)
- `tenant_id` (fk tenants, NOT NULL)
- `baseline_profile_id` (fk baseline_profiles, NOT NULL)
- `override_scope_jsonb` (jsonb, nullable)
- `assigned_by_user_id` (fk users, nullable)
- timestamps

**Indexes/constraints**
- unique: `(workspace_id, tenant_id)`
- index: `(workspace_id, baseline_profile_id)`

**Validation notes**
- The override must be a subset of the profile scope. This is enforced server-side; the hash of the final effective scope is stored in the compare run context for traceability.
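
The narrowing rule interacts with the "empty array means all types" convention of `scope_jsonb`, which the sketch below makes explicit. It assumes the same convention applies to `override_scope_jsonb`, so an empty override on a restricted profile counts as widening and is rejected; the function names are illustrative.

```python
from typing import Optional


def is_narrowing(profile_types: list, override_types: list) -> bool:
    """Return True if the override selects a subset of the profile scope."""
    if not profile_types:
        # Profile covers all types, so any override is a subset.
        return True
    if not override_types:
        # Override asks for all types, but the profile is restricted.
        return False
    return set(override_types) <= set(profile_types)


def effective_policy_types(profile_types: list, override_types: Optional[list]) -> list:
    """Resolve the effective scope for a tenant assignment."""
    if override_types is None:
        return sorted(set(profile_types))
    if not is_narrowing(profile_types, override_types):
        raise ValueError("override scope must be a subset of the profile scope")
    return sorted(set(override_types))
```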

## 5) findings additions (tenant-owned)

**Purpose**: Persist baseline compare drift findings using the existing findings system.

**Required v1 additions**
- add `findings.source` (string, **nullable**, default `NULL`)
  - baseline compare uses `source='baseline.compare'`
  - existing findings keep `NULL`; legacy drift-generate findings are queried with `whereNull('source')` or without any `source` filter
- index: `(tenant_id, source)`

**Baseline compare finding conventions**
- `finding_type = 'drift'`
- `scope_key = 'baseline_profile:<id>'`
- `fingerprint = sha256(tenant_id|scope_key|subject_type|subject_external_id|change_type|baseline_hash|current_hash)` (using `DriftHasher`)
- `evidence_jsonb` includes:
  - `source` (duplicated here until the `findings.source` column exists)
  - `baseline.profile_id`, `baseline.snapshot_id`
  - `baseline.hash`, `current.hash`
  - `change_type` (missing_policy|different_version|unexpected_policy)
  - any minimized diff pointers required for UI
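
The conventions above can be illustrated with a short sketch that classifies one subject and builds its fingerprint. This is not the actual `DriftHasher`: the pipe-joined layout follows the formula above, but serializing a missing hash as an empty string and the rule for deriving `change_type` from the two hashes are assumptions made for illustration.

```python
import hashlib
from typing import Optional


def classify_change(baseline_hash: Optional[str], current_hash: Optional[str]) -> Optional[str]:
    """Map a baseline/current hash pair to a change_type, or None if there is no drift."""
    if baseline_hash is not None and current_hash is None:
        return "missing_policy"
    if baseline_hash is None and current_hash is not None:
        return "unexpected_policy"
    if baseline_hash != current_hash:
        return "different_version"
    return None


def finding_fingerprint(tenant_id, scope_key, subject_type, subject_external_id,
                        change_type, baseline_hash, current_hash) -> str:
    """sha256 over the pipe-joined identity fields of a finding."""
    parts = [
        str(tenant_id), scope_key, subject_type, subject_external_id, change_type,
        baseline_hash or "",  # assumption: a missing hash is serialized as ""
        current_hash or "",
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()
```

Because the fingerprint includes both hashes, a later compare that sees the same drift produces the same fingerprint, while a further change to the tenant's policy produces a new one.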

## 6) operation_runs conventions

**Operation types**
- `baseline_capture`
- `baseline_compare`

**Run context** (json)
- capture:
  - `baseline_profile_id`
  - `source_tenant_id`
  - `effective_scope` / `selection_hash`
- compare:
  - `baseline_profile_id`
  - `baseline_snapshot_id` (frozen at enqueue time)
  - `effective_scope` / `selection_hash`
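
A compare run context could be assembled at enqueue time roughly as follows. The `Profile` dataclass and `build_compare_context` are hypothetical names, the sketch stores both `effective_scope` and `selection_hash`, and refusing to enqueue without an active snapshot is an assumption rather than a documented rule.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    id: int
    active_snapshot_id: Optional[int]


def build_compare_context(profile: Profile, policy_types: list) -> dict:
    """Build the compare run context, copying the snapshot id at enqueue time."""
    if profile.active_snapshot_id is None:
        raise ValueError("profile has no active snapshot to compare against")
    effective_scope = {"policy_types": sorted(set(policy_types))}
    payload = json.dumps(effective_scope, sort_keys=True, separators=(",", ":"))
    return {
        "baseline_profile_id": profile.id,
        # Copied by value, so later changes to the profile do not affect this run.
        "baseline_snapshot_id": profile.active_snapshot_id,
        "effective_scope": effective_scope,
        "selection_hash": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    }
```

Copying the snapshot id into the context means a run that starts after the profile's active snapshot has changed still compares against the snapshot that was active when it was enqueued.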

**Summary counts**
- compare should set a compact breakdown for dashboards:
  - totals and severity breakdowns