TenantAtlas/specs/118-baseline-drift-engine/data-model.md

Data Model — Spec 118 Golden Master Deep Drift v2

This document describes the data shapes required to implement full-content baseline capture/compare with quota-aware, resumable evidence capture.

Spec reference: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md

Entities (existing)

baseline_profiles (workspace-owned)

  • Purpose: defines baseline name, scope, and (new) capture mode.
  • Current fields (from repo):
    • id, workspace_id, name, description, version_label, status
    • scope_jsonb
    • active_snapshot_id
    • created_by_user_id

baseline_snapshots (workspace-owned)

  • Purpose: immutable baseline snapshot, deduped by a snapshot identity hash.
  • Current fields:
    • id, workspace_id, baseline_profile_id
    • snapshot_identity_hash (sha256 string)
    • captured_at
    • summary_jsonb

baseline_snapshot_items (workspace-owned; no tenant identifiers)

  • Purpose: per-subject baseline evidence for drift evaluation.
  • Current fields:
    • baseline_snapshot_id
    • subject_type (currently policy)
    • subject_external_id (legacy column name; MUST NOT store tenant external IDs in Spec 118 flows)
    • policy_type
    • baseline_hash (fingerprint)
    • meta_jsonb (metadata + provenance)

policy_versions (tenant-owned evidence)

  • Purpose: immutable captured policy content with assignments/scope tags and hashes, used as content-fidelity evidence.
  • Current fields (selected):
    • tenant_id, policy_id, policy_type, platform
    • captured_at
    • snapshot, metadata, assignments, scope_tags
    • assignments_hash, scope_tags_hash

operation_runs (tenant-owned operational record)

  • Purpose: observable lifecycle for capture/compare operations; summary_counts is numeric-only and key-whitelisted; diagnostics go in context.

findings (tenant-owned drift outcomes)

  • Purpose: drift findings produced by compare; recurrence/lifecycle fields already exist in the repo (incl. recurrence_key).

Proposed changes (Spec 118)

1) BaselineProfile: add capture mode

Add column: baseline_profiles.capture_mode (string)

  • Allowed values: meta_only | opportunistic | full_content
  • Default: opportunistic (preserves current behavior until another mode is explicitly selected)
  • Validation: only allow known values
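The allowed values and default above can be sketched as a small validator. This is an illustrative sketch only: the names CaptureMode and parse_capture_mode are hypothetical, and treating a missing value as the default is an assumption, not something the spec states.

```python
from enum import Enum


class CaptureMode(str, Enum):
    """Allowed values for baseline_profiles.capture_mode."""
    META_ONLY = "meta_only"
    OPPORTUNISTIC = "opportunistic"
    FULL_CONTENT = "full_content"


# Default taken from the spec text above.
DEFAULT_CAPTURE_MODE = CaptureMode.OPPORTUNISTIC


def parse_capture_mode(raw):
    """Return a CaptureMode; None falls back to the default (assumption), unknown values raise."""
    if raw is None:
        return DEFAULT_CAPTURE_MODE
    try:
        return CaptureMode(raw)
    except ValueError:
        raise ValueError(f"unknown capture_mode: {raw!r}") from None
```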

2) Baseline snapshot item: introduce a cross-tenant subject key

Add column: baseline_snapshot_items.subject_key (string)

  • Meaning: cross-tenant match key for a subject; the value is normalized_display_name
  • Normalization rules: trim, collapse internal whitespace, lowercase
  • Index: index(baseline_snapshot_id, policy_type, subject_key)

Notes:

  • Workspace-owned snapshot items MUST NOT persist tenant identifiers. In Spec 118 flows:
    • baseline_snapshot_items.subject_external_id is treated as an opaque, workspace-safe subject id derived from policy_type + subject_key (e.g. sha256(policy_type|subject_key)), solely to satisfy existing uniqueness/lookup needs.
    • Tenant-specific external IDs remain tenant-scoped and live only in tenant-owned tables (policies, inventory_items, policy_versions) and in tenant-scoped operation_runs.context.
  • meta_jsonb stored on snapshot items MUST be baseline-safe (no tenant external IDs, no operation run IDs, no policy version IDs). It should include only cross-tenant metadata like display_name, policy_type, and a fidelity indicator (content vs meta).
  • Duplicate/ambiguous subject_key values within the same policy type are treated as evidence gaps and are not evaluated for drift.
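The normalization rules, the opaque subject id, and the ambiguity rule above could look roughly like the following. It is a sketch under assumptions: the function names are hypothetical, the hex encoding of the digest is assumed, and sha256(policy_type|subject_key) is only the example derivation the notes mention, not a confirmed implementation.

```python
import hashlib
import re
from collections import Counter


def normalize_subject_key(display_name: str) -> str:
    """Trim, collapse internal whitespace runs to one space, then lowercase."""
    return re.sub(r"\s+", " ", display_name.strip()).lower()


def workspace_safe_subject_id(policy_type: str, subject_key: str) -> str:
    """Opaque id: sha256 over 'policy_type|subject_key' (hex encoding is an assumption)."""
    return hashlib.sha256(f"{policy_type}|{subject_key}".encode("utf-8")).hexdigest()


def split_ambiguous(subjects):
    """Partition (policy_type, display_name) pairs into evaluable keys and evidence gaps.

    A key is a gap when it is empty or appears more than once within a policy type.
    """
    keyed = [(ptype, normalize_subject_key(name)) for ptype, name in subjects]
    counts = Counter(keyed)
    evaluable = [k for k in keyed if k[1] and counts[k] == 1]
    gaps = sorted({k for k in keyed if not k[1] or counts[k] > 1})
    return evaluable, gaps
```

Because the id is derived only from policy_type and subject_key, no tenant external ID enters the workspace-owned row.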

3) PolicyVersion: purpose tagging + traceability

Add columns (all nullable except capture_purpose, which may stay null only until the backfill finishes):

  • policy_versions.capture_purpose (string)
    • Allowed: backup | baseline_capture | baseline_compare
    • Default for existing rows: backup (or null → treated as backup at read time; exact backfill strategy documented in migration plan)
  • policy_versions.operation_run_id (unsigned bigint, nullable) → FK to operation_runs.id
  • policy_versions.baseline_profile_id (unsigned bigint, nullable) → FK to baseline_profiles.id

Indexes (for audit/debug + idempotency checks):

  • (tenant_id, policy_id, capture_purpose, captured_at desc)
  • (tenant_id, capture_purpose, operation_run_id)
  • (tenant_id, capture_purpose, baseline_profile_id)

Retention:

  • Baseline-purpose evidence is eligible for shorter retention (configurable) than long-term backup evidence.

4) OperationRun context: baseline capture/compare contract

Baseline runs should populate operation_runs.context with stable, operator-facing keys:

{
  "target_scope": {
    "entra_tenant_id": "...",
    "entra_tenant_name": "...",
    "directory_context_id": "..."
  },
  "baseline_profile_id": 123,
  "baseline_snapshot_id": 456,
  "capture_mode": "full_content",
  "effective_scope": {
    "policy_types": ["..."],
    "foundation_types": ["..."],
    "all_types": ["..."]
  },
  "baseline_capture": {
    "subjects_total": 500,
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "gaps": {
      "count": 25,
      "top_reasons": ["forbidden", "throttled", "ambiguous_match"]
    },
    "resume_token": "opaque_token_string"
  },
  "baseline_compare": {
    "inventory_sync_run_id": 999,
    "since": "2026-03-03T09:00:00Z",
    "coverage": {
      "proof": true,
      "effective_types": ["..."],
      "covered_types": ["..."],
      "uncovered_types": ["..."]
    },
    "fidelity": "content|meta|mixed",
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "evidence_gaps": {
      "missing_current": 20,
      "ambiguous_match": 3
    },
    "reason_code": "no_subjects_in_scope|coverage_unproven|evidence_capture_incomplete|rollout_disabled|no_drift_detected|..."
  }
}

Notes:

  • target_scope is required for Monitoring UI (“Target” display).
  • Rich diagnostics remain in context; summary_counts stays within the numeric key whitelist.
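A consumer of this contract might sanity-check a run before displaying it. The sketch below is hypothetical on two counts: the invariant that requested equals the sum of the outcome counters is inferred from the sample numbers (200 = 180 + 10 + 10 + 0) and is not stated by the spec, and the summary_counts whitelist is passed in by the caller because its keys are not listed in this document.

```python
def evidence_capture_is_consistent(stats: dict) -> bool:
    """Assumed invariant: requested == succeeded + skipped + failed + throttled."""
    outcomes = ("succeeded", "skipped", "failed", "throttled")
    return stats["requested"] == sum(stats[k] for k in outcomes)


def invalid_summary_counts(summary_counts: dict, allowed_keys: set) -> list:
    """Return keys that are not whitelisted or whose values are not plain numbers."""
    bad = []
    for key, value in summary_counts.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
        if key not in allowed_keys or not numeric:
            bad.append(key)
    return bad
```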

Migration strategy

  1. Add baseline_profiles.capture_mode.
  2. Add baseline_snapshot_items.subject_key + index.
  3. Add policy_versions.capture_purpose, operation_run_id, baseline_profile_id + indexes.
  4. Backfill strategy:
    • Existing policy_versions rows: set capture_purpose = backup (or treat null as backup in code until backfill finishes).
    • Existing baseline snapshot items: set subject_key from stored meta_jsonb.display_name when available; otherwise leave it empty, which the new logic treats as an evidence gap.
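The two backfill rules can be illustrated with a pair of small helpers. Both function names are hypothetical, and the sketch assumes meta_jsonb has already been decoded into a dict; it applies the same trim, collapse, lowercase normalization described for subject_key.

```python
import re


def effective_capture_purpose(stored):
    """Treat a null capture_purpose as 'backup' until the backfill has run."""
    return stored if stored is not None else "backup"


def backfill_subject_key(meta):
    """Derive subject_key from meta_jsonb.display_name; empty string when unavailable."""
    name = (meta or {}).get("display_name")
    if not isinstance(name, str):
        return ""  # empty key: the new logic treats this item as an evidence gap
    return re.sub(r"\s+", " ", name.strip()).lower()
```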

Validation rules

  • capture_mode must be one of: meta_only, opportunistic, full_content.
  • subject_key must be non-empty to be eligible for drift evaluation.
  • For full-content capture mode:
    • Capture/compare runs must record evidence capture stats and gaps.
    • Compare must not emit “missing policy” findings for uncovered policy types.
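The last two eligibility rules can be sketched as a filter over baseline items. This is a minimal illustration, not the compare engine: missing_policy_candidates is a hypothetical name, items are assumed to be dicts with policy_type and subject_key, and current_keys is assumed to be a set of (policy_type, subject_key) pairs observed in the tenant.

```python
def missing_policy_candidates(baseline_items, current_keys, covered_types):
    """Yield baseline subjects absent from current state, only for covered policy types."""
    covered = set(covered_types)
    for item in baseline_items:
        key = (item["policy_type"], item["subject_key"])
        if not item["subject_key"]:
            continue  # empty subject_key: not eligible for drift evaluation
        if item["policy_type"] not in covered:
            continue  # uncovered type: absence is unproven, so no finding
        if key not in current_keys:
            yield key
```

Skipping uncovered types before the membership check keeps an incomplete inventory from being reported as a deleted policy.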