TenantAtlas/specs/116-baseline-drift-engine/data-model.md
2026-03-02 02:08:28 +01:00

Phase 1 — Data Model (Baseline Drift Engine)

This document identifies the entities involved in Spec 116 and the minimal schema and config changes needed to implement it in this repository.

Existing Entities (Confirmed)

BaselineProfile

Represents a baseline definition.

  • Fields (expected): id, name, description, scope (jsonb), created_by, timestamps
  • Relationships: has many snapshots; assigned to tenants via BaselineTenantAssignment

BaselineSnapshot

Immutable capture of baseline state at a point in time.

  • Fields (expected): id, baseline_profile_id, captured_at, status, operation_run_id, timestamps
  • Relationships: has many items; belongs to baseline profile

BaselineSnapshotItem

One item in a baseline snapshot.

  • Fields (expected):
    • id, baseline_snapshot_id
    • policy_type
    • external_id
    • subject_json (jsonb) or subject fields
    • baseline_hash (string)
    • meta_jsonb (jsonb)
    • timestamps

Finding

Generic drift finding storage.

  • Fields (confirmed by usage): tenant_id, fingerprint (unique per tenant), recurrence_key (nullable), scope_key, lifecycle fields (first_seen_at, last_seen_at, times_seen), evidence (jsonb)
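The lifecycle fields suggest an upsert keyed on (tenant_id, fingerprint): create the finding on first sight, otherwise bump last_seen_at and times_seen. The sketch below models that with an in-memory dict standing in for the findings table. It is a minimal sketch under stated assumptions: `record_observation` is a hypothetical helper, and replacing evidence with the latest observation is an assumption, not confirmed behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional, Tuple


@dataclass
class Finding:
    # Subset of the Finding columns listed above.
    tenant_id: int
    fingerprint: str
    scope_key: str
    first_seen_at: datetime
    last_seen_at: datetime
    times_seen: int = 1
    recurrence_key: Optional[str] = None
    evidence: Dict[str, Any] = field(default_factory=dict)


# In-memory stand-in for the findings table, keyed like the
# (tenant_id, fingerprint) uniqueness constraint.
FindingStore = Dict[Tuple[int, str], Finding]


def record_observation(
    store: FindingStore,
    tenant_id: int,
    fingerprint: str,
    scope_key: str,
    evidence: Dict[str, Any],
    now: datetime,
    recurrence_key: Optional[str] = None,
) -> Finding:
    """Create a finding on first sight; otherwise update its lifecycle fields."""
    existing = store.get((tenant_id, fingerprint))
    if existing is None:
        finding = Finding(
            tenant_id=tenant_id,
            fingerprint=fingerprint,
            scope_key=scope_key,
            first_seen_at=now,
            last_seen_at=now,
            recurrence_key=recurrence_key,
            evidence=evidence,
        )
        store[(tenant_id, fingerprint)] = finding
        return finding

    # first_seen_at is never touched after creation.
    existing.last_seen_at = now
    existing.times_seen += 1
    existing.evidence = evidence  # assumption: the latest evidence replaces the previous one
    return existing
```

In the real schema the same effect would come from an insert-or-update against the unique index rather than a dict lookup.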

OperationRun

Tracks long-running operations.

  • Fields (by convention): type, status/outcome, summary_counts (numeric map), context (jsonb)

New / Adjusted Data Requirements

1) Inventory sync coverage context

Goal: Baseline compare must know which policy types were actually processed successfully by inventory sync.

Where: operation_runs.context for the latest inventory sync run.

Shape (proposed):

{
  "inventory": {
    "coverage": {
      "policy_types": {
        "deviceConfigurations": {"status": "succeeded", "item_count": 123},
        "compliancePolicies": {"status": "failed", "error": "..."}
      },
      "foundation_types": {
        "securityBaselines": {"status": "succeeded", "item_count": 4}
      }
    }
  }
}

Notes:

  • summary_counts must remain a purely numeric map; detailed coverage lists live in context.
  • For Spec 116 v1, it is sufficient to store policy_types coverage; storing foundation_types coverage at the same time keeps parity with the scope rules.
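On the compare side, reading the proposed shape reduces to collecting the type names whose entry reports "succeeded". A minimal sketch, assuming the context layout proposed above; `covered_types` is a hypothetical helper name.

```python
from __future__ import annotations

from typing import Any, Mapping, Set


def covered_types(context: Mapping[str, Any], kind: str = "policy_types") -> Set[str]:
    """Return the types whose coverage entry has status "succeeded".

    `kind` is "policy_types" or "foundation_types". Missing keys yield an
    empty set, so a run recorded before coverage existed covers nothing.
    """
    coverage = (
        context.get("inventory", {})
        .get("coverage", {})
        .get(kind, {})
    )
    return {
        type_name
        for type_name, entry in coverage.items()
        if isinstance(entry, Mapping) and entry.get("status") == "succeeded"
    }
```

Applied to the example context above, this yields {"deviceConfigurations"} for policy types. Presumably compare would then evaluate only types that are both in the baseline scope and covered, so a failed compliancePolicies sync is not misread as every compliance policy having disappeared.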

2) Baseline scope schema

Goal: Support both policy and foundation scope with correct defaults.

Current: policy_types only.

Target:

{
  "policy_types": ["deviceConfigurations", "compliancePolicies"],
  "foundation_types": ["securityBaselines"]
}

Default semantics:

  • Empty policy_types means “all supported policy types excluding foundations”.
  • Empty foundation_types means “none”.
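The default semantics can be captured in a single expansion step applied before compare. This is a minimal sketch: `effective_scope` and the two supported-type tuples are hypothetical stand-ins for whatever registry the repository uses, and treating a missing foundation_types key as empty is an assumption that keeps existing policy-only scopes valid.

```python
from __future__ import annotations

from typing import Any, Dict, Iterable, List, Mapping

# Hypothetical stand-ins for the repository's list of supported types.
SUPPORTED_POLICY_TYPES = ("deviceConfigurations", "compliancePolicies")
SUPPORTED_FOUNDATION_TYPES = ("securityBaselines",)


def effective_scope(
    scope: Mapping[str, Any],
    supported_policy_types: Iterable[str] = SUPPORTED_POLICY_TYPES,
    supported_foundation_types: Iterable[str] = SUPPORTED_FOUNDATION_TYPES,
) -> Dict[str, List[str]]:
    """Expand a stored baseline scope using the default semantics above."""
    foundations = set(supported_foundation_types)
    policy_types = list(scope.get("policy_types") or [])
    # A scope saved before foundation_types existed has no key: treat as empty.
    foundation_types = list(scope.get("foundation_types") or [])

    if not policy_types:
        # Empty means "all supported policy types excluding foundations".
        policy_types = [t for t in supported_policy_types if t not in foundations]
    # Empty foundation_types means "none", so no expansion happens here.

    return {
        "policy_types": sorted(set(policy_types)),
        "foundation_types": sorted(set(foundation_types)),
    }
```

The asymmetry is deliberate in the spec: an empty policy list widens to everything supported, while an empty foundation list stays empty, so foundations are only compared when explicitly opted in.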

3) Findings recurrence strategy

Goal: Stable identity per snapshot and per subject.

  • findings.recurrence_key: populated for baseline compare findings.
  • findings.fingerprint: set to the same value as the recurrence key, to satisfy the existing (tenant_id, fingerprint) uniqueness constraint.

Recurrence key inputs:

  • tenant_id
  • baseline_snapshot_id
  • policy_type
  • subject_external_id
  • change_type

Grouping (scope_key):

  • Keep findings.scope_key = baseline_profile:{baselineProfileId} for baseline compare findings.
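A minimal sketch of how the recurrence key, fingerprint and scope_key could be derived together. The five inputs come from the list above; hashing a JSON-encoded tuple with SHA-256 is an assumption (the spec does not fix a hash function), and `baseline_finding_identity` and the change_type value used in the example are hypothetical.

```python
from __future__ import annotations

import hashlib
import json
from typing import Dict


def recurrence_key(
    tenant_id: int,
    baseline_snapshot_id: int,
    policy_type: str,
    subject_external_id: str,
    change_type: str,
) -> str:
    # Serialize the inputs as a JSON array so delimiter characters inside,
    # e.g., external IDs cannot make two different tuples collide.
    payload = json.dumps(
        [tenant_id, baseline_snapshot_id, policy_type, subject_external_id, change_type],
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def baseline_finding_identity(
    tenant_id: int,
    baseline_profile_id: int,
    baseline_snapshot_id: int,
    policy_type: str,
    subject_external_id: str,
    change_type: str,
) -> Dict[str, str]:
    key = recurrence_key(tenant_id, baseline_snapshot_id, policy_type, subject_external_id, change_type)
    return {
        "recurrence_key": key,
        # Same value as the recurrence key, so the (tenant_id, fingerprint) constraint applies.
        "fingerprint": key,
        # Grouping key, independent of the snapshot.
        "scope_key": f"baseline_profile:{baseline_profile_id}",
    }
```

Because baseline_snapshot_id is part of the key, capturing a new snapshot starts a fresh set of findings for the same subject, while scope_key keeps them grouped under the profile.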

4) Inventory meta contract

Goal: Explicitly define what is hashed for v1 comparisons.

  • Implemented as a dedicated builder class (no schema change required).
  • Used by baseline capture to compute baseline_hash and by compare to compute current_hash.
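The value of a single builder is that capture and compare hash exactly the same canonical structure. A minimal sketch, assuming a hypothetical `InventoryMetaContract` class and an illustrative field list (the real contract defines which fields count for v1); canonical JSON with sorted keys plus SHA-256 is one reasonable choice, not the confirmed one.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any, Dict, Mapping


class InventoryMetaContract:
    """Hypothetical builder: selects the fields that count for v1 and hashes them."""

    # Assumption: illustrative field list only.
    FIELDS = ("display_name", "settings", "assignments")
    VERSION = 1

    def build(self, policy_type: str, meta: Mapping[str, Any]) -> Dict[str, Any]:
        # Fields outside the contract are dropped; missing ones become None.
        return {
            "contract_version": self.VERSION,
            "policy_type": policy_type,
            "fields": {name: meta.get(name) for name in self.FIELDS},
        }

    def compute_hash(self, policy_type: str, meta: Mapping[str, Any]) -> str:
        # sort_keys applies recursively, so dict key order never affects the hash.
        canonical = json.dumps(
            self.build(policy_type, meta),
            sort_keys=True,
            separators=(",", ":"),
            ensure_ascii=False,
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

With this shape, baseline_hash and current_hash differ only when a contract field differs; a change to a field outside the contract (such as a modification timestamp) would not register as drift. Including a contract version in the hashed payload makes any later change to the field list visible as a hash change rather than a silent mismatch.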

Potential Migrations (Likely)

  • If baseline_profiles.scope is not jsonb, a migration is needed to convert it. If it is already jsonb but does not yet include foundation types, the column can stay as-is and foundation_types support is added in code; a DB change is then optional.
  • If coverage data needs to persist beyond the operation run context, avoid adding tables unless that proves necessary; storing coverage in the run context is sufficient for v1.

Index / Performance Notes

  • Findings queries commonly filter by tenant_id + scope_key; ensure there is an index on (tenant_id, scope_key).
  • Baseline snapshot items must load efficiently by (baseline_snapshot_id, policy_type); ensure a composite index on those columns.
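The two index recommendations can be sanity-checked with a query plan. The sketch below uses SQLite from the Python standard library purely as a stand-in (the jsonb columns suggest PostgreSQL in production, where EXPLAIN plays the same role); table definitions are trimmed to the relevant columns and the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE findings (
        id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        fingerprint TEXT NOT NULL,
        scope_key TEXT,
        evidence TEXT,
        UNIQUE (tenant_id, fingerprint)
    );
    CREATE INDEX findings_tenant_scope_idx ON findings (tenant_id, scope_key);

    CREATE TABLE baseline_snapshot_items (
        id INTEGER PRIMARY KEY,
        baseline_snapshot_id INTEGER NOT NULL,
        policy_type TEXT NOT NULL,
        external_id TEXT NOT NULL,
        baseline_hash TEXT NOT NULL
    );
    CREATE INDEX snapshot_items_snapshot_type_idx
        ON baseline_snapshot_items (baseline_snapshot_id, policy_type);
    """
)


def plan(sql: str, params: tuple) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


findings_plan = plan(
    "SELECT * FROM findings WHERE tenant_id = ? AND scope_key = ?",
    (1, "baseline_profile:7"),
)
items_plan = plan(
    "SELECT * FROM baseline_snapshot_items WHERE baseline_snapshot_id = ? AND policy_type = ?",
    (42, "deviceConfigurations"),
)
print(findings_plan)
print(items_plan)
```

Both plans should name the composite index rather than a full scan. On a populated PostgreSQL table with statistics the planner may still choose differently, so this is a smoke check, not a performance guarantee.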