Implements Spec 118 baseline drift engine improvements:
- Resumable, budget-aware evidence capture for baseline capture/compare runs (resume token + UI action)
- “Why no findings?” reason-code-driven explanations and richer run context panels
- Baseline Snapshot resource (list/detail) with fidelity visibility
- Retention command + schedule for pruning baseline-purpose PolicyVersions
- i18n strings for Baseline Compare landing

Verification:
- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact --filter=Baseline` (159 passed)

Note:
- `docs/audits/redaction-audit-2026-03-04.md` left untracked (not part of PR).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #143
6.9 KiB
Data Model — Spec 118 Golden Master Deep Drift v2
This document describes the data shapes required to implement full-content baseline capture/compare with quota-aware, resumable evidence capture.
Spec reference: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md
Entities (existing)
baseline_profiles (workspace-owned)
- Purpose: defines baseline name, scope, and (new) capture mode.
- Current fields (from repo):
  - `id`, `workspace_id`, `name`, `description`, `version_label`, `status`
  - `scope_jsonb`
  - `active_snapshot_id`
  - `created_by_user_id`
baseline_snapshots (workspace-owned)
- Purpose: immutable baseline snapshot, deduped by a snapshot identity hash.
- Current fields:
  - `id`, `workspace_id`, `baseline_profile_id`
  - `snapshot_identity_hash` (sha256 string)
  - `captured_at`
  - `summary_jsonb`
baseline_snapshot_items (workspace-owned; no tenant identifiers)
- Purpose: per-subject baseline evidence for drift evaluation.
- Current fields:
  - `baseline_snapshot_id`
  - `subject_type` (currently `policy`)
  - `subject_external_id` (legacy column name; MUST NOT store tenant external IDs in Spec 118 flows)
  - `policy_type`
  - `baseline_hash` (fingerprint)
  - `meta_jsonb` (metadata + provenance)
policy_versions (tenant-owned evidence)
- Purpose: immutable captured policy content with assignments/scope tags and hashes, used as content-fidelity evidence.
- Current fields (selected):
  - `tenant_id`, `policy_id`, `policy_type`, `platform`
  - `captured_at`
  - `snapshot`, `metadata`, `assignments`, `scope_tags`
  - `assignments_hash`, `scope_tags_hash`
operation_runs (tenant-owned operational record)
- Purpose: observable lifecycle for capture/compare operations; `summary_counts` is numeric-only and key-whitelisted; diagnostics go in `context`.
findings (tenant-owned drift outcomes)
- Purpose: drift findings produced by compare; recurrence/lifecycle fields already exist in the repo (incl. `recurrence_key`).
Proposed changes (Spec 118)
1) BaselineProfile: add capture mode
Add column: baseline_profiles.capture_mode (string)
- Allowed values: `meta_only` | `opportunistic` | `full_content`
- Default: `opportunistic` (maintains current behavior unless explicitly enabled)
- Validation: only allow known values
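The three allowed values and the default map naturally onto a closed enum. The following is a minimal Python sketch for illustration only, since the application itself is a Laravel app. The names `CaptureMode` and `parse_capture_mode` are hypothetical. It assumes that a missing (NULL) value falls back to the default and that any unknown string, including an empty one, is rejected.

```python
from enum import Enum
from typing import Optional


class CaptureMode(str, Enum):
    """Allowed values for baseline_profiles.capture_mode (hypothetical name)."""
    META_ONLY = "meta_only"
    OPPORTUNISTIC = "opportunistic"
    FULL_CONTENT = "full_content"


# Default keeps current behavior unless full-content capture is explicitly enabled.
DEFAULT_CAPTURE_MODE = CaptureMode.OPPORTUNISTIC


def parse_capture_mode(raw: Optional[str]) -> CaptureMode:
    """Map a stored value to a CaptureMode; only a missing value gets the default."""
    if raw is None:
        return DEFAULT_CAPTURE_MODE
    try:
        # Enum lookup by value raises ValueError for anything not in the set.
        return CaptureMode(raw)
    except ValueError:
        raise ValueError(f"unknown capture_mode: {raw!r}") from None
```

Rejecting unknown strings instead of silently coercing them to the default means that a typo in configuration cannot quietly downgrade a profile from `full_content` to `opportunistic`.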
2) Baseline snapshot item: introduce a cross-tenant subject key
Add column: baseline_snapshot_items.subject_key (string)
- Meaning: cross-tenant match key for a subject: `normalized_display_name`
- Normalization rules: trim, collapse internal whitespace, lowercase
- Index: `index(baseline_snapshot_id, policy_type, subject_key)`
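The normalization rules above are easiest to pin down as a pure function. Below is a minimal Python sketch. `normalize_subject_key` and `snapshot_item_lookup_key` are hypothetical names, and it assumes that "collapse internal whitespace" means replacing each whitespace run with a single space.

```python
from typing import Tuple


def normalize_subject_key(display_name: str) -> str:
    """Trim, collapse internal whitespace runs to a single space, and lowercase."""
    # str.split() with no separator splits on runs of any whitespace and drops
    # leading/trailing whitespace, so re-joining with " " trims and collapses.
    return " ".join(display_name.split()).lower()


def snapshot_item_lookup_key(
    baseline_snapshot_id: int, policy_type: str, display_name: str
) -> Tuple[int, str, str]:
    """Key in the column order of index(baseline_snapshot_id, policy_type, subject_key)."""
    return (baseline_snapshot_id, policy_type, normalize_subject_key(display_name))
```

For example, `normalize_subject_key("  Windows   Baseline\tPolicy ")` returns `"windows baseline policy"`. Both tenants and the baseline must apply exactly the same function. Otherwise, two policies that differ only in spacing or case would fail to match.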
Notes:
- Workspace-owned snapshot items MUST NOT persist tenant identifiers. In Spec 118 flows:
  - `baseline_snapshot_items.subject_external_id` is treated as an opaque, workspace-safe subject id derived from `policy_type + subject_key` (e.g. `sha256(policy_type|subject_key)`), solely to satisfy existing uniqueness/lookup needs.
  - Tenant-specific external IDs remain tenant-scoped and live only in tenant-owned tables (`policies`, `inventory_items`, `policy_versions`) and in tenant-scoped `operation_runs.context`.
- `meta_jsonb` stored on snapshot items MUST be baseline-safe (no tenant external IDs, no operation run IDs, no policy version IDs). It should include only cross-tenant metadata such as `display_name`, `policy_type`, and a fidelity indicator (`content` vs `meta`).
- Duplicate/ambiguous `subject_key` values within the same policy type are treated as evidence gaps and are not evaluated for drift.
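The two rules above can be illustrated together. The first is deriving the opaque `subject_external_id` from `policy_type|subject_key`. The second is flagging duplicate keys as gaps. The following is a minimal Python sketch. The function names are hypothetical, and it assumes the hash input is the UTF-8 encoding of the literal string `policy_type|subject_key`, with the digest stored as lowercase hex.

```python
import hashlib
from collections import Counter
from typing import Iterable, Set, Tuple


def opaque_subject_id(policy_type: str, subject_key: str) -> str:
    """Workspace-safe stand-in for subject_external_id: sha256 of "policy_type|subject_key"."""
    return hashlib.sha256(f"{policy_type}|{subject_key}".encode("utf-8")).hexdigest()


def ambiguous_subjects(subjects: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return (policy_type, subject_key) pairs that occur more than once.

    Such pairs cannot be matched unambiguously across tenants, so they are
    reported as evidence gaps instead of being evaluated for drift.
    """
    counts = Counter(subjects)
    return {pair for pair, n in counts.items() if n > 1}
```

The hash contains no tenant identifier, so the same subject yields the same id in every tenant's compare. Because the id depends only on `policy_type` and `subject_key`, duplicates of that pair would also collide on `subject_external_id`. This is one reason they must be reported as gaps rather than silently merged.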
3) PolicyVersion: purpose tagging + traceability
Add columns (all nullable except purpose):
- `policy_versions.capture_purpose` (string)
  - Allowed: `backup` | `baseline_capture` | `baseline_compare`
  - Default for existing rows: `backup` (or null → treated as `backup` at read time; exact backfill strategy documented in the migration plan)
- `policy_versions.operation_run_id` (unsigned bigint, nullable) → FK to `operation_runs.id`
- `policy_versions.baseline_profile_id` (unsigned bigint, nullable) → FK to `baseline_profiles.id`
Indexes (for audit/debug + idempotency checks):
- `(tenant_id, policy_id, capture_purpose, captured_at desc)`
- `(tenant_id, capture_purpose, operation_run_id)`
- `(tenant_id, capture_purpose, baseline_profile_id)`
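The read-time default and the `(tenant_id, capture_purpose, operation_run_id)` index together support one kind of idempotency check. In that check, a resumed run skips policies for which it has already stored evidence. The following is a minimal in-memory Python sketch of that idea. `PolicyVersionRow`, `effective_purpose` and `already_captured` are hypothetical names, and the resume behavior is an assumption about how the index is used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

ALLOWED_PURPOSES = frozenset({"backup", "baseline_capture", "baseline_compare"})


@dataclass
class PolicyVersionRow:
    # Hypothetical in-memory mirror of the policy_versions columns used here.
    tenant_id: int
    policy_id: int
    capture_purpose: Optional[str] = None
    operation_run_id: Optional[int] = None


def effective_purpose(row: PolicyVersionRow) -> str:
    """Treat a NULL purpose as "backup" until the backfill has finished."""
    purpose = "backup" if row.capture_purpose is None else row.capture_purpose
    if purpose not in ALLOWED_PURPOSES:
        raise ValueError(f"unknown capture_purpose: {purpose!r}")
    return purpose


def already_captured(
    rows: Iterable[PolicyVersionRow],
    tenant_id: int,
    purpose: str,
    operation_run_id: int,
    policy_id: int,
) -> bool:
    """True if this run already stored evidence of this purpose for this policy."""
    return any(
        r.tenant_id == tenant_id
        and effective_purpose(r) == purpose
        and r.operation_run_id == operation_run_id
        and r.policy_id == policy_id
        for r in rows
    )
```

In the database, the same predicate would be an indexed lookup rather than a linear scan.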
Retention:
- Baseline-purpose evidence is eligible for shorter retention (configurable) than long-term backup evidence.
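A retention pass only needs to select baseline-purpose rows that are older than a configurable cutoff. The following is a minimal Python sketch. `VersionRef`, `prunable_version_ids` and the `retention_days` parameter are hypothetical. The spec does not fix a retention period, and the sketch assumes a single window for both baseline purposes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Only these purposes are eligible for pruning. A NULL purpose is treated as
# "backup" (per the backfill note), so it is never selected here.
BASELINE_PURPOSES = frozenset({"baseline_capture", "baseline_compare"})


@dataclass
class VersionRef:
    # Hypothetical minimal view of a policy_versions row.
    id: int
    capture_purpose: Optional[str]
    captured_at: datetime


def prunable_version_ids(
    versions: Iterable[VersionRef], now: datetime, retention_days: int
) -> List[int]:
    """Ids of baseline-purpose versions captured before the retention cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [
        v.id
        for v in versions
        if v.capture_purpose in BASELINE_PURPOSES and v.captured_at < cutoff
    ]
```

Selecting by an explicit allow-list of purposes, rather than excluding `backup`, keeps unknown or not-yet-backfilled rows out of the delete set.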
4) OperationRun context: baseline capture/compare contract
Baseline runs should populate operation_runs.context with stable, operator-facing keys:
```json
{
  "target_scope": {
    "entra_tenant_id": "...",
    "entra_tenant_name": "...",
    "directory_context_id": "..."
  },
  "baseline_profile_id": 123,
  "baseline_snapshot_id": 456,
  "capture_mode": "full_content",
  "effective_scope": {
    "policy_types": ["..."],
    "foundation_types": ["..."],
    "all_types": ["..."]
  },
  "baseline_capture": {
    "subjects_total": 500,
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "gaps": {
      "count": 25,
      "top_reasons": ["forbidden", "throttled", "ambiguous_match"]
    },
    "resume_token": "opaque_token_string"
  },
  "baseline_compare": {
    "inventory_sync_run_id": 999,
    "since": "2026-03-03T09:00:00Z",
    "coverage": {
      "proof": true,
      "effective_types": ["..."],
      "covered_types": ["..."],
      "uncovered_types": ["..."]
    },
    "fidelity": "content|meta|mixed",
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "evidence_gaps": {
      "missing_current": 20,
      "ambiguous_match": 3
    },
    "reason_code": "no_subjects_in_scope|coverage_unproven|evidence_capture_incomplete|rollout_disabled|no_drift_detected|..."
  }
}
```
Notes:
- `target_scope` is required for the Monitoring UI (“Target” display).
- Rich diagnostics remain in `context`; `summary_counts` stays within the numeric key whitelist.
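The split between `summary_counts` and `context` can be expressed as a routing step: whitelisted integer counters go to one side and everything else goes to the other. The following is a minimal Python sketch. `split_run_payload` is a hypothetical name, and `SUMMARY_COUNT_KEYS` is a placeholder whitelist, not the application's real key list.

```python
from typing import Any, Dict, Tuple

# Placeholder whitelist; the real allowed keys live in the application code.
SUMMARY_COUNT_KEYS = frozenset({"total", "processed", "succeeded", "failed", "skipped"})


def split_run_payload(counters: Dict[str, Any]) -> Tuple[Dict[str, int], Dict[str, Any]]:
    """Route whitelisted integer counters to summary_counts, everything else to context."""
    summary: Dict[str, int] = {}
    diagnostics: Dict[str, Any] = {}
    for key, value in counters.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_count = isinstance(value, int) and not isinstance(value, bool)
        if key in SUMMARY_COUNT_KEYS and is_count:
            summary[key] = value
        else:
            diagnostics[key] = value
    return summary, diagnostics
```

A whitelisted key that carries a non-numeric value is routed to `context` rather than dropped, so diagnostic information is never lost.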
Migration strategy
- Add `baseline_profiles.capture_mode`.
- Add `baseline_snapshot_items.subject_key` + index.
- Add `policy_versions.capture_purpose`, `operation_run_id`, `baseline_profile_id` + indexes.
- Backfill strategy:
  - Existing `policy_versions` rows: set `capture_purpose = backup` (or treat null as `backup` in code until the backfill finishes).
  - Existing baseline snapshot items: set `subject_key` from the stored `meta_jsonb.display_name`, normalized with the same rules as new rows, when available (else empty; treated as a gap in the new logic).
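The two backfill rules are pure per-row transformations. The following is a minimal Python sketch of them. The function names are hypothetical. The sketch assumes that backfilled keys go through the same normalization as new rows (trim, collapse whitespace, lowercase) and that a missing or non-string `display_name` yields an empty key.

```python
from typing import Any, Dict, Optional


def normalize_subject_key(display_name: str) -> str:
    # Same normalization as for new rows: trim, collapse whitespace, lowercase.
    return " ".join(display_name.split()).lower()


def backfill_capture_purpose(current: Optional[str]) -> str:
    """Existing policy_versions rows without a purpose become "backup"."""
    return "backup" if current is None else current


def backfill_subject_key(meta_jsonb: Optional[Dict[str, Any]]) -> str:
    """Derive subject_key from meta_jsonb.display_name; "" marks an evidence gap."""
    display_name = (meta_jsonb or {}).get("display_name")
    if not isinstance(display_name, str):
        return ""
    return normalize_subject_key(display_name)
```

If legacy rows are not normalized in the same way as new rows, they will fail to match newly captured subjects. That mismatch would surface as spurious gaps rather than as drift.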
Validation rules
- `capture_mode` must be one of: `meta_only`, `opportunistic`, `full_content`.
- `subject_key` must be non-empty to be eligible for drift evaluation.
- For full-content capture mode:
  - Capture/compare runs must record evidence capture stats and gaps.
  - Compare must not emit “missing policy” findings for uncovered policy types.
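Two of these rules shape which baseline subjects can yield a “missing policy” finding. First, empty keys are not eligible. Second, subjects whose policy type is not covered are skipped. The following is a minimal Python sketch of that filter. `BaselineSubject`, `eligible_for_drift` and `missing_policy_candidates` are hypothetical names, and the sketch assumes that current inventory is available as a set of `(policy_type, subject_key)` pairs.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class BaselineSubject:
    policy_type: str
    subject_key: str


def eligible_for_drift(subject: BaselineSubject) -> bool:
    """Only subjects with a non-empty subject_key are evaluated for drift."""
    return subject.subject_key != ""


def missing_policy_candidates(
    baseline: Iterable[BaselineSubject],
    current_keys: Set[Tuple[str, str]],
    covered_types: Set[str],
) -> List[BaselineSubject]:
    """Baseline subjects absent from current inventory, restricted to covered types.

    Uncovered policy types are skipped: without coverage proof, absence from
    inventory is not evidence that the policy is actually missing.
    """
    return [
        s
        for s in baseline
        if eligible_for_drift(s)
        and s.policy_type in covered_types
        and (s.policy_type, s.subject_key) not in current_keys
    ]
```

Filtering on coverage before testing absence is what prevents an incomplete inventory sync from producing a burst of false “missing policy” findings.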