Implements Spec 118 baseline drift engine improvements:

- Resumable, budget-aware evidence capture for baseline capture/compare runs (resume token + UI action)
- “Why no findings?” reason-code driven explanations and richer run context panels
- Baseline Snapshot resource (list/detail) with fidelity visibility
- Retention command + schedule for pruning baseline-purpose PolicyVersions
- i18n strings for Baseline Compare landing

Verification:

- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact --filter=Baseline` (159 passed)

Note:

- `docs/audits/redaction-audit-2026-03-04.md` left untracked (not part of PR).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #143
# Data Model — Spec 118 Golden Master Deep Drift v2

This document describes the data shapes required to implement full-content baseline capture/compare with quota-aware, resumable evidence capture.

Spec reference: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md`

## Entities (existing)

### `baseline_profiles` (workspace-owned)

- Purpose: defines baseline name, scope, and (new) capture mode.
- Current fields (from repo):
  - `id`, `workspace_id`, `name`, `description`, `version_label`, `status`
  - `scope_jsonb`
  - `active_snapshot_id`
  - `created_by_user_id`

### `baseline_snapshots` (workspace-owned)

- Purpose: immutable baseline snapshot, deduped by a snapshot identity hash.
- Current fields:
  - `id`, `workspace_id`, `baseline_profile_id`
  - `snapshot_identity_hash` (sha256 string)
  - `captured_at`
  - `summary_jsonb`

### `baseline_snapshot_items` (workspace-owned; no tenant identifiers)

- Purpose: per-subject baseline evidence for drift evaluation.
- Current fields:
  - `baseline_snapshot_id`
  - `subject_type` (currently `policy`)
  - `subject_external_id` (legacy column name; MUST NOT store tenant external IDs in Spec 118 flows)
  - `policy_type`
  - `baseline_hash` (fingerprint)
  - `meta_jsonb` (metadata + provenance)

### `policy_versions` (tenant-owned evidence)

- Purpose: immutable captured policy content with assignments/scope tags and hashes, used as content-fidelity evidence.
- Current fields (selected):
  - `tenant_id`, `policy_id`, `policy_type`, `platform`
  - `captured_at`
  - `snapshot`, `metadata`, `assignments`, `scope_tags`
  - `assignments_hash`, `scope_tags_hash`

### `operation_runs` (tenant-owned operational record)

- Purpose: observable lifecycle for capture/compare operations; `summary_counts` is numeric-only and key-whitelisted; diagnostics go in `context`.

### `findings` (tenant-owned drift outcomes)

- Purpose: drift findings produced by compare; recurrence/lifecycle fields already exist in the repo (incl. `recurrence_key`).

## Proposed changes (Spec 118)

### 1) BaselineProfile: add capture mode

**Add column**: `baseline_profiles.capture_mode` (string)

- Allowed values: `meta_only | opportunistic | full_content`
- Default: `opportunistic` (maintains current behavior unless explicitly enabled)
- Validation: only allow known values

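
The allowed values and default above could be enforced with a small enum. This is a language-neutral sketch in Python, not the project's implementation; `CaptureMode` and `parse_capture_mode` are hypothetical names, and treating a missing value as the default is an assumption.

```python
from enum import Enum


class CaptureMode(str, Enum):
    """Allowed values for baseline_profiles.capture_mode."""

    META_ONLY = "meta_only"
    OPPORTUNISTIC = "opportunistic"
    FULL_CONTENT = "full_content"


# The spec's default; keeps current behavior unless explicitly changed.
DEFAULT_CAPTURE_MODE = CaptureMode.OPPORTUNISTIC


def parse_capture_mode(raw):
    """Return the CaptureMode for raw; None falls back to the default (assumption).

    Unknown strings raise ValueError instead of being silently coerced.
    """
    if raw is None:
        return DEFAULT_CAPTURE_MODE
    try:
        return CaptureMode(raw)
    except ValueError:
        allowed = ", ".join(mode.value for mode in CaptureMode)
        raise ValueError(f"capture_mode must be one of: {allowed}") from None
```

Rejecting unknown strings outright, rather than mapping them to the default, keeps a typo from silently changing how much evidence a profile captures.
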
### 2) Baseline snapshot item: introduce a cross-tenant subject key

**Add column**: `baseline_snapshot_items.subject_key` (string)

- Meaning: cross-tenant match key for a subject: `normalized_display_name`
- Normalization rules: trim, collapse internal whitespace, lowercase
- Index: `index(baseline_snapshot_id, policy_type, subject_key)`

Notes:

- Workspace-owned snapshot items MUST NOT persist tenant identifiers. In Spec 118 flows:
  - `baseline_snapshot_items.subject_external_id` is treated as an opaque, workspace-safe **subject id** derived from `policy_type + subject_key` (e.g. `sha256(policy_type|subject_key)`), solely to satisfy existing uniqueness/lookup needs.
  - Tenant-specific external IDs remain tenant-scoped and live only in tenant-owned tables (`policies`, `inventory_items`, `policy_versions`) and in tenant-scoped `operation_runs.context`.
- `meta_jsonb` stored on snapshot items MUST be baseline-safe (no tenant external IDs, no operation run IDs, no policy version IDs). It should include only cross-tenant metadata like `display_name`, `policy_type`, and a fidelity indicator (`content` vs `meta`).
- Duplicate/ambiguous `subject_key` values within the same policy type are treated as evidence gaps and are not evaluated for drift.

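
The normalization rules and the derived subject id can be sketched as follows. This is an illustration in Python under stated assumptions, not the repository's code: the function names are hypothetical, and the UTF-8 encoding, the hex digest, and the use of `str.lower()` for lowercasing are choices the spec does not pin down. The `|` separator follows the `sha256(policy_type|subject_key)` example above.

```python
import hashlib
import re


def normalize_subject_key(display_name: str) -> str:
    """Trim, collapse internal whitespace runs to one space, then lowercase."""
    return re.sub(r"\s+", " ", display_name.strip()).lower()


def workspace_safe_subject_id(policy_type: str, subject_key: str) -> str:
    """Opaque id built only from cross-tenant values (no tenant identifiers).

    Assumption: UTF-8 bytes of "policy_type|subject_key", hex-encoded SHA-256.
    """
    material = f"{policy_type}|{subject_key}".encode("utf-8")
    return hashlib.sha256(material).hexdigest()
```

Because both inputs are tenant-independent, two tenants whose policies share a type and a normalized display name yield the same id, which is what makes the value safe to store on workspace-owned rows.
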
### 3) PolicyVersion: purpose tagging + traceability

**Add columns** (all nullable except purpose):

- `policy_versions.capture_purpose` (string)
  - Allowed: `backup | baseline_capture | baseline_compare`
  - Default for existing rows: `backup` (or null → treated as `backup` at read time; exact backfill strategy documented in migration plan)
- `policy_versions.operation_run_id` (unsigned bigint, nullable) → FK to `operation_runs.id`
- `policy_versions.baseline_profile_id` (unsigned bigint, nullable) → FK to `baseline_profiles.id`

**Indexes** (for audit/debug + idempotency checks):

- `(tenant_id, policy_id, capture_purpose, captured_at desc)`
- `(tenant_id, capture_purpose, operation_run_id)`
- `(tenant_id, capture_purpose, baseline_profile_id)`

Retention:

- Baseline-purpose evidence is eligible for shorter retention (configurable) than long-term backup evidence.

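
A minimal sketch of how purpose tagging could drive retention, written in Python for illustration only. The function names and the 90-day window are hypothetical (the spec says only that the window is configurable); the read-time treatment of null as `backup` follows the column notes above.

```python
from datetime import datetime, timedelta, timezone

BASELINE_PURPOSES = {"baseline_capture", "baseline_compare"}


def effective_purpose(capture_purpose):
    """A NULL capture_purpose on a pre-existing row reads as 'backup'."""
    return capture_purpose or "backup"


def is_prunable(capture_purpose, captured_at, now, baseline_retention_days=90):
    """True only for baseline-purpose rows older than the retention window.

    baseline_retention_days=90 is a placeholder, not a value from the spec.
    Backup rows are never selected by this check.
    """
    if effective_purpose(capture_purpose) not in BASELINE_PURPOSES:
        return False
    return captured_at < now - timedelta(days=baseline_retention_days)
```

Routing every read through `effective_purpose` means rows that predate the backfill can never be mistaken for baseline evidence and pruned early.
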
### 4) OperationRun context: baseline capture/compare contract

Baseline runs should populate `operation_runs.context` with stable, operator-facing keys:

```json
{
  "target_scope": {
    "entra_tenant_id": "...",
    "entra_tenant_name": "...",
    "directory_context_id": "..."
  },
  "baseline_profile_id": 123,
  "baseline_snapshot_id": 456,
  "capture_mode": "full_content",
  "effective_scope": {
    "policy_types": ["..."],
    "foundation_types": ["..."],
    "all_types": ["..."]
  },
  "baseline_capture": {
    "subjects_total": 500,
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "gaps": {
      "count": 25,
      "top_reasons": ["forbidden", "throttled", "ambiguous_match"]
    },
    "resume_token": "opaque_token_string"
  },
  "baseline_compare": {
    "inventory_sync_run_id": 999,
    "since": "2026-03-03T09:00:00Z",
    "coverage": {
      "proof": true,
      "effective_types": ["..."],
      "covered_types": ["..."],
      "uncovered_types": ["..."]
    },
    "fidelity": "content|meta|mixed",
    "evidence_capture": {
      "requested": 200,
      "succeeded": 180,
      "skipped": 10,
      "failed": 10,
      "throttled": 0
    },
    "evidence_gaps": {
      "missing_current": 20,
      "ambiguous_match": 3
    },
    "reason_code": "no_subjects_in_scope|coverage_unproven|evidence_capture_incomplete|rollout_disabled|no_drift_detected|..."
  }
}
```

Notes:

- `target_scope` is required for Monitoring UI (“Target” display).
- Rich diagnostics remain in `context`; `summary_counts` stays within the numeric key whitelist.

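
The `evidence_capture` and `gaps` blocks in the contract above could be assembled from per-subject results roughly like this. The sketch is in Python and rests on assumptions: the helper names are hypothetical, `requested` is taken to equal the number of recorded outcomes (consistent with the sample numbers, but not stated by the spec), and `top_reasons` is assumed to be ordered by frequency.

```python
from collections import Counter

# Outcome keys as they appear in the evidence_capture block of the contract.
OUTCOMES = ("succeeded", "skipped", "failed", "throttled")


def evidence_capture_stats(outcomes):
    """Tally one outcome string per requested subject into the contract shape."""
    outcomes = list(outcomes)
    counts = Counter(outcomes)
    unknown = set(counts) - set(OUTCOMES)
    if unknown:
        raise ValueError(f"unknown outcomes: {sorted(unknown)}")
    stats = {"requested": len(outcomes)}
    stats.update({key: counts[key] for key in OUTCOMES})
    return stats


def summarize_gaps(reasons, top_n=3):
    """Build the gaps block: total count plus the most frequent reason codes."""
    counts = Counter(reasons)
    return {
        "count": sum(counts.values()),
        "top_reasons": [reason for reason, _ in counts.most_common(top_n)],
    }
```

Both helpers return plain dictionaries so the result can be placed under `baseline_capture` or `baseline_compare` in `context` without touching `summary_counts`.
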
## Migration strategy

1) Add `baseline_profiles.capture_mode`.
2) Add `baseline_snapshot_items.subject_key` + index.
3) Add `policy_versions.capture_purpose`, `operation_run_id`, `baseline_profile_id` + indexes.
4) Backfill strategy:
   - Existing `policy_versions` rows: set `capture_purpose = backup` (or treat null as backup in code until backfill finishes).
   - Existing baseline snapshot items: set `subject_key` from stored `meta_jsonb.display_name` when available (else empty; treated as gap in new logic).

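
The snapshot-item backfill in step 4 reduces to a per-row decision, sketched here in Python. `backfill_subject_key` is a hypothetical helper, and it assumes `meta_jsonb` has already been decoded into a dictionary (or is null); the normalization matches the rules given for `subject_key`.

```python
import re


def backfill_subject_key(meta):
    """Derive subject_key from meta_jsonb.display_name.

    Returns "" when no usable display name exists; the new compare logic
    treats an empty key as an evidence gap rather than guessing a match.
    """
    name = (meta or {}).get("display_name")
    if not isinstance(name, str):
        return ""
    # Same rules as subject_key: trim, collapse whitespace, lowercase.
    return re.sub(r"\s+", " ", name.strip()).lower()
```
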
## Validation rules

- `capture_mode` must be one of: `meta_only`, `opportunistic`, `full_content`.
- `subject_key` must be non-empty to be eligible for drift evaluation.
- For full-content capture mode:
  - Capture/compare runs must record evidence capture stats and gaps.
  - Compare must not emit “missing policy” findings for uncovered policy types.

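
The eligibility rules above, together with the duplicate-key rule from the snapshot item notes, could be applied during compare along these lines. This is a Python sketch with hypothetical function names; the `missing_subject_key` reason code is invented for illustration, while `ambiguous_match` is taken from the context contract.

```python
from collections import defaultdict


def partition_baseline_items(items):
    """Split snapshot items into drift-eligible subjects and evidence gaps.

    items: iterable of dicts with 'policy_type' and 'subject_key'.
    Returns (eligible, gaps): eligible maps (policy_type, subject_key) to the
    item; gaps is a list of (item, reason) pairs.
    """
    groups = defaultdict(list)
    for item in items:
        groups[(item["policy_type"], item.get("subject_key") or "")].append(item)

    eligible, gaps = {}, []
    for (policy_type, key), members in groups.items():
        if not key:
            # Empty key: not eligible (reason code is hypothetical).
            gaps.extend((member, "missing_subject_key") for member in members)
        elif len(members) > 1:
            # Duplicate key within one policy type: ambiguous, never evaluated.
            gaps.extend((member, "ambiguous_match") for member in members)
        else:
            eligible[(policy_type, key)] = members[0]
    return eligible, gaps


def missing_policy_subjects(eligible, current_keys, covered_types):
    """Baseline subjects absent from the tenant, limited to covered policy types."""
    return [
        key for key in eligible
        if key[0] in covered_types and key not in current_keys
    ]
```

Filtering on `covered_types` before reporting a subject as missing is what keeps an incomplete inventory sync from producing false “missing policy” findings.
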