TenantAtlas/specs/117-baseline-drift-engine/spec.md
ahmido f08924525d Spec 117: Baseline Drift Engine + evidence fidelity/provenance (#142)
Implements Spec 117 (Golden Master Baseline Drift Engine):

- Adds provider-chain resolver for current state hashes (content evidence via PolicyVersion, meta evidence via inventory)
- Updates baseline capture + compare jobs to use resolver and persist provenance + fidelity
- Adds evidence_fidelity column/index + Filament UI badge/filter/provenance display for findings
- Adds performance guard test + integration tests for drift, fidelity semantics, provenance, filter behavior
- UX fix: Policies list shows "Sync from Intune" header action only when records exist; empty-state CTA remains and is functional

Tests:
- `vendor/bin/sail artisan test --compact tests/Feature/Filament/PolicySyncCtaPlacementTest.php`
- `vendor/bin/sail artisan test --compact --filter=Baseline`

Checklist:
- specs/117-baseline-drift-engine/checklists/requirements.md ✓

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #142
2026-03-03 07:23:01 +00:00


# Feature Specification: Golden Master Baseline Drift — Deep Settings Drift via Provider Chain
**Feature Branch**: `117-baseline-drift-engine`
**Created**: 2026-03-02
**Status**: Draft (ready for implementation)
**Input**: User description: "Spec 117 — Golden Master Baseline Drift: Deep Settings-Drift via PolicyVersion Provider Chain"
## Spec Scope Fields *(mandatory)*
- **Scope**: workspace (baseline definition + capture) + tenant (baseline compare monitoring)
- **Primary Routes**:
- Workspace (admin): Baseline Profiles (capture baseline snapshot)
- Tenant-context (admin): Baseline Compare runs (compare now, run detail) and Drift Findings landing
- **Data Ownership**:
- Workspace-owned: baseline profiles, baseline snapshots, baseline snapshot items
- Tenant-scoped (within a workspace): compare runs and drift findings produced by compare
- Baseline snapshots are workspace-owned standards captured from a chosen tenant, but snapshot items MUST NOT persist tenant identifiers.
- **RBAC**: no new RBAC surfaces; uses existing baseline + findings capabilities
- Workspace Baselines:
- `workspace_baselines.view`
- `workspace_baselines.manage`
- Tenant Compare + Findings:
- `tenant.sync`
- `tenant_findings.view`
- Tenant access is required for tenant-context surfaces, in addition to workspace membership.
For canonical-view specs: not applicable (this is not a canonical-view feature).
## Clarifications
### Session 2026-03-02
- Q: Which baseline timestamp is the reference for the “since” rule? → A: Baseline snapshot captured time.
- Q: What should v1.5 do when neither content nor meta evidence exists for a subject? → A: Skip the subject and record it in run coverage/warnings (no drift finding for it).
- Q: If baseline and current have different fidelity, what fidelity should the finding badge/filter show? → A: Overall fidelity = the weaker of baseline/current.
- Q: Should a finding store/show provenance for both baseline and current evidence? → A: Yes, store/show both baseline and current evidence (each with fidelity, source, observed timestamp).
## Problem Statement
Golden Master baseline compare currently relies on a “meta-only” drift signal for many policy types. Changes that only affect policy settings (but do not materially change meta fields) frequently produce no drift finding, which makes the feature unreliable and undermines operator trust.
At the same time, the system already has a proven deep drift mechanism in other workflows, based on captured full policy content and a deterministic normalization + hashing pipeline.
This spec upgrades Golden Master baseline drift to use that same evidence layer whenever available, without introducing a second compare logic path.
## Goals (v1.5)
- Detect settings-level drift in Golden Master compares when suitable captured policy content exists.
- Keep compare read-only against existing evidence (no additional external data fetches during compare).
- Allow mixed fidelity (some types/content have deep evidence, others are meta-only), but make it transparent in findings and run detail.
- Guarantee “one engine”: compare does not contain per-type special-casing or duplicate hashing logic.
## Non-Goals (v1.5)
- No “always fetch full content” during compare.
- No new enrichment pipeline that captures full content as part of inventory.
- No unification of backup and Golden Master workflows; only the evidence/hash layer is shared.
## Architecture Decision
**ADR-117-01: Separate workflows, shared evidence layer**
- Backup/versioning and Golden Master baseline drift remain separate workflows (different triggers and scoping).
- Golden Master drift consumes a shared evidence layer that can provide the best available “current state” hash for a subject.
- Evidence is resolved via a provider chain in strict precedence order.
## Definitions
- **Subject key**: a policy identity independent of tenant, identified by policy type + external identifier.
- **Tenant subject**: a subject key within a tenant context, identified by tenant + policy type + external identifier.
- **Policy content version**: an immutable captured representation of a tenant subject's policy content, with an observation timestamp.
- **Fidelity**:
- **content**: drift signal derived from canonicalized policy content (deep / semantic)
- **meta**: drift signal derived from a stable “inventory meta contract” (structural / signal-based)
- **Provider chain**: an ordered resolver that returns the first available policy state evidence for a tenant subject.
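The provider chain defined above can be sketched as an ordered resolver whose first non-null answer wins. This is an illustrative, language-agnostic sketch (the actual implementation is in the Laravel codebase); the `Evidence` type and function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

@dataclass(frozen=True)
class Evidence:
    hash: str             # deterministic state hash
    fidelity: str         # "content" or "meta"
    source: str           # human-readable provenance indicator
    observed_at: datetime

# A provider takes (tenant_subject, since) and returns Evidence or None.
Provider = Callable[[str, Optional[datetime]], Optional[Evidence]]

def resolve_current_state(subject: str,
                          providers: list[Provider],
                          since: Optional[datetime] = None) -> Optional[Evidence]:
    """Walk the provider chain in strict precedence order; first non-null wins."""
    for provider in providers:
        evidence = provider(subject, since)
        if evidence is not None:
            return evidence
    return None  # caller records an evidence gap; no drift finding is produced
```

The `since` parameter carries the baseline snapshot captured time so the content provider can reject evidence older than the baseline.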
## Assumptions
- For v1.5, content-fidelity evidence becomes available opportunistically (e.g., because a backup or other capture workflow already ran).
- The existing normalization + hashing pipeline for content fidelity is the canonical source of truth for deep drift.
- Baseline capture and baseline compare already use observable run records; this spec extends run context and findings evidence details.
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Deep settings drift appears when content evidence exists (Priority: P1)
As an operator, I want Golden Master compares to detect settings-level drift when deep policy content evidence is available, so I can trust that “no drift” actually means the settings match.
**Why this priority**: Without settings drift, Golden Master is not reliable for configuration governance.
**Independent Test**: Can be tested by capturing a baseline snapshot, ensuring a policy content version exists for the same subject after the baseline timestamp, then running compare and asserting a drift finding is produced when settings differ.
**Acceptance Scenarios**:
1. **Given** a baseline snapshot item for a subject and a newer content evidence record for the tenant subject, **When** compare runs, **Then** the system uses content fidelity for current-state hashing and produces a “different version” finding when settings differ.
2. **Given** a baseline snapshot item and a newer content evidence record where settings match, **When** compare runs, **Then** the system produces no “different version” finding and records content fidelity in run coverage context.
---
### User Story 2 - Mixed fidelity is transparent and interpretable (Priority: P1)
As an operator, I want each drift finding to clearly communicate how strong the drift signal is (content vs meta) and what evidence source was used, so I can interpret and prioritize findings correctly.
**Why this priority**: Mixed fidelity is acceptable only when it is obvious to the operator.
**Independent Test**: Can be tested by running compare where some subjects have content evidence and others do not, then asserting that findings include fidelity and source and the UI can filter by fidelity.
**Acceptance Scenarios**:
1. **Given** a compare run with a mix of subjects where some have content evidence and others only meta evidence, **When** I view drift findings, **Then** each finding shows a fidelity badge and displays baseline + current evidence provenance (fidelity, source, observed timestamp), and I can filter findings by fidelity.
2. **Given** a compare run with mixed fidelity, **When** I open run detail, **Then** I see a coverage breakdown that distinguishes content-covered types vs meta-only types.
---
### User Story 3 - Baseline capture uses best available evidence (Priority: P2)
As a workspace admin, I want baseline capture to store the strongest available hash for each snapshot item at capture time (content if available, otherwise meta), so baseline comparisons become more reliable without extra steps.
**Why this priority**: Improves trust and reduces “meta-only baseline” limitations without introducing extra costs.
**Independent Test**: Can be tested by capturing a baseline when content evidence exists for some subjects and not others, then asserting that snapshot items store fidelity/source accordingly.
**Acceptance Scenarios**:
1. **Given** baseline capture where content evidence exists for some in-scope subjects, **When** capture completes, **Then** those snapshot items record fidelity=`content` and a content evidence source indicator.
2. **Given** baseline capture where no content evidence exists for a subject, **When** capture completes, **Then** the snapshot item falls back to meta fidelity and records the meta source indicator.
### Edge Cases
- Content evidence exists for a tenant subject but is older than the baseline snapshot timestamp: it must not be used for “current state” when it would produce a temporally incorrect compare.
- Content evidence exists but cannot be normalized deterministically (unexpected shape): compare must fall back to meta fidelity and record a warning/evidence note.
- Coverage for a policy type is unproven for the compare run: findings for that type remain suppressed as per baseline drift coverage rules, regardless of fidelity.
- No evidence exists for a subject from any provider: the subject is skipped, and the run records an evidence-gap warning/coverage entry (no drift finding for that subject).
- Large scopes: evidence resolution is performed in batches (avoid per-subject lookups) so compare runtime scales predictably.
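The large-scope edge case above implies set-based resolution: one bulk lookup per provider, with only unresolved subjects falling through to the next provider. A minimal sketch, assuming hypothetical batch-provider callables (the real lookups would be Eloquent queries):

```python
def resolve_batch(subjects, batch_providers):
    """batch_providers: ordered callables mapping a set of subject keys to a
    dict {subject: evidence} via a single set-based lookup each."""
    resolved = {}
    pending = set(subjects)
    for lookup in batch_providers:       # strict precedence order
        if not pending:
            break
        found = lookup(pending)          # one bulk query for all pending subjects
        resolved.update(found)
        pending -= set(found)            # only unresolved subjects fall through
    # subjects still pending are evidence gaps: skipped and recorded as warnings
    return resolved, pending
```

Query count stays bounded by the number of providers, not the number of subjects.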
## Requirements *(mandatory)*
This feature reuses long-running baseline capture/compare runs and extends what evidence is used for hashing.
### Constitution alignment (required)
- v1.5 compare does not initiate any new external data fetches; it only consumes existing stored evidence.
- Findings and run context become more audit-friendly by persisting fidelity + source metadata.
### Operational UX Contract (Ops-UX)
- Capture and compare runs continue to use observable run identity and outcomes.
- Any new summary counters remain numeric-only and use existing canonical keys (additional detail stays in run context).
- Mixed fidelity coverage is recorded in run context for operator interpretation.
### Authorization Contract (RBAC-UX)
- No new authorization planes are introduced.
- Existing semantics apply:
- Non-member / not entitled to workspace or tenant scope: deny-as-not-found behavior
- Member but missing capability: forbidden behavior
- UI visibility does not replace server-side enforcement.
### Functional Requirements
#### v1.5 — Opportunistic deep drift (no new upstream calls during compare)
- **FR-117v15-01 Current hash resolution layer**: The system MUST provide a single resolution layer that can return a deterministic “current state” hash for a tenant subject, including:
- the deterministic hash
- the fidelity level (content or meta)
- a human-readable source/provenance indicator
- the timestamp of when the evidence was observed
- If no evidence exists for a tenant subject, the resolver returns `null`.
- **FR-117v15-01b Null evidence handling**: When the resolver returns null for a subject, compare and capture MUST:
- skip drift evaluation for that subject
- record the subject as an evidence gap in the run's coverage/warnings context
- not create a drift finding for that subject
- **FR-117v15-02 Provider chain precedence**: The system MUST resolve current-state hashes using an ordered provider chain:
1) content evidence provider (latest content version since a reference time)
2) meta evidence provider (inventory meta contract)
First non-null wins.
- **FR-117v15-03 Content evidence since-rule**: For compares, the resolver MUST treat the baseline snapshot timestamp as the reference time (`since`) to avoid using content evidence older than the baseline.
- **FR-117v15-03a Timestamp definition**: “Baseline snapshot timestamp” refers to the baseline snapshot captured time.
- **FR-117v15-04 Baseline compare integration (no legacy hashing)**: Baseline compare MUST NOT perform direct meta hashing logic. It MUST:
- read baseline snapshot item hashes + fidelity/source
- resolve current-state hashes exclusively via the provider chain
- compute drift by comparing baseline hash vs current hash
- attach evidence fields to findings (at minimum: fidelity, source, and observed timestamp)
- **FR-117v15-04a Finding fidelity semantics (mixed evidence)**: When a drift finding is emitted, the finding MUST expose an overall fidelity value suitable for badges and filtering.
- If either baseline evidence or current evidence is meta fidelity, the overall fidelity MUST be meta.
- Only when both sides are content fidelity may the overall fidelity be content.
- **FR-117v15-04b Finding provenance (both sides)**: When a drift finding is emitted, it MUST record provenance for both sides:
- baseline evidence: fidelity, source, and observed timestamp (as recorded on the baseline snapshot item)
- current evidence: fidelity, source, and observed timestamp (as resolved for the tenant subject)
- **FR-117v15-05 Baseline capture opportunistic fidelity**: Baseline capture MUST attempt to store snapshot item hashes via the same provider chain:
- If content evidence exists at capture time, store content fidelity hash for the baseline item.
- Otherwise store meta fidelity hash.
- Snapshot items MUST persist the fidelity and source as audit properties.
- **FR-117v15-06 Coverage breakdown (content vs meta)**: Compare run context MUST include a coverage breakdown that distinguishes:
- policy types with content evidence coverage
- policy types that are meta-only
- policy types uncovered (as per existing coverage guard rules)
- **FR-117v15-07 UX: fidelity transparency**: Drift findings UI MUST:
- show a fidelity badge per finding (content = high confidence, meta = structural only)
- allow filtering findings by fidelity
- show baseline + current evidence provenance per finding (fidelity, source, observed timestamp for each side)
- show run detail coverage breakdown (content vs meta)
- **FR-117v15-07a Fidelity filter values**: The findings fidelity filter MUST support exactly two values in v1.5: content and meta (no separate “mixed” value).
- **FR-117v15-08 Performance guard**: Evidence resolution MUST be batch-oriented to avoid per-subject query behavior as scope size increases.
- **FR-117v15-09 Evidence source tracking on snapshot items**: Baseline snapshot items MUST persist a source indicator (string) for the hash used, and MUST default it consistently for meta-only items (e.g., an explicit versioned meta source).
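The weaker-of fidelity rule (FR-117v15-04a) and two-sided provenance (FR-117v15-04b) can be sketched as follows; field names are hypothetical and the persisted finding shape may differ:

```python
def overall_fidelity(baseline_fidelity: str, current_fidelity: str) -> str:
    """Overall fidelity is the weaker side: meta unless both sides are content."""
    if baseline_fidelity == "content" and current_fidelity == "content":
        return "content"
    return "meta"

def finding_evidence(baseline: dict, current: dict) -> dict:
    """Evidence payload persisted on a drift finding: provenance for both sides."""
    return {
        "evidence_fidelity": overall_fidelity(baseline["fidelity"], current["fidelity"]),
        "baseline_evidence": baseline,   # fidelity, source, observed_at
        "current_evidence": current,     # fidelity, source, observed_at
    }
```

The single `evidence_fidelity` value drives the badge and the two-value filter; the per-side dictionaries back the provenance display.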
#### v2.0 (optional) — Full content capture mode for Golden Master
- **FR-117v2-01 Capture mode**: Baseline profiles MUST support a capture mode concept with at least:
- meta-only
- opportunistic (v1.5 default)
- full content (opt-in)
- **FR-117v2-02 Targeted content capture**: When full content is enabled, the system MUST be able to capture content evidence for in-scope tenant subjects missing “fresh enough” content evidence.
- **FR-117v2-03 Quota/budget safety**: Full content capture MUST be resumable and quota-aware, and MUST record completion/skips/throttling indicators in run context.
- **FR-117v2-04 Provider chain extension point**: The provider chain MAY gain additional providers between content and meta as evidence sources evolve, without changing compare semantics.
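The v2.0 capture-mode decision (FR-117v2-01/02) reduces to a freshness check: under full-content mode, subjects with missing or stale content evidence get a targeted capture. A hypothetical sketch; the mode names mirror the spec but the threshold parameter is an assumption:

```python
from datetime import datetime, timedelta
from typing import Optional

CAPTURE_MODES = ("meta_only", "opportunistic", "full_content")

def needs_content_capture(mode: str,
                          observed_at: Optional[datetime],
                          now: datetime,
                          max_age: timedelta) -> bool:
    """True when full-content mode requires a targeted content capture."""
    if mode != "full_content":
        return False                      # v1.5 modes never fetch upstream
    if observed_at is None:
        return True                       # no content evidence at all
    return (now - observed_at) > max_age  # evidence is not "fresh enough"
```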
## UI Action Matrix *(mandatory when Filament is changed)*
This spec changes how findings/run detail surfaces present evidence (badges/filters/coverage breakdown). No new destructive actions are introduced.
| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Drift Findings landing | admin tenant-context | none (existing) | open finding / run detail (existing) | none (existing) | none | existing CTA | none | n/a | no new | Add fidelity badge + fidelity filter; no new mutations |
| Compare run detail | admin tenant-context | none (existing) | open findings (existing) | none (existing) | none | n/a | none | n/a | no new | Add coverage breakdown: content vs meta vs uncovered |
| Baseline capture surfaces | admin workspace | existing capture start | open snapshot detail (existing) | none | none | existing CTA | none | existing | yes (existing) | Capture stores fidelity/source per snapshot item |
### Key Entities *(include if feature involves data)*
- **Current State Hash Resolver**: Ordered resolver that returns the best available deterministic hash for a tenant subject, plus fidelity/source/observed_at.
- **Content Evidence Record**: Immutable captured policy content version with timestamps and subject identity.
- **Meta Evidence Record**: Stable “inventory meta contract” payload used for structural hashing.
- **Baseline Snapshot Item**: Workspace-owned baseline item for a subject key, storing baseline hash + fidelity/source audit fields.
- **Drift Finding Evidence**: Finding fields that capture the evidence fidelity and provenance used to compute the drift signal.
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-117-01 Settings drift detection**: When content evidence exists for a changed subject, a compare run produces a drift finding for settings-only changes with a success rate of at least 95% in controlled tests.
- **Controlled tests definition (CI)**: a deterministic fixture matrix executed via Pest that includes at least 20 settings-only change cases (across at least 3 policy types) where content evidence exists and is newer than the baseline captured time.
- **Pass criteria (CI)**: at least 19/20 cases produce the expected drift outcome (finding emitted when different; no finding when equal).
- **SC-117-02 Transparency**: 100% of “different version” findings display baseline + current evidence provenance (fidelity, source, observed timestamp for each side), and operators can filter findings by fidelity.
- **SC-117-03 Compare cost containment (v1.5)**: Compare runs complete without initiating any new upstream data fetches.
- **SC-117-04 Performance**: For a baseline scope of 500 subjects, evidence resolution and compare complete within an agreed operational time budget (target: ≤ 2 minutes in a typical staging environment).
- **Guardrail (CI)**: resolver evidence lookups are batch-oriented and remain set-based (no per-subject query loops). A performance guard test enforces an upper bound on query count for resolving a representative batch.
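The SC-117-04 guardrail pattern (upper bound on query count, independent of batch size) can be sketched like this. The real guard is a Pest test in the Laravel app; everything here is an illustrative stand-in:

```python
def guard_query_count(subjects, batch_providers, max_queries):
    """Resolve a batch while counting provider invocations, then assert the
    count stays within a fixed bound regardless of how many subjects exist."""
    calls = {"n": 0}

    def counted(lookup):
        def wrapper(pending):
            calls["n"] += 1              # one "query" per provider invocation
            return lookup(pending)
        return wrapper

    resolved = {}
    pending = set(subjects)
    for lookup in [counted(p) for p in batch_providers]:
        if not pending:
            break
        found = lookup(pending)
        resolved.update(found)
        pending -= set(found)
    assert calls["n"] <= max_queries, f"resolver issued {calls['n']} queries"
    return resolved
```

Because lookups are set-based, the bound is a function of provider-chain length, not scope size: 500 subjects should still resolve in a handful of queries.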