# Feature Specification: Golden Master Deep Drift v2 (Full Content Capture)

**Feature Branch**: `118-baseline-drift-engine`  
**Created**: 2026-03-03  
**Status**: Draft (implementable)  
**Input**: User description: "Spec 118 — Golden Master Deep Drift v2: Full Content Capture (policy snapshot-backed), quota-aware, resumable, no-legacy"

## Spec Scope Fields *(mandatory)*

- **Scope**: workspace (baseline definition + baseline snapshots) + tenant (compare runs + findings + evidence capture)
- **Primary Routes**:
  - Workspace admin: Baseline Profiles (list, create/edit, detail) + Baseline Snapshots (list/detail)
  - Tenant-context admin: Baseline Compare runs (start, list, detail) + Drift Findings landing
- **Data Ownership**:
  - Workspace-owned: baseline profiles, baseline snapshots, baseline snapshot items
  - Tenant-scoped (within a workspace): operation runs for baseline capture/compare, drift findings, and tenant policy evidence captured for baseline purposes
  - Baseline snapshots are workspace-owned standards captured from a chosen tenant and are comparable against other tenants in the same workspace, but snapshot items MUST NOT persist tenant identifiers (including tenant IDs, tenant external IDs, policy version IDs, and operation run IDs).
- **RBAC**:
  - Workspace Baselines:
    - `workspace_baselines.view`: view baseline profiles + snapshots
    - `workspace_baselines.manage`: create/edit baseline profiles, start baseline capture
  - Tenant Compare + Findings:
    - `tenant.sync`: start baseline compare runs (and any compare-time evidence refresh)
    - `tenant_findings.view`: view drift findings
  - Tenant access is required for tenant-context surfaces, in addition to workspace membership
  - Evidence created for baseline purposes MUST NOT be broadly discoverable outside baseline-related permissions.
  - **Baseline-purpose evidence visibility**: Tenant-owned evidence snapshots / policy versions captured for baseline purposes (e.g. `capture_purpose=baseline_capture|baseline_compare`) MUST be visible only to tenant members with `tenant.sync` or `tenant_findings.view` (never via `tenant.view` alone).

For canonical-view specs: not applicable (this is not a canonical-view feature).

## Clarifications

### Session 2026-03-03

- Q: Are baseline snapshots reusable across multiple tenants in the workspace? → A: Yes — baseline snapshots are reusable across multiple tenants in the same workspace (cross-tenant compare is in-scope).
- Q: How should cross-tenant subject matching work? → A: Match by `policy_type + normalized display_name`.
- Q: What should compare do when cross-tenant matching is missing/ambiguous? → A: Record an evidence gap reason and suppress drift evaluation for those subjects.
- Q: What are the exact rules for `normalized display_name`? → A: `trim` + collapse internal whitespace to single spaces + lowercase.

## Problem Statement

Golden Master baseline compare frequently produces “no drift” even when policy settings changed, because the current state used for comparison is often limited to a metadata-level signal, while the real configuration is only visible in full policy content.

This spec makes Golden Master self-sufficient for deep drift: when a baseline profile is configured for full-content capture, baseline capture and baseline compare automatically generate the required evidence on demand and compare stable content-based fingerprints.

## Goals

- Deep drift by default for baselines configured for full-content capture.
- One compare engine: no parallel legacy compare / fingerprinting / canonicalization logic paths.
- Quota-aware and resumable evidence capture that remains safe under throttling and transient upstream errors.
- Auditability: each run clearly documents scope, coverage, fidelity, evidence capture stats, and any evidence gaps.
- Operator UX: an admin can “Capture baseline (full content)” and “Compare now (full content)” without per-policy manual capture.

## Non-Goals

- No export/PDF/report packaging pipeline.
- No SIEM replacement or ingestion of external audit streams.
- No requirement to introduce a separate “evidence item” reporting model; this spec remains compatible with a future evidence/reporting layer.

## Definitions

- **Subject**: a single compare object identified for cross-tenant comparison by `policy_type + subject_key` (tenant context is provided by the run, not persisted in the workspace-owned snapshot item).
- **Normalized display name**: derived from display name by trimming leading/trailing whitespace, collapsing internal whitespace to single spaces, and converting to lowercase.
- **Subject key (`subject_key`)**: the stored, cross-tenant match key for a subject, equal to the normalized display name.
- **Baseline snapshot**: a workspace-owned captured snapshot of subjects within a baseline scope.
- **Evidence snapshot**: an immutable record of full policy content captured from the tenant for a subject, used to produce a stable, comparable fingerprint.
- **Fidelity**:
  - **content**: drift signal derived from canonicalized full policy content (including assignments and scope tags when applicable)
  - **meta**: drift signal derived from a stable metadata contract (explicitly marked degraded)
- **Coverage proof**: proof that the tenant current-state index is complete enough to safely determine missing-policy outcomes for the scope.
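
The normalization rule behind `subject_key` can be sketched as follows. This is a minimal illustration, not the implementation: the function names are hypothetical, and Unicode normalization (which this spec does not address) is deliberately left out.

```python
import re
from typing import Optional

# Any run of whitespace (spaces, tabs, newlines) collapses to one space.
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_display_name(display_name: str) -> str:
    """Trim, collapse internal whitespace to single spaces, then lowercase."""
    return _WHITESPACE_RUN.sub(" ", display_name.strip()).lower()


def subject_key(display_name: Optional[str]) -> Optional[str]:
    """Return the cross-tenant match key, or None when there is no usable name.

    Treating an empty or whitespace-only name as "missing" is an assumption
    made for this sketch.
    """
    if display_name is None:
        return None
    return normalize_display_name(display_name) or None
```

Two tenants whose policies are named `"Windows  Baseline"` and `" windows baseline "` would therefore share the same `subject_key`.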

## Assumptions

- Baseline drift already records observable run records for capture and compare.
- The system already has a canonical process for turning full policy content into a stable fingerprint for comparison, used by other workflows.
- Some subjects may not be capturable (permissions, unsupported endpoints, temporary upstream issues); these produce warnings and explicit gaps rather than silent success.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Capture a full-content baseline without per-policy steps (Priority: P1)

As a workspace admin, I want to capture a baseline snapshot with full-content fidelity across the entire configured scope, so that Golden Master comparisons detect settings drift reliably without manually capturing each policy.

**Why this priority**: This is the primary value proposition for deep drift: baseline capture must produce strong evidence automatically.

**Independent Test**: Can be tested by creating a baseline profile configured for full-content capture, running “Capture baseline”, and validating that a baseline snapshot is created with content fidelity for all capturable subjects and explicit gaps for any non-capturable subjects.

**Acceptance Scenarios**:

1. **Given** a baseline profile configured for full-content capture and a tenant with in-scope subjects, **When** I run “Capture baseline (full content)”, **Then** the system captures evidence snapshots on demand for subjects missing suitable evidence and produces a baseline snapshot with per-subject fingerprints and fidelity.
2. **Given** a capture run where some subjects cannot be captured due to throttling or access limitations, **When** the run completes, **Then** the run outcome is “completed with warnings” and the UI shows an evidence gap summary (counts + reasons) rather than presenting a misleading “fully captured” state.

---

### User Story 2 - Compare now with full content and get explainable drift (Priority: P1)

As an operator, I want to run “Compare now (full content)” and see reliable drift findings (missing/unexpected/different), with clear context about coverage and evidence fidelity, so that I can act on findings with confidence.

**Why this priority**: The feature is only trustworthy when “no drift” is explainable and “drift” is based on strong evidence.

**Independent Test**: Can be tested by capturing a full-content baseline, simulating a settings-only change for a subject, running “Compare now (full content)”, and asserting that a drift finding is produced with content fidelity provenance.

**Acceptance Scenarios**:

1. **Given** a full-content baseline snapshot and current evidence refreshed as part of compare, **When** a subject’s settings differ between baseline and current, **Then** compare emits a “different version” finding with content fidelity and stores evidence provenance for both baseline and current.
2. **Given** a compare run where coverage proof is missing for some policy types, **When** compare runs, **Then** the system suppresses “missing policy” outcomes for uncovered types and records a coverage warning and explanation in the run detail.

---

### User Story 3 - Throttling-safe, resumable evidence capture (Priority: P1)

As an operator, I want evidence capture to respect rate limits and safely resume from where it left off, so that deep drift can be executed in large scopes without manual babysitting.

**Why this priority**: Full-content capture is only viable if it behaves predictably under real-world quotas.

**Independent Test**: Can be tested by simulating rate limiting for part of the scope, verifying that the run completes with warnings and a resume token, then resuming and eventually completing without duplicating evidence work.

**Acceptance Scenarios**:

1. **Given** a capture/compare run that hits rate limiting before completing the scope, **When** the run ends, **Then** it records an opaque resume token and a deterministic gap list, and it can be resumed via a single UI action.
2. **Given** a resumed capture, **When** it continues from the resume token, **Then** it does not re-capture subjects already captured in the prior run for the same purpose.

---

### User Story 4 - “Why no findings?” is always clear (Priority: P2)

As an operator, I want the compare run detail to explain “why no findings” (e.g., no subjects, coverage unproven, evidence capture incomplete), so that zero findings never looks like a silent failure.

**Why this priority**: Operator trust depends on eliminating ambiguous “0 findings” states.

**Independent Test**: Can be tested by running compare in a scenario that processes zero subjects (or suppresses findings due to coverage), and verifying that the UI shows a clear explanation sourced from run context.

**Acceptance Scenarios**:

1. **Given** a compare run where the resolved subject list is empty, **When** the run completes, **Then** the run context contains a reason code explaining why and the UI displays it.
2. **Given** a compare run that produces zero findings but processed subjects, **When** it completes, **Then** it still records a reason code such as “no drift detected” and provides evidence/fidelity context.

### Edge Cases

- Scope resolves to zero subjects: compare and capture complete with warnings and an explicit reason code; no silent success.
- Some subjects are forbidden/unsupported: they are recorded as evidence gaps with reasons; drift evaluation is degraded or skipped per rules.
- Evidence is available but cannot be normalized deterministically: the run degrades fidelity for that subject and records the gap reason.
- Compare is retried after a transient failure: findings are not duplicated; lifecycle increments happen at most once per run identity.
- Mixed evidence (some content, some meta): the run clearly reports the fidelity breakdown; each finding displays the weaker of its baseline and current fidelities for badge/filter semantics.
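
The weaker-of-two rule for mixed evidence can be illustrated with a small sketch. It assumes the rule applies to the baseline and current fidelity of a single finding; the rank table and function name are hypothetical.

```python
# Lower rank means weaker evidence; "meta" is the explicitly degraded fidelity.
FIDELITY_RANK = {"meta": 0, "content": 1}


def finding_fidelity(baseline_fidelity: str, current_fidelity: str) -> str:
    """Return the weaker of the two fidelities for badge/filter purposes."""
    return min(baseline_fidelity, current_fidelity, key=FIDELITY_RANK.__getitem__)
```

A finding whose baseline side is `content` but whose current side fell back to `meta` would be badged `meta`.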

## Requirements *(mandatory)*

### Constitution alignment (required)

- This feature performs outbound reads to capture full policy content as evidence, and it does so via observable long-running runs.
- The feature MUST use a single canonical method to produce content-fidelity fingerprints, shared with other workflows.
- “No legacy” is enforced: capture/compare orchestration does not implement per-policy fingerprinting logic and does not call legacy meta drift helpers.

### Operational UX Contract (Ops-UX)

- Baseline capture and baseline compare MUST run as observable operations with a run identity, start/stop times, outcome, and a user-facing progress surface.
- Run lifecycle transitions are service-owned.
- “Completed with warnings” MUST be used when evidence capture or coverage proof is incomplete.
- Compare runs MUST NOT produce “0 findings” without an explicit explanation. The run context MUST include a reason code when:
  - the resolved subject total is 0, or
  - the processed subject count is 0, or
  - findings are suppressed due to coverage/evidence rules, or
  - subjects were processed but no findings were produced (e.g., “no drift detected”).
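
One way to derive the run-context reason code is sketched below. The reason code strings and parameter names are hypothetical; only the conditions come from this contract and from User Story 4.

```python
from typing import Optional


def why_no_findings(
    resolved_total: int,
    processed_count: int,
    findings_count: int,
    suppressed_count: int,
) -> Optional[str]:
    """Pick a reason code for the run context, or None when none is required."""
    if resolved_total == 0:
        return "no_subjects_in_scope"
    if processed_count == 0:
        return "no_subjects_processed"
    if suppressed_count > 0:
        # Suppression is reported even if other findings were emitted.
        return "findings_suppressed"
    if findings_count == 0:
        return "no_drift_detected"
    return None
```

The checks are ordered from the earliest point of failure to the latest, so the UI shows the most fundamental explanation first.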

### Authorization Contract (RBAC-UX)

- Authorization planes:
  - Workspace admin surfaces for baseline profiles/snapshots
  - Tenant-context admin surfaces for compare runs and findings
- 404 vs 403 semantics:
  - Non-member / not entitled to workspace scope OR tenant scope → 404 (deny-as-not-found)
  - Member but missing capability → 403
- Starting runs (“Capture baseline”, “Compare now”, “Resume capture”) is a mutation, and authorization for it MUST be enforced server-side.
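
The 404 vs 403 semantics can be sketched as a single server-side decision. The function and parameter names are hypothetical; `in_scope` stands for workspace membership and, on tenant-context surfaces, tenant access as well.

```python
from typing import Iterable


def authorize(in_scope: bool, capabilities: Iterable[str], required: str) -> int:
    """Return the HTTP status a server-side check would produce."""
    if not in_scope:
        # Deny-as-not-found: do not reveal that the resource exists.
        return 404
    if required not in set(capabilities):
        # Member of the scope, but missing the capability.
        return 403
    return 200
```

For example, a tenant member holding only `tenant_findings.view` who tries to start a compare run (which requires `tenant.sync`) would receive 403, while a non-member would receive 404 regardless of capabilities.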

### Functional Requirements

#### Configuration & Modes

- **FR-118-01 Capture mode**: Baseline profiles MUST support a capture mode with at least: meta-only, opportunistic, and full-content.
- **FR-118-02 Deep drift by default**: For baseline profiles with capture mode = full-content, baseline capture and compare MUST prioritize content fidelity for all capturable subjects.
- **FR-118-03 One engine, no legacy**: There MUST be exactly one compare engine and one canonical fingerprinting method. No parallel “legacy compare” or “legacy fingerprint” implementations may exist.

#### Subject Scope & Coverage

- **FR-118-04 Effective scope resolution**: Each run MUST resolve and persist an effective scope for the baseline profile (including total subject count).
- **FR-118-04a Cross-tenant matching**: When comparing a workspace-owned baseline snapshot to a tenant’s current state, subject matching MUST use `policy_type + subject_key` where `subject_key` is the normalized display name, and workspace-owned snapshot items MUST NOT persist tenant identifiers.
- **FR-118-04a1 Normalization rules**: The definition of `subject_key` (normalized display name) MUST be consistent across baseline capture and compare: trim leading/trailing whitespace, collapse internal whitespace to single spaces, and lowercase.
- **FR-118-04b Ambiguous/missing match handling**: If cross-tenant matching is missing or ambiguous for a subject (e.g., missing display name, multiple candidates for the same normalized name within a policy type), compare MUST record an evidence gap reason and MUST suppress drift evaluation for that subject.
- **FR-118-05 Coverage proof guard**: Compare MUST only emit “missing policy” outcomes when coverage proof exists for the policy type. If coverage proof is missing/unproven, missing-policy outcomes for that type MUST be suppressed and a warning MUST be recorded.
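
Cross-tenant matching with ambiguous/missing handling (FR-118-04a, FR-118-04b) can be sketched as an index build over the tenant's current subjects. The record shape, gap reason strings, and function names are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Optional


def subject_key(display_name: Optional[str]) -> Optional[str]:
    """Trim, collapse whitespace, lowercase; None when no usable name exists."""
    if display_name is None:
        return None
    return re.sub(r"\s+", " ", display_name.strip()).lower() or None


def build_match_index(current_subjects):
    """Index subjects by (policy_type, subject_key); report unmatched ones as gaps.

    Each subject is a dict with "policy_type" and "display_name" (assumed shape).
    Returns (index, gaps), where gaps is a list of
    (policy_type, subject_key, reason) tuples.
    """
    candidates = defaultdict(list)
    gaps = []
    for subject in current_subjects:
        key = subject_key(subject.get("display_name"))
        if key is None:
            gaps.append((subject["policy_type"], None, "missing_display_name"))
            continue
        candidates[(subject["policy_type"], key)].append(subject)

    index = {}
    for match_key, group in candidates.items():
        if len(group) > 1:
            # Several candidates share one normalized name: suppress evaluation.
            gaps.append((match_key[0], match_key[1], "ambiguous_match"))
        else:
            index[match_key] = group[0]
    return index, gaps
```

Subjects that land in `gaps` are left out of `index`, so drift evaluation never sees them.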

#### Baseline Capture (full-content)

- **FR-118-CAP-01 Preflight**: Capture MUST resolve the subject list for the effective scope and record the total subject count in run context.
- **FR-118-CAP-02 Evidence capture on demand**: For full-content capture, the system MUST capture any missing or stale evidence snapshots for in-scope subjects, up to a configurable per-run budget.
- **FR-118-CAP-03 Idempotency within run**: Within a single run, the same subject MUST NOT be captured more than once for the same capture purpose.
- **FR-118-CAP-04 Snapshot build**: Baseline snapshots MUST store a per-subject stable fingerprint plus:
  - fingerprint fidelity (`content` vs `meta`)
  - fingerprint source/provenance indicator
  - observed timestamp
- **FR-118-CAP-05 Incomplete capture semantics**: If full-content capture is incomplete, the run MUST complete with warnings, the snapshot MAY still be created, and any subjects that fell back to meta fidelity (or were skipped) MUST be recorded as gaps.

#### Baseline Compare (full-content)

- **FR-118-CMP-01 Current evidence refresh**: For full-content compares, compare MUST refresh current evidence for in-scope subjects before drift evaluation, within a configurable budget.
- **FR-118-CMP-02 Best-available state resolution**: Current state resolution MUST always prefer full-content evidence when available and fall back to explicitly degraded metadata evidence only when necessary. Compare orchestration MUST NOT implement fingerprinting itself.
- **FR-118-CMP-03 Drift rules**: For each subject:
  - baseline-only → missing policy (only when coverage proof exists for the type)
  - current-only → unexpected policy
  - both present and fingerprints differ → different version
- **FR-118-CMP-04 Stable finding identity + lifecycle**: Findings MUST have a stable recurrence identity independent of fingerprints, and MUST maintain lifecycle fields (first seen, last seen, times seen). Retries MUST NOT duplicate findings.
- **FR-118-CMP-05 Explainability**: Compare run context MUST include:
  - scope totals and processed counts
  - coverage proof status
  - fidelity breakdown (content vs meta)
  - evidence capture stats (requested/succeeded/skipped/failed/throttled)
  - evidence gaps (counts + top reasons, including missing/ambiguous cross-tenant match)
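
The drift rules in FR-118-CMP-03, combined with the coverage proof guard in FR-118-05, can be sketched as a set comparison over fingerprints. The outcome strings and data shapes are hypothetical; fingerprints are assumed to be produced elsewhere by the canonical method.

```python
def classify_drift(baseline, current, covered_types):
    """Compare two fingerprint maps keyed by (policy_type, subject_key).

    covered_types is the set of policy types with coverage proof.
    Returns (findings, suppressed) as lists of (key, outcome) tuples.
    """
    findings, suppressed = [], []
    for key in sorted(baseline.keys() | current.keys()):
        policy_type = key[0]
        if key not in current:
            if policy_type in covered_types:
                findings.append((key, "missing_policy"))
            else:
                # Without coverage proof, absence cannot be trusted.
                suppressed.append((key, "coverage_unproven"))
        elif key not in baseline:
            findings.append((key, "unexpected_policy"))
        elif baseline[key] != current[key]:
            findings.append((key, "different_version"))
    return findings, suppressed
```

Iterating over sorted keys keeps the output order deterministic across retries.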

#### Quota, Throttling, Resume

- **FR-118-Q-01 Budget controls**: Evidence capture MUST be bounded by configurable limits (concurrency, items-per-run, retry limits) with safe defaults; default values MUST be explicitly defined in configuration and documented in the implementation plan.
- **FR-118-Q-02 Throttling behavior**: When rate limiting or temporary upstream errors occur, capture MUST back off and retry within limits, and then record throttling as a gap reason if it cannot complete.
- **FR-118-Q-03 Resumable token**: When a run cannot complete the scope within its budget, the run context MUST include an opaque resume token and enough information to resume deterministically.
- **FR-118-Q-04 Partial failure**: Individual subjects may fail without failing the entire run, but the run MUST complete with warnings and MUST deterministically report gaps.
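
The budget, backoff, and resume requirements above can be sketched together. Everything named here is hypothetical: the `Throttled` exception, the gap reason strings, and the token format (base64-encoded JSON of the subjects still to capture) are stand-ins chosen for illustration. Concurrency limits and non-throttling failures are omitted.

```python
import base64
import json
import time


class Throttled(Exception):
    """Raised by capture_one when the upstream signals rate limiting."""


def encode_resume_token(pending):
    """Pack the sorted pending subjects into an opaque string."""
    return base64.urlsafe_b64encode(json.dumps(sorted(pending)).encode()).decode()


def decode_resume_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode()))


def run_capture(subjects, capture_one, *, max_items, max_retries,
                base_delay=0.5, sleep=time.sleep):
    """Capture up to max_items subjects, retrying throttled ones with backoff."""
    captured, gaps = [], {}
    queue = sorted(subjects)  # deterministic processing order
    budgeted, over_budget = queue[:max_items], queue[max_items:]
    for subject in budgeted:
        for attempt in range(max_retries + 1):
            try:
                capture_one(subject)
            except Throttled:
                if attempt < max_retries:
                    sleep(base_delay * 2 ** attempt)  # exponential backoff
                    continue
                gaps[subject] = "throttled"
            else:
                captured.append(subject)
            break
    for subject in over_budget:
        gaps[subject] = "budget_exhausted"
    token = encode_resume_token(gaps) if gaps else None
    return {"captured": captured, "gaps": gaps, "resume_token": token}
```

A resumed run would pass `decode_resume_token(token)` as its `subjects`, so subjects captured in the prior run are never requested again.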

#### Auditability & Retention

- **FR-118-AUD-01 Run auditing**: Each run MUST record scope, coverage, fidelity breakdown, evidence capture stats, and evidence gaps.
- **FR-118-AUD-02 Evidence purpose tagging**: Evidence snapshots captured for baseline purposes MUST be attributable to the initiating run and baseline profile for audit/debugging.
- **FR-118-AUD-03 Retention policy**: Evidence captured for baseline purposes MUST have a configurable retention distinct from long-term backup evidence.

#### Security & Compliance

- **FR-118-SEC-01 Redaction before persistence**: The system MUST remove secrets/PII from captured policy content before it is stored or used to produce fingerprints.
- **FR-118-SEC-02 Least privilege access**: Evidence captured for baseline purposes MUST be access-controlled and not broadly visible outside baseline-related permissions.
- **FR-118-SEC-03 Audit events**: Starting baseline evidence capture and compare runs MUST write audit events that include purpose, scope counts, and gap/warning summaries.
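
The ordering required by FR-118-SEC-01 (redact first, then fingerprint) can be sketched as below. This is a stand-in, not the existing canonical fingerprinting process, which this spec does not describe. The sensitive key names, the placeholder value, and the use of sorted-key JSON with SHA-256 are all assumptions.

```python
import hashlib
import json

# Hypothetical list of keys whose values must never be stored or hashed.
SENSITIVE_KEYS = frozenset({"password", "clientSecret", "token"})


def redact(value):
    """Recursively replace sensitive values with a fixed placeholder."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def content_fingerprint(policy_content) -> str:
    """Hash a canonical JSON form of the redacted content."""
    canonical = json.dumps(
        redact(policy_content),
        sort_keys=True,          # key order must not affect the fingerprint
        separators=(",", ":"),   # no incidental whitespace
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

One consequence of redacting before hashing is that a change to a secret value alone does not register as drift.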

#### UX Requirements

- **FR-118-UX-01 Single-action buttons**: Baseline profile screens MUST provide:
  - “Capture baseline (full content)”
  - “Compare now (full content)”
- **FR-118-UX-02 Evidence gaps panel**: Compare run detail MUST include an “Evidence capture” panel showing content coverage percentage, fallback counts, and top gap reasons, and MUST provide “Resume capture” when a resume token exists.
- **FR-118-UX-03 Snapshot fidelity visibility**: Snapshot list/detail MUST show whether the snapshot is content-complete or captured with gaps, and show counts by fidelity.
- **FR-118-UX-04 Why-no-findings explanation**: When a run processes zero subjects or produces zero findings, the UI MUST display a clear explanation sourced from the run context reason code.
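
The figures shown in the “Evidence capture” panel (FR-118-UX-02) could be derived from run context roughly as follows. The input shapes and output field names are hypothetical.

```python
from collections import Counter


def evidence_panel(fidelities, gap_reasons, top_n=3):
    """Summarize per-subject fidelities and gap reasons for display.

    fidelities: list of "content" / "meta" values, one per processed subject.
    gap_reasons: list of reason strings, one per recorded gap.
    """
    total = len(fidelities)
    content = sum(1 for fidelity in fidelities if fidelity == "content")
    return {
        # None (not 0%) when nothing was processed, so the UI can explain why.
        "content_coverage_pct": round(100 * content / total, 1) if total else None,
        "fallback_count": sum(1 for fidelity in fidelities if fidelity == "meta"),
        "top_gap_reasons": Counter(gap_reasons).most_common(top_n),
    }
```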

#### Rollout

- **FR-118-ROL-01 Controlled rollout**: Full-content baseline capture/compare MUST be gated by a short-lived rollout flag for canary deployment.
- **FR-118-ROL-02 No-legacy regression guard**: Automated guardrails MUST prevent re-introduction of legacy fingerprinting/compare paths.

## UI Action Matrix *(mandatory when Filament is changed)*

This spec adds/changes operational actions and run-detail panels on existing baseline/compare surfaces.

For each surface, the matrix lists the exact action labels, whether destructive actions require confirmation, and whether the mutation writes an audit log. RBAC gating for these actions is defined under Spec Scope Fields and the Authorization Contract.

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Profile (workspace admin) | Admin workspace | Capture baseline (full content); Compare now (full content) | View/inspect baseline profile | Edit (existing), Archive (existing, confirmed) | None | Create baseline profile (existing) | N/A | Save/Cancel (existing) | Yes | Starting capture/compare writes audit events and creates observable runs |
| Compare Run Detail (tenant-context admin) | Admin tenant-context | Resume capture (only when resume token exists) | Linked from runs list | None | None | N/A | N/A | N/A | Yes | Evidence capture panel + why-no-findings explanation sourced from run context |
| Drift Findings landing (tenant-context admin) | Admin tenant-context | None | Open finding (existing) | None | None | Existing CTA | N/A | N/A | Yes (existing) | Findings show fidelity badge + provenance for baseline/current evidence |

### Key Entities *(include if feature involves data)*

- **Baseline profile**: Defines scope and capture mode, and is the parent for baseline snapshots.
- **Baseline snapshot**: A captured baseline reference set for a baseline profile.
- **Baseline snapshot item**: Per-subject baseline evidence (fingerprint, fidelity, provenance, observed timestamp).
- **Evidence snapshot**: Immutable captured policy content used to produce a stable, comparable fingerprint.
- **Operation run**: Observable record of a capture/compare execution, including context, coverage, fidelity breakdown, stats, and gaps.
- **Finding**: Recurring drift result with stable identity and lifecycle fields, plus evidence for baseline/current.
- **Resume token**: Opaque token that enables resuming evidence capture deterministically.
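
The stable identity and lifecycle behavior of a Finding (FR-118-CMP-04) can be sketched as below. The identity inputs (tenant, baseline profile, policy type, subject key, change type) are an assumption; the spec only requires that the identity be independent of fingerprints. The in-memory `store` dict stands in for persistence.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


def finding_identity(tenant_id, profile_id, policy_type, subject_key, change_type) -> str:
    """Derive a recurrence identity that does not depend on any fingerprint."""
    raw = "\x1f".join([tenant_id, profile_id, policy_type, subject_key, change_type])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


@dataclass
class Finding:
    identity: str
    first_seen: datetime
    last_seen: datetime
    times_seen: int
    last_run_id: str


def record_sighting(store, identity, run_id, seen_at) -> Finding:
    """Create or update a finding; count each run identity at most once."""
    finding = store.get(identity)
    if finding is None:
        finding = store[identity] = Finding(identity, seen_at, seen_at, 1, run_id)
    elif finding.last_run_id != run_id:
        finding.last_seen = seen_at
        finding.times_seen += 1
        finding.last_run_id = run_id
    # Same run_id again (a retry): leave lifecycle fields untouched.
    return finding
```

Because the fingerprint is not part of the identity, a policy that keeps drifting to new content updates the same finding rather than creating a new one on each run.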

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-118-01 Deep drift reliability**: For baseline profiles configured for full-content capture, settings-only changes for in-scope subjects produce a “different version” drift finding with a success rate of at least 95% in controlled tests.
- **SC-118-02 No silent zeros**: 100% of compare runs that process zero subjects or produce zero findings include a run-context reason code and display a corresponding explanation in the UI.
- **SC-118-03 Resumable capture**: In controlled tests with simulated rate limiting, evidence capture completes across one or more resumed runs without duplicating captured subjects and with deterministic gap reporting.
- **SC-118-04 Operator clarity**: On run detail pages, operators can access effective scope, coverage status, fidelity breakdown, capture stats, and evidence gaps without navigating to additional pages.
- **SC-118-05 No-legacy enforcement**: Automated checks fail whenever legacy fingerprinting/compare helpers are referenced by baseline capture/compare orchestration.