TenantAtlas/specs/116-baseline-drift-engine/spec.md

# Feature Specification: Baseline Drift Engine (Final Architecture)

**Feature Branch**: `116-baseline-drift-engine`
**Created**: 2026-03-01
**Status**: Draft
**Input**: User description: "Spec 116 — Baseline Drift Engine (Final Architecture)"

## Spec Scope Fields *(mandatory)*

- **Scope**: workspace (baseline definition + capture) + tenant (baseline compare monitoring)
- **Primary Routes**:
  - Workspace (admin): Baseline Profiles (create/edit scope, capture baseline)
  - Tenant-context (admin): Baseline Compare runs (compare now, run detail) and Drift Findings landing
- **Data Ownership**:
  - Workspace-owned: Baseline profiles and baseline snapshots
  - Tenant-scoped (within a workspace): Operation runs for baseline capture/compare; drift findings produced by compare
  - Baseline snapshots are workspace-owned standards captured from a chosen tenant, but snapshot items MUST NOT persist tenant identifiers (e.g., no `tenant_id` column on snapshot items).
- **RBAC**:
  - Workspace (Baselines):
    - `workspace_baselines.view`: view baseline profiles + snapshots
    - `workspace_baselines.manage`: create/edit/archive baseline profiles, start capture runs
  - Tenant (Compare):
    - `tenant.sync`: start baseline compare runs
    - `tenant_findings.view`: view drift findings
  - Tenant access is required for tenant-context surfaces, in addition to workspace membership

For canonical-view specs: not applicable (this is not a canonical-view feature).

## Clarifications

### Session 2026-03-01

- Q: Should finding identity be stable across baseline re-captures, or tied to a specific baseline snapshot? → A: Tie finding identity to `baseline_snapshot_id` (stable within a snapshot; re-capture creates new finding identities).
- Q: In v2, should drift dimensions be stored as flags on a single finding, or as separate findings per dimension? → A: Use one finding with dimension flags (no separate findings per dimension).
- Q: When running a compare, which baseline snapshot should be used by default? → A: Default to the baseline profile’s `active_snapshot_id` (updated only by successful captures); allow explicitly selecting a snapshot.
- Q: When coverage is missing for a policy type, should compare emit any findings for that type? → A: Skip all finding emission for uncovered types (no `missing_policy`, no `unexpected_policy`, no `different_version`).

## Outcomes

- **O-1 One engine**: There is exactly one baseline drift compare engine; no parallel legacy compare/hash paths.
- **O-2 Stable findings (recurrence)**: Within a baseline snapshot, the same underlying drift maps to the same finding identity across retries and across runs, and lifecycle counters track its recurrence.
- **O-3 Auditability & operator UX**: Each compare run records scope, coverage, and fidelity; partial coverage produces warnings (not misleading “missing policy” noise).
- **O-4 No legacy logic after v2**: After the v2 extension, there are no “meta compare here / diff there” special cases; all drift flows through the same pipeline.

## Definitions

- **Subject key**: A compare object identity independent of tenant, identified by `(policy_type, external_id)`.
- **Tenant subject**: A subject key within a tenant context, identified by `(tenant_id, policy_type, external_id)`.
- **Policy state**: A normalized representation of a tenant subject, containing a deterministic hash, fidelity, and observation metadata.
- **Fidelity**:
  - **meta**: drift signal based on a stable “inventory meta contract” (signal-based fields)
  - **content**: drift signal based on canonicalized policy content (semantic)
- **Effective scope**: The expanded set of policy types processed by a run.
- **Coverage**: Which policy types are confirmed to be present/updated in the tenant current state at the time of compare.
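
The definitions above can be sketched as plain data types. This is a minimal illustration, not the repository's implementation: the class names, the field types, and the choice of SHA-256 over canonical JSON for the deterministic hash are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Any, Literal

Fidelity = Literal["meta", "content"]


@dataclass(frozen=True)
class SubjectKey:
    """Tenant-independent compare identity: (policy_type, external_id)."""
    policy_type: str
    external_id: str


@dataclass(frozen=True)
class TenantSubject:
    """A subject key within a tenant context."""
    tenant_id: int
    key: SubjectKey


@dataclass(frozen=True)
class PolicyState:
    """Normalized state of one subject: hash, fidelity, observation metadata."""
    key: SubjectKey
    hash: str
    fidelity: Fidelity
    source: str
    observed_at: str  # ISO 8601 timestamp (assumed representation)


def deterministic_hash(payload: dict[str, Any]) -> str:
    """Hash a payload so that dict key order never changes the result.

    Assumption: SHA-256 over sorted-key, compact JSON. The spec only
    requires that the hash be deterministic.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Key order does not affect the hash.
a = deterministic_hash({"version": 3, "scope_tags": ["0"]})
b = deterministic_hash({"scope_tags": ["0"], "version": 3})
assert a == b
```

Because `policy_type` is part of `SubjectKey`, two subjects that share an `external_id` across types remain distinct keys.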

## Assumptions

- Baseline drift is sold as “signal-based drift detection” in v1 (meta fidelity), and later upgraded to deep drift (content fidelity) without changing the compare engine semantics.
- The system already has a tenant-scoped inventory sync mechanism capable of recording per-run coverage of which policy types were synced.
- Foundations are treated as opt-in policy types; they are excluded unless explicitly selected.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Capture and compare a baseline with stable findings (Priority: P1)

As a workspace admin, I want to define a baseline scope, capture a baseline snapshot, and compare a tenant against that baseline, so I can reliably detect and track drift over time.

**Why this priority**: This is the core product slice that makes baseline drift sellable: consistent capture, consistent compare, and stable findings.

**Independent Test**: Can be tested by creating a baseline profile with a defined scope, capturing a snapshot, running compare twice, and verifying stable finding identity and lifecycle counters.

**Acceptance Scenarios**:

1. **Given** a baseline profile with scope “all policy types (excluding foundations)”, **When** I capture a baseline snapshot, **Then** the snapshot contains only in-scope policy subjects and each snapshot item records its hash and fidelity.
2. **Given** a captured baseline snapshot and a tenant current state, **When** I run compare twice with the same inputs, **Then** the same drift maps to the same finding identity and lifecycle counters increment at most once per run.
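
Scenario 2 can be sketched with an in-memory store. The `FindingStore` class, its method names, and the use of run ids as the "run identity" are hypothetical stand-ins chosen for illustration; the identity inputs (tenant, baseline snapshot, policy type, external id, change type) come from this spec.

```python
from dataclasses import dataclass, field


def finding_identity(tenant_id, baseline_snapshot_id, policy_type, external_id, change_type):
    """Build the identity key; hashes are deliberately not part of it."""
    return (tenant_id, baseline_snapshot_id, policy_type, external_id, change_type)


@dataclass
class Finding:
    identity: tuple
    first_seen_run_id: int
    last_seen_run_id: int
    seen_run_ids: set = field(default_factory=set)
    evidence: dict = field(default_factory=dict)

    @property
    def times_seen(self) -> int:
        # One count per distinct run, so retries cannot inflate it.
        return len(self.seen_run_ids)


class FindingStore:
    """Hypothetical in-memory stand-in for the findings table."""

    def __init__(self):
        self._by_identity = {}

    def record(self, identity, run_id, evidence):
        finding = self._by_identity.get(identity)
        if finding is None:
            finding = Finding(identity, run_id, run_id)
            self._by_identity[identity] = finding
        # Evidence (e.g. hashes) may change without changing identity.
        finding.evidence = dict(evidence)
        # A retry of the same run must not bump lifecycle counters again.
        if run_id not in finding.seen_run_ids:
            finding.seen_run_ids.add(run_id)
            finding.last_seen_run_id = run_id
        return finding

    def __len__(self):
        return len(self._by_identity)


store = FindingStore()
ident = finding_identity(7, 42, "policy_type_a", "abc-123", "different_version")
store.record(ident, run_id=1001, evidence={"current_hash": "h1"})
store.record(ident, run_id=1001, evidence={"current_hash": "h1"})  # retry of the same run
latest = store.record(ident, run_id=1002, evidence={"current_hash": "h2"})  # next run
assert len(store) == 1 and latest.times_seen == 2
```

Tracking the set of seen run ids, rather than only the last one, keeps the counter correct even if an older run is retried after a newer one has completed.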

---

### User Story 2 - Coverage warnings prevent misleading missing-policy findings (Priority: P1)

As an operator, I want the compare run to warn when current-state coverage is partial, so that missing policies are not falsely reported when the system simply lacks data.

**Why this priority**: Trust depends on avoiding false negatives and false positives; “missing policy” findings raised on a partial sync are unacceptable noise.

**Independent Test**: Can be tested by running compare with an effective scope where some policy types are intentionally marked as not synced, verifying warning outcome and suppression behavior.

**Acceptance Scenarios**:

1. **Given** a compare run where some policy types in effective scope were not synced, **When** compare is executed, **Then** the run completes with warnings and produces no findings at all for those missing-coverage types.
2. **Given** a compare run where coverage is complete, **When** a baseline policy subject is missing in current state for a covered type, **Then** a missing-policy finding is produced.

---

### User Story 3 - Operators can understand scope, coverage, and fidelity in the UI (Priority: P2)

As an operator, I want drift screens to clearly show what was compared (scope), how complete the data was (coverage), and how “deep” the drift signal is (fidelity), so I can interpret findings correctly.

**Why this priority**: Drift findings are only actionable when the operator understands context and limitations.

**Independent Test**: Can be tested by executing a compare run with and without coverage warnings, verifying that run detail and drift landing surfaces render scope counts, coverage badge, and fidelity indicators.

**Acceptance Scenarios**:

1. **Given** a compare run with full coverage, **When** I open run detail, **Then** I see the compared scope and a coverage status of OK.
2. **Given** a compare run with partial coverage, **When** I open the drift landing and run detail, **Then** I see a warning banner and can see which types were missing coverage.

### Edge Cases

- Compare is retried after a transient failure: findings are not duplicated; lifecycle increments happen at most once per run identity.
- Baseline capture is executed with empty scope lists: default semantics apply, so an empty policy types list means “all supported types excluding foundations” and an empty foundations list means “none”.
- Effective scope expands to zero types (e.g., no supported types): run completes with an explicit warning and produces no findings.
- Policy subjects appear/disappear between inventory sync and compare: handled according to coverage rules; does not create missing-policy noise for uncovered types.
- Two different policy subjects accidentally share an external identifier across types: identity is still unambiguous because `policy_type` is part of the subject key.
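
The empty-list default semantics and the zero-type edge case can be illustrated with a small scope expansion function. This is a sketch under assumptions: the function name and the type names in the usage lines are hypothetical, and silently dropping unsupported types is a choice made for the example, not something this spec prescribes. The output keys mirror the `effective_scope` fields recorded on runs.

```python
def expand_scope(policy_types, foundation_types, supported_policy_types, supported_foundation_types):
    """Expand a stored scope into the effective scope recorded on a run."""
    if policy_types:
        # Explicit selection: keep only types the system supports (assumption).
        selected_policies = [t for t in policy_types if t in supported_policy_types]
    else:
        # Empty list means "all supported policy types excluding foundations".
        selected_policies = list(supported_policy_types)

    # Foundations are opt-in: an empty list means "none".
    selected_foundations = [t for t in foundation_types if t in supported_foundation_types]

    return {
        "policy_types": sorted(selected_policies),
        "foundation_types": sorted(selected_foundations),
        "all_types": sorted(set(selected_policies) | set(selected_foundations)),
        "foundations_included": bool(selected_foundations),
    }


# Hypothetical type names, for illustration only.
default_scope = expand_scope([], [], ["type_b", "type_a"], ["foundation_x"])
assert default_scope["all_types"] == ["type_a", "type_b"]
assert default_scope["foundations_included"] is False

# No supported types at all: the effective scope expands to zero types.
assert expand_scope([], [], [], [])["all_types"] == []
```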

## Requirements *(mandatory)*

This feature introduces/extends long-running compare work and uses `OperationRun` for capture and compare runs.
It must comply with:

- **Run observability**: Every capture/compare run must have a visible run identity, scope context, coverage context, and outcome.
- **Safety**: Compare must never claim missing policies for policy types where current-state coverage is not proven.
- **Tenant isolation**: Inventory items, operation runs, and findings are tenant-scoped; cross-tenant access must be deny-as-not-found. Baseline profiles/snapshots are workspace-owned and must not persist tenant identifiers.

### Operational UX Contract (Ops-UX)

- Capture and compare run lifecycle transitions are service-owned (not UI-owned).
- Run summaries provide numeric-only counters using ONLY keys from `app/Support/OpsUx/OperationSummaryKeys.php`.
  - Coverage warnings MUST be represented using an existing canonical numeric key (default: `errors_recorded`).
- Warning semantics mapping (canonical):
  - Any “completed with warnings” case MUST be represented as `OperationRun.outcome = partially_succeeded`.
  - `summary_counts.errors_recorded` MUST be a numeric indicator of warning magnitude.
    - Default: number of uncovered policy types in effective scope.
    - Edge case (effective scope expands to zero types): `summary_counts.errors_recorded = 1` so the warning remains visible under the numeric-only summary_counts contract.
- Scheduled/system-initiated runs (if any) must not generate user terminal DB notifications; audit is handled via monitoring surfaces.
- Regression guard tests are added/updated to enforce correct run outcome semantics and summary counter rules.
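
The warning semantics mapping can be sketched as a pure function. The function name is hypothetical, and the `succeeded` outcome value and the explicit `errors_recorded = 0` for the clean case are assumptions; this spec only fixes `partially_succeeded` and the `errors_recorded` rules for warning cases.

```python
def coverage_outcome(effective_types, covered_types):
    """Map coverage of the effective scope to a run outcome and summary counts."""
    effective = set(effective_types)
    uncovered = sorted(effective - set(covered_types))

    if not effective:
        # Edge case: zero types in effective scope. Record 1 so the warning
        # stays visible under the numeric-only summary_counts contract.
        return {
            "outcome": "partially_succeeded",
            "summary_counts": {"errors_recorded": 1},
            "uncovered_types": [],
        }

    if uncovered:
        # Default: warning magnitude is the number of uncovered types.
        return {
            "outcome": "partially_succeeded",
            "summary_counts": {"errors_recorded": len(uncovered)},
            "uncovered_types": uncovered,
        }

    # Full coverage; "succeeded" is an assumed outcome name.
    return {
        "outcome": "succeeded",
        "summary_counts": {"errors_recorded": 0},
        "uncovered_types": [],
    }


result = coverage_outcome(["type_a", "type_b", "type_c"], ["type_a"])
assert result["outcome"] == "partially_succeeded"
assert result["summary_counts"]["errors_recorded"] == 2
```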

### Authorization Contract (RBAC-UX)

- Workspace membership + capability gates:
  - `workspace_baselines.view` is required to view baseline profiles and snapshots.
  - `workspace_baselines.manage` is required to create/edit/archive baseline profiles and start capture runs.
  - `tenant.sync` is required to start compare runs.
  - `tenant_findings.view` is required to view drift findings.
- 404 vs 403 semantics:
  - Non-member or not entitled to workspace/tenant scope → 404 (deny-as-not-found)
  - Member but missing capability → 403
- Destructive-like actions (e.g., archiving a baseline profile) require an explicit confirmation step.
- At least one positive and one negative authorization test exist for each mutation surface.
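
The 404 vs 403 ordering can be expressed as a small decision function. This is an illustrative sketch, not the application's authorization code: the function name and parameters are hypothetical, and returning `200` for an allowed request is a simplification.

```python
def authorize(is_workspace_member, has_tenant_access, capabilities, required_capability, tenant_context):
    """Return the HTTP status an access check would produce."""
    # Scope entitlement is checked first so that non-members cannot
    # learn whether a resource exists (deny-as-not-found).
    if not is_workspace_member:
        return 404
    if tenant_context and not has_tenant_access:
        return 404
    # Only entitled members can receive a capability error.
    if required_capability not in capabilities:
        return 403
    return 200


# Workspace member without tenant access tries to start a compare run.
assert authorize(True, False, {"tenant.sync"}, "tenant.sync", tenant_context=True) == 404
# Entitled member who lacks the manage capability.
assert authorize(True, True, {"workspace_baselines.view"}, "workspace_baselines.manage", tenant_context=False) == 403
```

Checking membership before capability keeps a non-member from distinguishing "exists but forbidden" from "does not exist".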

### Functional Requirements

#### v1 — Meta-fidelity baseline compare (sellable)

- **FR-116v1-01 Baseline profile scope**: Baseline profiles MUST store a scope object with `policy_types` and `foundation_types` lists.
  - Default semantics: `policy_types = []` means all supported policy types excluding foundations; `foundation_types = []` means no foundations.
  - Foundations MUST only be included when explicitly selected.
- **FR-116v1-02 UI scope picker**: The UI MUST provide multi-select controls for Policy Types and Foundations and communicate the default semantics (empty selection = default behavior).
- **FR-116v1-03 Effective scope recorded on runs**: Capture and compare runs MUST record expanded effective scope in run context:
  - `effective_scope.policy_types[]`, `effective_scope.foundation_types[]`, `effective_scope.all_types[]`, and a boolean `effective_scope.foundations_included`.
- **FR-116v1-04 Inventory meta contract**: The system MUST define and persist a stable “inventory meta contract” (signal-based fields) for drift hashing.
  - Minimum required signals: type identifier, version marker (when available), last modified time (when available), scope tags (when available), and assignment target count (when available).
  - Drift hashing for v1 MUST be based only on this contract (not arbitrary meta fields).
  - Contract outputs MUST be versioned so future additions do not retroactively change v1 semantics (e.g., `meta_contract.version = 1`).
  - For baseline snapshot items, the exact contract payload used for hashing MUST be persisted in the snapshot item `meta_jsonb` (e.g., `meta_jsonb.meta_contract`).
- **FR-116v1-05 Provide current-state policy states (meta fidelity)**: For all policy subjects in effective scope, the system MUST produce a normalized policy state for compare, including:
  - subject key (policy type + external id), deterministic hash, fidelity=`meta`, source indicator, and observed timestamp.
  - In v1, `observed_at` MUST be derived from persisted inventory evidence (`inventory_items.last_seen_at`), not from per-item external hydration calls during compare.
  - In v1, `source` MUST indicate the meta-fidelity source (e.g., `inventory_meta_contract:v1`) and MAY include stable provenance (e.g., `inventory_items.last_seen_operation_run_id`) for traceability.
- **FR-116v1-06 Baseline capture stores states (not raw)**: Baseline capture MUST store per-subject snapshot items that include the subject identity and the captured hash + fidelity + source + observed timestamp.
  - Baseline snapshots MUST NOT contain out-of-scope items.
  - Snapshot items MUST store observation metadata in `baseline_snapshot_items.meta_jsonb` (at minimum: `fidelity`, `source`, `observed_at`; when available: `observed_operation_run_id`).
- **FR-116v1-06a Compare snapshot selection**: Baseline compare MUST, by default, use the latest successful baseline snapshot of the selected baseline profile.
  - Definition (v1): “latest successful baseline snapshot” is `baseline_profiles.active_snapshot_id` (updated only after a successful capture run persists a snapshot + items).
  - If `active_snapshot_id` is `null`, compare start MUST be blocked with a clear precondition failure (no implicit “pick the newest captured_at” fallback).
  - The UI MAY allow selecting a specific snapshot explicitly for historical comparisons.
- **FR-116v1-07 Coverage guard**: Compare MUST check current-state coverage recorded by the most recent inventory sync run.
  - If effective scope contains policy types not present in coverage, the compare run MUST complete with warnings.
  - For any uncovered policy type, the compare MUST NOT emit findings of any kind for that type (no `missing_policy`, no `unexpected_policy`, no `different_version`).
  - Drift findings for types with proven coverage may still be produced.
  - If there is no completed inventory sync run (or coverage proof is missing/unreadable), coverage MUST be treated as unproven for all types and the compare MUST produce zero findings (fail-safe) and complete with warnings.
- **FR-116v1-08 Drift rules**: Compare MUST produce drift results per policy subject:
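  - Baseline-only → `missing_policy` (only when coverage is proven for the subject’s type)
  - Current-only → `unexpected_policy` (only when coverage is proven for the subject’s type)
  - Both present and hashes differ → `different_version` (with fidelity=`meta`; only when coverage is proven for the subject’s type)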
- **FR-116v1-09 Stable finding identity**: Findings MUST have a stable identity key derived from: tenant, baseline snapshot, policy type, external id, and change type.
  - Hashes are evidence fields and may update without changing identity.
  - Finding identity MUST be tied to a specific baseline snapshot (re-capture creates a new baseline snapshot and therefore new finding identities).
- **FR-116v1-10 Finding lifecycle + retry idempotency**: Findings MUST record first seen, last seen, and times seen.
  - For a given run identity, lifecycle counters MUST not increment more than once.
- **FR-116v1-11 Auditability**: Each capture and compare run MUST write an audit trail including effective scope counts, coverage warning summary (if any), and finding counts per change type.
  - Audit trail storage (canonical):
    - Aggregations that do not fit `summary_counts` MUST be stored in `operation_runs.context` (not new summary keys).
    - Compare MUST store per-change-type counts in run context under `findings.counts_by_change_type` (e.g., keys: `missing_policy`, `unexpected_policy`, `different_version`).
    - For this repository, the canonical audit trail is the `operation_runs` record itself (status/outcome + context + numeric summary_counts); do not introduce parallel “audit summary” persistence for the same data.
- **FR-116v1-12 Drift UI context**: Compare run detail and drift landing MUST surface scope, coverage status, and fidelity (meta-based drift) and show a warning banner when coverage warnings were present.
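
The coverage guard (FR-116v1-07) and drift rules (FR-116v1-08) combine into one pass over the subject keys. The sketch below assumes baseline and current state are already reduced to `{(policy_type, external_id): hash}` maps; the function name and the shape of the returned dictionary are hypothetical.

```python
def compare(baseline, current, effective_types, covered_types):
    """Apply the coverage guard, then the drift rules, per subject key.

    covered_types=None models "no completed inventory sync run" and is
    treated as unproven coverage for every type (fail-safe).
    """
    proven = set(covered_types or [])
    comparable = set(effective_types) & proven
    uncovered = sorted(set(effective_types) - proven)

    findings = []
    for key in sorted(set(baseline) | set(current)):
        policy_type, external_id = key
        if policy_type not in comparable:
            continue  # uncovered or out-of-scope type: emit nothing at all
        if key not in current:
            change_type = "missing_policy"
        elif key not in baseline:
            change_type = "unexpected_policy"
        elif baseline[key] != current[key]:
            change_type = "different_version"
        else:
            continue  # hashes match: no drift
        findings.append({
            "policy_type": policy_type,
            "external_id": external_id,
            "change_type": change_type,
            "fidelity": "meta",
        })

    counts = {"missing_policy": 0, "unexpected_policy": 0, "different_version": 0}
    for finding in findings:
        counts[finding["change_type"]] += 1

    return {
        "findings": findings,
        "uncovered_types": uncovered,
        "counts_by_change_type": counts,
    }


baseline = {("type_a", "1"): "h1", ("type_a", "2"): "h2", ("type_b", "9"): "h9"}
current = {("type_a", "1"): "h1-changed", ("type_a", "3"): "h3"}
result = compare(baseline, current, ["type_a", "type_b"], ["type_a"])
# type_b is uncovered, so its baseline-only subject yields no missing_policy.
assert result["uncovered_types"] == ["type_b"]
assert result["counts_by_change_type"] == {
    "missing_policy": 1, "unexpected_policy": 1, "different_version": 1,
}
```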

#### v2 — Content-fidelity extension (deep drift, same engine)

**Deferred / out of scope for this delivery**: The v2 requirements below are intentionally not covered by `specs/116-baseline-drift-engine/tasks.md` and will be implemented in a follow-up spec/milestone.

- **FR-116v2-01 Provider precedence**: Current state MUST be sourced with a precedence chain per policy type: “policy version (if available) → inventory content (if available) → meta fallback (explicitly marked degraded)”.
- **FR-116v2-02 Content hash availability**: The inventory system MUST persist a content hash and capture timestamp for hydrated policy content.
- **FR-116v2-03 Quota-aware hydration**: Content hydration MUST be throttling-safe and resumable, with explicit per-run caps and concurrency limits, and must record hydration coverage in run context.
- **FR-116v2-04 Content normalization rules**: The system MUST define canonicalization rules per policy type, including volatile-field removal and (where needed) redaction hooks.
- **FR-116v2-05 Drift dimensions (optional but final)**: The compare output MAY include dimension flags (content, assignments, scope tags) without changing finding identity.
  - If dimension flags are present, they MUST be stored on the same finding record as evidence/flags; the system MUST NOT create separate findings per dimension.
  - `change_type` semantics remain compatible with v1 (dimensions refine the “different_version” class rather than multiplying identities).
- **FR-116v2-06 Capture/compare use the same pipeline**: Capture and compare MUST use the same policy state pipeline and hashing semantics; v2 must not introduce special-case compare paths.
- **FR-116v2-07 Coverage/fidelity guard**: If content hydration is incomplete for some types, compare MAY still run but must clearly indicate degraded fidelity and must follow registry-defined behavior for those types.
- **FR-116v2-08 No-legacy guarantee**: After v2 cutover, legacy compare/hash helpers are removed and CI guards prevent re-introduction.

## UI Action Matrix *(mandatory when Filament is changed)*

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Profiles | Workspace admin | Create Baseline Profile | View action / record inspection (per Action Surface Contract) | Edit, Archive (confirmed) | None | “Create Baseline Profile” | Capture Baseline (compare is tenant-context) | Save, Cancel | Yes | Archive requires confirmation; capture starts OperationRuns and is audited |
| Baseline Capture Run Detail | Workspace admin | None | Linked from runs list | None | None | None | None | N/A | Yes | Shows effective scope + fidelity + counts + warnings |
| Baseline Compare Run Detail | Tenant-context admin | Run Compare (if shown), Re-run Compare (if allowed) | Linked from runs list | None | None | None | None | N/A | Yes | Shows coverage badge and warning banner; uncovered types emit no findings |
| Drift Findings Landing | Tenant-context admin | None | Table filter by change type | View (optional), Acknowledge/Resolve (if workflow exists) | None | None | None | N/A | Yes | Surfaces fidelity + coverage context; no destructive actions required for v1 |

### Key Entities *(include if feature involves data)*

- **Baseline profile**: Defines scope (policy types + opt-in foundations) and is the parent for baseline snapshots.
- **Baseline snapshot item**: Stores per-policy-subject baseline state evidence (hash, fidelity, source, observed timestamp).
- **Compare run**: A recorded operation that compares a tenant current state to a baseline snapshot, including effective scope and coverage warnings.
- **Finding**: A stable, recurring drift finding with lifecycle fields (first seen, last seen, times seen) and evidence (baseline/current hashes, fidelity).

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-116-01 One engine**: All baseline compare and capture runs use exactly one drift pipeline; no alternative compare paths exist in production code.
- **SC-116-02 Stable recurrence**: For a fixed baseline snapshot + tenant + policy subject + change type, repeated compares (including retries) produce at most one finding identity, and lifecycle counters increment at most once per run.
- **SC-116-03 Coverage safety**: When coverage is partial for any effective-scope type, the compare run is visibly marked as “completed with warnings” and produces zero findings for those uncovered types.
- **SC-116-04 Operator clarity**: On the compare run detail screen, operators can see effective scope counts, coverage status, and fidelity within one page load, with a clear warning banner when applicable.
- **SC-116-05 Performance guard (v1)**: Compare runs complete without per-item external hydration calls; in-scope subjects are processed in chunks so that runtime scales with the number of in-scope subjects.