# Feature Specification: Baseline Drift Engine (Final Architecture)

**Feature Branch**: `116-baseline-drift-engine`  
**Created**: 2026-03-01  
**Status**: Draft  
**Input**: User description: "Spec 116 — Baseline Drift Engine (Final Architecture)"

## Spec Scope Fields *(mandatory)*

- **Scope**: workspace (baseline definition + capture) + tenant (baseline compare monitoring)
- **Primary Routes**:
  - Workspace (admin): Baseline Profiles (create/edit scope, capture baseline)
  - Tenant-context (admin): Baseline Compare runs (compare now, run detail) and Drift Findings landing
- **Data Ownership**:
  - Workspace-owned: Baseline profiles and baseline snapshots
  - Tenant-scoped (within a workspace): Operation runs for baseline capture/compare; drift findings produced by compare
  - Baseline snapshots are workspace-owned standards captured from a chosen tenant, but snapshot items MUST NOT persist tenant identifiers (e.g., no `tenant_id` column on snapshot items).
- **RBAC**:
  - Workspace (Baselines):
    - `workspace_baselines.view`: view baseline profiles + snapshots
    - `workspace_baselines.manage`: create/edit/archive baseline profiles, start capture runs
  - Tenant (Compare):
    - `tenant.sync`: start baseline compare runs
    - `tenant_findings.view`: view drift findings
  - Tenant access is required for tenant-context surfaces, in addition to workspace membership

For canonical-view specs: not applicable (this is not a canonical-view feature).

## Clarifications

### Session 2026-03-01

- Q: Should finding identity be stable across baseline re-captures, or tied to a specific baseline snapshot? → A: Tie finding identity to `baseline_snapshot_id` (stable within a snapshot; re-capture creates new finding identities).
- Q: In v2, should drift dimensions be stored as flags on a single finding, or as separate findings per dimension? → A: Use one finding with dimension flags (no separate findings per dimension).
- Q: When running a compare, which baseline snapshot should be used by default?
→ A: Default to the baseline profile’s `active_snapshot_id` (updated only by successful captures); allow explicitly selecting a snapshot.
- Q: When coverage is missing for a policy type, should compare emit any findings for that type? → A: Skip all finding emission for uncovered types (no `missing_policy`, no `unexpected_policy`, no `different_version`).

## Outcomes

- **O-1 One engine**: There is exactly one baseline drift compare engine; no parallel legacy compare/hash paths.
- **O-2 Stable findings (recurrence)**: The same underlying drift maps to the same finding identity across retries and across runs, with lifecycle counters.
- **O-3 Auditability & operator UX**: Each compare run records scope, coverage, and fidelity; partial coverage produces warnings (not misleading “missing policy” noise).
- **O-4 No legacy logic after v2**: After the v2 extension, there are no “meta compare here / diff there” special cases; all drift flows through the same pipeline.

## Definitions

- **Subject key**: A compare object identity independent of tenant, identified by `(policy_type, external_id)`.
- **Tenant subject**: A subject key within a tenant context, identified by `(tenant_id, policy_type, external_id)`.
- **Policy state**: A normalized representation of a tenant subject, containing a deterministic hash, fidelity, and observation metadata.
- **Fidelity**:
  - **meta**: drift signal based on a stable “inventory meta contract” (signal-based fields)
  - **content**: drift signal based on canonicalized policy content (semantic)
- **Effective scope**: The expanded set of policy types processed by a run.
- **Coverage**: Which policy types are confirmed to be present/updated in the tenant current state at the time of compare.

## Assumptions

- Baseline drift is sold as “signal-based drift detection” in v1 (meta fidelity), and later upgraded to deep drift (content fidelity) without changing the compare engine semantics.
- The system already has a tenant-scoped inventory sync mechanism capable of recording per-run coverage of which policy types were synced.
- Foundations are treated as opt-in policy types; they are excluded unless explicitly selected.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Capture and compare a baseline with stable findings (Priority: P1)

As a workspace admin, I want to define a baseline scope, capture a baseline snapshot, and compare a tenant against that baseline, so I can reliably detect and track drift over time.

**Why this priority**: This is the core product slice that makes baseline drift sellable: consistent capture, consistent compare, and stable findings.

**Independent Test**: Can be tested by creating a baseline profile with a defined scope, capturing a snapshot, running compare twice, and verifying stable finding identity and lifecycle counters.

**Acceptance Scenarios**:

1. **Given** a baseline profile with scope “all policy types (excluding foundations)”, **When** I capture a baseline snapshot, **Then** the snapshot contains only in-scope policy subjects and each snapshot item records its hash and fidelity.
2. **Given** a captured baseline snapshot and a tenant current state, **When** I run compare twice with the same inputs, **Then** the same drift maps to the same finding identity and lifecycle counters increment at most once per run.

---

### User Story 2 - Coverage warnings prevent misleading missing-policy findings (Priority: P1)

As an operator, I want the compare run to warn when current-state coverage is partial, so that missing policies are not falsely reported when the system simply lacks data.

**Why this priority**: Trust depends on avoiding false negatives/positives; “missing policy” findings on a partial sync are unacceptable noise.

**Independent Test**: Can be tested by running compare with an effective scope where some policy types are intentionally marked as not synced, verifying warning outcome and suppression behavior.

**Acceptance Scenarios**:

1. **Given** a compare run where some policy types in effective scope were not synced, **When** compare is executed, **Then** the run completes with warnings and produces no findings at all for those missing-coverage types.
2. **Given** a compare run where coverage is complete, **When** a baseline policy subject is missing in current state for a covered type, **Then** a missing-policy finding is produced.

---

### User Story 3 - Operators can understand scope, coverage, and fidelity in the UI (Priority: P2)

As an operator, I want drift screens to clearly show what was compared (scope), how complete the data was (coverage), and how “deep” the drift signal is (fidelity), so I can interpret findings correctly.

**Why this priority**: Drift findings are only actionable when the operator understands context and limitations.

**Independent Test**: Can be tested by executing a compare run with and without coverage warnings, verifying that run detail and drift landing surfaces render scope counts, coverage badge, and fidelity indicators.

**Acceptance Scenarios**:

1. **Given** a compare run with full coverage, **When** I open run detail, **Then** I see the compared scope and a coverage status of OK.
2. **Given** a compare run with partial coverage, **When** I open the drift landing and run detail, **Then** I see a warning banner and can see which types were missing coverage.

### Edge Cases

- Compare is retried after a transient failure: findings are not duplicated; lifecycle increments happen at most once per run identity.
- Baseline capture is executed with empty scope lists (interpreted as default semantics): an empty policy types list means “all supported types excluding foundations”; an empty foundations list means “none”.
- Effective scope expands to zero types (e.g., no supported types): run completes with an explicit warning and produces no findings.
- Policy subjects appear/disappear between inventory sync and compare: handled according to coverage rules; does not create missing-policy noise for uncovered types.
- Two different policy subjects accidentally share an external identifier across types: identity is still unambiguous because `policy_type` is part of the subject key.

## Requirements *(mandatory)*

This feature introduces/extends long-running compare work and uses `OperationRun` for capture and compare runs. It must comply with:

- **Run observability**: Every capture/compare run must have a visible run identity, scope context, coverage context, and outcome.
- **Safety**: Compare must never claim missing policies for policy types where current-state coverage is not proven.
- **Tenant isolation**: Inventory items, operation runs, and findings are tenant-scoped; cross-tenant access must be deny-as-not-found. Baseline profiles/snapshots are workspace-owned and must not persist tenant identifiers.

### Operational UX Contract (Ops-UX)

- Capture and compare run lifecycle transitions are service-owned (not UI-owned).
- Run summaries provide numeric-only counters using ONLY keys from `app/Support/OpsUx/OperationSummaryKeys.php`.
- Coverage warnings MUST be represented using an existing canonical numeric key (default: `errors_recorded`).
- Warning semantics mapping (canonical):
  - Any “completed with warnings” case MUST be represented as `OperationRun.outcome = partially_succeeded`.
  - `summary_counts.errors_recorded` MUST be a numeric indicator of warning magnitude.
    - Default: number of uncovered policy types in effective scope.
    - Edge case (effective scope expands to zero types): `summary_counts.errors_recorded = 1` so the warning remains visible under the numeric-only summary_counts contract.
- Scheduled/system-initiated runs (if any) must not generate user terminal DB notifications; audit is handled via monitoring surfaces.
- Regression guard tests are added/updated to enforce correct run outcome semantics and summary counter rules.

### Authorization Contract (RBAC-UX)

- Workspace membership + capability gates:
  - `workspace_baselines.view` is required to view baseline profiles and snapshots.
  - `workspace_baselines.manage` is required to create/edit/archive baseline profiles and start capture runs.
  - `tenant.sync` is required to start compare runs.
  - `tenant_findings.view` is required to view drift findings.
- 404 vs 403 semantics:
  - Non-member or not entitled to workspace/tenant scope → 404 (deny-as-not-found)
  - Member but missing capability → 403
- Destructive-like actions (e.g., archiving a baseline profile) require an explicit confirmation step.
- At least one positive and one negative authorization test exist for each mutation surface.

### Functional Requirements

#### v1 — Meta-fidelity baseline compare (sellable)

- **FR-116v1-01 Baseline profile scope**: Baseline profiles MUST store a scope object with `policy_types` and `foundation_types` lists.
  - Default semantics: `policy_types = []` means all supported policy types excluding foundations; `foundation_types = []` means no foundations.
  - Foundations MUST only be included when explicitly selected.
- **FR-116v1-02 UI scope picker**: The UI MUST provide multi-select controls for Policy Types and Foundations and communicate the default semantics (empty selection = default behavior).
- **FR-116v1-03 Effective scope recorded on runs**: Capture and compare runs MUST record expanded effective scope in run context:
  - `effective_scope.policy_types[]`, `effective_scope.foundation_types[]`, `effective_scope.all_types[]`, and a boolean `effective_scope.foundations_included`.
- **FR-116v1-04 Inventory meta contract**: The system MUST define and persist a stable “inventory meta contract” (signal-based fields) for drift hashing.
  - Minimum required signals: type identifier, version marker (when available), last modified time (when available), scope tags (when available), and assignment target count (when available).
  - Drift hashing for v1 MUST be based only on this contract (not arbitrary meta fields).
  - Contract outputs MUST be versioned so future additions do not retroactively change v1 semantics (e.g., `meta_contract.version = 1`).
  - For baseline snapshot items, the exact contract payload used for hashing MUST be persisted in the snapshot item `meta_jsonb` (e.g., `meta_jsonb.meta_contract`).
- **FR-116v1-05 Provide current-state policy states (meta fidelity)**: For all policy subjects in effective scope, the system MUST produce a normalized policy state for compare, including:
  - subject key (policy type + external id), deterministic hash, fidelity=`meta`, source indicator, and observed timestamp.
  - In v1, `observed_at` MUST be derived from persisted inventory evidence (`inventory_items.last_seen_at`), not from per-item external hydration calls during compare.
  - In v1, `source` MUST indicate the meta-fidelity source (e.g., `inventory_meta_contract:v1`) and MAY include stable provenance (e.g., `inventory_items.last_seen_operation_run_id`) for traceability.
- **FR-116v1-06 Baseline capture stores states (not raw)**: Baseline capture MUST store per-subject snapshot items that include the subject identity and the captured hash + fidelity + source + observed timestamp.
  - Baseline snapshots MUST NOT contain out-of-scope items.
  - Snapshot items MUST store observation metadata in `baseline_snapshot_items.meta_jsonb` (at minimum: `fidelity`, `source`, `observed_at`; when available: `observed_operation_run_id`).
- **FR-116v1-06a Compare snapshot selection**: Baseline compare MUST, by default, use the latest successful baseline snapshot of the selected baseline profile.
  - Definition (v1): “latest successful baseline snapshot” is `baseline_profiles.active_snapshot_id` (updated only after a successful capture run persists a snapshot + items).
  - If `active_snapshot_id` is `null`, compare start MUST be blocked with a clear precondition failure (no implicit “pick the newest captured_at” fallback).
  - The UI MAY allow selecting a specific snapshot explicitly for historical comparisons.
- **FR-116v1-07 Coverage guard**: Compare MUST check current-state coverage recorded by the most recent inventory sync run.
  - If effective scope contains policy types not present in coverage, the compare run MUST complete with warnings.
  - For any uncovered policy type, the compare MUST NOT emit findings of any kind for that type (no `missing_policy`, no `unexpected_policy`, no `different_version`).
  - Drift findings for types with proven coverage may still be produced.
  - If there is no completed inventory sync run (or coverage proof is missing/unreadable), coverage MUST be treated as unproven for all types and the compare MUST produce zero findings (fail-safe) and complete with warnings.
- **FR-116v1-08 Drift rules**: Compare MUST produce drift results per policy subject:
  - Baseline-only → `missing_policy` (only when coverage is proven for the subject’s type)
  - Current-only → `unexpected_policy`
  - Both present and hashes differ → `different_version` (with fidelity=`meta`)
- **FR-116v1-09 Stable finding identity**: Findings MUST have a stable identity key derived from: tenant, baseline snapshot, policy type, external id, and change type.
  - Hashes are evidence fields and may update without changing identity.
  - Finding identity MUST be tied to a specific baseline snapshot (re-capture creates a new baseline snapshot and therefore new finding identities).
- **FR-116v1-10 Finding lifecycle + retry idempotency**: Findings MUST record first seen, last seen, and times seen.
  - For a given run identity, lifecycle counters MUST not increment more than once.
- **FR-116v1-11 Auditability**: Each capture and compare run MUST write an audit trail including effective scope counts, coverage warning summary (if any), and finding counts per change type.
  - Audit trail storage (canonical):
    - Aggregations that do not fit `summary_counts` MUST be stored in `operation_runs.context` (not new summary keys).
    - Compare MUST store per-change-type counts in run context under `findings.counts_by_change_type` (e.g., keys: `missing_policy`, `unexpected_policy`, `different_version`).
    - For this repository, the canonical audit trail is the `operation_runs` record itself (status/outcome + context + numeric summary_counts); do not introduce parallel “audit summary” persistence for the same data.
- **FR-116v1-12 Drift UI context**: Compare run detail and drift landing MUST surface scope, coverage status, and fidelity (meta-based drift) and show a warning banner when coverage warnings were present.

#### v2 — Content-fidelity extension (deep drift, same engine)

**Deferred / out of scope for this delivery**: The v2 requirements below are intentionally not covered by `specs/116-baseline-drift-engine/tasks.md` and will be implemented in a follow-up spec/milestone.

- **FR-116v2-01 Provider precedence**: Current state MUST be sourced with a precedence chain per policy type: “policy version (if available) → inventory content (if available) → meta fallback (explicitly marked degraded)”.
- **FR-116v2-02 Content hash availability**: The inventory system MUST persist a content hash and capture timestamp for hydrated policy content.
- **FR-116v2-03 Quota-aware hydration**: Content hydration MUST be throttling-safe and resumable, with explicit per-run caps and concurrency limits, and must record hydration coverage in run context.
- **FR-116v2-04 Content normalization rules**: The system MUST define canonicalization rules per policy type, including volatile-field removal and (where needed) redaction hooks.
- **FR-116v2-05 Drift dimensions (optional but final)**: The compare output MAY include dimension flags (content, assignments, scope tags) without changing finding identity.
  - If dimension flags are present, they MUST be stored on the same finding record as evidence/flags; the system MUST NOT create separate findings per dimension.
  - `change_type` semantics remain compatible with v1 (dimensions refine the “different_version” class rather than multiplying identities).
- **FR-116v2-06 Capture/compare use the same pipeline**: Capture and compare MUST use the same policy state pipeline and hashing semantics; v2 must not introduce special-case compare paths.
- **FR-116v2-07 Coverage/fidelity guard**: If content hydration is incomplete for some types, compare MAY still run but must clearly indicate degraded fidelity and must follow registry-defined behavior for those types.
- **FR-116v2-08 No-legacy guarantee**: After v2 cutover, legacy compare/hash helpers are removed and CI guards prevent re-introduction.

## UI Action Matrix *(mandatory when Filament is changed)*

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Profiles | Workspace admin | Create Baseline Profile | View action / record inspection (per Action Surface Contract) | Edit, Archive (confirmed) | None | “Create Baseline Profile” | Capture Baseline (compare is tenant-context) | Save, Cancel | Yes | Archive requires confirmation; capture starts OperationRuns and is audited |
| Baseline Capture Run Detail | Workspace admin | None | Linked from runs list | None | None | None | None | N/A | Yes | Shows effective scope + fidelity + counts + warnings |
| Baseline Compare Run Detail | Tenant-context admin | Run Compare (if shown), Re-run Compare (if allowed) | Linked from runs list | None | None | None | None | N/A | Yes | Shows coverage badge and warning banner; uncovered types emit no findings |
| Drift Findings Landing | Tenant-context admin | None | Table filter by change type | View (optional), Acknowledge/Resolve (if workflow exists) | None | None | None | N/A | Yes | Surfaces fidelity + coverage context; no destructive actions required for v1 |

### Key Entities *(include if feature involves data)*

- **Baseline profile**: Defines scope (policy types + opt-in foundations) and is the parent for baseline snapshots.
- **Baseline snapshot item**: Stores per-policy-subject baseline state evidence (hash, fidelity, source, observed timestamp).
- **Compare run**: A recorded operation that compares a tenant current state to a baseline snapshot, including effective scope and coverage warnings.
- **Finding**: A stable, recurring drift finding with lifecycle fields (first seen, last seen, times seen) and evidence (baseline/current hashes, fidelity).

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-116-01 One engine**: All baseline compare and capture runs use exactly one drift pipeline; no alternative compare paths exist in production code.
- **SC-116-02 Stable recurrence**: For a fixed baseline snapshot + tenant + policy subject + change type, repeated compares (including retries) produce at most one finding identity, and lifecycle counters increment at most once per run.
- **SC-116-03 Coverage safety**: When coverage is partial for any effective-scope type, the compare run is visibly marked as “completed with warnings” and produces zero findings for those uncovered types.
- **SC-116-04 Operator clarity**: On the compare run detail screen, operators can see effective scope counts, coverage status, and fidelity within one page load, with a clear warning banner when applicable.
- **SC-116-05 Performance guard (v1)**: Compare runs complete without per-item external hydration calls; runtime scales with number of in-scope subjects via chunking.
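
### Illustrative sketch: v1 compare rules *(non-normative)*

The v1 rules for scope expansion (FR-116v1-01, FR-116v1-03), the coverage guard (FR-116v1-07), drift classification (FR-116v1-08), stable identity (FR-116v1-09), and retry-safe lifecycle counters (FR-116v1-10) can be sketched together as below. This is a language-neutral illustration in Python, not the repository's implementation. The type names in `SUPPORTED_POLICY_TYPES`, the `succeeded` outcome value, the `uncovered_types` context key, the `last_run_id` field, and every function name are assumptions made for the example; the spec only fixes the change types, `partially_succeeded`, `errors_recorded`, `effective_scope.*`, and `findings.counts_by_change_type`.

```python
import hashlib
import json
from collections import Counter

# Hypothetical registry: the spec does not list the supported policy types.
SUPPORTED_POLICY_TYPES = ["compliance_policy", "device_configuration"]


def expand_scope(policy_types, foundation_types):
    """Expand a stored scope into the effective scope recorded on a run."""
    # An empty policy_types list means "all supported policy types".
    policies = sorted(policy_types or SUPPORTED_POLICY_TYPES)
    # Foundations are opt-in: an empty list means none.
    foundations = sorted(foundation_types)
    return {
        "policy_types": policies,
        "foundation_types": foundations,
        "all_types": policies + foundations,
        "foundations_included": bool(foundations),
    }


def finding_identity(tenant_id, snapshot_id, policy_type, external_id, change_type):
    """Derive a stable identity key; hashes are evidence and deliberately excluded."""
    raw = json.dumps([tenant_id, snapshot_id, policy_type, external_id, change_type])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def compare(tenant_id, snapshot_id, baseline, current, scope, covered_types):
    """Compare two {(policy_type, external_id): hash} maps under the coverage guard."""
    uncovered = [t for t in scope["all_types"] if t not in covered_types]
    findings = []
    for key in sorted(set(baseline) | set(current)):
        policy_type, external_id = key
        # Skip out-of-scope subjects and every subject of an uncovered type.
        if policy_type not in scope["all_types"] or policy_type in uncovered:
            continue
        if key not in current:
            change_type = "missing_policy"
        elif key not in baseline:
            change_type = "unexpected_policy"
        elif baseline[key] != current[key]:
            change_type = "different_version"
        else:
            continue  # hashes match: no drift
        findings.append({
            "identity": finding_identity(
                tenant_id, snapshot_id, policy_type, external_id, change_type
            ),
            "change_type": change_type,
            "baseline_hash": baseline.get(key),
            "current_hash": current.get(key),
            "fidelity": "meta",
        })
    # Warning magnitude: uncovered type count, or 1 when scope expands to zero types.
    warnings = len(uncovered) if scope["all_types"] else 1
    counts = dict(Counter(f["change_type"] for f in findings))
    return {
        # "succeeded" is an assumed name; the spec only names partially_succeeded.
        "outcome": "partially_succeeded" if warnings else "succeeded",
        "summary_counts": {"errors_recorded": warnings},
        "context": {
            "effective_scope": scope,
            "uncovered_types": uncovered,  # hypothetical context key
            "findings": {"counts_by_change_type": counts},
        },
        "findings": findings,
    }


def record_sighting(store, finding, run_id, seen_at):
    """Upsert lifecycle fields; retrying the same run must not double count."""
    row = store.setdefault(
        finding["identity"],
        {"first_seen_at": seen_at, "times_seen": 0, "last_run_id": None},
    )
    if row["last_run_id"] != run_id:
        row["times_seen"] += 1
        row["last_run_id"] = run_id
    row["last_seen_at"] = seen_at
    return row
```

Calling `compare` with `covered_types` that omits one in-scope type yields `partially_succeeded`, `errors_recorded = 1`, and no findings for that type, while covered types are still classified. Because the identity input includes `snapshot_id`, repeating the call with the same snapshot reproduces the same identities, and a re-captured snapshot produces new ones.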