TenantAtlas/specs/116-baseline-drift-engine/spec.md
2026-03-02 11:00:46 +01:00

Feature Specification: Baseline Drift Engine (Final Architecture)

Feature Branch: 116-baseline-drift-engine
Created: 2026-03-01
Status: Draft
Input: User description: "Spec 116 — Baseline Drift Engine (Final Architecture)"

Spec Scope Fields (mandatory)

  • Scope: workspace (baseline definition + capture) + tenant (baseline compare monitoring)
  • Primary Routes:
    • Workspace (admin): Baseline Profiles (create/edit scope, capture baseline)
    • Tenant-context (admin): Baseline Compare runs (compare now, run detail) and Drift Findings landing
  • Data Ownership:
    • Workspace-owned: Baseline profiles and baseline snapshots
    • Tenant-scoped (within a workspace): Operation runs for baseline capture/compare; drift findings produced by compare
    • Baseline snapshots are workspace-owned standards captured from a chosen tenant, but snapshot items MUST NOT persist tenant identifiers (e.g., no tenant_id column on snapshot items).
  • RBAC:
    • Workspace (Baselines):
      • workspace_baselines.view: view baseline profiles + snapshots
      • workspace_baselines.manage: create/edit/archive baseline profiles, start capture runs
    • Tenant (Compare):
      • tenant.sync: start baseline compare runs
      • tenant_findings.view: view drift findings
    • Tenant access is required for tenant-context surfaces, in addition to workspace membership

For canonical-view specs: not applicable (this is not a canonical-view feature).

Clarifications

Session 2026-03-01

  • Q: Should finding identity be stable across baseline re-captures, or tied to a specific baseline snapshot? → A: Tie finding identity to baseline_snapshot_id (stable within a snapshot; re-capture creates new finding identities).
  • Q: In v2, should drift dimensions be stored as flags on a single finding, or as separate findings per dimension? → A: Use one finding with dimension flags (no separate findings per dimension).
  • Q: When running a compare, which baseline snapshot should be used by default? → A: Default to the baseline profile's active_snapshot_id (updated only by successful captures); allow explicitly selecting a snapshot.
  • Q: When coverage is missing for a policy type, should compare emit any findings for that type? → A: Skip all finding emission for uncovered types (no missing_policy, no unexpected_policy, no different_version).

Outcomes

  • O-1 One engine: There is exactly one baseline drift compare engine; no parallel legacy compare/hash paths.
  • O-2 Stable findings (recurrence): For a given baseline snapshot, the same underlying drift maps to the same finding identity across retries and across runs, with lifecycle counters.
  • O-3 Auditability & operator UX: Each compare run records scope, coverage, and fidelity; partial coverage produces warnings (not misleading “missing policy” noise).
  • O-4 No legacy logic after v2: After the v2 extension, there are no “meta compare here / diff there” special cases; all drift flows through the same pipeline.

Definitions

  • Subject key: A compare object identity independent of tenant, identified by (policy_type, external_id).
  • Tenant subject: A subject key within a tenant context, identified by (tenant_id, policy_type, external_id).
  • Policy state: A normalized representation of a tenant subject, containing a deterministic hash, fidelity, and observation metadata.
  • Fidelity:
    • meta: drift signal based on a stable “inventory meta contract” (signal-based fields)
    • content: drift signal based on canonicalized policy content (semantic)
  • Effective scope: The expanded set of policy types processed by a run.
  • Coverage: Which policy types are confirmed to be present/updated in the tenant current state at the time of compare.
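
The "policy state" definition above can be illustrated with a minimal sketch. It assumes the deterministic hash is a SHA-256 over a canonical JSON encoding of the meta contract; the spec does not prescribe the algorithm or encoding. The names `PolicyState`, `meta_contract_hash`, `build_policy_state`, and the contract keys are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyState:
    """Normalized state of one policy subject (hypothetical shape)."""
    policy_type: str
    external_id: str
    hash: str
    fidelity: str
    source: str
    observed_at: str


def meta_contract_hash(contract: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so that key order
    # never changes the hash. SHA-256 is an assumption, not a spec rule.
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_policy_state(policy_type: str, external_id: str,
                       contract: dict, observed_at: str) -> PolicyState:
    # The source label follows the spec's example "inventory_meta_contract:v1".
    return PolicyState(
        policy_type=policy_type,
        external_id=external_id,
        hash=meta_contract_hash(contract),
        fidelity="meta",
        source=f"inventory_meta_contract:v{contract['version']}",
        observed_at=observed_at,
    )
```

Because the hash is computed only from the versioned contract payload, adding signals in a later contract version would not change hashes produced under version 1.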

Assumptions

  • Baseline drift is sold as “signal-based drift detection” in v1 (meta fidelity), and later upgraded to deep drift (content fidelity) without changing the compare engine semantics.
  • The system already has a tenant-scoped inventory sync mechanism capable of recording per-run coverage of which policy types were synced.
  • Foundations are treated as opt-in policy types; they are excluded unless explicitly selected.

User Scenarios & Testing (mandatory)

User Story 1 - Capture and compare a baseline with stable findings (Priority: P1)

As a workspace admin, I want to define a baseline scope, capture a baseline snapshot, and compare a tenant against that baseline, so I can reliably detect and track drift over time.

Why this priority: This is the core product slice that makes baseline drift sellable: consistent capture, consistent compare, and stable findings.

Independent Test: Can be tested by creating a baseline profile with a defined scope, capturing a snapshot, running compare twice, and verifying stable finding identity and lifecycle counters.

Acceptance Scenarios:

  1. Given a baseline profile with scope “all policy types (excluding foundations)”, When I capture a baseline snapshot, Then the snapshot contains only in-scope policy subjects and each snapshot item records its hash and fidelity.
  2. Given a captured baseline snapshot and a tenant current state, When I run compare twice with the same inputs, Then the same drift maps to the same finding identity and lifecycle counters increment at most once per run.
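
The second scenario (stable identity, counters incrementing at most once per run) can be sketched as follows. This is an illustration under assumptions: the identity is derived by hashing the five identity fields, and idempotency is enforced by remembering the last run that touched a finding. `finding_identity`, `Finding`, and `record_observation` are hypothetical names, and the in-memory `store` stands in for the findings table.

```python
import hashlib
import json
from dataclasses import dataclass


def finding_identity(tenant_id, baseline_snapshot_id, policy_type,
                     external_id, change_type) -> str:
    # Hashes are deliberately NOT part of the identity: they are evidence.
    parts = [tenant_id, baseline_snapshot_id, policy_type, external_id, change_type]
    return hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()


@dataclass
class Finding:
    identity: str
    first_seen_run_id: int
    last_seen_run_id: int
    times_seen: int = 1


def record_observation(store: dict, identity: str, run_id: int) -> Finding:
    finding = store.get(identity)
    if finding is None:
        finding = Finding(identity, run_id, run_id)
        store[identity] = finding
    elif finding.last_seen_run_id != run_id:
        # A retry of the same run carries the same run_id and is a no-op here.
        finding.last_seen_run_id = run_id
        finding.times_seen += 1
    return finding
```

Comparing against `last_seen_run_id` is only sufficient if runs for a tenant are processed one at a time; a real implementation might need a stricter guard.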

User Story 2 - Coverage warnings prevent misleading missing-policy findings (Priority: P1)

As an operator, I want the compare run to warn when current-state coverage is partial, so that missing policies are not falsely reported when the system simply lacks data.

Why this priority: Trust depends on avoiding false negatives/positives; “missing policy” findings on a partial sync are unacceptable noise.

Independent Test: Can be tested by running compare with an effective scope where some policy types are intentionally marked as not synced, verifying warning outcome and suppression behavior.

Acceptance Scenarios:

  1. Given a compare run where some policy types in effective scope were not synced, When compare is executed, Then the run completes with warnings and produces no findings at all for those missing-coverage types.
  2. Given a compare run where coverage is complete, When a baseline policy subject is missing in current state for a covered type, Then a missing-policy finding is produced.
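
Both scenarios can be sketched in one function that applies the coverage guard before classifying drift. This is a minimal illustration, assuming baseline and current state are available as mappings from `(policy_type, external_id)` to hash. `compare_states` is a hypothetical name, and `covered_types=None` models missing or unreadable coverage proof.

```python
from typing import Optional


def compare_states(baseline: dict, current: dict,
                   effective_types: set, covered_types: Optional[set]):
    """Return (findings, uncovered_types) for one compare run."""
    # Fail-safe: no coverage proof means nothing is considered covered.
    covered = set() if covered_types is None else set(covered_types)
    uncovered = sorted(t for t in effective_types if t not in covered)

    findings = []
    for key in sorted(set(baseline) | set(current)):
        policy_type, external_id = key
        # Uncovered or out-of-scope types emit no findings of any kind.
        if policy_type not in effective_types or policy_type not in covered:
            continue
        if key not in current:
            change_type = "missing_policy"
        elif key not in baseline:
            change_type = "unexpected_policy"
        elif baseline[key] != current[key]:
            change_type = "different_version"
        else:
            continue  # hashes match: no drift
        findings.append((policy_type, external_id, change_type))
    return findings, uncovered
```

A non-empty `uncovered` list is what would drive the "completed with warnings" outcome for the run.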

User Story 3 - Operators can understand scope, coverage, and fidelity in the UI (Priority: P2)

As an operator, I want drift screens to clearly show what was compared (scope), how complete the data was (coverage), and how “deep” the drift signal is (fidelity), so I can interpret findings correctly.

Why this priority: Drift findings are only actionable when the operator understands context and limitations.

Independent Test: Can be tested by executing a compare run with and without coverage warnings, verifying that run detail and drift landing surfaces render scope counts, coverage badge, and fidelity indicators.

Acceptance Scenarios:

  1. Given a compare run with full coverage, When I open run detail, Then I see the compared scope and a coverage status of OK.
  2. Given a compare run with partial coverage, When I open the drift landing and run detail, Then I see a warning banner and can see which types were missing coverage.

Edge Cases

  • Compare is retried after a transient failure: findings are not duplicated; lifecycle increments happen at most once per run identity.
  • Baseline capture is executed with empty scope lists: default semantics apply, so an empty policy types list means “all supported types excluding foundations” and an empty foundations list means “none”.
  • Effective scope expands to zero types (e.g., no supported types): run completes with an explicit warning and produces no findings.
  • Policy subjects appear/disappear between inventory sync and compare: handled according to coverage rules; does not create missing-policy noise for uncovered types.
  • Two different policy subjects accidentally share an external identifier across types: identity is still unambiguous because policy_type is part of the subject key.
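
The empty-scope and zero-type edge cases above depend on how a stored scope is expanded into an effective scope. The following sketch shows one way to do that. `expand_scope` and its parameters are hypothetical, and silently dropping unsupported types is an assumption rather than a stated rule.

```python
def expand_scope(policy_types: list, foundation_types: list,
                 supported_policy_types: list,
                 supported_foundation_types: list) -> dict:
    """Expand a stored baseline scope into the effective scope of a run."""
    if not policy_types:
        # Empty list = default: all supported policy types, no foundations.
        effective_policy = list(supported_policy_types)
    else:
        effective_policy = [t for t in policy_types if t in supported_policy_types]

    # Foundations are opt-in: an empty list means none.
    effective_foundations = [
        t for t in foundation_types if t in supported_foundation_types
    ]

    return {
        "policy_types": effective_policy,
        "foundation_types": effective_foundations,
        "all_types": effective_policy + effective_foundations,
        "foundations_included": bool(effective_foundations),
    }
```

If `all_types` comes back empty, the run would take the explicit-warning path and produce no findings.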

Requirements (mandatory)

This feature introduces/extends long-running compare work and uses OperationRun for capture and compare runs. It must comply with:

  • Run observability: Every capture/compare run must have a visible run identity, scope context, coverage context, and outcome.
  • Safety: Compare must never claim missing policies for policy types where current-state coverage is not proven.
  • Tenant isolation: Inventory items, operation runs, and findings are tenant-scoped; cross-tenant access must be deny-as-not-found. Baseline profiles/snapshots are workspace-owned and must not persist tenant identifiers.

Operational UX Contract (Ops-UX)

  • Capture and compare run lifecycle transitions are service-owned (not UI-owned).
  • Run summaries provide numeric-only counters using ONLY keys from app/Support/OpsUx/OperationSummaryKeys.php.
    • Coverage warnings MUST be represented using an existing canonical numeric key (default: errors_recorded).
  • Warning semantics mapping (canonical):
    • Any “completed with warnings” case MUST be represented as OperationRun.outcome = partially_succeeded.
    • summary_counts.errors_recorded MUST be a numeric indicator of warning magnitude.
      • Default: number of uncovered policy types in effective scope.
      • Edge case (effective scope expands to zero types): summary_counts.errors_recorded = 1 so the warning remains visible under the numeric-only summary_counts contract.
  • Scheduled/system-initiated runs (if any) must not generate user terminal DB notifications; audit is handled via monitoring surfaces.
  • Regression guard tests are added/updated to enforce correct run outcome semantics and summary counter rules.
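
The warning semantics mapping above can be sketched as a small function. This is an illustration only: `summarize_compare_run` is a hypothetical name, and the `succeeded` outcome value and the explicit `errors_recorded = 0` for clean runs are assumptions (the spec only names `partially_succeeded`).

```python
def summarize_compare_run(effective_types: list, uncovered_types: list) -> dict:
    """Map coverage results to run outcome and numeric summary counts."""
    if not effective_types:
        # Zero-type scope: keep the warning visible with a count of 1.
        return {"outcome": "partially_succeeded",
                "summary_counts": {"errors_recorded": 1}}
    if uncovered_types:
        # Default: warning magnitude = number of uncovered policy types.
        return {"outcome": "partially_succeeded",
                "summary_counts": {"errors_recorded": len(uncovered_types)}}
    # Assumed value for a clean run; not named in the spec.
    return {"outcome": "succeeded",
            "summary_counts": {"errors_recorded": 0}}
```

The names of the uncovered types are not numeric, so they would belong in the run context rather than in `summary_counts`.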

Authorization Contract (RBAC-UX)

  • Workspace membership + capability gates:
    • workspace_baselines.view is required to view baseline profiles and snapshots.
    • workspace_baselines.manage is required to create/edit/archive baseline profiles and start capture runs.
    • tenant.sync is required to start compare runs.
    • tenant_findings.view is required to view drift findings.
  • 404 vs 403 semantics:
    • Non-member or not entitled to workspace/tenant scope → 404 (deny-as-not-found)
    • Member but missing capability → 403
  • Destructive-like actions (e.g., archiving a baseline profile) require an explicit confirmation step.
  • At least one positive and one negative authorization test exist for each mutation surface.
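
The 404 vs 403 semantics above come down to checking order: scope entitlement is evaluated before capability. A minimal sketch, where `authorize` and its parameters are hypothetical and plain HTTP status codes stand in for the framework's responses:

```python
def authorize(is_workspace_member: bool, has_scope_access: bool,
              capabilities: set, required_capability: str) -> int:
    """Return the HTTP status for a request against a scoped surface."""
    # Not a member, or not entitled to the workspace/tenant scope:
    # deny-as-not-found so the resource's existence is not revealed.
    if not is_workspace_member or not has_scope_access:
        return 404
    # Member of the scope but lacking the capability.
    if required_capability not in capabilities:
        return 403
    return 200
```

For example, a workspace member with tenant access but without `tenant.sync` would get 403 when starting a compare run, while a user without tenant access would get 404 regardless of capabilities.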

Functional Requirements

v1 — Meta-fidelity baseline compare (sellable)

  • FR-116v1-01 Baseline profile scope: Baseline profiles MUST store a scope object with policy_types and foundation_types lists.
    • Default semantics: policy_types = [] means all supported policy types excluding foundations; foundation_types = [] means no foundations.
    • Foundations MUST only be included when explicitly selected.
  • FR-116v1-02 UI scope picker: The UI MUST provide multi-select controls for Policy Types and Foundations and communicate the default semantics (empty selection = default behavior).
  • FR-116v1-03 Effective scope recorded on runs: Capture and compare runs MUST record expanded effective scope in run context:
    • effective_scope.policy_types[], effective_scope.foundation_types[], effective_scope.all_types[], and a boolean effective_scope.foundations_included.
  • FR-116v1-04 Inventory meta contract: The system MUST define and persist a stable “inventory meta contract” (signal-based fields) for drift hashing.
    • Minimum required signals: type identifier, version marker (when available), last modified time (when available), scope tags (when available), and assignment target count (when available).
    • Drift hashing for v1 MUST be based only on this contract (not arbitrary meta fields).
    • Contract outputs MUST be versioned so future additions do not retroactively change v1 semantics (e.g., meta_contract.version = 1).
    • For baseline snapshot items, the exact contract payload used for hashing MUST be persisted in the snapshot item meta_jsonb (e.g., meta_jsonb.meta_contract).
  • FR-116v1-05 Provide current-state policy states (meta fidelity): For all policy subjects in effective scope, the system MUST produce a normalized policy state for compare, including:
    • subject key (policy type + external id), deterministic hash, fidelity=meta, source indicator, and observed timestamp.
    • In v1, observed_at MUST be derived from persisted inventory evidence (inventory_items.last_seen_at), not from per-item external hydration calls during compare.
    • In v1, source MUST indicate the meta-fidelity source (e.g., inventory_meta_contract:v1) and MAY include stable provenance (e.g., inventory_items.last_seen_operation_run_id) for traceability.
  • FR-116v1-06 Baseline capture stores states (not raw): Baseline capture MUST store per-subject snapshot items that include the subject identity and the captured hash + fidelity + source + observed timestamp.
    • Baseline snapshots MUST NOT contain out-of-scope items.
    • Snapshot items MUST store observation metadata in baseline_snapshot_items.meta_jsonb (at minimum: fidelity, source, observed_at; when available: observed_operation_run_id).
  • FR-116v1-06a Compare snapshot selection: Baseline compare MUST, by default, use the latest successful baseline snapshot of the selected baseline profile.
    • Definition (v1): “latest successful baseline snapshot” is baseline_profiles.active_snapshot_id (updated only after a successful capture run persists a snapshot + items).
    • If active_snapshot_id is null, compare start MUST be blocked with a clear precondition failure (no implicit “pick the newest captured_at” fallback).
    • The UI MAY allow selecting a specific snapshot explicitly for historical comparisons.
  • FR-116v1-07 Coverage guard: Compare MUST check current-state coverage recorded by the most recent inventory sync run.
    • If effective scope contains policy types not present in coverage, the compare run MUST complete with warnings.
    • For any uncovered policy type, the compare MUST NOT emit findings of any kind for that type (no missing_policy, no unexpected_policy, no different_version).
    • Drift findings for types with proven coverage may still be produced.
    • If there is no completed inventory sync run (or coverage proof is missing/unreadable), coverage MUST be treated as unproven for all types and the compare MUST produce zero findings (fail-safe) and complete with warnings.
  • FR-116v1-08 Drift rules: Compare MUST produce drift results per policy subject:
    • Baseline-only → missing_policy (only when coverage is proven for the subject's type)
    • Current-only → unexpected_policy
    • Both present and hashes differ → different_version (with fidelity=meta)
  • FR-116v1-09 Stable finding identity: Findings MUST have a stable identity key derived from: tenant, baseline snapshot, policy type, external id, and change type.
    • Hashes are evidence fields and may update without changing identity.
    • Finding identity MUST be tied to a specific baseline snapshot (re-capture creates a new baseline snapshot and therefore new finding identities).
  • FR-116v1-10 Finding lifecycle + retry idempotency: Findings MUST record first seen, last seen, and times seen.
    • For a given run identity, lifecycle counters MUST not increment more than once.
  • FR-116v1-11 Auditability: Each capture and compare run MUST write an audit trail including effective scope counts, coverage warning summary (if any), and finding counts per change type.
  • Audit trail storage (canonical):
    • Aggregations that do not fit summary_counts MUST be stored in operation_runs.context (not new summary keys).
    • Compare MUST store per-change-type counts in run context under findings.counts_by_change_type (e.g., keys: missing_policy, unexpected_policy, different_version).
    • For this repository, the canonical audit trail is the operation_runs record itself (status/outcome + context + numeric summary_counts); do not introduce parallel “audit summary” persistence for the same data.
  • FR-116v1-12 Drift UI context: Compare run detail and drift landing MUST surface scope, coverage status, and fidelity (meta-based drift) and show a warning banner when coverage warnings were present.
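
The snapshot selection rule in FR-116v1-06a can be sketched as follows. `resolve_snapshot_id` and `PreconditionFailed` are hypothetical names; the sketch only shows the decision order, not how snapshots are loaded.

```python
from typing import Optional


class PreconditionFailed(Exception):
    """Raised when a compare run cannot start (hypothetical error type)."""


def resolve_snapshot_id(active_snapshot_id: Optional[int],
                        explicit_snapshot_id: Optional[int] = None) -> int:
    # An explicitly selected snapshot (historical comparison) wins.
    if explicit_snapshot_id is not None:
        return explicit_snapshot_id
    # No implicit "newest captured_at" fallback: block the run instead.
    if active_snapshot_id is None:
        raise PreconditionFailed(
            "Baseline profile has no active snapshot; capture a baseline first."
        )
    return active_snapshot_id
```

Blocking instead of falling back keeps the default deterministic: the compared snapshot is always the one recorded by the last successful capture.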

v2 — Content-fidelity extension (deep drift, same engine)

Deferred / out of scope for this delivery: The v2 requirements below are intentionally not covered by specs/116-baseline-drift-engine/tasks.md and will be implemented in a follow-up spec/milestone.

  • FR-116v2-01 Provider precedence: Current state MUST be sourced with a precedence chain per policy type: “policy version (if available) → inventory content (if available) → meta fallback (explicitly marked degraded)”.
  • FR-116v2-02 Content hash availability: The inventory system MUST persist a content hash and capture timestamp for hydrated policy content.
  • FR-116v2-03 Quota-aware hydration: Content hydration MUST be throttling-safe and resumable, with explicit per-run caps and concurrency limits, and must record hydration coverage in run context.
  • FR-116v2-04 Content normalization rules: The system MUST define canonicalization rules per policy type, including volatile-field removal and (where needed) redaction hooks.
  • FR-116v2-05 Drift dimensions (optional but final): The compare output MAY include dimension flags (content, assignments, scope tags) without changing finding identity.
    • If dimension flags are present, they MUST be stored on the same finding record as evidence/flags; the system MUST NOT create separate findings per dimension.
    • change_type semantics remain compatible with v1 (dimensions refine the “different_version” class rather than multiplying identities).
  • FR-116v2-06 Capture/compare use the same pipeline: Capture and compare MUST use the same policy state pipeline and hashing semantics; v2 must not introduce special-case compare paths.
  • FR-116v2-07 Coverage/fidelity guard: If content hydration is incomplete for some types, compare MAY still run but must clearly indicate degraded fidelity and must follow registry-defined behavior for those types.
  • FR-116v2-08 No-legacy guarantee: After v2 cutover, legacy compare/hash helpers are removed and CI guards prevent re-introduction.
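
The provider precedence chain in FR-116v2-01 can be sketched as a simple fallthrough. This is a hedged illustration of deferred work: `select_current_state` is hypothetical, the source labels are invented for the example, and each argument is an already computed hash or `None` when that source is unavailable.

```python
from typing import Optional


def select_current_state(policy_version_hash: Optional[str],
                         inventory_content_hash: Optional[str],
                         meta_hash: str) -> dict:
    """Pick the best available source for one policy subject."""
    if policy_version_hash is not None:
        return {"hash": policy_version_hash, "fidelity": "content",
                "source": "policy_version", "degraded": False}
    if inventory_content_hash is not None:
        return {"hash": inventory_content_hash, "fidelity": "content",
                "source": "inventory_content", "degraded": False}
    # Meta fallback must be explicitly marked as degraded fidelity.
    return {"hash": meta_hash, "fidelity": "meta",
            "source": "meta_fallback", "degraded": True}
```

Whatever the source, the output keeps the same shape, so the compare step downstream needs no special cases.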

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Profiles | Workspace admin | Create Baseline Profile | View action / record inspection (per Action Surface Contract) | Edit, Archive (confirmed) | None | “Create Baseline Profile” | Capture Baseline (compare is tenant-context) | Save, Cancel | Yes | Archive requires confirmation; capture starts OperationRuns and is audited |
| Baseline Capture Run Detail | Workspace admin | None | Linked from runs list | None | None | None | None | N/A | Yes | Shows effective scope + fidelity + counts + warnings |
| Baseline Compare Run Detail | Tenant-context admin | Run Compare (if shown), Re-run Compare (if allowed) | Linked from runs list | None | None | None | None | N/A | Yes | Shows coverage badge and warning banner; uncovered types emit no findings |
| Drift Findings Landing | Tenant-context admin | None | Table filter by change type | View (optional), Acknowledge/Resolve (if workflow exists) | None | None | None | N/A | Yes | Surfaces fidelity + coverage context; no destructive actions required for v1 |

Key Entities (include if feature involves data)

  • Baseline profile: Defines scope (policy types + opt-in foundations) and is the parent for baseline snapshots.
  • Baseline snapshot item: Stores per-policy-subject baseline state evidence (hash, fidelity, source, observed timestamp).
  • Compare run: A recorded operation that compares a tenant current state to a baseline snapshot, including effective scope and coverage warnings.
  • Finding: A stable, recurring drift finding with lifecycle fields (first seen, last seen, times seen) and evidence (baseline/current hashes, fidelity).

Success Criteria (mandatory)

Measurable Outcomes

  • SC-116-01 One engine: All baseline compare and capture runs use exactly one drift pipeline; no alternative compare paths exist in production code.
  • SC-116-02 Stable recurrence: For a fixed baseline snapshot + tenant + policy subject + change type, repeated compares (including retries) produce at most one finding identity, and lifecycle counters increment at most once per run.
  • SC-116-03 Coverage safety: When coverage is partial for any effective-scope type, the compare run is visibly marked as “completed with warnings” and produces zero findings for those uncovered types.
  • SC-116-04 Operator clarity: On the compare run detail screen, operators can see effective scope counts, coverage status, and fidelity within one page load, with a clear warning banner when applicable.
  • SC-116-05 Performance guard (v1): Compare runs complete without per-item external hydration calls; runtime scales with number of in-scope subjects via chunking.