TenantAtlas/specs/116-baseline-drift-engine/research.md
2026-03-02 02:08:28 +01:00

Phase 0 — Research (Baseline Drift Engine)

This document resolves the open design/implementation questions needed to produce a concrete implementation plan for Spec 116, grounded in the current codebase.

Repo Reality Check (What already exists)

  • Baseline domain tables exist: baseline_profiles, baseline_snapshots, baseline_snapshot_items, baseline_tenant_assignments.
  • Baseline ops exist:
    • Capture: App\Services\Baselines\BaselineCaptureService, App\Jobs\CaptureBaselineSnapshotJob.
    • Compare: App\Services\Baselines\BaselineCompareService, App\Jobs\CompareBaselineToTenantJob.
  • Findings lifecycle primitives exist (times_seen/first_seen/last_seen) and recurrence support exists (findings.recurrence_key).
  • A recurrence-based drift generator already exists: App\Services\Drift\DriftFindingGenerator (uses recurrence_key and also sets fingerprint = recurrence_key to satisfy the unique constraint).
  • Inventory sync is OperationRun-based and stamps inventory_items.last_seen_operation_run_id.

Decisions

1) Finding identity for baseline compare

Decision: Baseline compare findings MUST use a stable recurrence key derived from:

  • tenant_id
  • baseline_snapshot_id (not baseline profile id)
  • policy_type
  • subject_external_id
  • change_type

This recurrence key is stored in findings.recurrence_key and ALSO used as findings.fingerprint, so that the existing unique(tenant_id, fingerprint) constraint keeps enforcing deduplication.

Rationale:

  • Matches Spec 116 (identity tied to baseline_snapshot_id and independent of evidence hashes).
  • Aligns with existing, proven pattern in DriftFindingGenerator (recurrence_key-based upsert; fingerprint reused).

Alternatives considered:

  • Keep DriftHasher::fingerprint(...) with baseline/current hashes included → rejected because it changes identity when evidence changes (violates FR-116v1-09).
  • Add a new unique DB constraint on (tenant_id, recurrence_key) → possible later hardening; not required initially because fingerprint uniqueness already enforces dedupe when fingerprint = recurrence_key.
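
To make the identity rule in this decision concrete, here is a minimal sketch in Python (not the project's language). The function name, the SHA-256 choice, the JSON-array encoding, and the example change_type value are all assumptions for illustration; the document only fixes which fields feed the key.

```python
import hashlib
import json


def baseline_recurrence_key(tenant_id, baseline_snapshot_id, policy_type,
                            subject_external_id, change_type):
    """Hypothetical helper: derive a stable key from the five identity fields.

    Evidence hashes (baseline/current) are deliberately NOT inputs, so the
    key does not change when the evidence changes.
    """
    parts = [
        str(tenant_id),
        str(baseline_snapshot_id),
        str(policy_type),
        str(subject_external_id),
        str(change_type),
    ]
    # Encoding the parts as a JSON array avoids ambiguity that a plain
    # delimiter join could introduce if a value contained the delimiter.
    payload = json.dumps(parts, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same value would be written to both columns (assumed usage):
key = baseline_recurrence_key(1, 10, "policy_a", "ext-1", "missing_policy")
finding = {"recurrence_key": key, "fingerprint": key}
```

Because baseline_snapshot_id is part of the input, a re-captured snapshot yields a different key for the same subject, while repeated compares against the same snapshot map to the same finding.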

2) Scope key for baseline compare findings

Decision: Keep findings grouped by baseline profile using scope_key = baseline_profile:{baselineProfileId}.

Rationale:

  • Spec 116 requires snapshot-scoped identity (via baseline_snapshot_id in the recurrence key), but does not require snapshot-scoped grouping.
  • The repository already has UI widgets/stats and auto-close behavior keyed to baseline_profile:{id}; keeping scope_key stable minimizes churn and preserves existing semantics.
  • Re-captures still create new finding identities because the recurrence key includes baseline_snapshot_id.

Alternatives considered:

  • Snapshot-scoped scope_key = baseline_snapshot:{id} → rejected for v1 because it would require larger refactors to stats, widgets, and auto-close queries, without being mandated by the spec.

3) Coverage guard (prevent false missing policies)

Decision: Coverage MUST be derived from the latest completed inventory_sync OperationRun for the tenant:

  • Record per-policy-type processing outcomes into that run's context (coverage payload).
  • Baseline compare MUST compute uncovered_policy_types = effective_scope - covered_policy_types.
  • Baseline compare MUST emit no findings of any kind for uncovered policy types.
  • The compare OperationRun outcome should be partially_succeeded when uncovered types exist ("completed with warnings" in Ops UX), and summary counts should include errors_recorded = count(uncovered_policy_types).

Rationale:

  • Spec FR-116v1-07 and SC-116-03.
  • Current compare logic filters only on inventory_items.last_seen_operation_run_id; without an explicit coverage list, a policy type that was never synced looks identical to one that is genuinely empty in the tenant.

Alternatives considered:

  • Infer coverage purely from "were there any inventory items for this policy type in the last sync run" → rejected because a legitimately empty type would be indistinguishable from "not synced".
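
The coverage guard described in this decision reduces to a set difference plus an outcome rule. The sketch below is illustrative Python under stated assumptions: the function name and the "succeeded" outcome label are hypothetical (the document only names partially_succeeded), and policy types are plain strings.

```python
def evaluate_coverage(effective_scope, covered_policy_types):
    """Hypothetical helper: split the scope into comparable and uncovered types."""
    scope = set(effective_scope)
    covered = set(covered_policy_types)

    # uncovered_policy_types = effective_scope - covered_policy_types
    uncovered = sorted(scope - covered)
    # Only these types may produce findings; uncovered types emit none.
    comparable = sorted(scope & covered)

    return {
        "uncovered_policy_types": uncovered,
        "comparable_policy_types": comparable,
        # "succeeded" is an assumed label for the fully covered case.
        "outcome": "partially_succeeded" if uncovered else "succeeded",
        # Numeric-only summary; the detailed list belongs in the run context.
        "summary_counts": {"errors_recorded": len(uncovered)},
    }
```

For example, with a scope of three types and one of them absent from the sync run's coverage payload, the compare would run for the other two and report errors_recorded = 1.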

4) Inventory meta contract hashing (v1 fidelity=meta)

Decision: Introduce an explicit "Inventory Meta Contract" builder used by BOTH capture and compare in v1.

  • Inputs: policy_type, external_id, and a whitelist of stable signals from inventory/meta (etag, last modified, scope tags, assignment target count, version marker when available).
  • Output: a normalized associative array, hashed deterministically.

Rationale:

  • Spec FR-116v1-04: hashing must be based on a stable contract, not arbitrary meta.
  • Current BaselineSnapshotIdentity::hashItemContent() hashes the entire meta_jsonb (including keys like etag which may be noisy and keys that may expand over time).

Alternatives considered:

  • Keep current hashing of meta_jsonb → rejected because it is not an explicit contract and may drift as we add inventory metadata.
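
As an illustration of the "whitelist, normalize, hash" idea in this decision, here is a small Python sketch. The key names in STABLE_META_KEYS, the function names, and the SHA-256 over canonical JSON approach are assumptions; the actual inventory meta keys are not specified in this document.

```python
import hashlib
import json

# Hypothetical whitelist; real inventory meta key names may differ.
STABLE_META_KEYS = (
    "etag",
    "last_modified",
    "scope_tag_ids",
    "assignment_target_count",
    "version",
)


def build_meta_contract(policy_type, external_id, meta):
    """Keep only whitelisted signals and normalize them."""
    signals = {}
    for key in STABLE_META_KEYS:
        value = meta.get(key)
        if value is None:
            continue  # absent signals are omitted rather than hashed as null
        if isinstance(value, list):
            # Treat lists as order-insensitive sets of string identifiers.
            value = sorted(str(v) for v in value)
        signals[key] = value
    return {
        "policy_type": policy_type,
        "external_id": external_id,
        "signals": signals,
    }


def hash_meta_contract(contract):
    """Hash a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this sketch, adding a new, non-whitelisted key to the inventory meta leaves the hash unchanged, which is the property that hashing the whole meta_jsonb lacks.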

5) Baseline scope + foundations

Decision: Extend baseline scope JSON to include:

  • policy_types: [] (empty means default "all supported policy types excluding foundations")
  • foundation_types: [] (empty means default "none")

Foundations list must be derived from the same canonical foundation list used by inventory sync selection logic.

Rationale:

  • Spec FR-116v1-01.
  • Current BaselineScope only supports policy_types and treats empty as "all" (including foundations) which conflicts with the spec default.
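
The default rules for the two scope lists can be sketched as follows. This is illustrative Python, and the type names and constants are placeholders: in the real codebase both lists would come from the canonical sources mentioned above.

```python
# Placeholder lists; the real values come from the canonical registries.
SUPPORTED_POLICY_TYPES = ("policy_a", "policy_b", "foundation_x")
FOUNDATION_TYPES = ("foundation_x",)


def effective_scope(scope):
    """Hypothetical helper: expand a baseline scope dict into concrete types."""
    policy_types = list(scope.get("policy_types") or [])
    foundation_types = list(scope.get("foundation_types") or [])

    if not policy_types:
        # Empty means "all supported policy types excluding foundations".
        policy_types = [
            t for t in SUPPORTED_POLICY_TYPES if t not in FOUNDATION_TYPES
        ]
    # Empty foundation_types means "none", so nothing is added by default.
    return policy_types + foundation_types
```

With these placeholders, an empty scope expands to policy_a and policy_b only; foundation_x appears only when it is listed explicitly in foundation_types.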

6) v2 architecture strategy (content fidelity)

Decision: v2 is implemented as an extension of the same pipeline via a provider precedence chain:

PolicyVersion (if available) → Inventory content (if available) → Meta contract fallback (degraded)

The baseline compare engine stores dimension flags on the same finding (no additional finding identities).

Rationale:

  • Spec FR-116v2-01 and FR-116v2-05.
  • There is already a content-normalization + hashing stack in DriftFindingGenerator (policy snapshot / assignments / scope tags) which can inform the content fidelity provider.
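
The precedence chain in this decision can be pictured as "first provider that returns evidence wins". The Python sketch below is only illustrative: the fidelity labels, the provider callables, and the function name are assumptions, not existing APIs in the codebase.

```python
# Assumed fidelity labels, ordered from highest to lowest fidelity.
PRECEDENCE = ("policy_version", "inventory_content", "meta")


def resolve_evidence(subject_external_id, providers):
    """Return (fidelity, evidence) from the first provider that has data.

    `providers` maps a fidelity label to a callable that returns evidence
    for the subject, or None when that source is unavailable.
    """
    for fidelity in PRECEDENCE:
        provider = providers.get(fidelity)
        if provider is None:
            continue
        evidence = provider(subject_external_id)
        if evidence is not None:
            return fidelity, evidence
    return None  # no source could provide evidence


# Example: no PolicyVersion exists, so inventory content is used.
providers = {
    "policy_version": lambda subject: None,
    "inventory_content": lambda subject: {"hash": "abc"},
    "meta": lambda subject: {"hash": "meta-only"},
}
result = resolve_evidence("ext-1", providers)
```

Returning the fidelity label alongside the evidence would let the compare engine record a degraded result on the same finding instead of creating a new identity.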

Notes / Risks

  • Existing baseline compare findings are currently keyed by a fingerprint that includes baseline/current hashes, and they use scope_key = baseline_profile:{id}. The v1 migration should plan for "old findings become stale" behavior; do not attempt silent in-place identity rewriting without an explicit migration/backfill plan.
  • Coverage persistence must remain numeric-only in summary_counts (per Ops-UX). Detailed coverage lists belong in operation_runs.context.