Ahmed Darrazi add136cc3c spec(116): baseline drift engine specs

2026-03-02 02:08:28 +01:00

Phase 0 — Research (Baseline Drift Engine)

This document resolves the open design/implementation questions needed to produce a concrete implementation plan for Spec 116, grounded in the current codebase.

Repo Reality Check (What already exists)

Baseline domain tables exist: baseline_profiles, baseline_snapshots, baseline_snapshot_items, baseline_tenant_assignments.
Baseline ops exist:
- Capture: App\Services\Baselines\BaselineCaptureService → App\Jobs\CaptureBaselineSnapshotJob.
- Compare: App\Services\Baselines\BaselineCompareService → App\Jobs\CompareBaselineToTenantJob.
Findings lifecycle primitives exist (times_seen/first_seen/last_seen) and recurrence support exists (findings.recurrence_key).
A recurrence-based drift generator already exists: App\Services\Drift\DriftFindingGenerator (uses recurrence_key and also sets fingerprint = recurrence_key to satisfy the unique constraint).
Inventory sync is OperationRun-based and stamps inventory_items.last_seen_operation_run_id.

Decisions

1) Finding identity for baseline compare

Decision: Baseline compare findings MUST use a stable recurrence key derived from:

tenant_id
baseline_snapshot_id (not baseline profile id)
policy_type
subject_external_id
change_type

This recurrence key is stored in findings.recurrence_key and ALSO used as findings.fingerprint (to keep the existing unique constraint unique(tenant_id, fingerprint) effective).
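
To make the identity rule concrete, here is a minimal sketch of how such a key could be derived. It is written in Python purely for illustration; the function name, the JSON encoding of the parts, the use of SHA-256, and the example policy type and change type values are all assumptions, not something taken from the codebase or from Spec 116.

```python
import hashlib
import json


def baseline_recurrence_key(tenant_id, baseline_snapshot_id, policy_type,
                            subject_external_id, change_type):
    """Hypothetical helper: derive a stable key from the five identity inputs.

    Evidence hashes (baseline/current content hashes) are deliberately NOT
    part of the input, so the key does not change when evidence changes.
    """
    parts = [
        str(tenant_id),
        str(baseline_snapshot_id),
        policy_type,
        subject_external_id,
        change_type,
    ]
    # JSON-encode the list so that field boundaries are unambiguous
    # (assumption: any unambiguous, deterministic encoding would do).
    payload = json.dumps(parts, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# The same value would be written to both columns.
key = baseline_recurrence_key(1, 10, "example_policy_type", "ext-123", "example_change")
finding = {"recurrence_key": key, "fingerprint": key}
```

With this shape, re-running compare against the same snapshot produces the same key, while a re-capture (new baseline_snapshot_id) produces a different one.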

Rationale:

Matches Spec 116 (identity tied to baseline_snapshot_id and independent of evidence hashes).
Aligns with existing, proven pattern in DriftFindingGenerator (recurrence_key-based upsert; fingerprint reused).

Alternatives considered:

Keep DriftHasher::fingerprint(...) with baseline/current hashes included → rejected because it changes identity when evidence changes (violates FR-116v1-09).
Add a new unique DB constraint on (tenant_id, recurrence_key) → possible later hardening; not required initially because fingerprint uniqueness already enforces dedupe when fingerprint = recurrence_key.

2) Scope key for baseline compare findings

Decision: Keep findings grouped by baseline profile using scope_key = baseline_profile:{baselineProfileId}.

Rationale:

Spec 116 requires snapshot-scoped identity (via baseline_snapshot_id in the recurrence key), but does not require snapshot-scoped grouping.
The repository already has UI widgets/stats and auto-close behavior keyed to baseline_profile:{id}; keeping scope_key stable minimizes churn and preserves existing semantics.
Re-captures still create new finding identities because the recurrence key includes baseline_snapshot_id.

Alternatives considered:

Snapshot-scoped scope_key = baseline_snapshot:{id} → rejected for v1 because it would require larger refactors to stats, widgets, and auto-close queries, without being mandated by the spec.

3) Coverage guard (prevent false missing policies)

Decision: Coverage MUST be derived from the latest completed inventory_sync OperationRun for the tenant:

Record per-policy-type processing outcomes into that run’s context (coverage payload).
Baseline compare MUST compute uncovered_policy_types = effective_scope - covered_policy_types.
Baseline compare MUST emit no findings of any kind for uncovered policy types.
The compare OperationRun outcome should be partially_succeeded when uncovered types exist ("completed with warnings" in Ops UX), and summary counts should include errors_recorded = count(uncovered_policy_types).
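
The guard above can be sketched as a small pure function. This is an illustrative Python sketch, not the service code: the function names, the shape of the candidate findings, and the "succeeded" outcome label are assumptions (only partially_succeeded and errors_recorded are named in this document).

```python
def evaluate_coverage(effective_scope, covered_policy_types):
    """Hypothetical helper: compute uncovered types and the run outcome."""
    uncovered = sorted(set(effective_scope) - set(covered_policy_types))
    return {
        "uncovered_policy_types": uncovered,
        # "succeeded" is an assumed label for the fully covered case.
        "outcome": "partially_succeeded" if uncovered else "succeeded",
        # Numeric-only summary; the detailed list belongs in the run context.
        "summary_counts": {"errors_recorded": len(uncovered)},
    }


def drop_uncovered(candidate_findings, uncovered_policy_types):
    """Suppress every candidate finding whose policy type was not covered."""
    uncovered = set(uncovered_policy_types)
    return [f for f in candidate_findings if f["policy_type"] not in uncovered]


result = evaluate_coverage(
    effective_scope=["type_a", "type_b", "type_c"],
    covered_policy_types=["type_a", "type_c"],
)
kept = drop_uncovered(
    [{"policy_type": "type_a"}, {"policy_type": "type_b"}],
    result["uncovered_policy_types"],
)
```

Here type_b was in scope but not covered by the sync run, so its candidate finding is dropped and the run is reported as partially_succeeded with errors_recorded = 1.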

Rationale:

Spec FR-116v1-07 and SC-116-03.
Current compare logic filters only on inventory_items.last_seen_operation_run_id; without an explicit coverage list, a policy type that was never synced looks identical to one that is genuinely empty in the tenant.

Alternatives considered:

Infer coverage purely from "were there any inventory items for this policy type in the last sync run" → rejected because a legitimately empty type would be indistinguishable from "not synced".

4) Inventory meta contract hashing (v1 fidelity=meta)

Decision: Introduce an explicit "Inventory Meta Contract" builder used by BOTH capture and compare in v1.

Inputs: policy_type, external_id, and a whitelist of stable signals from inventory/meta (etag, last modified, scope tags, assignment target count, version marker when available).
Output: a normalized associative array, hashed deterministically.
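
As a rough illustration of the builder, the sketch below whitelists a fixed set of keys, normalizes list values, and hashes a canonical JSON form. It is Python for illustration only; the key names in CONTRACT_KEYS, the function names, and the choice of SHA-256 over sorted-key JSON are assumptions and would need to be aligned with the actual inventory meta fields.

```python
import hashlib
import json

# Assumed key names for the whitelisted signals listed above.
CONTRACT_KEYS = (
    "etag",
    "last_modified",
    "scope_tag_ids",
    "assignment_target_count",
    "version_marker",
)


def build_meta_contract(policy_type, external_id, meta):
    """Hypothetical builder: keep only whitelisted signals from meta."""
    contract = {"policy_type": policy_type, "external_id": external_id}
    for key in CONTRACT_KEYS:
        value = meta.get(key)  # missing signals are recorded as None
        if isinstance(value, list):
            value = sorted(value)  # make list-valued signals order-insensitive
        contract[key] = value
    return contract


def hash_meta_contract(contract):
    """Hash a canonical JSON form so equal contracts give equal hashes."""
    canonical = json.dumps(contract, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


meta = {"etag": "W/1", "scope_tag_ids": ["2", "1"], "unrelated_new_key": "x"}
digest = hash_meta_contract(build_meta_contract("type_a", "ext-123", meta))
```

Because only whitelisted keys reach the hash, adding new inventory metadata later does not change existing hashes, and both capture and compare can call the same two functions.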

Rationale:

Spec FR-116v1-04: hashing must be based on a stable contract, not arbitrary meta.
Current BaselineSnapshotIdentity::hashItemContent() hashes the entire meta_jsonb (including keys such as etag, which may be noisy, and keys that may expand over time).

Alternatives considered:

Keep current hashing of meta_jsonb → rejected because it is not an explicit contract and may drift as we add inventory metadata.

5) Baseline scope + foundations

Decision: Extend baseline scope JSON to include:

policy_types: [] (empty means default "all supported policy types excluding foundations")
foundation_types: [] (empty means default "none")

The foundations list must be derived from the same canonical foundation list used by inventory sync selection logic.
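
The default semantics can be sketched as follows. This is a Python illustration under stated assumptions: the function name and the type names in the example are hypothetical, and the rule that requested foundation types outside the canonical list are ignored is a choice made for the sketch, not something the spec text here states.

```python
def resolve_effective_scope(scope, supported_types, canonical_foundations):
    """Hypothetical resolver for the extended baseline scope JSON."""
    foundations = set(canonical_foundations)
    policy_types = list(scope.get("policy_types") or [])
    foundation_types = list(scope.get("foundation_types") or [])

    if not policy_types:
        # Empty means "all supported policy types excluding foundations".
        policy_types = [t for t in supported_types if t not in foundations]

    # Empty foundation_types means "none". Assumption: entries that are not
    # in the canonical foundation list are silently ignored.
    foundation_types = [t for t in foundation_types if t in foundations]

    return sorted(set(policy_types) | set(foundation_types))


supported = ["policy_a", "policy_b", "foundation_x"]
canonical = ["foundation_x"]

default_scope = resolve_effective_scope({}, supported, canonical)
with_foundation = resolve_effective_scope(
    {"policy_types": [], "foundation_types": ["foundation_x"]}, supported, canonical
)
```

The resulting list is what the coverage guard would treat as the effective scope for a compare run.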

Rationale:

Spec FR-116v1-01.
Current BaselineScope only supports policy_types and treats empty as "all" (including foundations), which conflicts with the spec default.

6) v2 architecture strategy (content fidelity)

Decision: v2 is implemented as an extension of the same pipeline via a provider precedence chain:

PolicyVersion (if available) → Inventory content (if available) → Meta contract fallback (degraded)

The baseline compare engine stores dimension flags on the same finding (no additional finding identities).
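
One possible shape for the precedence chain is a list of providers tried in order, with the first one that returns a payload winning. The sketch below is Python for illustration; the provider functions, the subject dictionary keys, and the returned fields (source, degraded, payload) are all hypothetical names, not existing classes or columns.

```python
from typing import Callable, Optional

Provider = Callable[[dict], Optional[dict]]


def from_policy_version(subject: dict) -> Optional[dict]:
    return subject.get("policy_version")


def from_inventory_content(subject: dict) -> Optional[dict]:
    return subject.get("inventory_content")


def from_meta_contract(subject: dict) -> Optional[dict]:
    return subject.get("meta_contract")


# (name, provider, degraded) in precedence order; only the meta contract
# fallback is marked as degraded fidelity.
PROVIDER_CHAIN = [
    ("policy_version", from_policy_version, False),
    ("inventory_content", from_inventory_content, False),
    ("meta_contract", from_meta_contract, True),
]


def resolve_evidence(subject: dict) -> Optional[dict]:
    """Return evidence from the first provider that has data, else None."""
    for name, provider, degraded in PROVIDER_CHAIN:
        payload = provider(subject)
        if payload is not None:
            return {"source": name, "degraded": degraded, "payload": payload}
    return None


evidence = resolve_evidence({"meta_contract": {"etag": "W/1"}})
```

Since the chain only selects where evidence comes from, the finding identity from decision 1 is unaffected by which provider answered.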

Rationale:

Spec FR-116v2-01 and FR-116v2-05.
There is already a content-normalization + hashing stack in DriftFindingGenerator (policy snapshot / assignments / scope tags) which can inform the content fidelity provider.

Notes / Risks

Existing baseline compare findings are currently keyed by a fingerprint that includes baseline/current hashes, and they use scope_key = baseline_profile:{id}. The v1 migration should plan for “old findings become stale” behavior; do not attempt silent in-place identity rewriting without an explicit migration/backfill plan.
Coverage persistence must remain numeric-only in summary_counts (per Ops-UX). Detailed coverage lists belong in operation_runs.context.
