TenantAtlas/specs/116-baseline-drift-engine/research.md
ahmido 7620144ab6 Spec 116: Baseline drift engine v1 (meta fidelity + coverage guard) (#141)
Implements Spec 116 baseline drift engine v1 (meta fidelity) with coverage guard, stable finding identity, and Filament UI surfaces.

Highlights
- Baseline capture/compare jobs and supporting services (meta contract hashing via InventoryMetaContract + DriftHasher)
- Coverage proof parsing + compare partial outcome behavior
- Filament pages/resources/widgets for baseline compare + drift landing improvements
- Pest tests for capture/compare/coverage guard and UI start surfaces
- Research report: docs/research/golden-master-baseline-drift-deep-analysis.md

Validation
- `vendor/bin/sail bin pint --dirty`
- `vendor/bin/sail artisan test --compact --filter="Baseline"`

Notes
- No destructive user actions added; compare/capture remain queued jobs.
- Provider registration unchanged (Laravel 11+/12 uses bootstrap/providers.php for panel providers; not touched here).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #141
2026-03-02 22:02:58 +00:00


# Phase 0 — Research (Baseline Drift Engine)
This document resolves the open design/implementation questions needed to produce a concrete implementation plan for Spec 116, grounded in the current codebase.
## Repo Reality Check (What already exists)
- Baseline domain tables exist: `baseline_profiles`, `baseline_snapshots`, `baseline_snapshot_items`, `baseline_tenant_assignments`.
- Baseline ops exist:
  - Capture: `App\Services\Baselines\BaselineCaptureService` → `App\Jobs\CaptureBaselineSnapshotJob`.
  - Compare: `App\Services\Baselines\BaselineCompareService` → `App\Jobs\CompareBaselineToTenantJob`.
- Findings lifecycle primitives exist (times_seen/first_seen/last_seen) and recurrence support exists (`findings.recurrence_key`).
- An existing recurrence-based drift generator exists: `App\Services\Drift\DriftFindingGenerator` (uses `recurrence_key` and also sets `fingerprint = recurrence_key` to satisfy the unique constraint).
- Inventory sync is OperationRun-based and stamps `inventory_items.last_seen_operation_run_id`.
## Decisions
### 1) Finding identity for baseline compare
**Decision:** Baseline compare findings MUST use a stable recurrence key derived from:
- `tenant_id`
- `baseline_snapshot_id` (not baseline profile id)
- `policy_type`
- `subject_external_id`
- `change_type`

This recurrence key is stored in `findings.recurrence_key` and is ALSO used as `findings.fingerprint`, which keeps the existing unique constraint `unique(tenant_id, fingerprint)` effective.

**Rationale:**
- Matches Spec 116 (identity tied to `baseline_snapshot_id` and independent of evidence hashes).
- Aligns with existing, proven pattern in `DriftFindingGenerator` (recurrence_key-based upsert; fingerprint reused).

**Alternatives considered:**
- Keep `DriftHasher::fingerprint(...)` with baseline/current hashes included → rejected because it changes identity when evidence changes (violates FR-116v1-09).
- Add a new unique DB constraint on `(tenant_id, recurrence_key)` → possible later hardening; not required initially because fingerprint uniqueness already enforces dedupe when `fingerprint = recurrence_key`.
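
A minimal sketch of this identity scheme follows. It assumes the key is a SHA-256 hash over a JSON-encoded tuple of the five identity fields. The function name `baselineRecurrenceKey`, the encoding, the hash algorithm, and the example values (such as the `change_type` string) are hypothetical and do not come from the codebase. Evidence hashes are deliberately left out of the inputs, so the key does not change when evidence does.

```php
<?php

/**
 * Hypothetical sketch: stable recurrence key for a baseline compare finding.
 * Only identity fields are hashed. Baseline/current evidence hashes are
 * intentionally NOT inputs, so the key stays the same when evidence changes.
 * JSON encoding (an assumption) avoids delimiter collisions between fields.
 */
function baselineRecurrenceKey(
    int $tenantId,
    int $baselineSnapshotId,
    string $policyType,
    string $subjectExternalId,
    string $changeType,
): string {
    $identity = [$tenantId, $baselineSnapshotId, $policyType, $subjectExternalId, $changeType];

    return hash('sha256', json_encode($identity, JSON_THROW_ON_ERROR));
}

// Example values are hypothetical.
$key = baselineRecurrenceKey(42, 7, 'deviceConfiguration', 'abc-123', 'different_version');

// The same value is written to both columns, so the existing
// unique(tenant_id, fingerprint) constraint also deduplicates by recurrence key.
$finding = [
    'tenant_id' => 42,
    'recurrence_key' => $key,
    'fingerprint' => $key,
];
```

Because `baseline_snapshot_id` is part of the tuple, a re-capture produces a new identity. Repeated compares against the same snapshot upsert the same finding.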
### 2) Scope key for baseline compare findings
**Decision:** Keep findings grouped by baseline profile using `scope_key = baseline_profile:{baselineProfileId}`.

**Rationale:**
- Spec 116 requires snapshot-scoped *identity* (via `baseline_snapshot_id` in the recurrence key), but does not require snapshot-scoped grouping.
- The repository already has UI widgets/stats and auto-close behavior keyed to `baseline_profile:{id}`; keeping scope_key stable minimizes churn and preserves existing semantics.
- Re-captures still create new finding identities because the recurrence key includes `baseline_snapshot_id`.

**Alternatives considered:**
- Snapshot-scoped `scope_key = baseline_snapshot:{id}` → rejected for v1 because it would require larger refactors to stats, widgets, and auto-close queries, without being mandated by the spec.
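
The short sketch below illustrates how grouping and identity are separated. It uses two hypothetical helpers. `baselineScopeKey` builds the grouping key. `snapshotFindingIdentity` mirrors the five-field identity from Decision 1, and its encoding and hash are assumptions. The sketch builds two findings for the same subject, one before and one after a re-capture. They share a `scope_key` but not a recurrence key.

```php
<?php

// Hypothetical helper: findings stay grouped per baseline profile.
function baselineScopeKey(int $baselineProfileId): string
{
    return "baseline_profile:{$baselineProfileId}";
}

// Hypothetical helper: identity includes the snapshot id (see Decision 1).
function snapshotFindingIdentity(int $tenantId, int $snapshotId, string $policyType, string $externalId, string $changeType): string
{
    return hash('sha256', json_encode([$tenantId, $snapshotId, $policyType, $externalId, $changeType], JSON_THROW_ON_ERROR));
}

$profileId = 3;

// Snapshot 10 is the original capture; snapshot 11 is a re-capture of the same profile.
$beforeRecapture = [
    'scope_key' => baselineScopeKey($profileId),
    'recurrence_key' => snapshotFindingIdentity(42, 10, 'deviceConfiguration', 'abc-123', 'different_version'),
];
$afterRecapture = [
    'scope_key' => baselineScopeKey($profileId),
    'recurrence_key' => snapshotFindingIdentity(42, 11, 'deviceConfiguration', 'abc-123', 'different_version'),
];
```

Widgets, stats, and auto-close queries keyed on `scope_key` should therefore see both findings in the same group.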
### 3) Coverage guard (prevent false missing policies)
**Decision:** Coverage MUST be derived from the latest completed `inventory_sync` OperationRun for the tenant:
- Record per-policy-type processing outcomes into that run's `context` (coverage payload).
- Baseline compare MUST compute `uncovered_policy_types = effective_scope - covered_policy_types`.
- Baseline compare MUST emit **no findings of any kind** for uncovered policy types.
- The compare OperationRun outcome should be `partially_succeeded` when uncovered types exist ("completed with warnings" in Ops UX), and summary counts should include `errors_recorded = count(uncovered_policy_types)`.

**Rationale:**
- Spec FR-116v1-07 and SC-116-03.
- Current compare logic filters only by `inventory_items.last_seen_operation_run_id`. Without an explicit coverage list, a policy type that was never synced looks identical to one that is truly empty in the tenant.

**Alternatives considered:**
- Infer coverage purely from "were there any inventory items for this policy type in the last sync run" → rejected because a legitimately empty type would be indistinguishable from "not synced".
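
Below is a minimal sketch of the guard. It assumes the coverage payload has already been reduced to a flat list of covered policy types. The function name `applyCoverageGuard`, the array shapes, and the `succeeded` outcome label for the fully covered case are hypothetical. The sketch keeps the detailed list in `context` and only a count in `summary_counts`, following the Ops-UX rule in Notes / Risks.

```php
<?php

/**
 * Hypothetical sketch of the coverage guard.
 * $effectiveScope:    policy types the compare is supposed to check.
 * $coveredTypes:      policy types the latest completed inventory_sync run processed.
 * $candidateFindings: findings computed before the guard, each with a 'policy_type'.
 */
function applyCoverageGuard(array $effectiveScope, array $coveredTypes, array $candidateFindings): array
{
    // uncovered_policy_types = effective_scope - covered_policy_types
    $uncovered = array_values(array_diff($effectiveScope, $coveredTypes));

    // Emit no findings of any kind for uncovered types.
    $findings = array_values(array_filter(
        $candidateFindings,
        static fn (array $finding): bool => ! in_array($finding['policy_type'], $uncovered, true),
    ));

    return [
        'findings' => $findings,
        'outcome' => $uncovered === [] ? 'succeeded' : 'partially_succeeded',
        // Numeric only in summary_counts; the detailed list goes into context.
        'summary_counts' => ['errors_recorded' => count($uncovered)],
        'context' => ['uncovered_policy_types' => $uncovered],
    ];
}

$result = applyCoverageGuard(
    ['deviceConfiguration', 'compliancePolicy'],
    ['deviceConfiguration'], // compliancePolicy was not processed by the last sync run
    [
        ['policy_type' => 'deviceConfiguration', 'subject_external_id' => 'abc-123'],
        ['policy_type' => 'compliancePolicy', 'subject_external_id' => 'def-456'],
    ],
);
```

Filtering after the fact keeps the sketch short. A real compare job would more likely skip uncovered types before comparing at all.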
### 4) Inventory meta contract hashing (v1 fidelity=meta)
**Decision:** Introduce an explicit "Inventory Meta Contract" builder used by BOTH capture and compare in v1.
- Inputs: `policy_type`, `external_id`, and a whitelist of stable signals from inventory/meta (etag, last modified, scope tags, assignment target count, version marker when available).
- Output: a normalized associative array, hashed deterministically.

**Rationale:**
- Spec FR-116v1-04: hashing must be based on a stable contract, not arbitrary meta.
- Current `BaselineSnapshotIdentity::hashItemContent()` hashes the entire `meta_jsonb`, including potentially noisy keys such as `etag` and any keys added to inventory metadata over time.

**Alternatives considered:**
- Keep current hashing of `meta_jsonb` → rejected because it is not an explicit contract and may drift as we add inventory metadata.
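
The sketch below shows the idea of a whitelisted, normalized contract. It is not the `InventoryMetaContract` class mentioned in the PR description. The meta key names (`last_modified`, `scope_tag_ids`, `assignment_target_count`, `version_marker`) and the normalization rules are hypothetical. The rules are: missing signals become `null`, scope tags are treated as a set, and keys are sorted before JSON encoding.

```php
<?php

// Hypothetical whitelist; the real meta_jsonb key names may differ.
const META_CONTRACT_KEYS = ['etag', 'last_modified', 'scope_tag_ids', 'assignment_target_count', 'version_marker'];

function buildMetaContract(string $policyType, string $externalId, array $meta): array
{
    $contract = [
        'policy_type' => $policyType,
        'external_id' => $externalId,
    ];

    foreach (META_CONTRACT_KEYS as $key) {
        // Missing signals become explicit nulls so the shape is always identical.
        $contract[$key] = $meta[$key] ?? null;
    }

    // Scope tags are a set: their order must not affect the hash.
    if (is_array($contract['scope_tag_ids'])) {
        $tags = array_map('strval', $contract['scope_tag_ids']);
        sort($tags, SORT_STRING);
        $contract['scope_tag_ids'] = $tags;
    }

    // Stable key order makes the JSON encoding deterministic.
    ksort($contract);

    return $contract;
}

function hashMetaContract(array $contract): string
{
    return hash('sha256', json_encode($contract, JSON_THROW_ON_ERROR));
}

// Same whitelisted signals, different key order, tag order and extra keys.
$a = buildMetaContract('deviceConfiguration', 'abc-123', ['etag' => 'W/"1"', 'scope_tag_ids' => ['2', '0'], 'displayName' => 'Old name']);
$b = buildMetaContract('deviceConfiguration', 'abc-123', ['scope_tag_ids' => ['0', '2'], 'etag' => 'W/"1"', 'newlyAddedKey' => true]);
$c = buildMetaContract('deviceConfiguration', 'abc-123', ['etag' => 'W/"2"', 'scope_tag_ids' => ['0', '2']]);
```

Keys outside the whitelist, such as `displayName` or a newly added `newlyAddedKey`, do not affect the hash. A change to a whitelisted signal such as `etag` does.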
### 5) Baseline scope + foundations
**Decision:** Extend baseline scope JSON to include:
- `policy_types: []` (empty means default "all supported policy types excluding foundations")
- `foundation_types: []` (empty means default "none")

The foundations list must be derived from the same canonical foundation list used by the inventory sync selection logic.

**Rationale:**
- Current `BaselineScope` only supports `policy_types` and treats empty as "all" (including foundations) which conflicts with the spec default.
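
Here is a minimal sketch of how the two scope fields could resolve to an effective list of types. The helper `resolveEffectiveScope` is hypothetical. Its `$supportedPolicyTypes` and `$foundationTypes` parameters stand in for the canonical lists used by inventory sync, and the type names in the example are illustrative only.

```php
<?php

/**
 * Hypothetical sketch: resolve the effective policy types for a baseline scope.
 * $supportedPolicyTypes and $foundationTypes stand in for the canonical lists
 * used by inventory sync selection logic (names are assumptions).
 */
function resolveEffectiveScope(array $scope, array $supportedPolicyTypes, array $foundationTypes): array
{
    $policyTypes = $scope['policy_types'] ?? [];
    $selectedFoundations = $scope['foundation_types'] ?? [];

    // Empty policy_types: default to all supported types, excluding foundations.
    if ($policyTypes === []) {
        $policyTypes = array_diff($supportedPolicyTypes, $foundationTypes);
    }

    // Empty foundation_types: default to none. Unknown foundation names are dropped.
    $selectedFoundations = array_intersect($selectedFoundations, $foundationTypes);

    return array_values(array_unique([...$policyTypes, ...$selectedFoundations]));
}

$supported = ['deviceConfiguration', 'compliancePolicy', 'roleScopeTag'];
$foundations = ['roleScopeTag'];

$defaultScope = resolveEffectiveScope([], $supported, $foundations);
$withFoundations = resolveEffectiveScope(['foundation_types' => ['roleScopeTag']], $supported, $foundations);
```

With an empty scope, the foundation type is excluded. It appears only when listed explicitly in `foundation_types`, which is the opposite of the current "empty means all" behavior.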
### 6) v2 architecture strategy (content fidelity)
**Decision:** v2 is implemented as an extension of the same pipeline via a provider precedence chain:

`PolicyVersion (if available) → Inventory content (if available) → Meta contract fallback (degraded)`

The baseline compare engine stores dimension flags on the same finding (no additional finding identities).

**Rationale:**
- Spec FR-116v2-01 and FR-116v2-05.
- There is already a content-normalization + hashing stack in `DriftFindingGenerator` (policy snapshot / assignments / scope tags) which can inform the content fidelity provider.
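
The precedence chain can be sketched as an ordered list of providers in which the first non-null result wins. The provider names, the callable signature, and the `source`/`degraded` fields are hypothetical. The sketch only shows how a single finding could record which fidelity level produced its evidence, instead of creating a separate finding per level.

```php
<?php

/**
 * Hypothetical sketch of the v2 precedence chain.
 * Each provider is a callable (policyType, externalId) => ?array; null means "not available".
 * Array order encodes precedence; the last entry is the degraded fallback.
 */
function resolveEvidence(array $providers, string $policyType, string $externalId): array
{
    $names = array_keys($providers);
    $fallback = end($names);

    foreach ($providers as $name => $provider) {
        $evidence = $provider($policyType, $externalId);

        if ($evidence !== null) {
            return [
                'source' => $name,
                'degraded' => $name === $fallback,
                'evidence' => $evidence,
            ];
        }
    }

    throw new RuntimeException("No evidence provider returned data for {$policyType}/{$externalId}");
}

$providers = [
    'policy_version' => static fn (string $type, string $id): ?array => null, // e.g. no PolicyVersion yet
    'inventory_content' => static fn (string $type, string $id): ?array => null,
    'meta_contract' => static fn (string $type, string $id): ?array => ['policy_type' => $type, 'external_id' => $id],
];

// Only the meta contract is available: the result is flagged as degraded.
$resolved = resolveEvidence($providers, 'deviceConfiguration', 'abc-123');

// A PolicyVersion is available: it takes precedence over the other providers.
$withVersion = resolveEvidence(
    ['policy_version' => static fn (string $type, string $id): ?array => ['settings' => ['a' => 1]]] + $providers,
    'deviceConfiguration',
    'abc-123',
);
```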
## Notes / Risks
- Existing baseline compare findings are currently keyed by `fingerprint` that includes baseline/current hashes and uses `scope_key = baseline_profile:{id}`. The v1 migration should plan for “old findings become stale” behavior; do not attempt silent in-place identity rewriting without an explicit migration/backfill plan.
- Coverage persistence must remain numeric-only in `summary_counts` (per Ops-UX). Detailed coverage lists belong in `operation_runs.context`.