ahmido 7620144ab6 Spec 116: Baseline drift engine v1 (meta fidelity + coverage guard) (#141)

Implements Spec 116 baseline drift engine v1 (meta fidelity) with coverage guard, stable finding identity, and Filament UI surfaces.

Highlights
- Baseline capture/compare jobs and supporting services (meta contract hashing via InventoryMetaContract + DriftHasher)
- Coverage proof parsing + compare partial outcome behavior
- Filament pages/resources/widgets for baseline compare + drift landing improvements
- Pest tests for capture/compare/coverage guard and UI start surfaces
- Research report: docs/research/golden-master-baseline-drift-deep-analysis.md

Validation
- `vendor/bin/sail bin pint --dirty`
- `vendor/bin/sail artisan test --compact --filter="Baseline"`

Notes
- No destructive user actions added; compare/capture remain queued jobs.
- Provider registration unchanged (Laravel 11+/12 uses bootstrap/providers.php for panel providers; not touched here).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #141

2026-03-02 22:02:58 +00:00

Implementation Plan: 116 — Baseline Drift Engine (Final Architecture)

Branch: 116-baseline-drift-engine | Date: 2026-03-01 | Spec: specs/116-baseline-drift-engine/spec.md
Input: Feature specification from specs/116-baseline-drift-engine/spec.md

Summary

Align the existing baseline capture/compare pipeline to Spec 116 by (1) defining an explicit meta-fidelity hash contract, (2) enforcing the “coverage guard” based on the latest inventory sync run, and (3) switching baseline-compare findings to snapshot-scoped stable identities (recurrence keys) while preserving existing baseline-profile grouping for UI/stats and auto-close semantics.

Technical Context

Language/Version: PHP 8.4
Primary Dependencies: Laravel 12, Filament v5, Livewire v4
Storage: PostgreSQL (via Sail)
Testing: Pest v4 (PHPUnit 12)
Target Platform: Docker (Laravel Sail)
Project Type: Web application (Laravel)
Performance Goals: Compare jobs must remain bounded by scope size; avoid N+1 queries when loading snapshot + current inventory
Constraints:
- Ops-UX: OperationRun lifecycle + 3-surface feedback (toast queued-only, progress in widget/run detail, terminal DB notification exactly-once)
- Summary counts numeric-only and keys restricted to OperationSummaryKeys
- Tenant/workspace isolation + RBAC deny-as-not-found rules
Scale/Scope: Tenant inventories may be large; baseline compare must be efficient on (tenant_id, policy_type) filtering

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Inventory-first: PASS — compare uses Inventory as last observed state; baselines are immutable snapshots.
Read/write separation: PASS — this feature is read-only analysis; no Graph writes.
Graph contract path: PASS — inventory sync already uses GraphClientInterface; baseline compare itself is DB-only at render time.
Deterministic capabilities: PASS — baseline capability checks use existing registries/policies; no new ad-hoc strings.
Workspace + tenant isolation: PASS — baseline profiles are workspace-owned; runs/findings are tenant-owned; authorization remains deny-as-not-found for non-members.
Run observability (OperationRun): PASS — capture/compare already use OperationRun + queued jobs.
Ops-UX 3-surface feedback: PASS — existing pages use canonical queued toast presenter.
Ops-UX lifecycle: PASS — transitions must remain inside OperationRunService.
Ops-UX summary counts: PASS — only numeric summary counts using canonical keys.
Filament UI contract: PASS — only small scope-picker adjustments; no new pages beyond what exists.
Filament UX-001 layout: PASS — Baseline Profile Create/Edit will be updated to a Main/Aside layout as part of the scope-picker work.

Project Structure

Documentation (this feature)

specs/116-baseline-drift-engine/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── openapi.yaml
└── checklists/
    └── requirements.md

Source Code (repository root)

app/
├── Filament/
│   ├── Pages/                # Baseline compare landing + run detail links (existing)
│   ├── Resources/            # BaselineProfileResource (existing)
│   └── Widgets/              # Baseline compare widgets (existing)
├── Jobs/
│   ├── CaptureBaselineSnapshotJob.php
│   └── CompareBaselineToTenantJob.php
├── Models/
│   ├── BaselineProfile.php
│   ├── BaselineSnapshot.php
│   ├── BaselineSnapshotItem.php
│   ├── Finding.php
│   ├── InventoryItem.php
│   └── OperationRun.php
├── Services/
│   ├── Baselines/
│   │   ├── BaselineCaptureService.php
│   │   ├── BaselineCompareService.php
│   │   ├── BaselineAutoCloseService.php
│   │   └── BaselineSnapshotIdentity.php
│   ├── Drift/
│   │   ├── DriftFindingGenerator.php
│   │   └── DriftHasher.php
│   ├── Inventory/
│   │   └── InventorySyncService.php
│   └── OperationRunService.php
└── Support/
    ├── Baselines/BaselineCompareStats.php
    └── OpsUx/OperationSummaryKeys.php

tests/
└── Feature/
    └── Baselines/
        ├── BaselineCompareFindingsTest.php
        ├── BaselineComparePreconditionsTest.php
        ├── BaselineCompareStatsTest.php
        └── BaselineOperabilityAutoCloseTest.php

Structure Decision: Web application (Laravel 12) — all work stays in existing app/ services/jobs/models and tests/Feature.

Complexity Tracking

No constitution violations are required for this feature.

Phase 0 — Outline & Research (DONE)

Outputs:

specs/116-baseline-drift-engine/research.md

Key reconciliations captured:

Baseline compare finding identity will move to recurrence-key based upsert (snapshot-scoped identity) aligned with the existing DriftFindingGenerator pattern.
Coverage guard requires persisting per-type coverage outcomes into the latest inventory sync run context.
Scope must include policy_types + foundation_types with correct empty-default semantics.

Phase 1 — Design & Contracts (DONE)

Outputs:

specs/116-baseline-drift-engine/data-model.md
specs/116-baseline-drift-engine/contracts/openapi.yaml
specs/116-baseline-drift-engine/quickstart.md

Design highlights:

Coverage lives in operation_runs.context for inventory sync runs (detailed lists), while summary_counts remain numeric-only.
Findings use recurrence_key and fingerprint = recurrence_key for idempotent upserts.
Findings remain grouped by scope_key = baseline_profile:{id} to preserve existing UI/stats and auto-close behavior.

Phase 1 — Agent Context Update (REQUIRED)

Run:

.specify/scripts/bash/update-agent-context.sh copilot

Phase 2 — Implementation Plan

Step 1 — Baseline scope schema + UI picker

Goal: implement FR-116v1-01 and FR-116v1-02.

Changes:

Update baseline scope handling (app/Support/Baselines/BaselineScope.php) to support:
- policy_types: [] meaning “all supported policy types excluding foundations”
- foundation_types: [] meaning “none”
Update BaselineProfile form schema (Filament Resource) to show multi-selects for Policy Types and Foundations.
Document selector-to-config mapping (source of truth for option lists + defaults):

| Selector | Form state path | Options source | Default semantics |
| --- | --- | --- | --- |
| Policy Types | `scope_jsonb.policy_types` | `config('tenantpilot.supported_policy_types')` via `App\Support\Inventory\InventoryPolicyTypeMeta::supported()` | Empty ⇒ all supported policy types (excluding foundations) |
| Foundations | `scope_jsonb.foundation_types` | `config('tenantpilot.foundation_types')` via `App\Support\Inventory\InventoryPolicyTypeMeta::foundations()` | Empty ⇒ none |

Notes:

Inventory sync selection uses App\Services\BackupScheduling\PolicyTypeResolver::supportedPolicyTypes() for policy types, and InventorySyncService::foundationTypes() (derived from config('tenantpilot.foundation_types')) when include_foundations=true.
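
The empty-default semantics above can be illustrated with a small framework-free sketch. The function name `expandBaselineScope` and the choice to silently drop unknown type names are assumptions made for illustration, not the actual BaselineScope API; the two option lists stand in for the values read from `config('tenantpilot.supported_policy_types')` and `config('tenantpilot.foundation_types')`.

```php
<?php

declare(strict_types=1);

/**
 * Expand a stored baseline scope into the effective type lists (hypothetical helper).
 *
 * @param array{policy_types?: list<string>, foundation_types?: list<string>} $scope
 * @param list<string> $supportedPolicyTypes stand-in for the supported policy type config
 * @param list<string> $foundationTypes      stand-in for the foundation type config
 * @return array{policy_types: list<string>, foundation_types: list<string>}
 */
function expandBaselineScope(array $scope, array $supportedPolicyTypes, array $foundationTypes): array
{
    $selectedPolicyTypes = $scope['policy_types'] ?? [];
    $selectedFoundations = $scope['foundation_types'] ?? [];

    // Policy types never include foundations, even if the config lists overlap.
    $allPolicyTypes = array_values(array_diff($supportedPolicyTypes, $foundationTypes));

    // Empty policy_types => all supported policy types (excluding foundations).
    // Assumption: unknown names in an explicit selection are dropped.
    $policyTypes = $selectedPolicyTypes === []
        ? $allPolicyTypes
        : array_values(array_intersect($allPolicyTypes, $selectedPolicyTypes));

    // Empty foundation_types => none; an explicit selection keeps only known foundations.
    $foundations = array_values(array_intersect($foundationTypes, $selectedFoundations));

    return ['policy_types' => $policyTypes, 'foundation_types' => $foundations];
}
```

Keeping the expansion in one pure function would let the "focused unit-like test" mentioned under Tests exercise the defaults without touching the database.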

Tests:

Update/add Pest tests around scope expansion defaults (prefer a focused unit-like test if an expansion helper exists).

Step 2 — Inventory Meta Contract (explicit hash input)

Goal: implement FR-116v1-04, FR-116v1-05, FR-116v1-06, FR-116v1-06a.

Changes:

Introduce a dedicated contract builder (e.g. App\Services\Baselines\InventoryMetaContract) that returns a normalized array for hashing.
- Contract output must be explicitly versioned (e.g., meta_contract.version = 1) so future additions do not retroactively change v1 semantics.
- Contract signals are best-effort: missing signals are represented as null (not omitted) to keep hashing stable across partial inventories.
Update baseline capture hashing (BaselineSnapshotIdentity::hashItemContent() or the capture service) to hash the contract output only.
- Persist the exact contract payload used for hashing to baseline_snapshot_items.meta_jsonb.meta_contract for auditability/reproducibility.
- Persist observation metadata alongside the hash in baseline_snapshot_items.meta_jsonb (at minimum: fidelity, source, observed_at; when available: observed_operation_run_id).
Update baseline compare to compute current_hash using the same contract builder.
- Current-state observed_at is derived from persisted inventory evidence (inventory_items.last_seen_at) and MUST NOT require per-item external hydration calls during compare.
Define “latest successful snapshot” (v1) as baseline_profiles.active_snapshot_id and ensure compare start is blocked when it is null (no “pick newest captured_at” fallback).
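
As a rough sketch of the contract idea (versioned output, missing signals kept as null, key order normalized before hashing), the following plain-PHP functions show one possible shape. All function names, the `signals` wrapper key, and the choice of SHA-256 over canonical JSON are assumptions for illustration; the plan does not fix the actual signal list or the hashing used by DriftHasher.

```php
<?php

declare(strict_types=1);

const META_CONTRACT_VERSION = 1;

/**
 * Build a versioned, normalized contract payload from raw inventory meta.
 *
 * @param array<string, mixed> $meta       raw meta for one inventory item
 * @param list<string>         $signalKeys signals that belong to the contract (illustrative)
 * @return array<string, mixed>
 */
function buildMetaContract(array $meta, array $signalKeys): array
{
    $signals = [];
    foreach ($signalKeys as $key) {
        // Missing signals are represented as null, not omitted.
        $signals[$key] = $meta[$key] ?? null;
    }

    return normalizeForHash(['version' => META_CONTRACT_VERSION, 'signals' => $signals]);
}

/** Recursively sort associative arrays by key; list order is left untouched. */
function normalizeForHash(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }

    $value = array_map(normalizeForHash(...), $value);
    if (!array_is_list($value)) {
        ksort($value, SORT_STRING);
    }

    return $value;
}

/** Hash only the contract payload, never the full meta array. */
function hashMetaContract(array $contract): string
{
    $json = json_encode($contract, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE);

    return hash('sha256', $json);
}
```

Because capture and compare would both call the same builder, two observations of an unchanged item hash identically regardless of key order or of extra fields outside the contract.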

Tests:

Add a small Pest test for contract normalization stability (ordering, missing fields, nullability) in tests/Unit/Baselines/InventoryMetaContractTest.php.
Update baseline capture/compare tests if they currently assume hashing full meta_jsonb.

Step 3 — Inventory sync coverage recording

Goal: record the per-type coverage data that the coverage guard in FR-116v1-07 depends on.

Changes:

Extend inventory sync pipeline (in App\Services\Inventory\InventorySyncService and/or the job that orchestrates sync) to write a coverage payload into the inventory sync OperationRun.context:
- Per policy type: status (succeeded|failed|skipped) and optional item_count.
- Foundations can be included in the same shape if they are part of selection.
Ensure this is written even when some types fail, so downstream compare can determine uncovered types.
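
One way to picture the coverage payload is the sketch below. The function name, the `policy_types` wrapper key, and the rule that a selected type with no recorded outcome is stored as `skipped` are assumptions for illustration; only the status values and the optional `item_count` come from the plan.

```php
<?php

declare(strict_types=1);

const COVERAGE_STATUSES = ['succeeded', 'failed', 'skipped'];

/**
 * Build the coverage payload to store in the inventory sync run context.
 *
 * @param list<string> $selectedTypes types selected for this sync run
 * @param array<string, array{status?: string, item_count?: int}> $outcomes per-type results
 * @return array{policy_types: array<string, array{status: string, item_count?: int}>}
 */
function buildCoveragePayload(array $selectedTypes, array $outcomes): array
{
    $coverage = [];

    foreach ($selectedTypes as $type) {
        // Assumption: a selected type without a recorded outcome counts as skipped.
        $outcome = $outcomes[$type] ?? ['status' => 'skipped'];
        $status = $outcome['status'] ?? 'failed';

        if (!in_array($status, COVERAGE_STATUSES, true)) {
            throw new InvalidArgumentException("Unknown coverage status: {$status}");
        }

        $entry = ['status' => $status];
        if (isset($outcome['item_count'])) {
            $entry['item_count'] = (int) $outcome['item_count'];
        }

        $coverage[$type] = $entry;
    }

    // Stable ordering keeps the stored context easy to diff and audit.
    ksort($coverage, SORT_STRING);

    return ['policy_types' => $coverage];
}
```

Every selected type gets an entry, including failed ones, so a later compare can tell "not covered" apart from "covered with zero items".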

Tests:

Add/extend tests around inventory sync operation context writing (mocking Graph calls as needed; keep scope minimal).

Step 4 — Baseline compare coverage guard + outcome semantics

Goal: implement FR-116v1-07 and align to Ops-UX.

Changes:

In baseline compare job/service:
- Resolve the latest inventory sync run for the tenant.
- Compute covered_policy_types from sync run context.
- Compute uncovered_policy_types = effective_scope.policy_types - covered_policy_types.
- Skip emission of all finding types for uncovered policy types.
- Record coverage details into the compare run context for auditability.
- If uncovered types exist, set compare outcome to partially_succeeded via OperationRunService and set summary_counts.errors_recorded = count(uncovered_policy_types).
- If effective scope expands to zero types, complete as partially_succeeded and set summary_counts.errors_recorded = 1 so the warning remains visible under numeric-only summary counts.
- If there is no completed inventory sync run (or coverage proof is missing/unreadable), treat coverage as unproven for all effective-scope types (fail-safe): emit zero findings and complete as partially_succeeded.
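
The guard rules above can be condensed into a single decision function. This is a minimal sketch under stated assumptions: the function name, the `policy_types` payload shape, and the `succeeded` outcome label are illustrative, and applying `errors_recorded = count(uncovered)` to the no-coverage-proof case is an interpretation of the rules rather than something the plan states explicitly.

```php
<?php

declare(strict_types=1);

/**
 * Decide which effective-scope policy types are covered by the latest sync run.
 *
 * @param list<string> $effectivePolicyTypes expanded scope for this compare
 * @param array<string, mixed>|null $coverage coverage payload from the sync run context,
 *                                            or null when no completed sync run exists
 * @return array{covered_policy_types: list<string>, uncovered_policy_types: list<string>, outcome: string, errors_recorded: int}
 */
function evaluateCoverageGuard(array $effectivePolicyTypes, ?array $coverage): array
{
    $entries = $coverage['policy_types'] ?? null;

    // Fail-safe: missing or unreadable proof means nothing is covered.
    $succeeded = [];
    if (is_array($entries)) {
        foreach ($entries as $type => $entry) {
            if (is_array($entry) && ($entry['status'] ?? null) === 'succeeded') {
                $succeeded[] = (string) $type;
            }
        }
    }

    $covered = array_values(array_intersect($effectivePolicyTypes, $succeeded));
    $uncovered = array_values(array_diff($effectivePolicyTypes, $succeeded));

    if ($effectivePolicyTypes === []) {
        // Zero-type scope: keep the warning visible via a numeric count.
        $outcome = 'partially_succeeded';
        $errors = 1;
    } elseif ($uncovered !== []) {
        $outcome = 'partially_succeeded';
        $errors = count($uncovered);
    } else {
        $outcome = 'succeeded'; // assumed label for a fully covered compare
        $errors = 0;
    }

    return [
        'covered_policy_types' => $covered,
        'uncovered_policy_types' => $uncovered,
        'outcome' => $outcome,
        'errors_recorded' => $errors,
    ];
}
```

The compare job would then emit findings only for `covered_policy_types`, and hand the outcome to OperationRunService rather than setting it directly.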

Tests:

Add a new Pest test in tests/Feature/Baselines asserting:
- uncovered types cause partial outcome
- uncovered types produce zero findings (even if snapshot/current data would otherwise create missing/unexpected/different)
- covered types still produce findings

Step 5 — Snapshot-scoped stable finding identity

Goal: implement FR-116v1-09 and FR-116v1-10.

Changes:

Replace hash-evidence-based fingerprint generation in baseline compare with a stable recurrence key:
- Inputs: tenant_id, baseline_snapshot_id, policy_type, subject_external_id, change_type
Persist:
- findings.recurrence_key = <computed>
- findings.fingerprint = <same computed>
Keep scope_key = baseline_profile:{baselineProfileId}.
Ensure retry idempotency: do not increment lifecycle counters more than once per run identity.
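
A recurrence key built only from the listed inputs might look like the sketch below. The function name and the encoding (SHA-256 over a JSON list) are assumptions for illustration; the existing DriftFindingGenerator pattern may derive its key differently.

```php
<?php

declare(strict_types=1);

/**
 * Compute a snapshot-scoped recurrence key for a baseline-compare finding.
 * Evidence hashes (baseline hash, current hash) are deliberately not inputs.
 */
function baselineRecurrenceKey(
    int $tenantId,
    int $baselineSnapshotId,
    string $policyType,
    string $subjectExternalId,
    string $changeType,
): string {
    // JSON-encoding a list avoids delimiter collisions between adjacent parts.
    $payload = json_encode(
        [$tenantId, $baselineSnapshotId, $policyType, $subjectExternalId, $changeType],
        JSON_THROW_ON_ERROR,
    );

    return hash('sha256', $payload);
}
```

Since the evidence hashes are not part of the input, re-running a compare after the item drifts further yields the same key, while a recapture (new `baseline_snapshot_id`) yields a new one. The same value would be written to both `findings.recurrence_key` and `findings.fingerprint`.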

Tests:

Update tests/Feature/Baselines/BaselineCompareFindingsTest.php:
- Ensure fingerprint no longer depends on baseline/current hash.
- Assert stable identity across re-runs with changed evidence hashes.
Add coverage for “recapture uses new snapshot id → new finding identity”.

Step 6 — Auto-close + stats compatibility

Goal: preserve existing operability expectations and keep UI stable.

Changes:

Ensure BaselineAutoCloseService still resolves stale findings after a fully successful compare, even though identities now include snapshot id.
Confirm BaselineCompareStats remains correct for grouping by scope_key = baseline_profile:{id}.
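
To see why keeping `scope_key` at profile level matters once identities carry the snapshot id, consider this simplified sketch of stale detection. It is not the BaselineAutoCloseService implementation: the function name, the array shape of a finding, and the `succeeded` outcome label are assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Return recurrence keys of open findings that should be auto-closed.
 *
 * @param list<array{scope_key: string, recurrence_key: string}> $openFindings
 * @param list<string> $seenRecurrenceKeys keys emitted by the current compare run
 * @return list<string>
 */
function staleFindingKeys(
    array $openFindings,
    string $scopeKey,
    array $seenRecurrenceKeys,
    string $compareOutcome,
): array {
    // Auto-close only after a fully successful compare; a partial run proves nothing
    // about uncovered types.
    if ($compareOutcome !== 'succeeded') {
        return [];
    }

    $seen = array_fill_keys($seenRecurrenceKeys, true);
    $stale = [];

    foreach ($openFindings as $finding) {
        // Grouping by profile-level scope_key also catches findings from older snapshots.
        if ($finding['scope_key'] === $scopeKey && !isset($seen[$finding['recurrence_key']])) {
            $stale[] = $finding['recurrence_key'];
        }
    }

    return $stale;
}
```

After a recapture, findings keyed to the previous snapshot no longer appear in the seen set but still share `baseline_profile:{id}`, so they would be resolved instead of lingering as open duplicates.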

Tests:

Update/keep tests/Feature/Baselines/BaselineOperabilityAutoCloseTest.php passing.
Update tests/Feature/Baselines/BaselineCompareStatsTest.php only if scope semantics change.

Step 7 — Ops UX + auditability

Goal: implement FR-116v1-03 and FR-116v1-11.

Changes:

Ensure both capture and compare runs write:
- effective_scope.* in run context
- coverage summary and uncovered lists when partial
- numeric summary counts using canonical keys only
- per-change-type finding counts in operation_runs.context.findings.counts_by_change_type
Treat the operation_runs record as the canonical audit trail for this feature slice (do not add parallel “audit summary” persistence for the same data).

Tests:

Add a regression test that asserts summary_counts contains only allowed keys and numeric values (where a helper exists).
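
The core check behind such a regression test can be sketched as a pure function. The name `summaryCountViolations` is hypothetical, the allowed-key list is passed in because the way OperationSummaryKeys exposes its keys is not shown here, and treating numeric strings as invalid is an assumption about what "numeric-only" means.

```php
<?php

declare(strict_types=1);

/**
 * List violations of the "allowed keys, numeric values only" rule for summary_counts.
 *
 * @param array<string, mixed> $summaryCounts
 * @param list<string>         $allowedKeys canonical keys (source of truth: OperationSummaryKeys)
 * @return array<string, string> map of offending key => reason; empty when valid
 */
function summaryCountViolations(array $summaryCounts, array $allowedKeys): array
{
    $violations = [];

    foreach ($summaryCounts as $key => $value) {
        $key = (string) $key;

        if (!in_array($key, $allowedKeys, true)) {
            $violations[$key] = 'key not allowed';
            continue;
        }

        // Assumption: only real ints/floats count as numeric, not numeric strings.
        if (!is_int($value) && !is_float($value)) {
            $violations[$key] = 'value not numeric';
        }
    }

    return $violations;
}
```

A Pest test could then assert that the result is an empty array for the `summary_counts` of both capture and compare runs.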

Post-design Constitution Re-check

Expected: PASS (no changes introduce new Graph endpoints or bypass services; OperationRun lifecycle + 3-surface feedback remain intact; RBAC deny-as-not-found semantics preserved).
