
# Implementation Plan: Golden Master Deep Drift v2 (Full Content Capture)

**Branch**: `118-baseline-drift-engine` | **Date**: 2026-03-03 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md`

## Summary

Enable reliable, settings-level drift detection (“deep drift”) for Golden Master baselines by making baseline capture and baseline compare self-sufficient:

- For baseline profiles configured for full-content capture, both capture and compare automatically collect the required policy content evidence on demand (quota-aware, resumable), rather than relying on opportunistic evidence.
- Drift comparison uses the existing canonical fingerprinting pipeline and evidence provider chain (content-first, explicit degraded fallback), with “no legacy” enforced via code paths and automated guards.
- Operations are observable and explainable: each run records effective scope, coverage proof, fidelity breakdown, evidence capture stats, evidence gaps, and “why no findings” reason codes.
- Security and governance constraints are enforced: captured policy evidence is redacted before persistence/fingerprinting, audit events are emitted for capture/compare/resume mutations, baseline-purpose evidence is pruned per retention policy, and full-content mode is gated by a short-lived rollout flag.
- Admin UX exposes single-click actions (“Capture baseline (full content)”, “Compare now (full content)”, and “Resume capture” when applicable), surfaces evidence gaps clearly, and provides baseline snapshot fidelity visibility (content-complete vs gaps).
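
The "redact before persistence/fingerprinting" ordering above can be illustrated with a minimal, language-neutral sketch in Python. It is not the project's PHP pipeline: the sensitive key names, the `[REDACTED]` marker, and the use of SHA-256 over sorted-key JSON are all assumptions made for illustration.

```python
import hashlib
import json

# Hypothetical key names treated as sensitive; the real redaction rules are not stated in this plan.
SENSITIVE_KEYS = {"password", "secret", "clientSecret", "token"}
REDACTED = "[REDACTED]"


def redact(value):
    """Return a copy of value with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {k: (REDACTED if k in SENSITIVE_KEYS else redact(v)) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def fingerprint(payload):
    """Hash a canonical (sorted-key, compact) JSON encoding of the redacted payload."""
    canonical = json.dumps(redact(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because redaction runs before hashing in this sketch, two payloads that differ only in a masked value produce the same fingerprint, so a rotated secret would not register as drift, while a changed non-sensitive setting would.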

## Technical Context

**Language/Version**: PHP 8.4.15
**Primary Dependencies**: Laravel 12.52, Filament 5.2, Livewire 4.1, Microsoft Graph integration via `GraphClientInterface`
**Storage**: PostgreSQL (JSONB-heavy for evidence/snapshots)
**Testing**: Pest 4.3 (PHPUnit 12.5)
**Target Platform**: Containerized web app (Local: Sail; Staging/Production: Dokploy)
**Project Type**: Web application (Laravel monolith + Filament admin panel)
**Performance Goals**: Capture/compare runs handle 200–500 in-scope subjects per run under throttling constraints, without blocking UI; evidence capture is bounded and resumable.
**Constraints**: All long-running + remote work is async + observable via `OperationRun`; rate limits (429/503) must back off safely; no secrets/PII persisted in evidence or logs; tenant/workspace isolation is strict.
**Scale/Scope**: Multi-workspace, multi-tenant; per tenant potentially hundreds–thousands of policies; baselines may be assigned to multiple tenants in a workspace.

**Initial budget defaults (v1, adjustable via config)**:
- `TENANTPILOT_BASELINE_EVIDENCE_MAX_ITEMS_PER_RUN=200`
- `TENANTPILOT_BASELINE_EVIDENCE_MAX_CONCURRENCY=5`
- `TENANTPILOT_BASELINE_EVIDENCE_MAX_RETRIES=3`
- `TENANTPILOT_BASELINE_EVIDENCE_RETENTION_DAYS=90`
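
As a rough illustration of how a per-run item budget and a resume token could interact, here is a minimal Python sketch (language-neutral, not the project's PHP code). The `capture_batch` helper, the `CaptureResult` shape, and the offset-based token encoding are hypothetical; the plan only states that the token is opaque and stored in run context.

```python
import os
from dataclasses import dataclass, field
from typing import Optional


def env_int(name, default):
    """Read an integer setting from the environment, falling back to the default."""
    return int(os.environ.get(name, default))


MAX_ITEMS_PER_RUN = env_int("TENANTPILOT_BASELINE_EVIDENCE_MAX_ITEMS_PER_RUN", 200)


@dataclass
class CaptureResult:
    captured: list = field(default_factory=list)
    resume_token: Optional[str] = None  # None means nothing is left to capture


def capture_batch(subject_ids, capture_one, resume_token=None, max_items=MAX_ITEMS_PER_RUN):
    """Capture at most max_items subjects, starting where the previous run stopped."""
    # Assumption: the token is simply the next list offset, encoded as a string.
    start = int(resume_token) if resume_token else 0
    end = min(start + max_items, len(subject_ids))
    result = CaptureResult()
    for subject_id in subject_ids[start:end]:
        result.captured.append(capture_one(subject_id))
    if end < len(subject_ids):
        result.resume_token = str(end)
    return result
```

This sketch only resumes correctly if `subject_ids` is ordered the same way on every run; callers would treat the token as opaque and pass it back unchanged.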

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- PASS — Inventory-first: Inventory remains the subject index (“last observed”), while content evidence is captured explicitly as immutable policy versions for comparison.
- PASS — Read/write separation: this feature adds/extends read-only capture/compare operations (no restore); any destructive UI actions remain confirmed + audited.
- PASS — Graph contract path: evidence capture uses existing Graph client abstractions and contract registry (`/Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/graph_contracts.php`); no direct/ad-hoc endpoints in feature code.
- PASS — Deterministic capabilities: capability gating continues through the canonical capability resolvers and enforcement helpers (no role-string checks).
- PASS — RBAC-UX: workspace membership + capability gates enforced server-side; non-member access is deny-as-not-found; member missing capability is forbidden.
- PASS — Workspace & tenant isolation: baseline profiles/snapshots are workspace-owned; compare runs/findings/evidence remain tenant-scoped; canonical Monitoring pages remain DB-only at render time.
- PASS — Ops observability: baseline capture/compare are `OperationRun`-backed; start surfaces enqueue-only; no remote work at render time.
- PASS — Ops-UX 3-surface feedback + lifecycle + summary counts: enqueue toast uses the canonical presenter; progress shown only in global widget + run detail; completion emits exactly one terminal DB notification to initiator; status/outcome transitions remain service-owned; summary counts stay numeric-only using canonical keys.
- PASS — Automation & throttling: evidence capture respects 429/503 backoff + jitter (client + phase-level budget handling) and supports resumption via an opaque token stored in run context.
- PASS — BADGE-001: any new/changed badges use existing badge catalog mapping (no ad-hoc).
- PASS — Filament action surface + UX-001: actions are declared, capability-gated, and confirmed where destructive-like; tables maintain an inspect affordance; view uses infolists; empty states have 1 CTA.
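
The 429/503 backoff + jitter behavior named in the throttling gate can be sketched as follows. This is a language-neutral Python illustration under stated assumptions, not the existing Graph client: `send` is a hypothetical callable returning `(status_code, retry_after_seconds_or_None, body)`, and the delay parameters are illustrative. The default of three retries mirrors `TENANTPILOT_BASELINE_EVIDENCE_MAX_RETRIES=3`.

```python
import random
import time

RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(send, max_retries=3, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() until it returns a non-retryable status or retries are exhausted."""
    attempt = 0
    while True:
        status, retry_after, body = send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, body
        # Exponential backoff capped at max_delay, scaled by a random factor (full jitter).
        delay = min(max_delay, base_delay * (2 ** attempt)) * rng()
        # Honor a server-provided Retry-After hint when it asks for a longer wait.
        if retry_after is not None:
            delay = max(delay, retry_after)
        sleep(delay)
        attempt += 1
```

Injecting `sleep` and `rng` keeps the sketch deterministic under test; a caller that still sees 429 or 503 after the final attempt would be expected to stop and record a resume token rather than keep retrying.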

## Project Structure

### Documentation (this feature)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
└── tasks.md
```

### Source Code (repository root)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/
app/
├── Filament/
│   ├── Pages/BaselineCompareLanding.php
│   ├── Resources/BaselineProfileResource.php
│   └── Resources/BaselineProfileResource/RelationManagers/BaselineTenantAssignmentsRelationManager.php
├── Jobs/
│   ├── CaptureBaselineSnapshotJob.php
│   └── CompareBaselineToTenantJob.php
├── Models/
│   ├── BaselineProfile.php
│   ├── BaselineSnapshot.php
│   ├── BaselineSnapshotItem.php
│   ├── BaselineTenantAssignment.php
│   ├── Policy.php
│   ├── PolicyVersion.php
│   ├── InventoryItem.php
│   ├── OperationRun.php
│   └── Finding.php
├── Services/
│   ├── Baselines/
│   │   ├── BaselineCaptureService.php
│   │   ├── BaselineCompareService.php
│   │   ├── CurrentStateHashResolver.php
│   │   └── Evidence/
│   ├── Intune/PolicyCaptureOrchestrator.php
│   └── OperationRunService.php
├── Support/
│   ├── Baselines/
│   ├── OpsUx/
│   └── OperationRunType.php
config/
├── graph_contracts.php
└── tenantpilot.php
database/
└── migrations/
tests/
└── Feature/
```

**Structure Decision**: Laravel monolith. Baseline drift orchestration lives in `app/Services/Baselines` + `app/Jobs`, UI in `app/Filament`, and evidence capture reuses `app/Services/Intune` capture orchestration.

Tasks are defined in `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/tasks.md`.

## Complexity Tracking

No constitution violations need justification for Spec 118 planning. (Table intentionally omitted.)

## Phase 0 — Research (output: research.md)

Goals:
- Confirm precise extension points for adding full-content evidence capture to existing baseline capture/compare jobs.
- Decide the purpose-tagging and idempotency strategy for baseline evidence captured as `PolicyVersion`.
- Confirm Monitoring run detail requirements for `context.target_scope` and baseline-specific context sections.

Deliverable: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/research.md`

## Phase 1 — Design (output: data-model.md + contracts/* + quickstart.md)

Deliverables:
- Data model changes + JSON context shapes: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/data-model.md`
- Route surface contract reference: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/contracts/openapi.yaml`
- Developer quickstart: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/quickstart.md`

Post-design constitution re-check: PASS (see decisions in research + data model docs; Ops-UX and RBAC constraints preserved).

## Phase 2 — Implementation Planning (high-level)

1) Add migrations:
   - `baseline_profiles.capture_mode`
   - `baseline_snapshot_items.subject_key`
   - `policy_versions.capture_purpose`, `operation_run_id`, `baseline_profile_id` + indexes
2) Implement quota-aware, resumable baseline evidence capture phase:
   - reuse existing capture orchestration (policy payload + assignments + scope tags)
   - emit capture stats + resume token in `OperationRun.context`
3) Integrate the capture phase into:
   - baseline capture job (before snapshot build)
   - baseline compare job (refresh phase before drift evaluation)
4) Update drift matching to use a cross-tenant subject key (`policy_type + subject_key`), where `subject_key` is the normalized display name, and record ambiguous or missing matches as evidence gaps rather than findings.
5) Update Ops-UX context:
   - ensure `context.target_scope` exists for baseline capture/compare runs
   - add “why no findings” reason codes
6) Update UI action surfaces:
   - Baseline profile: capture mode + “Capture baseline (full content)” + tenant-targeted “Compare now (full content)”
   - Operation run detail: evidence capture panel + “Resume capture” when token exists
7) Add focused Pest tests:
   - full-content capture creates content-fidelity snapshot items (or warnings + gaps)
   - compare detects settings drift with content evidence
   - throttling/resume semantics and “no silent zeros” reason codes
8) Add governance hardening:
   - enforce rollout gate across UI/services/jobs for full-content mode
   - redact secrets/PII from captured evidence before persistence/fingerprinting
   - emit audit events for capture/compare/resume operations
   - prune baseline-purpose evidence per retention policy (scheduled)
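
Step 4's cross-tenant matching can be sketched as below, in Python as a language-neutral illustration rather than the project's PHP code. The normalization rules (trim, collapse whitespace, case-fold), the dictionary item shape, and the gap reason names `ambiguous_match` and `missing_match` are assumptions; the plan specifies only that the key is `policy_type + subject_key` and that ambiguous or missing matches become evidence gaps.

```python
from collections import defaultdict


def subject_key(display_name):
    """Normalize a display name: trim, collapse inner whitespace, case-fold (assumed rules)."""
    return " ".join(display_name.split()).casefold()


def index_by_subject(items):
    """Group items by (policy_type, subject_key)."""
    index = defaultdict(list)
    for item in items:
        index[(item["policy_type"], subject_key(item["display_name"]))].append(item)
    return index


def match_subjects(baseline_items, tenant_items):
    """Pair baseline and tenant items; report ambiguous or unmatched keys as gaps."""
    baseline = index_by_subject(baseline_items)
    tenant = index_by_subject(tenant_items)
    pairs, gaps = [], []
    for key, base_group in baseline.items():
        tenant_group = tenant.get(key, [])
        if len(base_group) > 1 or len(tenant_group) > 1:
            # More than one candidate on either side: do not guess, record a gap.
            gaps.append({"key": key, "reason": "ambiguous_match"})
        elif not tenant_group:
            gaps.append({"key": key, "reason": "missing_match"})
        else:
            pairs.append((base_group[0], tenant_group[0]))
    return pairs, gaps
```

Only the returned `pairs` would proceed to drift evaluation; each entry in `gaps` would be surfaced in the run context instead of producing a finding.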