Implements Spec 118 baseline drift engine improvements:

- Resumable, budget-aware evidence capture for baseline capture/compare runs (resume token + UI action)
- “Why no findings?” reason-code driven explanations and richer run context panels
- Baseline Snapshot resource (list/detail) with fidelity visibility
- Retention command + schedule for pruning baseline-purpose PolicyVersions
- i18n strings for Baseline Compare landing

Verification:

- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact --filter=Baseline` (159 passed)

Note:

- `docs/audits/redaction-audit-2026-03-04.md` left untracked (not part of PR).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #143

# Implementation Plan: Golden Master Deep Drift v2 (Full Content Capture)

**Branch**: `118-baseline-drift-engine` | **Date**: 2026-03-03 | **Spec**: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md`

## Summary

Enable reliable, settings-level drift detection (“deep drift”) for Golden Master baselines by making baseline capture and baseline compare self-sufficient:

- For baseline profiles configured for full-content capture, both capture and compare automatically capture the required policy content evidence on demand (quota-aware, resumable), rather than relying on opportunistic evidence.
- Drift comparison uses the existing canonical fingerprinting pipeline and evidence provider chain (content-first, explicit degraded fallback), with “no legacy” enforced via code paths and automated guards.
- Operations are observable and explainable: each run records effective scope, coverage proof, fidelity breakdown, evidence capture stats, evidence gaps, and “why no findings” reason codes.
- Security and governance constraints are enforced: captured policy evidence is redacted before persistence/fingerprinting, audit events are emitted for capture/compare/resume mutations, baseline-purpose evidence is pruned per retention policy, and full-content mode is gated by a short-lived rollout flag.
- Admin UX exposes single-click actions (“Capture baseline (full content)”, “Compare now (full content)”, and “Resume capture” when applicable), surfaces evidence gaps clearly, and provides baseline snapshot fidelity visibility (content-complete vs gaps).
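
The "quota-aware, resumable" capture described above can be illustrated with a minimal, language-neutral sketch. It is written in Python for brevity and is not the project's PHP code. The names `capture_evidence`, `encode_token`, `decode_token`, and the token layout are hypothetical, and the sketch assumes the in-scope subject list is ordered the same way on every run.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CaptureResult:
    captured: List[str] = field(default_factory=list)
    resume_token: Optional[str] = None  # None means nothing is left to capture


def encode_token(next_index: int) -> str:
    """Pack the cursor into an opaque string (hypothetical token format)."""
    raw = json.dumps({"next": next_index}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token: Optional[str]) -> int:
    """Return the index to resume from; a missing token starts at zero."""
    if token is None:
        return 0
    return int(json.loads(base64.urlsafe_b64decode(token))["next"])


def capture_evidence(
    subjects: List[str], max_items: int, resume_token: Optional[str] = None
) -> CaptureResult:
    """Capture at most `max_items` subjects, then hand back a resume token."""
    start = decode_token(resume_token)
    batch = subjects[start:start + max_items]
    result = CaptureResult(captured=list(batch))
    next_index = start + len(batch)
    if next_index < len(subjects):
        # Budget exhausted before the end: the caller stores this token
        # (for example in the run context) and passes it back to resume.
        result.resume_token = encode_token(next_index)
    return result
```

A run that stops on budget returns a token; passing the token back continues from the first uncaptured subject, so a resumed run does not re-capture subjects from earlier runs.
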

## Technical Context

**Language/Version**: PHP 8.4.15
**Primary Dependencies**: Laravel 12.52, Filament 5.2, Livewire 4.1, Microsoft Graph integration via `GraphClientInterface`
**Storage**: PostgreSQL (JSONB-heavy for evidence/snapshots)
**Testing**: Pest 4.3 (PHPUnit 12.5)
**Target Platform**: Containerized web app (Local: Sail; Staging/Production: Dokploy)
**Project Type**: Web application (Laravel monolith + Filament admin panel)
**Performance Goals**: Capture/compare runs handle 200–500 in-scope subjects per run under throttling constraints, without blocking UI; evidence capture is bounded and resumable.
**Constraints**: All long-running + remote work is async + observable via `OperationRun`; rate limits (429/503) must back off safely; no secrets/PII persisted in evidence or logs; tenant/workspace isolation is strict.
**Scale/Scope**: Multi-workspace, multi-tenant; per tenant potentially hundreds–thousands of policies; baselines may be assigned to multiple tenants in a workspace.

**Initial budget defaults (v1, adjustable via config)**:

- `TENANTPILOT_BASELINE_EVIDENCE_MAX_ITEMS_PER_RUN=200`
- `TENANTPILOT_BASELINE_EVIDENCE_MAX_CONCURRENCY=5`
- `TENANTPILOT_BASELINE_EVIDENCE_MAX_RETRIES=3`
- `TENANTPILOT_BASELINE_EVIDENCE_RETENTION_DAYS=90`
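
As a rough illustration of how these defaults interact with the 200–500 subject target, the sketch below reads the four variables with the listed fallbacks and computes how many budgeted passes a scope needs. It is a Python sketch rather than the project's Laravel config code. `EvidenceBudget`, `load_budget`, and `runs_needed` are hypothetical names, and it assumes each pass captures at most the per-run item budget.

```python
import math
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "TENANTPILOT_BASELINE_EVIDENCE_"


@dataclass(frozen=True)
class EvidenceBudget:
    # Defaults mirror the values listed in the plan.
    max_items_per_run: int = 200
    max_concurrency: int = 5
    max_retries: int = 3
    retention_days: int = 90


def load_budget(env: Optional[Mapping[str, str]] = None) -> EvidenceBudget:
    """Read the budget from the environment, falling back to the defaults."""
    source = os.environ if env is None else env
    defaults = EvidenceBudget()

    def read(name: str, default: int) -> int:
        raw = source.get(PREFIX + name)
        return default if raw in (None, "") else int(raw)

    return EvidenceBudget(
        max_items_per_run=read("MAX_ITEMS_PER_RUN", defaults.max_items_per_run),
        max_concurrency=read("MAX_CONCURRENCY", defaults.max_concurrency),
        max_retries=read("MAX_RETRIES", defaults.max_retries),
        retention_days=read("RETENTION_DAYS", defaults.retention_days),
    )


def runs_needed(subject_count: int, budget: EvidenceBudget) -> int:
    """Number of budgeted passes (initial run plus resumes) to cover a scope."""
    return math.ceil(subject_count / budget.max_items_per_run)
```

With the default budget, a 500-subject scope needs `ceil(500 / 200) = 3` passes under that assumption, which is why the resume path matters at the upper end of the stated range.
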

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- PASS — Inventory-first: Inventory remains the subject index (“last observed”), while content evidence is captured explicitly as immutable policy versions for comparison.
- PASS — Read/write separation: this feature adds/extends read-only capture/compare operations (no restore); any destructive UI actions remain confirmed + audited.
- PASS — Graph contract path: evidence capture uses existing Graph client abstractions and contract registry (`/Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/graph_contracts.php`); no direct or ad hoc endpoints in feature code.
- PASS — Deterministic capabilities: capability gating continues through the canonical capability resolvers and enforcement helpers (no role-string checks).
- PASS — RBAC-UX: workspace membership + capability gates enforced server-side; non-member access is deny-as-not-found; member missing capability is forbidden.
- PASS — Workspace & tenant isolation: baseline profiles/snapshots are workspace-owned; compare runs/findings/evidence remain tenant-scoped; canonical Monitoring pages remain DB-only at render time.
- PASS — Ops observability: baseline capture/compare are `OperationRun`-backed; start surfaces enqueue-only; no remote work at render time.
- PASS — Ops-UX 3-surface feedback + lifecycle + summary counts: enqueue toast uses the canonical presenter; progress shown only in global widget + run detail; completion emits exactly one terminal DB notification to initiator; status/outcome transitions remain service-owned; summary counts stay numeric-only using canonical keys.
- PASS — Automation & throttling: evidence capture respects 429/503 backoff + jitter (client + phase-level budget handling) and supports resumption via an opaque token stored in run context.
- PASS — BADGE-001: any new/changed badges use existing badge catalog mapping (no ad-hoc).
- PASS — Filament action surface + UX-001: actions are declared, capability-gated, and confirmed where destructive-like; tables maintain an inspect affordance; view uses infolists; empty states have one CTA.
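
The "429/503 backoff + jitter" item above can be sketched as follows. This is an illustrative Python sketch, not the project's Graph client. It assumes a hypothetical `request` callable returning `(status, body, retry_after_seconds)`, uses the common "full jitter" variant of exponential backoff, and prefers a server-provided retry delay when one is available.

```python
import random
import time
from typing import Any, Callable, Optional, Tuple

RETRYABLE_STATUSES = {429, 503}


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Prefer the server's hint when the response carries one.
        return min(cap, max(0.0, retry_after))
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling  # full jitter: uniform in [0, ceiling)


def call_with_backoff(
    request: Callable[[], Tuple[int, Any, Optional[float]]],
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, Any]:
    """Retry throttled calls; return the last response once retries run out."""
    attempt = 0
    while True:
        status, body, retry_after = request()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, body
        sleep(backoff_delay(attempt, retry_after=retry_after))
        attempt += 1
```

When retries are exhausted the throttled status is returned rather than raised, so a caller could record an evidence gap and leave the remaining subjects to a resumed run.
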

## Project Structure

### Documentation (this feature)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
└── tasks.md
```

### Source Code (repository root)

```text
/Users/ahmeddarrazi/Documents/projects/TenantAtlas/
app/
├── Filament/
│   ├── Pages/BaselineCompareLanding.php
│   ├── Resources/BaselineProfileResource.php
│   └── Resources/BaselineProfileResource/RelationManagers/BaselineTenantAssignmentsRelationManager.php
├── Jobs/
│   ├── CaptureBaselineSnapshotJob.php
│   └── CompareBaselineToTenantJob.php
├── Models/
│   ├── BaselineProfile.php
│   ├── BaselineSnapshot.php
│   ├── BaselineSnapshotItem.php
│   ├── BaselineTenantAssignment.php
│   ├── Policy.php
│   ├── PolicyVersion.php
│   ├── InventoryItem.php
│   ├── OperationRun.php
│   └── Finding.php
├── Services/
│   ├── Baselines/
│   │   ├── BaselineCaptureService.php
│   │   ├── BaselineCompareService.php
│   │   ├── CurrentStateHashResolver.php
│   │   └── Evidence/
│   ├── Intune/PolicyCaptureOrchestrator.php
│   └── OperationRunService.php
├── Support/
│   ├── Baselines/
│   ├── OpsUx/
│   └── OperationRunType.php
config/
├── graph_contracts.php
└── tenantpilot.php
database/
└── migrations/
tests/
└── Feature/
```

**Structure Decision**: Laravel monolith. Baseline drift orchestration lives in `app/Services/Baselines` + `app/Jobs`, UI in `app/Filament`, and evidence capture reuses `app/Services/Intune` capture orchestration.

Tasks are defined in `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/tasks.md`.

## Complexity Tracking

No constitution violations are required for Spec 118 planning. (Table intentionally omitted.)

## Phase 0 — Research (output: research.md)

Goals:

- Confirm precise extension points for adding full-content evidence capture to existing baseline capture/compare jobs.
- Decide the purpose-tagging and idempotency strategy for baseline evidence captured as `PolicyVersion`.
- Confirm Monitoring run detail requirements for `context.target_scope` and baseline-specific context sections.

Deliverable: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/research.md`

## Phase 1 — Design (output: data-model.md + contracts/* + quickstart.md)

Deliverables:

- Data model changes + JSON context shapes: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/data-model.md`
- Route surface contract reference: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/contracts/openapi.yaml`
- Developer quickstart: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/quickstart.md`

Post-design constitution re-check: PASS (see decisions in research + data model docs; Ops-UX and RBAC constraints preserved).

## Phase 2 — Implementation Planning (high-level)

1) Add migrations:
   - `baseline_profiles.capture_mode`
   - `baseline_snapshot_items.subject_key`
   - `policy_versions.capture_purpose`, `operation_run_id`, `baseline_profile_id` + indexes
2) Implement quota-aware, resumable baseline evidence capture phase:
   - reuse existing capture orchestration (policy payload + assignments + scope tags)
   - emit capture stats + resume token in `OperationRun.context`
3) Integrate the capture phase into:
   - baseline capture job (before snapshot build)
   - baseline compare job (refresh phase before drift evaluation)
4) Update drift matching to use cross-tenant subject key (`policy_type + subject_key`) where `subject_key` is the normalized display name, and record ambiguous or missing matches as evidence gaps (no finding).
5) Update Ops-UX context:
   - ensure `context.target_scope` exists for baseline capture/compare runs
   - add “why no findings” reason codes
6) Update UI action surfaces:
   - Baseline profile: capture mode + “Capture baseline (full content)” + tenant-targeted “Compare now (full content)”
   - Operation run detail: evidence capture panel + “Resume capture” when token exists
7) Add focused Pest tests:
   - full-content capture creates content-fidelity snapshot items (or warnings + gaps)
   - compare detects settings drift with content evidence
   - throttling/resume semantics and “no silent zeros” reason codes
8) Add governance hardening:
   - enforce rollout gate across UI/services/jobs for full-content mode
   - redact secrets/PII from captured evidence before persistence/fingerprinting
   - emit audit events for capture/compare/resume operations
   - prune baseline-purpose evidence per retention policy (scheduled)
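
Step 4 (cross-tenant matching on `policy_type + subject_key`) can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the project's implementation. The normalization rule (trim, collapse whitespace, case-fold), the dictionary item shape, and the reason strings `ambiguous_match` and `missing_match` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

SubjectKey = Tuple[str, str]


def normalize_display_name(name: str) -> str:
    """Assumed normalization: trim, collapse inner whitespace, case-fold."""
    return " ".join(name.split()).casefold()


def index_by_subject_key(items: List[dict]) -> Dict[SubjectKey, List[dict]]:
    """Group items by (policy_type, normalized display name)."""
    index: Dict[SubjectKey, List[dict]] = defaultdict(list)
    for item in items:
        key = (item["policy_type"], normalize_display_name(item["display_name"]))
        index[key].append(item)
    return index


def match_subjects(baseline_items: List[dict], tenant_items: List[dict]):
    """Pair baseline subjects with tenant subjects; unpairable ones become gaps."""
    baseline = index_by_subject_key(baseline_items)
    tenant = index_by_subject_key(tenant_items)
    matches, gaps = [], []
    for key, base_group in baseline.items():
        tenant_group = tenant.get(key, [])
        if len(base_group) > 1 or len(tenant_group) > 1:
            # Several candidates share one key: record a gap, raise no finding.
            gaps.append({"subject_key": key, "reason": "ambiguous_match"})
        elif not tenant_group:
            gaps.append({"subject_key": key, "reason": "missing_match"})
        else:
            matches.append((base_group[0], tenant_group[0]))
    return matches, gaps
```

Only the pairs in `matches` would proceed to drift evaluation. Everything in `gaps` is reported as an evidence gap, so a run with no findings can still explain which subjects were never compared.
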
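
The "redact before persistence/fingerprinting" item in step 8 implies an ordering that a short sketch makes concrete. The Python below is a stand-in, not the project's canonical fingerprinting pipeline. The `SENSITIVE_KEYS` set, the `[REDACTED]` marker, and the use of SHA-256 over sorted-key JSON are all assumptions chosen for illustration.

```python
import hashlib
import json
from typing import Any

# Hypothetical key list; the real redaction rules live in the project.
SENSITIVE_KEYS = {"password", "secret", "clientsecret", "token"}
REDACTED = "[REDACTED]"


def redact(value: Any) -> Any:
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key.casefold() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


def fingerprint(payload: Any) -> str:
    """Hash the redacted payload in a key-order-independent form."""
    canonical = json.dumps(redact(payload), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

One consequence of redacting first: two payloads that differ only in a redacted value produce the same fingerprint, so a change to a secret alone would not surface as drift in this sketch.
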