Implements Spec 118 baseline drift engine improvements:

- Resumable, budget-aware evidence capture for baseline capture/compare runs (resume token + UI action)
- “Why no findings?” reason-code driven explanations and richer run context panels
- Baseline Snapshot resource (list/detail) with fidelity visibility
- Retention command + schedule for pruning baseline-purpose PolicyVersions
- i18n strings for Baseline Compare landing

Verification:

- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact --filter=Baseline` (159 passed)

Note:

- `docs/audits/redaction-audit-2026-03-04.md` left untracked (not part of PR).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #143
# Research — Spec 118 Golden Master Deep Drift v2

This document resolves planning unknowns for implementing `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/118-baseline-drift-engine/spec.md` in the existing Laravel + Filament codebase.

## Decision 1 — Full-content evidence capture orchestration

**Decision**: Introduce a dedicated “baseline content capture” phase that can be invoked from both baseline capture and baseline compare:

- Baseline capture (`baseline_capture` run): capture evidence needed to build a content-fidelity baseline snapshot (as budget allows).
- Baseline compare (`baseline_compare` run): refresh current evidence before drift evaluation (as budget allows).

The phase reuses the existing Intune capture orchestration (`/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Intune/PolicyCaptureOrchestrator.php`) so we do not introduce a second capture implementation.

**Rationale**:
- Aligns with Spec 118 goal: deep drift by default, without per-policy manual capture.
- Keeps a single source of truth for content capture (policy payload + assignments + scope tags).
- Makes quota management, retries, and resumability explicit at the operation level.

**Alternatives considered**:
- Opportunistic only (rejected: repeats Spec 117 fragility; “no drift” can still be a silent failure).
- UI-driven per-policy capture (rejected: explicitly out of UX goals).
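
One way to picture the shared phase is a single entry point that both run types call with their own purpose. The sketch below is a minimal illustration in plain PHP, not the actual implementation: `ContentCapturer` is a hypothetical stand-in for `PolicyCaptureOrchestrator` (whose real API is not shown in this document), and `BaselineContentCapturePhase`, its `run()` signature, and the simple item budget are assumptions.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the existing capture orchestration; the real
// PolicyCaptureOrchestrator API is not shown in this document.
interface ContentCapturer
{
    /** Captures one policy and returns an identifier for the stored evidence. */
    public function capture(string $policyId, string $purpose): string;
}

// Shared phase: both run types use the same code path with a different purpose.
final class BaselineContentCapturePhase
{
    public function __construct(private ContentCapturer $capturer)
    {
    }

    /**
     * @param list<string> $policyIds
     * @return array{captured: array<string, string>, skipped: list<string>}
     */
    public function run(string $runType, array $policyIds, int $budget): array
    {
        if (!in_array($runType, ['baseline_capture', 'baseline_compare'], true)) {
            throw new InvalidArgumentException("Unsupported run type: {$runType}");
        }

        $captured = [];
        $skipped = [];

        foreach ($policyIds as $policyId) {
            if (count($captured) >= $budget) {
                $skipped[] = $policyId; // over budget: left for a later run
                continue;
            }
            $captured[$policyId] = $this->capturer->capture($policyId, $runType);
        }

        return ['captured' => $captured, 'skipped' => $skipped];
    }
}
```

Because both run types would go through the same phase, budget handling and skip reporting only need to exist in one place.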
## Decision 2 — PolicyVersion purpose tagging + run traceability

**Decision**: Extend `policy_versions` with baseline-purpose attribution:

- `capture_purpose`: `backup | baseline_capture | baseline_compare`
- `operation_run_id` (nullable): link to the run that captured the version
- `baseline_profile_id` (nullable): link for baseline_* captures

**Rationale**:
- Enables audit/debug (“which run produced this evidence, for what purpose?”) without introducing a separate evidence table.
- Supports idempotency and “resume capture” semantics (skip already-captured subjects for the same run/purpose).

**Alternatives considered**:
- Store purpose only in `policy_versions.metadata` (rejected: harder to index/query; weaker guardrails).
- Create an EvidenceItems model now (rejected: explicitly not required in Spec 118).
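
The purpose values and the "skip already-captured subjects" idea can be illustrated with a small, framework-free sketch. The enum values come from the list above; everything else is assumed for illustration: the `CapturePurpose` and `pendingSubjects` names, and the plain arrays (with a hypothetical `policy_id` key) standing in for `policy_versions` rows.

```php
<?php

declare(strict_types=1);

// Values mirror the `capture_purpose` options listed above.
enum CapturePurpose: string
{
    case Backup = 'backup';
    case BaselineCapture = 'baseline_capture';
    case BaselineCompare = 'baseline_compare';

    public function isBaseline(): bool
    {
        return $this !== self::Backup;
    }
}

/**
 * Returns the subject IDs that still need capturing for a given run and purpose.
 * Each row is a plain array standing in for a `policy_versions` record; the
 * `policy_id` key is an assumption for illustration.
 *
 * @param list<string> $subjectIds
 * @param list<array{policy_id: string, capture_purpose: string, operation_run_id: ?int}> $versions
 * @return list<string>
 */
function pendingSubjects(array $subjectIds, array $versions, CapturePurpose $purpose, int $runId): array
{
    $done = [];
    foreach ($versions as $version) {
        if ($version['capture_purpose'] === $purpose->value && $version['operation_run_id'] === $runId) {
            $done[$version['policy_id']] = true;
        }
    }

    return array_values(array_filter(
        $subjectIds,
        static fn (string $id): bool => !isset($done[$id]),
    ));
}
```

Keeping the purpose in a dedicated, typed column (rather than free-form metadata) is what makes a filter like this cheap to express as an indexed query.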
## Decision 3 — Golden Master subject matching across tenants

**Decision**: Treat the Golden Master “subject identity” as a cross-tenant match key derived from policy display name:

- Subject match key: `policy_type + normalized_display_name`
- `normalized_display_name` rules: trim leading/trailing whitespace, collapse internal whitespace to single spaces, lowercase.

Implementation uses a dedicated snapshot-item field (e.g., `baseline_snapshot_items.subject_key`) for matching, while preserving tenant-specific external IDs separately for evidence resolution.

Ambiguous/missing match handling:
- Missing match in current tenant → eligible for “missing policy” (only with coverage proof).
- Multiple matches for the same key within a tenant/type → record evidence gap and suppress drift evaluation for that subject key (no finding).

**Rationale**:
- Baselines are workspace-owned and can be assigned to multiple tenants; external IDs are tenant-specific and cannot be used for cross-tenant matching.
- The match key keeps snapshot items free of tenant identifiers while enabling consistent comparisons.

**Alternatives considered**:
- Match by tenant external ID (rejected: breaks cross-tenant baseline assignment).
- Require per-tenant baseline snapshots (rejected for Spec 118: changes product semantics and assignment UX).
- Introduce an explicit mapping table (rejected for R1: higher effort and requires operational UX not described in spec).
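
The normalization rules and the missing/ambiguous handling in this decision can be sketched in plain PHP as follows. This is an illustration under assumptions, not the shipped code: the function names, the `|` separator between policy type and name, and the array shape used for tenant policies are all hypothetical.

```php
<?php

declare(strict_types=1);

/** Collapse whitespace runs to single spaces, trim the ends, lowercase. */
function normalizeDisplayName(string $displayName): string
{
    $collapsed = preg_replace('/\s+/u', ' ', $displayName) ?? $displayName;
    $collapsed = trim($collapsed, ' ');

    // mbstring is expected in a Laravel app; the fallback only keeps the sketch portable.
    return function_exists('mb_strtolower')
        ? mb_strtolower($collapsed, 'UTF-8')
        : strtolower($collapsed);
}

/** The "|" separator is an assumption; the source only says type + normalized name. */
function subjectKey(string $policyType, string $displayName): string
{
    return $policyType . '|' . normalizeDisplayName($displayName);
}

/**
 * @param list<string> $baselineKeys subject keys stored on snapshot items
 * @param list<array{id: string, type: string, name: string}> $tenantPolicies
 * @return array{matched: array<string, string>, missing: list<string>, ambiguous: list<string>}
 */
function matchSubjects(array $baselineKeys, array $tenantPolicies): array
{
    $idsByKey = [];
    foreach ($tenantPolicies as $policy) {
        $idsByKey[subjectKey($policy['type'], $policy['name'])][] = $policy['id'];
    }

    $result = ['matched' => [], 'missing' => [], 'ambiguous' => []];
    foreach ($baselineKeys as $key) {
        $ids = $idsByKey[$key] ?? [];
        if (count($ids) === 0) {
            $result['missing'][] = $key;   // "missing policy" candidate, pending coverage proof
        } elseif (count($ids) > 1) {
            $result['ambiguous'][] = $key; // evidence gap: no drift evaluation, no finding
        } else {
            $result['matched'][$key] = $ids[0]; // tenant-specific external ID kept separately
        }
    }

    return $result;
}
```

Only the `matched` entries would proceed to drift evaluation; the other two buckets feed the gap and reason reporting rather than producing findings directly.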
## Decision 4 — Quota-aware capture + resumable token

**Decision**: Evidence capture is bounded and resumable:

- Enforce per-run limits (max items, max concurrency, max retry attempts).
- Store an opaque “resume token” in `operation_runs.context` when a run cannot complete within budget.
- Provide a “Resume capture” UI action that starts a follow-up run continuing from that token.

**Rationale**:
- Large tenants/scopes must not create uncontrolled queue storms or long-running jobs.
- Operators need explicit visibility into “what was captured vs skipped” and a safe path to completion.

**Alternatives considered**:
- “Always finish no matter what” (rejected: risks rate limiting and operational instability).
- Mark run failed on any capture failure (rejected: Spec 118 allows partial failure with warnings).
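
A bounded, resumable capture loop might look roughly like the sketch below. It models only the max-items limit; concurrency and retry limits are left out. The function name and the token format (a base64-encoded JSON offset) are assumptions made for illustration, and callers would treat the token as opaque.

```php
<?php

declare(strict_types=1);

/**
 * Captures up to $maxItems subjects, starting where $resumeToken left off.
 * The token format is an assumption for illustration only.
 *
 * @param list<string> $subjectKeys stable, ordered list of subjects in scope
 * @param callable(string): bool $capture returns false when a capture fails
 * @return array{captured: list<string>, failed: list<string>, resume_token: ?string}
 */
function captureWithBudget(array $subjectKeys, callable $capture, int $maxItems, ?string $resumeToken = null): array
{
    $offset = 0;
    if ($resumeToken !== null) {
        $decoded = json_decode((string) base64_decode($resumeToken, true), true);
        if (is_array($decoded) && is_int($decoded['offset'] ?? null)) {
            $offset = max(0, $decoded['offset']); // ignore malformed or negative offsets
        }
    }

    $captured = [];
    $failed = [];
    $batch = array_slice($subjectKeys, $offset, max(0, $maxItems));

    foreach ($batch as $key) {
        if ($capture($key)) {
            $captured[] = $key;
        } else {
            $failed[] = $key; // partial failure: recorded as a warning, the run continues
        }
    }

    // Emit a token only when subjects remain beyond this run's budget.
    $next = $offset + count($batch);
    $token = $next < count($subjectKeys)
        ? base64_encode(json_encode(['offset' => $next], JSON_THROW_ON_ERROR))
        : null;

    return ['captured' => $captured, 'failed' => $failed, 'resume_token' => $token];
}
```

A follow-up run started from the “Resume capture” action would pass the stored token back in and continue with the next slice, which only works if the subject list is ordered the same way in both runs.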
## Decision 5 — Ops-UX + run context contract (“Why no findings?”)

**Decision**: Baseline runs explicitly populate:

- `context.target_scope` (required for Monitoring run detail; avoids “No target scope details…”)
- `context.effective_scope` + `context.capture_mode`
- evidence capture stats + gaps + reason codes when subjects processed = 0 or findings = 0

Keep `summary_counts` numeric-only and limited to keys from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/OpsUx/OperationSummaryKeys.php`; store richer detail in `context`.

**Rationale**:
- Eliminates ambiguous “0 findings” outcomes and improves operator trust.
- Conforms to Ops-UX 3-surface feedback contract and Monitoring expectations.

**Alternatives considered**:
- Put details into `summary_counts` (rejected: key whitelist contract).
- Only log details (rejected: operators need UI visibility).
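
The split between a strict `summary_counts` and a richer `context` can be sketched as below. The `target_scope`, `effective_scope`, and `capture_mode` keys come from this decision; the whitelist contents, the `evidence_capture` and `reason_code` keys, the reason-code strings, and both function names are hypothetical placeholders, since the real values are not listed in this document.

```php
<?php

declare(strict_types=1);

// Hypothetical whitelist; the real keys live in OperationSummaryKeys.php.
const ALLOWED_SUMMARY_KEYS = ['total', 'processed', 'failed'];

/**
 * Keeps only whitelisted, integer summary counts.
 *
 * @param array<string, mixed> $counts
 * @return array<string, int>
 */
function sanitizeSummaryCounts(array $counts): array
{
    $clean = [];
    foreach ($counts as $key => $value) {
        if (in_array($key, ALLOWED_SUMMARY_KEYS, true) && is_int($value)) {
            $clean[$key] = $value;
        }
    }

    return $clean;
}

/**
 * Builds the run context; a reason code is attached only when the run
 * processed no subjects or produced no findings.
 *
 * @param array<string, mixed> $targetScope
 * @param array<string, mixed> $effectiveScope
 * @param list<string> $gaps
 * @return array<string, mixed>
 */
function buildRunContext(
    array $targetScope,
    array $effectiveScope,
    string $captureMode,
    int $subjectsProcessed,
    int $findings,
    array $gaps
): array {
    $context = [
        'target_scope' => $targetScope,
        'effective_scope' => $effectiveScope,
        'capture_mode' => $captureMode,
        'evidence_capture' => ['subjects_processed' => $subjectsProcessed, 'gaps' => $gaps],
    ];

    // Reason-code strings below are illustrative, not the project's actual codes.
    if ($subjectsProcessed === 0) {
        $context['reason_code'] = 'no_subjects_in_scope';
    } elseif ($findings === 0) {
        $context['reason_code'] = $gaps === [] ? 'no_drift_detected' : 'evidence_gaps_suppressed_findings';
    }

    return $context;
}
```

With this shape, a run that reports zero findings always carries a machine-readable explanation in `context`, while `summary_counts` stays within its key contract.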
## Notes on current codebase (facts observed)

- Baseline capture run creation: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Baselines/BaselineCaptureService.php`
- Baseline compare run creation: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Baselines/BaselineCompareService.php`
- Capture job (currently opportunistic content): `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CaptureBaselineSnapshotJob.php`
- Compare job (provider-chain evidence): `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CompareBaselineToTenantJob.php`
- Evidence providers + resolver: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Baselines/CurrentStateHashResolver.php` and `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Baselines/Evidence/*`
- Monitoring target scope rendering expects `context.target_scope`: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/OperationRunResource.php`