ahmido 92704a2f7e Spec 118: Resumable baseline evidence capture + snapshot UX (#143)

Implements Spec 118 baseline drift engine improvements:

- Resumable, budget-aware evidence capture for baseline capture/compare runs (resume token + UI action)
- “Why no findings?” reason-code driven explanations and richer run context panels
- Baseline Snapshot resource (list/detail) with fidelity visibility
- Retention command + schedule for pruning baseline-purpose PolicyVersions
- i18n strings for Baseline Compare landing

Verification:
- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact --filter=Baseline` (159 passed)

Note:
- `docs/audits/redaction-audit-2026-03-04.md` left untracked (not part of PR).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #143

2026-03-04 22:34:13 +00:00

description: Task list for Spec 118 implementation

Tasks: Golden Master Deep Drift v2 (Full Content Capture)

Input: Design documents from /specs/118-baseline-drift-engine/

Tests: REQUIRED (Pest) — this feature changes runtime behavior.

Terminology: subject_key = Spec 118 normalized display_name (trim + collapse internal whitespace + lowercase).
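The normalization rule above, together with the workspace-safe subject id that T006 derives as sha256(policy_type|subject_key), can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than the project's PHP helper; the function names and the `deviceConfiguration` policy type used in the usage note are assumptions made for the example.

```python
import hashlib
import re


def subject_key(display_name: str) -> str:
    """Trim, collapse internal whitespace runs to one space, then lowercase."""
    return re.sub(r"\s+", " ", display_name.strip()).lower()


def workspace_safe_subject_id(policy_type: str, display_name: str) -> str:
    """Hash "policy_type|subject_key" with SHA-256 (hex digest).

    No tenant identifier enters the hash, so the same policy name yields
    the same id in every tenant.
    """
    material = f"{policy_type}|{subject_key(display_name)}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

For instance, `subject_key("  Windows   Baseline\tPolicy ")` returns `"windows baseline policy"`, and two tenants that spell the same display name with different casing or spacing map to the same subject id for a given policy type such as `deviceConfiguration`.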

Data isolation (SCOPE-001): Workspace-owned baseline_snapshot_items MUST NOT persist tenant identifiers (no tenant IDs, no tenant external IDs, no operation run IDs, no policy version IDs) — only cross-tenant keys + non-tenant metadata.
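One way to enforce SCOPE-001 when building snapshot item metadata is a deny-list of dotted paths that are removed before persistence. The sketch below is an assumption about shape, not the project's implementation: the first two paths are the ones named in T031, while the remaining entries and the function name are hypothetical.

```python
from copy import deepcopy

# Hypothetical deny-list. The first two dotted paths are named in T031;
# the others are illustrative assumptions.
TENANT_IDENTIFIER_PATHS = (
    "meta_contract.subject_external_id",
    "evidence.observed_operation_run_id",
    "evidence.policy_version_id",
    "tenant_id",
)


def strip_tenant_identifiers(meta: dict) -> dict:
    """Return a copy of meta with every deny-listed path removed."""
    cleaned = deepcopy(meta)
    for path in TENANT_IDENTIFIER_PATHS:
        *parents, leaf = path.split(".")
        node = cleaned
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent missing or not a mapping: nothing to strip
        else:
            node.pop(leaf, None)
    return cleaned
```

The input is copied rather than mutated, so the tenant-scoped evidence record can keep its provenance while the workspace-owned snapshot item receives only the sanitized copy.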

Phase 1: Setup (Shared Infrastructure)

Purpose: Establish a safe baseline and introduce feature-level configuration scaffolding.

T001 Capture current baseline behavior by running existing suites in tests/Feature/Baselines/BaselineCaptureTest.php, tests/Feature/Baselines/BaselineCompareFindingsTest.php, tests/Feature/Filament/BaselineProfileCaptureStartSurfaceTest.php, and tests/Feature/Filament/BaselineCompareLandingStartSurfaceTest.php
T002 [P] Add Spec 118 rollout + budget env vars to .env.example (e.g. TENANTPILOT_BASELINE_FULL_CONTENT_CAPTURE_ENABLED, TENANTPILOT_BASELINE_EVIDENCE_MAX_ITEMS_PER_RUN=200, TENANTPILOT_BASELINE_EVIDENCE_MAX_CONCURRENCY=5, TENANTPILOT_BASELINE_EVIDENCE_MAX_RETRIES=3, TENANTPILOT_BASELINE_EVIDENCE_RETENTION_DAYS=90)
T003 [P] Add config surface for Spec 118 rollout + budgets in config/tenantpilot.php (new baselines.full_content_capture.* keys sourced from env)
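To illustrate how the T002 variables could feed the T003 config surface, here is a small Python sketch that reads them with the defaults listed above. It is not the Laravel config file; the dataclass, the `load_config` function, and the assumption that the rollout flag defaults to disabled are all hypothetical.

```python
import os
from dataclasses import dataclass

_TRUTHY = {"1", "true", "yes", "on"}
_PREFIX = "TENANTPILOT_BASELINE_"


@dataclass(frozen=True)
class FullContentCaptureConfig:
    enabled: bool
    max_items_per_run: int
    max_concurrency: int
    max_retries: int
    retention_days: int


def load_config(env=None) -> FullContentCaptureConfig:
    """Build the config from a mapping of env vars (defaults to os.environ)."""
    env = os.environ if env is None else env
    flag = env.get(_PREFIX + "FULL_CONTENT_CAPTURE_ENABLED", "false")  # assumed default: off
    return FullContentCaptureConfig(
        enabled=flag.strip().lower() in _TRUTHY,
        max_items_per_run=int(env.get(_PREFIX + "EVIDENCE_MAX_ITEMS_PER_RUN", "200")),
        max_concurrency=int(env.get(_PREFIX + "EVIDENCE_MAX_CONCURRENCY", "5")),
        max_retries=int(env.get(_PREFIX + "EVIDENCE_MAX_RETRIES", "3")),
        retention_days=int(env.get(_PREFIX + "EVIDENCE_RETENTION_DAYS", "90")),
    )
```

Passing an explicit mapping, for example `load_config({"TENANTPILOT_BASELINE_EVIDENCE_MAX_ITEMS_PER_RUN": "50"})`, makes the budget values easy to override in tests without touching the process environment.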

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Shared primitives required by ALL user stories.

⚠️ CRITICAL: No user story work should begin until this phase is complete.

T004 Add baseline capture mode enum in app/Support/Baselines/BaselineCaptureMode.php (values: meta_only, opportunistic, full_content)
T005 [P] Add policy version capture purpose enum in app/Support/Baselines/PolicyVersionCapturePurpose.php (values: backup, baseline_capture, baseline_compare)
T006 [P] Add subject-key helper in app/Support/Baselines/BaselineSubjectKey.php (normalize display name + derive workspace-safe subject id as sha256(policy_type|subject_key) for baseline_snapshot_items.subject_external_id)
T007 [P] Add baseline compare “why no findings” reason codes in app/Support/Baselines/BaselineCompareReasonCode.php (e.g. no_subjects_in_scope, coverage_unproven, evidence_capture_incomplete, rollout_disabled, no_drift_detected)
T008 [P] Add full-content rollout gate helper in app/Support/Baselines/BaselineFullContentRolloutGate.php (reads config('tenantpilot.baselines.full_content_capture.enabled'), provides an assertEnabled() used by services + jobs)
T009 [P] Add resume token contract in app/Support/Baselines/BaselineEvidenceResumeToken.php (versioned encode/decode; stored as opaque string in operation_runs.context.*.resume_token)
T010 [P] Add policy snapshot redactor in app/Services/Intune/PolicySnapshotRedactor.php (remove secrets/PII from payload/assignments/scope tags before persistence + hashing)
T011 [P] Add redaction coverage test in tests/Feature/Intune/PolicySnapshotRedactionTest.php (assert stored PolicyVersion.snapshot is redacted and content hash uses redacted content)
T012 Add migration for baseline_profiles.capture_mode in database/migrations/2026_03_03_100001_add_capture_mode_to_baseline_profiles_table.php
T013 [P] Add migration for baseline_snapshot_items.subject_key + index in database/migrations/2026_03_03_100002_add_subject_key_to_baseline_snapshot_items_table.php
T014 [P] Add migration for policy_versions.capture_purpose, policy_versions.operation_run_id, policy_versions.baseline_profile_id + indexes in database/migrations/2026_03_03_100003_add_baseline_purpose_to_policy_versions_table.php
T015 Update app/Models/BaselineProfile.php to store/cast capture_mode via BaselineCaptureMode and include it in $fillable (default: opportunistic)
T016 [P] Update factory defaults/states for capture mode in database/factories/BaselineProfileFactory.php
T017 [P] Update database/factories/BaselineSnapshotItemFactory.php to set subject_key derived from meta_jsonb.display_name via BaselineSubjectKey and set subject_external_id using the workspace-safe subject id (no tenant external IDs)
T018 Update app/Models/PolicyVersion.php to cast capture_purpose and define relationships to OperationRun + BaselineProfile (new nullable FKs)
T019 [P] Update database/factories/PolicyVersionFactory.php to default capture_purpose to backup
T020 Update app/Services/Intune/VersionService.php to apply PolicySnapshotRedactor before persistence/hashing and persist capture_purpose, operation_run_id, and baseline_profile_id when capturing versions (including via captureFromGraph())
T021 Update app/Services/Intune/PolicyCaptureOrchestrator.php to pass baseline-purpose attribution into PolicyVersion creation/reuse/backfill and ensure snapshot dedupe uses redacted payloads (no secrets/PII in stored snapshots)
T022 Update content hashing to include settings + assignments + scope tags in app/Services/Baselines/Evidence/ContentEvidenceProvider.php (use SettingsNormalizer, hash normalized assignments, and hash normalized scope-tag IDs via ScopeTagsNormalizer)
T023 Ensure content evidence provenance includes policy_version_id, operation_run_id, and capture_purpose in app/Services/Baselines/Evidence/ContentEvidenceProvider.php (tenant-scoped only; snapshot items must strip tenant identifiers)
T024 Implement quota-aware baseline evidence capture phase scaffold in app/Services/Baselines/BaselineContentCapturePhase.php (inputs: tenant + subjects + purpose + budgets incl. concurrency + optional resume token; outputs: stats + gaps + optional resume token)
T025 Update run start context to include target_scope + capture_mode and enforce rollout gate for full_content in app/Services/Baselines/BaselineCaptureService.php (reject start if disabled)
T026 [P] Update run start context to include target_scope + capture_mode and enforce rollout gate for full_content in app/Services/Baselines/BaselineCompareService.php (reject start if disabled)
T027 Add capture mode field + badge to Filament baseline profile CRUD in app/Filament/Resources/BaselineProfileResource.php (hide/disable full_content option when rollout flag is disabled)

Checkpoint: DB + enums + capture phase scaffolding are in place; user stories can be implemented and tested independently.
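The ordering required by T010, T020, and T022 (redact first, then hash settings, assignments, and scope tags together) can be sketched as below. This is an illustrative Python sketch under stated assumptions: the sensitive-key list, the `[redacted]` placeholder, and the canonical JSON layout are hypothetical and do not describe the actual PolicySnapshotRedactor or ContentEvidenceProvider.

```python
import hashlib
import json

# Hypothetical sensitive key names (compared case-insensitively).
SENSITIVE_KEYS = {"password", "secret", "clientsecret", "presharedkey"}
REDACTED = "[redacted]"


def redact(value):
    """Recursively replace values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            k: REDACTED if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value


def content_hash(settings: dict, assignments: list, scope_tag_ids: list) -> str:
    """SHA-256 over redacted, order-normalized settings, assignments, scope tags."""
    canonical = {
        "settings": redact(settings),
        # Serialize each assignment, then sort, so list order does not matter.
        "assignments": sorted(json.dumps(redact(a), sort_keys=True) for a in assignments),
        "scope_tags": sorted(str(t) for t in scope_tag_ids),
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A consequence of hashing the redacted form is that a change confined to a redacted value does not alter the hash, so it would not surface as drift; that trade-off follows from keeping secrets out of both the stored snapshot and the hash input.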

Phase 3: User Story 1 — Capture a full-content baseline without per-policy steps (Priority: P1) 🎯 MVP

Goal: Capture a baseline snapshot that uses full-content evidence by default (with explicit gaps + warnings if capture is incomplete).

Independent Test: Create a baseline profile configured for full-content capture, run “Capture baseline (full content)”, and validate the snapshot items have content-fidelity evidence (or explicit gaps) and the run context records capture stats.

Tests (write first)

T028 [P] [US1] Add baseline full-content on-demand evidence test in tests/Feature/BaselineDriftEngine/CaptureBaselineFullContentOnDemandTest.php (no PolicyVersion exists → capture creates one with capture_purpose=baseline_capture and snapshot item fidelity is content)
T029 [P] [US1] Update meta-fallback test to assert opportunistic mode degrades to meta when evidence is missing in tests/Feature/BaselineDriftEngine/CaptureBaselineMetaFallbackTest.php
T030 [P] [US1] Update capture start surface expectations for full-content labeling + rollout gating in tests/Feature/Filament/BaselineProfileCaptureStartSurfaceTest.php
T031 [P] [US1] Add snapshot item isolation test in tests/Feature/BaselineDriftEngine/BaselineSnapshotNoTenantIdentifiersTest.php (assert baseline_snapshot_items do not store tenant external IDs and meta_jsonb omits tenant identifiers like meta_contract.subject_external_id and evidence.observed_operation_run_id)
T032 [P] [US1] Add audit event coverage for baseline capture start/completion in tests/Feature/BaselineDriftEngine/BaselineCaptureAuditEventsTest.php (assert action metadata includes purpose, scope counts, and gap/warning summary)

Implementation

T033 [US1] Update baseline capture action labeling + modal copy + rollout gate messaging in app/Filament/Resources/BaselineProfileResource/Pages/ViewBaselineProfile.php (show “Capture baseline (full content)” when capture_mode=full_content)
T034 [US1] Integrate BaselineContentCapturePhase into baseline capture in app/Jobs/CaptureBaselineSnapshotJob.php (purpose baseline_capture, budgeted, record context.baseline_capture.evidence_capture, context.baseline_capture.gaps, context.baseline_capture.resume_token, and add job-level rollout gate guard)
T035 [US1] Persist subject_key and workspace-safe subject_external_id (derived via BaselineSubjectKey) when building snapshot items, and sanitize meta_jsonb to exclude tenant identifiers in app/Jobs/CaptureBaselineSnapshotJob.php
T036 [US1] Update baseline snapshot identity hashing to use policy_type + subject_key + baseline_hash in app/Services/Baselines/BaselineSnapshotIdentity.php (dedupe must not depend on tenant-specific external IDs)
T037 [US1] Ensure capture run status/outcome transitions go through OperationRunService and mark warnings (OperationRunOutcome::PartiallySucceeded) when any subject falls back to meta or is skipped in app/Jobs/CaptureBaselineSnapshotJob.php
T038 [US1] Expand capture audit events to include purpose, scope counts, evidence capture stats, and gap/warning summary in app/Jobs/CaptureBaselineSnapshotJob.php
T039 [US1] Add snapshot fidelity + gaps counts into baseline_snapshots.summary_jsonb for snapshot list/detail UX in app/Jobs/CaptureBaselineSnapshotJob.php

Parallel execution example (US1):

Developer A: T028, T034, T036
Developer B: T030, T033, T035, T038

Checkpoint: A baseline snapshot can be captured in full-content mode without per-policy steps, and runs are explainable when gaps exist.
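As a rough illustration of T036, a snapshot identity built only from policy_type, subject_key, and baseline_hash stays stable across tenants and item ordering. The sketch below is an assumption about one possible construction (sorted lines joined and hashed with SHA-256); the function name and the item dictionary shape are hypothetical.

```python
import hashlib


def snapshot_identity(items: list) -> str:
    """Hash the sorted (policy_type, subject_key, baseline_hash) triples.

    Any other field on an item, such as a tenant-specific external id,
    is ignored on purpose so it cannot influence dedupe.
    """
    lines = sorted(
        f"{item['policy_type']}|{item['subject_key']}|{item['baseline_hash']}"
        for item in items
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```

Two captures that contain the same subjects with the same content hashes produce the same identity even if the underlying policies carry different external ids in each tenant.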

Phase 4: User Story 2 — Compare now with full content and get explainable drift (Priority: P1)

Goal: Compare baseline vs current using content-first evidence refresh, cross-tenant subject matching, and explainable run context.

Independent Test: Capture a full-content baseline, simulate a settings-only change for a subject, run “Compare now (full content)”, and assert a “different version” finding exists with content provenance.

Tests (write first)

T040 [P] [US2] Add cross-tenant match test (policy_type + subject_key) in tests/Feature/Baselines/BaselineCompareCrossTenantMatchTest.php
T041 [P] [US2] Add ambiguous match suppression test in tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php (duplicate subject_key values → evidence gap; no finding)
T042 [P] [US2] Add coverage proof guard test in tests/Feature/Baselines/BaselineCompareCoverageProofGuardTest.php (uncovered types suppress missing_policy outcomes; run completes with warnings + records context)
T043 [P] [US2] Add stable recurrence identity test in tests/Feature/Baselines/BaselineCompareFindingRecurrenceKeyTest.php (recurrence key independent of hashes; retries don’t duplicate; lifecycle fields update)
T044 [P] [US2] Update compare start surface expectations for full-content labeling + rollout gating in tests/Feature/Filament/BaselineCompareLandingStartSurfaceTest.php
T045 [P] [US2] Add baseline profile “Compare now (full content)” start-surface test in tests/Feature/Filament/BaselineProfileCompareStartSurfaceTest.php
T046 [P] [US2] Add audit event coverage for baseline compare start/completion in tests/Feature/Baselines/BaselineCompareAuditEventsTest.php (purpose, scope counts, gaps/warnings summary)

Implementation

T047 [US2] Add “Compare now (full content)” header action to baseline profile view in app/Filament/Resources/BaselineProfileResource/Pages/ViewBaselineProfile.php (select target tenant; require tenant.sync; enforce rollout gate server-side)
T048 [US2] Integrate BaselineContentCapturePhase refresh into compare in app/Jobs/CompareBaselineToTenantJob.php (purpose baseline_compare, budgeted, record context.baseline_compare.evidence_capture, context.baseline_compare.evidence_gaps, context.baseline_compare.resume_token, and add job-level rollout gate guard)
T049 [US2] Switch compare matching to policy_type + subject_key in app/Jobs/CompareBaselineToTenantJob.php (load baseline items by subject_key; compute current subject_key from inventory display name; detect missing/empty/duplicate keys on either side; record gap reasons; suppress drift evaluation for those keys)
T050 [US2] Enforce coverage proof guard behavior in app/Jobs/CompareBaselineToTenantJob.php (suppress missing_policy for uncovered/unproven types; record warning + BaselineCompareReasonCode when suppression affects outcomes)
T051 [US2] Update finding recurrence identity to be stable and independent of hashes in app/Jobs/CompareBaselineToTenantJob.php (recurrence key uses tenant_id + baseline_profile_id + policy_type + subject_key + change_type; retries must not duplicate findings)
T052 [US2] Ensure findings carry subject_key + display_name fallbacks in evidence_jsonb and update subject display name fallback logic in app/Filament/Resources/FindingResource.php (COALESCE inventory display name with evidence display name)
T053 [US2] Ensure compare run context contains scope totals, processed counts, coverage proof status, fidelity breakdown, evidence capture stats, and top gap reasons in app/Jobs/CompareBaselineToTenantJob.php
T054 [US2] Update baseline compare landing to label “Compare now (full content)” when applicable in app/Filament/Pages/BaselineCompareLanding.php and resources/views/filament/pages/baseline-compare-landing.blade.php
T055 [US2] Extend stats DTO to surface fidelity + evidence gap summary from run context in app/Support/Baselines/BaselineCompareStats.php
T056 [US2] Add evidence capture + gaps panels for baseline capture/compare runs in Monitoring detail in app/Filament/Resources/OperationRunResource.php
T057 [US2] Expand compare audit events to include purpose, scope counts, evidence capture stats, and gaps/warnings summary in app/Jobs/CompareBaselineToTenantJob.php

Parallel execution example (US2):

Developer A: T040, T048, T050, T056
Developer B: T045, T047, T054, T055, T052

Checkpoint: Compare runs refresh evidence when needed, generate findings reliably, and provide explainable context even with coverage warnings or gaps.
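The matching and recurrence rules from T049 and T051 can be sketched as follows. This is a simplified Python illustration, not the job's code: the gap reason strings `missing_subject_key` and `ambiguous_match`, the item dictionary shape, and the use of SHA-256 for the recurrence key are assumptions.

```python
import hashlib
from collections import defaultdict


def _index(items):
    """Group items by (policy_type, subject_key); empty keys become ""."""
    buckets = defaultdict(list)
    for item in items:
        buckets[(item["policy_type"], item.get("subject_key") or "")].append(item)
    return buckets


def match_subjects(baseline_items, current_items):
    """Pair baseline and current items; divert unusable keys to gaps."""
    baseline, current = _index(baseline_items), _index(current_items)
    pairs, gaps = [], []
    for key in sorted(set(baseline) | set(current)):
        b, c = baseline.get(key, []), current.get(key, [])
        if not key[1]:
            gaps.append((key, "missing_subject_key"))
        elif len(b) > 1 or len(c) > 1:
            gaps.append((key, "ambiguous_match"))  # duplicates: no drift evaluation
        else:
            pairs.append((key, b[0] if b else None, c[0] if c else None))
    return pairs, gaps


def recurrence_key(tenant_id, baseline_profile_id, policy_type, subject_key, change_type):
    """Stable finding identity; content hashes are deliberately not inputs."""
    parts = (tenant_id, baseline_profile_id, policy_type, subject_key, change_type)
    return hashlib.sha256("|".join(str(p) for p in parts).encode("utf-8")).hexdigest()
```

A pair whose baseline or current side is `None` is a candidate for a missing or unexpected policy outcome, which the coverage proof guard in T050 may still suppress. Because the recurrence key ignores hashes, a retry that sees newer content for the same subject updates the existing finding instead of creating a duplicate.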

Phase 5: User Story 3 — Throttling-safe, resumable evidence capture (Priority: P1)

Goal: Evidence capture respects quotas, records a resume token, and resumes deterministically without duplicating work.

Independent Test: Simulate throttling/budget exhaustion, verify that the run records a resume token, then resume and complete without re-capturing already-captured subjects.

Tests (write first)

T058 [P] [US3] Add “budget exhaustion produces resume token” test in tests/Feature/Baselines/BaselineCompareResumeTokenTest.php
T059 [P] [US3] Add “resume is idempotent” test in tests/Feature/Baselines/BaselineCompareResumeIdempotencyTest.php
T060 [P] [US3] Add resume token contract test in tests/Feature/Baselines/BaselineEvidenceResumeTokenContractTest.php (token is opaque; decode yields deterministic resume state)
T061 [P] [US3] Add run-detail resume action test in tests/Feature/Filament/OperationRunResumeCaptureActionTest.php
T062 [P] [US3] Add audit event coverage for resume capture in tests/Feature/Baselines/BaselineResumeCaptureAuditEventsTest.php

Implementation

T063 [US3] Implement budgets (items-per-run + concurrency + retries) + retry/backoff/jitter + throttling gap reasons + resume cursor handling in app/Services/Baselines/BaselineContentCapturePhase.php (use BaselineEvidenceResumeToken encode/decode)
T064 [US3] Add resume starter service in app/Services/Baselines/BaselineEvidenceCaptureResumeService.php (start follow-up baseline_capture/baseline_compare runs from a prior run + resume token; enforce RBAC; write audit events)
T065 [US3] Add “Resume capture” header action for eligible runs in app/Filament/Pages/Operations/TenantlessOperationRunViewer.php (requires confirmation; uses Ops-UX queued toast + canonical view-run link)
T066 [US3] Wire resume token consumption + re-emission into app/Jobs/CaptureBaselineSnapshotJob.php (baseline capture) and app/Jobs/CompareBaselineToTenantJob.php (baseline compare)

Parallel execution example (US3):

Developer A: T058, T063, T066
Developer B: T061, T064, T065

Checkpoint: Operators can safely complete large scopes via resumable capture without manual per-policy capture.
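A minimal sketch of the resume contract described in T009, T060, and T063 is shown below: a versioned token that callers treat as opaque, plus an items-per-run budget that emits a token when work remains. It is an illustration under assumptions, not the project's implementation: the base64-wrapped JSON encoding, the offset cursor over a sorted subject list, and all names are hypothetical, and retries, concurrency, and backoff are left out.

```python
import base64
import json

TOKEN_VERSION = 1


def encode_resume_token(next_offset: int) -> str:
    """Wrap the cursor in versioned JSON and base64 it so callers treat it as opaque."""
    raw = json.dumps({"v": TOKEN_VERSION, "offset": next_offset}, sort_keys=True)
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii")


def decode_resume_token(token: str) -> int:
    """Return the cursor, rejecting tokens from an unknown version."""
    state = json.loads(base64.urlsafe_b64decode(token.encode("ascii")))
    if state.get("v") != TOKEN_VERSION:
        raise ValueError("unsupported resume token version")
    return int(state["offset"])


def capture_with_budget(subjects, capture_one, max_items, resume_token=None):
    """Capture at most max_items subjects, starting at the token's cursor."""
    ordered = sorted(subjects)  # deterministic order makes the offset meaningful
    start = decode_resume_token(resume_token) if resume_token else 0
    batch = ordered[start:start + max_items]
    for subject in batch:
        capture_one(subject)
    next_offset = start + len(batch)
    token = encode_resume_token(next_offset) if next_offset < len(ordered) else None
    stats = {"captured": len(batch), "remaining": len(ordered) - next_offset}
    return stats, token
```

An offset cursor only resumes correctly if the subject set is unchanged between runs; a cursor keyed on the last processed subject would tolerate insertions better, at the cost of a slightly larger token.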

Phase 6: User Story 4 — “Why no findings?” is always clear (Priority: P2)

Goal: Zero findings never looks like a silent failure; compare run detail clearly explains the outcome.

Independent Test: Run compare with zero subjects (or with suppressed findings due to coverage/gaps) and verify a clear explanation sourced from run context is displayed.

Tests (write first)

T067 [P] [US4] Add reason-code coverage test for zero-subject / zero-findings / suppressed-by-coverage outcomes in tests/Feature/Baselines/BaselineCompareWhyNoFindingsReasonCodeTest.php
T068 [P] [US4] Add UI assertion test for “why no findings” messaging in tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php

Implementation

T069 [US4] Populate context.baseline_compare.reason_code for all 0-subject / 0-findings outcomes in app/Jobs/CompareBaselineToTenantJob.php (use BaselineCompareReasonCode, including coverage_unproven/rollout_disabled where applicable)
T070 [US4] Render reason-code explanation + evidence context in Monitoring run detail in app/Filament/Resources/OperationRunResource.php
T071 [US4] Replace “All clear” copy with reason-aware messaging on baseline compare landing in resources/views/filament/pages/baseline-compare-landing.blade.php (source reason code from BaselineCompareStats)
T072 [US4] Propagate reason code + human message from run context in app/Support/Baselines/BaselineCompareStats.php

Parallel execution example (US4):

Developer A: T067, T069
Developer B: T068, T071, T072

Checkpoint: Every compare run with “0 findings” has a clear, user-visible explanation and supporting evidence context.
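The reason-code selection behind T069 could look roughly like the sketch below. The five code values come from T007; the precedence order, the function signature, and the input names are assumptions chosen for illustration.

```python
from enum import Enum
from typing import Optional


class ReasonCode(str, Enum):
    NO_SUBJECTS_IN_SCOPE = "no_subjects_in_scope"
    COVERAGE_UNPROVEN = "coverage_unproven"
    EVIDENCE_CAPTURE_INCOMPLETE = "evidence_capture_incomplete"
    ROLLOUT_DISABLED = "rollout_disabled"
    NO_DRIFT_DETECTED = "no_drift_detected"


def why_no_findings(*, findings: int, rollout_enabled: bool, subjects_in_scope: int,
                    coverage_proven: bool, evidence_gaps: int) -> Optional[ReasonCode]:
    """Pick one reason for a zero-findings run; None when findings exist.

    The order of the checks is an assumed precedence: the most fundamental
    blocker is reported first.
    """
    if findings > 0:
        return None
    if not rollout_enabled:
        return ReasonCode.ROLLOUT_DISABLED
    if subjects_in_scope == 0:
        return ReasonCode.NO_SUBJECTS_IN_SCOPE
    if not coverage_proven:
        return ReasonCode.COVERAGE_UNPROVEN
    if evidence_gaps > 0:
        return ReasonCode.EVIDENCE_CAPTURE_INCOMPLETE
    return ReasonCode.NO_DRIFT_DETECTED
```

Only the final branch corresponds to a genuine "nothing drifted" result; every other code tells the operator that the absence of findings is not yet proof of conformance.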

Phase 7: Polish & Cross-Cutting Concerns

Purpose: Guardrails, visibility, and validation across all stories.

T073 [P] Add Spec 118 no-legacy regression guard(s) in tests/Feature/Guards/Spec118NoLegacyBaselineDriftGuardTest.php (assert capture/compare do not implement hashing outside the provider/hasher pipeline and do not reference deprecated helpers)
T074 Update PolicyVersion listing to hide baseline-purpose evidence by default (unless the actor has tenant.sync or tenant_findings.view) in app/Filament/Resources/PolicyVersionResource.php
T075 [P] Add visibility/authorization coverage for baseline-purpose PolicyVersions in tests/Feature/Filament/PolicyVersionBaselineEvidenceVisibilityTest.php (assert baseline-purpose rows are hidden for tenant.view-only actors)
T076 Implement baseline-purpose PolicyVersion retention enforcement in app/Console/Commands/PruneBaselineEvidencePolicyVersionsCommand.php and schedule it in routes/console.php (prune baseline_capture/baseline_compare older than configured retention; do not prune backup) + tests in tests/Feature/Retention/PruneBaselineEvidencePolicyVersionsTest.php and tests/Feature/Scheduling/PruneBaselineEvidencePolicyVersionsScheduleTest.php
T077 Add Baseline Snapshot list/detail surfaces with fidelity visibility in app/Filament/Resources/BaselineSnapshotResource.php, app/Filament/Resources/BaselineSnapshotResource/Pages/ListBaselineSnapshots.php, and app/Filament/Resources/BaselineSnapshotResource/Pages/ViewBaselineSnapshot.php (badge + counts by fidelity; “captured with gaps” state) + tests in tests/Feature/Filament/BaselineSnapshotFidelityVisibilityTest.php
T078 Run formatting on changed files using vendor/bin/sail bin pint --dirty --format agent (touchpoints include app/Jobs/CaptureBaselineSnapshotJob.php, app/Jobs/CompareBaselineToTenantJob.php, app/Services/Baselines/BaselineContentCapturePhase.php)
T079 Run the targeted test suite from specs/118-baseline-drift-engine/quickstart.md and update that file if any step is inaccurate
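The retention rule in T076 (prune baseline_capture and baseline_compare versions older than the configured retention, never backup) reduces to a simple selection. The Python sketch below illustrates only that filter; the `captured_at` field name and the dictionary row shape are assumptions, and the real command would run this as a database query.

```python
from datetime import datetime, timedelta, timezone

# Purposes eligible for pruning; "backup" is intentionally absent.
PRUNABLE_PURPOSES = {"baseline_capture", "baseline_compare"}


def select_prunable(versions, now: datetime, retention_days: int = 90):
    """Return versions with a prunable purpose captured before the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    return [
        v for v in versions
        if v["capture_purpose"] in PRUNABLE_PURPOSES and v["captured_at"] < cutoff
    ]
```

The default of 90 days mirrors the TENANTPILOT_BASELINE_EVIDENCE_RETENTION_DAYS example value from T002.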

Dependencies & Execution Order

Story completion order

Phase 1 (Setup) → Phase 2 (Foundational) → user stories.
User stories after Phase 2:
- US1 (P1) is the MVP capture capability and should be implemented first end-to-end.
- US2 (P1) depends on US1 for end-to-end validation (a baseline snapshot must exist), but implementation can proceed in parallel after Phase 2.
- US3 (P1) depends on the capture phase being integrated in US1/US2.
- US4 (P2) depends on US2’s run-context fields.

Dependency graph

graph TD
  P1["Phase 1: Setup"] --> P2["Phase 2: Foundational"]
  P2 --> US1["US1: Capture baseline (full content)"]
  P2 --> US2["US2: Compare now (full content)"]
  US1 --> US2
  US2 --> US3["US3: Resumable capture"]
  US2 --> US4["US4: Why no findings"]
  US3 --> POLISH["Phase 7: Polish"]
  US4 --> POLISH
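The graph above can be checked mechanically. As a small illustration, the following Python sketch encodes the same edges as a prerequisite map and derives one valid execution order with the standard library's `graphlib`; the short node labels are shorthand chosen for the example.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of nodes that must finish before it.
DEPENDS_ON = {
    "Phase 1": set(),
    "Phase 2": {"Phase 1"},
    "US1": {"Phase 2"},
    "US2": {"Phase 2", "US1"},
    "US3": {"US2"},
    "US4": {"US2"},
    "Polish": {"US3", "US4"},
}

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

In this graph the first four steps are forced (Phase 1, Phase 2, US1, US2), US3 and US4 may run in either order or in parallel, and Polish comes last.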

Implementation Strategy (MVP-first)

Ship US1 with a strict run-context contract and explicit gap reporting (no silent success).
Add US2 compare refresh + cross-tenant matching with explainability.
Harden with US3 resumability and throttle-safe behavior.
Complete operator trust with US4 reason-code UX.
Enforce “no legacy” and visibility constraints in Polish.
