TenantAtlas/docs/research/golden-master-baseline-drift-deep-analysis.md
ahmido 7620144ab6 Spec 116: Baseline drift engine v1 (meta fidelity + coverage guard) (#141)
Implements Spec 116 baseline drift engine v1 (meta fidelity) with coverage guard, stable finding identity, and Filament UI surfaces.

Highlights
- Baseline capture/compare jobs and supporting services (meta contract hashing via InventoryMetaContract + DriftHasher)
- Coverage proof parsing + compare partial outcome behavior
- Filament pages/resources/widgets for baseline compare + drift landing improvements
- Pest tests for capture/compare/coverage guard and UI start surfaces
- Research report: docs/research/golden-master-baseline-drift-deep-analysis.md

Validation
- `vendor/bin/sail bin pint --dirty`
- `vendor/bin/sail artisan test --compact --filter="Baseline"`

Notes
- No destructive user actions added; compare/capture remain queued jobs.
- Provider registration unchanged (Laravel 11+/12 uses bootstrap/providers.php for panel providers; not touched here).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #141
2026-03-02 22:02:58 +00:00


Golden Master / Baseline Drift — Deep Settings-Drift (Content-Fidelity) Analysis

Enterprise Research Report for TenantAtlas / TenantPilot
Date: 2025-07-15
Scope: Architecture, code evidence, implementation proposal


Table of Contents

  1. Executive Summary
  2. System Map — Side-by-Side Comparison
  3. Architecture Decision Record (ADR-001): Unify vs Separate
  4. Deep-Dive: Why Settings Changes Don't Produce Baseline Drift
  5. Code Evidence Table
  6. Type Coverage Matrix
  7. Proposal: Deep Drift Implementation Plan
  8. Test Plan (Enterprise)
  9. Open Questions / Assumptions
  10. Key Questions Answered (KQ-01 through KQ-06)

1. Executive Summary

  1. Two parallel drift systems exist: Baseline Compare (meta fidelity, inventory-sourced) and Backup Drift (content fidelity, PolicyVersion-sourced). They share DriftHasher but are otherwise separate data paths with separate finding generators.

  2. The core gap: CompareBaselineToTenantJob hashes InventoryMetaContract v1 — which contains only odata_type, etag, scope_tag_ids, assignment_target_count — never actual policy settings. When an admin changes a Wi-Fi password or a compliance threshold in Intune, none of these meta signals necessarily change.

  3. Inventory sync uses Graph LIST endpoints, which return metadata and display fields only. Per-item GET (which fetches settings, assignments, scope tags) is only performed during Backup via PolicyCaptureOrchestrator.

  4. DriftFindingGenerator (the backup drift system) does detect settings changes: it normalizes PolicyVersion.snapshot via SettingsNormalizer → PolicyNormalizer::flattenForDiff() → type-specific normalizers, then hashes with DriftHasher.

  5. Spec 116 already designs v2 with a provider precedence chain (PolicyVersion → Inventory content → Meta fallback), which is the correct architectural direction. The v1 meta baseline shipped first as a deliberate, safe-to-ship initial milestone.

  6. Unification is recommended (provider chain approach) — not merging the two jobs, but enabling CompareBaselineToTenantJob to optionally consume PolicyVersion snapshots as a content-fidelity provider, falling back to InventoryMetaContract when no PolicyVersion is available.

  7. 28 supported policy types are registered in tenantpilot.php, plus 3 foundation types. Of these, 10+ have complex hydration (settings catalog, group policy, security baselines, compliance actions) and would benefit most from deep-drift detection.

  8. The etag signal is unreliable as a settings-change proxy: Microsoft Graph etag semantics vary per resource type, and etag may or may not change when settings are modified. It is useful as a hint but not a guarantee.

  9. API cost is the primary constraint: content-fidelity compare requires per-item GET calls (or a recent Backup that already captured PolicyVersions). The hybrid provider chain avoids this by opportunistically reusing existing PolicyVersions without requiring a full backup before every compare.

  10. Coverage Guard is critical for v2: the baseline system must know which types have fresh PolicyVersions and suppress content-fidelity findings for types where no recent version exists (falling back to meta fidelity).

  11. Risk profile: Shipping deep-drift for types that lack proper per-type normalization could produce false positives. Type-specific normalizers already exist for the backup drift path; reusing them is safe.

  12. Recommended phasing: v1.5 (current sprint) = add content_hash_source column to baseline_snapshot_items + provider chain in compare job. v2.0 = on-demand per-item GET during baseline capture for types lacking recent PolicyVersions.


2. System Map

Side-by-Side Comparison Table

| Dimension | System A: Baseline Compare | System B: Backup Drift |
| --- | --- | --- |
| Entry point | CompareBaselineToTenantJob | GenerateDriftFindingsJob → DriftFindingGenerator |
| Data source (current) | InventoryItem (from LIST sync) | PolicyVersion (from per-item GET backup) |
| Data source (baseline) | BaselineSnapshotItem (captured from inventory) | Earlier PolicyVersion from prior OperationRun |
| Hash contract | InventoryMetaContract v1 → DriftHasher::hashNormalized() | SettingsNormalizer → PolicyNormalizer::flattenForDiff() → DriftHasher::hashNormalized() |
| Hash inputs | version, policy_type, subject_external_id, odata_type, etag, scope_tag_ids, assignment_target_count | Full PolicyVersion.snapshot JSON (with volatile key removal) |
| Fidelity | meta (persisted as fidelity='meta' in snapshot context) | content (settings + assignments + scope_tags) |
| Dimensions detected | missing_policy, different_version, unexpected_policy | policy_snapshot (added/removed/modified), policy_assignments (modified), policy_scope_tags (modified) |
| Finding identity | recurrence_key = sha256(tenantId\|snapshotId\|policyType\|extId\|changeType) | recurrence_key = sha256(drift:tenantId:scopeKey:subjectType:extId:dimension) |
| Scope key | baseline_profile:{profileId} | DriftScopeKey::fromSelectionHash() |
| Auto-close | BaselineAutoCloseService (stale finding resolution) | resolveStaleDriftFindings() within DriftFindingGenerator |
| Coverage guard | InventoryCoverage::fromContext() → uncovered types → partial outcome | None (trusts backup captured all types) |
| Graph API calls | Zero at compare time (reads from DB) | Zero at compare time (reads PolicyVersions from DB) |
| Graph API calls (capture) | Zero (inventory sync did LIST) | Per-item GET via PolicyCaptureOrchestrator |
| Normalizer pipeline | None (meta contract is the normalization) | SettingsNormalizer → PolicyNormalizer → type normalizers |
| Shared components | DriftHasher, Finding model | DriftHasher, Finding model |
| Trigger | After inventory sync, on schedule/manual | After backup, on schedule/manual |
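
The two finding-identity formulas in the comparison table can be illustrated with a small sketch. The function names below are hypothetical and only the delimiter layout is taken from the table; the real implementations may build the string differently. The point it shows is that neither key contains a hash value, so a finding keeps its identity across repeated compare runs.

```php
<?php
// Sketch only: helper names are hypothetical; delimiters follow the table above.

function baselineRecurrenceKey(int $tenantId, int $snapshotId, string $policyType, string $externalId, string $changeType): string
{
    // System A layout: tenantId|snapshotId|policyType|extId|changeType
    return hash('sha256', implode('|', [$tenantId, $snapshotId, $policyType, $externalId, $changeType]));
}

function backupDriftRecurrenceKey(int $tenantId, string $scopeKey, string $subjectType, string $externalId, string $dimension): string
{
    // System B layout: drift:tenantId:scopeKey:subjectType:extId:dimension
    return hash('sha256', implode(':', ['drift', $tenantId, $scopeKey, $subjectType, $externalId, $dimension]));
}

// Same inputs give the same key, regardless of what the content hash was.
$firstRun  = baselineRecurrenceKey(7, 42, 'settingsCatalogPolicy', 'guid-1', 'different_version');
$secondRun = baselineRecurrenceKey(7, 42, 'settingsCatalogPolicy', 'guid-1', 'different_version');

// A new baseline snapshot yields a new identity for System A findings.
$newSnapshot = baselineRecurrenceKey(7, 43, 'settingsCatalogPolicy', 'guid-1', 'different_version');

$backupKey = backupDriftRecurrenceKey(7, 'scope-abc', 'policy', 'guid-1', 'policy_snapshot');
```

Because the snapshot ID is part of the System A key, switching a profile to a new baseline snapshot starts a fresh set of findings rather than reopening old ones.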

Data Flow Diagrams

SYSTEM A — Baseline Compare (Meta Fidelity)
============================================
Graph LIST ──► InventorySyncService ──► InventoryItem (meta_jsonb)
                                             │
                                             ▼
                                    CaptureBaselineSnapshotJob
                                    ├─ InventoryMetaContract.build()
                                    ├─ DriftHasher.hashNormalized()
                                    └─► BaselineSnapshotItem (baseline_hash)
                                             │
                                             ▼
                                    CompareBaselineToTenantJob
                                    ├─ loadCurrentInventory() → InventoryItem
                                    ├─ BaselineSnapshotIdentity.hashItemContent()
                                    │   └─ InventoryMetaContract.build()
                                    │   └─ DriftHasher.hashNormalized()
                                    ├─ computeDrift() → hash compare
                                    └─ upsertFindings() → Finding records


SYSTEM B — Backup Drift (Content Fidelity)
============================================
Graph GET ──► PolicySnapshotService.fetch() ──► full JSON snapshot
                     │
                     ▼
              PolicyCaptureOrchestrator.capture()
              ├─ assignments GET
              ├─ scope tags resolve
              └─► VersionService.captureVersion() ──► PolicyVersion
                                                         │
                                                         ▼
                                               DriftFindingGenerator.generate()
                                               ├─ versionForRun() → baseline/current PV
                                               ├─ SettingsNormalizer.normalizeForDiff()
                                               │   └─ PolicyNormalizer.flattenForDiff()
                                               ├─ DriftHasher.hashNormalized() × 3
                                               │   (snapshot, assignments, scope_tags)
                                               └─ upsertDriftFinding() → Finding records

3. ADR-001: Unify vs Separate

Title

ADR-001: Golden Master Baseline Compare — Provider Chain for Content Fidelity

Status

PROPOSED

Context

TenantPilot has two drift detection systems that evolved independently:

  • System A (Baseline Compare): Designed for the "does the tenant still match the golden master?" use case. Ships with meta fidelity (v1): fast, cheap, zero additional Graph calls at compare time. Detects structural drift (policy added/removed/meta-changed) but is blind to settings changes.

  • System B (Backup Drift): Designed for the "what changed between two backup points?" use case. Content fidelity: full PolicyVersion snapshots with per-type normalization. Detects settings, assignments, and scope tag changes.

The two systems cannot be merged into one without fundamentally changing their triggering, scoping, and API cost models. However, System A's accuracy can be dramatically improved by consuming data already produced by System B.

Decision

Adopt the Provider Chain pattern as already designed in Spec 116 v2:

ContentProvider = PolicyVersion → InventoryContent → MetaFallback

Specifically:

  1. CompareBaselineToTenantJob gains a ContentProviderChain that, for each (policy_type, external_id):

    • First: Looks for a PolicyVersion captured since the last baseline snapshot timestamp. If found, normalizes via SettingsNormalizer → DriftHasher → returns content fidelity hash.
    • Second (future): Looks for enriched inventory content if inventory sync is upgraded to capture settings (v2.0+).
    • Fallback: Builds InventoryMetaContract v1 → DriftHasher → returns meta fidelity hash.
  2. Each baseline snapshot item records its fidelity (meta | content) and content_hash_source (inventory_meta_v1 | policy_version:{id} | inventory_content_v2).

  3. Compare findings carry fidelity in evidence, enabling UI to display confidence level.

  4. Coverage Guard is extended: a type is content-covered only if PolicyVersions exist for ≥N% of its items. Below that threshold, the type falls back to meta fidelity (findings are not suppressed).
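
The coverage rule in item 4 could be sketched as follows. This is an illustration under stated assumptions, not the shipped guard: the function name, the input shape, and the 0.8 threshold are all hypothetical (the ADR leaves N open).

```php
<?php
/**
 * Decide, per policy type, whether compare should run at content or meta fidelity.
 *
 * @param array<string, array{total: int, with_version: int}> $countsByType
 *        hypothetical shape: item count per type and how many have a usable PolicyVersion
 * @param float $minRatio hypothetical threshold (the "N%" from the ADR, as a ratio)
 * @return array<string, string> policy type => 'content' | 'meta'
 */
function resolveFidelityByType(array $countsByType, float $minRatio): array
{
    $result = [];

    foreach ($countsByType as $type => $counts) {
        // Types with no items cannot be content-covered; treat them as meta.
        $ratio = $counts['total'] > 0 ? $counts['with_version'] / $counts['total'] : 0.0;
        $result[$type] = $ratio >= $minRatio ? 'content' : 'meta';
    }

    return $result;
}

$fidelity = resolveFidelityByType([
    'settingsCatalogPolicy'  => ['total' => 10, 'with_version' => 9],
    'deviceCompliancePolicy' => ['total' => 4,  'with_version' => 1],
    'mobileApp'              => ['total' => 0,  'with_version' => 0],
], 0.8);
```

A type below the threshold still produces findings; they are simply computed from the meta contract and labeled as such.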

Consequences

  • Positive: No new Graph API calls needed (reuses existing PolicyVersions from backups). Zero additional infrastructure. Incremental rollout per policy type. Existing meta-fidelity behavior preserved as fallback.
  • Negative: Content fidelity depends on backup recency. If a tenant hasn't been backed up, only meta fidelity is available. Could create "mixed fidelity" findings within a single compare run.
  • Rejected Alternative: Full merge of System A and B into a single system. Rejected because they serve different use cases (golden master comparison vs point-in-time drift), have different scoping models (BaselineProfile vs selection_hash), and different triggering models (post-inventory-sync vs post-backup).
  • Rejected Alternative: Always-GET during baseline compare. Rejected due to API cost (30+ types × 100s of policies = 1000s of GET calls per tenant per compare run).

Compliance Notes

  • Livewire v4.0+ / Filament v5: no UI changes in core ADR; provider chain is purely backend.
  • Provider registration: n/a (backend services only).
  • No destructive actions.
  • Asset strategy: no new assets.

4. Deep-Dive: Why Settings Changes Don't Produce Baseline Drift

The Root Cause Chain

Step 1: Inventory Sync captures only LIST metadata

InventorySyncService::executeSelectionUnderLock() (line ~340-450) calls Graph LIST endpoints. For each policy, it extracts:

  • display_name, category, platform (display fields)
  • odata_type, etag, scope_tag_ids, assignment_target_count (meta signals)

These are stored in InventoryItem.meta_jsonb. No settings values are fetched or stored.

Step 2: Baseline Capture hashes only the Meta Contract

CaptureBaselineSnapshotJob::collectSnapshotItems() reads from InventoryItem, then calls BaselineSnapshotIdentity::hashItemContent():

// BaselineSnapshotIdentity.php, line 56-67
public function hashItemContent(string $policyType, string $subjectExternalId, array $metaJsonb): string
{
    $contract = $this->metaContract->build(
        policyType: $policyType,
        subjectExternalId: $subjectExternalId,
        metaJsonb: $metaJsonb,
    );
    return $this->hasher->hashNormalized($contract);
}

The InventoryMetaContract::build() output is:

[
    'version'                => 1,
    'policy_type'            => 'settingsCatalogPolicy',
    'subject_external_id'    => '<guid>',
    'odata_type'             => '#microsoft.graph.deviceManagementConfigurationPolicy',
    'etag'                   => '"abc..."',           // ← unreliable change indicator
    'scope_tag_ids'          => ['0'],
    'assignment_target_count' => 3,
]

This is ALL that gets hashed. Actual policy settings (the Wi-Fi password, the compliance threshold, the firewall rule) are nowhere in this contract.

Step 3: Baseline Compare re-computes the same meta hash

CompareBaselineToTenantJob::loadCurrentInventory() (line 367-409) reads current InventoryItem records and calls the same BaselineSnapshotIdentity::hashItemContent() with the same InventoryMetaContract, producing the same hash structure.

computeDrift() (line 435-500) then compares baseline_hash vs current_hash:

if ($baselineItem['baseline_hash'] !== $currentItem['current_hash']) {
    $drift[] = ['change_type' => 'different_version', ...];
}

If the admin changed a policy setting but the meta signals (etag, scope_tag_ids, assignment_target_count) stayed the same, baseline_hash === current_hash and NO drift is detected.
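
The blind spot can be reproduced in isolation. The sketch below uses simplified stand-ins for InventoryMetaContract and DriftHasher (both function names are hypothetical, and the hashing is plain json_encode + sha256 rather than the real normalizer), and a made-up `settings` payload to represent what a per-item GET would return.

```php
<?php
// Simplified stand-in for the meta contract: only the meta signals are kept,
// anything else in the input (such as settings) is ignored.
function buildMetaContractSketch(string $policyType, string $externalId, array $meta): array
{
    return [
        'version'                 => 1,
        'policy_type'             => $policyType,
        'subject_external_id'     => $externalId,
        'odata_type'              => $meta['odata_type'] ?? null,
        'etag'                    => $meta['etag'] ?? null,
        'scope_tag_ids'           => $meta['scope_tag_ids'] ?? [],
        'assignment_target_count' => $meta['assignment_target_count'] ?? null,
    ];
}

// Stand-in hasher; the real DriftHasher also strips volatile keys.
function hashSketch(array $payload): string
{
    return hash('sha256', json_encode($payload, JSON_THROW_ON_ERROR));
}

$meta = [
    'odata_type'              => '#microsoft.graph.deviceManagementConfigurationPolicy',
    'etag'                    => '"abc"',
    'scope_tag_ids'           => ['0'],
    'assignment_target_count' => 3,
];

// Hypothetical full policy bodies before and after a settings edit (etag unchanged).
$before = $meta + ['settings' => ['wifi_psk' => 'old-secret']];
$after  = $meta + ['settings' => ['wifi_psk' => 'new-secret']];

$metaHashBefore    = hashSketch(buildMetaContractSketch('settingsCatalogPolicy', 'guid-1', $before));
$metaHashAfter     = hashSketch(buildMetaContractSketch('settingsCatalogPolicy', 'guid-1', $after));
$contentHashBefore = hashSketch($before);
$contentHashAfter  = hashSketch($after);
```

The two meta hashes are identical while the two content hashes differ, which is exactly the case where baseline compare reports no drift.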

Why etag is unreliable

Microsoft Graph etag behavior varies by resource type:

  • Some types update etag on any property change (including settings)
  • Some types update etag only on top-level property changes (not nested settings)
  • Settings Catalog policies may or may not update the parent resource etag when child settings are modified (the settings are a separate subresource at /configurationPolicies/{id}/settings)
  • Group Policy Configurations have settings in definitionValues → presentationValues (multi-level nesting); etag at root level may not reflect these changes

The Contrast: How Backup Drift Does Detect Settings Changes

DriftFindingGenerator::generate() (line 32-80) operates on PolicyVersion.snapshot — the full JSON captured via per-item GET:

$baselineSnapshot = $baselineVersion->snapshot;  // Full JSON from Graph GET
$currentSnapshot  = $currentVersion->snapshot;

$baselineNormalized = $this->settingsNormalizer->normalizeForDiff($baselineSnapshot, $policyType, $platform);
$currentNormalized  = $this->settingsNormalizer->normalizeForDiff($currentSnapshot, $policyType, $platform);

$baselineSnapshotHash = $this->hasher->hashNormalized($baselineNormalized);
$currentSnapshotHash  = $this->hasher->hashNormalized($currentNormalized);

if ($baselineSnapshotHash !== $currentSnapshotHash) {
    // → Drift finding with change_type = 'modified'
}

This pipeline captures actual settings values, normalizes them per policy type, strips volatile metadata, and hashes the result. If a setting changed, the hash changes, and drift is detected.
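
A minimal sketch of the "strip volatile metadata, then hash" step, assuming PHP 8.1+ (for array_is_list). The volatile key list and function names are hypothetical; the actual list lives in DriftHasher and is not reproduced in this report.

```php
<?php
// Hypothetical volatile keys; the real list in DriftHasher may differ.
const VOLATILE_KEYS = ['@odata.etag', 'lastModifiedDateTime', 'createdDateTime'];

// Recursively drop volatile keys and sort associative arrays by key,
// so that key order in the source JSON cannot influence the hash.
function canonicalize(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }

    $out = [];
    foreach ($value as $key => $item) {
        if (is_string($key) && in_array($key, VOLATILE_KEYS, true)) {
            continue;
        }
        $out[$key] = canonicalize($item);
    }

    // Lists keep their order; only maps are sorted.
    if (!array_is_list($out)) {
        ksort($out, SORT_STRING);
    }

    return $out;
}

function hashNormalizedSketch(array $normalized): string
{
    $json = json_encode(
        canonicalize($normalized),
        JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
    );

    return hash('sha256', $json);
}

$a = ['displayName' => 'Wi-Fi', 'settings' => ['psk' => 'x', 'ssid' => 'corp'], 'lastModifiedDateTime' => '2025-01-01T00:00:00Z'];
$b = ['settings' => ['ssid' => 'corp', 'psk' => 'x'], 'displayName' => 'Wi-Fi', 'lastModifiedDateTime' => '2025-06-01T00:00:00Z'];
$c = ['displayName' => 'Wi-Fi', 'settings' => ['psk' => 'y', 'ssid' => 'corp']];

// Simulates a snapshot that went through a JSON column and back.
$roundTripped = json_decode(json_encode($a), true);
```

Here `$a` and `$b` differ only in key order and a volatile timestamp, so they hash identically, while `$c` carries a changed setting and does not.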

Summary Visualization

Admin changes Wi-Fi password in Intune
          │
          ▼
┌─────────────────────────────────┐
│ Graph LIST (inventory sync)     │
│ returns: displayName, etag, ... │
│                                 │
│ etag MAY change, settings NOT   │
│ returned by LIST endpoint       │
└────────────┬────────────────────┘
             │
     ┌───────┴────────┐
     ▼                ▼
 InventoryItem    PolicyVersion
 (meta only)      (if backup ran)
     │                │
     ▼                ▼
 Meta Contract    Full Snapshot
 hash unchanged   hash CHANGED
     │                │
     ▼                ▼
 Baseline         Backup Drift:
 Compare:         "modified" ✅
 NO DRIFT ❌

5. Code Evidence Table

| # | Class / File | Lines | Role | Key Finding |
| --- | --- | --- | --- | --- |
| 1 | app/Jobs/CompareBaselineToTenantJob.php | 785 total; L367-409 (loadCurrentInventory), L435-500 (computeDrift) | Core baseline compare job | Reads from InventoryItem only; hashes via InventoryMetaContract → blind to settings |
| 2 | app/Services/Baselines/InventoryMetaContract.php | 75 total; L30-57 (build) | Meta hash contract builder | Hashes only: version, policy_type, external_id, odata_type, etag, scope_tag_ids, assignment_target_count; no settings content |
| 3 | app/Services/Baselines/BaselineSnapshotIdentity.php | 73 total; L56-67 (hashItemContent) | Per-item hash via meta contract | Delegates to InventoryMetaContract.build() → DriftHasher.hashNormalized() |
| 4 | app/Jobs/CaptureBaselineSnapshotJob.php | 305 total | Captures snapshot from inventory | Reads InventoryItem, stores fidelity='meta' and source='inventory' |
| 5 | app/Services/Drift/DriftFindingGenerator.php | 484 total; L32-80 (generate), L250-267 (recurrenceKey) | Backup drift finding generator | Uses PolicyVersion.snapshot with SettingsNormalizer → detects settings changes |
| 6 | app/Services/Drift/DriftHasher.php | 100 total; L13-24 (hashNormalized) | Shared hasher | sha256(json_encode(normalized)) with volatile key removal. SHARED by both systems. |
| 7 | app/Services/Drift/Normalizers/SettingsNormalizer.php | 22 total | Thin wrapper | Delegates to PolicyNormalizer::flattenForDiff(). Used by System B only. |
| 8 | app/Services/Intune/PolicyNormalizer.php | 67 total | Type-specific normalizer router | Routes to per-type normalizers for diff operations |
| 9 | app/Services/Inventory/InventorySyncService.php | 652 total; L340-450 (executeSelectionUnderLock) | LIST-based sync | Fetches from Graph LIST endpoints; extracts meta signals only; upserts InventoryItem |
| 10 | app/Services/Intune/BackupService.php | 438 total | Backup orchestration | Creates BackupSet, uses PolicyCaptureOrchestrator for per-item GET → PolicyVersion |
| 11 | app/Services/Intune/PolicyCaptureOrchestrator.php | 429 total | Per-item GET + hydration | Fetches full snapshot, assignments, scope tags; creates PolicyVersion with all content |
| 12 | app/Services/Intune/PolicySnapshotService.php | 852 total | Per-item Graph GET | Type-specific hydration (hydrateConfigurationPolicySettings, hydrateGroupPolicyConfiguration, etc.) |
| 13 | app/Services/Intune/VersionService.php | 312 total; L1-150 (captureVersion) | PolicyVersion persistence | Transactional, locking, consecutive version_number |
| 14 | app/Models/PolicyVersion.php | Model | PolicyVersion model | Casts: snapshot(array), assignments(array), scope_tags(array), plus hash columns |
| 15 | app/Models/InventoryItem.php | Model | Inventory item model | Casts: meta_jsonb(array); no settings content |
| 16 | app/Models/BaselineSnapshotItem.php | Model | Snapshot item model | Has baseline_hash(64), meta_jsonb |
| 17 | app/Support/Inventory/InventoryCoverage.php | 173 total | Coverage parser | fromContext() extracts per-type status from sync run context |
| 18 | app/Services/Drift/DriftRunSelector.php | ~60 total | Run pair selector | Selects 2 most recent sync runs with same selection_hash (System B only) |
| 19 | app/Jobs/GenerateDriftFindingsJob.php | ~200 total | Dispatcher for System B | Dispatches DriftFindingGenerator for policy-version-based drift |
| 20 | config/graph_contracts.php | 867 total | Policy type registry | Defines endpoints, hydration strategies, subresources, type families per policy type |
| 21 | config/tenantpilot.php | 385 total; L18-293 (supported_policy_types) | Application config | 28 supported policy types + 3 foundation types |
| 22 | specs/116-baseline-drift-engine/spec.md | 237 total | Feature spec | Defines v1 (meta) and v2 (content fidelity) requirements |
| 23 | specs/116-baseline-drift-engine/research.md | 200 total | Phase 0 research | 6 key decisions including v2 architecture strategy |
| 24 | specs/116-baseline-drift-engine/plan.md | 259 total | Implementation plan | Steps 1-7 for v1; v2 deferred |

6. Type Coverage Matrix

Coverage assessment for deep-drift feasibility: which types have per-type normalization and hydration support?

| # | policy_type | Label | Hydration | Subresources | Per-Type Normalizer | Deep-Drift Feasible | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | settingsCatalogPolicy | Settings Catalog Policy | configurationPolicies | settings (list) | Yes (via PolicyNormalizer) | YES | Most impactful; complex nested settings |
| 2 | endpointSecurityPolicy | Endpoint Security Policies | configurationPolicies | settings (list) | Yes (shared with settings catalog) | YES | Same endpoint family as settings catalog |
| 3 | securityBaselinePolicy | Security Baselines | configurationPolicies | settings (list) | Yes (shared) | YES | Same pipeline |
| 4 | groupPolicyConfiguration | Administrative Templates | groupPolicyConfigurations | definitionValues → presentationValues | Yes (via PolicyNormalizer) | YES | Multi-level nesting; hydration required |
| 5 | deviceConfiguration | Device Configuration | deviceConfigurations | None (properties on root) | Yes (via PolicyNormalizer) | YES | Properties directly on resource |
| 6 | deviceCompliancePolicy | Device Compliance | deviceCompliancePolicies | scheduledActionsForRule (expand) | Yes (via PolicyNormalizer) | YES | Actions subresource needs expand |
| 7 | windowsUpdateRing | Software Update Ring | deviceConfigurations (filtered) | None (properties on root) | Yes (shared with deviceConfig) | YES | Subset of deviceConfiguration |
| 8 | appProtectionPolicy | App Protection (MAM) | managedAppPolicies | None (properties) | Partial (via PolicyNormalizer) | YES | Mobile-focused |
| 9 | conditionalAccessPolicy | Conditional Access | identity/conditionalAccess/policies | None (properties) | Yes (via PolicyNormalizer) | YES | High-risk, preview-only restore |
| 10 | deviceManagementScript | PowerShell Scripts | deviceManagementScripts | None (scriptContent base64) | Partial | PARTIAL | Script content is base64 in snapshot |
| 11 | deviceShellScript | macOS Shell Scripts | deviceShellScripts | None (scriptContent base64) | Partial | PARTIAL | Same pattern as PS scripts |
| 12 | deviceHealthScript | Proactive Remediations | deviceHealthScripts | None | Partial | PARTIAL | Detection + remediation scripts |
| 13 | deviceComplianceScript | Custom Compliance Scripts | deviceComplianceScripts | None | Partial | PARTIAL | Script content |
| 14 | windowsFeatureUpdateProfile | Feature Updates | windowsFeatureUpdateProfiles | None | Yes | YES | Simple properties |
| 15 | windowsQualityUpdateProfile | Quality Updates | windowsQualityUpdateProfiles | None | Yes | YES | Simple properties |
| 16 | windowsDriverUpdateProfile | Driver Updates | windowsDriverUpdateProfiles | None | Yes | YES | Simple properties |
| 17 | mamAppConfiguration | App Config (MAM) | targetedManagedAppConfigurations | None | Partial | YES | Properties-based |
| 18 | managedDeviceAppConfiguration | App Config (Device) | mobileAppConfigurations | None | Partial | YES | Properties-based |
| 19 | windowsAutopilotDeploymentProfile | Autopilot Profiles | windowsAutopilotDeploymentProfiles | None | Minimal | YES | Properties-based |
| 20 | windowsEnrollmentStatusPage | Enrollment Status Page | deviceEnrollmentConfigurations | None | Minimal | META-ONLY | Enrollment types have limited settings |
| 21 | deviceEnrollmentLimitConfiguration | Enrollment Limits | deviceEnrollmentConfigurations | None | Minimal | META-ONLY | Numeric limit only |
| 22 | deviceEnrollmentPlatformRestrictionsConfiguration | Platform Restrictions | deviceEnrollmentConfigurations | None | Minimal | META-ONLY | Nested restriction config |
| 23 | deviceEnrollmentNotificationConfiguration | Enrollment Notifications | deviceEnrollmentConfigurations | None | Minimal | META-ONLY | Template snapshots nested |
| 24 | enrollmentRestriction | Enrollment Restrictions | deviceEnrollmentConfigurations | None | Minimal | META-ONLY | Mixed config type |
| 25 | termsAndConditions | Terms & Conditions | termsAndConditions | None | Yes | YES | bodyText, acceptanceStatement |
| 26 | endpointSecurityIntent | Endpoint Security Intents | intents | categories/settings (legacy) | Partial | PARTIAL | Legacy intent API; migrating to configPolicies |
| 27 | mobileApp | Applications | mobileApps | None | Minimal | META-ONLY | Metadata-only backup per config |
| 28 | policySet | Policy Sets | (if supported) | assignments | Minimal | META-ONLY | Container for other policies |

Foundation Types:

| # | foundation_type | Label | Deep-Drift | Notes |
| --- | --- | --- | --- | --- |
| F1 | assignmentFilter | Assignment Filter | YES | rule property is key content |
| F2 | roleScopeTag | Scope Tag | META-ONLY | displayName + description only |
| F3 | notificationMessageTemplate | Notification Template | PARTIAL | Localized messages are subresource |

Summary:

  • Full content-fidelity feasible: 16 policy types (settingsCatalog, endpointSecurity, securityBaseline, groupPolicy, deviceConfig, compliance, updateRings/profiles, appProtection, conditionalAccess, appConfigs, autopilot, termsAndConditions), plus the assignmentFilter foundation type
  • Partial (script content / legacy APIs): 5 policy types, plus the notificationMessageTemplate foundation type
  • Meta-only sufficient: 7 policy types (enrollment configs, mobileApp, policySet), plus the roleScopeTag foundation type

7. Proposal: Deep Drift Implementation Plan

Phase v1.5 — Provider Chain (Opportunistic Content Fidelity)

Goal: Enable baseline compare to use existing PolicyVersions for content-fidelity hash when available, with meta-fidelity fallback.

Estimated effort: 3-5 days

Step 1: ContentHashProvider Interface

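// app/Contracts/Baselines/ContentHashProvider.php
namespace App\Contracts\Baselines;

use Carbon\CarbonImmutable;

interface ContentHashProvider
{
    /**
     * Resolve a hash for one policy, or null if this provider has no usable data for it.
     *
     * @return array{hash: string, fidelity: string, source: string}|null
     */
    public function resolve(string $policyType, string $externalId, int $tenantId, CarbonImmutable $since): ?array;
}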

Step 2: PolicyVersionContentProvider

// app/Services/Baselines/PolicyVersionContentProvider.php
// Looks up the latest PolicyVersion for (tenant_id, external_id, policy_type)
// captured_at >= $since (baseline snapshot timestamp)
// Returns SettingsNormalizer → DriftHasher hash with fidelity='content'
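
The lookup described in these comments might look roughly like the sketch below. It works on an in-memory array instead of the PolicyVersion model, uses DateTimeImmutable in place of CarbonImmutable, and replaces the SettingsNormalizer → DriftHasher step with a plain json_encode + sha256; all function names and the row shape are hypothetical.

```php
<?php
/**
 * Pick the newest version for one policy captured at or after $since.
 *
 * @param list<array{id: int, policy_type: string, external_id: string, captured_at: string, snapshot: array}> $versions
 */
function latestVersionSince(array $versions, string $policyType, string $externalId, DateTimeImmutable $since): ?array
{
    $best = null;

    foreach ($versions as $version) {
        if ($version['policy_type'] !== $policyType || $version['external_id'] !== $externalId) {
            continue;
        }

        $capturedAt = new DateTimeImmutable($version['captured_at']);
        if ($capturedAt < $since) {
            continue; // stale: captured before the baseline snapshot
        }

        if ($best === null || $capturedAt > new DateTimeImmutable($best['captured_at'])) {
            $best = $version;
        }
    }

    return $best;
}

/** @return array{hash: string, fidelity: string, source: string}|null */
function resolveContentHash(array $versions, string $policyType, string $externalId, DateTimeImmutable $since): ?array
{
    $version = latestVersionSince($versions, $policyType, $externalId, $since);

    // No version, or an empty snapshot: return null so a fallback provider can take over.
    if ($version === null || $version['snapshot'] === []) {
        return null;
    }

    return [
        // Stand-in for SettingsNormalizer → DriftHasher.
        'hash'     => hash('sha256', json_encode($version['snapshot'], JSON_THROW_ON_ERROR)),
        'fidelity' => 'content',
        'source'   => 'policy_version:' . $version['id'],
    ];
}

$versions = [
    ['id' => 1, 'policy_type' => 'settingsCatalogPolicy', 'external_id' => 'guid-1', 'captured_at' => '2025-01-10T00:00:00Z', 'snapshot' => ['psk' => 'old']],
    ['id' => 2, 'policy_type' => 'settingsCatalogPolicy', 'external_id' => 'guid-1', 'captured_at' => '2025-03-05T00:00:00Z', 'snapshot' => ['psk' => 'new']],
    ['id' => 3, 'policy_type' => 'settingsCatalogPolicy', 'external_id' => 'guid-2', 'captured_at' => '2025-03-06T00:00:00Z', 'snapshot' => []],
];

$fresh = resolveContentHash($versions, 'settingsCatalogPolicy', 'guid-1', new DateTimeImmutable('2025-02-01T00:00:00Z'));
$stale = resolveContentHash($versions, 'settingsCatalogPolicy', 'guid-1', new DateTimeImmutable('2025-04-01T00:00:00Z'));
$empty = resolveContentHash($versions, 'settingsCatalogPolicy', 'guid-2', new DateTimeImmutable('2025-02-01T00:00:00Z'));
```

Returning null for both the stale and the empty-snapshot cases keeps the decision to fall back in one place, the chain.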

Step 3: MetaFallbackProvider (existing logic)

// Wraps InventoryMetaContract → DriftHasher → fidelity='meta'

Step 4: ContentProviderChain

// Iterates [PolicyVersionContentProvider, MetaFallbackProvider]
// Returns first non-null result
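
A compact sketch of the "first non-null result wins" behavior. To stay self-contained it defines its own simplified interface (without the tenant ID and $since parameters) and an array-backed provider; every class and interface name here is hypothetical and is not the proposed production code.

```php
<?php
interface HashProviderSketch
{
    /** @return array{hash: string, fidelity: string, source: string}|null */
    public function resolve(string $policyType, string $externalId): ?array;
}

// Test double standing in for both PolicyVersionContentProvider and MetaFallbackProvider.
final class ArrayBackedProvider implements HashProviderSketch
{
    /** @param array<string, string> $hashesByKey keyed by "policyType|externalId" */
    public function __construct(
        private array $hashesByKey,
        private string $fidelity,
        private string $source,
    ) {}

    public function resolve(string $policyType, string $externalId): ?array
    {
        $hash = $this->hashesByKey["{$policyType}|{$externalId}"] ?? null;

        return $hash === null
            ? null
            : ['hash' => $hash, 'fidelity' => $this->fidelity, 'source' => $this->source];
    }
}

final class ProviderChainSketch implements HashProviderSketch
{
    /** @param list<HashProviderSketch> $providers ordered by precedence */
    public function __construct(private array $providers) {}

    public function resolve(string $policyType, string $externalId): ?array
    {
        foreach ($this->providers as $provider) {
            $result = $provider->resolve($policyType, $externalId);
            if ($result !== null) {
                return $result; // first provider with data wins
            }
        }

        return null;
    }
}

$chain = new ProviderChainSketch([
    new ArrayBackedProvider(['settingsCatalogPolicy|guid-1' => 'content-hash-1'], 'content', 'policy_version:17'),
    new ArrayBackedProvider(
        ['settingsCatalogPolicy|guid-1' => 'meta-hash-1', 'mobileApp|guid-2' => 'meta-hash-2'],
        'meta',
        'inventory_meta_v1'
    ),
]);

$withVersion    = $chain->resolve('settingsCatalogPolicy', 'guid-1');
$withoutVersion = $chain->resolve('mobileApp', 'guid-2');
$unknown        = $chain->resolve('mobileApp', 'missing');
```

Making the chain implement the same interface as its members means the compare job only ever depends on one type, whether it is handed a single provider or the full chain.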

Step 5: Integration in CompareBaselineToTenantJob

  • loadCurrentInventory() accepts optional ContentProviderChain
  • For each item: try chain, record fidelity + source
  • computeDrift() unchanged (still hash vs hash comparison)
  • Finding evidence includes fidelity and content_hash_source

Step 6: CaptureBaselineSnapshotJob enhancement

  • Optional: during capture, also try PolicyVersionContentProvider to store content-fidelity baseline_hash
  • Store content_hash_source in baseline_snapshot_items.meta_jsonb
  • This means: if a backup was taken before baseline capture, the baseline itself is content-fidelity

Step 7: Coverage extension

  • Add content_coverage to compare run context: which types had PolicyVersions, which fell back to meta
  • Display in operation detail UI

Migration

-- Optional: add column for source tracking
ALTER TABLE baseline_snapshot_items
    ADD COLUMN content_hash_source VARCHAR(255) NULL DEFAULT 'inventory_meta_v1';

Phase v2.0 — On-Demand Content Capture (Future)

Goal: For types without recent PolicyVersions, perform targeted per-item GET during baseline capture/compare.

Estimated effort: 5-8 days

  • Introduce BaselineContentCaptureJob that, for a given baseline profile's scope, identifies items lacking recent PolicyVersions and performs targeted GET + PolicyVersion creation.
  • Reuses existing PolicyCaptureOrchestrator with a new "baseline-triggered" context.
  • Adds capture_mode to baseline profile: meta_only (v1), opportunistic (v1.5), full_content (v2.0).
  • Rate limiting: per-tenant throttle to avoid Graph API quota issues.
  • Budget guard: max N items per capture run, with continuation support.
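
The budget guard with continuation could work along these lines. This is a sketch only: the function name, the offset-based cursor, and the assumption that pending IDs arrive in a stable order are all hypothetical design choices, not part of the spec.

```php
<?php
/**
 * Select the next slice of items to capture, bounded by a per-run budget.
 *
 * @param list<string> $pendingExternalIds items lacking a recent PolicyVersion, in a stable order
 * @param int $offset where the previous run stopped (0 for the first run)
 * @param int $maxItems per-run budget; assumed to be greater than zero
 * @return array{batch: list<string>, next_offset: int|null} next_offset is null when nothing is left
 */
function nextCaptureBatch(array $pendingExternalIds, int $offset, int $maxItems): array
{
    $batch = array_slice($pendingExternalIds, $offset, $maxItems);
    $nextOffset = $offset + count($batch);

    return [
        'batch'       => $batch,
        'next_offset' => $nextOffset < count($pendingExternalIds) ? $nextOffset : null,
    ];
}

$pending = ['p1', 'p2', 'p3', 'p4', 'p5'];

$run1 = nextCaptureBatch($pending, 0, 2);
$run2 = nextCaptureBatch($pending, $run1['next_offset'], 2);
$run3 = nextCaptureBatch($pending, $run2['next_offset'], 2);
```

A job using this would store `next_offset` in its run context and re-dispatch itself until the value is null, with the per-tenant throttle applied between runs.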

Phase v2.5 — Inventory Content Enrichment (Future, Optional)

Goal: Optionally have inventory sync capture settings content inline during LIST (where type supports $expand).

  • Some types may support $expand=settings on LIST (settings catalog, endpoint security); this is not yet validated against Graph (see OQ-5).
  • This would give "free" content fidelity without per-item GET.
  • High complexity: varies per type, may increase LIST payload size significantly.
  • Evaluate ROI after v2.0 ships.

8. Test Plan (Enterprise)

Unit Tests

| # | Test File | Scope | Key Assertions |
| --- | --- | --- | --- |
| U1 | tests/Unit/Baselines/ContentProviderChainTest.php | Provider chain resolution | First provider wins; null fallback; fidelity recorded correctly |
| U2 | tests/Unit/Baselines/PolicyVersionContentProviderTest.php | PolicyVersion lookup + normalization | Correct hash for known snapshot; returns null when no PV; respects $since cutoff |
| U3 | tests/Unit/Baselines/MetaFallbackProviderTest.php | Meta contract fallback | Produces fidelity='meta'; matches existing InventoryMetaContract behavior exactly |
| U4 | tests/Unit/Baselines/InventoryMetaContractTest.php | (Existing) contract stability | Null handling, ordering, versioning; extend for edge cases |

Feature Tests

| # | Test File | Scope | Key Assertions |
| --- | --- | --- | --- |
| F1 | tests/Feature/Baselines/BaselineCompareContentFidelityTest.php | End-to-end compare with PolicyVersions available | Settings change → different_version finding with fidelity='content' |
| F2 | tests/Feature/Baselines/BaselineCompareMixedFidelityTest.php | Some types have PV, some don't | Mixed fidelity values in findings; coverage context records both |
| F3 | tests/Feature/Baselines/BaselineCompareFallbackTest.php | No PolicyVersions available | Falls back to meta fidelity; identical behavior to v1 |
| F4 | tests/Feature/Baselines/BaselineCaptureFidelityTest.php | Capture with PolicyVersions present | baseline_hash uses content fidelity; content_hash_source recorded |
| F5 | tests/Feature/Baselines/BaselineCompareStaleVersionTest.php | PolicyVersion older than snapshot | Falls back to meta (stale PV not used) |
| F6 | tests/Feature/Baselines/BaselineCompareCoverageGuardContentTest.php | Coverage reporting for content types | content_coverage in run context shows which types are content-covered |

Existing Tests to Preserve

| # | Test File | Impact |
| --- | --- | --- |
| E1 | tests/Feature/Baselines/BaselineCompareFindingsTest.php | Must still pass; meta fidelity is default when no PV exists |
| E2 | tests/Feature/Baselines/BaselineComparePreconditionsTest.php | No change expected |
| E3 | tests/Feature/Baselines/BaselineCompareStatsTest.php | Stats remain grouped by scope_key; may need fidelity breakdown |
| E4 | tests/Feature/Baselines/BaselineOperabilityAutoCloseTest.php | Auto-close unaffected by fidelity source |

Integration / Regression

| # | Test | Scope |
| --- | --- | --- |
| I1 | Content hash stability across serialization | JSON encode/decode round-trip does not change hash |
| I2 | PolicyVersion normalizer alignment | Same snapshot → SettingsNormalizer produces same hash in both System A (via provider) and System B (via DriftFindingGenerator) |
| I3 | Hash collision protection | Different settings → different hashes (property-based test with sample data) |
| I4 | Empty snapshot edge case | PolicyVersion with empty/null snapshot → provider returns null → fallback works |

Performance Tests

| # | Test | Acceptance Criteria |
| --- | --- | --- |
| P1 | Compare job with 500 items, 50% with PolicyVersions | Completes in < 30s (DB-only, no Graph calls) |
| P2 | Provider chain query efficiency | PolicyVersion lookup uses batch query, not N+1 |
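
For P2, one way to avoid N+1 lookups is to load all candidate PolicyVersion rows for the compare scope in a single query and index them in memory. The sketch below covers only the indexing half; the function name and row shape are hypothetical, and it assumes captured_at values are UTC ISO-8601 strings in one consistent format so that string comparison matches chronological order.

```php
<?php
/**
 * Reduce a batch of version rows to the newest row per (policy_type, external_id).
 *
 * @param list<array{id: int, policy_type: string, external_id: string, captured_at: string}> $rows
 * @return array<string, array{id: int, policy_type: string, external_id: string, captured_at: string}>
 *         keyed by "policyType|externalId"
 */
function indexLatestVersions(array $rows): array
{
    $index = [];

    foreach ($rows as $row) {
        $key = $row['policy_type'] . '|' . $row['external_id'];

        // Assumption: same-format UTC ISO-8601 timestamps sort lexicographically.
        if (!isset($index[$key]) || strcmp($row['captured_at'], $index[$key]['captured_at']) > 0) {
            $index[$key] = $row;
        }
    }

    return $index;
}

$index = indexLatestVersions([
    ['id' => 1, 'policy_type' => 'settingsCatalogPolicy', 'external_id' => 'guid-1', 'captured_at' => '2025-01-10T00:00:00Z'],
    ['id' => 2, 'policy_type' => 'settingsCatalogPolicy', 'external_id' => 'guid-1', 'captured_at' => '2025-03-05T00:00:00Z'],
    ['id' => 3, 'policy_type' => 'deviceCompliancePolicy', 'external_id' => 'guid-9', 'captured_at' => '2025-02-01T00:00:00Z'],
]);
```

Each per-item resolve call then becomes an array lookup, so the number of queries stays constant regardless of how many items are compared.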

9. Open Questions / Assumptions

Open Questions

| # | Question | Impact | Proposed Resolution |
|---|----------|--------|---------------------|
| OQ-1 | Staleness threshold for PolicyVersions: How old can a PolicyVersion be before we reject it as a content source? | Determines false-negative risk | Default: PolicyVersion must be captured after the baseline snapshot's captured_at. Configurable per workspace. |
| OQ-2 | Mixed fidelity UX: How should the UI display findings with different fidelity levels? | User trust and understanding | Badge/icon on finding cards: "High confidence (content)" vs "Structural only (meta)". Filterable in findings table. |
| OQ-3 | Should baseline capture force a backup if no recent PolicyVersions exist? | API cost vs accuracy trade-off | No for v1.5 (opportunistic only). Yes for v2.0 as opt-in capture_mode: full_content. |
| OQ-4 | etag as change hint: Should we use etag changes as a trigger for on-demand PolicyVersion capture? | Could reduce unnecessary GETs | Worth investigating in v2.0. If etag changes during inventory sync, schedule targeted per-item GET for that policy only. |
| OQ-5 | Settings Catalog $expand=settings on LIST: Does Microsoft Graph support this? | Could give "free" content fidelity for settings catalog types | Needs validation against Graph API. If supported, would eliminate per-item GET for the most impactful type. |
| OQ-6 | Retention / pruning interaction: If old PolicyVersions are pruned, does that affect baseline compare? | Could lose content fidelity for old baselines | Baseline compare only needs versions captured after baseline snapshot. Pruning policy should respect active baseline snapshots. |
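
The default proposed for OQ-1 (only use PolicyVersions captured after the baseline snapshot) can be sketched as a filter over rows loaded in one batch, which also matches the intent of P2. `selectFreshVersions()` and the row shape are hypothetical; the real lookup would presumably be an Eloquent query against `policy_versions`.

```php
<?php

/**
 * From a batch-loaded list of PolicyVersion rows, keep the newest version
 * per policy that was captured strictly after the baseline snapshot.
 *
 * @param list<array{id:int, policy_id:int, captured_at:string}> $versions
 * @return array<int, array{id:int, policy_id:int, captured_at:string}> keyed by policy_id
 */
function selectFreshVersions(array $versions, string $baselineCapturedAt): array
{
    $cutoff = new DateTimeImmutable($baselineCapturedAt);
    $selected = [];

    foreach ($versions as $version) {
        $capturedAt = new DateTimeImmutable($version['captured_at']);

        // Stale: captured at or before the baseline, so the caller falls back to meta.
        if ($capturedAt <= $cutoff) {
            continue;
        }

        $current = $selected[$version['policy_id']] ?? null;

        if ($current === null || $capturedAt > new DateTimeImmutable($current['captured_at'])) {
            $selected[$version['policy_id']] = $version;
        }
    }

    return $selected;
}
```

Policies missing from the returned map have no usable content source, so the compare would keep meta fidelity for them.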

Assumptions

| # | Assumption | Risk if Wrong |
|---|------------|---------------|
| A-1 | DriftHasher::hashNormalized() is deterministic across PHP serialization boundaries | Hash mismatch → false drift findings. Validated: uses json_encode with stable flags + ksort. |
| A-2 | SettingsNormalizer / PolicyNormalizer produce the same output for the same input regardless of call context (System A vs System B) | Hash inconsistency between systems. Low risk: same code path. |
| A-3 | PolicyVersions from backups contain complete settings (not partial hydration) | Incomplete content → false negatives or incorrect hashes. Validated: PolicySnapshotService performs full hydration per type. |
| A-4 | The Finding model's fingerprint/recurrence_key identity allows mixed fidelity sources | Identity collision if fidelity changes source. Safe: recurrence_key includes snapshot_id, not hash value. |
| A-5 | Graph LIST endpoints do NOT return settings values for any supported policy type | If wrong, inventory sync could capture settings "for free". Validated: LIST returns only $select fields per graph_contracts.php. |
| A-6 | Per-type normalizers in backup drift path handle all 28 supported policy types | If not, some types would produce unstable hashes. Partially validated: PolicyNormalizer has a fallback for unknown types. |

10. Key Questions Answered

KQ-01: Are Baseline Compare and Backup Drift truly separate systems?

Yes. They share DriftHasher and the Finding model, but differ in:

  • Data source: InventoryItem vs PolicyVersion
  • Hash contract: InventoryMetaContract (7 fields, meta only) vs SettingsNormalizer → PolicyNormalizer (full snapshot)
  • Finding generator: CompareBaselineToTenantJob::computeDrift() vs DriftFindingGenerator::generate()
  • Finding identity: different recurrence key structures
  • Scope model: BaselineProfile-scoped vs selection_hash-scoped
  • Trigger: post-inventory-sync vs post-backup
  • Coverage: InventoryCoverage guard vs none (trusts backup completeness)

KQ-02: Should they be unified or remain separate?

Hybrid approach (Provider Chain) — as designed in Spec 116 v2. Keep separate triggering and scoping, but let System A consume data produced by System B (PolicyVersions) via a provider chain. This avoids:

  • Merging two fundamentally different scoping models
  • Introducing new Graph API costs
  • Disrupting existing backup drift workflows

KQ-03: What is the minimal viable "v1.5" to bridge the gap?

Add a PolicyVersionContentProvider that checks for recent PolicyVersions as part of baseline compare's hash computation. For types where a PolicyVersion exists (i.e., a backup was taken), the compare immediately gains content-fidelity. For types without, meta-fidelity continues as before. Net code change: ~200-300 lines (interface + 2 providers + chain + integration).
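
A minimal sketch of that shape (interface, two providers, chain) follows. Only the name `PolicyVersionContentProvider` comes from this report; `HashResult`, the `ContentProvider` interface, its `resolve()` signature, and the constructor arguments are assumptions made for illustration, with pre-loaded arrays standing in for database lookups.

```php
<?php

/** Hypothetical value object: a hash plus where it came from. */
final class HashResult
{
    public function __construct(
        public readonly string $hash,
        public readonly string $fidelity, // 'content' or 'meta'
        public readonly string $source,   // e.g. 'policy_version:42'
    ) {}
}

interface ContentProvider
{
    /** Return null when this provider has nothing for the item. */
    public function resolve(string $policyType, string $externalId): ?HashResult;
}

final class PolicyVersionContentProvider implements ContentProvider
{
    /** @param array<string, array{id:int, hash:string}> $versionsByExternalId loaded in one batch */
    public function __construct(private array $versionsByExternalId) {}

    public function resolve(string $policyType, string $externalId): ?HashResult
    {
        $version = $this->versionsByExternalId[$externalId] ?? null;

        return $version === null
            ? null
            : new HashResult($version['hash'], 'content', 'policy_version:' . $version['id']);
    }
}

final class MetaFallbackProvider implements ContentProvider
{
    /** @param array<string, string> $metaHashes meta-contract hash per external id */
    public function __construct(private array $metaHashes) {}

    public function resolve(string $policyType, string $externalId): ?HashResult
    {
        $hash = $this->metaHashes[$externalId] ?? null;

        return $hash === null ? null : new HashResult($hash, 'meta', 'inventory_meta_v1');
    }
}

final class ContentProviderChain implements ContentProvider
{
    /** @var array<ContentProvider> */
    private array $providers;

    public function __construct(ContentProvider ...$providers)
    {
        $this->providers = $providers;
    }

    /** First provider that returns a result wins; later ones are fallbacks. */
    public function resolve(string $policyType, string $externalId): ?HashResult
    {
        foreach ($this->providers as $provider) {
            $result = $provider->resolve($policyType, $externalId);

            if ($result !== null) {
                return $result;
            }
        }

        return null;
    }
}
```

Because each provider returns `null` rather than throwing, an empty or missing PolicyVersion (test I4) degrades to meta fidelity without special handling in the compare job.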

KQ-04: Which types benefit most from content-fidelity drift?

Top priority (complex settings, high change frequency):

  1. settingsCatalogPolicy — most common, deeply nested settings
  2. groupPolicyConfiguration — multi-level nesting (definitionValues → presentationValues)
  3. deviceCompliancePolicy — compliance rules + scheduled actions
  4. deviceConfiguration — broad category, many OData sub-types
  5. endpointSecurityPolicy — critical security settings
  6. securityBaselinePolicy — security-critical baselines
  7. conditionalAccessPolicy — identity security gate

Medium priority (simpler settings but still valuable):

  8. appProtectionPolicy, windowsUpdateRing, windowsFeatureUpdateProfile, windowsQualityUpdateProfile

KQ-05: How does coverage work and how should it extend for content fidelity?

Currently: InventoryCoverage::fromContext(latestSyncRun->context) → coveredTypes() returns the types with status=succeeded. For uncovered types, findings are suppressed and the run outcome is partially_succeeded.

For v1.5: Add content_coverage alongside meta_coverage:

  • content_covered_types: types where PolicyVersion exists post-baseline
  • meta_only_types: types where only meta is available
  • uncovered_types: types with no coverage at all (findings suppressed)
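
The three buckets above could be derived roughly as follows. `classifyCoverage()` is a hypothetical helper, and it assumes one design choice the report does not settle: a type only counts as content-covered if the inventory sync also covered it.

```php
<?php

/**
 * Split the types in a baseline's scope into the three coverage buckets.
 *
 * @param list<string> $scopeTypes             policy types in the baseline scope
 * @param list<string> $syncCoveredTypes       types the latest inventory sync covered
 * @param list<string> $typesWithFreshVersions types with a post-baseline PolicyVersion
 * @return array{content_covered_types: list<string>, meta_only_types: list<string>, uncovered_types: list<string>}
 */
function classifyCoverage(array $scopeTypes, array $syncCoveredTypes, array $typesWithFreshVersions): array
{
    $covered = array_values(array_intersect($scopeTypes, $syncCoveredTypes));
    $content = array_values(array_intersect($covered, $typesWithFreshVersions));

    return [
        'content_covered_types' => $content,
        'meta_only_types' => array_values(array_diff($covered, $content)),
        'uncovered_types' => array_values(array_diff($scopeTypes, $covered)),
    ];
}
```

Under this assumption a type with a fresh PolicyVersion but a failed sync still lands in `uncovered_types`, which keeps the existing suppression behavior unchanged.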

Finding evidence should include:

```json
{
  "fidelity": "content",
  "content_hash_source": "policy_version:42",
  "note": "Hash computed from PolicyVersion #42 captured 2025-07-14T10:30:00Z"
}
```

KQ-06: What is the long-term unified architecture?

Provider precedence chain with configurable capture modes:

```text
BaselineProfile.capture_mode:
  'meta_only'        → InventoryMetaContract only (v1)
  'opportunistic'    → PolicyVersion if available → meta fallback (v1.5)
  'full_content'     → On-demand GET for missing types → PolicyVersion → meta (v2.0)

ContentProviderChain:
  1. PolicyVersionContentProvider    (checks existing PolicyVersions)
  2. InventoryContentProvider        (future: if inventory sync enriched)
  3. MetaFallbackProvider            (InventoryMetaContract v1)
```

The long-term vision is that baseline capture + compare use the same normalizer pipeline as backup drift, producing identical hashes for identical content regardless of which system produced the PolicyVersion. This is achievable because DriftHasher and SettingsNormalizer are already shared code.
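
One way to express the capture modes listed above is a backed enum that maps each mode to a provider order. The enum name `CaptureMode`, its method, and the provider labels are hypothetical; only the three mode strings and their fallback order come from this report.

```php
<?php

enum CaptureMode: string
{
    case MetaOnly = 'meta_only';
    case Opportunistic = 'opportunistic';
    case FullContent = 'full_content';

    /**
     * Provider labels in precedence order for this mode.
     *
     * @return list<string>
     */
    public function providerOrder(): array
    {
        return match ($this) {
            self::MetaOnly => ['meta_fallback'],
            self::Opportunistic => ['policy_version', 'meta_fallback'],
            self::FullContent => ['on_demand_get', 'policy_version', 'meta_fallback'],
        };
    }

    /** Only full_content may trigger Graph calls during capture. */
    public function mayCallGraph(): bool
    {
        return $this === self::FullContent;
    }
}

// Resolving a stored column value; tryFrom() returns null for unknown strings.
$mode = CaptureMode::tryFrom('opportunistic') ?? CaptureMode::MetaOnly;
$order = $mode->providerOrder();
```

Defaulting unknown values to `MetaOnly` keeps v1 behavior for profiles created before the column exists, though that default is itself an assumption.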


Appendix: Database Schema Reference

baseline_snapshot_items (current)

```text
id                    BIGINT PK
baseline_snapshot_id  BIGINT FK → baseline_snapshots
subject_type          VARCHAR(255)    -- 'policy'
subject_external_id   VARCHAR(255)    -- Graph resource GUID
policy_type           VARCHAR(255)    -- e.g. 'settingsCatalogPolicy'
baseline_hash         VARCHAR(64)     -- sha256 of InventoryMetaContract
meta_jsonb            JSONB           -- {display_name, category, platform, meta_contract: {...}, fidelity, source}
created_at            TIMESTAMP
updated_at            TIMESTAMP
```

inventory_items (current)

```text
id                          BIGINT PK
tenant_id                   BIGINT FK → tenants
policy_type                 VARCHAR(255)
external_id                 VARCHAR(255)
display_name                VARCHAR(255)
category                    VARCHAR(255) NULL
platform                    VARCHAR(255) NULL
meta_jsonb                  JSONB         -- {odata_type, etag, scope_tag_ids, assignment_target_count}
last_seen_at                TIMESTAMP NULL
last_seen_operation_run_id  BIGINT NULL
created_at                  TIMESTAMP
updated_at                  TIMESTAMP
```

policy_versions (current)

```text
id                  BIGINT PK
tenant_id           BIGINT FK → tenants
policy_id           BIGINT FK → policies
version_number      INTEGER
policy_type         VARCHAR(255)
platform            VARCHAR(255) NULL
created_by          VARCHAR(255) NULL
captured_at         TIMESTAMP
snapshot            JSON          -- FULL Graph GET response (hydrated)
metadata            JSON          -- additional metadata
assignments         JSON NULL     -- full assignments array
scope_tags          JSON NULL     -- scope tag IDs
assignments_hash    VARCHAR(64) NULL
scope_tags_hash     VARCHAR(64) NULL
created_at          TIMESTAMP
updated_at          TIMESTAMP
deleted_at          TIMESTAMP NULL  -- soft delete
```

Proposed v1.5 Addition

```sql
ALTER TABLE baseline_snapshot_items
    ADD COLUMN content_hash_source VARCHAR(255) NULL DEFAULT 'inventory_meta_v1';
-- Values: 'inventory_meta_v1', 'policy_version:{id}', 'inventory_content_v2'
```
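
Since `content_hash_source` packs either a fixed label or a `policy_version:{id}` reference into one string, readers of the column need a small parser. The helper below is a hypothetical sketch of that parsing; it only accepts the three value shapes listed above.

```php
<?php

/**
 * Parse a content_hash_source value into its kind and optional PolicyVersion id.
 *
 * @return array{kind: string, policy_version_id: int|null}
 */
function parseContentHashSource(string $source): array
{
    if (preg_match('/^policy_version:(\d+)$/', $source, $matches) === 1) {
        return ['kind' => 'policy_version', 'policy_version_id' => (int) $matches[1]];
    }

    if (in_array($source, ['inventory_meta_v1', 'inventory_content_v2'], true)) {
        return ['kind' => $source, 'policy_version_id' => null];
    }

    throw new InvalidArgumentException("Unknown content_hash_source: {$source}");
}
```

Rejecting unknown values loudly is a deliberate choice in this sketch, so a malformed source string cannot be mistaken for meta fidelity.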