TenantAtlas/spec-candidates/381-baseline-matching-pipeline-canonicalization.md

# Spec Candidate 381 - Baseline Matching Pipeline & Canonicalization v1

## Candidate Status

Candidate for implementation after Spec 380.

This candidate changes baseline compare matching so TenantPilot resolves subjects through identity, canonicalization, and bindings before falling back to display-name matching.

## Depends On

- Spec 380 - Provider Resource Identity & Binding Foundation v1

## Spec Candidate Check

- **Problem**: Baseline compare currently loads baseline/current subjects mainly by normalized `policy_type|subject_key`, so built-ins, virtual assignment targets, foundation objects, duplicate names, and restored/test/copied resources can be misclassified.
- **Today's failure**: Operators get false ambiguity or false missing states, and display-name fallback can look more authoritative than it is.
- **User-visible improvement**: Compare results become more trustworthy because exact identity, canonical provider defaults, active bindings, and safe fingerprints are attempted before display names.
- **Smallest enterprise-capable version**: Add one matching pipeline seam, canonicalizer registry, foundation coverage registry, active-binding lookup, fake-provider contract tests, and bounded Microsoft/Intune adapter behavior behind the provider seam.
- **Explicit non-goals**: No manual resolution UI, no evidence/review readiness remapping, no restore integration, no customer-facing copy changes, no broad historical migration, and no generic provider framework beyond the concrete canonicalization need.
- **Permanent complexity imported**: Matching pipeline service, canonicalizer registry, coverage registry, typed matching result, provider descriptor use, adapter contract tests, and baseline compare integration tests.
- **Why now**: Spec 380 would create durable binding truth, but compare remains unsafe until the matching order actually consumes it before display-name fallback.
- **Why not local**: Local patches in `IntuneCompareStrategy` would keep provider-specific labels in core and would not provide a reusable identity/canonicalization path for evidence and review follow-up.
- **Approval class**: Core Enterprise.
- **Red flags triggered**: New meta-infrastructure, foundation/canonical terminology, and multi-step pipeline. The defense is that matching order changes operator trust and customer-readiness blockers directly; the v1 is bounded to existing compare flows and one fake-provider contract.
- **Score**: Value: 2 | Urgency: 2 | Scope: 1 | Complexity: 1 | Product proximity: 2 | Reusability: 2 | **Total: 10/12**
- **Decision**: approve after Spec 380, with Microsoft behavior kept behind adapter seams.

## Proportionality Review

1. **Current operator problem**: Operators cannot trust whether compare blockers reflect real tenant-owned duplicates or expected provider defaults.
2. **Why existing structure is insufficient**: Existing compare code keys by `policy_type|subject_key` and current reason codes do not express canonical provider defaults or active bindings first.
3. **Narrowest correct implementation**: Insert one matching pipeline before existing compare strategy behavior; preserve legacy strategies where possible.
4. **Ownership cost**: Baseline compare owners maintain pipeline ordering, registry entries, adapter contracts, and fallback semantics.
5. **Rejected alternative**: Hardcoding Microsoft labels in core was rejected because it deepens provider coupling and still leaves display-name-like truth in shared code.
6. **Current-release truth or future prep**: Current-release trust issue; fake provider tests prove the seam without broad multi-provider productization.

## Problem

Baseline compare currently loads baseline/current subjects mainly by normalized `policy_type|subject_key`. This causes false ambiguity and false missing states when:

- Microsoft/default/provider built-ins exist,
- virtual assignment targets appear as labels,
- foundation objects are not policy-backed,
- tenant-owned resources have duplicate names,
- restored/test/copied resources share display names.

The matching process needs a provider-agnostic pipeline.
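
To illustrate the failure mode, here is a minimal sketch of how keying by a normalized `policy_type|subject_key` collapses distinct resources. It assumes the subject key is derived from a normalized display name. The helper names, the `settingsCatalogPolicy` type string, and the object IDs are hypothetical and are not taken from the existing code.

```python
from collections import defaultdict


def legacy_key(policy_type, display_name):
    # Assumption: subject_key is a trimmed, lowercased display name.
    return f"{policy_type}|{display_name.strip().lower()}"


def group_by_legacy_key(resources):
    # Group provider object IDs under the legacy compare key.
    groups = defaultdict(list)
    for resource in resources:
        key = legacy_key(resource["policy_type"], resource["display_name"])
        groups[key].append(resource["provider_object_id"])
    return dict(groups)


# Two distinct tenant-owned resources (for example an original and a restored copy).
current = [
    {"policy_type": "settingsCatalogPolicy", "display_name": "Windows Baseline", "provider_object_id": "obj-1"},
    {"policy_type": "settingsCatalogPolicy", "display_name": "windows baseline ", "provider_object_id": "obj-2"},
]

groups = group_by_legacy_key(current)
ambiguous = {key: ids for key, ids in groups.items() if len(ids) > 1}
print(ambiguous)
```

Both resources land under one key even though their provider object IDs differ, so a name-keyed compare cannot tell which one the baseline subject refers to.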

## Goal

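Introduce a subject matching pipeline that evaluates each baseline subject in the following order, where steps 8 and 9 are terminal classifications rather than match strategies: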

1. active binding,
2. canonical built-in/virtual target recognition,
3. provider object identity,
4. stable external identity,
5. safe fingerprint,
6. unique descriptor match,
7. display-name fallback,
8. unresolved ambiguity,
9. missing/unsupported/limitation classification.

## Scope

### In Scope

- Add `SubjectMatchingPipeline`.
- Add `BuiltInCanonicalizerRegistry`.
- Add `FoundationCoverageRegistry`.
- Integrate active binding lookup from `provider_resource_bindings`.
- Integrate provider resource descriptors from inventory/policy versions.
- Add provider-adapter seam for canonicalization.
- Update baseline compare flow to call the matching pipeline before existing compare strategy.
- Preserve compatibility with existing compare strategies.
- Add fake-provider contract tests.
- Add Microsoft/Intune adapter implementation only behind provider adapter seam, not in core.

### Out of Scope

- Full UI for manual resolution.
- Evidence/review readiness remapping.
- Generic workflow engine.
- Full restore integration.
- Broad historical migration of previous compare results.
- Customer-facing output changes.

## Matching Priority

The matching pipeline must evaluate in this order:

```text
1. Existing active binding
2. Provider built-in / virtual canonical key
3. Exact provider object identity
4. Stable provider-specific external identity
5. Unique fingerprint / payload identity where safe
6. Unique provider resource descriptor match
7. Unique normalized display-name fallback
8. Unresolved ambiguity
9. Missing resource/evidence/unsupported coverage
```

Display-name fallback must be explicitly marked as fallback and must never silently produce a high-trust identity when a stronger identity signal is available.
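
The ordering rule can be sketched as a chain of resolvers in which the first non-empty answer wins. This is a minimal sketch under stated assumptions, not the `SubjectMatchingPipeline` design: only three of the nine steps are shown, and the `Match` type, the resolver functions, and the dictionary fields (`bound_current_id`, `provider_object_id`, `display_name`) are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    step: str          # pipeline step that produced the match
    current_id: str    # ID of the matched current-side resource
    is_fallback: bool  # True only for display-name fallback


def by_active_binding(subject, candidates):
    bound_id = subject.get("bound_current_id")
    for candidate in candidates:
        if bound_id is not None and candidate["id"] == bound_id:
            return Match("active_binding", candidate["id"], False)
    return None


def by_provider_object_id(subject, candidates):
    wanted = subject.get("provider_object_id")
    hits = [c for c in candidates if c["id"] == wanted]
    return Match("provider_object_identity", hits[0]["id"], False) if len(hits) == 1 else None


def by_unique_display_name(subject, candidates):
    name = subject["display_name"].strip().lower()
    hits = [c for c in candidates if c["display_name"].strip().lower() == name]
    # Only a unique name match counts, and it is always flagged as fallback.
    return Match("display_name_fallback", hits[0]["id"], True) if len(hits) == 1 else None


# Order matters: stronger identity signals run before the name fallback.
PIPELINE = (by_active_binding, by_provider_object_id, by_unique_display_name)


def resolve(subject, candidates) -> Optional[Match]:
    for resolver in PIPELINE:
        match = resolver(subject, candidates)
        if match is not None:
            return match
    return None  # caller classifies this as unresolved or missing
```

With two candidates that share a display name, a subject that carries a binding or a provider object ID still resolves, while a subject with only a name returns `None` instead of guessing.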

## Built-In Canonicalization

Core baseline logic must not hardcode provider names or Microsoft labels.

Provider adapters may register canonicalizers.

Example Microsoft/Intune canonicalization behind adapter seam:

```text
All users
All devices
Default role scope tag
Known provider-default assignment targets
Known provider-default foundation resources
```

These must resolve by provider discriminator/type/canonical key, not display name.
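
One way to picture the seam is a registry that core code queries by provider discriminator while adapters supply the recognition logic. The sketch below is an assumption-laden illustration: the method names, the `(provider, resource_type, canonical_key)` tuple shape, and the fake provider's `target_kind` field are all hypothetical and do not describe the planned `BuiltInCanonicalizerRegistry` API.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A canonicalizer inspects a provider descriptor and returns
# (resource_type, canonical_key), or None if it does not recognize it.
Canonicalizer = Callable[[dict], Optional[Tuple[str, str]]]


class BuiltInCanonicalizerRegistry:
    def __init__(self) -> None:
        self._canonicalizers: Dict[str, List[Canonicalizer]] = {}

    def register(self, provider: str, canonicalizer: Canonicalizer) -> None:
        # Adapters register here; core never names a provider itself.
        self._canonicalizers.setdefault(provider, []).append(canonicalizer)

    def canonicalize(self, provider: str, descriptor: dict) -> Optional[Tuple[str, str, str]]:
        for canonicalizer in self._canonicalizers.get(provider, []):
            hit = canonicalizer(descriptor)
            if hit is not None:
                resource_type, canonical_key = hit
                return (provider, resource_type, canonical_key)
        return None


def fake_all_principals(descriptor: dict) -> Optional[Tuple[str, str]]:
    # Fake-provider rule: recognition uses a structural field, never the label.
    if descriptor.get("target_kind") == "all_principals":
        return ("assignment_target", "all-principals")
    return None


registry = BuiltInCanonicalizerRegistry()
registry.register("fake", fake_all_principals)
```

A descriptor whose label was renamed still canonicalizes as long as its structural field matches, and a tenant-owned object that merely carries a built-in-looking label does not.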

## Foundation Coverage Registry

The registry must classify resource classes as:

```text
fully_comparable
inventory_only
canonical_only
unsupported
excluded_by_profile
requires_manual_binding
```

Foundation objects must not be forced into policy-backed comparison.

Examples:

```text
roleScopeTag default            -> canonical built-in/default if provider identifies it
roleScopeTag tenant-owned       -> foundation resource by provider object ID
assignmentFilter tenant-owned   -> foundation inventory/comparable depending on capability
notificationMessageTemplate     -> foundation/config object depending on capability
```
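
The classification values above map naturally onto an enumeration plus a lookup. The following sketch assumes that unregistered resource classes default to `unsupported` and that only `fully_comparable` classes are routed to policy-backed comparison. Both rules, and the method names, are assumptions made for illustration and are not requirements stated by this candidate.

```python
from enum import Enum
from typing import Dict


class Coverage(Enum):
    FULLY_COMPARABLE = "fully_comparable"
    INVENTORY_ONLY = "inventory_only"
    CANONICAL_ONLY = "canonical_only"
    UNSUPPORTED = "unsupported"
    EXCLUDED_BY_PROFILE = "excluded_by_profile"
    REQUIRES_MANUAL_BINDING = "requires_manual_binding"


class FoundationCoverageRegistry:
    def __init__(self) -> None:
        self._coverage: Dict[str, Coverage] = {}

    def register(self, resource_class: str, coverage: Coverage) -> None:
        self._coverage[resource_class] = coverage

    def classify(self, resource_class: str) -> Coverage:
        # Assumption: anything not registered is treated as unsupported.
        return self._coverage.get(resource_class, Coverage.UNSUPPORTED)

    def allows_policy_compare(self, resource_class: str) -> bool:
        # Assumption: only fully comparable classes reach policy-backed compare.
        return self.classify(resource_class) is Coverage.FULLY_COMPARABLE


coverage = FoundationCoverageRegistry()
coverage.register("assignmentFilter", Coverage.INVENTORY_ONLY)
coverage.register("roleScopeTag", Coverage.CANONICAL_ONLY)
```

Checking `allows_policy_compare` before invoking a compare strategy is one way to keep foundation objects out of policy-backed comparison.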

## Integration Points

Expected areas to inspect/modify:

- `BaselineCompareService`
- `CompareBaselineToTenantJob`
- `SubjectResolver`
- `ResolutionOutcome`
- `IntuneCompareStrategy`
- `CompareStrategyRegistry`
- `InventoryPolicyTypeMeta`
- `BaselineSupportCapabilityGuard`
- `GovernanceSubjectTaxonomyRegistry`
- provider gateway / provider adapter seams
- Graph contract registry integration where applicable

## Result Contract

The matching pipeline should return a typed result, for example:

```text
resolved_exact_identity
resolved_active_binding
resolved_canonical_builtin
resolved_canonical_virtual_target
resolved_unique_fallback
unresolved_ambiguous_match
missing_provider_resource
missing_local_evidence
unsupported_resource_class
foundation_inventory_only
excluded_non_governed
accepted_limitation
```

Spec 382 will formalize result semantics, but Spec 381 must produce enough structure for that follow-up.
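
As a rough illustration of what "enough structure" could look like, the outcomes listed above can be carried in a small immutable value object. The field names and the `is_resolved` / `is_fallback` helpers are hypothetical. Only the outcome strings come from this candidate, and Spec 382 may define the semantics differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchOutcome(Enum):
    RESOLVED_EXACT_IDENTITY = "resolved_exact_identity"
    RESOLVED_ACTIVE_BINDING = "resolved_active_binding"
    RESOLVED_CANONICAL_BUILTIN = "resolved_canonical_builtin"
    RESOLVED_CANONICAL_VIRTUAL_TARGET = "resolved_canonical_virtual_target"
    RESOLVED_UNIQUE_FALLBACK = "resolved_unique_fallback"
    UNRESOLVED_AMBIGUOUS_MATCH = "unresolved_ambiguous_match"
    MISSING_PROVIDER_RESOURCE = "missing_provider_resource"
    MISSING_LOCAL_EVIDENCE = "missing_local_evidence"
    UNSUPPORTED_RESOURCE_CLASS = "unsupported_resource_class"
    FOUNDATION_INVENTORY_ONLY = "foundation_inventory_only"
    EXCLUDED_NON_GOVERNED = "excluded_non_governed"
    ACCEPTED_LIMITATION = "accepted_limitation"


@dataclass(frozen=True)
class MatchResult:
    outcome: MatchOutcome
    baseline_subject_key: str
    current_resource_id: Optional[str] = None  # set only when resolved

    @property
    def is_resolved(self) -> bool:
        return self.outcome.value.startswith("resolved_")

    @property
    def is_fallback(self) -> bool:
        # Lets callers label display-name matches as lower trust.
        return self.outcome is MatchOutcome.RESOLVED_UNIQUE_FALLBACK


result = MatchResult(MatchOutcome.RESOLVED_UNIQUE_FALLBACK, "subject-1", "obj-1")
```

Keeping the fallback flag derivable from the outcome means a consumer cannot receive a display-name match without also being able to see that it is one.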

## Acceptance Criteria

- Baseline compare uses matching pipeline before display-name fallback.
- Built-ins/virtual targets can be resolved by provider canonicalizer.
- Tenant-owned duplicate names remain unresolved unless an active binding exists.
- Foundation inventory-only resources no longer produce false policy-backed matching attempts.
- Existing compare strategies still receive matched baseline/current resources where possible.
- No core class hardcodes Microsoft display names.
- Fake provider can register canonical built-ins and resolve them.
- YPTW2-style cases are representable:
  - `All users` / `All devices` canonicalized,
  - `default roleScopeTag` canonicalized or foundation-classified,
  - tenant-owned duplicate Settings Catalog policies remain ambiguous until binding,
  - assignment filters and notification templates are classified by capability.

## Required Tests

- Built-in canonical object resolves without ambiguity.
- Virtual assignment target resolves without display-name matching.
- Tenant-owned duplicate display names remain unresolved.
- Active manual binding resolves duplicate candidate.
- Display-name fallback is only used after identity/canonical/binding attempts fail.
- Foundation inventory-only object returns inventory-only limitation, not `foundation_not_policy_backed`.
- Unsupported resource class returns unsupported result.
- Fake provider canonicalization contract test.
- Microsoft/Intune adapter does not leak display-name logic into core.

## Validation Commands

```bash
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php tests/Feature/Baselines/BaselineCompareGapClassificationTest.php
```

```bash
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/SubjectResolverTest.php
```

Add new tests for the matching pipeline and canonicalizer registry.

## Risks

- Accidentally making display-name fallback look authoritative.
- Hiding real duplicate tenant resources through over-aggressive canonicalization.
- Hardcoding Microsoft-specific behavior into core.
- Breaking existing compare strategy expectations.

## Recommendation

Implement this second, after Spec 380.

This candidate fixes the core matching failure mode while still avoiding UI and evidence/review changes.