TenantAtlas/spec-candidates/381-baseline-matching-pipeline-canonicalization.md
# Spec Candidate 381 - Baseline Matching Pipeline & Canonicalization v1
## Candidate Status
Candidate for implementation after Spec 380.
This candidate changes baseline compare matching so TenantPilot resolves subjects through identity, canonicalization, and bindings before falling back to display-name matching.
## Depends On
- Spec 380 - Provider Resource Identity & Binding Foundation v1
## Spec Candidate Check
- **Problem**: Baseline compare currently loads baseline/current subjects mainly by normalized `policy_type|subject_key`, so built-ins, virtual assignment targets, foundation objects, duplicate names, and restored/test/copied resources can be misclassified.
- **Today's failure**: Operators get false ambiguity or false missing states, and display-name fallback can look more authoritative than it is.
- **User-visible improvement**: Compare results become more trustworthy because exact identity, canonical provider defaults, active bindings, and safe fingerprints are attempted before display names.
- **Smallest enterprise-capable version**: Add one matching pipeline seam, canonicalizer registry, foundation coverage registry, active-binding lookup, fake-provider contract tests, and bounded Microsoft/Intune adapter behavior behind the provider seam.
- **Explicit non-goals**: No manual resolution UI, no evidence/review readiness remapping, no restore integration, no customer-facing copy changes, no broad historical migration, and no generic provider framework beyond the concrete canonicalization need.
- **Permanent complexity imported**: Matching pipeline service, canonicalizer registry, coverage registry, typed matching result, provider descriptor use, adapter contract tests, and baseline compare integration tests.
- **Why now**: Spec 380 would create durable binding truth, but compare remains unsafe until the matching order actually consumes it before display-name fallback.
- **Why not local**: Local patches in `IntuneCompareStrategy` would keep provider-specific labels in core and would not provide a reusable identity/canonicalization path for evidence and review follow-up.
- **Approval class**: Core Enterprise.
- **Red flags triggered**: New meta-infrastructure, foundation/canonical terminology, and multi-step pipeline. The defense is that matching order changes operator trust and customer-readiness blockers directly; the v1 is bounded to existing compare flows and one fake-provider contract.
- **Score**: Benefit: 2 | Urgency: 2 | Scope: 1 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | **Total: 10/12**
- **Decision**: approve after Spec 380, with Microsoft behavior kept behind adapter seams.
## Proportionality Review
1. **Current operator problem**: Operators cannot trust whether compare blockers reflect real tenant-owned duplicates or expected provider defaults.
2. **Why existing structure is insufficient**: Existing compare code keys by `policy_type|subject_key` and current reason codes do not express canonical provider defaults or active bindings first.
3. **Narrowest correct implementation**: Insert one matching pipeline before existing compare strategy behavior; preserve legacy strategies where possible.
4. **Ownership cost**: Baseline compare owners maintain pipeline ordering, registry entries, adapter contracts, and fallback semantics.
5. **Rejected alternative**: Hardcoding Microsoft labels in core was rejected because it deepens provider coupling and still leaves display-name-like truth in shared code.
6. **Current-release truth or future prep**: Current-release trust issue; fake provider tests prove the seam without broad multi-provider productization.
## Problem
Baseline compare currently loads baseline/current subjects mainly by normalized `policy_type|subject_key`. This causes false ambiguity and false missing states when:
- Microsoft/default/provider built-ins exist,
- virtual assignment targets appear as labels,
- foundation objects are not policy-backed,
- tenant-owned resources have duplicate names,
- restored/test/copied resources share display names.
The matching process needs a provider-agnostic pipeline.
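The collision behind the duplicate-name cases can be illustrated with a minimal sketch. It assumes a simple normalization (trim, collapse whitespace, lowercase) and a hypothetical `settingsCatalogPolicy` type name; the real `subject_key` derivation is not described here and may differ.

```python
from collections import defaultdict


def subject_key(display_name: str) -> str:
    """Hypothetical normalization: trim, collapse whitespace, lowercase."""
    return " ".join(display_name.split()).lower()


def index_by_legacy_key(resources):
    """Group resource IDs under a `policy_type|subject_key` style key."""
    index = defaultdict(list)
    for res in resources:
        key = f"{res['policy_type']}|{subject_key(res['display_name'])}"
        index[key].append(res["id"])
    return dict(index)


# Two distinct tenant-owned resources whose names normalize to the same key.
current = [
    {"id": "a1", "policy_type": "settingsCatalogPolicy", "display_name": "Windows Baseline"},
    {"id": "b2", "policy_type": "settingsCatalogPolicy", "display_name": "windows  baseline "},
]

# Any key with more than one ID cannot be matched by name alone.
collisions = {k: v for k, v in index_by_legacy_key(current).items() if len(v) > 1}
```

Both resources land under one key, so a name-keyed lookup has no basis for choosing between them. That is the ambiguity a binding or provider identity is meant to settle.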
## Goal
Introduce a subject matching pipeline that resolves baseline subjects by trying the following in order (steps 8 and 9 are terminal classifications, not match strategies):
1. active binding,
2. canonical built-in/virtual target recognition,
3. provider object identity,
4. stable external identity,
5. safe fingerprint,
6. unique descriptor match,
7. display-name fallback,
8. unresolved ambiguity,
9. missing/unsupported/limitation classification.
## Scope
### In Scope
- Add `SubjectMatchingPipeline`.
- Add `BuiltInCanonicalizerRegistry`.
- Add `FoundationCoverageRegistry`.
- Integrate active binding lookup from `provider_resource_bindings`.
- Integrate provider resource descriptors from inventory/policy versions.
- Add provider-adapter seam for canonicalization.
- Update baseline compare flow to call the matching pipeline before existing compare strategy.
- Preserve compatibility with existing compare strategies.
- Add fake-provider contract tests.
- Add Microsoft/Intune adapter implementation only behind provider adapter seam, not in core.
### Out of Scope
- Full UI for manual resolution.
- Evidence/review readiness remapping.
- Generic workflow engine.
- Full restore integration.
- Broad historical migration of previous compare results.
- Customer-facing output changes.
## Matching Priority
The matching pipeline must evaluate in this order:
```text
1. Existing active binding
2. Provider built-in / virtual canonical key
3. Exact provider object identity
4. Stable provider-specific external identity
5. Unique fingerprint / payload identity where safe
6. Unique provider resource descriptor match
7. Unique normalized display-name fallback
8. Unresolved ambiguity
9. Missing resource/evidence/unsupported coverage
```
Display-name fallback must be explicitly marked as fallback and should never silently produce high-trust identity if stronger identity is available.
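One way to picture the ordering rule is a first-match-wins loop over ordered steps, where only the display-name step is flagged as fallback. The sketch below is an illustration under assumptions, not the planned implementation: it covers three of the nine steps, and every name in it (`Match`, `STEPS`, the subject fields) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    resource_id: str
    step: str
    is_fallback: bool = False


def unique(candidates) -> Optional[str]:
    """Return the candidate's ID only when exactly one candidate exists."""
    return candidates[0]["id"] if len(candidates) == 1 else None


def by_active_binding(subject, current):
    return unique([r for r in current if r["id"] == subject.get("bound_resource_id")])


def by_object_identity(subject, current):
    return unique([r for r in current if r["id"] == subject.get("provider_object_id")])


def by_display_name(subject, current):
    wanted = subject["name"].lower()
    return unique([r for r in current if r["name"].lower() == wanted])


# Order matters: stronger identity first, display name last and flagged.
STEPS = [
    ("active_binding", by_active_binding, False),
    ("provider_object_identity", by_object_identity, False),
    ("display_name", by_display_name, True),
]


def match_subject(subject, current) -> Optional[Match]:
    for name, step, is_fallback in STEPS:
        resource_id = step(subject, current)
        if resource_id is not None:
            return Match(resource_id, name, is_fallback)
    return None  # caller classifies as ambiguous or missing


current = [{"id": "x1", "name": "Policy A"}, {"id": "x2", "name": "Policy A"}]
unbound = match_subject({"name": "Policy A"}, current)
bound = match_subject({"name": "Policy A", "bound_resource_id": "x2"}, current)
```

With two resources sharing a name, the unbound subject stays unmatched, while the bound subject resolves through the binding step and is not marked as fallback.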
## Built-In Canonicalization
Core baseline logic must not hardcode provider names or Microsoft labels.
Provider adapters may register canonicalizers.
Example Microsoft/Intune canonicalization behind adapter seam:
```text
All users
All devices
Default role scope tag
Known provider-default assignment targets
Known provider-default foundation resources
```
These must resolve by provider discriminator/type/canonical key, not display name.
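A minimal sketch of the adapter seam, assuming a registry keyed by provider discriminator that holds adapter-supplied callables. Only the class name `BuiltInCanonicalizerRegistry` comes from this spec; the `register` and `canonical_key` methods, the `fake` provider, the `target_type` field, and the key format are hypothetical.

```python
from typing import Callable, Optional

# A canonicalizer inspects a provider resource and returns a canonical key,
# or None when the resource is not a provider built-in or virtual target.
Canonicalizer = Callable[[dict], Optional[str]]


class BuiltInCanonicalizerRegistry:
    def __init__(self) -> None:
        self._by_provider: dict[str, list[Canonicalizer]] = {}

    def register(self, provider: str, canonicalizer: Canonicalizer) -> None:
        self._by_provider.setdefault(provider, []).append(canonicalizer)

    def canonical_key(self, provider: str, resource: dict) -> Optional[str]:
        # Core code only iterates over registered callables; it never
        # inspects provider labels itself.
        for canonicalizer in self._by_provider.get(provider, []):
            key = canonicalizer(resource)
            if key is not None:
                return key
        return None


def fake_all_users(resource: dict) -> Optional[str]:
    """Adapter-side rule: decide by type discriminator, never by display name."""
    if resource.get("target_type") == "fake.allUsersTarget":
        return "fake:assignment_target:all_users"
    return None


registry = BuiltInCanonicalizerRegistry()
registry.register("fake", fake_all_users)

renamed = {"target_type": "fake.allUsersTarget", "display_name": "Everyone"}
lookalike = {"target_type": "fake.groupTarget", "display_name": "All users"}
```

In this sketch the renamed target still canonicalizes, and a tenant-owned group that merely carries the label `All users` does not.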
## Foundation Coverage Registry
The registry must classify resource classes as:
```text
fully_comparable
inventory_only
canonical_only
unsupported
excluded_by_profile
requires_manual_binding
```
Foundation objects must not be forced into policy-backed comparison.
Examples:
```text
roleScopeTag default -> canonical built-in/default if provider identifies it
roleScopeTag tenant-owned -> foundation resource by provider object ID
assignmentFilter tenant-owned -> foundation inventory/comparable depending on capability
notificationMessageTemplate -> foundation/config object depending on capability
```
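The classification could be carried by an enum plus a small lookup, sketched below. The six values are taken from the list above; the method names and the choice to treat unregistered resource classes as `unsupported` are assumptions for illustration.

```python
from enum import Enum


class Coverage(str, Enum):
    FULLY_COMPARABLE = "fully_comparable"
    INVENTORY_ONLY = "inventory_only"
    CANONICAL_ONLY = "canonical_only"
    UNSUPPORTED = "unsupported"
    EXCLUDED_BY_PROFILE = "excluded_by_profile"
    REQUIRES_MANUAL_BINDING = "requires_manual_binding"


class FoundationCoverageRegistry:
    def __init__(self) -> None:
        self._coverage: dict[str, Coverage] = {}

    def register(self, resource_class: str, coverage: Coverage) -> None:
        self._coverage[resource_class] = coverage

    def classify(self, resource_class: str) -> Coverage:
        # Assumption: unknown classes are treated as unsupported rather
        # than silently compared.
        return self._coverage.get(resource_class, Coverage.UNSUPPORTED)

    def allows_policy_backed_compare(self, resource_class: str) -> bool:
        return self.classify(resource_class) is Coverage.FULLY_COMPARABLE


coverage = FoundationCoverageRegistry()
coverage.register("assignmentFilter", Coverage.INVENTORY_ONLY)
coverage.register("settingsCatalogPolicy", Coverage.FULLY_COMPARABLE)
```

A compare flow could consult `allows_policy_backed_compare` before attempting a policy-backed match, so that inventory-only foundation objects are never pushed down that path.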
## Integration Points
Expected areas to inspect/modify:
- `BaselineCompareService`
- `CompareBaselineToTenantJob`
- `SubjectResolver`
- `ResolutionOutcome`
- `IntuneCompareStrategy`
- `CompareStrategyRegistry`
- `InventoryPolicyTypeMeta`
- `BaselineSupportCapabilityGuard`
- `GovernanceSubjectTaxonomyRegistry`
- provider gateway / provider adapter seams
- Graph contract registry integration where applicable
## Result Contract
The matching pipeline should return a typed result, for example:
```text
resolved_exact_identity
resolved_active_binding
resolved_canonical_builtin
resolved_canonical_virtual_target
resolved_unique_fallback
unresolved_ambiguous_match
missing_provider_resource
missing_local_evidence
unsupported_resource_class
foundation_inventory_only
excluded_non_governed
accepted_limitation
```
Spec 382 will formalize result semantics, but Spec 381 must produce enough structure for that follow-up.
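As a rough sketch of what "enough structure" might mean, the outcomes above can be modeled as an enum wrapped in an immutable result object. The outcome values come from the list; the `MatchingResult` fields and the rule that resolved outcomes must carry a resource ID are assumptions, and Spec 382 may define the semantics differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchOutcome(str, Enum):
    RESOLVED_EXACT_IDENTITY = "resolved_exact_identity"
    RESOLVED_ACTIVE_BINDING = "resolved_active_binding"
    RESOLVED_CANONICAL_BUILTIN = "resolved_canonical_builtin"
    RESOLVED_CANONICAL_VIRTUAL_TARGET = "resolved_canonical_virtual_target"
    RESOLVED_UNIQUE_FALLBACK = "resolved_unique_fallback"
    UNRESOLVED_AMBIGUOUS_MATCH = "unresolved_ambiguous_match"
    MISSING_PROVIDER_RESOURCE = "missing_provider_resource"
    MISSING_LOCAL_EVIDENCE = "missing_local_evidence"
    UNSUPPORTED_RESOURCE_CLASS = "unsupported_resource_class"
    FOUNDATION_INVENTORY_ONLY = "foundation_inventory_only"
    EXCLUDED_NON_GOVERNED = "excluded_non_governed"
    ACCEPTED_LIMITATION = "accepted_limitation"


# Outcomes that hand a matched resource on to the compare strategy.
RESOLVED = frozenset(o for o in MatchOutcome if o.value.startswith("resolved_"))


@dataclass(frozen=True)
class MatchingResult:
    outcome: MatchOutcome
    resource_id: Optional[str] = None
    candidate_ids: tuple = ()  # populated for ambiguous matches

    def __post_init__(self) -> None:
        if self.is_resolved and self.resource_id is None:
            raise ValueError("resolved outcomes need a resource_id")

    @property
    def is_resolved(self) -> bool:
        return self.outcome in RESOLVED

    @property
    def is_fallback(self) -> bool:
        return self.outcome is MatchOutcome.RESOLVED_UNIQUE_FALLBACK


ambiguous = MatchingResult(
    MatchOutcome.UNRESOLVED_AMBIGUOUS_MATCH, candidate_ids=("x1", "x2")
)
bound = MatchingResult(MatchOutcome.RESOLVED_ACTIVE_BINDING, resource_id="x2")
```

Keeping `is_fallback` on the result lets downstream consumers tell a display-name match from an identity match without re-deriving how the match was made.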
## Acceptance Criteria
- Baseline compare uses the matching pipeline before display-name fallback.
- Built-ins/virtual targets can be resolved by a provider canonicalizer.
- Tenant-owned duplicate names remain unresolved unless an active binding exists.
- Foundation inventory-only resources no longer produce false policy-backed matching attempts.
- Existing compare strategies still receive matched baseline/current resources where possible.
- No core class hardcodes Microsoft display names.
- Fake provider can register canonical built-ins and resolve them.
- YPTW2-style cases are representable:
  - `All users` / `All devices` canonicalized,
  - `default roleScopeTag` canonicalized or foundation-classified,
  - tenant-owned duplicate Settings Catalog policies remain ambiguous until binding,
  - assignment filters and notification templates are classified by capability.
## Required Tests
- Built-in canonical object resolves without ambiguity.
- Virtual assignment target resolves without display-name matching.
- Tenant-owned duplicate display names remain unresolved.
- Active manual binding resolves duplicate candidate.
- Display-name fallback is only used after identity/canonical/binding attempts fail.
- Foundation inventory-only object returns inventory-only limitation, not `foundation_not_policy_backed`.
- Unsupported resource class returns unsupported result.
- Fake provider canonicalization contract test.
- Microsoft/Intune adapter does not leak display-name logic into core.
## Validation Commands
```bash
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php tests/Feature/Baselines/BaselineCompareGapClassificationTest.php
```
```bash
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/SubjectResolverTest.php
```
Add new tests for the matching pipeline and canonicalizer registry.
## Risks
- Accidentally making display-name fallback look authoritative.
- Hiding real duplicate tenant resources through over-aggressive canonicalization.
- Hardcoding Microsoft-specific behavior into core.
- Breaking existing compare strategy expectations.
## Recommendation
Implement this second, directly after Spec 380.
This candidate fixes the core matching failure mode while still avoiding UI and evidence/review changes.