ahmido ea77c8c718 feat(baselines): implement baseline compare result semantics (#454)

Implemented deterministic Baseline Result Semantics (Spec 383), introducing CompareSubjectResult and CompareEvidenceResult. Replaced generic arrays with strict Data Transfer Objects for Baseline engine output.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #454

2026-06-16 20:20:27 +00:00

Feature Specification: Spec 383 - Baseline Compare Result Semantics and Gap Classification v1

Feature Branch: 383-baseline-result-semantics
Created: 2026-06-16
Status: Draft / Ready for implementation preparation review
Input: User-provided draft candidate "Spec 383 - Baseline Compare Result Semantics & Gap Classification v1" from /Users/ahmeddarrazi/.codex/attachments/11205670-04a2-4b8a-abde-a7e89adf9b79/pasted-text.txt.

Repo-Truth Adjustment

The user supplied a complete numbered draft for Spec 383. Repo truth confirms that specs/381-provider-resource-identity-binding/ and specs/382-baseline-matching-canonicalization/ are implemented and closed out. Spec 382 added SubjectMatchingPipeline, MatchingOutcome, BaselineSubjectDescriptor, and FoundationCoverageResolver, but current runtime truth still exposes overloaded or legacy-flavored result reasons such as ambiguous_match, policy_record_missing, foundation_not_policy_backed, missing_current, unsupported_subject, unsupported_subjects, and strategy_failed.

This prepared Spec 383 narrows the draft to the smallest implementation-ready slice:

V1 replaces overloaded baseline compare result and gap semantics with provider-neutral subject outcome dimensions.
V1 maps Spec 382 MatchingOutcome and existing compare strategy output into explicit identity, comparison, coverage, actionability, readiness, trust, reason, and category semantics.
V1 updates existing baseline compare OperationRun proof and gap payloads so downstream surfaces can distinguish blockers, limitations, unsupported coverage, missing provider resource, missing local evidence, drift, no drift, and exclusions.
V1 updates existing baseline compare detail/status presentation only as needed to render the new grouped truth from existing surfaces.
V1 does not add resolution UI, operator binding/exclusion screens, final Evidence Snapshot readiness mapping, final Review Pack publication readiness mapping, customer-facing Review Pack wording, report/PDF runtime work, or a generic workflow engine.
V1 does not preserve legacy compare result compatibility. TenantPilot is pre-production, so old local/dev compare records and old tests may be reset or rewritten to the new truth model.

Candidate Selection Gate

Selected candidate: Spec 383 - Baseline Compare Result Semantics and Gap Classification v1.
Source: Direct user-provided candidate attachment, plus follow-up references in specs/381-provider-resource-identity-binding/spec.md, specs/381-provider-resource-identity-binding/implementation-close-out.md, and specs/382-baseline-matching-canonicalization/spec.md.
Why selected: Specs 381 and 382 are completed, and current code now has stable matching inputs but still maps those inputs into ambiguous legacy result semantics. Spec 383 is the next dependency before resolution UI and evidence/review readiness can safely consume compare output.
Roadmap relationship: Supports provider-neutral baseline identity, governance truth, OperationRun proof quality, and customer-safe evidence/review foundations without reopening completed UI/productization or report lanes.
Close alternatives deferred:
- Spec 384 - Baseline Subject Resolution UI & Operator Decisions v1: depends on 383 result/actionability semantics.
- Spec 385 - Evidence & Review Readiness Integration v1: depends on 383 readiness-impact and category semantics.
- Management Report/PDF runtime validation: unrelated current working-tree/report lane and explicitly out of scope.
- Broad baseline compare UI redesign: not required for V1; this spec only updates existing status/grouping presentation where the new result truth reaches current surfaces.
Completed-spec guardrail result:
- specs/381-provider-resource-identity-binding/ is completed and validated. It is dependency context only.
- specs/382-baseline-matching-canonicalization/ is completed and validated. It is upstream runtime context only.
- specs/163-baseline-subject-resolution/, specs/336-baseline-compare-product-process-flow-alignment/, specs/347-review-pack-output-contract-readiness-semantics/, specs/350-operator-resolution-guidance-framework-v1/, and specs/380-management-report-pdf-staging-runtime-validation/ are historical or adjacent context only and must not be rewritten by this spec.
- No existing specs/383-* package or 383-* local/remote branch was found before the Spec Kit create script ran.
Smallest viable implementation slice: Replace the current overloaded compare reason strings and run/gap payload aggregation with a narrow provider-neutral outcome classifier and mapped structured payloads over existing compare and matching code.
Gate result: PASS. The candidate is user-provided, not already specced or completed, directly follows completed Specs 381 and 382, and can be scoped as a bounded runtime/result-semantics slice.

Spec Candidate Check (mandatory - SPEC-GATE-001)

Problem: Baseline compare can now resolve provider-resource identity through Spec 382, but the resulting gaps and run summaries still use overloaded, policy-shaped, or legacy reason meanings.
Today's failure: Operators and downstream evidence/review code cannot reliably tell whether a result is true drift, unresolved identity, missing provider resource, missing local evidence, unsupported coverage, accepted limitation, excluded non-governed subject, or a technical compare failure.
User-visible improvement: Operators see honest compare outcomes: blockers require action, limitations are visible but not false failures, unsupported scope is explicit, low-trust identity cannot become no drift, and provider/default/foundation cases do not appear as missing policy records.
Smallest enterprise-capable version: Add a provider-neutral result semantics family and classifier, map Spec 382 matching outcomes plus compare strategy states into it, replace old gap reason strings as authoritative truth, write structured OperationRun gap/proof payloads, and update focused tests and existing display grouping.
Explicit non-goals: No resolution UI, no manual bind/exclude/accept-limitation screens, no Evidence Snapshot final readiness mapping, no Review Pack publication final mapping, no customer-facing Review Pack copy, no Management Report/PDF work, no generic workflow engine, no legacy compare result mapper, no historical OperationRun context reader, and no new persisted entity unless the spec/plan are updated first.
Permanent complexity imported: A narrow result semantics model, reason/category/actionability/readiness/trust values, an outcome classifier/mapper, updated OperationRun payload shape, focused unit/feature tests, and existing-surface label/grouping updates. No new table, route, panel, or broad UI framework is approved.
Why now: Spec 382 deliberately left final result semantics to Spec 383. Leaving legacy reason truth in place would make Specs 384 and 385 build on ambiguous signals.
Why not local: Patching individual labels such as policy_record_missing or foundation_not_policy_backed would preserve ambiguity across MatchingOutcome, SubjectResolver, CompareState, strategy diagnostics, OperationRun context, and evidence/review consumers. The current workflow needs one canonical semantic mapping.
Approval class: Core Enterprise.
Red flags triggered: New status/reason taxonomy, classifier layer, OperationRun payload semantics, and existing status presentation changes. Defense: the scope replaces overloaded truth instead of adding a parallel UI taxonomy, stays inside baseline compare result semantics, adds no persistence, and defers UI decisions and evidence/review readiness to explicit follow-ups.
Score: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 2 | Reuse: 2 | Total: 11/12
Decision: approve as a narrowed Core Enterprise runtime/result-semantics slice.

Problem Statement

Baseline compare currently has multiple result and gap labels that compress incompatible meanings into one string. For example:

ambiguous_match can mean unresolved duplicate identity, low-trust identity, or compare-key collision.
policy_record_missing and missing_current can mean missing provider resource, missing local evidence, or stale collection state.
foundation_not_policy_backed can mean inventory-only coverage, identity-only coverage, canonical-only coverage, unsupported class, or accepted limitation.
unsupported_subjects and strategy_failed can represent expected scope limitations, implementation gaps, or technical failures.

After Spec 382, compare has better upstream matching truth, but the downstream semantics still hide whether the operator needs to refresh provider data, create a binding, accept a limitation, exclude a subject, investigate a technical failure, or trust a no-drift result.

TenantPilot needs provider-neutral result semantics that keep drift, identity, coverage, actionability, readiness, and trust separate.
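The overloading described in this section can be made concrete with a small lookup. The sketch below is illustrative only: the reason strings and their possible meanings are copied from the list above, while the names LEGACY_REASON_MEANINGS and is_overloaded are hypothetical and Python is used purely as a neutral notation, not as the project's implementation language.

```python
# Legacy reason strings mapped to the distinct meanings this section lists for them.
# The strings and meanings come from the spec text; the dict and function names
# are hypothetical and exist only for this illustration.
LEGACY_REASON_MEANINGS: dict[str, frozenset[str]] = {
    "ambiguous_match": frozenset({
        "unresolved duplicate identity",
        "low-trust identity",
        "compare-key collision",
    }),
    "policy_record_missing": frozenset({
        "missing provider resource",
        "missing local evidence",
        "stale collection state",
    }),
    "missing_current": frozenset({
        "missing provider resource",
        "missing local evidence",
        "stale collection state",
    }),
    "foundation_not_policy_backed": frozenset({
        "inventory-only coverage",
        "identity-only coverage",
        "canonical-only coverage",
        "unsupported class",
        "accepted limitation",
    }),
    "unsupported_subjects": frozenset({
        "expected scope limitation",
        "implementation gap",
        "technical failure",
    }),
    "strategy_failed": frozenset({
        "expected scope limitation",
        "implementation gap",
        "technical failure",
    }),
}


def is_overloaded(reason: str) -> bool:
    """A reason is overloaded when it cannot be decoded to exactly one meaning."""
    return len(LEGACY_REASON_MEANINGS[reason]) > 1


if __name__ == "__main__":
    for reason, meanings in LEGACY_REASON_MEANINGS.items():
        print(f"{reason}: {len(meanings)} possible meanings")
```

Because every entry maps to more than one meaning, a consumer that only sees the string cannot recover which follow-up applies, which is the gap the separate dimensions are meant to close.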

Business / Product Value

Prevents false green outcomes by ensuring unresolved, low-trust, unsupported, missing, excluded, or accepted-limitation subjects cannot count as clean no drift.
Prevents false red outcomes by making limitations, unsupported resource classes, and inventory-only foundations explicit instead of treating them as missing policies.
Gives OperationRun proof payloads enough structure for reliable support diagnosis and future evidence/review readiness mapping.
Creates clean inputs for Spec 384 resolution UI and Spec 385 evidence/review readiness without shipping those workflows now.
Keeps platform-core compare language provider-neutral as TenantPilot moves beyond Microsoft-shaped labels.

Primary Users / Operators

MSP or tenant operator reviewing baseline compare outcomes.
Workspace manager assessing whether baseline drift posture is trustworthy.
Support/platform operator diagnosing why compare output is blocked, partial, limited, or failed.
Release reviewer validating that compare semantics stay provider-neutral, pre-production lean, and test-proven.

Spec Scope Fields (mandatory)

Scope: tenant-owned baseline compare runtime and existing compare status/detail presentation within established workspace and managed-environment boundaries.
Primary Routes: No new route or navigation entry. Existing baseline compare landing/detail/operation surfaces and evidence-gap display paths may render updated status groups/labels from the new backend semantics.
Data Ownership: Existing OperationRun context/result payloads remain tenant-owned operational proof. Existing baseline snapshots, snapshot items, inventory items, policy versions, findings, evidence snapshots, review packs, and provider resource bindings keep their current ownership. No new persisted entity is approved.
RBAC: Existing baseline compare authorization remains. Reads and rendered results must stay workspace and managed-environment scoped. Non-members are denied as not found. Entitled members missing compare/view capability receive forbidden where existing policies apply.

For canonical-view specs:

Default filter behavior when tenant-context is active: Not applicable. No canonical-view route is added.
Explicit entitlement checks preventing cross-tenant leakage: Existing OperationRun, baseline, evidence, and review surfaces must continue resolving records through scoped workspace/managed-environment access before exposing result semantics.

UI Surface Impact (mandatory - UI-COV-001)

Does this spec add, remove, rename, or materially change any reachable UI surface?

No UI surface impact
Existing page changed
New page/route added
Navigation changed
Filament panel/provider surface changed
New modal/drawer/wizard/action added
New table/form/state added
Customer-facing surface changed
Dangerous action changed
Status/evidence/review presentation changed
Workspace/environment context presentation changed

UI/Productization Coverage

Route/page/surface: Existing baseline compare result/evidence-gap presentation in the admin panel, including BaselineCompareLanding, BaselineCompareMatrix, BaselineCompareEvidenceGapTable, and OperationRun detail contexts that render baseline compare evidence gaps.
Current or new page archetype: Existing baseline compare/domain status surfaces; no new archetype.
Design depth: Domain Pattern Surface. This spec changes result truth and grouping, not layout strategy.
Repo-truth level: repo-verified.
Existing pattern reused: Existing baseline compare and OperationRun detail surfaces; existing page report context from docs/ui-ux-enterprise-audit/page-reports/ui-015-baseline-compare.md.
New pattern required: none. Use existing status badge/detail grouping patterns and central badge/status helpers where applicable.
Screenshot required: no for V1 unless implementation materially changes layout, navigation, action hierarchy, or customer-facing output.
Page audit required: no for V1 unless implementation changes reachable layout or introduces a new visible group beyond existing baseline compare/status sections.
Customer-safe review required: no. Customer-facing readiness and Review Pack copy belong to Spec 385.
Dangerous-action review required: no. This spec adds no destructive or high-impact action.
Coverage files updated or explicitly not needed:
- docs/ui-ux-enterprise-audit/route-inventory.md
- docs/ui-ux-enterprise-audit/design-coverage-matrix.md
- docs/ui-ux-enterprise-audit/page-reports/...
- docs/ui-ux-enterprise-audit/strategic-surfaces.md
- docs/ui-ux-enterprise-audit/grouped-follow-up-candidates.md
- docs/ui-ux-enterprise-audit/unresolved-pages.md
- N/A - existing surface, no new route/page/archetype
No-impact rationale when applicable: Not applicable. Existing status presentation may change; route/layout coverage updates are not required unless implementation changes surface structure rather than labels/groups fed by existing components.

Cross-Cutting / Shared Pattern Reuse

Cross-cutting feature?: yes.
Interaction class(es): status messaging, evidence-gap detail, OperationRun proof context, badge/status labels where existing surfaces render reason state.
Systems touched: MatchingOutcome, SubjectMatchingPipeline, FoundationCoverageResolver, CompareState, CompareSubjectResult, IntuneCompareStrategy, CompareBaselineToTenantJob, BaselineCompareReasonCode, BaselineCompareEvidenceGapDetails, OperationRun summary counts/context, baseline compare tests, evidence/review regression tests.
Existing pattern(s) to extend: Existing compare strategy output, OperationRun lifecycle/service ownership, existing evidence-gap table/detail rendering, existing badge/status helper paths where used.
Shared contract / presenter / builder / renderer to reuse: Reuse OperationRunService for status/outcome transitions, OperationSummaryKeys for summary counts, existing badge/status helpers for rendered labels, and existing compare/evidence gap rendering surfaces.
Why the existing shared path is sufficient or insufficient: Existing paths are sufficient for lifecycle, rendering, and DB-only monitoring. They are insufficient for result truth because reason strings are overloaded and not mapped from Spec 382 matching outcomes into explicit dimensions.
Allowed deviation and why: A narrow baseline compare outcome classifier/mapper is allowed because it replaces overloaded domain truth. It must not become a generic workflow engine, customer output engine, or parallel badge taxonomy.
Consistency impact: Run context, gap subjects, rendered evidence-gap labels, support diagnostics, and tests must use the same reason/category/actionability/readiness semantics.
Review focus: no old reason strings as authoritative values, no low-trust no-drift, no policy-only terms at platform-core boundaries, no local OperationRun lifecycle transitions, no customer-facing readiness scope.

OperationRun UX Impact

Touches OperationRun start/completion/link UX?: no.
Shared OperationRun UX contract/layer reused: Existing baseline compare OperationRun start/completion/link UX remains unchanged.
Delegated start/completion UX behaviors: N/A - no queued toast, Open operation, browser event, queued DB notification, or run-link behavior changes.
Local surface-owned behavior that remains: N/A.
Queued DB-notification policy: N/A - no new queued DB notifications.
Terminal notification path: Existing OperationRun lifecycle only.
Exception required?: none.

Spec 383 changes OperationRun proof/context/result semantics for existing baseline compare runs. It must not transition OperationRun.status or OperationRun.outcome outside OperationRunService, and any new summary count keys must go through OperationSummaryKeys::all().
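The summary-key rule above amounts to an allowlist check before counts are written. The following is a minimal sketch of that idea under stated assumptions: OperationSummaryKeys::all() is referenced by this spec, but its actual key list is not shown here, so ALLOWED_SUMMARY_KEYS below is a placeholder set and validate_summary_counts is a hypothetical helper written in Python for illustration.

```python
# Placeholder allowlist. The real keys would come from OperationSummaryKeys::all();
# these four names are assumptions made only so the sketch can run.
ALLOWED_SUMMARY_KEYS = frozenset({"total", "processed", "succeeded", "failed"})


def validate_summary_counts(counts: dict) -> dict:
    """Reject summary counts that use unregistered keys or invalid values."""
    unknown = sorted(set(counts) - ALLOWED_SUMMARY_KEYS)
    if unknown:
        raise ValueError(f"unregistered summary keys: {unknown}")
    for key, value in counts.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"summary count for {key!r} must be a non-negative integer")
    return dict(counts)
```

A check of this kind fails fast when a new count key is introduced without first being registered in the shared key list.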

Provider Boundary / Platform Core Check

Shared provider/platform boundary touched?: yes.
Boundary classification: platform-core for result dimensions, categories, readiness impact, actionability, and trust semantics; provider-owned only for provider-specific metadata that feeds matching or compare strategies.
Seams affected: baseline compare result semantics, matching outcome mapping, compare strategy diagnostics, OperationRun proof payloads, evidence-gap detail labels, provider-neutral operator vocabulary.
Neutral platform terms preserved or introduced: provider resource, governed subject, identity, binding, canonicalization, comparison, coverage, limitation, drift, evidence, actionability, readiness impact, trust level.
Provider-specific semantics retained and why: provider key, provider resource type, provider resource ID, and provider-owned metadata remain low-level identity/proof fields. Microsoft/Intune terms must not be top-level result semantics.
Why this does not deepen provider coupling accidentally: V1 replaces policy-only reason codes with provider-neutral semantics and keeps provider details inside subject/proof metadata.
Follow-up path: Spec 384 may add operator decision UI. Spec 385 may map these semantics into evidence/review readiness and customer-safe output.
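One way to picture the boundary described above is a subject outcome whose top-level fields are provider-neutral, with provider identifiers nested under a proof object. This is a sketch under assumptions, not the actual payload contract: the class names, field names, and the example provider key are hypothetical, and Python is used only as notation.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProviderProof:
    """Low-level identity/proof fields that may stay provider-specific."""
    provider_key: str
    provider_resource_type: str
    provider_resource_id: str


@dataclass(frozen=True)
class SubjectOutcome:
    """Provider-neutral result dimensions at the top level (field names are assumptions)."""
    subject_label: str
    reason: str
    category: str
    actionability: str
    readiness_impact: str
    trust: str
    proof: ProviderProof


def to_payload(outcome: SubjectOutcome) -> dict:
    """Serialize to a plain dict; asdict() converts the nested proof as well."""
    return asdict(outcome)


if __name__ == "__main__":
    outcome = SubjectOutcome(
        subject_label="Example subject",
        reason="no_drift",
        category="fully_verified",
        actionability="none",
        readiness_impact="no_impact",
        trust="high",
        proof=ProviderProof("example_provider", "example_type", "res-001"),
    )
    print(to_payload(outcome))
```

Keeping provider fields under a single nested key makes it straightforward to review that no provider-specific term has leaked into the top-level semantics.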

UI / Surface Guardrail Impact

Surface / Change	Operator-facing surface change?	Native vs Custom	Shared-Family Relevance	State Layers Touched	Exception Needed?	Low-Impact / N/A Note
Baseline compare evidence-gap/status presentation	yes	Existing native/shared surfaces	status messaging, evidence-gap detail, badge/status labels	page/detail data state	no	Existing surfaces only; no new route, navigation, action, modal, or layout pattern
OperationRun baseline compare proof context	yes, where rendered by existing run detail/support contexts	Existing OperationRun surfaces	OperationRun proof and diagnostics	detail data state	no	Start/link UX unchanged

Decision-First Surface Role

Surface	Decision Role	Human-in-the-loop Moment	Immediately Visible for First Decision	On-Demand Detail / Evidence	Why This Is Primary or Why Not	Workflow Alignment	Attention-load Reduction
Baseline compare evidence-gap/status presentation	Secondary Context Surface	Decide whether compare output is trusted, blocked, limited, or needs follow-up	result group, blocker/limitation/missing/evidence category, next action class	subject proof, matching proof, provider metadata, raw diagnostics	Secondary because it supports a compare/review decision but does not create a new workflow surface	Follows existing baseline compare and operation detail workflows	Reduces interpretation work by splitting real blockers from limitations and missing evidence

Audience-Aware Disclosure

Surface	Audience Modes In Scope	Decision-First Default-Visible Content	Operator Diagnostics	Support / Raw Evidence	One Dominant Next Action	Hidden / Gated By Default	Duplicate-Truth Prevention
Existing baseline compare status/detail surfaces	operator-MSP, support-platform	grouped status, readiness impact, actionability, subject label	matching status, reason category, source proof summary	structured matching proof and provider identifiers where existing support contexts allow	resolve blocker through later Spec 384 or refresh evidence where V1 says refresh is required	raw payloads and low-level proof remain diagnostics/support detail	one canonical reason/category/actionability set feeds all display paths

UI/UX Surface Classification

Surface	Action Surface Class	Surface Type	Likely Next Operator Action	Primary Inspect/Open Model	Row Click	Secondary Actions Placement	Destructive Actions Placement	Canonical Collection Route	Canonical Detail Route	Scope Signals	Canonical Noun	Critical Truth Visible by Default	Exception Type / Justification
Existing baseline compare result/gap surfaces	List / Detail / Diagnostics	Status/evidence detail	Inspect blocker, limitation, missing evidence, or drift	Existing page/run detail inspect path	unchanged	unchanged	N/A	existing baseline compare route	existing operation/baseline detail route	Workspace and managed environment from existing surfaces	Baseline compare result	category, actionability, readiness impact, trust	none

Operator Surface Contract

Surface	Primary Persona	Decision / Operator Action Supported	Surface Type	Primary Operator Question	Default-visible Information	Diagnostics-only Information	Status Dimensions Used	Mutation Scope	Primary Actions	Dangerous Actions
Existing baseline compare result/gap surfaces	Tenant operator / support operator	Decide whether compare output is trusted or blocked and what follow-up class applies	Secondary status/evidence detail	Can I trust this compare result, and what action is required?	result group, blocker/limitation/missing/drift/no-drift status, actionability, readiness impact	matching proof, raw provider IDs, diagnostics payloads	identity, comparison, coverage, actionability, readiness, trust	read-only status/proof only	Existing inspect/open actions	none

Proportionality Review (mandatory when structural complexity is introduced)

New source of truth?: yes, a canonical derived result-semantics truth for baseline compare subject outcomes. It replaces overloaded reason strings as authoritative runtime/result truth.
New persisted entity/table/artifact?: no new persisted entity/table/artifact. Structured payloads are stored in existing OperationRun/context or existing compare result paths only.
New abstraction?: yes, a narrow baseline compare outcome classifier/mapper and value families for dimensions/reasons/categories/actionability/readiness/trust.
New enum/state/reason family?: yes. Each value must change operator interpretation, run aggregation, future resolution routing, evidence/review readiness preparation, or testable behavior.
New cross-domain UI framework/taxonomy?: no. Rendered UI uses existing surfaces/helpers and maps directly from domain truth.
Current operator problem: Operators cannot distinguish real drift, missing evidence, missing provider resource, unsupported coverage, accepted limitation, excluded scope, unresolved identity, and compare failure from current overloaded strings.
Existing structure is insufficient because: Spec 382 upstream matching outputs are still collapsed into legacy gap reasons and broad run reason codes. Individual label patches would preserve ambiguity across job context, strategy results, support diagnostics, evidence/review consumers, and tests.
Narrowest correct implementation: One baseline compare semantics layer that maps existing matching and strategy outputs into structured payloads, replaces the overloaded reason values as authoritative truth, and updates focused tests.
Ownership cost: New values and mapping tests must be maintained. Reviewers must prevent the model from growing into a broad workflow, UI, report, or evidence readiness framework.
Alternative intentionally rejected: Keep old reasons and add display labels. This would make old overloaded states remain product truth and would not protect future Specs 384 and 385.
Release truth: Current-release truth. Spec 382 is already implemented and its matching outcomes require final compare semantics now.

Compatibility posture

TenantPilot is pre-production. V1 must not add legacy result-code aliases, historical compare payload readers, old OperationRun context mappers, dual old/new reason readers, or compatibility tests that preserve old reason meanings. Existing local/dev compare records may be invalidated, reset, or destructively migrated if needed. Old tests that encode legacy reason semantics must be rewritten to the new truth model.
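A test-side guard can help enforce this posture by scanning a payload for any of the legacy reason strings named in the Repo-Truth Adjustment section. The sketch below assumes the payload is a plain nested structure of dicts, lists, and scalars; the function name and path notation are hypothetical, and Python is used only for illustration.

```python
# Legacy reason strings listed in the Repo-Truth Adjustment section of this spec.
LEGACY_REASONS = frozenset({
    "ambiguous_match",
    "policy_record_missing",
    "foundation_not_policy_backed",
    "missing_current",
    "unsupported_subject",
    "unsupported_subjects",
    "strategy_failed",
})


def find_legacy_reasons(payload, path: str = "$") -> list[str]:
    """Return the paths where a legacy reason appears as a dict key or string value."""
    hits: list[str] = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in LEGACY_REASONS:
                hits.append(child)
            hits.extend(find_legacy_reasons(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_legacy_reasons(item, f"{path}[{index}]"))
    elif isinstance(payload, str) and payload in LEGACY_REASONS:
        hits.append(path)
    return hits
```

A feature test could assert that the returned list is empty for every baseline compare context it produces, which checks the absence of old values without adding any compatibility reader.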

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

Test purpose / classification: Unit for result value mapping and classifiers; Feature for baseline compare integration, OperationRun gap payloads, run summary calculation, existing UI/status rendering where affected, and evidence/review regressions.
Validation lane(s): fast-feedback and confidence. PostgreSQL lane only if implementation introduces migrations, JSONB index/query behavior, locks, or constraints. Browser lane only if rendered layout/navigation/action behavior changes beyond existing labels/groups.
Why this classification and these lanes are sufficient: The feature changes deterministic domain semantics and existing DB-backed compare results. No new route, workflow, or JavaScript interaction is planned.
New or expanded test families: focused unit tests for result semantics and feature tests in tests/Feature/Baselines, plus existing evidence/review regression tests. No broad heavy-governance or browser family by default.
Fixture / helper cost impact: reuse existing baseline compare fixtures and Spec 382 provider-resource identity fixtures. Do not widen global workspace/provider defaults.
Heavy-family visibility / justification: none by default.
Special surface test profile: standard-native-filament relief unless implementation changes actual rendered layout/action hierarchy.
Standard-native relief or required special coverage: ordinary feature and existing Filament/status rendering coverage only; browser smoke becomes required only if UI structure changes.
Reviewer handoff: verify no old reasons remain authoritative, all result values have behavioral consequences, no low-trust no-drift, no provider-specific top-level semantics, no legacy mapper, no direct OperationRun lifecycle transitions, and no Spec 384/385 scope.
Budget / baseline / trend impact: expected small unit/feature runtime increase. Escalate as follow-up-spec if implementation needs broad evidence/review or UI productization coverage.
Escalation needed: document-in-feature for contained surface label/grouping changes; follow-up-spec for resolution UI, evidence/review readiness, customer output, or broad UI redesign.
Active feature PR close-out entry: Baseline Compare Result Semantics / Gap Classification.
Planned validation commands:
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/CompareSemantics
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/Matching tests/Unit/Baselines/CompareStrategyRegistryTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareGapClassificationTest.php tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php tests/Feature/Baselines/BaselineCompareProviderResourceBindingCanonicalIdentityTest.php tests/Feature/Baselines/BaselineCompareExecutionGuardTest.php tests/Feature/Baselines/BaselineCompareResumeTokenTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php tests/Feature/Filament/BaselineCompareEvidenceGapTableTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Evidence/BaselineDriftPostureSourceTest.php tests/Feature/ReviewPack/Spec347ReviewPackReadinessSemanticsTest.php tests/Feature/ReviewPack/Spec349ReviewPackResolutionGuidanceTest.php
- cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent
- git diff --check

User Scenarios & Testing (mandatory)

User Story 1 - Classify compare subject outcomes explicitly (Priority: P1)

As a baseline governance operator, I need each subject outcome to state whether identity, comparison, coverage, actionability, readiness, and trust are healthy, blocked, missing, limited, unsupported, excluded, or failed, so I can understand why compare output is trustworthy or incomplete.

Why this priority: This is the core source-of-truth change and unlocks all other stories.

Independent Test: Unit tests cover every result reason and prove it maps to exactly one category, actionability, readiness impact, and trust rule.

Acceptance Scenarios:

Given a trusted comparable subject with equal payload, When result semantics are calculated, Then the subject is no_drift, fully_verified, no_impact, high trust, and no action required.
Given a trusted comparable subject with changed payload, When result semantics are calculated, Then the subject is drift_detected with trusted comparison state and no identity blocker.
Given unresolved identity, duplicate candidates, missing local evidence, missing provider resource, unsupported class, accepted limitation, or excluded non-governed scope, When result semantics are calculated, Then each produces a distinct reason/category/actionability/readiness combination.
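The "exactly one" rule in the independent test can be sketched as a single lookup table. Only no_drift, drift_detected, identity_required, fully_verified, and no_impact are names taken from this spec; every other value below is a hypothetical placeholder, and treating fully_verified as the category is an assumption. Python is used only as notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Semantics:
    category: str
    actionability: str
    readiness_impact: str
    trust: str


# One entry per reason, so each reason has exactly one combination.
# Most values are illustrative assumptions, not the approved taxonomy.
REASON_SEMANTICS: dict[str, Semantics] = {
    "no_drift": Semantics("fully_verified", "none", "no_impact", "high"),
    "drift_detected": Semantics("drift", "review_drift", "drift_present", "high"),
    "identity_required": Semantics("action_required", "resolve_identity", "blocked", "none"),
    "duplicate_candidates": Semantics("action_required", "choose_candidate", "blocked", "none"),
    "missing_local_evidence": Semantics("missing_evidence", "refresh_evidence", "incomplete", "low"),
    "missing_provider_resource": Semantics("missing_provider_resource", "investigate_resource", "blocked", "low"),
    "unsupported_class": Semantics("unsupported", "none", "limited", "none"),
    "accepted_limitation": Semantics("limitation", "none", "limited", "low"),
    "excluded_non_governed": Semantics("excluded", "none", "excluded", "none"),
}


def semantics_for(reason: str) -> Semantics:
    """Look up the single semantics tuple for a reason; unknown reasons fail loudly."""
    try:
        return REASON_SEMANTICS[reason]
    except KeyError:
        raise ValueError(f"unknown compare reason: {reason!r}") from None


def counts_as_clean(reason: str) -> bool:
    """Only a fully verified, high-trust result may count as clean no drift."""
    s = semantics_for(reason)
    return s.category == "fully_verified" and s.trust == "high"
```

Because the table is keyed by reason, a unit test can iterate over it to confirm that the category, actionability, and readiness combinations are pairwise distinct and that unknown strings, including the legacy ones, are rejected.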

User Story 2 - Map Spec 382 matching outcomes without false green or false red output (Priority: P1)

As a release reviewer, I need MatchingOutcome states from Spec 382 to map into final compare semantics without legacy labels, so resolved identity can proceed to comparison and unresolved or limited identity cannot become healthy no drift.

Why this priority: Spec 383 depends on Spec 382 and must not leave matching results as a second taxonomy.

Independent Test: Unit and feature tests map active binding, canonical identity, missing local evidence, missing provider resource, unsupported, limitation, excluded, ambiguous, and identity-required outcomes into final result semantics.

Acceptance Scenarios:

Given an active binding that resolves to a current provider descriptor, When compare runs, Then identity is resolved by binding and comparison proceeds.
Given a low-trust, display-label-only, unresolved, duplicate, unsupported, excluded, or accepted-limitation matching outcome, When compare runs, Then it cannot produce clean no_drift.
Given an inventory-only or identity-only foundation subject, When compare runs, Then the result is a limitation rather than a missing policy or no-drift result.
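The three scenarios above describe a gate: only a trusted identity resolution may reach the payload comparison, and everything else is classified before comparison can produce no_drift. A minimal sketch follows, assuming hypothetical string names for the matching outcomes (the actual MatchingOutcome values from Spec 382 are not listed in this spec) and using Python only as notation.

```python
# Hypothetical matching outcome names; the real MatchingOutcome values may differ.
TRUSTED_IDENTITY = frozenset({"resolved_by_binding", "resolved_by_canonical_identity"})
LIMITATION_OUTCOMES = frozenset({"inventory_only", "identity_only", "accepted_limitation"})
PASS_THROUGH = frozenset({"unsupported", "excluded"})


def classify(matching_outcome: str, payloads_equal: bool) -> str:
    """Map a matching outcome plus a payload comparison to a final reason."""
    if matching_outcome in TRUSTED_IDENTITY:
        # Comparison is only consulted once identity is trusted.
        return "no_drift" if payloads_equal else "drift_detected"
    if matching_outcome in LIMITATION_OUTCOMES:
        return "limitation"
    if matching_outcome in PASS_THROUGH:
        return matching_outcome
    if matching_outcome == "duplicate":
        return "duplicate_candidates"
    # low_trust, display_label_only, unresolved, and any unknown value fail closed.
    return "identity_required"
```

The final fallback is deliberate in this sketch: an outcome the classifier does not recognize is treated as unresolved identity rather than allowed to fall through to a clean result.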

User Story 3 - Store structured OperationRun proof and run summaries (Priority: P2)

As a support/platform operator, I need baseline compare OperationRun context to expose structured subject outcome proof and summary counts, so Monitoring and support diagnostics can explain what happened without decoding legacy strings.

Why this priority: OperationRun proof is the audit and troubleshooting surface for queued compare operations.

Independent Test: Feature tests assert OperationRun baseline compare context includes structured subject outcome payloads, reason/category counts, and summary outcome decisions derived from new semantics.

Acceptance Scenarios:

Given blockers such as identity_required, duplicate candidates, missing required provider resource, or compare failure, When the run completes, Then the run summary marks the compare blocked or partial according to the explicit required-scope rule and records blocker counts.
Given only trusted no-drift and trusted drift results with no blockers, When the run completes, Then the operation is completed with trustworthy counts and drift findings are represented as findings, not operation failure.
Given limitations, unsupported resource classes, accepted limitations, or excluded non-governed subjects, When the run completes, Then the run records a warning or partial outcome, or limitation counts, without claiming verified no drift for those subjects.
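The aggregation behind these scenarios can be sketched as a count by category followed by a small decision. The required-scope rule is not defined in this section, so the sketch assumes each subject carries a hypothetical required flag (defaulting to true) and that a required blocker yields blocked while other blockers or limitations yield partial. Category and outcome names are illustrative, and Python is used only as notation.

```python
from collections import Counter

# Illustrative category names; the approved taxonomy may differ.
BLOCKER_CATEGORIES = frozenset({"action_required", "missing_provider_resource", "failed"})
NON_VERIFIED_CATEGORIES = frozenset({"missing_evidence", "unsupported", "limitation", "excluded"})


def summarize(subjects: list[dict]) -> dict:
    """Derive a run-level outcome and counts from per-subject categories."""
    counts = Counter(s["category"] for s in subjects)
    blockers = [s for s in subjects if s["category"] in BLOCKER_CATEGORIES]
    required_blockers = [s for s in blockers if s.get("required", True)]
    has_warnings = any(s["category"] in NON_VERIFIED_CATEGORIES for s in subjects)

    if required_blockers:
        outcome = "blocked"
    elif blockers or has_warnings:
        outcome = "partial"
    else:
        # Drift alone is a finding, not an operation failure.
        outcome = "completed"

    return {
        "outcome": outcome,
        "total": len(subjects),
        # Only fully verified subjects count here; excluded and limited ones never do.
        "verified_no_drift": counts["fully_verified"],
        "drift": counts["drift"],
        "blockers": len(blockers),
        "by_category": dict(counts),
    }
```

For example, a run with one verified subject, one excluded subject, and one limitation would report a total of three but a verified count of one, so excluded subjects stay visible in totals without inflating the clean result.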

User Story 4 - Render existing compare detail groups from the new truth (Priority: P2)

As an operator reviewing compare output, I need existing compare detail surfaces to group results into verified, drift, action required, missing evidence, missing provider resource, unsupported, limitations, excluded, and failed, so I can scan the result without interpreting internal codes.

Why this priority: The runtime truth should be visible enough on existing surfaces without waiting for the full resolution UI.

Independent Test: Existing Filament/Livewire feature tests assert baseline compare gap/detail surfaces render new group labels and do not expose old reason strings as primary operator truth.

Acceptance Scenarios:

Given mixed compare results, When an existing compare detail or OperationRun detail surface renders evidence gaps, Then it groups subjects by the new result categories.
Given old reasons are removed as authoritative values, When existing status/explanation helpers render copy, Then they use provider-neutral labels and next-action classes.
Given customer-facing review output is generated, When Spec 383 is implemented, Then customer-ready publication wording remains unchanged until Spec 385.
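
As an illustration of the grouping scenario, the sketch below buckets subjects by the FR-383-009 category names and emits groups in a fixed order. The display labels and the `subject` and `category` dictionary keys are assumptions for the example, not approved operator copy or the real payload shape.

```python
# Category names follow FR-383-009; the labels are illustrative copy only.
GROUP_LABELS = {
    "verified": "Verified",
    "drift_detected": "Drift",
    "action_required": "Action required",
    "missing_evidence": "Missing evidence",
    "missing_provider_resource": "Missing provider resource",
    "unsupported": "Unsupported",
    "limitation": "Limitations",
    "excluded": "Excluded",
    "failed": "Failed",
}


def group_for_display(outcomes):
    """Return (label, subjects) pairs in a stable order, skipping empty groups."""
    groups = {category: [] for category in GROUP_LABELS}
    for outcome in outcomes:
        # An unknown category raises KeyError instead of being silently dropped.
        groups[outcome["category"]].append(outcome["subject"])
    return [
        (GROUP_LABELS[category], sorted(subjects))
        for category, subjects in groups.items()
        if subjects
    ]
```

Grouping on the category rather than on the reason keeps existing surfaces free of internal codes while still allowing the reason to appear as secondary detail.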

Edge Cases

Active binding points to a provider resource that is not present in latest current descriptors.
Current provider descriptor collection is stale or absent, so absence is not proven.
Duplicate tenant-owned candidates exist without an active binding.
A subject has only display-label identity or old local/dev subject-key data.
A foundation resource is inventory-only, identity-only, canonical-only, unsupported, accepted as a limitation, or excluded as non-governed.
Compare strategy throws a technical exception.
Compare strategy returns an incomplete/gap result after identity was resolved.
Drift exists with no blockers or limitations.
Excluded subjects appear in totals but must not count as verified or no drift.
Existing evidence/review consumers read baseline drift posture before Spec 385 exists.

Requirements (mandatory)

Functional Requirements

FR-383-001: TenantPilot MUST define provider-neutral baseline compare subject outcome semantics with identity status, comparison status, coverage status, actionability, readiness impact, trust level, reason, and category dimensions.
FR-383-002: TenantPilot MUST replace old overloaded compare/gap reasons as authoritative product semantics, including ambiguous_match, policy_record_missing, foundation_not_policy_backed, missing_policy, missing_current, unsupported_subjects, unsupported_subject, coverage_unproven, and strategy_failed.
FR-383-003: TenantPilot MUST distinguish resolved identity, binding-resolved identity, canonicalization-resolved identity, unresolved identity, missing identity, and unsupported identity.
FR-383-004: TenantPilot MUST distinguish no drift, drift detected, not compared, compare not supported, and compare failed.
FR-383-005: TenantPilot MUST distinguish fully verified, verified with limitations, inventory-only, identity-only, canonical-only, unsupported, missing local evidence, missing provider resource, excluded, and accepted limitation coverage.
FR-383-006: TenantPilot MUST classify each subject outcome by actionability, including none, operator action required, provider data refresh required, binding required, scope decision required, implementation gap, accepted, and excluded.
FR-383-007: TenantPilot MUST classify each subject outcome by readiness impact, including no impact, internal limitation, customer limitation, customer blocker, and internal blocker.
FR-383-008: TenantPilot MUST classify each subject outcome by trust level, including high, medium, low, untrusted, not_applicable, and failed.
FR-383-009: TenantPilot MUST classify each subject outcome by result category, including verified, drift_detected, action_required, missing_evidence, missing_provider_resource, unsupported, limitation, excluded, and failed.
FR-383-010: TenantPilot MUST define the V1 provider-neutral result reasons as: verified_no_drift, verified_drift_detected, resolved_active_binding, resolved_canonical_identity, resolved_provider_identity, identity_required, unresolved_duplicate_candidates, unresolved_low_trust_match, unresolved_ambiguous_identity, missing_local_evidence, missing_provider_resource, unsupported_resource_class, foundation_inventory_only, foundation_identity_only, foundation_canonical_only, accepted_limitation, excluded_non_governed, compare_not_supported, and compare_failed.
FR-383-011: Low-trust, label-only, unresolved, duplicate, unsupported, excluded, accepted-limitation, or missing-evidence outcomes MUST NOT produce clean no drift.
FR-383-012: Missing provider resource MUST be distinct from missing local evidence and MAY be used only when current provider evidence proves absence or an active binding explicitly marks the expected resource missing.
FR-383-013: Foundation resources MUST use provider-neutral limitation or unsupported semantics rather than policy-backed or missing-policy semantics.
FR-383-014: Accepted limitation MUST NOT count as verified no drift.
FR-383-015: Excluded non-governed subjects MUST be represented separately from no drift, unsupported, and accepted limitation.
FR-383-016: Spec 382 MatchingOutcome values MUST map into the new compare outcome semantics before run/gap aggregation.
FR-383-017: Compare strategies MUST only produce trusted drift/no-drift when identity is trusted and the subject is comparable.
FR-383-018: OperationRun baseline compare context/gap payloads MUST include structured subject outcome semantics and counts by reason/category/actionability/readiness.
FR-383-019: Run-level completed/partial/blocked/failed decisions MUST derive from explicit subject outcome semantics, not legacy reason strings.
FR-383-020: Existing compare detail/status presentation MUST render provider-neutral result groups from the new semantics without adding a new page or resolution workflow.
FR-383-021: Tests asserting old overloaded reason behavior MUST be rewritten or removed unless the old string is retained only as non-authoritative migration/test fixture input explicitly documented by this spec.
FR-383-022: No legacy compatibility mapper, dual reader, historical OperationRun context reader, or old reason alias system may be introduced.
FR-383-023: Spec 383 MUST NOT implement resolution UI, Evidence/Review readiness final mapping, customer-facing Review Pack wording, report/PDF runtime work, or generic workflow orchestration.
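
To make FR-383-018 concrete, the following sketch counts subject outcomes along the four aggregation dimensions. The dictionary keys and the snake_case values used for actionability and readiness are assumptions; only the reason and category identifiers are taken from FR-383-009 and FR-383-010.

```python
from collections import Counter

# Dimensions that FR-383-018 asks to be counted (key names are assumed).
DIMENSIONS = ("reason", "category", "actionability", "readiness")


def build_counts(outcomes):
    """Count subject outcomes per dimension, with keys sorted for stable output."""
    return {
        dimension: dict(sorted(Counter(o[dimension] for o in outcomes).items()))
        for dimension in DIMENSIONS
    }


sample = [
    {"reason": "verified_no_drift", "category": "verified",
     "actionability": "none", "readiness": "no_impact"},
    {"reason": "verified_no_drift", "category": "verified",
     "actionability": "none", "readiness": "no_impact"},
    {"reason": "identity_required", "category": "action_required",
     "actionability": "binding_required", "readiness": "internal_blocker"},
]
counts = build_counts(sample)
```

Sorting the keys keeps the payload stable for identical inputs, which is one simple way to support the determinism expected of outcome classification.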

Non-Functional Requirements

NFR-383-001: Outcome classification MUST be deterministic for the same matching, compare, evidence, and scope inputs.
NFR-383-002: Top-level result semantics MUST be provider-neutral and MUST NOT depend on Microsoft/Intune-specific labels.
NFR-383-003: Structured proof payloads MUST NOT store secrets, credentials, raw sensitive provider payloads, raw Graph error bodies, or unredacted operator notes.
NFR-383-004: OperationRun status and outcome transitions MUST remain service-owned through OperationRunService.
NFR-383-005: Summary count keys MUST remain compatible with OperationSummaryKeys::all() or update that canonical list with tests.
NFR-383-006: The implementation MUST avoid broad UI presenter, badge, workflow, or evidence readiness frameworks; direct mapping from canonical result truth to existing surfaces is preferred.
NFR-383-007: Test fixtures MUST keep provider, workspace, membership, and evidence setup explicit and MUST NOT widen global defaults.
NFR-383-008: No browser test is required unless implementation changes rendered layout/navigation/action behavior beyond existing labels/groups.
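
One way to satisfy NFR-383-003 is an allowlist, so that anything not explicitly permitted never reaches the proof payload. The sketch below assumes hypothetical proof keys; the real payload shape is defined by the implementation, not by this example.

```python
# Hypothetical allowlist of proof keys that are safe to persist.
ALLOWED_PROOF_KEYS = frozenset({
    "subject_key",
    "reason",
    "category",
    "trust_level",
    "evidence_captured_at",
})


def sanitize_proof(raw: dict) -> dict:
    """Keep only allowlisted keys, in sorted order, and drop everything else."""
    return {key: raw[key] for key in sorted(raw) if key in ALLOWED_PROOF_KEYS}
```

An allowlist fails closed: a new field added upstream (for example a raw error body) is dropped by default instead of leaking until someone remembers to block it.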

UI Action Matrix (mandatory when Filament is changed)

Spec 383 may update existing labels/grouping rendered by existing Filament/Livewire surfaces. It does not add new actions.

Surface	Location	Header Actions	Inspect Affordance (List/Table)	Row Actions (max 2 visible)	Bulk Actions (grouped)	Empty-State CTA(s)	View Header Actions	Create/Edit Save+Cancel	Audit log?	Notes / Exemptions
Baseline compare result/gap display	existing baseline compare and OperationRun detail paths	unchanged	unchanged	unchanged	unchanged	unchanged	unchanged	N/A	N/A	Only status/group/label rendering may change; no new mutation/action

Key Entities (include if feature involves data)

Baseline compare subject outcome: Derived result truth for one governed subject, carrying identity status, comparison status, coverage status, actionability, readiness impact, trust level, reason, category, and sanitized proof.
Result reason: Provider-neutral reason value that explains why the subject is verified, drifted, blocked, limited, unsupported, missing, excluded, or failed.
Result category: High-level grouping used for run aggregation and existing detail/status presentation.
Required governed subject: A governed subject whose baseline profile, subject scope, or provider-resource binding marks it as required for compare coverage. In V1, subjects included only for inventory context, accepted limitation, or explicit non-governed exclusion are not required governed subjects.
OperationRun compare proof payload: Structured existing OperationRun context payload containing subject outcome summaries, counts, and sanitized proof.
Run summary classifier: Derived aggregation that maps subject outcome categories into completed, partial/warning, blocked, or failed operation result semantics.

No new persisted entity/table/artifact is approved.

Success Criteria (mandatory)

Measurable Outcomes

SC-383-001: Automated tests prove every supported result reason maps to exactly one category, actionability, readiness impact, and trust rule.
SC-383-002: Automated tests prove low-trust, label-only, unresolved, duplicate, unsupported, accepted-limitation, excluded, and missing-evidence outcomes never produce clean no drift.
SC-383-003: Automated tests prove missing provider resource and missing local evidence produce distinct structured payloads and distinct operator actionability.
SC-383-004: Automated tests prove foundation inventory-only, identity-only, canonical-only, unsupported, and accepted-limitation cases do not use policy-backed or missing-policy semantics.
SC-383-005: OperationRun baseline compare context includes structured subject outcome payloads and counts by category/reason/actionability/readiness in covered integration tests.
SC-383-006: Existing evidence/review regression tests pass without adding customer-facing readiness or Review Pack wording changes.
SC-383-007: No old overloaded reason string remains as an authoritative enum/constant/test expectation for baseline compare result semantics.
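
SC-383-001 can be checked with a small totality test. The helper below is a sketch: `mapping_problems` is a hypothetical name, and the example table is deliberately partial to show how gaps surface. It does not propose the real reason-to-category assignments.

```python
# Category names from FR-383-009.
CATEGORIES = frozenset({
    "verified", "drift_detected", "action_required", "missing_evidence",
    "missing_provider_resource", "unsupported", "limitation", "excluded", "failed",
})


def mapping_problems(reasons, table):
    """List every reason that is unmapped, mapped to an unknown category, or stray."""
    problems = []
    for reason in reasons:
        if reason not in table:
            problems.append(f"{reason}: unmapped")
        elif table[reason] not in CATEGORIES:
            problems.append(f"{reason}: unknown category {table[reason]}")
    for reason in table:
        if reason not in reasons:
            problems.append(f"{reason}: not a supported reason")
    return problems


# Deliberately partial, illustrative table: compare_failed is left out.
reasons = ["verified_no_drift", "excluded_non_governed", "compare_failed"]
table = {"verified_no_drift": "verified", "excluded_non_governed": "excluded"}
problems = mapping_problems(reasons, table)
```

A dictionary already guarantees at most one category per reason, so the test only has to prove that every supported reason is present and that every mapped value is a known category.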

Acceptance Criteria

AC-383-001: Provider-neutral result semantics exist and are used by baseline compare.
AC-383-002: Spec 382 matching outcomes map into final compare result semantics.
AC-383-003: Trusted identity and comparable subjects are required before drift/no-drift is produced.
AC-383-004: Missing provider resource and missing local evidence are separate states with separate actionability.
AC-383-005: Foundation and unsupported coverage are represented as limitations/unsupported, not missing policies.
AC-383-006: OperationRun gap/proof payloads contain structured semantics.
AC-383-007: Run summary classification derives from new categories, not legacy strings.
AC-383-008: Existing compare/detail status surfaces render the new group labels where they expose compare result truth.
AC-383-009: Old result-code compatibility is not introduced.
AC-383-010: No application implementation outside the selected runtime/result-semantics slice is included.

V1 Decisions And Assumptions

identity_required, unresolved duplicate candidates, missing required provider resource, required coverage unproven, and compare failure are blockers for required governed subjects.
Required governed subjects are determined from the existing baseline profile subject scope and provider-resource binding context. When the current data cannot prove a subject is optional, V1 treats the subject as required to avoid false green compare output.
missing_local_evidence requests provider data refresh unless current provider evidence proves absence.
unsupported_resource_class, foundation_inventory_only, foundation_identity_only, and foundation_canonical_only are limitations by default unless profile/scope rules mark the subject required enough to block readiness.
accepted_limitation is accepted and limited, not verified no drift.
Trusted drift is a completed compare result with findings when no blockers/failures exist; it is not an operation failure.
Excluded non-governed subjects may appear in totals and excluded counts but must not count as verified/no-drift.
V1 stores dimensions inside existing structured OperationRun/compare payloads and derived objects, not new columns or tables.
Existing customer-facing review output remains unchanged until Spec 385.
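
Two of the decisions above are easy to get subtly wrong, so here is a hedged sketch of both: the required-by-default rule and the split between missing provider resource and missing local evidence. The function and parameter names are hypothetical; the returned reason strings come from FR-383-010.

```python
from typing import Optional


def is_required(scope_marks_required: Optional[bool]) -> bool:
    """Treat a subject as required unless scope data proves it is optional."""
    # None means "cannot prove optional", which V1 resolves to required.
    return scope_marks_required is not False


def missing_reason(
    provider_evidence_current: bool,
    found_in_provider: bool,
    binding_marks_missing: bool,
) -> Optional[str]:
    """Pick the missing-state reason, or None when the resource is present."""
    if found_in_provider:
        return None
    # Absence counts as proven only with current provider evidence or when an
    # active binding explicitly marks the expected resource missing.
    if binding_marks_missing or provider_evidence_current:
        return "missing_provider_resource"
    # Stale or absent descriptors prove nothing, so a provider data refresh is needed.
    return "missing_local_evidence"
```

With stale descriptors and no binding hint, `missing_reason(False, False, False)` yields `missing_local_evidence`, which keeps an unproven absence from being reported as a missing provider resource.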

Risks

False green: Mitigated by clean-success rules and tests proving low-trust, limitation, unsupported, excluded, and missing-evidence cases cannot be no drift.
False red: Mitigated by separate limitation, unsupported, missing, blocker, and failed categories.
Taxonomy bloat: Mitigated by requiring each value to have a behavioral, aggregation, readiness, or operator-action consequence.
Provider-specific leakage: Mitigated by provider-neutral top-level terms and tests/grep checks for old policy-only top-level reasons.
Scope creep into Specs 384/385: Mitigated by explicit non-goals and regression tests that customer-facing readiness output is unchanged.
UI noise: Mitigated by reusing existing surfaces and grouping, not creating a new page or resolution workflow.

Open Questions

None blocking. Implementation must stop and update this spec and plan before continuing if it requires new persistence, a new UI workflow, a customer-facing readiness mapping, or a compatibility reader for old result payloads.

Follow-up Spec Candidates

Spec 384 - Baseline Subject Resolution UI & Operator Decisions v1.
Spec 385 - Evidence & Review Readiness Integration v1.
Optional later UI productization if existing compare result/detail surfaces need a broader redesign after 383 data semantics are implemented.
