ahmido ea77c8c718 feat(baselines): implement baseline compare result semantics (#454)

Implemented deterministic Baseline Result Semantics (Spec 383), introducing CompareSubjectResult and CompareEvidenceResult. Replaced generic arrays with strict Data Transfer Objects for Baseline engine output.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #454

2026-06-16 20:20:27 +00:00

Tasks: Spec 383 - Baseline Compare Result Semantics and Gap Classification v1

Input: Design documents from /specs/383-baseline-result-semantics/
Prerequisites: spec.md, plan.md, completed Specs 381 and 382 close-outs

Tests: Runtime behavior changes require Pest unit and feature tests before or alongside implementation. Browser tests are not required unless implementation changes rendered layout, navigation, actions, or JavaScript behavior.

Test Governance Checklist

TGC001 Lane assignment is named and is the narrowest sufficient proof for the changed behavior.
TGC002 New or changed tests stay in the smallest honest family, and any heavy-governance or browser addition is explicit.
TGC003 Shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default; any widening is isolated or documented.
TGC004 Planned validation commands cover the change without pulling in unrelated lane cost.
TGC005 The declared surface test profile or standard-native-filament relief is explicit.
TGC006 Any material budget, baseline, trend, or escalation note is recorded in the active spec or PR.

Phase 1: Preparation And Guardrails

Purpose: Protect completed history, confirm repo truth, and keep the implementation bounded.

T001 Confirm specs/381-provider-resource-identity-binding/implementation-close-out.md and specs/382-baseline-matching-canonicalization/implementation-close-out.md exist and treat both as dependency context only.
T002 Confirm no code or artifact changes are made to completed specs specs/381-provider-resource-identity-binding/, specs/382-baseline-matching-canonicalization/, specs/163-baseline-subject-resolution/, specs/336-baseline-compare-product-process-flow-alignment/, specs/347-review-pack-output-contract-readiness-semantics/, specs/350-operator-resolution-guidance-framework-v1/, or specs/380-management-report-pdf-staging-runtime-validation/.
T003 Re-read apps/platform/app/Support/Baselines/Matching/MatchingOutcome.php, apps/platform/app/Services/Baselines/Matching/SubjectMatchingPipeline.php, and apps/platform/app/Services/Baselines/Matching/FoundationCoverageResolver.php before implementation.
T004 Re-read apps/platform/app/Jobs/CompareBaselineToTenantJob.php, apps/platform/app/Support/Baselines/Compare/CompareSubjectResult.php, apps/platform/app/Support/Baselines/Compare/IntuneCompareStrategy.php, and apps/platform/app/Support/Baselines/Compare/CompareState.php before implementation.
T005 Re-read apps/platform/app/Support/Baselines/BaselineCompareReasonCode.php, apps/platform/app/Support/Baselines/BaselineCompareEvidenceGapDetails.php, apps/platform/app/Support/Baselines/SubjectResolver.php, and apps/platform/app/Support/Baselines/ResolutionOutcome.php before implementation.
T006 Confirm no new route, navigation entry, destructive action, Filament panel provider, Livewire component, queue name, scheduler entry, env var, storage path, or persisted entity is needed; if any is needed, stop and update specs/383-baseline-result-semantics/spec.md and plan.md.

Phase 2: Tests First - Core Semantics

Purpose: Lock the new source of truth before changing runtime code.

T007 [P] [US1] Add coverage in apps/platform/tests/Unit/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifierTest.php for every V1 reason, category, actionability, readiness impact, and trust-level mapping.
T008 [P] [US1] Add coverage in apps/platform/tests/Unit/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifierTest.php for clean success rules, drift, no drift, blocker, limitation, unsupported, missing, excluded, and failed outcomes.
T009 [P] [US1] Add scenario coverage in apps/platform/tests/Unit/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifierTest.php for trusted no-drift, trusted drift, identity required, duplicate candidates, missing provider resource, missing local evidence, unsupported resource, inventory-only foundation, identity-only foundation, accepted limitation, excluded non-governed, low-trust match never reported as no-drift, and compare failure.
T010 [P] [US3] Add coverage in apps/platform/tests/Unit/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifierTest.php for completed, completed-with-drift, partial/limited, blocked, and failed run aggregation.
T011 [P] [US1] Add coverage in apps/platform/tests/Unit/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifierTest.php asserting old overloaded reason values are not authoritative enum/constant values in the new semantics model.

Phase 3: Tests First - Matching And Compare Integration

Purpose: Prove Spec 382 matching outcomes map to final compare semantics without false green or false red output.

T012 [P] [US2] Update apps/platform/tests/Unit/Support/Baselines/Matching/MatchingOutcomeTest.php so matching reasons expected by Spec 383 are provider-neutral and no longer assert ambiguous_match, unsupported_subject, or foundation_not_policy_backed as final result truth.
T013 [P] [US2] Update apps/platform/tests/Unit/Support/Baselines/Matching/SubjectMatchingPipelineTest.php to assert active binding, canonical identity, duplicate candidates, missing local evidence, missing provider resource, unsupported, limited, excluded, and identity-required outcomes map through the classifier.
T014 [P] [US2] Update apps/platform/tests/Unit/Services/Baselines/Matching/FoundationCoverageResolverTest.php so inventory-only, identity-only, canonical-only, unsupported, and accepted limitation coverage expect the new provider-neutral reason names.
T015 [P] [US2] Update apps/platform/tests/Feature/Baselines/BaselineCompareProviderResourceBindingCanonicalIdentityTest.php to assert binding-resolved identity produces trusted comparison eligibility but not no-drift by itself.
T016 [P] [US2] Update apps/platform/tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php to expect unresolved_duplicate_candidates or unresolved_ambiguous_identity instead of ambiguous_match.
T017 [P] [US1] Update apps/platform/tests/Feature/Baselines/BaselineCompareGapClassificationTest.php to assert missing local evidence, missing provider resource, and foundation limitation states are distinct.
T018 [P] [US3] Update apps/platform/tests/Feature/Baselines/BaselineCompareExecutionGuardTest.php so compare strategy exceptions map to compare-failed semantics without relying on strategy_failed as an authoritative subject reason.
T019 [P] [US3] Update apps/platform/tests/Feature/Baselines/BaselineCompareResumeTokenTest.php so resumed full-content gaps use new missing-local-evidence semantics instead of policy_record_missing.
T020 [P] [US1] Update apps/platform/tests/Feature/Baselines/BaselineCompareGapClassificationTest.php to prove stale or absent current provider descriptors do not by themselves emit missing_provider_resource, and that missing-provider semantics require current-provider absence proof or an active binding that marks the expected resource missing.
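The two guards that T015 and T020 pin down can be expressed as plain predicates. This is a minimal Python sketch, not the PHP implementation: ProviderEvidence, may_emit_missing_provider_resource, and may_report_no_drift are hypothetical names, and treating only fully-verified coverage as eligible for no-drift is an assumption standing in for the clean-success rules in spec.md.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderEvidence:
    """Hypothetical provider-side facts for one expected resource."""

    descriptor_present: bool  # a current provider descriptor was found
    descriptor_stale: bool  # the descriptor exists but may be outdated
    absence_proven: bool  # a current provider read proved the resource is absent
    binding_marks_missing: bool  # an active binding marks the expected resource missing


def may_emit_missing_provider_resource(evidence: ProviderEvidence) -> bool:
    # T020: descriptor presence or staleness is deliberately ignored;
    # only positive proof of absence justifies missing_provider_resource.
    return evidence.absence_proven or evidence.binding_marks_missing


# Identity values from T027 that count as trusted (assumption).
TRUSTED_IDENTITY = {"resolved", "binding-resolved", "canonicalization-resolved"}


def may_report_no_drift(
    identity_status: str, coverage_status: str, drift_found: Optional[bool]
) -> bool:
    # T015: a trusted identity only makes the subject eligible for comparison.
    # No-drift also needs a comparison that actually ran (drift_found is not None),
    # found nothing, and had full coverage (assumed clean-success rule).
    return (
        identity_status in TRUSTED_IDENTITY
        and coverage_status == "fully-verified"
        and drift_found is False
    )
```

Keeping drift_found as Optional[bool] separates "not compared" (None) from "compared, no drift" (False), which is exactly the false-green case the Phase 3 tests guard against.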

Phase 4: Tests First - Existing Presentation And Downstream Regressions

Purpose: Keep existing surfaces and downstream consumers honest without implementing Spec 384 or 385.

T021 [P] [US4] Update apps/platform/tests/Feature/Filament/BaselineCompareEvidenceGapTableTest.php to assert existing evidence-gap rows render the new result groups and do not expose old reason strings as primary operator truth.
T022 [P] [US4] Update apps/platform/tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php to assert provider-neutral blocker/limitation/missing/failure explanations.
T023 [P] [US4] Update apps/platform/tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php to prove no-drift explanation appears only when the new clean-success rules allow it.
T024 [P] [US4] Update apps/platform/tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php to assert result group totals stay consistent with OperationRun context counts.
T025 [P] [US3] Update apps/platform/tests/Feature/Evidence/BaselineDriftPostureSourceTest.php to prove evidence posture does not treat blockers/limitations as verified no drift before Spec 385.
T026 [P] [US3] Update apps/platform/tests/Feature/ReviewPack/Spec347ReviewPackReadinessSemanticsTest.php and apps/platform/tests/Feature/ReviewPack/Spec349ReviewPackResolutionGuidanceTest.php to prove customer-facing readiness/output wording is unchanged by Spec 383.

Phase 5: Define The Narrow Result Semantics Model

Purpose: Add provider-neutral value families with direct behavioral consequences.
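One way to give each reason direct behavioral consequences is to keep all four mappings (category, actionability, readiness impact, default trust) in a single table keyed by reason, so they cannot drift apart and a missing entry fails fast. The Python sketch below illustrates that shape only: the reason values come from this task list, but the category, actionability, readiness-impact, and trust values are placeholders, not the V1 values defined in spec.md.

```python
from enum import Enum
from typing import NamedTuple


class Consequences(NamedTuple):
    category: str
    actionability: str
    readiness_impact: str
    default_trust: str


class CompareResultReason(Enum):
    RESOLVED_ACTIVE_BINDING = "resolved_active_binding"
    UNRESOLVED_DUPLICATE_CANDIDATES = "unresolved_duplicate_candidates"
    UNRESOLVED_AMBIGUOUS_IDENTITY = "unresolved_ambiguous_identity"
    UNRESOLVED_LOW_TRUST_MATCH = "unresolved_low_trust_match"
    IDENTITY_REQUIRED = "identity_required"
    MISSING_PROVIDER_RESOURCE = "missing_provider_resource"
    UNSUPPORTED_RESOURCE_CLASS = "unsupported_resource_class"
    FOUNDATION_INVENTORY_ONLY = "foundation_inventory_only"
    FOUNDATION_IDENTITY_ONLY = "foundation_identity_only"
    FOUNDATION_CANONICAL_ONLY = "foundation_canonical_only"

    # Each mapping method reads from the same table entry.
    def category(self) -> str:
        return _CONSEQUENCES[self].category

    def actionability(self) -> str:
        return _CONSEQUENCES[self].actionability

    def readiness_impact(self) -> str:
        return _CONSEQUENCES[self].readiness_impact

    def default_trust(self) -> str:
        return _CONSEQUENCES[self].default_trust


R = CompareResultReason
_BLOCKER = Consequences("blocker", "action_required", "blocks", "untrusted")
_FOUNDATION_LIMIT = Consequences("limitation", "none", "limits", "limited")

# Placeholder values; the authoritative V1 mapping lives in spec.md.
_CONSEQUENCES = {
    R.RESOLVED_ACTIVE_BINDING: Consequences("eligible", "none", "none", "trusted"),
    R.UNRESOLVED_DUPLICATE_CANDIDATES: _BLOCKER,
    R.UNRESOLVED_AMBIGUOUS_IDENTITY: _BLOCKER,
    R.UNRESOLVED_LOW_TRUST_MATCH: Consequences("blocker", "action_required", "blocks", "low"),
    R.IDENTITY_REQUIRED: _BLOCKER,
    R.MISSING_PROVIDER_RESOURCE: Consequences("missing", "review", "limits", "trusted"),
    R.UNSUPPORTED_RESOURCE_CLASS: Consequences("unsupported", "none", "limits", "untrusted"),
    R.FOUNDATION_INVENTORY_ONLY: _FOUNDATION_LIMIT,
    R.FOUNDATION_IDENTITY_ONLY: _FOUNDATION_LIMIT,
    R.FOUNDATION_CANONICAL_ONLY: _FOUNDATION_LIMIT,
}

# Fail fast at import time if a reason is added without its consequences.
_missing = set(CompareResultReason) - set(_CONSEQUENCES)
if _missing:
    raise RuntimeError(f"reasons without consequences: {sorted(r.value for r in _missing)}")
```

In a T011-style test, CompareResultReason("ambiguous_match") raises ValueError, which proves the legacy value is not a member of the new family rather than merely unused.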

T027 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareResultIdentityStatus.php with resolved, binding-resolved, canonicalization-resolved, unresolved, missing, and unsupported identity values.
T028 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareResultComparisonStatus.php with not-compared, no-drift, drift-detected, compare-failed, and compare-not-supported values.
T029 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareResultCoverageStatus.php with fully-verified, verified-with-limitations, inventory-only, identity-only, canonical-only, unsupported, missing-local-evidence, missing-provider-resource, excluded, and accepted-limitation values.
T030 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareResultActionability.php, CompareResultReadinessImpact.php, CompareResultTrustLevel.php, and CompareResultCategory.php with the V1 values from specs/383-baseline-result-semantics/spec.md.
T031 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareResultReason.php with provider-neutral reasons and mapping methods for category, actionability, readiness impact, and default trust.
T032 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/CompareSubjectOutcome.php as a derived result object with sanitized toArray() output for OperationRun/context use.
T033 [US1] Create apps/platform/app/Support/Baselines/CompareSemantics/BaselineCompareOutcomeClassifier.php to map matching outcomes plus compare strategy outputs into CompareSubjectOutcome.
T034 [US3] Create apps/platform/app/Support/Baselines/CompareSemantics/BaselineCompareRunSummaryClassifier.php to aggregate subject outcomes into run-level completed, completed-with-drift, partial/limited, blocked, or failed decisions plus per-bucket counts.
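The run-level aggregation in T034 reduces to counting subject categories and applying a precedence order. This Python sketch assumes placeholder category strings (verified, drift, blocker, limitation, unsupported, missing, excluded, failed) and one possible precedence; summarize_run is a hypothetical name, and the real precedence belongs to spec.md.

```python
from collections import Counter
from typing import Iterable

# Categories that stop a run from being reported as a clean completion (assumed).
DEGRADING = ("failed", "limitation", "unsupported", "missing")


def summarize_run(categories: Iterable[str]) -> tuple[str, dict[str, int]]:
    """Return (run decision, count buckets) for one compare run's subject categories."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        decision = "partial"  # nothing was compared, so never claim success
    elif counts["failed"] == total:
        decision = "failed"
    elif counts["blocker"]:
        decision = "blocked"
    elif any(counts[c] for c in DEGRADING):
        decision = "partial"
    elif counts["drift"]:
        decision = "completed_with_drift"
    else:
        decision = "completed"  # no blocker, degrading, or drift category remains
    return decision, dict(counts)
```

Counter returns 0 for absent keys without inserting them, so the returned buckets contain only categories that actually occurred.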

Phase 6: Replace Legacy Matching And Gap Reasons

Purpose: Stop old matching/gap strings from remaining authoritative.

T035 [US2] Update apps/platform/app/Support/Baselines/Matching/MatchingOutcome.php so factory methods use new provider-neutral reason names and keep old strings out of authoritative output.
T036 [US2] Update apps/platform/app/Services/Baselines/Matching/FoundationCoverageResolver.php so unsupported and foundation coverage returns unsupported_resource_class, foundation_inventory_only, foundation_identity_only, or foundation_canonical_only as appropriate.
T037 [US2] Update apps/platform/app/Services/Baselines/Matching/SubjectMatchingPipeline.php to map duplicate candidates to unresolved_duplicate_candidates, low-trust/identity gaps to identity_required or unresolved_low_trust_match, active binding resolution to resolved_active_binding, and canonical/provider identity to provider-neutral resolved reasons.
T038 [US1] Update apps/platform/app/Support/Baselines/SubjectResolver.php, apps/platform/app/Support/Baselines/ResolutionOutcome.php, and apps/platform/app/Support/Baselines/ResolutionOutcomeRecord.php so legacy policy-shaped reasons are no longer final compare result truth; retain only non-authoritative helper behavior if still needed by capture flows and document any boundary in code/tests.
T039 [US1] Update apps/platform/app/Support/Baselines/BaselineCompareReasonCode.php so run-level reasons are either provider-neutral summary reasons or delegated to the new run summary classifier.
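The T037 mapping from matching signals to provider-neutral reasons is essentially an ordered decision list. In this Python sketch, MatchSignals and matching_reason are hypothetical; the precedence (an active binding wins over candidate matching) is an assumption; and missing_local_evidence, resolved_canonical_identity, and resolved_provider_identity are placeholder names, because the task list does not name those reasons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchSignals:
    """Hypothetical per-subject facts collected by the matching pipeline."""

    active_binding: bool
    candidate_count: int
    has_stable_identity: bool
    high_trust: bool
    canonical_identity: bool


def matching_reason(s: MatchSignals) -> str:
    # Assumption: an active binding is authoritative and short-circuits matching.
    if s.active_binding:
        return "resolved_active_binding"
    if s.candidate_count == 0:
        return "missing_local_evidence"  # placeholder name
    if s.candidate_count > 1:
        return "unresolved_duplicate_candidates"
    if not s.has_stable_identity:
        return "identity_required"
    if not s.high_trust:
        return "unresolved_low_trust_match"
    if s.canonical_identity:
        return "resolved_canonical_identity"  # placeholder name
    return "resolved_provider_identity"  # placeholder name
```

Ordering the checks from most authoritative to least makes the false-green risk visible: a subject only reaches a resolved reason after every unresolved condition has been ruled out.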

Phase 7: Compare Strategy And OperationRun Integration

Purpose: Make runtime compare output and proof payloads use the new truth.

T040 [US2] Update apps/platform/app/Support/Baselines/Compare/CompareState.php or its mapping layer so unsupported, incomplete, ambiguous, failed, drift, and no-drift states map to CompareSubjectOutcome without old reason strings.
T041 [US2] Update apps/platform/app/Support/Baselines/Compare/CompareSubjectResult.php to expose structured semantic payloads or enough diagnostics for BaselineCompareOutcomeClassifier without duplicating result truth.
T042 [US2] Update apps/platform/app/Support/Baselines/Compare/IntuneCompareStrategy.php so missing current evidence, unsupported subjects, ambiguous conditions, and compare failures emit provider-neutral diagnostics and keep drift/no-drift limited to trusted comparable subjects.
T043 [US2] Update apps/platform/tests/Feature/Baselines/Support/FakeCompareStrategy.php to emit provider-neutral diagnostics used by Spec 383 tests.
T044 [US3] Update apps/platform/app/Jobs/CompareBaselineToTenantJob.php to build structured CompareSubjectOutcome records from matching outcomes and strategy results before gap aggregation.
T045 [US3] Update apps/platform/app/Jobs/CompareBaselineToTenantJob.php so baseline_compare.evidence_gaps includes structured counts by reason, category, actionability, readiness impact, and subject outcome payloads.
T046 [US3] Update apps/platform/app/Jobs/CompareBaselineToTenantJob.php so run outcome and summary_counts derive from BaselineCompareRunSummaryClassifier and stay compatible with OperationSummaryKeys::all().
T047 [US3] Update apps/platform/app/Support/OpsUx/OperationSummaryKeys.php and apps/platform/tests/Feature/OpsUx/OperationSummaryKeysSpecTest.php only if Spec 383 needs new count keys not representable by the existing canonical list.
T048 [US3] Add or update a focused test or guard assertion proving baseline compare aggregation does not mutate OperationRun.status or OperationRun.outcome outside OperationRunService while summary semantics change.
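T045 asks for counts by reason, category, actionability, and readiness impact next to the subject outcome payloads. A minimal Python sketch of that shape follows; SubjectOutcome, build_evidence_gaps, the by_* key names, and the rule that verified and drift subjects are not gaps are all assumptions, not the real baseline_compare.evidence_gaps schema.

```python
from collections import Counter
from dataclasses import asdict, dataclass

# Assumption: subjects in these categories were compared with trust, so they are not gaps.
NOT_GAPS = {"verified", "drift"}


@dataclass(frozen=True)
class SubjectOutcome:
    subject_key: str
    reason: str
    category: str
    actionability: str
    readiness_impact: str


def build_evidence_gaps(outcomes: list[SubjectOutcome]) -> dict:
    gaps = [o for o in outcomes if o.category not in NOT_GAPS]

    def count_by(field: str) -> dict[str, int]:
        return dict(Counter(getattr(o, field) for o in gaps))

    return {
        "by_reason": count_by("reason"),
        "by_category": count_by("category"),
        "by_actionability": count_by("actionability"),
        "by_readiness_impact": count_by("readiness_impact"),
        # Every count above is derived from this list, so totals cannot disagree.
        "subjects": [asdict(o) for o in gaps],
    }
```

Deriving every count from the same filtered list keeps a T024-style consistency check cheap: each by_* map sums to the number of subject payloads.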

Phase 8: Existing Surface Labels And Downstream Consumers

Purpose: Render the new truth on existing surfaces without new UI workflows.

T049 [US4] Update apps/platform/app/Support/Baselines/BaselineCompareEvidenceGapDetails.php to render provider-neutral group labels for verified, drift detected, action required, missing evidence, missing provider resource, unsupported, limitations, excluded, and failed.
T050 [US4] Update apps/platform/app/Support/Baselines/BaselineCompareExplanationRegistry.php and apps/platform/app/Support/ReasonTranslation/ReasonPresenter.php so primary operator text no longer uses old reason strings; search for OperationRun baseline-compare presentation helpers directly touched by the implementation and apply the same change to any matches.
T051 [US4] Confirm apps/platform/app/Livewire/BaselineCompareEvidenceGapTable.php uses existing data paths and does not add a new action, route, modal, drawer, or layout pattern.
T052 [US3] Update apps/platform/app/Services/Evidence/Sources/BaselineDriftPostureSource.php only if needed to avoid treating blocked/limited compare runs as complete before Spec 385.
T053 [US4] If implementation changes route/layout/action structure instead of labels/groups only, update the active spec/plan plus docs/ui-ux-enterprise-audit/route-inventory.md, docs/ui-ux-enterprise-audit/design-coverage-matrix.md, and docs/ui-ux-enterprise-audit/page-reports/ui-015-baseline-compare.md before continuing.
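The nine display groups in T049 can be derived from the outcome category, with coverage status used only to split missing local evidence from a missing provider resource. In this Python sketch the category strings and English labels are placeholders (real labels would presumably go through the existing ReasonTranslation/ReasonPresenter layer), and group_label is a hypothetical name.

```python
# Placeholder category -> label map for the non-"missing" groups.
_LABELS = {
    "verified": "Verified",
    "drift": "Drift detected",
    "blocker": "Action required",
    "unsupported": "Unsupported",
    "limitation": "Limitations",
    "excluded": "Excluded",
    "failed": "Failed",
}


def group_label(category: str, coverage_status: str) -> str:
    if category == "missing":
        # Coverage status distinguishes the two kinds of missing subjects.
        if coverage_status == "missing-provider-resource":
            return "Missing provider resource"
        return "Missing evidence"
    # KeyError for unknown categories: no silent fallback group.
    return _LABELS[category]
```

Raising on an unknown category, instead of falling back to a default group, keeps a leftover legacy string such as ambiguous_match from quietly rendering as operator truth.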

Phase 9: Legacy Removal And Scope Guard

Purpose: Remove old authoritative truth and prevent accidental compatibility scope.

T054 [US1] Search apps/platform/app and apps/platform/tests for ambiguous_match, policy_record_missing, foundation_not_policy_backed, missing_policy, missing_current, unsupported_subject, unsupported_subjects, coverage_unproven, and strategy_failed; remove or convert compare-result usages to the new semantics.
T055 [US1] Keep a remaining old string only if it lies outside baseline compare result truth or is explicitly transitional fixture input; document that boundary in the nearest test or close-out note.
T056 [US1] Confirm no legacy result-code mapper, old OperationRun context reader, dual old/new result reader, or compatibility alias is introduced.
T057 [US4] Confirm no Spec 384 resolution UI, manual bind/exclude/accept-limitation workflow, or operator decision screen is implemented.
T058 [US3] Confirm no Spec 385 Evidence Snapshot readiness final mapping, Review Pack publication blocker mapping, or customer-facing wording is implemented.
T059 [US4] Confirm no Management Report/PDF runtime or report wording work is included.
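For T054, each legacy reason should be matched as a whole token, so every hit reports the exact legacy string (unsupported_subject versus unsupported_subjects) and longer identifiers that merely start with missing_current are not flagged. The Python helper below is a hypothetical one-off script, meant to run from the repository root and limited to *.php files by assumption; it only lists hits for review, and T055 still decides which remaining occurrences are legitimate.

```python
import re
from pathlib import Path

LEGACY_REASONS = (
    "ambiguous_match",
    "policy_record_missing",
    "foundation_not_policy_backed",
    "missing_policy",
    "missing_current",
    "unsupported_subject",
    "unsupported_subjects",
    "coverage_unproven",
    "strategy_failed",
)

# \b on both sides: underscores are word characters, so longer identifiers do not match.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, LEGACY_REASONS)) + r")\b")


def find_legacy_reasons(roots: list[str]) -> list[tuple[str, int, str]]:
    """Return (path, line number, token) for every legacy reason token found."""
    hits = []
    for root in roots:
        for path in sorted(Path(root).rglob("*.php")):
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                for match in PATTERN.finditer(line):
                    hits.append((str(path), lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    for path, lineno, token in find_legacy_reasons(["apps/platform/app", "apps/platform/tests"]):
        print(f"{path}:{lineno}: {token}")
```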

Phase 10: Validation And Close-Out

Purpose: Prove the implementation and document the exact operational impact.

T060 Run cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/CompareSemantics
T061 Run cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/Matching tests/Unit/Baselines/CompareStrategyRegistryTest.php
T062 Run cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareGapClassificationTest.php tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php tests/Feature/Baselines/BaselineCompareProviderResourceBindingCanonicalIdentityTest.php tests/Feature/Baselines/BaselineCompareExecutionGuardTest.php tests/Feature/Baselines/BaselineCompareResumeTokenTest.php
T063 Run cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php tests/Feature/Filament/BaselineCompareEvidenceGapTableTest.php
T064 Run cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Evidence/BaselineDriftPostureSourceTest.php tests/Feature/ReviewPack/Spec347ReviewPackReadinessSemanticsTest.php tests/Feature/ReviewPack/Spec349ReviewPackResolutionGuidanceTest.php
T065 Run a PostgreSQL lane only if implementation adds migrations, JSONB indexes/query behavior, locks, or constraints.
T066 Run a browser smoke test only if implementation changes rendered layout, navigation, actions, or JavaScript behavior beyond labels/groups.
T067 Run cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent
T068 Run git diff --check
T069 Record specs/383-baseline-result-semantics/implementation-close-out.md with Livewire v4 compliance, provider registration location, global search status, destructive/high-impact action status, asset strategy, tests run, browser decision, and deployment impact.

Dependencies

Phase 1 must finish before runtime work.
Phases 2-4 should be written before or alongside implementation changes.
Phase 5 unblocks Phases 6 and 7.
Phase 6 must complete before OperationRun/gap aggregation can be trusted.
Phase 7 unblocks Phase 8.
Phase 9 and Phase 10 validate the completed implementation.
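Read as a graph, the dependency list above yields a fixed order of work waves. This Python sketch uses the standard-library graphlib.TopologicalSorter (Python 3.9+); PHASE_DEPENDENCIES and phase_waves are hypothetical names. Two edges are assumptions: Phases 2-4 depend only on Phase 1 because they are written before or alongside implementation, and Phase 10 waits on Phase 9 because legacy removal changes code that validation must cover.

```python
from graphlib import TopologicalSorter

# Each phase maps to the phases it waits on.
PHASE_DEPENDENCIES = {
    2: {1},
    3: {1},
    4: {1},
    5: {1},
    6: {5},
    7: {5, 6},  # gap aggregation is only trusted once Phase 6 is done
    8: {7},
    9: {5, 6, 7, 8},
    10: {9},  # assumption: validate after legacy removal
}


def phase_waves(deps: dict[int, set[int]]) -> list[list[int]]:
    """Group phases into waves whose members can proceed together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves
```

With these edges the waves come out as Phase 1, then Phases 2-5 together, then Phases 6 through 10 one at a time, which matches the Dependencies list.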

Parallel Opportunities

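T007-T011 can be drafted in parallel, but all five target BaselineCompareOutcomeClassifierTest.php, so coordinate edits to that shared file.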
T012-T020 can be drafted in parallel if each test file remains scoped.
T021-T026 can be drafted in parallel with the core semantics tests.
T027-T034 can be implemented in parallel after names are agreed, but mapping methods should converge before integration.
T035-T039 and T040-T048 should be coordinated because they touch shared reason mappings.
T060-T064 can be run independently after implementation, but close-out should cite the complete targeted set.

Explicit Non-Goals

Do not add new persisted entities/tables/artifacts without updating spec and plan first.
Do not add new routes, navigation entries, Filament actions, modals, drawers, wizards, panel providers, or assets.
Do not add operator resolution UI.
Do not change final Evidence Snapshot readiness, Review Pack readiness, or customer-facing report/review wording.
Do not add historical payload mappers, OperationRun context compatibility readers, or old reason aliases.
Do not create a generic workflow engine, report engine, provider framework, badge framework, or evidence readiness framework.
