ahmido ea77c8c718 feat(baselines): implement baseline compare result semantics (#454)

Implemented deterministic Baseline Result Semantics (Spec 383), introducing CompareSubjectResult and CompareEvidenceResult. Replaced generic arrays with strict Data Transfer Objects for Baseline engine output.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #454

2026-06-16 20:20:27 +00:00


Implementation Plan: Spec 383 - Baseline Compare Result Semantics and Gap Classification v1

Branch: 383-baseline-result-semantics | Date: 2026-06-16 | Spec: spec.md
Input: Feature specification from /specs/383-baseline-result-semantics/spec.md

Summary

Replace overloaded baseline compare result/gap semantics with a provider-neutral outcome model over existing Spec 382 matching and compare strategy output. The plan adds a narrow classifier/mapper, rewrites legacy authoritative reason strings, stores structured subject outcome proof in existing OperationRun/compare payloads, and updates existing status/detail grouping. Out of scope: resolution UI, Evidence/Review final readiness, customer-facing Review Pack wording, report/PDF runtime work, and compatibility readers.

Technical Context

Language/Version: PHP 8.4.15
Primary Dependencies: Laravel 12.52, Filament 5.2.1, Livewire 4.1.4, Pest 4.3.1, PostgreSQL 16 through Sail/Dokploy
Storage: Existing OperationRun context/result payloads and existing compare structures only. No new persisted entity/table/artifact is approved.
Testing: Pest unit and feature tests; Filament/Livewire feature tests only for existing status rendering touched by the new grouping. Browser lane only if implementation changes layout/navigation/action behavior.
Validation Lanes: fast-feedback, confidence; conditional pgsql/browser if implementation triggers those scopes.
Target Platform: Laravel monolith in apps/platform.
Project Type: Web admin application, runtime/result-semantics change with limited existing-surface status presentation impact.
Performance Goals: Deterministic in-process classification over existing matching/compare results; no new remote work and no UI-render Graph/provider calls.
Constraints: Provider-neutral top-level semantics, no legacy result compatibility, no new UI workflow, no final evidence/review readiness mapping, no OperationRun lifecycle transition outside OperationRunService.
Scale/Scope: Existing baseline compare workflow and existing OperationRun/evidence-gap consumers.

Existing Repository Surfaces Likely Affected

apps/platform/app/Support/Baselines/Matching/MatchingOutcome.php
apps/platform/app/Services/Baselines/Matching/SubjectMatchingPipeline.php
apps/platform/app/Services/Baselines/Matching/FoundationCoverageResolver.php
apps/platform/app/Support/Baselines/BaselineCompareReasonCode.php
apps/platform/app/Support/Baselines/BaselineCompareEvidenceGapDetails.php
apps/platform/app/Support/Baselines/SubjectResolver.php
apps/platform/app/Support/Baselines/ResolutionOutcome.php
apps/platform/app/Support/Baselines/ResolutionOutcomeRecord.php
apps/platform/app/Support/Baselines/Compare/CompareState.php
apps/platform/app/Support/Baselines/Compare/CompareSubjectResult.php
apps/platform/app/Support/Baselines/Compare/IntuneCompareStrategy.php
apps/platform/app/Jobs/CompareBaselineToTenantJob.php
apps/platform/app/Support/OpsUx/OperationSummaryKeys.php only if new summary count keys are required
apps/platform/app/Services/Evidence/Sources/BaselineDriftPostureSource.php only for regression-safe consumption of new run summary truth
Existing baseline compare and OperationRun detail presentation tests under apps/platform/tests/Feature/Filament/
Existing baseline compare, evidence, and review-pack regression tests under apps/platform/tests/Feature/

Likely new focused support namespace if implementation keeps the plan shape:

apps/platform/app/Support/Baselines/CompareSemantics/
├── BaselineCompareOutcomeClassifier.php
├── BaselineCompareRunSummaryClassifier.php
├── CompareResultActionability.php
├── CompareResultCategory.php
├── CompareResultCoverageStatus.php
├── CompareResultIdentityStatus.php
├── CompareResultReadinessImpact.php
├── CompareResultReason.php
├── CompareResultTrustLevel.php
└── CompareSubjectOutcome.php

If implementation can satisfy the spec by extending existing classes with less structure, prefer the narrower shape and update this plan before adding broader abstractions.
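
As a rough illustration of the value family above, the sketch below shows one possible shape for a category enum, a trust-level enum, and an immutable subject outcome. The category cases follow the handling modes named in this plan (limitations, unsupported, missing evidence, missing provider, blockers, drift, no drift, excluded, failed). The trust-level values, the property names, and the string backing values are assumptions made for illustration, not the implemented contract.

```php
<?php

declare(strict_types=1);

// Hypothetical sketch: case names follow the handling modes listed in this
// plan, not the real CompareResultCategory implementation.
enum CompareResultCategory: string
{
    case NoDrift = 'no_drift';
    case Drift = 'drift';
    case Limitation = 'limitation';
    case Unsupported = 'unsupported';
    case MissingEvidence = 'missing_evidence';
    case MissingProvider = 'missing_provider';
    case Blocker = 'blocker';
    case Excluded = 'excluded';
    case Failed = 'failed';
}

// Assumed trust levels; the plan names the dimension but not its values.
enum CompareResultTrustLevel: string
{
    case Trusted = 'trusted';
    case Limited = 'limited';
    case Untrusted = 'untrusted';
}

// Immutable per-subject outcome. Provider details stay in $proof as
// diagnostics and never become a top-level category.
final readonly class CompareSubjectOutcome
{
    /** @param array<string, scalar|null> $proof */
    public function __construct(
        public string $subjectKey,
        public CompareResultCategory $category,
        public CompareResultTrustLevel $trustLevel,
        public array $proof = [],
    ) {
    }

    /** @return array<string, mixed> */
    public function toArray(): array
    {
        return [
            'subject_key' => $this->subjectKey,
            'category' => $this->category->value,
            'trust_level' => $this->trustLevel->value,
            'proof' => $this->proof,
        ];
    }
}

$outcome = new CompareSubjectOutcome(
    subjectKey: 'example-subject',
    category: CompareResultCategory::MissingProvider,
    trustLevel: CompareResultTrustLevel::Limited,
    proof: ['provider_key' => 'example-provider'],
);
```

Backed enums keep the stored payload values explicit, which fits the goal of replacing free-form reason strings. Whether the remaining dimensions (actionability, readiness impact, coverage, identity) each need their own enum is left to implementation.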

UI / Surface Guardrail Plan

Guardrail scope: existing status/evidence/detail presentation changes only.
Affected routes/pages/actions/states/navigation/panel/provider surfaces: existing baseline compare and OperationRun detail contexts that render evidence gaps/status groups. No new route, navigation entry, action, modal, drawer, wizard, form, or panel provider.
No-impact class, if applicable: N/A.
Native vs custom classification summary: existing native/shared Filament/Livewire surfaces; no local design system.
Shared-family relevance: status messaging, evidence-gap detail, badge/status labels.
State layers in scope: backend payload state and existing detail/list grouping.
Audience modes in scope: operator-MSP and support-platform. Customer/read-only output is out of scope until Spec 385.
Decision/diagnostic/raw hierarchy plan: default-visible group/category/actionability/readiness first; matching proof/provider identifiers remain diagnostics/support detail.
Raw/support gating plan: no new raw payload exposure. Keep existing diagnostics/support gating.
One-primary-action / duplicate-truth control: no new actions. Use one canonical reason/category/actionability set for all rendered labels.
Handling modes by drift class or surface: limitations, unsupported, missing evidence, missing provider, blockers, drift, no drift, excluded, and failed map to distinct groups.
Repository-signal treatment: if implementation changes only labels/groups on existing surfaces, document in feature close-out. If route/layout/action hierarchy changes, update UI coverage artifacts before merge.
Special surface test profiles: standard-native-filament relief for label/group changes; browser smoke only if layout/navigation/action behavior changes.
Required tests or manual smoke: feature tests for existing Filament/Livewire status rendering when touched; no browser smoke by default.
Exception path and spread control: none planned.
Active feature PR close-out entry: Baseline Compare Result Semantics / Gap Classification.
UI/Productization coverage decision: existing surface, no new route/page/archetype.
Coverage artifacts to update: none during preparation. Implementation must update docs/ui-ux-enterprise-audit/ only if actual rendered structure or route/archetype changes.
No-impact rationale: N/A, because existing status presentation may change.
Navigation / Filament provider-panel handling: unchanged; Laravel 12 panel providers remain in apps/platform/bootstrap/providers.php.
Screenshot or page-report need: no, unless implementation changes layout/navigation or customer-facing output.

Shared Pattern & System Fit

Cross-cutting feature marker: yes.
Systems touched: baseline matching, compare strategies, OperationRun proof context, evidence-gap rendering, support diagnostics where they consume baseline compare context.
Shared abstractions reused: Spec 382 MatchingOutcome and SubjectMatchingPipeline, existing compare strategy result objects, OperationRunService, OperationSummaryKeys, existing Filament/Livewire surfaces and badge/status helpers.
New abstraction introduced? why?: yes, a narrow result semantics classifier/mapper is expected. It replaces overloaded result truth and gives future Specs 384/385 a stable input.
Why the existing abstraction was sufficient or insufficient: Existing matching and compare abstractions identify subjects and payload differences, but they still express final result truth through old policy-shaped strings. Existing UI helpers can render mapped truth once the domain semantics are explicit.
Bounded deviation / spread control: The classifier is baseline-compare-owned. It must not become a workflow engine, broad evidence readiness engine, customer report wording engine, or generic provider framework.

OperationRun UX Impact

Touches OperationRun start/completion/link UX?: no.
Central contract reused: existing baseline compare operation lifecycle and Monitoring detail route/link behavior.
Delegated UX behaviors: N/A.
Surface-owned behavior kept local: N/A.
Queued DB-notification policy: N/A.
Terminal notification path: existing lifecycle only.
Exception path: none.

Implementation changes baseline compare OperationRun context/proof and summary semantics. It must keep OperationRun.status and OperationRun.outcome transitions inside OperationRunService, and any new summary count keys must be added to OperationSummaryKeys::all() with tests.
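
A minimal sketch of the summary-key rule, assuming the allowlist is exposed as a flat list of strings. ExampleSummaryKeys is a stand-in for OperationSummaryKeys, and its key names plus the unregisteredSummaryKeys() helper are hypothetical. A test could assert that the helper returns an empty list for every summary the compare job emits.

```php
<?php

declare(strict_types=1);

// Stand-in for OperationSummaryKeys; the key names here are invented for
// illustration and are not the project's real allowlist.
final class ExampleSummaryKeys
{
    /** @return list<string> */
    public static function all(): array
    {
        return ['total', 'succeeded', 'failed', 'skipped'];
    }
}

/**
 * Returns the summary keys that are not registered in the allowlist.
 *
 * @param array<string, int> $summaryCounts
 * @return list<string>
 */
function unregisteredSummaryKeys(array $summaryCounts): array
{
    return array_values(
        array_diff(array_keys($summaryCounts), ExampleSummaryKeys::all())
    );
}

// 'limited' is not registered in the stand-in allowlist, so it is reported.
$unknown = unregisteredSummaryKeys(['total' => 12, 'failed' => 1, 'limited' => 3]);
```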

Provider Boundary & Portability Fit

Shared provider/platform boundary touched?: yes.
Provider-owned seams: provider metadata/proof fields that feed Spec 382 matching and compare strategies.
Platform-core seams: result dimensions, result reasons, categories, actionability, readiness impact, trust level, OperationRun proof payload contract, and operator-facing result vocabulary.
Neutral platform terms / contracts preserved: provider resource, governed subject, identity, binding, canonicalization, comparison, coverage, limitation, drift, evidence, actionability, readiness impact, trust level.
Retained provider-specific semantics and why: provider key/type/id/discriminator remain proof metadata. They are not top-level result categories.
Bounded extraction or follow-up path: document-in-feature for any contained provider-specific proof metadata; follow-up-spec for resolution UI or evidence/review readiness integration.

Constitution Check

Inventory-first: result semantics consume last-observed inventory, snapshots, policy versions, Spec 382 descriptors, and existing compare strategy output. Microsoft remains external truth.
Read/write separation: V1 adds no write action. Existing compare operation remains queued/observable.
Graph contract path: no new Graph calls. No Graph/provider runtime call during UI render or classification.
Deterministic capabilities: no new capability family planned.
RBAC-UX: existing workspace/managed-environment access checks remain required before baseline compare results are visible.
Workspace isolation: OperationRun and baseline/evidence reads remain workspace scoped.
Tenant isolation: managed-environment scoped compare result proof must not leak across environments.
Run observability: existing baseline compare OperationRun remains canonical execution truth.
OperationRun start UX: unchanged.
Ops-UX lifecycle: no direct status/outcome transitions may be added.
Ops-UX summary counts: new keys require OperationSummaryKeys::all() update and tests; otherwise reuse existing keys.
Data minimization: structured proof must be sanitized and exclude secrets/raw provider payloads.
Test governance: unit and feature lanes are narrowest; browser/pgsql conditional only.
Proportionality: new semantic family is justified because old reason strings are product truth and block future resolution/readiness work.
No premature abstraction: only baseline compare semantics, not a generic workflow/evidence/report framework.
Persisted truth: no new table/entity approved; structured payloads use existing OperationRun context/result paths.
Behavioral state: every new value must change actionability, readiness, aggregation, trust, or operator interpretation.
UI semantics: direct domain-to-existing-surface mapping; no new UI taxonomy framework.
Shared pattern first: existing OperationRun and Filament/Livewire rendering paths are reused.
Provider boundary: top-level compare semantics are provider-neutral.
V1 explicitness / few layers: replace old strings rather than stack compatibility aliases.
Spec discipline / bloat check: result semantics grouped in one coherent spec; resolution UI and evidence/review readiness remain follow-ups.
Filament-native UI: no new Filament surface/action/layout. Existing native/shared surfaces only.
UI/Productization coverage: existing status presentation changes are documented; no new route/page/archetype.
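
The data minimization item above could be enforced with an allowlist over proof fields. The sketch below assumes proof metadata is a flat array. The field names are derived from the provider key/type/id/discriminator wording in this plan and are otherwise hypothetical, as is the sanitizeCompareProof() helper.

```php
<?php

declare(strict_types=1);

/**
 * Keeps only allowlisted scalar proof fields so raw provider payloads and
 * secrets never reach the stored compare context.
 *
 * @param array<string, mixed> $proof
 * @return array<string, scalar|null>
 */
function sanitizeCompareProof(array $proof): array
{
    // Assumed field names, based on the provider key/type/id/discriminator
    // proof metadata described in this plan.
    $allowed = ['provider_key', 'provider_type', 'provider_id', 'discriminator'];

    $clean = [];
    foreach ($allowed as $field) {
        if (!array_key_exists($field, $proof)) {
            continue;
        }

        $value = $proof[$field];

        // Nested arrays/objects are dropped: they may carry raw payloads.
        if ($value === null || is_scalar($value)) {
            $clean[$field] = $value;
        }
    }

    return $clean;
}

$clean = sanitizeCompareProof([
    'provider_key' => 'example-provider',
    'provider_id' => 'abc-123',
    'access_token' => 'secret-value',
    'raw_payload' => ['displayName' => 'Example'],
]);
```

An allowlist fails closed: a new upstream field stays out of the stored proof until someone adds it deliberately.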

Test Governance Check

Test purpose / classification by changed surface: Unit for semantic values/classifiers; Feature for compare integration, OperationRun payloads, existing status presentation, evidence/review regressions.
Affected validation lanes: fast-feedback, confidence; pgsql/browser conditional.
Why this lane mix is the narrowest sufficient proof: The behavior is deterministic classification and existing DB-backed compare result context. UI browser proof is not needed unless layout/navigation changes.
Narrowest proving command(s):
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines/CompareSemantics
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareGapClassificationTest.php tests/Feature/Baselines/BaselineCompareAmbiguousMatchGapTest.php tests/Feature/Baselines/BaselineCompareProviderResourceBindingCanonicalIdentityTest.php tests/Feature/Baselines/BaselineCompareExecutionGuardTest.php tests/Feature/Baselines/BaselineCompareResumeTokenTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php tests/Feature/Filament/BaselineCompareEvidenceGapTableTest.php
- cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Evidence/BaselineDriftPostureSourceTest.php tests/Feature/ReviewPack/Spec347ReviewPackReadinessSemanticsTest.php tests/Feature/ReviewPack/Spec349ReviewPackResolutionGuidanceTest.php
Fixture / helper / factory / seed / context cost risks: reuse existing baseline compare and Spec 382 fixtures. No global provider/workspace defaults.
Expensive defaults or shared helper growth introduced?: no.
Heavy-family additions, promotions, or visibility changes: none planned.
Surface-class relief / special coverage rule: standard-native-filament relief unless UI structure changes.
Closing validation and reviewer handoff: reviewers verify no legacy reason compatibility, no provider-specific top-level semantics, no false no-drift, no Spec 384/385 scope, and no hidden browser/pgsql lane change.
Budget / baseline / trend follow-up: none expected.
Review-stop questions: taxonomy bloat, old string leftovers, summary count key ownership, fixture cost, and scope bleed into evidence/review/customer output.
Escalation path: document-in-feature for contained existing-surface label changes; follow-up-spec for structural UI or evidence/readiness integration.
Active feature PR close-out entry: Baseline Compare Result Semantics / Gap Classification.
Why no dedicated follow-up spec is needed: This spec is the dedicated follow-up to Spec 382. Specs 384 and 385 remain separate for UI decisions and readiness integration.

Project Structure

Documentation (this feature)

specs/383-baseline-result-semantics/
├── checklists/
│   └── requirements.md
├── plan.md
├── spec.md
└── tasks.md

Source Code (repository root)

apps/platform/app/
├── Jobs/
│   └── CompareBaselineToTenantJob.php
├── Services/
│   ├── Baselines/
│   │   └── Matching/
│   └── Evidence/
│       └── Sources/
└── Support/
    ├── Baselines/
    │   ├── Compare/
    │   ├── CompareSemantics/          # expected new narrow support namespace
    │   ├── Matching/
    │   ├── BaselineCompareEvidenceGapDetails.php
    │   ├── BaselineCompareReasonCode.php
    │   ├── ResolutionOutcome.php
    │   └── SubjectResolver.php
    └── OpsUx/
        └── OperationSummaryKeys.php

apps/platform/tests/
├── Unit/Support/Baselines/CompareSemantics/
├── Unit/Support/Baselines/Matching/
├── Feature/Baselines/
├── Feature/Filament/
├── Feature/Evidence/
└── Feature/ReviewPack/

Structure Decision: Use the existing Laravel monolith under apps/platform. Keep semantics code baseline-compare-owned. Do not create a new package, module root, route family, UI framework, or persistence layer.

Complexity Tracking

Violation	Why Needed	Simpler Alternative Rejected Because
New result reason/category/actionability/readiness family	Current reason strings mix identity, evidence, provider absence, limitations, unsupported scope, and failures	Renaming labels would preserve ambiguous product truth and leave future Specs 384/385 unsafe
New classifier/mapper	Spec 382 matching and existing compare strategies need one canonical mapping into final result semantics	Scattering mappings in `CompareBaselineToTenantJob`, `SubjectResolver`, and UI helpers would create duplicate truth
Structured OperationRun proof payload	Monitoring/support/evidence consumers need machine-readable result truth	Keeping flat `by_reason` strings forces every consumer to decode overloaded legacy labels

Proportionality Review

Current operator problem: Operators cannot tell which compare outcomes are trusted, blocked, missing evidence, missing provider resource, unsupported, limited, excluded, or failed.
Existing structure is insufficient because: Current runtime still uses old strings in MatchingOutcome, SubjectResolver, compare strategy diagnostics, OperationRun context, and tests.
Narrowest correct implementation: One baseline compare semantics layer plus mapped structured payloads over existing matching/compare outputs.
Ownership cost created: New value families and mapping tests; reviewer vigilance against compatibility aliases and UI/evidence/report scope creep.
Alternative intentionally rejected: Keep old strings and add display labels. That would not remove false green/false red risk and would leave downstream readiness work ambiguous.
Release truth: Current-release truth required after Spec 382.

Domain And Data Model Implications

MatchingOutcome remains upstream matching truth, but its reason codes must map to final compare semantics.
CompareSubjectResult remains compare strategy output, but strategy gap reasons must map to final compare semantics.
BaselineCompareReasonCode may be replaced, narrowed, or kept only as run-level summary codes if it no longer carries overloaded subject-level truth.
ResolutionOutcome and SubjectResolver must not remain authoritative for new compare result semantics if their old values are policy-shaped.
OperationRun baseline compare context may preserve the current rendering envelope only where existing surfaces still read it, but this is not legacy semantic compatibility: authoritative result truth must be structured under a new semantic payload path, and old reason aliases/readers remain prohibited.
Existing local/dev rows need no compatibility reader. If implementation needs to purge/reset old local/dev payloads, document the operational step in close-out.
No new table, migration, index, queue, scheduler, env var, or storage path is expected. If implementation needs any of these, update spec and plan before continuing.
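
To make the "new semantic payload path" concrete, here is a sketch that writes structured subject outcomes under a dedicated key and leaves the rest of the run context untouched. The compare_semantics key, the version field, the inner shape, and the example reason strings are all assumptions for illustration; the real payload contract is defined by the spec and implementation.

```php
<?php

declare(strict_types=1);

/**
 * Writes structured subject outcomes under a dedicated semantic path while
 * leaving the rest of the run context untouched.
 *
 * The 'compare_semantics' key and its inner shape are assumptions made for
 * this sketch, not the project's actual payload contract.
 *
 * @param array<string, mixed> $context
 * @param list<array{subject_key: string, category: string, reason: string}> $subjects
 * @return array<string, mixed>
 */
function withCompareSemantics(array $context, array $subjects): array
{
    $byCategory = [];
    foreach ($subjects as $subject) {
        $category = $subject['category'];
        $byCategory[$category] = ($byCategory[$category] ?? 0) + 1;
    }
    ksort($byCategory); // stable key order keeps the payload deterministic

    $context['compare_semantics'] = [
        'version' => 1,
        'subjects' => $subjects,
        'counts' => ['by_category' => $byCategory],
    ];

    return $context;
}

$context = withCompareSemantics(
    ['baseline_profile_id' => 42],
    [
        ['subject_key' => 's-1', 'category' => 'missing_provider', 'reason' => 'provider_resource_missing'],
        ['subject_key' => 's-2', 'category' => 'drift', 'reason' => 'payload_differs'],
        ['subject_key' => 's-3', 'category' => 'drift', 'reason' => 'payload_differs'],
    ],
);
```

Consumers then read category counts directly instead of decoding flat reason strings.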

Implementation Phases

Confirm completed dependency guardrails for Specs 381 and 382, and confirm no changes to completed spec history.
Add unit tests for result reasons, categories, actionability, readiness impact, trust, clean-success rules, and run summary classification.
Add feature tests for baseline compare gap payloads, missing provider vs missing local evidence, foundation limitation mapping, active binding/matching outcome mapping, and old reason removal.
Add or update the narrow result semantics value family and classifier.
Map Spec 382 MatchingOutcome to final compare subject outcomes.
Map compare strategy states and diagnostics to final compare subject outcomes.
Update CompareBaselineToTenantJob to aggregate structured subject outcomes, gap subjects, category counts, actionability counts, readiness counts, and run summary decisions.
Update existing evidence-gap/detail/status label helpers and Filament/Livewire feature tests if rendered groups change.
Run evidence/review regression tests to prove no final readiness/customer output mapping is introduced.
Run targeted tests, Pint, and diff check; record close-out with Filament/Livewire/deploy impact.
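
The mapping and aggregation phases above can be sketched as a pure classifier plus a run-level summary. Everything named here is hypothetical: the OutcomeCategory enum is a reduced stand-in, the upstream reason strings are invented, and the clean-success rule (a run is clean only when it has subjects and all of them are no drift or excluded) is an assumption about how "no false no-drift" might be enforced.

```php
<?php

declare(strict_types=1);

// Reduced, hypothetical category set for this sketch.
enum OutcomeCategory: string
{
    case NoDrift = 'no_drift';
    case Drift = 'drift';
    case MissingEvidence = 'missing_evidence';
    case MissingProvider = 'missing_provider';
    case Excluded = 'excluded';
    case Failed = 'failed';
}

/**
 * Maps an upstream reason string to a final category. The reason strings are
 * invented for this sketch; unknown reasons fail closed instead of passing
 * as "no drift".
 */
function classifyReason(string $reason): OutcomeCategory
{
    return match ($reason) {
        'matched_equal' => OutcomeCategory::NoDrift,
        'matched_different' => OutcomeCategory::Drift,
        'local_evidence_missing' => OutcomeCategory::MissingEvidence,
        'provider_resource_missing' => OutcomeCategory::MissingProvider,
        'out_of_scope' => OutcomeCategory::Excluded,
        default => OutcomeCategory::Failed,
    };
}

/**
 * Aggregates per-subject categories into run-level counts and a clean flag.
 *
 * @param list<string> $reasons
 * @return array{counts: array<string, int>, clean: bool}
 */
function summarizeRun(array $reasons): array
{
    $counts = [];
    foreach ($reasons as $reason) {
        $key = classifyReason($reason)->value;
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts);

    // Assumption: a run is clean only if it has subjects and every one of
    // them is either "no drift" or "excluded".
    $nonClean = array_diff_key($counts, ['no_drift' => true, 'excluded' => true]);

    return ['counts' => $counts, 'clean' => $counts !== [] && $nonClean === []];
}

$summary = summarizeRun(['matched_equal', 'matched_equal', 'provider_resource_missing']);
```

Keeping the classifier free of I/O matches the deterministic, in-process goal stated in Technical Context and makes the unit tests in the earlier phases cheap to write.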

Filament v5 Output Contract For Later Implementation Report

Livewire v4.0+ compliance: unchanged unless implementation unexpectedly touches Livewire. Project currently uses Livewire 4.1.4.
Provider registration location: unchanged. Laravel 12 panel providers remain in apps/platform/bootstrap/providers.php.
Global search: no resource is added or changed; no global search behavior is planned.
Destructive/high-impact actions: no Filament action is added and no destructive action is introduced. Existing compare start behavior keeps existing authorization/OperationRun rules.
Asset strategy: no Filament assets are registered; no Spec 383-specific filament:assets deployment concern beyond normal release process.
Testing plan: unit/feature tests cover semantics, compare integration, OperationRun payloads, existing status rendering, and evidence/review regressions. No browser test unless UI layout/navigation/action behavior changes.

Rollout And Deployment Considerations

No environment variables, queue names, scheduler entries, storage volumes, reverse proxy changes, route changes, panel provider changes, or asset build changes are expected.
No schema migration is expected. Because TenantPilot is pre-production, old local/dev compare payloads may be invalidated/reset instead of read through a compatibility mapper.
Staging validation should run targeted compare/semantics/evidence/review tests and normal formatting checks before production promotion.
Rollback is code rollback plus clearing/regenerating local/dev compare OperationRun payloads if necessary; no persisted compatibility layer is planned.
