ahmido 02e75e1cda feat: harden baseline compare summary trust surfaces (#196)

## Summary
- add a shared baseline compare summary assessment and assessor for compact trust propagation
- harden dashboard, landing, and banner baseline compare surfaces against false all-clear claims
- add focused Pest coverage for dashboard, landing, banner, reason translation, and canonical detail parity

## Validation
- vendor/bin/sail bin pint --dirty --format agent
- vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareSummaryAssessmentTest.php tests/Feature/Baselines/BaselineCompareExplanationFallbackTest.php tests/Feature/Filament/BaselineCompareNowWidgetTest.php tests/Feature/Filament/NeedsAttentionWidgetTest.php tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php tests/Feature/Filament/BaselineCompareCoverageBannerTest.php tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php tests/Feature/Filament/OperationRunBaselineTruthSurfaceTest.php tests/Feature/ReasonTranslation/ReasonTranslationExplanationTest.php

## Notes
- Livewire compliance: Filament v5 / Livewire v4 stack unchanged
- Provider registration: unchanged, Laravel 12 providers remain in bootstrap/providers.php
- Global search: no searchable resource behavior changed
- Destructive actions: none introduced by this change
- Assets: no new assets registered; existing deploy process remains unchanged

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #196

2026-03-27 00:19:53 +00:00


Implementation Plan: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening

Branch: 165-baseline-summary-trust | Date: 2026-03-26 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md

Summary

Harden all in-scope baseline and drift summary surfaces so that no compact widget, KPI-adjacent summary, banner, or landing headline can imply Compliant, No drift, or an equivalent all-clear unless the underlying compare result is genuinely trustworthy. The implementation will introduce a shared baseline summary-state contract derived from the existing baseline compare truth and explanation layers, replace findings-count shortcuts on the tenant dashboard, propagate evidence-gap and coverage limitations into summary claims, keep the landing surface and run drilldown semantically aligned, and lock the behavior down with focused Pest and Livewire coverage.

Key approach: reuse the current baseline domain seams already present in BaselineCompareStats, BaselineCompareExplanationRegistry, reason translation, and badge semantics; add one reusable summary assessment layer for compact surfaces; preserve the existing Compare now action, routes, and DB-only render behavior; and avoid any model, enum, or schema changes.

Technical Context

Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing BaselineCompareStats, BaselineCompareExplanationRegistry, ReasonPresenter, BadgeCatalog or BadgeRenderer, UiEnforcement, and OperationRunLinks
Storage: PostgreSQL with existing baseline, findings, and operation_runs tables plus JSONB-backed compare context; no schema change planned
Testing: Pest feature tests, Livewire component tests, dashboard DB-only render regression, all executed through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep dashboard and landing renders DB-only, preserve existing lazy-widget behavior, avoid new outbound HTTP or background dispatch during render, and keep summary claims understandable within one short scan of each surface
Constraints: No new database tables, no new outcome enums, no compare-engine rewrite, no route or RBAC drift, no new global assets, no dashboard summary more optimistic than landing or run detail, and no new ad hoc status-color language
Scale/Scope: Four primary summary surface families, one shared support-layer contract, one tenant landing page, one canonical drilldown alignment path, and focused regression coverage across trustworthy, limited, failed, missing, and evidence-gap-affected compare scenarios

Constitution Check

GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.

Principle	Status	Notes
Inventory-first	Pass	No inventory, snapshot, or evidence ownership semantics change; the work is presentation hardening only
Read/write separation	Pass	No new mutation path is introduced; the feature remains read-only except for the already-existing guarded `Compare now` action
Graph contract path	Pass	No new Graph call path, contract-registry entry, or render-time network access is introduced
Deterministic capabilities	Pass	No new capability derivation; existing capability and tenant-view checks remain authoritative
RBAC-UX planes and 404 vs 403	Pass	All covered surfaces stay in the tenant/admin plane except the existing canonical run drilldown, which keeps current tenant-safe access rules
Workspace isolation	Pass	No workspace-context broadening; tenant summary surfaces still require an established workspace context
Tenant isolation	Pass	Covered surfaces remain tenant-scoped, and canonical run drilldown remains entitlement-checked before revealing tenant-linked evidence
Destructive confirmation	Pass	No new destructive action; existing `Compare now` already uses confirmation and capability gating
Global search safety	Pass	No global-search behavior or searchable resource configuration changes are part of this feature
Run observability	Pass	Existing baseline compare `OperationRun` behavior remains unchanged; the feature only reads and interprets current run evidence
Ops-UX 3-surface feedback	Pass	No new toasts, progress surfaces, or terminal notifications are introduced
Ops-UX lifecycle ownership	Pass	`OperationRun.status` and `OperationRun.outcome` remain service-owned and untouched by this feature
Ops-UX summary counts	Pass	Existing `summary_counts` rules stay unchanged; the feature consumes result meaning rather than redefining counts
Ops-UX guards	Pass	Existing lifecycle guards remain intact; new tests will focus on summary truth and cross-surface consistency
Data minimization	Pass	No new secrets, raw Graph payloads, or low-level diagnostics are elevated into default-visible summaries
Badge semantics (BADGE-001)	Pass	Status tones and badges must continue to come from central badge or shared primitive semantics rather than page-local green or warning shortcuts
Filament-native UI (UI-FIL-001)	Pass	Widgets, banners, and landing summaries continue to use Filament sections, badges, links, and shared primitives rather than bespoke status components
UI naming (UI-NAMING-001)	Pass	Operator-facing copy remains domain-first and must avoid false-calming phrases when the result is not decision-grade
Operator surfaces (OPSURF-001)	Pass	The feature explicitly strengthens operator-first meaning by carrying governance result, evidence completeness, and next step into compact surfaces
Filament Action Surface Contract	Pass	Existing action inventory stays stable; only summary semantics and wording change
Filament UX-001	Pass with documented variance	The landing page remains a custom enterprise layout rather than a stock infolist, but it still honors sectioning, centralized badges, and operator-first hierarchy
Filament v5 / Livewire v4 compliance	Pass	All work stays within the current Filament v5 and Livewire v4 stack
Provider registration location	Pass	No panel or provider changes are required; Laravel 11+ provider registration remains in `bootstrap/providers.php`
Global-search hard rule	Pass	No globally searchable resource is added or modified; no Edit/View-page requirement changes are triggered
Asset strategy	Pass	No new Filament assets are planned; deployment expectations for `php artisan filament:assets` remain unchanged because no asset registration changes are introduced

Phase 0 Research

Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/research.md.

Key decisions:

Use one shared summary assessment derived from BaselineCompareStats and operatorExplanation() instead of findings-count-only widget logic.
Treat the current BaselineCompareNow success pill and the NeedsAttention healthy state as the highest-risk false-calm surfaces and harden them first.
Propagate evidence gaps into summary semantics even when uncovered-types coverage warnings are absent.
Keep KPI cards quantitative only; they may link to deeper surfaces but must not become semantic all-clear claims.
Extend existing landing, widget, and baseline-truth tests rather than creating a separate UI harness.

Phase 1 Design

Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/:

data-model.md: derived summary-state entities, fields, and state-family rules
contracts/baseline-summary-surface.openapi.yaml: internal surface-contract schema for compact baseline compare claims across dashboard, landing, banner, and run drilldown
quickstart.md: focused verification workflow for manual and automated validation

Design decisions:

No schema migration is required; the design uses existing baseline compare stats, reason translation, operator explanation, findings, and operation-run evidence.
The primary implementation seam is a new shared support-layer summary assessment in app/Support/Baselines, consumed by dashboard widgets, the landing page, and any summary-adjacent banner or headline.
The existing BaselineCompareStats::forWidget() shortcut is too lossy for trust propagation, so covered summary surfaces must consume either the richer tenant stats or a derived contract built from them.
BaselineCompareNow and NeedsAttention must stop deriving healthy or compliant claims from zero findings alone.
The coverage banner must consider evidence gaps as summary-limiting signals, not only uncovered policy types and missing snapshots.
Canonical run detail remains the deepest truth surface and becomes the semantic ceiling: compact surfaces may be equally cautious or more cautious, never more optimistic.
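
The shared assessment described in these decisions might take roughly the following shape. This is a minimal sketch in plain PHP, not the actual implementation: `SummaryStateFamily`, `SummaryAssessment`, `assessSummary()`, the input array keys, the headline wording, and the tone strings are all hypothetical. In the real code the tone would come from the central badge semantics rather than literals. Failed runs are left out to keep the sketch short.

```php
<?php

declare(strict_types=1);

// Hypothetical state families for compact surfaces (names are assumptions).
enum SummaryStateFamily: string
{
    case Positive = 'positive';
    case Caution = 'caution';
    case ReviewRequired = 'review_required';
    case Stale = 'stale';
    case InProgress = 'in_progress';
    case Unavailable = 'unavailable';
}

// Hypothetical read-only result that a widget, banner, or landing view could render.
final class SummaryAssessment
{
    public function __construct(
        public readonly SummaryStateFamily $family,
        public readonly string $headline,
        public readonly string $tone,
        public readonly string $nextStep,
    ) {}
}

/**
 * Map already-loaded compare facts to one assessment. The array keys are
 * assumptions standing in for whatever the real stats object exposes.
 *
 * @param array{running: bool, hasResult: bool, stale: bool, findings: int, trustworthy: bool} $facts
 */
function assessSummary(array $facts): SummaryAssessment
{
    // Arms are checked top to bottom; the first true condition wins.
    [$family, $headline, $tone, $nextStep] = match (true) {
        $facts['running'] => [SummaryStateFamily::InProgress, 'Compare in progress', 'info', 'View run'],
        ! $facts['hasResult'] => [SummaryStateFamily::Unavailable, 'No compare result yet', 'gray', 'Compare now'],
        $facts['stale'] => [SummaryStateFamily::Stale, 'Last compare is out of date', 'warning', 'Compare now'],
        $facts['findings'] > 0 => [SummaryStateFamily::ReviewRequired, 'Open drift findings need review', 'danger', 'Open findings'],
        ! $facts['trustworthy'] => [SummaryStateFamily::Caution, 'No findings, but evidence is limited', 'warning', 'View run'],
        default => [SummaryStateFamily::Positive, 'No drift detected', 'success', 'View run'],
    };

    return new SummaryAssessment($family, $headline, $tone, $nextStep);
}
```

In this sketch, zero findings only reaches the positive arm after every limiting condition has been ruled out, which is the ordering the design decisions above call for.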

Project Structure

Documentation (this feature)

specs/165-baseline-summary-trust/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── baseline-summary-surface.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md

Source Code (repository root)

app/
├── Filament/
│   ├── Pages/
│   │   └── BaselineCompareLanding.php
│   └── Widgets/
│       ├── Dashboard/
│       │   ├── BaselineCompareNow.php
│       │   ├── DashboardKpis.php
│       │   └── NeedsAttention.php
│       └── Tenant/
│           └── BaselineCompareCoverageBanner.php
├── Support/
│   ├── Baselines/
│   │   ├── BaselineCompareStats.php
│   │   ├── BaselineCompareExplanationRegistry.php
│   │   ├── BaselineCompareEvidenceGapDetails.php
│   │   └── BaselineCompareReasonCode.php
│   ├── Badges/
│   │   ├── BadgeCatalog.php
│   │   └── BadgeRenderer.php
│   └── ReasonTranslation/
│       └── ReasonTranslator.php

resources/
└── views/
    └── filament/
        ├── pages/
        │   └── baseline-compare-landing.blade.php
        └── widgets/
            ├── dashboard/
            │   ├── baseline-compare-now.blade.php
            │   └── needs-attention.blade.php
            └── tenant/
                └── baseline-compare-coverage-banner.blade.php

tests/
├── Feature/
│   ├── Baselines/
│   │   ├── BaselineCompareStatsTest.php
│   │   ├── BaselineCompareSummaryAssessmentTest.php
│   │   ├── BaselineCompareExplanationFallbackTest.php
│   │   └── BaselineCompareWhyNoFindingsReasonCodeTest.php
│   ├── ReasonTranslation/
│   │   └── ReasonTranslationExplanationTest.php
│   └── Filament/
│       ├── BaselineCompareCoverageBannerTest.php
│       ├── BaselineCompareExplanationSurfaceTest.php
│       ├── BaselineCompareLandingWhyNoFindingsTest.php
│       ├── BaselineCompareLandingStartSurfaceTest.php
│       ├── BaselineCompareNowWidgetTest.php
│       ├── BaselineCompareSummaryConsistencyTest.php
│       ├── NeedsAttentionWidgetTest.php
│       ├── OperationRunBaselineTruthSurfaceTest.php
│       └── TenantDashboardDbOnlyTest.php

Structure Decision: Standard Laravel monolith. The feature is confined to the existing support-layer baseline truth objects, a small number of tenant-facing Filament widgets and pages, and focused Pest coverage. No new base directories or architectural layers are required beyond a shared compact-summary support seam inside app/Support/Baselines.

Implementation Strategy

Phase A — Establish One Shared Summary Truth Contract

Goal: Derive one reusable summary-state assessment from existing baseline compare truth and explanation layers so widgets and landing summaries stop improvising their own semantics.

Step	File	Change
A.1	`app/Support/Baselines/BaselineCompareStats.php`	Refactor or extend the compact-summary seam so covered surfaces can consume trustworthiness, evidence completeness, reason semantics, and result availability instead of findings counts only
A.2	`app/Support/Baselines/BaselineCompareExplanationRegistry.php` and an adjacent new support type	Introduce a shared summary assessment or presenter that maps stats plus explanation into summary state family, safe headline, tone, and next step
A.3	`app/Support/Baselines/BaselineCompareReasonCode.php` and reason translation seams if needed	Ensure positive claim eligibility and limited-confidence semantics stay aligned with current explanation-family and trustworthiness rules
A.4	Shared badge or UI support helpers if needed	Keep badge or tone selection centralized and avoid page-local success shortcuts

Phase B — Harden Tenant Dashboard Summary Surfaces

Goal: Remove the most dangerous false-calm claims from the tenant dashboard without breaking lazy loading or DB-only behavior, while keeping stale, failed, missing, in-progress, and unavailable compare states visibly distinct.

Step	File	Change
B.1	`app/Filament/Widgets/Dashboard/BaselineCompareNow.php`	Replace the findings-only widget payload with the shared summary assessment contract
B.2	`resources/views/filament/widgets/dashboard/baseline-compare-now.blade.php`	Replace `No open drift — baseline compliant` with contract-driven positive, cautionary, stale, unavailable, in-progress, or review-oriented states
B.3	`app/Filament/Widgets/Dashboard/NeedsAttention.php`	Feed healthy-check and attention-item generation from the shared summary contract so limited, stale, in-progress, unavailable, or incomplete compare results cannot fall through to `Everything looks healthy right now.`
B.4	`resources/views/filament/widgets/dashboard/needs-attention.blade.php`	Keep the widget compact while showing truthful caution and next-step language when compare evidence is limited
B.5	`app/Filament/Widgets/Dashboard/DashboardKpis.php`	Verify that KPI cards remain quantitative-only and do not imply stronger semantic claims than the shared contract allows

Phase C — Align Landing And Banner Surfaces With The Same Claim Guard

Goal: Ensure the Baseline Compare landing surface and findings-adjacent banner use the same claim-strength rules as the dashboard, including distinct stale, in-progress, and unavailable result handling.

Step	File	Change
C.1	`app/Filament/Pages/BaselineCompareLanding.php`	Expose the shared summary assessment to the Blade view alongside existing explanation and diagnostics payloads
C.2	`resources/views/filament/pages/baseline-compare-landing.blade.php`	Make the visible headline and zero-findings explanation obey the hardened positive-claim rules rather than findings count alone, including distinct stale, in-progress, and unavailable states
C.3	`app/Filament/Widgets/Tenant/BaselineCompareCoverageBanner.php`	Expand the banner trigger and text so evidence gaps and limited-confidence results can influence the summary, not only uncovered types or missing snapshots
C.4	`resources/views/filament/widgets/tenant/baseline-compare-coverage-banner.blade.php`	Preserve compact warning language while clearly distinguishing incomplete evidence, suppressed output, and baseline unavailability

Phase D — Keep Canonical Drilldown As The Semantic Ceiling

Goal: Preserve the operation-run detail surface as the deepest truth surface and ensure summary surfaces cannot out-claim it.

Step	File	Change
D.1	Existing baseline compare run-detail presentation seams	Verify that compact summary wording does not become stronger than current artifact-truth and operator-explanation wording on the run detail surface
D.2	Shared reason or explanation helpers if needed	Reuse the same explanation-family semantics across summary and detail instead of duplicating widget-only logic
D.3	No route or action change	Keep existing dashboard, banner, and landing drilldowns to `Compare now`, `View run`, and `Open findings` intact so limited states have a clear resolution path, and keep `Needs Attention` explicitly non-navigational if it exposes no existing drilldown
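
The "semantic ceiling" rule in this phase can be expressed as a simple ordering check. The sketch below is an illustration only: the three claim levels, the `CLAIM_STRENGTH` map, and both function names are hypothetical, and the real surfaces may use a finer-grained scale.

```php
<?php

declare(strict_types=1);

// Hypothetical claim-strength levels, ordered from least to most reassuring.
const CLAIM_STRENGTH = [
    'no_claim' => 0,  // failed, unavailable, or in progress: nothing reassuring is said
    'limited' => 1,   // zero findings, but evidence or coverage is limited
    'all_clear' => 2, // trustworthy no-drift statement
];

// Resolve a level name to its rank, rejecting unknown names loudly.
function claimStrength(string $level): int
{
    return CLAIM_STRENGTH[$level]
        ?? throw new InvalidArgumentException("Unknown claim level: {$level}");
}

// A compact summary passes when it is equally or more cautious than run detail.
function summaryWithinCeiling(string $summaryLevel, string $runDetailLevel): bool
{
    return claimStrength($summaryLevel) <= claimStrength($runDetailLevel);
}
```

A helper of this kind could back a cross-surface consistency assertion: derive the level each surface renders for the same compare state, then assert that `summaryWithinCeiling()` holds.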

Phase E — Regression Protection And Focused Validation

Goal: Lock the summary truth contract into tests, including the dashboard false-calm case that currently passes as compliant.

Step	File	Change
E.1	`tests/Feature/Filament/BaselineCompareNowWidgetTest.php`	Replace the current compliant assertion with scenario coverage for trustworthy, limited-confidence, stale, failed, in-progress, and unavailable summary states
E.2	`tests/Feature/Filament/NeedsAttentionWidgetTest.php`	Cover `NeedsAttention` healthy-state fallback and evidence-gap-, stale-, in-progress-, and unavailable-driven caution on the dashboard
E.3	`tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php`, `tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php`, and `tests/Feature/Filament/BaselineCompareCoverageBannerTest.php`	Extend landing and banner assertions so zero findings combined with limited evidence, stale history, or an in-progress or unavailable compare state never becomes an all-clear claim
E.4	`tests/Feature/Baselines/BaselineCompareSummaryAssessmentTest.php` and adjacent explanation tests	Add or adjust support-layer assertions around positive-claim eligibility, stale-versus-not-ready distinction, and summary-state derivation
E.5	`tests/Feature/ReasonTranslation/ReasonTranslationExplanationTest.php`	Preserve reason-translation trust-impact and absence-pattern semantics for compact summary claims and deeper artifact-truth surfaces
E.6	`tests/Feature/Filament/BaselineCompareNowWidgetTest.php`, `tests/Feature/Filament/NeedsAttentionWidgetTest.php`, `tests/Feature/Filament/BaselineCompareCoverageBannerTest.php`, `tests/Feature/Filament/BaselineCompareLandingStartSurfaceTest.php`, and `tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php`	While summary wording changes, preserve deny-as-not-found semantics, compare-now capability gating, the drilldown expectations from dashboard, banner, and landing summaries to run detail or findings, and the intentionally non-navigational `Needs Attention` behavior
E.7	`tests/Feature/Filament/OperationRunBaselineTruthSurfaceTest.php` and `tests/Feature/Filament/TenantDashboardDbOnlyTest.php`	Preserve cross-surface semantic consistency, drilldown parity, and DB-only dashboard render behavior
E.8	`vendor/bin/sail bin pint --dirty --format agent` and focused Pest runs	Required formatting and targeted verification before implementation is considered complete

Key Design Decisions

BaselineCompareNow currently consumes BaselineCompareStats::forWidget(), which only knows counts, assignment, snapshot presence, and last compare time. That shortcut is too weak for trust propagation. The design therefore promotes a shared summary contract built from the richer compare truth and explanation seams.

D-002 — Zero findings is a count descriptor, not a governance verdict

The existing landing explanation layer already distinguishes trustworthy no-result, suppressed output, incomplete result, unavailable result, and blocked or missing inputs. The compact summary contract must preserve that distinction instead of translating "0 findings" directly into "baseline compliant".
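
As an illustration of D-002, the sketch below keeps the five explanation families named above separate when findings are zero. The enum, its case names, the function names, and all headline wording are assumptions made for this example. They do not reflect the real `BaselineCompareExplanationRegistry` API.

```php
<?php

declare(strict_types=1);

// Hypothetical mirror of the explanation families listed in D-002.
enum ExplanationFamily
{
    case TrustworthyNoResult;
    case SuppressedOutput;
    case IncompleteResult;
    case UnavailableResult;
    case BlockedOrMissingInputs;
}

// Only a trustworthy no-result may carry a positive claim.
function allowsPositiveClaim(ExplanationFamily $family): bool
{
    return $family === ExplanationFamily::TrustworthyNoResult;
}

// Illustrative headlines for the zero-findings case; wording is a placeholder.
function zeroFindingsHeadline(ExplanationFamily $family): string
{
    return match ($family) {
        ExplanationFamily::TrustworthyNoResult => 'No drift detected in the last compare',
        ExplanationFamily::SuppressedOutput => 'No findings shown, some output was suppressed',
        ExplanationFamily::IncompleteResult => 'No findings so far, the compare result is incomplete',
        ExplanationFamily::UnavailableResult => 'No compare result is available',
        ExplanationFamily::BlockedOrMissingInputs => 'Compare could not run, required inputs are missing',
    };
}
```

Because the `match` covers every case without a `default` arm, an explanation family added to the enum later would fail with an `UnhandledMatchError` at runtime, not fall through to a calming headline.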

D-003 — Dashboard healthy states are part of the truth surface, not decorative filler

NeedsAttention currently falls back to "Everything looks healthy right now." whenever no high-severity findings, stale compare, failure, or active runs are present. That fallback is itself a semantic claim and must be driven by the shared compare summary contract.
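
One way to picture the guarded fallback: the healthy line appears only when there are no attention items and the compare summary allows an all-clear. The function name, its parameters, and the caution text passed in are hypothetical; only the healthy message string comes from the current widget.

```php
<?php

declare(strict_types=1);

/**
 * Decide which lines a compact attention widget shows (illustrative only).
 *
 * @param list<string> $attentionItems items already collected from other checks
 * @return list<string>
 */
function needsAttentionLines(array $attentionItems, bool $compareAllowsAllClear, string $compareCaution): array
{
    // Existing attention items always take precedence.
    if ($attentionItems !== []) {
        return $attentionItems;
    }

    // No items, but the compare result is not decision-grade: show caution instead.
    if (! $compareAllowsAllClear) {
        return [$compareCaution];
    }

    // Only now is the healthy fallback a truthful claim.
    return ['Everything looks healthy right now.'];
}
```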

D-004 — Coverage gaps and evidence gaps both qualify summary truth

The current coverage banner understands uncovered types and missing snapshots, but evidence-gap-driven incompleteness can still remain invisible. The plan therefore treats evidence gaps as first-class summary-limiting inputs even when coverage proof technically exists.
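
A widened banner trigger along these lines could look like the following sketch. The function name, parameter names, returned reason labels, and the precedence between reasons are assumptions for illustration. The point is only that evidence gaps raise the banner even when coverage itself is complete.

```php
<?php

declare(strict_types=1);

/**
 * Return a reason label when the banner should show, or null when it should stay hidden.
 * Labels and precedence are hypothetical.
 */
function coverageBannerReason(bool $snapshotMissing, int $uncoveredTypes, int $evidenceGaps): ?string
{
    return match (true) {
        $snapshotMissing => 'baseline_unavailable',
        $uncoveredTypes > 0 => 'coverage_incomplete',
        // New trigger: coverage proof exists, yet evidence is still incomplete.
        $evidenceGaps > 0 => 'evidence_incomplete',
        default => null,
    };
}
```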

D-005 — KPI cards stay numeric; claim-bearing surfaces carry the semantic burden

The KPI cards can remain simple quantitative indicators so long as they do not add healthy or compliant phrasing. This keeps the plan focused on the surfaces that actually communicate reassurance.

D-006 — Stale and not-ready are separate operator states, not generic unavailability

The spec explicitly distinguishes empty, missing, failed, stale, and not-ready compare situations. The shared summary contract therefore must keep stale-history separate from the formal in_progress and unavailable cases so operators can tell whether they should rerun, wait, or inspect deeper evidence.
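
The distinction in D-006 can be sketched as a small classifier plus a next-step mapping. Everything here is hypothetical: the function names, the seven-day staleness threshold, the rule that an active run takes precedence over stale history, and the next-step wording. The real thresholds and state names live in the spec and the existing stats layer.

```php
<?php

declare(strict_types=1);

// Classify compare readiness from the last completed run and current activity.
function compareReadiness(
    ?DateTimeImmutable $lastCompletedAt,
    bool $runActive,
    DateTimeImmutable $now,
    int $staleAfterDays = 7, // assumed threshold, not taken from the spec
): string {
    if ($runActive) {
        return 'in_progress';
    }

    if ($lastCompletedAt === null) {
        return 'unavailable';
    }

    $cutoff = $now->sub(new DateInterval("P{$staleAfterDays}D"));

    return $lastCompletedAt < $cutoff ? 'stale' : 'current';
}

// Map each readiness state to the operator's next step: rerun, wait, or inspect.
function operatorNextStep(string $readiness): string
{
    return match ($readiness) {
        'stale' => 'Rerun the compare',
        'in_progress' => 'Wait for the run to finish',
        'unavailable' => 'Start a first compare or inspect why no result exists',
        default => 'Review the current result',
    };
}
```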

D-007 — Summary hardening must preserve guardrails and drilldowns, not just wording

Because the feature changes meaning on operator-facing surfaces, it must also preserve the existing landing guard contract: deny-as-not-found for non-members, capability-gated Compare now, and the current drilldown paths to landing, findings, and canonical run detail.

Risk Assessment

Risk	Impact	Likelihood	Mitigation
Shared summary contract becomes another parallel truth model	High	Medium	Derive it directly from existing stats plus operator explanation instead of inventing a new independent state machine
Dashboard widgets become too noisy or verbose	Medium	Medium	Use compact state families and one primary next step rather than dumping diagnostics into widgets
Landing and widget wording drift apart again over time	Medium	Medium	Centralize claim eligibility and state-family mapping in the shared support layer and cover it with tests
Evidence gaps over-trigger warnings and hide genuinely trustworthy no-drift states	Medium	Low	Keep positive claims allowed when trustworthiness is decision-grade and no material limitation is present
Summary hardening accidentally introduces extra queries or render-time side effects	Medium	Low	Reuse existing DB-only stats paths, preserve lazy widgets, and keep dashboard DB-only regression coverage

Test Strategy

Extend existing baseline compare feature and Livewire tests rather than introducing a new UI test harness.
Add explicit scenario coverage for trustworthy no-drift, limited-confidence zero-findings, incomplete evidence, stale compare history, failed compare, in-progress states, and unavailable no-result-yet or no-snapshot states.
Add at least one cross-surface consistency assertion ensuring a dashboard or banner summary is never more optimistic than the landing or canonical run detail for the same compare state.
Preserve and extend existing reason-translation assertions so compact summary claims reuse the same trust-impact and absence-pattern semantics as deeper artifact-truth surfaces.
Preserve existing compare-start and access assertions so the feature does not regress deny-as-not-found behavior, Compare now confirmation, capability gating, or summary-to-detail drilldown language.
Preserve TenantDashboardDbOnlyTest so dashboard hardening cannot introduce outbound HTTP or background work during render.
Run the minimum focused Pest subset through Sail for touched files and ask separately before running the full suite.

Complexity Tracking

No constitution violations or justified complexity exceptions were identified.
