## Summary

- add a shared baseline compare summary assessment and assessor for compact trust propagation
- harden dashboard, landing, and banner baseline compare surfaces against false all-clear claims
- add focused Pest coverage for dashboard, landing, banner, reason translation, and canonical detail parity

## Validation

- vendor/bin/sail bin pint --dirty --format agent
- vendor/bin/sail artisan test --compact tests/Feature/Baselines/BaselineCompareSummaryAssessmentTest.php tests/Feature/Baselines/BaselineCompareExplanationFallbackTest.php tests/Feature/Filament/BaselineCompareNowWidgetTest.php tests/Feature/Filament/NeedsAttentionWidgetTest.php tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php tests/Feature/Filament/BaselineCompareCoverageBannerTest.php tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php tests/Feature/Filament/OperationRunBaselineTruthSurfaceTest.php tests/Feature/ReasonTranslation/ReasonTranslationExplanationTest.php

## Notes

- Livewire compliance: Filament v5 / Livewire v4 stack unchanged
- Provider registration: unchanged, Laravel 12 providers remain in bootstrap/providers.php
- Global search: no searchable resource behavior changed
- Destructive actions: none introduced by this change
- Assets: no new assets registered; existing deploy process remains unchanged

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #196
Implementation Plan: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening
Branch: 165-baseline-summary-trust | Date: 2026-03-26 | Spec: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md
Summary
Harden all in-scope baseline and drift summary surfaces so that no compact widget, KPI-adjacent summary, banner, or landing headline can imply Compliant, No drift, or an equivalent all-clear unless the underlying compare result is genuinely trustworthy. The implementation will introduce a shared baseline summary-state contract derived from the existing baseline compare truth and explanation layers, replace findings-count shortcuts on the tenant dashboard, propagate evidence-gap and coverage limitations into summary claims, keep the landing surface and run drilldown semantically aligned, and lock the behavior down with focused Pest and Livewire coverage.
Key approach: reuse the current baseline domain seams already present in BaselineCompareStats, BaselineCompareExplanationRegistry, reason translation, and badge semantics; add one reusable summary assessment layer for compact surfaces; preserve the existing Compare now action, routes, and DB-only render behavior; and avoid any model, enum, or schema changes.
Technical Context
Language/Version: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4
Primary Dependencies: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing BaselineCompareStats, BaselineCompareExplanationRegistry, ReasonPresenter, BadgeCatalog or BadgeRenderer, UiEnforcement, and OperationRunLinks
Storage: PostgreSQL with existing baseline, findings, and operation_runs tables plus JSONB-backed compare context; no schema change planned
Testing: Pest feature tests, Livewire component tests, dashboard DB-only render regression, all executed through Sail
Target Platform: Laravel web application in Sail locally and containerized Linux deployment in staging and production
Project Type: Laravel monolith web application
Performance Goals: Keep dashboard and landing renders DB-only, preserve existing lazy-widget behavior, avoid new outbound HTTP or background dispatch during render, and keep summary claims understandable within one short scan of each surface
Constraints: No new database tables, no new outcome enums, no compare-engine rewrite, no route or RBAC drift, no new global assets, no dashboard summary more optimistic than landing or run detail, and no new ad hoc status-color language
Scale/Scope: Four primary summary surface families, one shared support-layer contract, one tenant landing page, one canonical drilldown alignment path, and focused regression coverage across trustworthy, limited, failed, missing, and evidence-gap-affected compare scenarios
Constitution Check
GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.
| Principle | Status | Notes |
|---|---|---|
| Inventory-first | Pass | No inventory, snapshot, or evidence ownership semantics change; the work is presentation hardening only |
| Read/write separation | Pass | No new mutation path is introduced; the feature remains read-only except for the already-existing guarded Compare now action |
| Graph contract path | Pass | No new Graph call path, contract-registry entry, or render-time network access is introduced |
| Deterministic capabilities | Pass | No new capability derivation; existing capability and tenant-view checks remain authoritative |
| RBAC-UX planes and 404 vs 403 | Pass | All covered surfaces stay in the tenant/admin plane except the existing canonical run drilldown, which keeps current tenant-safe access rules |
| Workspace isolation | Pass | No workspace-context broadening; tenant summary surfaces still require an established workspace context |
| Tenant isolation | Pass | Covered surfaces remain tenant-scoped, and canonical run drilldown remains entitlement-checked before revealing tenant-linked evidence |
| Destructive confirmation | Pass | No new destructive action; existing Compare now already uses confirmation and capability gating |
| Global search safety | Pass | No global-search behavior or searchable resource configuration changes are part of this feature |
| Run observability | Pass | Existing baseline compare OperationRun behavior remains unchanged; the feature only reads and interprets current run evidence |
| Ops-UX 3-surface feedback | Pass | No new toasts, progress surfaces, or terminal notifications are introduced |
| Ops-UX lifecycle ownership | Pass | OperationRun.status and OperationRun.outcome remain service-owned and untouched by this feature |
| Ops-UX summary counts | Pass | Existing summary_counts rules stay unchanged; the feature consumes result meaning rather than redefining counts |
| Ops-UX guards | Pass | Existing lifecycle guards remain intact; new tests will focus on summary truth and cross-surface consistency |
| Data minimization | Pass | No new secrets, raw Graph payloads, or low-level diagnostics are elevated into default-visible summaries |
| Badge semantics (BADGE-001) | Pass | Status tones and badges must continue to come from central badge or shared primitive semantics rather than page-local green or warning shortcuts |
| Filament-native UI (UI-FIL-001) | Pass | Widgets, banners, and landing summaries continue to use Filament sections, badges, links, and shared primitives rather than bespoke status components |
| UI naming (UI-NAMING-001) | Pass | Operator-facing copy remains domain-first and must avoid false-calming phrases when the result is not decision-grade |
| Operator surfaces (OPSURF-001) | Pass | The feature explicitly strengthens operator-first meaning by carrying governance result, evidence completeness, and next step into compact surfaces |
| Filament Action Surface Contract | Pass | Existing action inventory stays stable; only summary semantics and wording change |
| Filament UX-001 | Pass with documented variance | The landing page remains a custom enterprise layout rather than a stock infolist, but it still honors sectioning, centralized badges, and operator-first hierarchy |
| Filament v5 / Livewire v4 compliance | Pass | All work stays within the current Filament v5 and Livewire v4 stack |
| Provider registration location | Pass | No panel or provider changes are required; Laravel 11+ provider registration remains in bootstrap/providers.php |
| Global-search hard rule | Pass | No globally searchable resource is added or modified; no Edit/View-page requirement changes are triggered |
| Asset strategy | Pass | No new Filament assets are planned; deployment expectations for php artisan filament:assets remain unchanged because no asset registration changes are introduced |
Phase 0 Research
Research outcomes are captured in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/research.md.
Key decisions:
- Use one shared summary assessment derived from BaselineCompareStats and operatorExplanation() instead of findings-count-only widget logic.
- Treat the current BaselineCompareNow success pill and the NeedsAttention healthy state as the highest-risk false-calm surfaces and harden them first.
- Propagate evidence gaps into summary semantics even when uncovered-types coverage warnings are absent.
- Keep KPI cards quantitative only; they may link to deeper surfaces but must not become semantic all-clear claims.
- Extend existing landing, widget, and baseline-truth tests rather than creating a separate UI harness.
Phase 1 Design
Design artifacts are created under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/:
- data-model.md: derived summary-state entities, fields, and state-family rules
- contracts/baseline-summary-surface.openapi.yaml: internal surface-contract schema for compact baseline compare claims across dashboard, landing, banner, and run drilldown
- quickstart.md: focused verification workflow for manual and automated validation
Design decisions:
- No schema migration is required; the design uses existing baseline compare stats, reason translation, operator explanation, findings, and operation-run evidence.
- The primary implementation seam is a new shared support-layer summary assessment in app/Support/Baselines, consumed by dashboard widgets, the landing page, and any summary-adjacent banner or headline.
- The existing BaselineCompareStats::forWidget() shortcut is too lossy for trust propagation, so covered summary surfaces must consume either the richer tenant stats or a derived contract built from them.
- BaselineCompareNow and NeedsAttention must stop deriving healthy or compliant claims from zero findings alone.
- The coverage banner must consider evidence gaps as summary-limiting signals, not only uncovered policy types and missing snapshots.
- Canonical run detail remains the deepest truth surface and becomes the semantic ceiling: compact surfaces may be equally cautious or more cautious, never more optimistic.
Project Structure
Documentation (this feature)
specs/165-baseline-summary-trust/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── baseline-summary-surface.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md
Source Code (repository root)
app/
├── Filament/
│   ├── Pages/
│   │   └── BaselineCompareLanding.php
│   └── Widgets/
│       ├── Dashboard/
│       │   ├── BaselineCompareNow.php
│       │   ├── DashboardKpis.php
│       │   └── NeedsAttention.php
│       └── Tenant/
│           └── BaselineCompareCoverageBanner.php
├── Support/
│   ├── Baselines/
│   │   ├── BaselineCompareStats.php
│   │   ├── BaselineCompareExplanationRegistry.php
│   │   ├── BaselineCompareEvidenceGapDetails.php
│   │   └── BaselineCompareReasonCode.php
│   ├── Badges/
│   │   ├── BadgeCatalog.php
│   │   └── BadgeRenderer.php
│   └── ReasonTranslation/
│       └── ReasonTranslator.php
resources/
└── views/
    └── filament/
        ├── pages/
        │   └── baseline-compare-landing.blade.php
        └── widgets/
            ├── dashboard/
            │   ├── baseline-compare-now.blade.php
            │   └── needs-attention.blade.php
            └── tenant/
                └── baseline-compare-coverage-banner.blade.php
tests/
├── Feature/
│   ├── Baselines/
│   │   ├── BaselineCompareStatsTest.php
│   │   ├── BaselineCompareSummaryAssessmentTest.php
│   │   ├── BaselineCompareExplanationFallbackTest.php
│   │   └── BaselineCompareWhyNoFindingsReasonCodeTest.php
│   ├── ReasonTranslation/
│   │   └── ReasonTranslationExplanationTest.php
│   └── Filament/
│       ├── BaselineCompareCoverageBannerTest.php
│       ├── BaselineCompareExplanationSurfaceTest.php
│       ├── BaselineCompareLandingWhyNoFindingsTest.php
│       ├── BaselineCompareLandingStartSurfaceTest.php
│       ├── BaselineCompareNowWidgetTest.php
│       ├── BaselineCompareSummaryConsistencyTest.php
│       ├── NeedsAttentionWidgetTest.php
│       ├── OperationRunBaselineTruthSurfaceTest.php
│       └── TenantDashboardDbOnlyTest.php
Structure Decision: Standard Laravel monolith. The feature is confined to the existing support-layer baseline truth objects, a small number of tenant-facing Filament widgets and pages, and focused Pest coverage. No new base directories or architectural layers are required beyond a shared compact-summary support seam inside app/Support/Baselines.
Implementation Strategy
Phase A — Establish One Shared Summary Truth Contract
Goal: Derive one reusable summary-state assessment from existing baseline compare truth and explanation layers so widgets and landing summaries stop improvising their own semantics.
| Step | File | Change |
|---|---|---|
| A.1 | app/Support/Baselines/BaselineCompareStats.php | Refactor or extend the compact-summary seam so covered surfaces can consume trustworthiness, evidence completeness, reason semantics, and result availability instead of findings counts only |
| A.2 | app/Support/Baselines/BaselineCompareExplanationRegistry.php and an adjacent new support type | Introduce a shared summary assessment or presenter that maps stats plus explanation into summary state family, safe headline, tone, and next step |
| A.3 | app/Support/Baselines/BaselineCompareReasonCode.php and reason translation seams if needed | Ensure positive claim eligibility and limited-confidence semantics stay aligned with current explanation-family and trustworthiness rules |
| A.4 | Shared badge or UI support helpers if needed | Keep badge or tone selection centralized and avoid page-local success shortcuts |
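One possible shape for the shared assessment described in A.2 and A.4, sketched in plain PHP. The names SummaryStateFamily and SummaryAssessment, the list of cases, and the tone strings are all hypothetical and are not taken from the codebase; in the real implementation the tone would presumably be resolved through BadgeCatalog or BadgeRenderer rather than a local match expression.

```php
<?php

declare(strict_types=1);

// Hypothetical state families for compact surfaces (names are assumptions).
enum SummaryStateFamily: string
{
    case Positive = 'positive';
    case Caution = 'caution';
    case Stale = 'stale';
    case InProgress = 'in_progress';
    case Unavailable = 'unavailable';
    case ActionRequired = 'action_required';

    // One place that maps a state family to a tone, so that no widget
    // picks "success" on its own. Tone strings are illustrative only.
    public function tone(): string
    {
        return match ($this) {
            self::Positive => 'success',
            self::Caution, self::Stale => 'warning',
            self::InProgress => 'info',
            self::Unavailable => 'gray',
            self::ActionRequired => 'danger',
        };
    }
}

// Immutable payload a widget or Blade view could render without re-deriving meaning.
final class SummaryAssessment
{
    public function __construct(
        public readonly SummaryStateFamily $stateFamily,
        public readonly string $headline,
        public readonly ?string $nextStep = null,
    ) {
    }

    public function tone(): string
    {
        return $this->stateFamily->tone();
    }

    public function allowsPositiveClaim(): bool
    {
        return $this->stateFamily === SummaryStateFamily::Positive;
    }
}

// Example: zero findings with incomplete evidence is a caution, not an all-clear.
$limited = new SummaryAssessment(
    SummaryStateFamily::Caution,
    'No findings, but evidence is incomplete',
    'Review evidence gaps',
);

echo $limited->tone(), PHP_EOL;
```

Keeping the tone on the state family rather than on each surface is one way to honor the "no page-local success shortcuts" rule; a surface only ever asks the assessment what to show.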
Phase B — Harden Tenant Dashboard Summary Surfaces
Goal: Remove the most dangerous false-calm claims from the tenant dashboard without breaking lazy loading or DB-only behavior, while keeping stale, failed, missing, in-progress, and unavailable compare states visibly distinct.
| Step | File | Change |
|---|---|---|
| B.1 | app/Filament/Widgets/Dashboard/BaselineCompareNow.php | Replace the findings-only widget payload with the shared summary assessment contract |
| B.2 | resources/views/filament/widgets/dashboard/baseline-compare-now.blade.php | Replace "No open drift — baseline compliant" with contract-driven positive, cautionary, stale, unavailable, in-progress, or review-oriented states |
| B.3 | app/Filament/Widgets/Dashboard/NeedsAttention.php | Feed healthy-check and attention-item generation from the shared summary contract so limited, stale, in-progress, unavailable, or incomplete compare results cannot fall through to "Everything looks healthy right now." |
| B.4 | resources/views/filament/widgets/dashboard/needs-attention.blade.php | Keep the widget compact while showing truthful caution and next-step language when compare evidence is limited |
| B.5 | app/Filament/Widgets/Dashboard/DashboardKpis.php | Verify that KPI cards remain quantitative-only and do not imply stronger semantic claims than the shared contract allows |
Phase C — Align Landing And Banner Surfaces With The Same Claim Guard
Goal: Ensure the Baseline Compare landing surface and findings-adjacent banner use the same claim-strength rules as the dashboard, including distinct stale, in-progress, and unavailable result handling.
| Step | File | Change |
|---|---|---|
| C.1 | app/Filament/Pages/BaselineCompareLanding.php | Expose the shared summary assessment to the Blade view alongside existing explanation and diagnostics payloads |
| C.2 | resources/views/filament/pages/baseline-compare-landing.blade.php | Make the visible headline and zero-findings explanation obey the hardened positive-claim rules rather than findings count alone, including distinct stale, in-progress, and unavailable states |
| C.3 | app/Filament/Widgets/Tenant/BaselineCompareCoverageBanner.php | Expand the banner trigger and text so evidence gaps and limited-confidence results can influence the summary, not only uncovered types or missing snapshots |
| C.4 | resources/views/filament/widgets/tenant/baseline-compare-coverage-banner.blade.php | Preserve compact warning language while clearly distinguishing incomplete evidence, suppressed output, and baseline unavailability |
Phase D — Keep Canonical Drilldown As The Semantic Ceiling
Goal: Preserve the operation-run detail surface as the deepest truth surface and ensure summary surfaces cannot out-claim it.
| Step | File | Change |
|---|---|---|
| D.1 | Existing baseline compare run-detail presentation seams | Verify that compact summary wording does not become stronger than current artifact-truth and operator-explanation wording on the run detail surface |
| D.2 | Shared reason or explanation helpers if needed | Reuse the same explanation-family semantics across summary and detail instead of duplicating widget-only logic |
| D.3 | No route or action change | Keep existing dashboard, banner, and landing drilldowns to Compare now, View run, and Open findings intact so limited states have a clear resolution path, and keep Needs Attention explicitly non-navigational if it exposes no existing drilldown |
Phase E — Regression Protection And Focused Validation
Goal: Lock the summary truth contract into tests, including the dashboard false-calm case that currently passes as compliant.
| Step | File | Change |
|---|---|---|
| E.1 | tests/Feature/Filament/BaselineCompareNowWidgetTest.php | Replace the current compliant assertion with scenario coverage for trustworthy, limited-confidence, stale, failed, in-progress, and unavailable summary states |
| E.2 | tests/Feature/Filament/NeedsAttentionWidgetTest.php | Cover NeedsAttention healthy-state fallback and evidence-gap-, stale-, in-progress-, and unavailable-driven caution on the dashboard |
| E.3 | tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php, tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php, and tests/Feature/Filament/BaselineCompareCoverageBannerTest.php | Extend landing and banner assertions so zero findings plus limited evidence, stale history, or in-progress or unavailable compare state never becomes an all-clear claim |
| E.4 | tests/Feature/Baselines/BaselineCompareSummaryAssessmentTest.php and adjacent explanation tests | Add or adjust support-layer assertions around positive-claim eligibility, stale-versus-not-ready distinction, and summary-state derivation |
| E.5 | tests/Feature/ReasonTranslation/ReasonTranslationExplanationTest.php | Preserve reason-translation trust-impact and absence-pattern semantics for compact summary claims and deeper artifact-truth surfaces |
| E.6 | tests/Feature/Filament/BaselineCompareNowWidgetTest.php, tests/Feature/Filament/NeedsAttentionWidgetTest.php, tests/Feature/Filament/BaselineCompareCoverageBannerTest.php, tests/Feature/Filament/BaselineCompareLandingStartSurfaceTest.php, and tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php | Preserve deny-as-not-found semantics, compare-now capability gating, dashboard, banner, and landing summary-to-run-detail or findings drilldown expectations, and the intentionally non-navigational Needs Attention behavior while summary wording changes |
| E.7 | tests/Feature/Filament/OperationRunBaselineTruthSurfaceTest.php and tests/Feature/Filament/TenantDashboardDbOnlyTest.php | Preserve cross-surface semantic consistency, drilldown parity, and DB-only dashboard render behavior |
| E.8 | vendor/bin/sail bin pint --dirty --format agent and focused Pest runs | Required formatting and targeted verification before implementation is considered complete |
Key Design Decisions
D-001 — The summary contract must originate from the truth layer, not from widget-local counts
BaselineCompareNow currently consumes BaselineCompareStats::forWidget(), which only knows counts, assignment, snapshot presence, and last compare time. That shortcut is too weak for trust propagation. The design therefore promotes a shared summary contract built from the richer compare truth and explanation seams.
D-002 — Zero findings is a count descriptor, not a governance verdict
The existing landing explanation layer already distinguishes trustworthy no-result, suppressed output, incomplete result, unavailable result, and blocked or missing inputs. The compact summary contract must preserve that distinction instead of translating 0 findings directly into baseline compliant.
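The distinction in D-002 can be illustrated with a small pure function. This is only a sketch: the function name, the input keys, and the returned labels are hypothetical and merely mirror the explanation families named above; they are not real BaselineCompareStats fields or reason codes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical classifier for compact summaries. The array keys are
 * illustrative and are not real BaselineCompareStats fields.
 *
 * @param array{hasResult: bool, findingsCount: int, trustworthy: bool, evidenceGaps: int, outputSuppressed: bool} $facts
 */
function classifyCompareSummary(array $facts): string
{
    if (! $facts['hasResult']) {
        return 'unavailable_result';
    }

    if ($facts['findingsCount'] > 0) {
        return 'drift_detected';
    }

    // The count is zero from here on, which on its own proves nothing.
    if ($facts['outputSuppressed']) {
        return 'suppressed_output';
    }

    if ($facts['evidenceGaps'] > 0 || ! $facts['trustworthy']) {
        return 'incomplete_result';
    }

    return 'trustworthy_no_drift';
}

$base = [
    'hasResult' => true,
    'findingsCount' => 0,
    'trustworthy' => true,
    'evidenceGaps' => 0,
    'outputSuppressed' => false,
];

// Same zero count, different verdicts.
$clean = classifyCompareSummary($base);
$gappy = classifyCompareSummary(['evidenceGaps' => 3] + $base);
```

Only the last branch would be eligible for a "baseline compliant" style headline; every other zero-findings outcome keeps its own label.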
D-003 — Dashboard healthy states are part of the truth surface, not decorative filler
NeedsAttention currently falls back to Everything looks healthy right now. whenever no high-severity findings, stale compare, failure, or active runs are present. That fallback is itself a semantic claim and must be driven by the shared compare summary contract.
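As a rough sketch of what D-003 implies, the healthy line could be gated on the shared contract rather than on an empty attention list. The helper name and its parameters are hypothetical; the actual widget may build its items quite differently.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical line builder for a NeedsAttention-style widget.
 *
 * @param list<string> $attentionItems  items already derived from findings, failures, or active runs
 * @param bool         $positiveClaimOk whether the shared assessment permits an all-clear
 * @param string       $cautionLine     caution text supplied by the shared assessment
 * @return list<string>
 */
function needsAttentionLines(array $attentionItems, bool $positiveClaimOk, string $cautionLine): array
{
    $lines = $attentionItems;

    // A limited compare result is itself an attention item.
    if (! $positiveClaimOk) {
        $lines[] = $cautionLine;
    }

    // The healthy line is reachable only when nothing at all qualifies it.
    return $lines === [] ? ['Everything looks healthy right now.'] : $lines;
}

$healthy = needsAttentionLines([], true, 'unused');
$limited = needsAttentionLines([], false, 'Baseline compare evidence is incomplete.');
```

With this shape, an empty findings list combined with a limited compare result yields the caution line instead of the healthy fallback.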
D-004 — Coverage gaps and evidence gaps both qualify summary truth
The current coverage banner understands uncovered types and missing snapshots, but evidence-gap-driven incompleteness can still remain invisible. The plan therefore treats evidence gaps as first-class summary-limiting inputs even when coverage proof technically exists.
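A minimal sketch of the widened banner trigger, assuming the three inputs are available as simple values. The function name, parameter names, and reason labels are hypothetical and do not come from BaselineCompareCoverageBanner.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical banner trigger. Returns the reasons that limit the summary;
 * an empty list means the banner stays hidden.
 *
 * @return list<string>
 */
function coverageBannerReasons(int $uncoveredTypes, bool $snapshotMissing, int $evidenceGaps): array
{
    $reasons = [];

    if ($snapshotMissing) {
        $reasons[] = 'missing_snapshot';
    }

    if ($uncoveredTypes > 0) {
        $reasons[] = 'uncovered_types';
    }

    // Evidence gaps count even when coverage proof exists for every type.
    if ($evidenceGaps > 0) {
        $reasons[] = 'evidence_gaps';
    }

    return $reasons;
}

// Full coverage and a present snapshot, but two evidence gaps: the banner still shows.
$reasons = coverageBannerReasons(0, false, 2);
$showBanner = $reasons !== [];
```

Returning the reasons rather than a bare boolean would also let the banner text distinguish incomplete evidence from baseline unavailability.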
D-005 — KPI cards stay numeric; claim-bearing surfaces carry the semantic burden
The KPI cards can remain simple quantitative indicators so long as they do not add healthy or compliant phrasing. This keeps the plan focused on the surfaces that actually communicate reassurance.
D-006 — Stale and not-ready are separate operator states, not generic unavailability
The spec explicitly distinguishes empty, missing, failed, stale, and not-ready compare situations. The shared summary contract therefore must keep stale-history separate from the formal in_progress and unavailable cases so operators can tell whether they should rerun, wait, or inspect deeper evidence.
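To make the rerun, wait, or inspect distinction concrete, here is a sketch of a readiness classifier. Everything in it is an assumption: the function name, the seven-day staleness threshold, the state labels, and the mapping from state to next step are illustrative and are not values taken from the spec.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical readiness classifier. The seven-day threshold and the
 * next-step labels are assumptions, not values taken from the spec.
 *
 * @return array{state: string, nextStep: ?string}
 */
function compareReadiness(
    ?DateTimeImmutable $lastComparedAt,
    bool $runActive,
    DateTimeImmutable $now,
    int $staleAfterDays = 7,
): array {
    // An active run wins: the operator should wait, not rerun.
    if ($runActive) {
        return ['state' => 'in_progress', 'nextStep' => 'wait'];
    }

    // No completed compare at all: nothing to be stale yet.
    if ($lastComparedAt === null) {
        return ['state' => 'unavailable', 'nextStep' => 'inspect'];
    }

    $ageSeconds = $now->getTimestamp() - $lastComparedAt->getTimestamp();

    if ($ageSeconds > $staleAfterDays * 86400) {
        return ['state' => 'stale', 'nextStep' => 'rerun'];
    }

    return ['state' => 'current', 'nextStep' => null];
}

$now = new DateTimeImmutable('2026-03-26T12:00:00+00:00');
$old = compareReadiness(new DateTimeImmutable('2026-03-01T12:00:00+00:00'), false, $now);
$none = compareReadiness(null, false, $now);
```

The point of the sketch is only that the three situations take separate branches and carry separate next steps, so none of them collapses into a generic "unavailable" message.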
D-007 — Summary hardening must preserve guardrails and drilldowns, not just wording
Because the feature changes meaning on operator-facing surfaces, it must also preserve the existing landing guard contract: deny-as-not-found for non-members, capability-gated Compare now, and the current drilldown paths to landing, findings, and canonical run detail.
Risk Assessment
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| Shared summary contract becomes another parallel truth model | High | Medium | Derive it directly from existing stats plus operator explanation instead of inventing a new independent state machine |
| Dashboard widgets become too noisy or verbose | Medium | Medium | Use compact state families and one primary next step rather than dumping diagnostics into widgets |
| Landing and widget wording drift apart again over time | Medium | Medium | Centralize claim eligibility and state-family mapping in the shared support layer and cover it with tests |
| Evidence gaps over-trigger warnings and hide genuinely trustworthy no-drift states | Medium | Low | Keep positive claims allowed when trustworthiness is decision-grade and no material limitation is present |
| Summary hardening accidentally introduces extra queries or render-time side effects | Medium | Low | Reuse existing DB-only stats paths, preserve lazy widgets, and keep dashboard DB-only regression coverage |
Test Strategy
- Extend existing baseline compare feature and Livewire tests rather than introducing a new UI test harness.
- Add explicit scenario coverage for trustworthy no-drift, limited-confidence zero-findings, incomplete evidence, stale compare history, failed compare, in-progress states, and unavailable no-result-yet or no-snapshot states.
- Add at least one cross-surface consistency assertion ensuring a dashboard or banner summary is never more optimistic than the landing or canonical run detail for the same compare state.
- Preserve and extend existing reason-translation assertions so compact summary claims reuse the same trust-impact and absence-pattern semantics as deeper artifact-truth surfaces.
- Preserve existing compare-start and access assertions so the feature does not regress deny-as-not-found behavior, Compare now confirmation, capability gating, or summary-to-detail drilldown language.
- Preserve TenantDashboardDbOnlyTest so dashboard hardening cannot introduce outbound HTTP or background work during render.
- Run the minimum focused Pest subset through Sail for touched files and ask separately before running the full suite.
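The cross-surface consistency assertion in the list above could rest on a simple ordering of claim strength. The sketch below is deliberately simplified: the three labels, their ranks, and the helper name are hypothetical, and the real state families may not form a strict total order.

```php
<?php

declare(strict_types=1);

// Hypothetical optimism ranks: a higher number is a stronger all-clear claim.
const CLAIM_OPTIMISM = [
    'unavailable' => 0,
    'caution' => 1,
    'positive' => 2,
];

/** True when the compact surface claims no more than the run detail does. */
function respectsSemanticCeiling(string $compactState, string $detailState): bool
{
    return CLAIM_OPTIMISM[$compactState] <= CLAIM_OPTIMISM[$detailState];
}

$ok = respectsSemanticCeiling('caution', 'positive');   // more cautious than detail: allowed
$bad = respectsSemanticCeiling('positive', 'caution');  // out-claims detail: rejected
```

A consistency test could then derive the state shown on the dashboard, banner, and run detail for one compare scenario and assert that each compact surface passes this check against the run detail.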
Complexity Tracking
No constitution violations or justified complexity exceptions were identified.