# Implementation Plan: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening

**Branch**: `165-baseline-summary-trust` | **Date**: 2026-03-26 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/spec.md`

## Summary

Harden all in-scope baseline and drift summary surfaces so that no compact widget, KPI-adjacent summary, banner, or landing headline can imply `Compliant`, `No drift`, or an equivalent all-clear unless the underlying compare result is genuinely trustworthy. The implementation introduces a shared baseline summary-state contract derived from the existing baseline compare truth and explanation layers and replaces the findings-count shortcuts on the tenant dashboard. It also propagates evidence-gap and coverage limitations into summary claims, keeps the landing surface and run drilldown semantically aligned, and locks the behavior down with focused Pest and Livewire coverage.

Key approach: reuse the current baseline domain seams already present in `BaselineCompareStats`, `BaselineCompareExplanationRegistry`, reason translation, and badge semantics; add one reusable summary assessment layer for compact surfaces; preserve the existing `Compare now` action, routes, and DB-only render behavior; and avoid any model, enum, or schema changes.

## Technical Context

**Language/Version**: PHP 8.4, Laravel 12, Blade, Filament v5, Livewire v4  
**Primary Dependencies**: Filament v5, Livewire v4, Pest v4, Laravel Sail, existing `BaselineCompareStats`, `BaselineCompareExplanationRegistry`, `ReasonPresenter`, `BadgeCatalog` or `BadgeRenderer`, `UiEnforcement`, and `OperationRunLinks`  
**Storage**: PostgreSQL with existing baseline, findings, and `operation_runs` tables plus JSONB-backed compare context; no schema change planned  
**Testing**: Pest feature tests, Livewire component tests, dashboard DB-only render regression, all executed through Sail  
**Target Platform**: Laravel web application in Sail locally and containerized Linux deployment in staging and production  
**Project Type**: Laravel monolith web application  
**Performance Goals**: Keep dashboard and landing renders DB-only, preserve existing lazy-widget behavior, avoid new outbound HTTP or background dispatch during render, and keep summary claims understandable within one short scan of each surface  
**Constraints**: No new database tables, no new outcome enums, no compare-engine rewrite, no route or RBAC drift, no new global assets, no dashboard summary more optimistic than landing or run detail, and no new ad hoc status-color language  
**Scale/Scope**: Four primary summary surface families, one shared support-layer contract, one tenant landing page, one canonical drilldown alignment path, and focused regression coverage across trustworthy, limited, failed, missing, and evidence-gap-affected compare scenarios

## Constitution Check

*GATE: Passed before Phase 0 research. Re-checked after Phase 1 design and still passing.*

| Principle | Status | Notes |
|-----------|--------|-------|
| Inventory-first | Pass | No inventory, snapshot, or evidence ownership semantics change; the work is presentation hardening only |
| Read/write separation | Pass | No new mutation path is introduced; the feature remains read-only except for the already-existing guarded `Compare now` action |
| Graph contract path | Pass | No new Graph call path, contract-registry entry, or render-time network access is introduced |
| Deterministic capabilities | Pass | No new capability derivation; existing capability and tenant-view checks remain authoritative |
| RBAC-UX planes and 404 vs 403 | Pass | All covered surfaces stay in the tenant/admin plane except the existing canonical run drilldown, which keeps its current tenant-safe access rules |
| Workspace isolation | Pass | No workspace-context broadening; tenant summary surfaces still require an established workspace context |
| Tenant isolation | Pass | Covered surfaces remain tenant-scoped, and canonical run drilldown remains entitlement-checked before revealing tenant-linked evidence |
| Destructive confirmation | Pass | No new destructive action; existing `Compare now` already uses confirmation and capability gating |
| Global search safety | Pass | No global-search behavior or searchable resource configuration changes are part of this feature |
| Run observability | Pass | Existing baseline compare `OperationRun` behavior remains unchanged; the feature only reads and interprets current run evidence |
| Ops-UX 3-surface feedback | Pass | No new toasts, progress surfaces, or terminal notifications are introduced |
| Ops-UX lifecycle ownership | Pass | `OperationRun.status` and `OperationRun.outcome` remain service-owned and untouched by this feature |
| Ops-UX summary counts | Pass | Existing `summary_counts` rules stay unchanged; the feature consumes result meaning rather than redefining counts |
| Ops-UX guards | Pass | Existing lifecycle guards remain intact; new tests will focus on summary truth and cross-surface consistency |
| Data minimization | Pass | No new secrets, raw Graph payloads, or low-level diagnostics are elevated into default-visible summaries |
| Badge semantics (BADGE-001) | Pass | Status tones and badges must continue to come from central badge or shared primitive semantics rather than page-local green or warning shortcuts |
| Filament-native UI (UI-FIL-001) | Pass | Widgets, banners, and landing summaries continue to use Filament sections, badges, links, and shared primitives rather than bespoke status components |
| UI naming (UI-NAMING-001) | Pass | Operator-facing copy remains domain-first and must avoid false-calming phrases when the result is not decision-grade |
| Operator surfaces (OPSURF-001) | Pass | The feature explicitly strengthens operator-first meaning by carrying governance result, evidence completeness, and next step into compact surfaces |
| Filament Action Surface Contract | Pass | Existing action inventory stays stable; only summary semantics and wording change |
| Filament UX-001 | Pass with documented variance | The landing page remains a custom enterprise layout rather than a stock infolist, but it still honors sectioning, centralized badges, and operator-first hierarchy |
| Filament v5 / Livewire v4 compliance | Pass | All work stays within the current Filament v5 and Livewire v4 stack |
| Provider registration location | Pass | No panel or provider changes are required; Laravel 11+ provider registration remains in `bootstrap/providers.php` |
| Global-search hard rule | Pass | No globally searchable resource is added or modified; no Edit/View-page requirement changes are triggered |
| Asset strategy | Pass | No new Filament assets are planned; deployment expectations for `php artisan filament:assets` remain unchanged because no asset registration changes are introduced |

## Phase 0 Research

Research outcomes are captured in `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/research.md`.

Key decisions:

- Use one shared summary assessment derived from `BaselineCompareStats` and `operatorExplanation()` instead of findings-count-only widget logic.
- Treat the current `BaselineCompareNow` success pill and the `NeedsAttention` healthy state as the highest-risk false-calm surfaces and harden them first.
- Propagate evidence gaps into summary semantics even when uncovered-types coverage warnings are absent.
- Keep KPI cards quantitative only; they may link to deeper surfaces but must not become semantic all-clear claims.
- Extend existing landing, widget, and baseline-truth tests rather than creating a separate UI harness.

## Phase 1 Design

Design artifacts are created under `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/165-baseline-summary-trust/`:

- `data-model.md`: derived summary-state entities, fields, and state-family rules
- `contracts/baseline-summary-surface.openapi.yaml`: internal surface-contract schema for compact baseline compare claims across dashboard, landing, banner, and run drilldown
- `quickstart.md`: focused verification workflow for manual and automated validation

Design decisions:

- No schema migration is required; the design uses existing baseline compare stats, reason translation, operator explanation, findings, and operation-run evidence.
- The primary implementation seam is a new shared support-layer summary assessment in `app/Support/Baselines`, consumed by dashboard widgets, the landing page, and any summary-adjacent banner or headline.
- The existing `BaselineCompareStats::forWidget()` shortcut is too lossy for trust propagation, so covered summary surfaces must consume either the richer tenant stats or a derived contract built from them.
- `BaselineCompareNow` and `NeedsAttention` must stop deriving healthy or compliant claims from zero findings alone.
- The coverage banner must consider evidence gaps as summary-limiting signals, not only uncovered policy types and missing snapshots.
- Canonical run detail remains the deepest truth surface and becomes the semantic ceiling: compact surfaces may be equally cautious or more cautious, never more optimistic.
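
The semantic-ceiling rule in the last decision can be sketched as an ordered claim scale. This is a minimal illustration only: `ClaimStrength`, `isAtMost()`, and `clampToCeiling()` are hypothetical names that do not exist in the codebase, and the three-level scale is an assumption made for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical claim-strength scale; the names and levels are illustrative only.
enum ClaimStrength: int
{
    case Unavailable = 0; // no usable result to summarize
    case Cautionary = 1;  // a result exists but is limited, stale, or incomplete
    case Positive = 2;    // decision-grade all-clear

    // True when this claim is equally cautious or more cautious than the ceiling.
    public function isAtMost(self $ceiling): bool
    {
        return $this->value <= $ceiling->value;
    }
}

// Clamp a compact-surface claim so it never exceeds the run-detail claim.
function clampToCeiling(ClaimStrength $summary, ClaimStrength $runDetail): ClaimStrength
{
    return $summary->isAtMost($runDetail) ? $summary : $runDetail;
}
```

Under these assumptions, a widget that computed `ClaimStrength::Positive` while run detail reports `ClaimStrength::Cautionary` would be clamped down to `Cautionary`; a widget that is already more cautious is left unchanged.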

## Project Structure

### Documentation (this feature)

```text
specs/165-baseline-summary-trust/
├── spec.md
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   └── baseline-summary-surface.openapi.yaml
├── checklists/
│   └── requirements.md
└── tasks.md
```

### Source Code (repository root)

```text
app/
├── Filament/
│   ├── Pages/
│   │   └── BaselineCompareLanding.php
│   └── Widgets/
│       ├── Dashboard/
│       │   ├── BaselineCompareNow.php
│       │   ├── DashboardKpis.php
│       │   └── NeedsAttention.php
│       └── Tenant/
│           └── BaselineCompareCoverageBanner.php
├── Support/
│   ├── Baselines/
│   │   ├── BaselineCompareStats.php
│   │   ├── BaselineCompareExplanationRegistry.php
│   │   ├── BaselineCompareEvidenceGapDetails.php
│   │   └── BaselineCompareReasonCode.php
│   ├── Badges/
│   │   ├── BadgeCatalog.php
│   │   └── BadgeRenderer.php
│   └── ReasonTranslation/
│       └── ReasonTranslator.php

resources/
└── views/
    └── filament/
        ├── pages/
        │   └── baseline-compare-landing.blade.php
        └── widgets/
            ├── dashboard/
            │   ├── baseline-compare-now.blade.php
            │   └── needs-attention.blade.php
            └── tenant/
                └── baseline-compare-coverage-banner.blade.php

tests/
├── Feature/
│   ├── Baselines/
│   │   ├── BaselineCompareStatsTest.php
│   │   ├── BaselineCompareSummaryAssessmentTest.php
│   │   ├── BaselineCompareExplanationFallbackTest.php
│   │   └── BaselineCompareWhyNoFindingsReasonCodeTest.php
│   ├── ReasonTranslation/
│   │   └── ReasonTranslationExplanationTest.php
│   └── Filament/
│       ├── BaselineCompareCoverageBannerTest.php
│       ├── BaselineCompareExplanationSurfaceTest.php
│       ├── BaselineCompareLandingWhyNoFindingsTest.php
│       ├── BaselineCompareLandingStartSurfaceTest.php
│       ├── BaselineCompareNowWidgetTest.php
│       ├── BaselineCompareSummaryConsistencyTest.php
│       ├── NeedsAttentionWidgetTest.php
│       ├── OperationRunBaselineTruthSurfaceTest.php
│       └── TenantDashboardDbOnlyTest.php
```

**Structure Decision**: Standard Laravel monolith. The feature is confined to the existing support-layer baseline truth objects, a small number of tenant-facing Filament widgets and pages, and focused Pest coverage. No new base directories or architectural layers are required beyond a shared compact-summary support seam inside `app/Support/Baselines`.

## Implementation Strategy

### Phase A — Establish One Shared Summary Truth Contract

**Goal**: Derive one reusable summary-state assessment from existing baseline compare truth and explanation layers so widgets and landing summaries stop improvising their own semantics.

| Step | File | Change |
|------|------|--------|
| A.1 | `app/Support/Baselines/BaselineCompareStats.php` | Refactor or extend the compact-summary seam so covered surfaces can consume trustworthiness, evidence completeness, reason semantics, and result availability instead of findings counts only |
| A.2 | `app/Support/Baselines/BaselineCompareExplanationRegistry.php` and an adjacent new support type | Introduce a shared summary assessment or presenter that maps stats plus explanation into summary state family, safe headline, tone, and next step |
| A.3 | `app/Support/Baselines/BaselineCompareReasonCode.php` and reason translation seams if needed | Ensure positive claim eligibility and limited-confidence semantics stay aligned with current explanation-family and trustworthiness rules |
| A.4 | Shared badge or UI support helpers if needed | Keep badge or tone selection centralized and avoid page-local success shortcuts |
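
As a rough sketch of what step A.2 describes, the assessment could map a completed compare into a state family, safe headline, tone, and next step. Everything here is an assumption for illustration: `SummaryAssessment`, `assessCompletedCompare()`, the family strings, the tone names, and the headline copy are placeholders, and the real inputs would come from `BaselineCompareStats` and the operator explanation rather than from bare scalars.

```php
<?php

declare(strict_types=1);

// Hypothetical value object handed to compact surfaces (not the real contract).
final class SummaryAssessment
{
    public function __construct(
        public readonly string $family,   // assumed family name, e.g. 'positive'
        public readonly string $headline, // placeholder copy, not final wording
        public readonly string $tone,     // assumed tone key resolved by badge helpers
        public readonly string $nextStep, // label of an existing action
    ) {}
}

// Derive the summary for a compare run that has finished and produced a result.
function assessCompletedCompare(int $openFindings, bool $decisionGrade, int $evidenceGaps): SummaryAssessment
{
    if ($openFindings > 0) {
        return new SummaryAssessment('action_required', "Open drift findings: {$openFindings}", 'danger', 'Open findings');
    }

    // Zero findings is only a count; limited trust or evidence gaps block the all-clear.
    if (! $decisionGrade || $evidenceGaps > 0) {
        return new SummaryAssessment('caution', 'No findings recorded, but evidence is limited', 'warning', 'View run');
    }

    return new SummaryAssessment('positive', 'No drift detected in the latest compare', 'success', 'Compare now');
}
```

The point of the sketch is the ordering of the checks: the positive branch is reachable only after both the findings count and the trust inputs have been examined.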

### Phase B — Harden Tenant Dashboard Summary Surfaces

**Goal**: Remove the most dangerous false-calm claims from the tenant dashboard without breaking lazy loading or DB-only behavior, while keeping stale, failed, missing, in-progress, and unavailable compare states visibly distinct.

| Step | File | Change |
|------|------|--------|
| B.1 | `app/Filament/Widgets/Dashboard/BaselineCompareNow.php` | Replace the findings-only widget payload with the shared summary assessment contract |
| B.2 | `resources/views/filament/widgets/dashboard/baseline-compare-now.blade.php` | Replace `No open drift — baseline compliant` with contract-driven positive, cautionary, stale, unavailable, in-progress, or review-oriented states |
| B.3 | `app/Filament/Widgets/Dashboard/NeedsAttention.php` | Feed healthy-check and attention-item generation from the shared summary contract so limited, stale, in-progress, unavailable, or incomplete compare results cannot fall through to `Everything looks healthy right now.` |
| B.4 | `resources/views/filament/widgets/dashboard/needs-attention.blade.php` | Keep the widget compact while showing truthful caution and next-step language when compare evidence is limited |
| B.5 | `app/Filament/Widgets/Dashboard/DashboardKpis.php` | Verify that KPI cards remain quantitative-only and do not imply stronger semantic claims than the shared contract allows |

### Phase C — Align Landing And Banner Surfaces With The Same Claim Guard

**Goal**: Ensure the Baseline Compare landing surface and findings-adjacent banner use the same claim-strength rules as the dashboard, including distinct stale, in-progress, and unavailable result handling.

| Step | File | Change |
|------|------|--------|
| C.1 | `app/Filament/Pages/BaselineCompareLanding.php` | Expose the shared summary assessment to the Blade view alongside existing explanation and diagnostics payloads |
| C.2 | `resources/views/filament/pages/baseline-compare-landing.blade.php` | Make the visible headline and zero-findings explanation obey the hardened positive-claim rules rather than findings count alone, including distinct stale, in-progress, and unavailable states |
| C.3 | `app/Filament/Widgets/Tenant/BaselineCompareCoverageBanner.php` | Expand the banner trigger and text so evidence gaps and limited-confidence results can influence the summary, not only uncovered types or missing snapshots |
| C.4 | `resources/views/filament/widgets/tenant/baseline-compare-coverage-banner.blade.php` | Preserve compact warning language while clearly distinguishing incomplete evidence, suppressed output, and baseline unavailability |

### Phase D — Keep Canonical Drilldown As The Semantic Ceiling

**Goal**: Preserve the operation-run detail surface as the deepest truth surface and ensure summary surfaces cannot out-claim it.

| Step | File | Change |
|------|------|--------|
| D.1 | Existing baseline compare run-detail presentation seams | Verify that compact summary wording does not become stronger than current artifact-truth and operator-explanation wording on the run detail surface |
| D.2 | Shared reason or explanation helpers if needed | Reuse the same explanation-family semantics across summary and detail instead of duplicating widget-only logic |
| D.3 | No route or action change | Keep existing dashboard, banner, and landing drilldowns to `Compare now`, `View run`, and `Open findings` intact so limited states have a clear resolution path, and keep `Needs Attention` explicitly non-navigational if it exposes no existing drilldown |

### Phase E — Regression Protection And Focused Validation

**Goal**: Lock the summary truth contract into tests, including the dashboard false-calm case that currently passes as compliant.

| Step | File | Change |
|------|------|--------|
| E.1 | `tests/Feature/Filament/BaselineCompareNowWidgetTest.php` | Replace the current compliant assertion with scenario coverage for trustworthy, limited-confidence, stale, failed, in-progress, and unavailable summary states |
| E.2 | `tests/Feature/Filament/NeedsAttentionWidgetTest.php` | Cover `NeedsAttention` healthy-state fallback and evidence-gap-, stale-, in-progress-, and unavailable-driven caution on the dashboard |
| E.3 | `tests/Feature/Filament/BaselineCompareExplanationSurfaceTest.php`, `tests/Feature/Filament/BaselineCompareLandingWhyNoFindingsTest.php`, and `tests/Feature/Filament/BaselineCompareCoverageBannerTest.php` | Extend landing and banner assertions so that zero findings combined with limited evidence, stale history, or an in-progress or unavailable compare state never becomes an all-clear claim |
| E.4 | `tests/Feature/Baselines/BaselineCompareSummaryAssessmentTest.php` and adjacent explanation tests | Add or adjust support-layer assertions around positive-claim eligibility, stale-versus-not-ready distinction, and summary-state derivation |
| E.5 | `tests/Feature/ReasonTranslation/ReasonTranslationExplanationTest.php` | Preserve reason-translation trust-impact and absence-pattern semantics for compact summary claims and deeper artifact-truth surfaces |
| E.6 | `tests/Feature/Filament/BaselineCompareNowWidgetTest.php`, `tests/Feature/Filament/NeedsAttentionWidgetTest.php`, `tests/Feature/Filament/BaselineCompareCoverageBannerTest.php`, `tests/Feature/Filament/BaselineCompareLandingStartSurfaceTest.php`, and `tests/Feature/Filament/BaselineCompareSummaryConsistencyTest.php` | While summary wording changes, preserve deny-as-not-found semantics and compare-now capability gating; preserve the summary-to-run-detail and findings drilldown expectations for the dashboard, banner, and landing surfaces; and keep `Needs Attention` intentionally non-navigational |
| E.7 | `tests/Feature/Filament/OperationRunBaselineTruthSurfaceTest.php` and `tests/Feature/Filament/TenantDashboardDbOnlyTest.php` | Preserve cross-surface semantic consistency, drilldown parity, and DB-only dashboard render behavior |
| E.8 | `vendor/bin/sail bin pint --dirty --format agent` and focused Pest runs | Required formatting and targeted verification before implementation is considered complete |

## Key Design Decisions

### D-001 — The summary contract must originate from the truth layer, not from widget-local counts

`BaselineCompareNow` currently consumes `BaselineCompareStats::forWidget()`, which only knows counts, assignment, snapshot presence, and last compare time. That shortcut is too weak for trust propagation. The design therefore promotes a shared summary contract built from the richer compare truth and explanation seams.

### D-002 — Zero findings is a count descriptor, not a governance verdict

The existing landing explanation layer already distinguishes trustworthy no-result, suppressed output, incomplete result, unavailable result, and blocked or missing inputs. The compact summary contract must preserve that distinction instead of translating `0 findings` directly into `baseline compliant`.

### D-003 — Dashboard healthy states are part of the truth surface, not decorative filler

`NeedsAttention` currently falls back to `Everything looks healthy right now.` whenever no high-severity findings, stale compare, failure, or active runs are present. That fallback is itself a semantic claim and must be driven by the shared compare summary contract.
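
One way to picture this guard, assuming the widget receives a state-family string and a headline from the shared contract: the healthy fallback is emitted only when there are no attention items and the compare summary is positive. The function name `needsAttentionLines()` and the `'positive'` family string are hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Build the lines the widget would render (illustrative sketch only).
 *
 * @param list<string> $items attention items already derived from findings, failures, and active runs
 * @return list<string>
 */
function needsAttentionLines(array $items, string $compareFamily, string $compareHeadline): array
{
    // A non-positive compare summary is itself an attention item.
    if ($compareFamily !== 'positive') {
        $items[] = $compareHeadline;
    }

    // The calm fallback is reachable only when nothing at all needs attention.
    return $items === [] ? ['Everything looks healthy right now.'] : $items;
}
```

With this shape, a limited or incomplete compare result can no longer fall through to the healthy message, because it always contributes at least one item.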

### D-004 — Coverage gaps and evidence gaps both qualify summary truth

The current coverage banner understands uncovered types and missing snapshots, but evidence-gap-driven incompleteness can still remain invisible. The plan therefore treats evidence gaps as first-class summary-limiting inputs even when coverage proof technically exists.
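
A minimal sketch of the widened banner trigger, assuming the banner can read three inputs: an uncovered-type count, a missing-snapshot flag, and an evidence-gap count. The function names and the limitation keys are hypothetical and only illustrate that evidence gaps raise the banner on their own.

```php
<?php

declare(strict_types=1);

/**
 * Collect the summary-limiting signals for the coverage banner (illustrative only).
 *
 * @return list<string> hypothetical limitation keys, empty when nothing limits the summary
 */
function bannerLimitations(int $uncoveredTypes, bool $snapshotMissing, int $evidenceGaps): array
{
    $limits = [];

    if ($snapshotMissing) {
        $limits[] = 'baseline_unavailable';
    }

    if ($uncoveredTypes > 0) {
        $limits[] = 'uncovered_types';
    }

    // New input: evidence gaps limit the summary even when coverage proof exists.
    if ($evidenceGaps > 0) {
        $limits[] = 'evidence_gaps';
    }

    return $limits;
}

// The banner is shown whenever at least one limitation is present.
function shouldShowBanner(int $uncoveredTypes, bool $snapshotMissing, int $evidenceGaps): bool
{
    return bannerLimitations($uncoveredTypes, $snapshotMissing, $evidenceGaps) !== [];
}
```

Returning the individual keys rather than a single boolean would let the Blade view keep incomplete evidence and baseline unavailability visibly distinct.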

### D-005 — KPI cards stay numeric; claim-bearing surfaces carry the semantic burden

The KPI cards can remain simple quantitative indicators so long as they do not add healthy or compliant phrasing. This keeps the plan focused on the surfaces that actually communicate reassurance.

### D-006 — Stale and not-ready are separate operator states, not generic unavailability

The spec explicitly distinguishes empty, missing, failed, stale, and not-ready compare situations. The shared summary contract therefore must keep stale-history separate from the formal `in_progress` and `unavailable` cases so operators can tell whether they should rerun, wait, or inspect deeper evidence.
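
The distinction can be sketched as a small classifier. This is an assumption-heavy illustration: the function names, the `'ready'` and `'stale'` strings, the seven-day threshold, and the precedence of an active run over staleness are all invented for the example; only `in_progress` and `unavailable` are names taken from this plan.

```php
<?php

declare(strict_types=1);

// Classify compare readiness; the threshold and check order are assumptions.
function classifyReadiness(
    ?DateTimeImmutable $lastComparedAt,
    bool $runActive,
    DateTimeImmutable $now,
    int $staleAfterDays = 7, // hypothetical threshold, not taken from the spec
): string {
    if ($runActive) {
        return 'in_progress';
    }

    if ($lastComparedAt === null) {
        return 'unavailable';
    }

    // DateInterval::$days holds the total number of whole days between the two dates.
    $ageInDays = $lastComparedAt->diff($now)->days;

    return $ageInDays > $staleAfterDays ? 'stale' : 'ready';
}

// Map each state to the operator decision named in D-006: rerun, wait, or inspect.
function nextStepFor(string $state): string
{
    return match ($state) {
        'stale' => 'rerun',
        'in_progress' => 'wait',
        'unavailable' => 'inspect',
        default => 'none',
    };
}
```

Keeping the three states separate means each one resolves to a different operator action instead of collapsing into a generic "not available" message.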

### D-007 — Summary hardening must preserve guardrails and drilldowns, not just wording

Because the feature changes meaning on operator-facing surfaces, it must also preserve the existing landing guard contract: deny-as-not-found for non-members, capability-gated `Compare now`, and the current drilldown paths to landing, findings, and canonical run detail.

## Risk Assessment

| Risk | Impact | Likelihood | Mitigation |
|------|--------|------------|------------|
| Shared summary contract becomes another parallel truth model | High | Medium | Derive it directly from existing stats plus operator explanation instead of inventing a new independent state machine |
| Dashboard widgets become too noisy or verbose | Medium | Medium | Use compact state families and one primary next step rather than dumping diagnostics into widgets |
| Landing and widget wording drift apart again over time | Medium | Medium | Centralize claim eligibility and state-family mapping in the shared support layer and cover it with tests |
| Evidence gaps over-trigger warnings and hide genuinely trustworthy no-drift states | Medium | Low | Keep positive claims allowed when trustworthiness is decision-grade and no material limitation is present |
| Summary hardening accidentally introduces extra queries or render-time side effects | Medium | Low | Reuse existing DB-only stats paths, preserve lazy widgets, and keep dashboard DB-only regression coverage |

## Test Strategy

- Extend existing baseline compare feature and Livewire tests rather than introducing a new UI test harness.
- Add explicit scenario coverage for trustworthy no-drift, limited-confidence zero-findings, incomplete evidence, stale compare history, failed compare, in-progress states, and unavailable states (no result yet or no snapshot).
- Add at least one cross-surface consistency assertion ensuring a dashboard or banner summary is never more optimistic than the landing or canonical run detail for the same compare state.
- Preserve and extend existing reason-translation assertions so compact summary claims reuse the same trust-impact and absence-pattern semantics as deeper artifact-truth surfaces.
- Preserve existing compare-start and access assertions so the feature does not regress deny-as-not-found behavior, `Compare now` confirmation, capability gating, or summary-to-detail drilldown language.
- Preserve `TenantDashboardDbOnlyTest` so dashboard hardening cannot introduce outbound HTTP or background work during render.
- Run the minimum focused Pest subset through Sail for touched files and ask separately before running the full suite.

## Complexity Tracking

No constitution violations or justified complexity exceptions were identified.