Research: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening
Decision 1: Derive compact summary claims from existing compare truth and explanation seams
- Decision: Build the compact summary contract from `BaselineCompareStats` plus `operatorExplanation()` rather than from findings counts or widget-local conditions.
- Rationale: The current landing surface already understands explanation family, trustworthiness, coverage statements, reliability statements, reason codes, and next steps. The widget path is currently too lossy because `BaselineCompareStats::forWidget()` collapses the compare state into assignment, snapshot, counts, and last-compare timing. Reusing the richer truth layer ensures summary surfaces do not invent a stronger meaning than the deeper surfaces already carry.
- Alternatives considered:
- Patch each widget with bespoke `if` conditions around `findingsCount`, `coverageStatus`, and `evidenceGapsCount`. Rejected because that would create another parallel truth model and would drift from the explanation layer over time.
- Re-architect compare persistence or introduce new result enums. Rejected because the spec explicitly rules out backend or model rewrites and the current truth signals already exist.
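The shape of such a contract could be a small value object derived from the stats and explanation layers. The following is a minimal sketch under stated assumptions: the class name, enum cases, and fields are illustrative, not existing code in the repository.

```php
<?php

// Hypothetical sketch of a compact summary contract. The tones map to the
// explanation families the document describes (trustworthy no-result,
// completed-but-limited, suppressed, unavailable, blocked); names are assumed.
enum SummaryTone: string
{
    case Positive = 'positive';       // trustworthy, decision-grade all-clear
    case Cautionary = 'cautionary';   // completed but materially limited
    case Negative = 'negative';       // findings present
    case Unknown = 'unknown';         // suppressed, unavailable, or blocked
}

final readonly class SummaryClaim
{
    public function __construct(
        public SummaryTone $tone,
        public string $headline,
        public ?string $limitation = null,  // e.g. coverage or evidence caveat
        public ?string $nextStep = null,    // operator next action, if any
    ) {}
}
```

The point of the sketch is only that every compact surface consumes one derived claim object instead of re-deriving meaning from raw counts.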
Decision 2: Treat zero findings as an output count, never as automatic compliance
- Decision: Positive all-clear wording is allowed only when the shared summary contract marks the compare result as trustworthy and free from material evidence or coverage limitations.
- Rationale: Existing baseline compare explanation logic already distinguishes trustworthy no-result, completed-but-limited, suppressed output, unavailable, and blocked states. The specific false-calm bug is that compact summaries translate `0 findings` into "baseline compliant" even when the reason code, coverage proof, or evidence gaps make that interpretation unsafe.
- Alternatives considered:
- Keep the "No open drift" wording and just add a small warning badge nearby. Rejected because the primary claim would still be too strong and operators would continue to read the surface as an all-clear.
- Remove all positive wording entirely from summary surfaces. Rejected because the product still needs a truthful positive state when the compare result is genuinely decision-grade.
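Under this decision, the positive wording sits behind a trust guard rather than a count check. A minimal sketch, assuming hypothetical parameter names for the signals the shared contract would expose:

```php
<?php

// Illustrative guard: zero findings alone never produces the positive claim.
// $isTrustworthy, $coverageProven, and $evidenceGapsCount are assumed inputs;
// the names and wording are not taken from the codebase.
function headlineFor(
    int $findingsCount,
    bool $isTrustworthy,
    bool $coverageProven,
    int $evidenceGapsCount,
): string {
    if ($findingsCount > 0) {
        return "{$findingsCount} open drift findings";
    }

    // Positive all-clear only for a trustworthy, fully covered compare.
    if ($isTrustworthy && $coverageProven && $evidenceGapsCount === 0) {
        return 'No open drift — baseline compliant';
    }

    // Otherwise zero stays an output count, with the limitation surfaced.
    return 'No findings reported — result is not decision-grade';
}
```

The branch order matters: findings always win, the compliant claim is the narrowest case, and everything else degrades to a cautionary count.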
Decision 3: Harden the dashboard first because it contains the strongest false-calm claims
- Decision: Prioritize `BaselineCompareNow` and `NeedsAttention` as the first summary consumers of the new contract.
- Rationale: `BaselineCompareNow` currently renders "No open drift — baseline compliant" whenever `findingsCount` is zero. `NeedsAttention` falls back to "Everything looks healthy right now." when no attention items are generated, even though it does not currently incorporate compare trust or evidence completeness. These are the highest-risk reassurance surfaces because they sit on the tenant dashboard and are read at a glance.
- Alternatives considered:
- Fix only the Baseline Compare landing page. Rejected because the landing page already has richer explanation semantics and is not the primary false-calm entry point.
- Patch the dashboard copy only. Rejected because wording alone would still be backed by inconsistent state-selection logic.
Decision 4: Evidence gaps must influence compact summaries even without uncovered-type coverage warnings
- Decision: Treat evidence gaps as first-class summary-limiting inputs on banners and compact summaries, not merely as deep-diagnostic detail.
- Rationale: The current coverage banner shows when coverage is `warning` or `unproven` or when there is no snapshot, but it does not surface evidence-gap-driven partiality when coverage proof exists. The spec explicitly requires evidence gaps to influence summary semantics, so the banner and other compact summaries need the same visibility into evidence limitations that the landing page already has.
- Alternatives considered:
- Keep evidence gaps only on the landing page and canonical run detail. Rejected because the summary-truth contract would still fail on the dashboard and findings-adjacent surfaces.
- Promote all evidence-gap diagnostics into the summary surface. Rejected because compact surfaces need cautionary meaning and next action, not full bucket-level diagnostics.
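One way to read this decision is that the banner's visibility predicate widens from coverage state alone to also include evidence gaps. A hedged sketch with assumed field names and status strings:

```php
<?php

// Assumed inputs; the real banner reads richer state. The only point of
// this sketch is that evidence gaps now trigger the banner even when a
// coverage proof exists.
function shouldShowLimitationBanner(
    ?string $coverageStatus,   // assumed values: 'proven', 'warning', 'unproven'
    bool $hasSnapshot,
    int $evidenceGapsCount,
): bool {
    if (! $hasSnapshot) {
        return true;
    }

    if (in_array($coverageStatus, ['warning', 'unproven'], true)) {
        return true;
    }

    // New condition: evidence-gap-driven partiality surfaces on the banner
    // even with coverage proof, without promoting bucket-level diagnostics.
    return $evidenceGapsCount > 0;
}
```

The compact surface still shows only the cautionary meaning and next action; the full evidence-gap diagnostics remain on the landing page and run detail.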
Decision 5: KPI cards stay quantitative and should not be promoted into semantic health claims
- Decision: Keep dashboard KPI cards as numeric indicators and ensure any semantic reassurance comes only from the shared summary contract on claim-bearing surfaces.
- Rationale: The KPI cards currently show counts such as open drift findings and high-severity drift. They are not the source of the false compliant claim, and keeping them numeric avoids unnecessary redesign. The feature should harden claim-bearing summaries, not turn every count card into a mini explanation surface.
- Alternatives considered:
- Add semantic healthy or compliant captions to KPI cards. Rejected because that would widen the surface area of the problem.
- Remove KPI cards from scope entirely. Rejected because the spec includes KPI-adjacent summaries and they still need to remain semantically subordinate to the hardened truth contract.
Decision 6: Extend existing Pest and Livewire tests instead of creating a new browser harness
- Decision: Expand the existing baseline compare widget, landing, stats, and run-detail tests with scenario-specific summary-truth assertions.
- Rationale: The repository already has strong feature coverage around `BaselineCompareStats`, explanation families, landing explanations, and the widget claim that currently asserts "No open drift — baseline compliant". Updating those tests keeps the regression guard close to the implementation seams and preserves the current Sail-first workflow.
- Alternatives considered:
- Rely on manual QA alone. Rejected because the bug is semantic and cross-surface, so it needs automated regression protection.
- Introduce browser tests as the primary guard. Rejected because the affected logic is mainly view-model and rendered-text behavior already well-covered by feature and Livewire tests.
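A scenario-specific summary-truth assertion in that style might look like the Pest sketch below. The component name follows the decisions above, but the factories, factory states, and assertion strings are assumptions about the existing test setup, not verbatim suite code.

```php
<?php

// Hypothetical Pest feature test sketch. Tenant, BaselineCompareRun, and
// BaselineCompareNow are assumed to be in scope; the factory states
// (completed, withFindingsCount, withEvidenceGaps) are illustrative.
use Livewire\Livewire;

it('does not claim compliance when zero findings coexist with evidence gaps', function () {
    $tenant = Tenant::factory()->create();
    BaselineCompareRun::factory()
        ->for($tenant)
        ->completed()
        ->withFindingsCount(0)
        ->withEvidenceGaps(3)
        ->create();

    Livewire::test(BaselineCompareNow::class)
        ->assertDontSee('baseline compliant')
        ->assertSee('not decision-grade');
});
```

The negative assertion (`assertDontSee`) is the regression guard for the false-calm bug; the positive assertion pins the cautionary replacement wording.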