Research: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening
Decision 1: Derive compact summary claims from existing compare truth and explanation seams
- Decision: Build the compact summary contract from `BaselineCompareStats` plus `operatorExplanation()` rather than from findings counts or widget-local conditions.
- Rationale: The current landing surface already understands explanation family, trustworthiness, coverage statements, reliability statements, reason codes, and next steps. The widget path is currently too lossy because `BaselineCompareStats::forWidget()` collapses the compare state into assignment, snapshot, counts, and last-compare timing. Reusing the richer truth layer ensures summary surfaces do not invent a stronger meaning than the deeper surfaces already carry.
- Alternatives considered:
- Patch each widget with bespoke `if` conditions around `findingsCount`, `coverageStatus`, and `evidenceGapsCount`. Rejected because that would create another parallel truth model and would drift from the explanation layer over time.
- Re-architect compare persistence or introduce new result enums. Rejected because the spec explicitly rules out backend or model rewrites and the current truth signals already exist.
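The shape of such a contract could be a small value object derived from the stats and explanation layers. The following is a minimal sketch under stated assumptions: the class name, enum cases, and fields are illustrative, not existing code in the repository.

```php
<?php

// Hypothetical sketch of a compact summary contract. The tones map to the
// explanation families the document describes (trustworthy no-result,
// completed-but-limited, suppressed, unavailable, blocked); names are assumed.
enum SummaryTone: string
{
    case Positive = 'positive';       // trustworthy, decision-grade all-clear
    case Cautionary = 'cautionary';   // completed but materially limited
    case Negative = 'negative';       // findings present
    case Unknown = 'unknown';         // suppressed, unavailable, or blocked
}

final readonly class SummaryClaim
{
    public function __construct(
        public SummaryTone $tone,
        public string $headline,
        public ?string $limitation = null,  // e.g. coverage or evidence caveat
        public ?string $nextStep = null,    // operator next action, if any
    ) {}
}
```

The point of the sketch is only that every compact surface consumes one derived claim object instead of re-deriving meaning from raw counts.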
Decision 2: Treat zero findings as an output count, never as automatic compliance
- Decision: Positive all-clear wording is allowed only when the shared summary contract marks the compare result as trustworthy and free from material evidence or coverage limitations.
- Rationale: Existing baseline compare explanation logic already distinguishes trustworthy no-result, completed-but-limited, suppressed output, unavailable, and blocked states. The specific false-calm bug is that compact summaries translate `0 findings` into "baseline compliant" even when the reason code, coverage proof, or evidence gaps make that interpretation unsafe.
- Alternatives considered:
- Keep the "No open drift" wording and just add a small warning badge nearby. Rejected because the primary claim would still be too strong and operators would continue to read the surface as an all-clear.
- Remove all positive wording entirely from summary surfaces. Rejected because the product still needs a truthful positive state when the compare result is genuinely decision-grade.
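Under this decision, the positive wording sits behind a trust guard rather than a count check. A minimal sketch, assuming hypothetical parameter names for the signals the shared contract would expose:

```php
<?php

// Illustrative guard: zero findings alone never produces the positive claim.
// $isTrustworthy, $coverageProven, and $evidenceGapsCount are assumed inputs;
// the names and wording are not taken from the codebase.
function headlineFor(
    int $findingsCount,
    bool $isTrustworthy,
    bool $coverageProven,
    int $evidenceGapsCount,
): string {
    if ($findingsCount > 0) {
        return "{$findingsCount} open drift findings";
    }

    // Positive all-clear only for a trustworthy, fully covered compare.
    if ($isTrustworthy && $coverageProven && $evidenceGapsCount === 0) {
        return 'No open drift — baseline compliant';
    }

    // Otherwise zero stays an output count, with the limitation surfaced.
    return 'No findings reported — result is not decision-grade';
}
```

The branch order matters: findings always win, the compliant claim is the narrowest case, and everything else degrades to a cautionary count.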
Decision 3: Harden the dashboard first because it contains the strongest false-calm claims
- Decision: Prioritize `BaselineCompareNow` and `NeedsAttention` as the first summary consumers of the new contract.
- Rationale: `BaselineCompareNow` currently renders "No open drift — baseline compliant" whenever `findingsCount` is zero. `NeedsAttention` falls back to "Everything looks healthy right now." when no attention items are generated, even though it does not currently incorporate compare trust or evidence completeness. These are the highest-risk reassurance surfaces because they sit on the tenant dashboard and are read at a glance.
- Alternatives considered:
- Fix only the Baseline Compare landing page. Rejected because the landing page already has richer explanation semantics and is not the primary false-calm entry point.
- Patch the dashboard copy only. Rejected because wording alone would still be backed by inconsistent state-selection logic.
Decision 4: Evidence gaps must influence compact summaries even without uncovered-type coverage warnings
- Decision: Treat evidence gaps as first-class summary-limiting inputs on banners and compact summaries, not merely as deep-diagnostic detail.
- Rationale: The current coverage banner shows when coverage is `warning` or `unproven` or when there is no snapshot, but it does not surface evidence-gap-driven partiality when coverage proof exists. The spec explicitly requires evidence gaps to influence summary semantics, so the banner and other compact summaries need the same visibility into evidence limitations that the landing page already has.
- Alternatives considered:
- Keep evidence gaps only on the landing page and canonical run detail. Rejected because the summary-truth contract would still fail on the dashboard and findings-adjacent surfaces.
- Promote all evidence-gap diagnostics into the summary surface. Rejected because compact surfaces need cautionary meaning and next action, not full bucket-level diagnostics.
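One way to read this decision is that the banner's visibility predicate widens from coverage state alone to also include evidence gaps. A hedged sketch with assumed field names and status strings:

```php
<?php

// Assumed inputs; the real banner reads richer state. The only point of
// this sketch is that evidence gaps now trigger the banner even when a
// coverage proof exists.
function shouldShowLimitationBanner(
    ?string $coverageStatus,   // assumed values: 'proven', 'warning', 'unproven'
    bool $hasSnapshot,
    int $evidenceGapsCount,
): bool {
    if (! $hasSnapshot) {
        return true;
    }

    if (in_array($coverageStatus, ['warning', 'unproven'], true)) {
        return true;
    }

    // New condition: evidence-gap-driven partiality surfaces on the banner
    // even with coverage proof, without promoting bucket-level diagnostics.
    return $evidenceGapsCount > 0;
}
```

The compact surface still shows only the cautionary meaning and next action; the full evidence-gap diagnostics remain on the landing page and run detail.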
Decision 5: KPI cards stay quantitative and should not be promoted into semantic health claims
- Decision: Keep dashboard KPI cards as numeric indicators and ensure any semantic reassurance comes only from the shared summary contract on claim-bearing surfaces.
- Rationale: The KPI cards currently show counts such as open drift findings and high-severity drift. They are not the source of the false compliant claim, and keeping them numeric avoids unnecessary redesign. The feature should harden claim-bearing summaries, not turn every count card into a mini explanation surface.
- Alternatives considered:
- Add semantic healthy or compliant captions to KPI cards. Rejected because that would widen the surface area of the problem.
- Remove KPI cards from scope entirely. Rejected because the spec includes KPI-adjacent summaries and they still need to remain semantically subordinate to the hardened truth contract.
Decision 6: Extend existing Pest and Livewire tests instead of creating a new browser harness
- Decision: Expand the existing baseline compare widget, landing, stats, and run-detail tests with scenario-specific summary-truth assertions.
- Rationale: The repository already has strong feature coverage around `BaselineCompareStats`, explanation families, landing explanations, and the widget claim that currently asserts "No open drift — baseline compliant". Updating those tests keeps the regression guard close to the implementation seams and preserves the current Sail-first workflow.
- Alternatives considered:
- Rely on manual QA alone. Rejected because the bug is semantic and cross-surface, so it needs automated regression protection.
- Introduce browser tests as the primary guard. Rejected because the affected logic is mainly view-model and rendered-text behavior already well-covered by feature and Livewire tests.
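A scenario-specific summary-truth assertion in that style might look like the Pest sketch below. The component name follows the decisions above, but the factories, factory states, and assertion strings are assumptions about the existing test setup, not verbatim suite code.

```php
<?php

// Hypothetical Pest feature test sketch. Tenant, BaselineCompareRun, and
// BaselineCompareNow are assumed to be in scope; the factory states
// (completed, withFindingsCount, withEvidenceGaps) are illustrative.
use Livewire\Livewire;

it('does not claim compliance when zero findings coexist with evidence gaps', function () {
    $tenant = Tenant::factory()->create();
    BaselineCompareRun::factory()
        ->for($tenant)
        ->completed()
        ->withFindingsCount(0)
        ->withEvidenceGaps(3)
        ->create();

    Livewire::test(BaselineCompareNow::class)
        ->assertDontSee('baseline compliant')
        ->assertSee('not decision-grade');
});
```

The negative assertion (`assertDontSee`) is the regression guard for the false-calm bug; the positive assertion pins the cautionary replacement wording.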