TenantAtlas/specs/165-baseline-summary-trust/spec.md

# Feature Specification: Baseline Compare Summary Trust Propagation & Compliance Claim Hardening

**Feature Branch**: `165-baseline-summary-trust`
**Created**: 2026-03-26
**Status**: Draft
**Input**: User description: "Spec 165 — Baseline Compare Summary Trust Propagation & Compliance Claim Hardening"

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant
- **Primary Routes**:
  - Existing tenant dashboard at `/admin`, including the baseline compare summary widget, needs-attention summary, and drift-related KPI cards
  - Existing tenant Baseline Compare landing page
  - Existing tenant findings surfaces that summarize baseline compare coverage or evidence limitations
  - Existing canonical operation-run drilldowns reached from baseline compare summaries
- **Data Ownership**:
  - Baseline profiles and baseline snapshots remain workspace-owned standards artifacts
  - Baseline compare results, drift findings, evidence gaps, and tenant-linked operation runs remain tenant-owned operational evidence
  - This feature changes summary interpretation, claim strength, and operator guidance only; it does not change ownership, persistence, or route identity
- **RBAC**:
  - Existing workspace membership and tenant membership remain required for tenant-context summary surfaces
  - Existing tenant-view permissions remain authoritative for inspecting baseline, drift, and compare summaries
  - Existing compare-start permissions remain authoritative for any existing compare action exposed from the landing surface
  - Non-members remain deny-as-not-found, and members in scope but lacking an action capability remain forbidden for that action

## Operator Surface Contract *(mandatory when operator-facing surfaces are changed)*

If this feature adds a new operator-facing page or materially refactors one, fill out one row per affected page/surface.

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Tenant dashboard baseline summaries | Tenant operator | Dashboard summary | Can I safely treat this tenant as aligned, or do I need to review the compare result more closely? | Assigned baseline state, strongest safe summary claim, open drift counts, freshness or availability state, and the clearest next drilldown | Detailed evidence-gap reasons, coverage breakdowns, and raw compare diagnostics | governance result, evidence completeness, freshness, availability | Read-only summary surface | Open Baseline Compare, View findings, View run when available | None introduced by this spec |
| Baseline Compare landing summary | Tenant operator | Tenant landing/detail | What does the latest compare actually prove, and what should I do next? | Primary compare meaning, trustworthiness, evidence limitations, drift confirmation state, and one obvious next step | Detailed diagnostics, evidence-gap breakdowns, and low-level supporting facts | governance result, trust or confidence, evidence completeness, lifecycle or freshness | Existing compare-start action remains unchanged; summary itself is read-only | Compare now, View run, Open findings | No new dangerous action; existing guarded actions remain under current confirmation and authorization rules |
| Findings coverage banner and adjacent summary copy | Tenant operator | Banner summary | Are current findings enough to trust the absence of drift? | Coverage caveat, evidence limitation, and the safest follow-up cue | Detailed gap reasons and underlying compare evidence | governance result, evidence completeness, availability | Read-only summary surface | Review coverage details, Open findings, View run when relevant | None introduced by this spec |
| Canonical operation-run drilldown for baseline compare | Workspace or tenant operator with access | Canonical detail | Is the underlying compare result trustworthy enough to support the summary claim? | Run outcome, artifact truth, result meaning, trustworthiness, and the primary next action | Raw payload fragments, diagnostics, and detailed count breakdowns | execution outcome, artifact truth, evidence completeness, next-action readiness | Read-only drilldown | View run details and follow linked next steps | None introduced by this spec |

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Trust dashboard summary claims (Priority: P1)

As a tenant operator, I want summary surfaces to avoid false calm when the last baseline compare is incomplete or only partially trustworthy, so that I do not mistake missing findings for a reliable governance conclusion.

**Why this priority**: False reassurance on the dashboard and other compact summaries is the core trust risk. If these summaries overstate the result, operators may deprioritize real follow-up work.

**Independent Test**: Can be fully tested by rendering covered summary surfaces for scenarios with zero visible findings but limited confidence, incomplete evidence, suppressed results, or other trust limitations and verifying that none of them present a compliant or all-clear claim.

**Acceptance Scenarios**:

1. **Given** a tenant with zero visible drift findings but a limited-confidence or evidence-gap-affected compare result, **When** a dashboard summary renders, **Then** it shows a cautionary or review-oriented state instead of `Compliant`, `No drift`, or an equivalent all-clear claim.
2. **Given** a tenant with a trustworthy compare result, no contradictory evidence limitation, and no confirmed drift, **When** a dashboard summary renders, **Then** it may show a positive aligned state without contradicting deeper surfaces.

---

### User Story 2 - Triage constrained compare results safely (Priority: P2)

As a tenant operator, I want landing and compact baseline compare surfaces to tell me whether the latest result is reliable enough to use and what I should review next, so that I can act appropriately when the result is incomplete, stale, unavailable, or only diagnostically useful.

**Why this priority**: Operators need more than a count. They need an honest statement of what the compare result does and does not prove.

**Independent Test**: Can be fully tested by opening the covered landing and summary surfaces for incomplete, suppressed, stale, failed, and no-compare-yet scenarios and verifying that each one presents the correct state family and a logical next step.

**Acceptance Scenarios**:

1. **Given** no open drift findings and incomplete or partial evidence, **When** the operator opens the landing summary or compact summary, **Then** the surface presents a limited-confidence or incomplete-evidence state with a drilldown hint instead of an all-clear claim.
2. **Given** no usable compare result is available because the compare is missing, not yet ready, or failed and requires investigation, **When** a covered summary surface renders, **Then** it communicates an `unavailable`, `in_progress`, or action-required state rather than a healthy posture, and a failed compare maps to an investigation-oriented action-required state.

---

### User Story 3 - See one truth across summary and detail (Priority: P3)

As an operator moving from dashboard to landing to run detail, I want the same compare result to keep the same underlying meaning across all surfaces, so that deeper inspection confirms the summary rather than correcting it.

**Why this priority**: The feature fails if different surfaces describe the same compare result in conflicting ways.

**Independent Test**: Can be fully tested by comparing the same covered scenario across dashboard, landing, findings-adjacent summary, and canonical run detail and confirming that the deeper surface is equally cautious or more cautious, but never less cautious.

**Acceptance Scenarios**:

1. **Given** the same compare result appears on widget, landing, and run-detail surfaces, **When** the operator navigates across them, **Then** the primary claim stays semantically consistent and the deeper surface is never less cautious than the summary.
2. **Given** a summary surface cannot honestly give an all-clear claim, **When** it renders, **Then** it exposes a next action or drilldown that leads to the supporting detail needed to resolve the uncertainty.
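The "never less cautious" rule in this story (and in FR-165-011) can be expressed as an ordering check. The following Python sketch assumes a hypothetical caution rank per state family; the family names, the rank values, and the `is_more_optimistic` and `reconcile` helpers are illustrative only and are not defined by this spec. Families with equal rank count as equally cautious for this check, so FR-165-010 (no materially conflicting primary claims) still needs its own review.

```python
# Hypothetical caution ranks: a higher value means a more cautious summary.
CAUTION_RANK = {
    "positive": 0,
    "cautionary": 1,
    "stale": 1,
    "in_progress": 2,
    "unavailable": 2,
    "drift_confirmed": 3,
    "action_required": 3,
}


def is_more_optimistic(compact: str, deeper: str) -> bool:
    """True when a compact surface claims less caution than the deeper surface."""
    return CAUTION_RANK[compact] < CAUTION_RANK[deeper]


def reconcile(compact: str, deeper: str) -> str:
    """Fall back to the deeper family whenever the compact one would be more optimistic."""
    return deeper if is_more_optimistic(compact, deeper) else compact
```

For example, `reconcile("positive", "cautionary")` returns `"cautionary"`, while `reconcile("unavailable", "cautionary")` keeps `"unavailable"`, because a compact surface may be more cautious than the deeper one but never less.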

### Edge Cases

- A compare result may show zero visible findings while still carrying limited confidence, incomplete evidence, suppressed evaluation, or material evidence gaps.
- A compare artifact may exist while the result is still too incomplete or untrustworthy to justify a compliance or no-drift claim.
- Coverage limitations and evidence gaps may coexist with reassuring counts, and summary surfaces must surface the limitation rather than hiding it behind the counts.
- A tenant may have stale compare history, failed compare history, no compare history, no assigned baseline, or no consumable snapshot; each case must land in a distinct stale, action-required, or unavailable state rather than a healthy state.
- Different summary surfaces may emphasize different slices of the same result, but none may become more optimistic than the landing or drilldown truth.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature introduces no new Microsoft Graph call path, no new mutation workflow, and no new long-running job type. It hardens summary interpretation and operator copy for existing baseline compare evidence. Existing compare execution, confirmation, audit, and run-observability behavior remain authoritative.

**Constitution alignment (OPS-UX):** This feature reuses existing compare and operation-run semantics as read surfaces only. The existing Ops-UX 3-surface feedback contract for compare execution remains unchanged. `OperationRun.status` and `OperationRun.outcome` remain service-owned, existing `summary_counts` normalization remains authoritative, and scheduled or system-run behavior is unaffected. Regression tests for this feature must focus on summary claim safety, cross-surface consistency, and evidence-gap propagation rather than new lifecycle behavior.

**Constitution alignment (RBAC-UX):** This feature does not introduce new authorization rules. It remains in the tenant/admin plane for dashboard, findings, and landing surfaces, with canonical drilldowns continuing to enforce existing workspace and tenant entitlement checks. Non-members remain deny-as-not-found, members remain subject to existing capability checks for guarded actions, and no raw capability strings or role shortcuts may be introduced through summary hardening.

**Constitution alignment (OPS-EX-AUTH-001):** Not applicable. No `/auth/*` handshake path is involved.

**Constitution alignment (BADGE-001):** Any status-like badge, color, or tone used by a covered summary surface must continue to come from centralized semantics for state, trust, severity, or availability. The feature must not introduce page-local green-success shortcuts that imply a stronger claim than the underlying result supports.

**Constitution alignment (UI-FIL-001):** Covered dashboard widgets, landing summaries, and related operator surfaces must continue to rely on Filament widgets, shared badges, shared alerts, and existing surface primitives rather than introducing a local status language. If a compact custom summary block remains necessary, it must still consume shared status semantics instead of ad hoc page-local styling rules.

**Constitution alignment (UI-NAMING-001):** The target object is the tenant's latest baseline compare posture. Primary operator copy must preserve truthful domain language such as aligned, limited confidence, incomplete evidence, result unavailable, review details, and open findings. Implementation-first terms or false-calming phrases must not appear as primary labels when the result is not decision-grade.

**Constitution alignment (OPSURF-001):** Default-visible content on covered surfaces must remain operator-first, communicating governance result, evidence completeness, freshness or availability, and next action without requiring diagnostic detail. Diagnostics remain secondary and explicitly deeper. Existing compare-start actions keep their current mutation-scope messaging and safe-execution behavior. No new dangerous action is introduced by this feature.

**Constitution alignment (Filament Action Surfaces):** This feature modifies existing Filament-backed operator surfaces, including a tenant page and summary widgets, without expanding the action inventory. The Action Surface Contract remains satisfied because the landing page keeps its current guarded `Compare now` action, read-only summary widgets remain non-mutating, and the summary hardening changes interpretation rather than action topology. UI-FIL-001 remains satisfied because the feature is expected to reuse existing Filament or shared status primitives. No exemption is required.

**Constitution alignment (UX-001 — Layout & Information Architecture):** The feature changes summary semantics on existing widgets, banners, and a landing page rather than introducing create or edit screens. Covered surfaces must keep clear sections or cards, meaningful empty or unavailable states, and centralized status presentation. The landing page may continue using its existing custom enterprise layout so long as it preserves the operator-first hierarchy and avoids conflicting summary claims.

### Functional Requirements

- **FR-165-001**: The system MUST prevent every in-scope summary surface from showing `Compliant`, `Baseline compliant`, `No drift`, `No open drift`, `All clear`, or a semantically equivalent all-clear claim unless the underlying compare result is trustworthy enough to support that claim.
- **FR-165-002**: The primary state of every in-scope baseline or drift summary surface MUST be derived from the combined meaning of drift confirmation, trustworthiness or confidence, evidence completeness, result availability, and any material coverage limitation rather than from findings counts alone.
- **FR-165-003**: The system MUST treat `0 findings` or `no open findings` as insufficient on their own to justify a compliance or no-drift claim.
- **FR-165-004**: When no open drift is confirmed but the compare result is limited-confidence, incomplete, suppressed, useful only for diagnostics, or otherwise not decision-grade, the surface MUST use a cautionary or review-oriented state family instead of a positive all-clear family.
- **FR-165-005**: A positive summary state MAY be used only when a usable compare result is available, the result is trustworthy enough for operator decision-making, no material evidence limitation undercuts the claim, and the result meaning does not contradict the positive claim.
- **FR-165-006**: Evidence gaps, resolver limitations, coverage warnings, stale compare conditions, missing compare results, and failed compare results MUST visibly influence the summary state, its wording, or both.
- **FR-165-007**: Every in-scope summary surface MUST present one clear primary statement that answers whether drift is confirmed, whether the result is limited or incomplete, whether no usable result is available, or whether follow-up is required.
- **FR-165-008**: Every in-scope summary surface that cannot safely present a positive all-clear claim MUST offer a logical next step, drilldown, or review cue that follows directly from the limited or unavailable state.
- **FR-165-009**: The system MUST preserve a semantic distinction between `no findings visible`, `no confirmed drift`, `limited confidence`, `incomplete evidence`, `result unavailable`, and `tenant compliant` rather than collapsing them into one visual or linguistic state.
- **FR-165-010**: Two different in-scope summary surfaces describing the same compare result MUST NOT present materially conflicting primary claims.
- **FR-165-011**: A compact summary surface MAY be equally cautious or more cautious than a deeper landing or drilldown surface, but it MUST never be more optimistic than the deeper truth surface.
- **FR-165-012**: Covered summary surfaces MUST consume the existing trust, explanation, evidence, and result-meaning foundations rather than inventing an isolated widget-only truth model.
- **FR-165-013**: Existing navigation from summary surfaces to Baseline Compare, findings, or run detail MUST remain intact so that the operator can resolve uncertainty quickly.
- **FR-165-014**: Empty, missing, failed, stale, and not-ready compare situations MUST be represented as intentionally distinct state families rather than falling through to healthy, aligned, or compliant language. For this feature, `not-ready` is an umbrella term that MUST resolve into the formal `in_progress` or `unavailable` state family depending on whether an active compare is underway.
- **FR-165-015**: Compact dashboard and headline surfaces MUST favor truthful caution over visual calm whenever the result meaning is ambiguous or evidence is materially limited.
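The derivation rules in FR-165-002 through FR-165-005 and FR-165-014 can be sketched as a single pure function. This is a minimal Python illustration under stated assumptions, not the TenantAtlas implementation: `CompareSignals`, `StateFamily`, `derive_state_family`, and every field and family name are hypothetical, and the spec fixes only `in_progress` and `unavailable` by name. The sketch also assumes the signals describe the latest compare, a separate `DRIFT_CONFIRMED` family, and a precedence in which a failed compare outranks everything else and confirmed drift outranks staleness.

```python
from dataclasses import dataclass
from enum import Enum


class StateFamily(Enum):
    # Hypothetical family names; only in_progress and unavailable come from FR-165-014.
    POSITIVE = "positive"
    CAUTIONARY = "cautionary"
    STALE = "stale"
    IN_PROGRESS = "in_progress"
    UNAVAILABLE = "unavailable"
    DRIFT_CONFIRMED = "drift_confirmed"
    ACTION_REQUIRED = "action_required"


@dataclass(frozen=True)
class CompareSignals:
    # Hypothetical inputs, assumed to be read from the existing trust and evidence foundations.
    has_usable_result: bool
    compare_running: bool
    compare_failed: bool
    is_stale: bool
    decision_grade: bool  # trustworthy enough for operator decision-making
    material_evidence_limitation: bool  # evidence gaps, coverage warnings, suppression
    confirmed_drift_count: int


def derive_state_family(s: CompareSignals) -> StateFamily:
    """Combine all signals; a zero drift count alone never yields POSITIVE (FR-165-003)."""
    if s.compare_failed:
        # A failed compare maps to an investigation-oriented action-required state.
        return StateFamily.ACTION_REQUIRED
    if not s.has_usable_result:
        # "not-ready" resolves by whether an active compare is underway (FR-165-014).
        return StateFamily.IN_PROGRESS if s.compare_running else StateFamily.UNAVAILABLE
    if s.confirmed_drift_count > 0:
        return StateFamily.DRIFT_CONFIRMED
    if s.is_stale:
        return StateFamily.STALE
    if not s.decision_grade or s.material_evidence_limitation:
        # Zero drift but not decision-grade: cautionary, never all-clear (FR-165-004).
        return StateFamily.CAUTIONARY
    # Only reached when every FR-165-005 precondition holds.
    return StateFamily.POSITIVE
```

Because the positive branch is reached only after every limiting signal has been ruled out, a newly recognized limitation is added as an earlier branch rather than by weakening the final one.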

### Non-Functional Requirements

- **NFR-165-001**: The feature MUST be deliverable without introducing new database tables, new persistent result models, or new outcome enums.
- **NFR-165-002**: Existing landing and detail surfaces that already expose richer trust or evidence semantics MUST NOT be flattened, weakened, or contradicted.
- **NFR-165-003**: Existing tenant dashboard, findings, landing, and run-drilldown navigation paths MUST remain stable.
- **NFR-165-004**: Existing authorization and tenant-isolation behavior for all covered surfaces MUST remain intact.
- **NFR-165-005**: The UI MAY become more conservative, but it MUST remain compact and readable rather than turning every limited result into alarm-heavy noise.

### Non-Goals

- Rewriting the compare engine, compare execution workflow, or compare persistence model
- Introducing new evidence-gap storage structures, new result enums, or a new backend outcome taxonomy
- Re-implementing the full baseline compare landing page or operation-run detail page beyond the summary-truth contract they expose
- Changing reporting, exports, risk acceptance, exceptions handling, or time-series drift tracking
- Redesigning unrelated dashboard or monitoring surfaces outside the baseline or drift summary problem

### Assumptions

- Existing baseline compare truth, explanation, and evidence foundations are already strong enough that the primary gap is summary propagation rather than backend semantics.
- Existing landing and detail surfaces already communicate limited confidence and evidence limitations better than the compact summary surfaces do today.
- Operators benefit more from conservative governance language than from visually calm but semantically overstated positive states.
- Existing compare-start actions, findings drilldowns, and run drilldowns remain the correct next-step paths and do not need a new execution model for this feature.

### Dependencies

- Existing baseline compare truth and explanation foundations
- Existing evidence-gap and coverage semantics
- Existing tenant dashboard, findings, Baseline Compare landing, and canonical operation-run drilldown surfaces
- Existing tenant authorization and action-guard patterns

### Risks

- Summary surfaces may feel stricter than before, which could initially be perceived as noisier even though the semantics are safer.
- Different compact surfaces could drift into slightly different cautionary phrasing if the shared summary contract is not applied consistently.
- A surface-level fix that only patches one widget could reintroduce semantic drift elsewhere if shared summary rules are not reused.

## UI Action Matrix *(mandatory when Filament is changed)*

If this feature adds/modifies any Filament Resource / RelationManager / Page, fill out the matrix below.

For each surface, list the exact action labels, whether they are destructive (confirmation? typed confirmation?),
RBAC gating (capability + enforcement helper), and whether the mutation writes an audit log.

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Compare landing page | Existing tenant Baseline Compare page | `Compare now` remains the existing header action and keeps current confirmation plus capability gating | Not a record-list inspect surface | None introduced by this spec | None | Existing missing-assignment, missing-snapshot, and unavailable-state guidance remains | `Compare now`, existing `View run` or `Open findings` drilldowns where already available | Not applicable | Existing compare-start audit and run-observability behavior remains unchanged | Action Surface Contract satisfied. This feature changes summary interpretation and wording, not action topology. |
| Tenant dashboard summary widgets | Existing tenant dashboard widgets and summary cards | None added by this spec | Existing links to Baseline Compare, findings, and operations remain the inspect path | None | None | Existing dashboard empty or unavailable states remain, but their summary claims must obey the hardened contract | Not applicable | Not applicable | No new audit event | Read-only widget surfaces. No exemption required because no new action surface is introduced. |

### Key Entities *(include if feature involves data)*

- **Baseline summary claim**: The strongest safe statement a compact surface makes about the tenant's current baseline or drift posture.
- **Compare result trust signal**: The combined meaning of trustworthiness, confidence, artifact usability, and result quality that determines how strong a summary claim may be.
- **Evidence completeness signal**: The availability, coverage, and evidence-gap posture that can limit or qualify a summary claim even when findings counts look calm.
- **Summary state family**: The operator-facing state category used by compact surfaces, such as positive, cautionary, unavailable, or action-required.
- **Primary next step**: The clearest follow-up action or drilldown the operator should take when the summary cannot safely present an all-clear claim.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-165-001**: In 100% of covered limited-confidence, incomplete-evidence, suppressed-result, or evidence-gap-affected scenarios, in-scope summary surfaces avoid compliant or all-clear claims.
- **SC-165-002**: In 100% of covered trustworthy and fully usable no-drift scenarios, in-scope summary surfaces may present a positive aligned state without contradicting deeper surfaces.
- **SC-165-003**: In acceptance review of covered scenarios, the same compare result produces no materially conflicting primary claim across dashboard summary, landing summary, findings-adjacent summary, and run drilldown.
- **SC-165-004**: In every covered cautionary or unavailable scenario, an operator can identify the correct next step or drilldown from the visible summary in 10 seconds or less.
- **SC-165-005**: In regression review, existing richer landing and detail surfaces continue to expose trust, evidence-gap, and result-meaning nuance without being simplified into findings-only semantics.
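A regression check for SC-165-001 and FR-165-001 could scan the primary label each covered surface renders. This minimal Python sketch assumes a hypothetical `violates_claim_contract` helper, string family names, and made-up rendered labels; it only catches the exact phrases listed in FR-165-001 (case-insensitively), so semantically equivalent all-clear wording still needs human review.

```python
# The all-clear phrases named in FR-165-001, normalized for case-insensitive matching.
FORBIDDEN_ALL_CLEAR = frozenset(
    phrase.casefold()
    for phrase in ("Compliant", "Baseline compliant", "No drift", "No open drift", "All clear")
)


def violates_claim_contract(family: str, primary_label: str) -> bool:
    """Flag an all-clear label on any surface whose state family is not positive."""
    if family == "positive":
        return False
    return primary_label.strip().casefold() in FORBIDDEN_ALL_CLEAR


# Hypothetical rendered (state family, primary label) pairs from covered scenarios.
rendered = [
    ("cautionary", "Limited confidence"),
    ("cautionary", "No open drift"),
    ("unavailable", "Result unavailable"),
    ("positive", "Aligned"),
]
violations = [pair for pair in rendered if violates_claim_contract(*pair)]
```

Here `violations` contains only `("cautionary", "No open drift")`, the one pair that pairs a non-positive family with an all-clear label.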