# Feature Specification: Enterprise Evidence Gap Details for Baseline Compare

**Feature Branch**: `162-baseline-gap-details`  
**Created**: 2026-03-24  
**Status**: Draft  
**Input**: User description: "Create an enterprise-grade baseline compare evidence gap details experience for operation runs, including searchable operator-first presentation of concrete gap subjects, diagnostic clarity, filtering expectations, audit-safe visibility, and best-practice information architecture for tenant-scoped operations."

## Spec Scope Fields *(mandatory)*

- **Scope**: tenant, canonical-view
- **Primary Routes**: `/admin/operations/{run}`, `/admin/t/{tenant}/baseline-compare-landing`, and existing related navigation back to tenant operations and baseline compare entry points
- **Data Ownership**: Tenant-owned `OperationRun` records remain the source of evidence-gap execution context. Workspace-owned baseline profiles and snapshots remain unchanged in ownership. This feature changes capture and presentation of tenant-owned evidence-gap detail, not record ownership.
- **RBAC**: Existing workspace membership, tenant entitlement, and baseline compare or monitoring view capabilities remain authoritative. No new role or capability is introduced.
- **Default filter behavior when tenant-context is active**: Canonical Monitoring entry points continue to respect active tenant context in navigation and related links, while direct run-detail access remains explicit to the run's tenant and must not silently widen visibility.
- **Explicit entitlement checks preventing cross-tenant leakage**: Canonical run detail and tenant compare review surfaces must continue to enforce workspace entitlement first and tenant entitlement second, with deny-as-not-found behavior for non-members and no cross-tenant hinting through gap details.

## Operator Surface Contract *(mandatory when operator-facing surfaces are changed)*

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Monitoring → Operation Run Detail for baseline compare runs | Workspace manager or entitled tenant operator | Canonical detail | Which specific subjects caused evidence gaps, and can I trust this compare result? | Outcome, trust statement, next step, grouped gap-detail summary, searchable gap subjects, related context | Raw JSON, internal payload fragments, low-level capture fragments | execution outcome, result trust, data completeness, follow-up readiness | Simulation only for compare results, TenantPilot only for page rendering | View run, search gap details, open related tenant operations | None |
| Tenant Baseline Compare landing | Tenant operator | Tenant-scoped review surface | Which evidence gaps are blocking a trustworthy compare, and what should I inspect next? | Result meaning, gap counts, grouped reasons, searchable concrete subjects, next-step guidance | Raw evidence payload, secondary technical context | evaluation result, reliability, completeness, actionability | Simulation only | Compare now, inspect latest run, filter concrete gaps | None |

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Inspect concrete evidence gaps quickly (Priority: P1)

An operator reviewing a degraded baseline compare run needs to see which exact policy subjects caused evidence gaps so they can determine whether the visible result is trustworthy and what needs follow-up.

**Why this priority**: Aggregate counts alone do not support operational action. The core value is turning an abstract warning into concrete, reviewable subjects.

**Independent Test**: Create a baseline compare run with evidence gaps and verify that the operator can identify the affected policy subjects from the default-visible detail surface without opening JSON or database tooling.

**Acceptance Scenarios**:

1. **Given** a completed baseline compare run with evidence gaps, **When** an entitled operator opens the run detail page, **Then** the page shows grouped concrete gap subjects tied to the relevant evidence-gap reasons.
2. **Given** a completed baseline compare run with both aggregate counts and concrete gap subjects, **When** the operator reads the page, **Then** the concrete details align with the same reason buckets as the counts and do not contradict the top-level trust statement.
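
Illustrative sketch (non-normative): the alignment in scenario 2 could be checked roughly as below. The row shape, field names, and reason codes are assumptions made for illustration; the spec does not define a storage format.

```python
from collections import defaultdict

# Hypothetical detail rows: one per affected subject, tagged with its reason bucket.
gap_subjects = [
    {"reason": "missing_current_evidence", "policy_type": "deviceConfiguration", "subject_key": "Windows Baseline A"},
    {"reason": "missing_current_evidence", "policy_type": "deviceCompliance", "subject_key": "Compliance Default"},
    {"reason": "ambiguous_match", "policy_type": "deviceConfiguration", "subject_key": "Duplicate Name"},
]

# Hypothetical aggregate counts stored on the same run.
counts_by_reason = {"missing_current_evidence": 2, "ambiguous_match": 1}


def group_by_reason(rows):
    """Group detail rows under their reason bucket, preserving row order."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["reason"]].append(row)
    return dict(grouped)


def unaligned_reasons(grouped, counts):
    """Return reasons that have detail rows but no matching count bucket."""
    return sorted(set(grouped) - set(counts))
```

An empty result from `unaligned_reasons` means every detail row sits under a reason that the summary counts also report, which is the property scenario 2 asks for.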

---

### User Story 2 - Filter large gap sets without scanning raw diagnostics (Priority: P2)

An operator dealing with many evidence gaps needs to filter the list by reason, policy type, or subject key so they can isolate the specific policy family or identifier they are investigating.

**Why this priority**: Evidence-gap sets can be too large to inspect manually. Filtering is required for operational usefulness at enterprise scale.

**Independent Test**: Open a run with multiple reasons and many gap subjects, enter a partial policy type or subject key, and confirm that the visible rows narrow to the relevant subset without leaving the page.

**Acceptance Scenarios**:

1. **Given** a run with many evidence-gap rows, **When** the operator filters by a policy type token, **Then** only matching reasons and rows remain visible.
2. **Given** a run with GUID-like subject keys or mixed human-readable names, **When** the operator filters by a partial subject key value, **Then** matching rows remain visible regardless of whether the value is human-readable text or an identifier.
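
Illustrative sketch (non-normative): one simple way to satisfy both scenarios is a case-insensitive substring match across reason, policy type, and subject key. The field names and sample values are assumptions; the actual surface may delegate filtering to its table component.

```python
SEARCHABLE_FIELDS = ("reason", "policy_type", "subject_key")  # hypothetical field names


def filter_gap_rows(rows, query):
    """Keep rows where any searchable field contains the query, ignoring case."""
    needle = query.strip().casefold()
    if not needle:
        # An empty filter leaves the full set visible.
        return list(rows)
    return [
        row
        for row in rows
        if any(needle in str(row.get(field, "")).casefold() for field in SEARCHABLE_FIELDS)
    ]


rows = [
    {"reason": "missing_current_evidence", "policy_type": "deviceConfiguration", "subject_key": "Windows Baseline A"},
    {"reason": "ambiguous_match", "policy_type": "deviceCompliance", "subject_key": "3f2b9c1e-7a4d-4e0b-9c55-0a1b2c3d4e5f"},
]
```

Because the match treats every value as plain text, a partial GUID such as `7a4d` and a partial name such as `baseline` behave the same way.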

---

### User Story 3 - Distinguish missing detail from no gaps (Priority: P3)

An operator reviewing a legacy or partially recorded run needs the surface to distinguish between runs that had no evidence gaps and runs whose gap details were never recorded, so they do not misread silence as health.

**Why this priority**: Historical runs and partial payloads will continue to exist. The system must preserve trust even when detail quality varies over time.

**Independent Test**: Open one run with gaps but no recorded subject-level details and one run with no gaps, then verify the page communicates the difference clearly.

**Acceptance Scenarios**:

1. **Given** a run with evidence-gap counts but no stored concrete subjects, **When** the operator opens the detail surface, **Then** the page explains that no detailed gap rows were recorded for that run.
2. **Given** a run with no evidence gaps, **When** the operator opens the detail surface, **Then** no misleading gap-detail section appears and the surface makes clear that the run has no evidence gaps, rather than suggesting that details are missing.
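
Illustrative sketch (non-normative): the distinction in this story can be modelled as three states derived from the stored counts and detail rows. The state names and input shapes are assumptions made for illustration.

```python
from enum import Enum


class GapDetailState(Enum):
    NO_GAPS = "no_gaps"
    DETAILS_NOT_RECORDED = "details_not_recorded"
    DETAILS_RECORDED = "details_recorded"


def classify_gap_details(counts_by_reason, subjects):
    """Decide which explanation the surface should show for a run."""
    if subjects:
        return GapDetailState.DETAILS_RECORDED
    if sum(counts_by_reason.values()) > 0:
        # Gaps were counted but no subject rows were stored (e.g. a legacy run).
        return GapDetailState.DETAILS_NOT_RECORDED
    return GapDetailState.NO_GAPS


def reasons_missing_detail(counts_by_reason, subjects):
    """List counted reasons that have no recorded subject rows."""
    recorded = {row["reason"] for row in subjects}
    return sorted(
        reason
        for reason, count in counts_by_reason.items()
        if count > 0 and reason not in recorded
    )
```

`reasons_missing_detail` covers the partial case, where only some reasons have concrete subjects and the surface must avoid implying that the others affected nothing.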

### Edge Cases

- A run contains aggregate evidence-gap counts, but only some reasons have concrete subject details. The surface must show what is known without implying the missing reasons have zero affected subjects.
- A run predates subject-level evidence-gap storage. The surface must explicitly say that detailed rows were not recorded for that run.
- The same subject appears under different reasons within the same run. The surface must preserve each reason association rather than collapsing the subject into a single row and losing meaning.
- Subject keys may contain spaces, GUIDs, underscores, or mixed human-readable and machine-generated values. Filtering must still work predictably.
- Very large gap sets must remain searchable and readable without requiring raw JSON inspection.

## Requirements *(mandatory)*

**Constitution alignment (required):** This feature does not introduce new Microsoft Graph calls or new long-running job types. It reuses existing baseline compare execution and extends what tenant-owned `OperationRun` records capture and reveal about evidence gaps. Existing tenant isolation, audit, and safe execution rules remain unchanged.

**Constitution alignment (OPS-UX):** Existing baseline compare `OperationRun` behavior remains within the current three-surface feedback contract. `OperationRun.status` and `OperationRun.outcome` remain service-owned. Summary counts remain numeric and lifecycle-oriented. This feature adds richer evidence-gap interpretation and detail within existing run context rather than redefining lifecycle semantics. Scheduled or system-run behavior remains unchanged.

**Constitution alignment (RBAC-UX):** This feature changes what is visible on tenant and canonical run-detail surfaces, but not who is authorized. Non-members continue to receive 404 (deny-as-not-found). Members without the relevant view capability receive 403, and only after membership is established. No raw capability strings or role-specific shortcuts are introduced.
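
Illustrative sketch (non-normative): the check ordering described above could look like the following. The function and parameter names are hypothetical. Treating a missing tenant entitlement as 404 is an assumption drawn from the deny-as-not-found wording in the scope fields, not a confirmed rule.

```python
def resolve_run_detail_status(is_workspace_member, is_tenant_entitled, has_view_capability):
    """Return the HTTP status for a run-detail request, checking in a fixed order."""
    # 1. Workspace entitlement first: non-members must not learn the run exists.
    if not is_workspace_member:
        return 404
    # 2. Tenant entitlement second (assumed to be deny-as-not-found as well).
    if not is_tenant_entitled:
        return 404
    # 3. Only an established member can receive a capability denial.
    if not has_view_capability:
        return 403
    return 200
```

Checking capability last means a 403 never reveals the existence of a run to someone outside the workspace or tenant.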

**Constitution alignment (OPS-EX-AUTH-001):** Not applicable beyond reaffirming that no auth-handshake behavior is added.

**Constitution alignment (BADGE-001):** Existing run outcome, trust, and completeness semantics remain centralized. This feature must not invent new ad-hoc badge mappings for evidence-gap states.

**Constitution alignment (UI-NAMING-001):** Operator-facing labels must use domain language such as “Evidence gap details”, “Policy type”, and “Subject key” rather than implementation-first phrasing. Internal reason codes remain diagnostic, not primary.

**Constitution alignment (OPSURF-001):** Default-visible content must remain operator-first. Outcome, trust, and next-step guidance remain ahead of raw JSON. Evidence-gap details are diagnostic, but promoted enough to support first-pass action without forcing operators into raw payloads.

**Constitution alignment (Filament Action Surfaces):** The affected Filament pages remain compliant with the Action Surface Contract. No new destructive actions are introduced. Existing compare actions remain unchanged. This feature only improves read and investigation behavior on existing surfaces.

**Constitution alignment (UX-001 — Layout & Information Architecture):** The evidence-gap experience must remain sectioned, searchable, and readable inside existing detail layouts. The searchable detail table is secondary to the result summary but primary within the diagnostics path.

### Functional Requirements

- **FR-001**: The system MUST persist subject-level evidence-gap details for new baseline compare runs whenever the compare process can identify affected subjects.
- **FR-002**: The system MUST retain aggregate evidence-gap counts and reason groupings alongside subject-level evidence-gap details so both summary and detail remain available on the same run.
- **FR-003**: The system MUST present evidence-gap details on the baseline compare run-detail experience in a searchable table-oriented format rather than as raw JSON alone.
- **FR-004**: The system MUST let operators filter evidence-gap details by at least reason, policy type, and subject key from within the operator-facing surface.
- **FR-005**: The system MUST group evidence-gap details by reason so operators can understand whether subjects are blocked by ambiguity, missing current evidence, missing policy references, or similar distinct causes.
- **FR-006**: The system MUST preserve operator-safe default reading order on the run-detail surface so execution outcome, result trust, and next-step guidance appear before searchable evidence-gap detail and before raw JSON.
- **FR-007**: The system MUST distinguish between “no evidence gaps exist” and “evidence-gap details were not recorded” so historical runs and partial payloads are not misread.
- **FR-008**: The system MUST keep evidence-gap details tenant-safe on both tenant-scoped and canonical monitoring surfaces, revealing them only to entitled workspace and tenant members.
- **FR-009**: The system MUST keep the baseline compare landing experience and the canonical run-detail experience semantically aligned when they reference the same evidence-gap state.
- **FR-010**: The system MUST preserve existing compare initiation behavior, mutation scope messaging, and audit semantics for baseline compare actions. This feature MUST NOT add new dangerous actions or broaden mutation scope.
- **FR-011**: The system MUST continue to support raw JSON diagnostics for support and deep troubleshooting, but those diagnostics MUST remain secondary to the searchable evidence-gap detail experience.
- **FR-012**: The system MUST remain usable for large enterprise tenants by allowing an operator to isolate a relevant gap subject without manually scanning the full visible set.
- **FR-013**: The system MUST continue rendering older runs that only contain aggregate evidence-gap counts without failing the page or hiding the existence of evidence gaps.
- **FR-014**: The system MUST provide regression coverage for subject-level evidence-gap persistence, operator-surface rendering, and the visible filtering affordances on the affected pages.
- **FR-015**: The system MUST preserve the current no-external-calls-on-render rule for Monitoring and run-detail surfaces.

### Assumptions

- This slice focuses on baseline compare evidence-gap detail, not every diagnostic surface in the product.
- Existing baseline compare reason-code and trust semantics remain the semantic source of truth for the top-level operator reading path.
- The primary enterprise need is fast investigation of concrete gap subjects, not full ad hoc reporting from the run detail page.
- Historical runs may continue to exist without subject-level evidence-gap detail and must remain readable.

## UI Action Matrix *(mandatory when Filament is changed)*

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline compare run detail | Existing canonical run detail surface | Existing navigation and refresh actions remain | Existing detail sections remain the inspect affordance | None added | None | No new CTA. If no gap rows are recorded, the section explains why. | Existing run-detail header actions remain | Not applicable | Existing run and compare audit semantics remain | Read-only information architecture change only |
| Tenant baseline compare landing | Existing tenant compare review surface | Existing compare action remains | Existing navigation to latest run remains | None added | None | Existing compare CTA remains | Existing page-level actions remain | Not applicable | Existing run-backed audit semantics remain | Read-only information architecture change only |

### Key Entities *(include if feature involves data)*

- **Evidence Gap Detail**: A concrete affected subject associated with a specific evidence-gap reason for a baseline compare run, including enough operator-readable identity to investigate the issue.
- **Evidence Gap Reason Group**: A reason bucket that explains why one or more subjects limited compare confidence, used to structure both counts and detailed rows.
- **Baseline Compare Run Context**: The tenant-owned run context that stores both summary evidence-gap information and subject-level detail for later operator review.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: On a baseline compare run with recorded subject-level evidence gaps, an entitled operator can identify at least one affected subject from the default-visible detail surface in under 30 seconds without opening raw JSON.
- **SC-002**: On a run with multiple gap reasons, an entitled operator can isolate the relevant reason, policy type, or subject key using the on-page filter in one filtering action.
- **SC-003**: Legacy runs without subject-level detail continue to render successfully and clearly distinguish missing recorded detail from absence of evidence gaps.
- **SC-004**: The canonical run-detail surface and tenant baseline compare review surface remain semantically consistent in how they describe evidence-gap-driven limited-confidence results.
- **SC-005**: Regression coverage exists for subject-level detail persistence, operator-facing rendering, and the visible search and filtering affordances on the affected surfaces.