TenantAtlas/specs/161-operator-explanation-layer/spec.md
ahmido 1f0cc5de56 feat: implement operator explanation layer (#191)
## Summary
- add the shared operator explanation layer with explanation families, trustworthiness semantics, count descriptors, and centralized badge mappings
- adopt explanation-first rendering across baseline compare, governance operation run detail, baseline snapshot presentation, tenant review detail, and review register rows
- extend reason translation, artifact-truth presentation, fallback ops UX messaging, and focused regression coverage for operator explanation semantics

## Testing
- vendor/bin/sail bin pint --dirty --format agent
- vendor/bin/sail artisan test --compact tests/Feature/Monitoring/OperationsTenantScopeTest.php tests/Feature/Operations/OperationRunBlockedExecutionPresentationTest.php
- vendor/bin/sail artisan test --compact

## Notes
- Livewire v4 compatible
- panel provider registration remains in bootstrap/providers.php
- no destructive Filament actions were added or changed in this PR
- no new global-search behavior was introduced in this slice

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #191
2026-03-24 11:24:33 +00:00


# Feature Specification: Operator Explanation Layer for Degraded, Partial, and Suppressed Results
**Feature Branch**: `161-operator-explanation-layer`
**Created**: 2026-03-23
**Status**: Draft
**Input**: User description: "Operator Explanation Layer for Degraded / Partial / Suppressed Results"
## Spec Scope Fields *(mandatory)*
- **Scope**: workspace, tenant, canonical-view
- **Primary Routes**: `/admin/t/{tenant}/baseline-compare`, governance artifact detail pages under `/admin`, governance list surfaces under `/admin`, and Monitoring → Operations → Run Detail for governance-oriented runs
- **Data Ownership**: Workspace-owned records keep their existing ownership, including baseline snapshots, evidence artifacts, tenant reviews, and review-pack outputs. Tenant-owned `OperationRun` records and tenant-scoped governance results remain tenant-owned. This feature changes how those records are explained, not who owns them.
- **RBAC**: Existing workspace membership, tenant membership, and capability checks remain authoritative. This spec changes operator-facing explanation and information hierarchy, not membership boundaries.
- **Reference rollout surfaces for this slice**: Baseline Compare, Monitoring → Run Detail for governance operations, Baseline Snapshot list/detail as baseline-capture result presentation, Tenant Review detail, and Review Register list rows
- **List Surface Review Standard**: Because this feature changes Review Register and baseline- or governance-oriented list surfaces, implementation and review MUST follow `docs/product/standards/list-surface-review-checklist.md`.
- **Default filter behavior when tenant-context is active**: Canonical Monitoring and governance read surfaces that already support tenant context MUST continue to prefilter to the active tenant when tenant context is selected.
- **Explicit entitlement checks preventing cross-tenant leakage**: Any canonical or workspace-context surface that reveals tenant-owned run or governance results MUST continue to enforce workspace entitlement first and tenant entitlement second, with deny-as-not-found behavior for non-members.
## Operator Surface Contract *(mandatory when operator-facing surfaces are changed)*
| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Baseline Compare | Tenant operator | Tenant-scoped action page | Did the compare produce a trustworthy result, and if not, why not? | Primary explanation, result reliability, coverage/completeness signal, next action, clearly scoped counts | Raw reason codes, evidence-gap payloads, low-level context, internal suppression details | execution outcome, evaluation result, reliability, coverage, recommended action | Simulation only | Compare now, View findings, View run | None |
| Monitoring → Run Detail for governance operations | Workspace manager or entitled tenant operator | Canonical detail | What happened, how trustworthy is the result, and what should I do next? | Outcome summary, explanation summary, result trust statement, next action, operator-safe count meaning | Raw JSON, low-level context, internal reason-code detail, implementation-first counters | execution outcome, evaluation result, reliability, coverage, readiness | TenantPilot only or simulation only depending on run type | View related artifact, View related surface | None |
| Baseline capture result presentation | Workspace manager | List/detail | Did baseline capture produce a trustworthy baseline artifact, and if not, why not? | Primary state, trustworthiness statement, capture-result explanation, next action | Full truth envelope, renderer details, raw cause detail, low-level support metadata | lifecycle, usability, completeness, recommended action | TenantPilot only | View related run, inspect snapshot | None |
| Tenant Review detail | Workspace manager | Detail | Is this review usable, why or why not, and what follow-up is needed? | Primary state, short explanation, trustworthiness or publishability statement, next action | Full truth envelope, support metadata, source diagnostics | lifecycle or readiness, usability, completeness, publication status, recommended action | TenantPilot only | View related run, continue workflow action when already allowed | Existing destructive actions remain unchanged and separately governed |
| Review Register list rows | Workspace manager | List | Which reviews need attention, and which are genuinely ready? | One primary operator statement, brief reason, next-step hint, semantically safe counts | Raw badges for every semantic axis, low-level reason codes, detailed fidelity sub-axes | primary outcome, trustworthiness, actionability | None | Filter, inspect, open detail | None |
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Understand degraded results without diagnostics (Priority: P1)
An operator reviewing a governance result needs the product to explain a degraded, partial, or suppressed outcome in plain operator language without requiring JSON, internal codes, or product-specific background knowledge.
**Why this priority**: This is the central product problem. If operators still need diagnostics to understand whether a result is trustworthy, the feature has not delivered its value.
**Independent Test**: Present a governance result where execution completed but evaluation was limited, then confirm the operator can determine what happened, how trustworthy the result is, and the next action from the default-visible content alone.
**Acceptance Scenarios**:
1. **Given** a baseline compare run that finished technically but could not fully evaluate due to evidence gaps, **When** an operator opens the compare surface or run detail, **Then** the page states that evaluation was incomplete, explains the dominant cause in operator language, and does not let `0 findings` read as an all-clear.
2. **Given** a governance result that produced no output because inputs were missing or suppressed, **When** an operator opens the affected surface, **Then** the page explains why no result was produced and what follow-up is appropriate before showing diagnostics.
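
The first scenario can be illustrated with a small sketch. It is written in Python for brevity and is not the project's implementation; `CompareResult`, its fields, and the wording of the messages are hypothetical. The point it shows is that incomplete evaluation takes precedence over the finding count, so a zero count cannot become the headline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompareResult:
    """Hypothetical shape; field names are illustrative, not the project's schema."""
    findings: int
    evaluation_complete: bool
    dominant_cause: str = ""  # operator-language cause, e.g. "evidence gaps"


def headline(result: CompareResult) -> str:
    """Return the default-visible statement for a compare result."""
    if not result.evaluation_complete:
        cause = result.dominant_cause or "an unspecified limitation"
        # Incomplete evaluation wins over the count, so 0 findings is never an all-clear.
        return (
            f"Evaluation was incomplete because of {cause}. "
            f"Findings recorded so far: {result.findings}. This is not an all-clear."
        )
    if result.findings == 0:
        return "No findings. Evaluation was complete."
    return f"Findings: {result.findings}. Evaluation was complete."
```
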
---
### User Story 2 - Separate execution success from result trust (Priority: P2)
An operator needs the product to distinguish a technically finished run from the trustworthiness and completeness of the result it produced.
**Why this priority**: The most damaging false-green cases come from execution success being misread as a trustworthy outcome.
**Independent Test**: Review multiple reference cases where execution and result trust diverge, then verify the surface keeps those dimensions separate and non-contradictory.
**Acceptance Scenarios**:
1. **Given** a run that completed successfully but produced a limited-confidence artifact, **When** an operator views the run or related artifact, **Then** execution success and result trust are shown as separate statements with no conflicting headline.
2. **Given** a run that failed after producing partial intermediate data, **When** an operator views the result, **Then** the surface makes clear that the run failed and the partial data is not decision-grade.
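
A minimal sketch of keeping the two dimensions separate, in Python for illustration only. The enum members and the statement wording are assumptions, not the product's vocabulary. Execution and trust are returned as two statements, and a failed run forces the trust statement down so that partial data cannot read as decision-grade.

```python
from enum import Enum


class Execution(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


class Trust(Enum):
    TRUSTWORTHY = "trustworthy"
    LIMITED = "limited confidence"
    UNUSABLE = "not decision-grade"


def composite_statement(execution: Execution, trust: Trust) -> tuple[str, str]:
    """Return (execution statement, trust statement) as two separate lines."""
    if execution is Execution.FAILED:
        # A failed run never yields decision-grade output, even with partial data.
        trust = Trust.UNUSABLE
    return f"Execution: {execution.value}.", f"Result trust: {trust.value}."
```
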
---
### User Story 3 - Reuse one explanation pattern across domains (Priority: P3)
A workspace manager needs degraded, suppressed, and incomplete states to read consistently across baseline, evidence, review, and governance monitoring surfaces.
**Why this priority**: Local one-off explanations create a new inconsistency problem even if each surface improves individually.
**Independent Test**: Compare the same cause category on the baseline reference surfaces and on a second governance domain surface, then confirm the primary explanation pattern and next-step language stay aligned.
**Acceptance Scenarios**:
1. **Given** two different governance surfaces that both represent missing-input or insufficient-evidence states, **When** an operator reads them, **Then** both use the same explanation tiering and the same reading direction for reliability and next action.
### Edge Cases
- A result shows `0 findings`, but evidence coverage is incomplete. The surface must not read as healthy by default.
- A run is technically successful, but the produced artifact is not trustworthy or not publishable. The headline and follow-up guidance must not conflict.
- Multiple causes contribute to one degraded result. The surface must identify the dominant cause without hiding that other causes exist.
- No result exists because the system intentionally suppressed output. The surface must distinguish suppression from failure and from true absence of issues.
- Diagnostics are unavailable or delayed. The primary explanation still needs to remain understandable.
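
The multiple-cause edge case could be handled along the following lines. This is a hedged sketch in Python. The spec does not say how dominance is decided, so ranking by number of affected items is an assumption, and `summarize_causes` is a hypothetical helper.

```python
def summarize_causes(causes: dict[str, int]) -> str:
    """Name the dominant cause and acknowledge that others exist.

    `causes` maps an operator-language cause to the number of affected items.
    """
    if not causes:
        return "No limiting cause was recorded."
    # Sort by affected count (descending), then by name for a stable result.
    ranked = sorted(causes.items(), key=lambda item: (-item[1], item[0]))
    dominant = ranked[0][0]
    others = len(ranked) - 1
    if others == 0:
        return f"Main cause: {dominant}."
    noun = "cause" if others == 1 else "causes"
    return f"Main cause: {dominant}. {others} other {noun} also contributed."
```
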
## Requirements *(mandatory)*
**Constitution alignment (required):** This feature does not introduce new Microsoft Graph calls, new mutation flows, or new long-running jobs. It changes how existing governance and monitoring surfaces explain already produced outcomes and artifacts. Existing contract-registry, safety-gate, audit, and tenant-isolation rules remain unchanged.
**Constitution alignment (OPS-UX):** Existing `OperationRun` creation and lifecycle rules remain unchanged. Governance runs continue to use the existing three feedback surfaces only. `OperationRun.status` and `OperationRun.outcome` remain service-owned. Summary counts remain numeric and execution-oriented; this feature adds meaning and interpretation layers on top of them rather than redefining run lifecycle.
**Constitution alignment (RBAC-UX):** This feature touches workspace-admin, tenant, and canonical Monitoring surfaces, but does not change authorization semantics. Non-members still receive 404. Members lacking capability still receive 403. All existing server-side authorization remains required for any action that starts an operation or reveals tenant-owned governance data.
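
The unchanged authorization semantics can be summarized in a short sketch. It is Python for illustration, and `response_status` and its parameters are hypothetical. It checks workspace entitlement first and tenant entitlement second, returns 404 for non-members, and returns 403 for members without the required capability.

```python
def response_status(is_workspace_member: bool, is_tenant_member: bool, has_capability: bool) -> int:
    """Return the HTTP status for a request to a tenant-owned governance surface."""
    # Workspace entitlement is checked first, tenant entitlement second.
    if not is_workspace_member or not is_tenant_member:
        return 404  # deny-as-not-found: do not reveal that the record exists
    if not has_capability:
        return 403  # member, but lacking the required capability
    return 200
```
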
**Constitution alignment (OPS-EX-AUTH-001):** Not applicable beyond reaffirming that no auth-handshake behavior is introduced.
**Constitution alignment (BADGE-001):** Any new or changed badges, labels, or severity treatments for degraded, partial, suppressed, incomplete, trustworthy, limited-confidence, or unusable states MUST remain centralized. No surface may invent local color or label mappings for the same semantic state.
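
One way to picture the centralization rule is a single read-only mapping that every surface must go through. The sketch below is Python for illustration. The state keys, labels, and color names are hypothetical placeholders, not the project's badge catalog.

```python
from types import MappingProxyType

# Single source of truth. Labels and colors here are hypothetical placeholders.
_BADGES = MappingProxyType({
    "trustworthy": ("Trustworthy", "success"),
    "limited_confidence": ("Limited confidence", "warning"),
    "suppressed": ("Suppressed", "gray"),
    "unusable": ("Unusable", "danger"),
})


def badge_for(state: str) -> tuple[str, str]:
    """Return (label, color) for a semantic state, failing loudly on unknown states."""
    try:
        return _BADGES[state]
    except KeyError:
        raise ValueError(f"No centralized badge mapping for state {state!r}") from None
```

Raising on an unknown state, rather than falling back to a default badge, makes a locally invented state visible at the point where it is introduced.
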
**Constitution alignment (UI-NAMING-001):** Operator-facing language MUST prioritize domain meaning over implementation terms. Primary wording must explain what happened, how reliable the result is, and what to do next. Internal reason codes and implementation-first terms may remain available only in diagnostics.
**Constitution alignment (OPSURF-001):** Default-visible content on affected operator surfaces MUST remain operator-first and diagnostics-second. The default reading path must be: what happened, how trustworthy the result is, why it looks this way, what to do next. Raw JSON, internal codes, and low-level payload details remain secondary.
**Constitution alignment (UI-STD-001):** Because this feature changes Review Register and other governance-oriented list surfaces, implementation and review MUST use `docs/product/standards/list-surface-review-checklist.md`.
**Constitution alignment (Filament Action Surfaces):** This feature materially refactors several Filament-facing read surfaces but does not introduce new destructive actions. The Action Surface Contract remains satisfied because the main change is explanation hierarchy, count semantics, and status presentation. Existing action topology remains in place unless a follow-up spec changes it explicitly.
**Constitution alignment (UX-001 — Layout & Information Architecture):** Affected screens MUST preserve structured, sectioned layouts. New explanation blocks, trust statements, and next-step summaries must appear as deliberate information sections rather than scattered helper text.
### Functional Requirements
- **FR-001**: The system MUST define a shared operator explanation model that separates at minimum these semantic axes wherever relevant: execution outcome, evaluation result, reliability or trustworthiness, coverage or completeness, and recommended action.
- **FR-002**: The system MUST provide a primary operator explanation for degraded, partial, suppressed, missing-input, and incomplete-result cases that can be understood without opening diagnostics.
- **FR-003**: The system MUST ensure technical reason codes and raw diagnostics remain available for troubleshooting but are not the primary headline or default explanation on affected operator surfaces.
- **FR-004**: The system MUST define semantically safe count rules so output counts, evaluation counts, and completeness or reliability signals cannot be misread as the same thing.
- **FR-005**: The system MUST prevent `0 findings`, `0 issues`, `no results`, or similarly empty-looking result summaries from reading as implicit all-clear when evaluation was limited, suppressed, or incomplete.
- **FR-006**: The system MUST define a reusable explanation pattern for absent-output cases that distinguishes at minimum: true no-issues results, missing required input, suppressed output, blocked prerequisite state, and not-yet-available evaluation.
- **FR-007**: The system MUST define a reusable explanation pattern for technically finished but decision-limited cases, where output exists or execution completed but the produced result is only partially trustworthy, incomplete, or useful for diagnostics only, and is therefore not decision-grade.
- **FR-008**: The system MUST define next-step guidance categories that are semantically derived from the cause class, including at minimum: no action needed, observe, retry later, fix prerequisite, refresh or sync data, review evidence gaps, manually validate, and escalate.
- **FR-009**: The system MUST ensure the same underlying cause class is rendered with the same primary reading direction across all reference surfaces in scope, even when the surrounding domain differs.
- **FR-010**: The system MUST implement the explanation layer first on these reference surfaces for this slice: Baseline Compare, Monitoring → Run Detail for governance operations, and Baseline Snapshot list/detail as baseline-capture result presentation.
- **FR-011**: The system MUST preserve diagnostics as a clearly secondary layer on reference surfaces, with the primary operator explanation visible before raw JSON, raw reason codes, or implementation-first counters.
- **FR-012**: The system MUST ensure the top-level state presented on a reference surface never contradicts the explanation shown beneath it. A technically successful run with a limited-confidence result must read as a consistent composite rather than as a success headline plus buried caveat.
- **FR-013**: The system MUST define one shared explanation-pattern library or registry that implements FR-006 and FR-007 and additionally covers repeated state families such as completed but degraded, completed but incomplete, no output because suppressed, no output because missing input, and output exists but is not yet publishable or decision-grade.
- **FR-014**: The system MUST ensure baseline compare is the reference proof point for this model, including the motivating case where no findings are shown because evidence coverage was incomplete.
- **FR-015**: The system MUST preserve existing RBAC boundaries, context scoping, and action gating while changing explanation language and information hierarchy.
- **FR-016**: The system MUST provide regression coverage for at least one reference case in which execution success, result trustworthiness, and output counts intentionally diverge.
- **FR-017**: The system MUST provide regression coverage for at least one absent-output case and one suppressed-output case so those states cannot silently fall back to generic empty or all-clear language.
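
FR-001, FR-006, and FR-008 together suggest a small data model. The sketch below is one possible reading of it, in Python for illustration only. The class names, the axis values, and the pairing of each cause class with a next-step category are assumptions. The spec names the axes, the absent-output cases, and the guidance categories, but does not prescribe how they map to each other.

```python
from dataclasses import dataclass
from enum import Enum


class AbsentOutputCause(Enum):
    """The five absent-output cases that FR-006 requires to be distinguishable."""
    NO_ISSUES = "no_issues"
    MISSING_INPUT = "missing_input"
    SUPPRESSED = "suppressed"
    BLOCKED_PREREQUISITE = "blocked_prerequisite"
    NOT_YET_AVAILABLE = "not_yet_available"


# Hypothetical pairing: FR-008 lists the categories but not which cause gets which.
NEXT_STEP = {
    AbsentOutputCause.NO_ISSUES: "no action needed",
    AbsentOutputCause.MISSING_INPUT: "refresh or sync data",
    AbsentOutputCause.SUPPRESSED: "review evidence gaps",
    AbsentOutputCause.BLOCKED_PREREQUISITE: "fix prerequisite",
    AbsentOutputCause.NOT_YET_AVAILABLE: "retry later",
}


@dataclass(frozen=True)
class Explanation:
    """One field per semantic axis named in FR-001."""
    execution_outcome: str
    evaluation_result: str
    reliability: str
    coverage: str
    recommended_action: str


def explain_absent_output(cause: AbsentOutputCause, execution_outcome: str = "completed") -> Explanation:
    """Build an explanation for a result with no output; only NO_ISSUES is an all-clear."""
    all_clear = cause is AbsentOutputCause.NO_ISSUES
    return Explanation(
        execution_outcome=execution_outcome,
        evaluation_result="no issues found" if all_clear else "no result produced",
        reliability="trustworthy" if all_clear else "not decision-grade",
        coverage="complete" if all_clear else "incomplete",
        recommended_action=NEXT_STEP[cause],
    )
```

Because the next step is looked up from the cause class rather than chosen per surface, two surfaces that report the same cause would show the same guidance, which is the behavior FR-009 asks for.
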
### Assumptions
- This spec defines the shared explanation layer and reference implementations, not the final rollout to every governance surface in one shipment.
- Existing outcome taxonomy, reason-code translation, and artifact-truth foundations remain the semantic source of truth that this feature consumes.
- Diagnostics remain important for support and audit, but normal operators should not need them for first-pass interpretation.
- This feature may reuse existing surfaces and components, but it does not require a full redesign of every governance page.
## UI Action Matrix *(mandatory when Filament is changed)*
| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline Compare | Existing tenant-scoped baseline compare surface | Existing `Compare now` remains | Not applicable | None | None | Existing empty or blocked CTA remains, but explanation must distinguish true no-data from blocked or incomplete states | Existing run or related-record actions remain | Not applicable | Yes, via existing run flow | No new dangerous action is introduced. The change is explanation hierarchy and count meaning. |
| Governance run detail | Existing Monitoring run detail surface | Existing related-record actions remain | Not applicable | Not applicable | Not applicable | Not applicable | Existing related navigation remains | Not applicable | Yes, via existing run and audit semantics | No new mutations. This spec changes what is shown first and how diagnostics are demoted. |
| Governance artifact detail surfaces | Existing baseline, evidence, review, and related detail pages | Existing domain-specific actions remain | Existing inspect affordances remain | Existing row actions remain unchanged | Existing bulk behavior unchanged | Existing empty-state CTA remains where already defined | Existing detail-header actions remain | Existing save and cancel unchanged where edit forms already exist | Existing audit behavior unchanged | Explanation sections are added or re-ordered; action topology is unchanged in this spec. |
### Key Entities *(include if feature involves data)*
- **Operator Explanation Pattern**: A reusable interpretation pattern that turns an internal state family into operator-readable meaning, trust guidance, and next action.
- **Governance Result**: Any governance-facing outcome whose execution state, evaluation meaning, and trustworthiness can diverge, including compare results, capture results, review outputs, and evidence-derived outputs.
- **Diagnostic Context**: The secondary technical detail layer containing raw reason codes, JSON payloads, low-level counters, or support facts that remain available but not dominant.
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: On Baseline Compare, Monitoring → Run Detail for governance operations, Baseline Snapshot result surfaces, and Tenant Review detail, an operator can determine from default-visible content whether the result is trustworthy, limited, or unusable without opening diagnostics.
- **SC-002**: On Baseline Compare and Baseline Snapshot result surfaces, cases with incomplete evaluation, suppressed output, or missing-input blocks no longer allow empty-looking counts or `0 findings` summaries to read as implicit all-clear.
- **SC-003**: The same cause class renders the same primary explanation structure and same next-step category across Baseline Compare, Baseline Snapshot result presentation, Tenant Review detail, and Review Register list rows.
- **SC-004**: On Baseline Compare, Monitoring → Run Detail for governance operations, and Baseline Snapshot detail, the default-visible section order is: primary explanation, trustworthiness statement, dominant cause summary, and next action, before any diagnostics panels, raw JSON blocks, or low-level metadata sections.
- **SC-005**: The explanation layer is reusable enough that Tenant Review detail and Review Register can adopt the same pattern without inventing new primary terminology for degraded or suppressed states.
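
The section order in SC-004 lends itself to a mechanical check. The following Python sketch is illustrative only. The section identifiers are hypothetical, and a real regression test would read the rendered page rather than a list of names.

```python
REQUIRED_ORDER = ["primary_explanation", "trustworthiness", "dominant_cause", "next_action"]
DIAGNOSTIC_SECTIONS = {"diagnostics", "raw_json", "metadata"}


def follows_reading_path(sections: list[str]) -> bool:
    """True if the four explanation sections appear in order before any diagnostics."""
    try:
        positions = [sections.index(name) for name in REQUIRED_ORDER]
    except ValueError:
        return False  # a required section is missing
    if positions != sorted(positions):
        return False  # required sections are present but out of order
    first_diagnostic = min(
        (i for i, name in enumerate(sections) if name in DIAGNOSTIC_SECTIONS),
        default=len(sections),
    )
    return positions[-1] < first_diagnostic
```
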