TenantAtlas/specs/220-governance-run-summaries/research.md
Ahmed Darrazi c6cc58e1f3
feat: add governance run summaries
2026-04-20 22:43:30 +02:00

# Research: Humanized Diagnostic Summaries for Governance Operations

## Decision 1: Keep canonical governance run detail on the existing Monitoring viewer and detail resource

- **Decision**: Reuse `OperationRunResource` and `TenantlessOperationRunViewer` as the single canonical run-detail surface for Spec 220 instead of creating a new governance-only viewer.
- **Rationale**: The repo already routes canonical Monitoring run detail through these seams and already has the right RBAC, action-surface, and navigation guardrails in place. The problem is explanation order and summary quality, not missing routing or missing surface ownership.
- **Alternatives considered**:
  - Create a second governance-specific run-detail page. Rejected because it would duplicate route ownership, action hierarchy, and authorization semantics for one existing surface.
  - Add page-local partials only in the Blade template. Rejected because the run-detail summary needs stable derivation rules, not just another rendering layer.

## Decision 2: Treat `ArtifactTruthPresenter`, `OperatorExplanationBuilder`, and `ReasonPresenter` as the canonical semantic inputs

- **Decision**: Build the new summary from the existing `ArtifactTruthEnvelope`, `OperatorExplanationPattern`, and reason-translation envelopes instead of introducing a second semantic source.
- **Rationale**: The repo already derives artifact truth and operator explanation for `OperationRun` records, including governance families like `baseline.capture`, `baseline.compare`, `tenant.evidence.snapshot.generate`, `tenant.review.compose`, and `tenant.review_pack.generate`. Reusing that chain preserves existing truth ownership and keeps the new work downstream and bounded.
- **Alternatives considered**:
  - Add a new persisted summary state to `operation_runs`. Rejected because the desired summary is fully derivable from current persisted truth, and storing a second copy would create drift risk.
  - Put all summary logic directly inside `OperationRunResource`. Rejected because it would bury operation-family rules inside Filament schema code and make tests brittle.
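
The "derive, don't persist" rule can be illustrated with a minimal, language-neutral sketch (Python here, not the repo's PHP). `TruthInput`, `ExplanationInput`, `ReasonInput`, `DerivedSummary`, `derive_summary`, and all their fields are hypothetical stand-ins for `ArtifactTruthEnvelope`, `OperatorExplanationPattern`, and the reason-translation envelopes; the only point shown is that the summary is a pure function of those existing inputs.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


# Hypothetical stand-ins for the existing envelopes; real field names may differ.
@dataclass(frozen=True)
class TruthInput:
    outcome: str  # illustrative values only, e.g. "partially_succeeded"


@dataclass(frozen=True)
class ExplanationInput:
    headline: str


@dataclass(frozen=True)
class ReasonInput:
    code: str
    label: str


@dataclass(frozen=True)
class DerivedSummary:
    outcome: str
    headline: str
    primary_reason: Optional[str]


def derive_summary(
    truth: TruthInput,
    explanation: ExplanationInput,
    reasons: Sequence[ReasonInput],
) -> DerivedSummary:
    """Pure derivation: reads existing truth and never stores or mutates anything."""
    # Assumes reasons already arrive ordered by the existing reason presenter.
    primary = reasons[0].label if reasons else None
    return DerivedSummary(
        outcome=truth.outcome,
        headline=explanation.headline,
        primary_reason=primary,
    )
```

Because the function is pure, rendering the same run twice yields equal summaries, and there is no second stored copy that could drift from `operation_runs`.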

## Decision 3: Add one bounded `GovernanceRunDiagnosticSummary` seam only if affected-scale and dominant-cause rules cannot stay in the current presenter flow

- **Decision**: If the current detail seams cannot cleanly express dominant cause, affected scale, and secondary-cause breakdown, add one small value object plus builder under `App\Support\OpsUx` and expose it through `OperationUxPresenter`.
- **Rationale**: Spec 220 needs more than current badges and explanation labels. It needs one stable first-pass summary, especially for multi-cause degraded runs and all-zero runs. A small run-detail-specific helper is justified because the work is limited to one existing surface and several real operation families already consume the same route.
- **Alternatives considered**:
  - Extend `ArtifactTruthPresenter` to own all run-detail ranking logic. Rejected because artifact truth is broader than this one run-detail question and should remain canonical truth, not surface-specific emphasis logic.
  - Build a generic cross-product explanation framework. Rejected because the spec is explicitly scoped to canonical governance run detail.
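
If the seam is introduced, its shape could look roughly like the following Python sketch. Only the name `GovernanceRunDiagnosticSummary` comes from this note; `Cause`, `build_summary`, `SEVERITY_RANK`, the severity labels, and the tie-break order (severity, then affected count, then code) are hypothetical placeholders for whatever ranking rules the spec settles on.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical severity scale; this note does not define the real ranking inputs.
SEVERITY_RANK: Dict[str, int] = {"blocking": 3, "degrading": 2, "informational": 1}


@dataclass(frozen=True)
class Cause:
    code: str
    severity: str
    affected: int  # how many subjects or items this cause touched


@dataclass(frozen=True)
class GovernanceRunDiagnosticSummary:
    dominant_cause: Optional[Cause]
    secondary_causes: Tuple[Cause, ...]


def build_summary(causes: Sequence[Cause]) -> GovernanceRunDiagnosticSummary:
    # Rank by severity, then by affected count, then by code so ties are deterministic.
    ranked = sorted(
        causes,
        key=lambda c: (-SEVERITY_RANK.get(c.severity, 0), -c.affected, c.code),
    )
    return GovernanceRunDiagnosticSummary(
        dominant_cause=ranked[0] if ranked else None,
        secondary_causes=tuple(ranked[1:]),
    )
```

Keeping the ranking in one builder means the Filament schema only renders fields, and the ordering rules can be unit-tested without rendering the page.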

## Decision 4: Derive affected-scale cues from existing `summary_counts`, run context, and related artifact truth

- **Decision**: Affected scale must come from existing persisted signals such as `summary_counts`, known run-context payloads, failure summaries, and related artifact summaries. No schema change or count-contract expansion is planned.
- **Rationale**: Covered operation families already persist enough context to support statements about ambiguous subject matches, missing sections, partial evidence dimensions, or zero captured subjects. The missing work is ranking and presenting those signals consistently.
- **Alternatives considered**:
  - Add new operation-specific summary fields or nested count structures. Rejected because Ops-UX already constrains `summary_counts` to flat numeric keys, and the feature does not need new persistence.
  - Omit affected-scale cues entirely. Rejected because the spec explicitly requires the page to explain what was affected, not just why it failed.
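
A minimal Python sketch of reading affected scale from flat `summary_counts` without extending the count contract. The keys `total` and `failed`, the phrasing of the returned strings, and the helper names `validate_flat_counts` and `affected_scale` are hypothetical; only the flat-numeric-keys constraint comes from this note. It also treats an all-zero run as a finding in its own right rather than as an empty success.

```python
from typing import Dict, Mapping


def validate_flat_counts(summary_counts: Mapping[str, object]) -> Dict[str, int]:
    """Reject nested or non-integer values, mirroring the flat-numeric-keys rule."""
    flat: Dict[str, int] = {}
    for key, value in summary_counts.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(
                f"summary_counts[{key!r}] must be an integer, got {type(value).__name__}"
            )
        flat[key] = value
    return flat


def affected_scale(summary_counts: Mapping[str, object]) -> str:
    counts = validate_flat_counts(summary_counts)
    total = counts.get("total", 0)    # hypothetical key
    failed = counts.get("failed", 0)  # hypothetical key
    if all(value == 0 for value in counts.values()):
        # All-zero run (or no counts at all): the absence of work is the finding.
        return "No subjects were processed"
    if failed and total:
        return f"{failed} of {total} subjects affected"
    if failed:
        return f"{failed} subjects affected"
    if total:
        return f"All {total} subjects processed"
    return "No failures recorded"
```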

## Decision 5: Keep banners specialized and let the decision zone own the dominant explanation

- **Decision**: Existing canonical context, lifecycle, blocked-execution, and restore-continuation banners remain specialized. The main humanized summary must live in the decision zone so the page does not duplicate dominant-cause copy.
- **Rationale**: The current run detail already has banner-level messaging. Adding another banner or repeating the same explanation in two places would increase attention load instead of reducing it. The summary should become the first read inside the decision zone, with banners reserved for scope, stale lifecycle, and restore-continuation contexts.
- **Alternatives considered**:
  - Add a new top-of-page summary banner. Rejected because it would compete with existing lifecycle and context banners.
  - Remove existing banners entirely. Rejected because they already communicate valid scope or lifecycle information outside the core diagnostic summary.
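
The ownership split between banners and the decision zone can be expressed as a small composition rule. This Python sketch is illustrative only: the banner kind strings, `Banner`, `RunDetailLayout`, and `compose_run_detail` are hypothetical and do not mirror the actual Filament page structure. It keeps only the four specialized banner kinds named above, drops any banner that would repeat the dominant-cause copy, and places the summary first in the decision zone.

```python
from dataclasses import dataclass
from typing import List, Sequence

# The four specialized banner kinds named in this note; the string keys are hypothetical.
SPECIALIZED_BANNERS = frozenset(
    {"context", "lifecycle", "blocked_execution", "restore_continuation"}
)


@dataclass(frozen=True)
class Banner:
    kind: str
    message: str


@dataclass(frozen=True)
class RunDetailLayout:
    banners: List[Banner]
    decision_zone: List[str]


def compose_run_detail(
    summary_headline: str,
    banners: Sequence[Banner],
    other_decision_items: Sequence[str],
) -> RunDetailLayout:
    kept = [
        b
        for b in banners
        # Only specialized banners survive, and none may repeat the dominant-cause copy.
        if b.kind in SPECIALIZED_BANNERS and b.message != summary_headline
    ]
    # The humanized summary is always the first read inside the decision zone.
    return RunDetailLayout(
        banners=kept,
        decision_zone=[summary_headline, *other_decision_items],
    )
```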

## Decision 6: Extend current Monitoring and authorization suites and keep multi-cause fixtures local first

- **Decision**: Reuse existing Monitoring, Filament, and authorization suites; add one new focused `GovernanceOperationRunSummariesTest` plus one narrow unit test for the builder if one is introduced. Keep multi-cause fixture builders local to the Monitoring suite unless another consumer emerges.
- **Rationale**: The repo already has substantial run-detail coverage, including hierarchy assertions, artifact-truth rendering, and `404` vs `403` semantics. The main gaps are multi-cause degraded runs, all-zero runs, and cross-family consistency. Those gaps can be covered without creating a new heavy or browser test family.
- **Alternatives considered**:
  - Rely mainly on browser tests. Rejected because the current feature is better proven through existing Livewire and feature suites.
  - Move multi-cause builders into shared fixture concerns immediately. Rejected because only Spec 220 currently needs those seeds, and promoting them to shared defaults would be risky.
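
Suite-local fixture builders can stay small while still being safe to reuse across tests in one suite. The following Python sketch stands in for what would be PHP test helpers in the repo; `make_multi_cause_run`, `make_all_zero_run`, and every field except the operation-type strings taken from this note are hypothetical. The `deepcopy` ensures one test mutating its fixture cannot leak into another, which is the kind of shared-default hazard the decision above avoids.

```python
from copy import deepcopy
from typing import Any, Dict

# Suite-local defaults; deliberately not exported to other test suites.
_MULTI_CAUSE_DEFAULTS: Dict[str, Any] = {
    "type": "baseline.compare",
    "summary_counts": {"total": 10, "failed": 7},
    "causes": [
        {"code": "ambiguous_match", "affected": 5},
        {"code": "missing_section", "affected": 2},
    ],
}


def make_multi_cause_run(**overrides: Any) -> Dict[str, Any]:
    # Copy so callers never receive (or mutate) the shared default structure.
    run = deepcopy(_MULTI_CAUSE_DEFAULTS)
    run.update(overrides)  # top-level keys are replaced wholesale
    return run


def make_all_zero_run(run_type: str = "baseline.capture") -> Dict[str, Any]:
    # Every count is zero and no causes are recorded.
    return {"type": run_type, "summary_counts": {"total": 0, "failed": 0}, "causes": []}
```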