
# Feature Specification: Enforcement & Review Guardrails

**Feature Branch**: `[201-enforcement-review-guardrails]`

**Created**: 2026-04-18

**Status**: Proposed

**Input**: User description: "Spec 201 - operationalize the UI/UX constitution rules from Spec 200 into repeatable review, repository, test, exception, and workflow guardrails."
## Spec Candidate Check *(mandatory - SPEC-GATE-001)*
- **Problem**: Spec 200 now defines the UI and surface rules, but the repo still lacks repeatable day-to-day mechanisms that force reviewers and authors to detect new drift early, classify legitimate exceptions, and keep shared-family or state-layer mistakes from re-entering under delivery pressure.
- **Today's failure**: The team can agree with the constitution in principle, yet fake-native controls, hidden exceptions, host drift, and shell/page/detail state confusion can still merge because there is no mandatory review language, no shared signal catalog, and no explicit workflow contract for when extra tests or exception notes are required.
- **User-visible improvement**: Operators keep a more consistent, trustworthy admin product over time because future UI and surface work is checked against explicit guardrails before drift becomes another cleanup project.
- **Smallest enterprise-capable version**: Define a bounded guardrail catalog, a mandatory review checklist, a repository signal catalog, a test-guardrail matrix, an exception workflow, and spec-flow integration that operationalize Spec 200 without doing another cleanup sweep or inventing heavyweight tooling.
- **Explicit non-goals**: No runtime cleanup of existing surfaces, no new product features, no broad refactor of page or shell logic, no separate governance process outside the existing spec workflow, and no attempt to turn every design judgment into a hard technical gate.
- **Permanent complexity imported**: A finite guardrail vocabulary, review outcome classes, technical signal classes, exception documentation requirements, and close-out expectations for UI and surface work.
- **Why now**: Spec 200 is the normative rule set now. If guardrails do not follow immediately, the same drift classes will return while adjacent specs are still being implemented.
- **Why not local**: Ad-hoc PR comments or personal grep habits cannot create consistent review outcomes, visible exception boundaries, or reusable test-trigger rules across the repo.
- **Approval class**: Cleanup
- **Red flags triggered**: Foundation-sounding governance work and new classification vocabulary. Defense: the scope is explicitly limited to operationalizing existing Spec 200 rules, keeps enforcement proportional, and avoids adding a second parallel framework or mandatory toolchain.
- **Score**: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 1 | Reuse: 2 | **Total: 10/12**
- **Decision**: approve
## Spec Scope Fields *(mandatory)*
- **Scope**: workspace
- **Primary Routes**: No end-user HTTP routes are changed. The affected surfaces are repository-owned review artifacts; spec, plan, and task expectations; close-out notes; and any lightweight repository checks that expose UI and surface drift before merge.
- **Data Ownership**: Workspace-owned governance artifacts only. No tenant-owned tables, runtime records, or new persisted product-truth entities are introduced.
- **RBAC**: No runtime authorization behavior changes. The affected actors are contributors and reviewers working on operator-facing surfaces inside the existing spec-driven workflow.
## Proportionality Review *(mandatory when structural complexity is introduced)*
- **New source of truth?**: no, this spec operationalizes the existing rule truth from Spec 200 rather than creating a second normative source
- **New persisted entity/table/artifact?**: yes, but only repository-owned artifacts such as guardrail catalogs, review checklists, and close-out guidance
- **New abstraction?**: yes, a bounded guardrail taxonomy that classifies review signals, repository signals, test triggers, and documented exceptions
- **New enum/state/reason family?**: yes, because guardrail classes and review outcome classes become named categories used across future UI and surface work
- **New cross-domain UI framework/taxonomy?**: yes, but only to translate already-approved surface rules into one finite operational model
- **Current operator problem**: The product already knows what good surface behavior should look like, but the delivery workflow still lacks a reliable way to stop new fake-native patterns, hidden exceptions, shared-family drift, and state-layer confusion before they ship.
- **Existing structure is insufficient because**: Spec 200 defines the rules, but it does not yet say which questions are mandatory in review, which patterns are technically signalable, which surface classes must trigger extra tests, or how legitimate exceptions remain visible and bounded.
- **Narrowest correct implementation**: Add one operational guardrail layer that maps the existing constitution to review, repository, test, exception, and workflow expectations. Do not include cleanup implementation work, hard CI enforcement for every case, or a parallel approval bureaucracy.
- **Ownership cost**: Maintainers must keep the guardrail catalog aligned with Spec 200, preserve the shared vocabulary in future specs and PRs, and maintain any lightweight signal definitions or close-out expectations that result from this work.
- **Alternative intentionally rejected**: Rely on tribal review knowledge, add only local PR templates, or attempt a fully automated hard-gate system before the repo has a bounded and reviewable guardrail model.
- **Release truth**: current-release workflow truth needed now so Spec 200 remains effective in daily implementation and review
## Problem Statement
The audit trail behind Specs 196 through 200 exposed the same failure pattern repeatedly: the repo can name the drift, but it cannot yet stop it early enough.
The known problem classes are already concrete:
- fake-native controls and GET-form interactions inside Filament surfaces
- Blade-request-driven UI state in page bodies and adjacent UI surfaces
- hand-rolled simple overviews where standard primitives should remain primary
- shared detail families that drift by host-specific fork instead of one shared contract
- page-state, shell-state, and detail-state ownership collapsing into each other
- legitimate custom exceptions that remain invisible and therefore expand quietly
Without operational guardrails, the constitution is vulnerable to the usual failure mode: agreement in principle, drift in practice, then another cleanup wave later.
## Dependencies
- Depends on Spec 200 - Filament Surface Rules as the normative rule source this spec operationalizes.
- Uses the proven drift cases from Spec 196 - Hard Filament Nativity Cleanup as the clearest examples of fake-native and request-driven UI problems.
- Uses the shared-family and host-drift lessons from Spec 197 - Shared Detail Contract.
- Uses the page-state ownership lessons from Spec 198 - Monitoring Page State.
- Uses the shell and context ownership lessons from Spec 199 - Global Context Shell Contract.
- Does not replace or reopen the cleanup and implementation scope of any adjacent spec.
## Goals
- Translate the Spec 200 rule set into repeatable review guardrails, repository signals, test triggers, exception handling, and workflow expectations.
- Make the team distinguish clearly between what is review-only, what is technically signalable, what requires extra testing, and what is allowed only through visible exception handling.
- Make legitimate exceptions visible, bounded, and reviewable instead of accidental precedent.
- Keep the guardrails proportionate so they help contributors before merge without creating tool-driven theater.
- Integrate the guardrails into the existing spec-driven workflow rather than creating a competing process.
## Non-Goals
- Cleaning up existing UI and surface violations as part of this spec.
- Building heavy custom linting, CI policy, or repo tooling beyond what is needed for lightweight early warning.
- Forcing hard technical gates on design judgments that only human review can assess reliably.
- Inventing new UI rules beyond Spec 200.
- Turning legitimate custom surfaces into forbidden territory.
## Assumptions
- Spec 200 remains the normative rule source, and Spec 201 only operationalizes it.
- Some design decisions will remain structured human judgment even after this work is complete.
- Legitimate custom surfaces, special visualizations, and bounded local state will continue to exist, so the exception path must be first-class rather than grudging.
- The existing spec, plan, task, and close-out workflow remains the primary delivery process and should absorb these guardrails directly.
## Guardrail Classification Model
### Guardrail Classes
- **Hard Technical Signals**: reliable warning patterns that can expose likely drift early
- **Structured Review Signals**: questions that must be answered in review even when no technical pattern can decide the case alone
- **Required Test Signals**: surface classes that must trigger additional tests or smoke expectations
- **Exception Signals**: cases where deviation can be legitimate only if it is visible, bounded, and justified
### Handling Modes
- **Hard-stop candidate**: drift pattern that may become blocking once the repo has an approved exception path
- **Review-mandatory**: pattern or surface type that always needs explicit human classification
- **Exception-required**: case that is allowed only with a documented exception model
- **Report-only**: lightweight visibility signal that supports trend review without default blocking
### Review Outcome Classes
- **Blocker**: the change cannot proceed until the guardrail issue is corrected or explicitly split
- **Strong warning**: the change may proceed only if the remaining guardrail risk is acknowledged and resolved in the active workflow
- **Documentation-required exception**: the change is acceptable only once the named exception path is completed and bounded
- **Acceptable special case**: the change remains legitimate without additional guardrail escalation beyond ordinary documentation
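
One way to keep this vocabulary finite is to encode it as closed enumerations that checklists, signal catalogs, and close-out notes can share. The sketch below is a minimal Python illustration, assuming the slug spellings used in this spec's validation tables (for example `hard-stop-candidate`). The class names and the `merge_ready` helper are hypothetical, and the merge rule is one reading of the outcome definitions above, not a prescribed implementation.

```python
from enum import Enum


class GuardrailClass(Enum):
    """How a Spec 200 rule is enforced day to day (hypothetical names)."""
    HARD_TECHNICAL_SIGNAL = "hard-technical-signal"
    STRUCTURED_REVIEW_SIGNAL = "structured-review-signal"
    REQUIRED_TEST_SIGNAL = "required-test-signal"
    EXCEPTION_SIGNAL = "exception-signal"


class HandlingMode(Enum):
    """What happens when a guardrail fires; slugs match the validation tables."""
    HARD_STOP_CANDIDATE = "hard-stop-candidate"
    REVIEW_MANDATORY = "review-mandatory"
    EXCEPTION_REQUIRED = "exception-required"
    REPORT_ONLY = "report-only"


class ReviewOutcome(Enum):
    """Outcome a reviewer records for a finding."""
    BLOCKER = "blocker"
    STRONG_WARNING = "strong-warning"
    DOCUMENTATION_REQUIRED_EXCEPTION = "documentation-required-exception"
    ACCEPTABLE_SPECIAL_CASE = "acceptable-special-case"


def merge_ready(
    outcome: ReviewOutcome,
    *,
    risk_acknowledged: bool = False,
    exception_documented: bool = False,
) -> bool:
    """Hypothetical reading of the review outcome classes as a merge gate."""
    if outcome is ReviewOutcome.BLOCKER:
        # Must be corrected or explicitly split before the change can proceed.
        return False
    if outcome is ReviewOutcome.STRONG_WARNING:
        # May proceed only once the remaining risk is acknowledged and resolved.
        return risk_acknowledged
    if outcome is ReviewOutcome.DOCUMENTATION_REQUIRED_EXCEPTION:
        # Acceptable only after the named exception path is completed.
        return exception_documented
    # Acceptable special case: ordinary documentation is enough.
    return True
```

Parsing a slug with `ReviewOutcome("strong-warning")` raises `ValueError` for anything outside the four classes, which is a cheap way to keep locally invented categories out of review records.
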
## Testing / Lane / Runtime Impact *(mandatory for runtime behavior changes)*
- **Test purpose / classification**: N/A
- **Validation lane(s)**: N/A
- **Why this classification and these lanes are sufficient**: This feature changes review and specification artifacts, not runtime behavior. Validation is document-based and checks completeness, traceability, clarity, and workflow fit.
- **New or expanded test families**: none
- **Fixture / helper cost impact**: none
- **Heavy-family visibility / justification**: none
- **Reviewer handoff**: Reviewers must confirm that each major Spec 200 rule family maps to explicit guardrail behavior, that no second governance framework is introduced, and that the examples remain grounded in the real drift cases from Specs 196 through 200.
- **Budget / baseline / trend impact**: none
- **Escalation needed**: none
- **Planned validation commands**: N/A
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Reviewers Classify Drift Early (Priority: P1)
As a reviewer, I want a mandatory UI and surface checklist so I can detect fake-native patterns, host drift, hidden exceptions, and shell/page/detail state confusion before merge.
**Why this priority**: Review is the earliest practical control point for most new surface work. If the checklist cannot classify obvious drift early, the rest of the guardrail model will be too weak to matter.
**Independent Test**: Can be fully tested by walking a reviewer through a fake-native case, a host-drift case, and a shell/page/detail state-confusion case and confirming that the reviewer can classify each one using only the documented checklist, review outcome classes, and handling modes.
**Acceptance Scenarios**:
1. **Given** a Filament-looking surface that introduces plain controls or a GET-form interaction for a core page-body behavior, **When** the reviewer applies the guardrail checklist, **Then** the reviewer can classify the change as a blocker or strong warning with an explicit handling mode rather than a neutral custom implementation.
2. **Given** a repeated shared detail surface starts diverging by host-specific variation, **When** the reviewer applies the guardrails, **Then** the reviewer can identify whether the change belongs inside the shared family contract or must be treated as a documented exception.
3. **Given** a page mixes shell context, page interaction state, and detail viewer state without clear ownership, **When** the reviewer applies the checklist, **Then** the reviewer can name the state-layer problem explicitly and stop the change from being treated as harmless implementation detail.
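
A reviewer's checklist pass can be pictured as a record that must answer every FR-003 question and select exactly one review outcome class. The Python sketch below assumes short, hypothetical keys for the FR-003 questions (the wording is paraphrased) and rejects keys outside the catalog, which is one possible way to keep reviewers from inventing local terminology. It is illustrative, not a required tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical short keys for the FR-003 questions (wording paraphrased).
CHECKLIST_QUESTIONS = {
    "native_or_custom": "Native-by-default or legitimately custom?",
    "native_primitives_used": "Are the expected native primitives used?",
    "deviation_is_exception": "Is every deviation an explicit exception?",
    "shared_family_or_one_off": "Shared detail family or one-off host?",
    "host_drift": "Does host drift exist or is it emerging?",
    "state_layers": "Which state layers are present?",
    "url_query_state": "How is URL or query state classified?",
    "competing_interaction_models": "Do competing interaction models exist?",
    "overview_or_visualization": "Simple overview or true special visualization?",
    "quiet_exception_expansion": "Is an existing exception being expanded quietly?",
}

OUTCOME_CLASSES = frozenset(
    {"blocker", "strong-warning", "documentation-required-exception", "acceptable-special-case"}
)


@dataclass
class ReviewRecord:
    answers: dict[str, str] = field(default_factory=dict)
    outcome: str | None = None

    def problems(self) -> list[str]:
        """List everything that keeps this review from being a complete classification."""
        issues = [
            f"unanswered: {key}"
            for key in CHECKLIST_QUESTIONS
            if not self.answers.get(key, "").strip()
        ]
        # Keys outside the catalog indicate locally invented terminology.
        issues += [
            f"unknown question: {key}"
            for key in sorted(set(self.answers) - set(CHECKLIST_QUESTIONS))
        ]
        if self.outcome not in OUTCOME_CLASSES:
            issues.append("outcome must be one of the four review outcome classes")
        return issues
```

An empty `problems()` list means the review is classified; it says nothing about whether the chosen outcome is the right one, which stays a human judgment.
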
---
### User Story 2 - Authors Declare Native vs. Custom Up Front (Priority: P1)
As an author planning new UI or surface work, I want the spec workflow to force native versus custom classification, state-layer classification, shared-family relevance, and exception need so I can design the surface correctly before implementation starts.
**Why this priority**: The guardrails only prevent drift if they shape work before code review. Up-front classification keeps the repo from discovering surface-contract problems only after implementation is already expensive to unwind.
**Independent Test**: Can be fully tested by drafting a new UI or surface spec and confirming that the planning artifacts capture the required classifications, expected test depth, and exception need without adding a second workflow.
**Acceptance Scenarios**:
1. **Given** a new complex surface is proposed, **When** the author fills in the relevant spec, plan, and task fields, **Then** the work item records whether the surface is native, custom, shared-family relevant, or exception-bound and which state layers are involved.
2. **Given** a legitimate special visualization or bounded local-state surface is needed, **When** the author declares the exception, **Then** the planning artifacts capture why default primitives do not fit, how the exception remains bounded, and which standardized parts still stay intact.
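
The up-front classification can be thought of as a small, immutable record that is filled once in the spec and then reused by the plan, tasks, checklist, and close-out. The Python sketch below is hypothetical: the type names, the three state layers, and the one-line `summary` format are assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class SurfaceKind(Enum):
    NATIVE = "native"
    CUSTOM = "custom"


class StateLayer(Enum):
    SHELL = "shell"
    PAGE = "page"
    DETAIL = "detail"


@dataclass(frozen=True)
class SurfaceClassification:
    """Declared once in the spec; downstream artifacts reuse it instead of re-asking."""
    surface_kind: SurfaceKind
    state_layers: frozenset[StateLayer]
    shared_family: str | None  # shared detail family name, or None for a one-off host
    exception_needed: bool

    def summary(self) -> str:
        """One line suitable for copying into plan, task, or close-out notes."""
        layers = ", ".join(sorted(layer.value for layer in self.state_layers)) or "none"
        family = self.shared_family or "one-off host"
        exception = "exception-bound" if self.exception_needed else "no exception"
        return f"{self.surface_kind.value} | state: {layers} | family: {family} | {exception}"
```

Freezing the record is a deliberate choice in this sketch: later artifacts can read the classification but cannot quietly change it without a new spec-level decision.
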
---
### User Story 3 - Maintainers Get Proportionate Signals And Test Triggers (Priority: P2)
As a maintainer, I want a catalog of repository signals and required test triggers so likely drift becomes visible in daily work without making every custom surface a false-positive trap or a hard CI failure.
**Why this priority**: Repository-level visibility is the bridge between pure policy and daily implementation. Without signal and test guidance, drift will remain review-dependent and inconsistent.
**Independent Test**: Can be fully tested by mapping known drift patterns and surface classes to the catalog and confirming that every signal has a handling mode and every relevant surface class has a declared test depth.
**Acceptance Scenarios**:
1. **Given** a repository signal matches a likely fake-native or shell-resolution pattern, **When** the maintainer consults the catalog, **Then** the handling mode is explicit as report-only, review-mandatory, exception-required, or hard-stop candidate.
2. **Given** a new shared detail family, monitoring page with its own state contract, context shell touchpoint, or exception-coded surface is introduced, **When** the work is prepared for merge, **Then** the required tests and smoke expectations are visible and standard native surfaces are not burdened with unnecessary bespoke coverage.
### Edge Cases
- A technical signal flags a legitimate custom surface; the exception path must prevent noisy permanent false positives.
- A one-off exception starts spreading to additional hosts or surfaces; the guardrails must require renewed review instead of treating the first approval as general precedent.
- A standard native Filament surface changes without introducing a new state contract; the guardrails must avoid demanding special smoke coverage that adds friction without value.
- Spec 200 evolves later; the guardrail catalog must remain explicitly traceable to the rule it operationalizes so the workflow does not drift from the constitution.
## Requirements *(mandatory)*
**Constitution alignment:** This feature adds no product runtime behavior, no Microsoft Graph behavior, no new authorization plane, and no new operator-facing page. It does add repository and workflow guardrails for operator-facing UI and surface work, so the resulting review vocabulary, signal catalog, exception model, and close-out expectations must remain explicit, bounded, and traceable to Spec 200.
### Functional Requirements
- **FR-001 Guardrail Catalog**: The implementation MUST map every relevant Spec 200 rule family to one or more of the following operational classes: review guardrail, repository signal, required test signal, exception signal, or workflow integration requirement.
- **FR-002 Mandatory Review Checklist**: The implementation MUST provide one binding review checklist for UI and surface work that reviewers can apply without inventing local terminology.
- **FR-003 Checklist Questions**: The review checklist MUST require explicit answers for at least these questions: whether the surface is native-by-default or legitimately custom, whether expected native primitives are used, whether any deviation is an explicit exception, whether the surface belongs to a shared detail family or one-off host, whether host drift exists or is emerging, which state layers are present, how URL or query state is classified, whether competing interaction models exist, whether the surface is a simple overview or a true special visualization, and whether an existing exception is being expanded quietly.
- **FR-004 Review Outcomes**: The review checklist MUST classify findings as blocker, strong warning, documentation-required exception, or acceptable special case.
- **FR-005 Merge Readiness Rule**: New complex surfaces MUST NOT be treated as ready for merge until their guardrail classification is explicit in review and associated planning artifacts.
- **FR-006 Repository Signal Catalog**: The implementation MUST define repository-level warning signals for the main fake-native and drift-prone patterns, including GET-form interactions inside Filament surfaces, plain controls masquerading as native controls, request-driven UI state near page-body surfaces, hand-built simple overviews where standard primitives should remain primary, host-specific forks of known shared families, and shell-context resolution logic leaking into presentation partials.
- **FR-007 Signal Handling Modes**: Every repository signal MUST declare whether it is report-only, review-mandatory, exception-required, or a hard-stop candidate, and whether the signal is eligible for later promotion to blocking.
- **FR-008 False-Positive Control**: Repository guardrails MUST include an explicit exception or review path so legitimate custom surfaces do not become permanently noisy false positives.
- **FR-009 Test Guardrail Matrix**: The implementation MUST define which special surface classes require extra test depth, including shared detail micro-UI families, monitoring or governance pages with their own state contracts, global context shell surfaces, and legitimate exception surfaces with intentional special contracts.
- **FR-010 Required Test Types**: For each special surface class, the guardrail model MUST state whether functional core interaction tests, state-contract tests, exception or fallback behavior tests, and manual smoke expectations are required.
- **FR-011 Standard Surface Relief**: The guardrail model MUST state when standard native Filament surfaces do not need extra special tests beyond normal feature coverage.
- **FR-012 Exception Model**: Every legitimate exception MUST document which default rule it breaks or does not fully satisfy, why native or default behavior is insufficient, how the exception remains bounded, which parts remain standardized, and which follow-on risks are consciously accepted.
- **FR-013 Exception Spread Control**: Exceptions MUST NOT extend silently to additional hosts or surfaces; any expansion requires renewed explicit review.
- **FR-014 Workflow Integration**: The implementation MUST integrate guardrail expectations into spec creation, planning, tasks, implementation review, definition-of-done checks, and follow-up exception documentation for UI and surface-relevant work.
- **FR-015 Planning Visibility**: UI and surface-relevant specs MUST capture native versus custom classification, state-layer classification, shared-family relevance, and exception need in a way that is visible before implementation begins.
- **FR-016 Close-Out Visibility**: Completion notes for relevant work MUST record, in the active feature PR close-out entry, which guardrail class was triggered, whether an exception was required, and which tests or smoke checks were added or were intentionally not needed.
- **FR-017 Guardrail Matrix**: The implementation MUST publish a rule matrix that distinguishes hard-stop candidates, review-mandatory cases, exception-required cases, and report-only cases with representative examples from Specs 196 through 200.
- **FR-018 Deliverable Set**: The implementation MUST produce a guardrail catalog, a UI and surface review checklist, a technical signal catalog, a test-guardrail catalog, an exception workflow, workflow integration notes, and close-out documentation describing what remains review-only versus technically signalable.
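
A repository signal (FR-006 to FR-008) can be sketched as a catalog entry that carries its handling mode, its promotion eligibility, and optionally a pattern, plus a report-only scan that honors documented exceptions. The Python sketch below assumes Blade views and the Laravel `request()` helper; the signal IDs, the regex heuristics, and the handling modes assigned to the request-state and host-fork entries are illustrative assumptions, and real matches would still need review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RepositorySignal:
    signal_id: str                 # hypothetical identifier
    description: str
    handling_mode: str             # one of the four handling modes
    promotable: bool               # eligible for later promotion to blocking (FR-007)
    pattern: re.Pattern[str] | None = None  # None: not reliably greppable, review only


# Illustrative subset of FR-006; patterns are rough heuristics, not a complete detector.
SIGNALS = [
    RepositorySignal(
        "get-form-in-filament",
        "GET-form interaction inside a Filament surface",
        "hard-stop-candidate",
        promotable=True,
        pattern=re.compile(r"<form\b[^>]*\bmethod\s*=\s*[\"']get[\"']", re.IGNORECASE),
    ),
    RepositorySignal(
        "request-state-in-view",
        "Request-driven UI state near page-body surfaces",
        "review-mandatory",
        promotable=False,
        pattern=re.compile(r"request\(\)\s*->\s*(?:query|input)\("),
    ),
    RepositorySignal(
        "host-fork-of-shared-family",
        "Host-specific fork of a known shared detail family",
        "review-mandatory",
        promotable=False,
    ),
]


def scan(
    path: str,
    text: str,
    accepted_exceptions: frozenset[tuple[str, str]] = frozenset(),
) -> list[str]:
    """Report-only scan: returns findings and never fails the build.

    accepted_exceptions holds (path, signal_id) pairs covered by a documented
    exception, so legitimate custom surfaces stop producing noise (FR-008).
    """
    findings = []
    for signal in SIGNALS:
        if signal.pattern is None or (path, signal.signal_id) in accepted_exceptions:
            continue
        for match in signal.pattern.finditer(text):
            line = text.count("\n", 0, match.start()) + 1
            findings.append(f"{path}:{line}: [{signal.handling_mode}] {signal.signal_id}")
    return findings
```

Entries without a pattern stay in the catalog so their handling mode remains documented even though only human review can detect them, in line with NFR-002.
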
### Non-Functional Requirements
- **NFR-001 Early Warning**: The guardrails must expose likely drift early enough that review can stop it before another cleanup pass becomes necessary.
- **NFR-002 Proportionality**: Only patterns that are reliably observable should become repository signals or future hard-stop candidates; judgment-heavy questions must remain structured review instead of fake precision.
- **NFR-003 Workflow Fit**: The guardrails must strengthen the existing spec-driven workflow instead of creating a second process that contributors must learn separately. The low-impact path must remain completable in under 1 minute, a representative guarded review must remain completable in under 3 minutes, and the same classification question must not be asked redundantly across spec, plan, task, review, and close-out surfaces.
- **NFR-004 Traceability**: Every guardrail class and example must remain traceable to the governing rules from Spec 200 and its supporting Specs 196 through 199.
### Key Entities *(include if feature involves data)*
- **Guardrail Catalog**: The authoritative operational mapping from Spec 200 rules to review expectations, repository signals, test triggers, exception handling, and workflow touchpoints.
- **Review Checklist**: The binding question set that reviewers and authors use to classify UI and surface work consistently.
- **Repository Signal**: A documented red-flag pattern that can expose likely drift and carries a defined handling mode.
- **Test Guardrail Profile**: The required test and smoke depth attached to a surface class when its contract is more complex than standard native surfaces.
- **Exception Record**: The visible justification that bounds a legitimate deviation from the default surface rules and prevents silent expansion.
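
An Exception Record can be sketched as a structure whose fields mirror FR-012, with a completeness check and a spread check for FR-013. The `approved_hosts` allow-list is an assumption of this Python sketch: it is one concrete way to make "bounded" checkable, so that any host outside the list surfaces as needing renewed review.

```python
from __future__ import annotations

from dataclasses import dataclass


def _is_blank(value: object) -> bool:
    """A string is blank if it is only whitespace; a collection is blank if empty."""
    if isinstance(value, str):
        return not value.strip()
    return not value


@dataclass
class ExceptionRecord:
    rule_broken: str                # default rule broken or not fully satisfied
    why_default_insufficient: str   # why native/default behavior does not fit
    boundary: str                   # how the exception stays bounded
    preserved_standards: list[str]  # which parts remain standardized
    accepted_risks: list[str]       # follow-on risks consciously accepted
    approved_hosts: frozenset[str] = frozenset()  # assumption: explicit host allow-list

    def missing_fields(self) -> list[str]:
        """Names of required fields that are still empty."""
        required = {
            "rule_broken": self.rule_broken,
            "why_default_insufficient": self.why_default_insufficient,
            "boundary": self.boundary,
            "preserved_standards": self.preserved_standards,
            "accepted_risks": self.accepted_risks,
            "approved_hosts": self.approved_hosts,
        }
        return [name for name, value in required.items() if _is_blank(value)]

    def unapproved_hosts(self, hosts_in_use: set[str]) -> set[str]:
        """Hosts using the exception without approval; each needs renewed review."""
        return set(hosts_in_use) - set(self.approved_hosts)
```

A non-empty result from `unapproved_hosts` corresponds to the edge case where a one-off exception starts spreading: the first approval does not cover the new host.
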
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: Every targeted drift class from Spec 200 has at least one documented mapping to review guardrails, repository signals, required test handling, or exception handling.
- **SC-002**: Reviewers can classify the representative cases of fake-native dependency edges, legitimate special visualization, shared-family host drift, and shell/page/detail state confusion into the defined review outcome classes and handling modes without inventing new categories.
- **SC-003**: All UI and surface-relevant work items created after rollout include explicit native or custom classification, state-layer classification, shared-family relevance, and exception need in their planning artifacts.
- **SC-004**: Every documented exception includes all required justification fields and an explicit boundary that prevents silent reuse as general precedent.
- **SC-005**: Every repository signal in the catalog has an assigned handling mode and at least one documented review or exception path, so no signal remains an ownerless warning.
- **SC-006**: Reviewers can determine in one pass which special surface classes require additional tests and which standard native surfaces do not.
- **SC-007**: A low-impact docs-only workflow path remains completable in under 1 minute, a representative guarded review remains completable in under 3 minutes, and neither path requires contributors to answer the same classification question redundantly across workflow surfaces.
## Validation Notes
### Reviewer Workflow Validation
| Scenario | Source artifact | Outcome class | Handling mode | Workflow outcome | Notes |
|---|---|---|---|---|---|
| Low-impact docs-only path | `.specify/templates/checklist-template.md` + `.specify/README.md` | `acceptable-special-case` | `report-only` | `keep` | One `N/A` note stayed sufficient; no fake UI-surface prose was required |
| Fake-native hard signal | `specs/196-hard-filament-nativity-cleanup/spec.md` | `blocker` | `hard-stop-candidate` | `reject-or-split` | Fake-native drift still reaches the strongest guardrail outcome cleanly |
| Shared-family host drift | `specs/197-shared-detail-contract/spec.md` | `strong-warning` | `review-mandatory` | `document-in-feature` | The workflow stops host drift without inventing new categories |
| State-layer confusion | `specs/198-monitoring-page-state/spec.md` + `specs/199-global-context-shell-contract/spec.md` | `strong-warning` | `review-mandatory` | `document-in-feature` | Shell/page/detail ownership now resolves through one fixed question set |
| Legitimate special visualization | `specs/200-filament-surface-rules/spec.md` | `documentation-required-exception` | `exception-required` | `document-in-feature` | Legitimate special cases remain allowed only with bounded exception notes |
- Representative guarded review elapsed time (mm:ss): `02:34`
- Duplicate-question note: the final workflow asks for native/custom, shared-family, and state ownership once in the spec, then reuses that classification in plan, tasks, checklist, and close-out.
### Authoring Workflow Validation
| Scenario | Source artifact | Outcome | Elapsed time (mm:ss) | Notes |
|---|---|---|---|---|
| Low-impact docs-only flow | `.specify/templates/checklist-template.md` + `.specify/README.md` | `keep` | `00:48` | Low-impact `N/A` remains fast and does not fabricate runtime obligations |
| Surface-changing authoring flow | `specs/200-filament-surface-rules/spec.md` | `document-in-feature` | `01:51` | Native/custom classification, handling modes, and proof depth stay explicit without adding a second workflow |
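
The recorded times can be checked mechanically against the SC-007 budgets. This Python sketch assumes the elapsed values use an `mm:ss` format and treats "under 1 minute" and "under 3 minutes" as strict limits; the path keys are hypothetical labels.

```python
# Budgets from SC-007 / NFR-003, treated as strict upper limits.
BUDGET_SECONDS = {"low-impact": 60, "guarded-review": 180}


def to_seconds(elapsed: str) -> int:
    """Convert an 'mm:ss' string (assumed format) to seconds."""
    minutes, seconds = elapsed.split(":")
    if not 0 <= int(seconds) < 60:
        raise ValueError(f"seconds out of range in {elapsed!r}")
    return int(minutes) * 60 + int(seconds)


def within_budget(path: str, elapsed: str) -> bool:
    return to_seconds(elapsed) < BUDGET_SECONDS[path]


# Values recorded in the validation tables of this spec.
RECORDED = [("low-impact", "00:48"), ("guarded-review", "02:34")]
for path, elapsed in RECORDED:
    status = "ok" if within_budget(path, elapsed) else "over budget"
    print(f"{path}: {elapsed} -> {status}")
# prints:
# low-impact: 00:48 -> ok
# guarded-review: 02:34 -> ok
```
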
### Signal And Test-Trigger Validation
| Drift / surface class | Handling mode | Required proof profile | Close-out expectation |
|---|---|---|---|
| Fake-native hard signal | `hard-stop-candidate` | Special proof only if a bounded exception is claimed | Close-out notes are required only when an exception is still in play |
| Shared-family host drift | `review-mandatory` | `shared-detail-family` | Record host/core boundary and any exception spread control |
| Monitoring or shell state-layer confusion | `review-mandatory` | `monitoring-state-page` or `global-context-shell` | Record state-owner proof and any required smoke |
| Legitimate special visualization | `exception-required` | `exception-coded-surface` | Record bounded exception, preserved standards, and manual smoke |
| Standard native Filament surface | `report-only` | `standard-native-filament` relief | Record ordinary feature coverage only; no bespoke guardrail proof required |
- Standard-native relief rule: standard native Filament work does not need extra bespoke guardrail tests unless it introduces a shared-family contract, shell/page/detail state ownership risk, or a bounded exception.
- Active feature PR close-out entry name: `Guardrail / Exception / Smoke Coverage`
- First-pass automation deferrals remain explicit: report-first signals only, no CI hard-stop, no PR bot, no auto-promotion of review-mandatory cases.
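
The proof profiles and the standard-native relief rule can be sketched as a small mapping from declared change flags to profiles, with relief as the fallback when nothing special is triggered. The flag names, the rendered Markdown layout, and the `MISSING` marker are assumptions of this Python sketch; only the profile slugs and the entry name come from this spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurfaceChange:
    """Hypothetical flags an author sets when preparing UI or surface work."""
    shared_family_contract: bool = False    # introduces or changes a shared detail family
    page_state_contract: bool = False       # monitoring/governance page owning its own state
    shell_context_touchpoint: bool = False  # touches the global context shell
    bounded_exception: bool = False         # claims a documented, bounded exception


def proof_profiles(change: SurfaceChange) -> list[str]:
    """Map a change to proof profiles; no trigger means standard-native relief."""
    triggers = [
        (change.shared_family_contract, "shared-detail-family"),
        (change.page_state_contract, "monitoring-state-page"),
        (change.shell_context_touchpoint, "global-context-shell"),
        (change.bounded_exception, "exception-coded-surface"),
    ]
    profiles = [name for triggered, name in triggers if triggered]
    return profiles or ["standard-native-filament"]


def close_out_entry(change: SurfaceChange, proofs_added: list[str]) -> str:
    """Render a close-out entry as Markdown lines."""
    profiles = proof_profiles(change)
    if proofs_added:
        proof_line = ", ".join(proofs_added)
    elif profiles == ["standard-native-filament"]:
        proof_line = "none beyond ordinary feature coverage (standard-native relief)"
    else:
        proof_line = "MISSING: profile requires explicit proof"
    return "\n".join([
        "### Guardrail / Exception / Smoke Coverage",
        f"- Proof profiles: {', '.join(profiles)}",
        f"- Exception required: {'yes' if change.bounded_exception else 'no'}",
        f"- Tests / smoke: {proof_line}",
    ])
```

Consistent with the first-pass deferrals, a `MISSING` line here is something a reviewer reads, not something that fails CI.
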