docs: add Spec 212 test authoring guardrails #245

Merged
ahmido merged 1 commit from 212-test-authoring-guardrails into dev 2026-04-18 10:08:01 +00:00
17 changed files with 1713 additions and 64 deletions

@ -200,6 +200,8 @@ ## Active Technologies
- SQLite `:memory:` for default lane execution, filesystem artifacts under the app-root contract path `storage/logs/test-lanes`, checked-in workflow YAML under `.gitea/workflows/`, and no new product database persistence (210-ci-matrix-budget-enforcement)
- PHP 8.4.15 for repo-truth governance logic, Bash for repo-root wrappers, GitHub-compatible Gitea Actions workflow YAML under `.gitea/workflows/`, plus JSON Schema and logical OpenAPI for repository contracts + Laravel 12, Pest v4, PHPUnit 12, Filament v5, Livewire v4, Laravel Sail, Gitea Actions backed by `act_runner`, uploaded artifact bundles, and the existing `Tests\Support\TestLaneManifest`, `TestLaneBudget`, and `TestLaneReport` seams (211-runtime-trend-recalibration)
- SQLite `:memory:` for lane execution, filesystem artifacts under `apps/platform/storage/logs/test-lanes`, staged CI bundles under `.gitea-artifacts/<workflow-profile>`, bounded derived trend/history artifacts adjacent to current lane artifacts, and no new product database persistence (211-runtime-trend-recalibration)
- Markdown for repository governance artifacts, JSON Schema plus logical OpenAPI for planning contracts, and Bash-backed SpecKit scripts already present in the repo + `.specify/memory/constitution.md`, `.specify/templates/spec-template.md`, `.specify/templates/plan-template.md`, `.specify/templates/tasks-template.md`, `.specify/templates/checklist-template.md`, `.specify/README.md`, `README.md`, and the existing Specs 206 through 211 governance vocabulary (212-test-authoring-guardrails)
- Repository-owned markdown and contract artifacts under `.specify/`, `specs/212-test-authoring-guardrails/`, and root documentation files; no product database persistence (212-test-authoring-guardrails)
- PHP 8.4.15 (feat/005-bulk-operations)
@ -234,8 +236,8 @@ ## Code Style
PHP 8.4.15: Follow standard conventions
## Recent Changes
- 212-test-authoring-guardrails: Added Markdown for repository governance artifacts, JSON Schema plus logical OpenAPI for planning contracts, and Bash-backed SpecKit scripts already present in the repo + `.specify/memory/constitution.md`, `.specify/templates/spec-template.md`, `.specify/templates/plan-template.md`, `.specify/templates/tasks-template.md`, `.specify/templates/checklist-template.md`, `.specify/README.md`, `README.md`, and the existing Specs 206 through 211 governance vocabulary
- 211-runtime-trend-recalibration: Added PHP 8.4.15 for repo-truth governance logic, Bash for repo-root wrappers, GitHub-compatible Gitea Actions workflow YAML under `.gitea/workflows/`, plus JSON Schema and logical OpenAPI for repository contracts + Laravel 12, Pest v4, PHPUnit 12, Filament v5, Livewire v4, Laravel Sail, Gitea Actions backed by `act_runner`, uploaded artifact bundles, and the existing `Tests\Support\TestLaneManifest`, `TestLaneBudget`, and `TestLaneReport` seams
- 210-ci-matrix-budget-enforcement: Added PHP 8.4.15 for repo-truth test governance, Bash for repo-root wrappers, and GitHub-compatible Gitea Actions workflow YAML under `.gitea/workflows/` + Laravel 12, Pest v4, PHPUnit 12, Filament v5, Livewire v4, Laravel Sail, Gitea Actions backed by `act_runner`, and the existing `Tests\Support\TestLaneManifest`, `TestLaneBudget`, and `TestLaneReport` seams
- 209-heavy-governance-cost: Added PHP 8.4.15 + Laravel 12, Pest v4, PHPUnit 12, Filament v5, Livewire v4, Laravel Sail
<!-- MANUAL ADDITIONS START -->
<!-- MANUAL ADDITIONS END -->

@ -10,6 +10,26 @@ ## Important
- `plan.md`
- `tasks.md`
- `checklists/requirements.md`
- Runtime-changing work MUST carry testing/lane/runtime impact through the active `spec.md`, `plan.md`, and `tasks.md`; lane upkeep belongs to the feature, not to a later cleanup pass.
- Runtime-changing or test-affecting work MUST carry actual test-purpose classification (`Unit`, `Feature`, `Heavy-Governance`, `Browser`), affected lanes, fixture/default cost risks, heavy-family changes, escalation decisions, and minimal validation commands through the active `spec.md`, `plan.md`, and `tasks.md`.
- Review-oriented checklists MUST surface lane fit, hidden defaults, heavy-family visibility, and runtime-budget follow-up before merge; lane upkeep belongs to the feature, not to a later cleanup pass.
## Review Entry Point
Use the active feature's `spec.md`, `plan.md`, and `tasks.md` together with the generated checklist based on `.specify/templates/checklist-template.md`.
1. Confirm the spec names the affected validation lane(s) or a deliberate `N/A`, the test family impact, setup-cost impact, reviewer handoff, and any escalation outcome.
2. Confirm the plan turns that into changed test types, narrowest proving commands, helper/default widening checks, and the note target for budget or trend drift.
3. Apply the checklist and end with one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`.
## Low-Impact Rule
- Docs-only or template-only work may answer `N/A` or `none`.
- Do not force fake lane prose when no runtime or suite impact exists.
## Escalation Rule
- Use `document-in-feature` for contained cost or drift that belongs in the active feature.
- Use `follow-up-spec` only for recurring pain or structural lane or family changes.
- Use `reject-or-split` when hidden test cost or wrong-lane scope is still unresolved.
The files `.specify/spec.md`, `.specify/plan.md`, and `.specify/tasks.md` may exist as legacy references only.
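
The Escalation Rule in this README hunk can be read as a small decision function. The sketch below is illustrative only: the `ReviewFinding` fields and the precedence order (unresolved problems first, then structural ones, then contained ones) are assumptions, not something the README states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReviewFinding:
    # Hypothetical field names; the README describes these conditions in prose only.
    unresolved_hidden_cost_or_wrong_lane: bool = False
    recurring_or_structural_change: bool = False
    contained_cost_or_drift: bool = False


def escalation_outcome(finding: ReviewFinding) -> Optional[str]:
    """Map a review finding to an escalation outcome, or None if nothing escalates."""
    # Assumed precedence: an unresolved problem outranks every other signal.
    if finding.unresolved_hidden_cost_or_wrong_lane:
        return "reject-or-split"
    if finding.recurring_or_structural_change:
        return "follow-up-spec"
    if finding.contained_cost_or_drift:
        return "document-in-feature"
    return None
```

A docs-only change would typically produce a default `ReviewFinding()` and therefore no escalation, which matches the Low-Impact Rule's `N/A` or `none` answer.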

@ -1,29 +1,33 @@
<!--
Sync Impact Report
- Version change: 2.3.0 -> 2.4.0
- Version change: 2.4.0 -> 2.5.0
- Modified principles:
- Quality Gates: expanded to require narrowest-lane validation and
runtime-drift notes for runtime changes
- Governance review expectations: expanded to make lane/runtime
impact a mandatory part of spec and PR review
- Added sections:
- Test Suite Governance Must Live In The Delivery Workflow
(TEST-GOV-001)
(TEST-GOV-001): expanded into explicit test-impact disclosure,
lane discipline, minimal-fixture defaults, heavy-family visibility,
expensive-default bans, runtime-budget stewardship, review-stop
rules, and escalation triggers
- Governance review expectations: expanded to require test-purpose
classification, explicit runtime-cost review, and visible review
routine coverage in delivery artifacts
- Added sections: None
- Removed sections: None
- Templates requiring updates:
- ✅ .specify/memory/constitution.md
- ✅ .specify/templates/plan-template.md (test-governance planning and
lane-impact checks added)
- ✅ .specify/templates/spec-template.md (mandatory testing/lane/runtime
impact section added)
- ✅ .specify/templates/tasks-template.md (lane classification,
fixture-cost, and runtime-drift task guidance added)
- ✅ .specify/templates/checklist-template.md (runtime checklist note
added)
- ✅ .specify/README.md (SpecKit workflow note added for lane/runtime
ownership)
- ✅ README.md (developer routine updated for test-governance upkeep)
- ✅ .specify/templates/plan-template.md (lane-discipline and
escalation-planning checks expanded)
- ✅ .specify/templates/spec-template.md (test-purpose,
lane-discipline, heavy-family, and escalation prompts expanded)
- ✅ .specify/templates/tasks-template.md (task obligations expanded for
classification, cheap defaults, review-stop rules, and runtime
stewardship)
- ✅ .specify/templates/checklist-template.md (review checklist guidance
expanded for lane fit, heavy risk, and escalation)
- ✅ .specify/README.md (SpecKit workflow expectations expanded for
visible test-governance coverage)
- ✅ README.md (developer workflow guidance expanded for lane
discipline and runtime stewardship)
- Commands checked:
- N/A: the `.specify/templates/commands/*.md` directory is not present in this repo
- Follow-up TODOs: None
@ -104,10 +108,17 @@ ### Tests Must Protect Business Truth (TEST-TRUTH-001)
### Test Suite Governance Must Live In The Delivery Workflow (TEST-GOV-001)
- Test-suite governance is a standing workflow rule, not an occasional cleanup project.
- Every runtime-changing spec MUST declare the affected validation lane(s), any fixture/helper cost risk, whether it introduces or expands heavy-governance or browser coverage, and whether budget/baseline follow-up is needed.
- Plans MUST choose the narrowest lane mix that proves the change and MUST call out new heavy families, expensive defaults, or CI/runtime drift before implementation starts.
- Tasks and reviews MUST confirm lane classification, keep default fixtures cheap, reject accidental heavy promotion, and record material runtime drift or recalibration work in the active spec or PR.
- Standalone follow-up specs for test governance are reserved for recurring pain or structural lane changes; ordinary recalibration belongs inside normal delivery work.
- Every spec or implementation change that changes runtime behavior, tests, lane mix, or shared test infrastructure MUST state test impact explicitly: affected validation lane(s), actual test purpose classification (`Unit`, `Feature`, `Heavy-Governance`, `Browser`), any new or broader test family, fixture/helper/factory/seed/context cost change, and any budget, baseline, or trend follow-up.
- Docs-only, template-only, or otherwise no-runtime-impact work MAY answer the test-governance prompts with concise `N/A` or `none`, but MUST still make the absence of runtime or suite impact explicit.
- Test classification MUST follow the proving purpose of the change rather than directory names, filenames, or convenience. Fast or narrow lanes MUST NOT silently absorb discovery, surface, workflow, or browser cost that belongs in heavier governance lanes.
- Minimal fixtures and minimal infrastructure are the default. Database, Livewire, Filament, provider setup, workspace or membership context, session state, capability context, and similar expensive dependencies MUST be used only when the asserted behavior requires them.
- Heavy families MUST remain explicit in naming, lane assignment, and review rationale. New or expanded heavy-governance, discovery, surface, broad workflow, or browser families MUST NOT appear accidentally through helper drift, copied setup, or folder placement alone.
- Shared helpers, factories, seeds, and support layers MUST keep expensive context opt-in. Provider, workspace, membership, capability, session, or similar full-context defaults MUST NOT become implicit norms.
- Lane budgets, baselines, and trend reports are engineering constraints. Changes that materially worsen runtime, shift lane semantics, or create heavy cost centers MUST be validated, documented, and escalated when the impact exceeds ordinary feature-local upkeep.
- Reviews MUST stop test drift before merge. Reviewers MUST verify lane fit, test breadth, fixture cost, heavy-family risk, and runtime impact, and MUST treat unnecessary breadth, wrong classification, or hidden cost as merge blockers rather than later CI cleanup.
- Review checklists MUST end with one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`, so the decision about suite-cost risk is attributable instead of implied.
- New governance cost centers, including new heavy families, new browser coverage, material lane-cost shifts, revived expensive defaults, or budget/baseline-relevant regressions, MUST be documented explicitly. Contained feature-local cases MAY be `document-in-feature`; structural or recurring cases MUST escalate to `follow-up-spec`; unjustified scope or hidden cost MUST resolve as `reject-or-split`.
- These rules MUST stay visible in the spec, plan, task, and review routine. Test governance MUST NOT live only in CI output, wrapper scripts, or tribal knowledge.
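
To illustrate the "expensive context stays opt-in" rule from TEST-GOV-001, here is a minimal, language-neutral sketch written in Python rather than the repo's PHP/Pest. `FixtureContext` and `make_context` are hypothetical names and do not correspond to any helper in this repository. The point is only that the default call builds the cheapest context and each heavier piece requires an explicit flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixtureContext:
    # Hypothetical stand-in for whatever a shared test helper returns.
    user: str
    workspace: Optional[str] = None
    session: Optional[dict] = None


def make_context(user: str = "reviewer", *, with_workspace: bool = False,
                 with_session: bool = False) -> FixtureContext:
    """Build the cheapest context by default; heavier parts are keyword-only opt-ins."""
    ctx = FixtureContext(user=user)
    if with_workspace:
        # Only tests that assert workspace behavior pay for workspace setup.
        ctx.workspace = "workspace-1"
    if with_session:
        ctx.session = {"user": user}
    return ctx
```

Because the flags are keyword-only, a call site that widens the fixture reads as `make_context(with_workspace=True)`, which keeps the extra cost visible in review.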
### Enterprise Complexity Is Allowed Only Where Risk Demands It (RISK-COMP-001)
- Heavier architecture is explicitly legitimate for workspace or tenant isolation, RBAC and policy enforcement, auditability, immutable history and snapshot truth, queue/job execution legitimacy, provider credential safety, retention/compliance evidence, and operator-critical lifecycle correctness.
@ -1337,11 +1348,12 @@ ### Scope, Compliance, and Review Expectations
- This constitution applies across the repo. Feature specs may add stricter constraints but not weaker ones.
- Restore semantics changes require: spec update, checklist update (if applicable), and tests proving safety.
- Specs and PRs that introduce new persisted truth, abstractions, states, DTO/presenter layers, or taxonomies MUST include the proportionality review required by BLOAT-001.
- Runtime-changing specs and PRs MUST include testing/lane/runtime impact covering affected lanes, fixture/helper cost changes, any heavy-family expansion, expected budget/baseline effect, and the minimal validation commands.
- Runtime-changing or test-affecting specs and PRs MUST include testing/lane/runtime impact covering actual test-purpose classification, affected lanes, fixture/helper/factory/seed/context cost changes, any heavy-family expansion, expected budget/baseline/trend effect, escalation decisions, and the minimal validation commands.
- Specs, plans, task lists, and review checklists MUST surface the test-governance questions needed to catch lane drift, hidden defaults, and runtime-cost escalation before merge.
- Specs and PRs that change operator-facing surfaces MUST classify each
affected surface under DECIDE-001 and justify any new Primary
Decision Surface or workflow-first navigation change.
- Reviews MUST reject runtime changes when lane classification is missing, expensive defaults are introduced silently, or material CI/runtime drift is left undocumented.
- Reviews MUST reject runtime or test changes when lane classification is missing, fast-lane work quietly absorbs heavy cost, expensive defaults are introduced silently, or material CI/runtime drift is left undocumented.
- Review and approval MUST favor simplification, replacement, and absorption over additive semantic layering.
- Future-release preparation alone is not sufficient justification for new persistence or frameworkization unless security, tenant isolation, auditability, compliance evidence, or queue correctness already require it.
@ -1355,4 +1367,4 @@ ### Versioning Policy (SemVer)
- **MINOR**: new principle/section or materially expanded guidance.
- **MAJOR**: removing/redefining principles in a backward-incompatible way.
**Version**: 2.4.0 | **Ratified**: 2026-01-03 | **Last Amended**: 2026-04-17
**Version**: 2.5.0 | **Ratified**: 2026-01-03 | **Last Amended**: 2026-04-18
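
As a worked example of the versioning policy, the bump recorded in this hunk (2.4.0 to 2.5.0) is a MINOR change because TEST-GOV-001 was materially expanded. The helper below is a sketch; `bump` is a hypothetical name, and it covers only the MAJOR and MINOR rules visible in this hunk.

```python
def bump(version: str, change: str) -> str:
    """Return the next constitution version for a 'major' or 'minor' change."""
    major, minor, _patch = (int(part) for part in version.split("."))
    if change == "major":
        # Removing or redefining principles in a backward-incompatible way.
        return f"{major + 1}.0.0"
    if change == "minor":
        # New principle/section or materially expanded guidance.
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"unsupported change type: {change}")


print(bump("2.4.0", "minor"))  # 2.5.0
```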

@ -5,37 +5,39 @@ # [CHECKLIST TYPE] Checklist: [FEATURE NAME]
**Feature**: [Link to spec.md or relevant documentation]
**Note**: This checklist is generated by the `/speckit.checklist` command based on feature context and requirements.
If the checklist covers runtime behavior changes, include lane classification, fixture-cost review, heavy-family justification, minimal validation commands, and any budget/baseline follow-up checks.
If the checklist covers runtime behavior or test-surface changes, use it to reach one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`.
Low-impact docs-only or template-only work may mark runtime-only checks `N/A`, but should still leave one explicit outcome.
<!--
============================================================================
IMPORTANT: The checklist items below are SAMPLE ITEMS for illustration only.
## Lane Fit
The /speckit.checklist command MUST replace these with actual items based on:
- User's specific checklist request
- Feature requirements from spec.md
- Technical context from plan.md
- Implementation details from tasks.md
- [ ] CHK001 The chosen validation lane is the narrowest lane or lane mix that proves the change.
- [ ] CHK002 The test stays in the smallest honest family (`Unit`, `Feature`, `Heavy-Governance`, `Browser`) and does not hide broader purpose behind a narrow label.
DO NOT keep these sample items in the generated checklist file.
============================================================================
-->
## Breadth And Cost
## [Category 1]
- [ ] CHK003 The changed or added test is no broader than the behavior it proves.
- [ ] CHK004 Any database, Livewire, Filament, or browser surface is justified over a narrower alternative.
- [ ] CHK005 Shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default; any widening is explicit and locally justified.
- [ ] CHK001 First checklist item with clear action
- [ ] CHK002 Second checklist item
- [ ] CHK003 Third checklist item
## Validation And Drift
## [Category 2]
- [ ] CHK006 The minimal reviewer validation command is written explicitly and matches the declared lane.
- [ ] CHK007 Any material budget, baseline, trend, or runtime-drift note is recorded in the active spec or PR.
- [ ] CHK004 Another category item
- [ ] CHK005 Item with specific criteria
- [ ] CHK006 Final item in this category
## Escalation Outcome
- [ ] CHK008 One explicit outcome is chosen: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`.
- [ ] CHK009 New heavy families, new browser coverage, revived expensive defaults, or material lane-cost shifts are not left implicit.
## Notes
- `keep`: current lane, family, and setup are justified.
- `split`: scope is valid, but the test or helper spread should be narrowed before merge.
- `document-in-feature`: the change is acceptable, but the cost or drift must be recorded in the active spec or PR.
- `follow-up-spec`: recurring pain or structural lane or family changes need dedicated governance work.
- `reject-or-split`: hidden cost, wrong lane, or unjustified breadth blocks merge as proposed.
- Check items off as completed: `[x]`
- Add comments or findings inline
- Link to relevant resources or documentation
- Items are numbered sequentially for easy reference
- Reviewer-facing runtime checklists SHOULD stop merge when lane fit, hidden cost, heavy-family drift, or escalation handling is unclear.
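
A reviewer could check a generated checklist mechanically before choosing an outcome. The sketch below assumes the `- [ ] CHKnnn text` item shape shown in this template; the `open_items` helper is hypothetical and not part of SpecKit.

```python
import re
from typing import List

# Matches items such as "- [ ] CHK001 Some text" or "- [x] CHK002 Some text".
ITEM = re.compile(r"^- \[(?P<mark>[ xX])\] (?P<id>CHK\d{3}) (?P<text>.+)$")


def open_items(markdown: str) -> List[str]:
    """Return the IDs of checklist items that are still unchecked."""
    ids = []
    for line in markdown.splitlines():
        match = ITEM.match(line.strip())
        if match and match.group("mark") == " ":
            ids.append(match.group("id"))
    return ids
```

An empty result means every `CHK` item has been ticked; it does not by itself prove that the chosen outcome is justified.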

@ -49,7 +49,7 @@ ## Constitution Check
- Ops-UX system runs: initiator-null runs emit no terminal DB notification; audit remains via Monitoring; tenant-wide alerting goes through Alerts (not OperationRun notifications)
- Automation: queued/scheduled ops use locks + idempotency; handle 429/503 with backoff+jitter
- Data minimization: Inventory stores metadata + whitelisted meta; logs contain no secrets/tokens
- Test governance (TEST-GOV-001): affected lanes, fixture/helper cost risks, heavy-family changes, and any budget/baseline follow-up are explicit; the narrowest proving lane is planned
- Test governance (TEST-GOV-001): actual test-purpose classification, affected lanes, fixture/helper/factory/seed/context cost risks, heavy-family visibility, review-stop points, reviewer handoff, and any budget/baseline/trend follow-up are explicit; the narrowest proving lane mix is planned and any structural cost change has an escalation path
- Proportionality (PROP-001): any new structure, layer, persisted truth, or semantic machinery is justified by current release truth, current operator workflow, and why a narrower solution is insufficient
- No premature abstraction (ABSTR-001): no new factories, registries, resolvers, strategy systems, interfaces, type registries, or orchestration pipelines before at least 2 real concrete cases exist, unless security, tenant isolation, auditability, compliance evidence, or queue correctness require it now
- Persisted truth (PERSIST-001): new tables/entities/artifacts represent independent product truth or lifecycle; convenience projections and UI helpers stay derived
@ -92,13 +92,19 @@ ## Constitution Check
## Test Governance Check
> **Fill for any runtime-changing feature. Docs-only or template-only work may state `N/A`.**
> **Fill for any runtime-changing or test-affecting feature. Docs-only or template-only work may state concise `N/A` or `none`.**
- **Test purpose / classification by changed surface**: [Unit / Feature / Heavy-Governance / Browser / N/A]
- **Affected validation lanes**: [fast-feedback / confidence / heavy-governance / browser / profiling / junit / N/A]
- **Why this lane mix is the narrowest sufficient proof**: [Why the chosen classification and lanes fit the actual proving purpose]
- **Narrowest proving command(s)**: [Exact commands reviewers should run before merge]
- **Fixture / helper cost risks**: [none / describe]
- **Heavy-family additions or promotions**: [none / describe]
- **Fixture / helper / factory / seed / context cost risks**: [none / describe]
- **Expensive defaults or shared helper growth introduced?**: [no / describe explicit opt-in path]
- **Heavy-family additions, promotions, or visibility changes**: [none / describe]
- **Closing validation and reviewer handoff**: [What must be re-run, what reviewers should verify, and what exact proof command they should rely on]
- **Budget / baseline / trend follow-up**: [none / describe]
- **Review-stop questions**: [lane fit / breadth / hidden cost / heavy-family risk / escalation]
- **Escalation path**: [none / document-in-feature / follow-up-spec / reject-or-split]
- **Why no dedicated follow-up spec is needed**: [Routine upkeep stays inside this feature unless recurring pain or structural lane changes justify a separate spec]
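
The bracketed options in the Test Governance Check act as a small vocabulary. A filled-in block could be sanity-checked along the following lines; `validate_block` is a hypothetical helper, and the rule that `N/A` cannot be mixed with real lanes is an assumption rather than something the template states.

```python
from typing import List

# Allowed values copied from the template's bracketed options.
FAMILIES = {"Unit", "Feature", "Heavy-Governance", "Browser", "N/A"}
LANES = {"fast-feedback", "confidence", "heavy-governance", "browser",
         "profiling", "junit", "N/A"}


def validate_block(classification: str, lanes: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the block looks well-formed."""
    problems = []
    if classification not in FAMILIES:
        problems.append(f"unknown classification: {classification}")
    if not lanes:
        problems.append("no lane named; use a deliberate N/A instead of leaving it empty")
    unknown = sorted(set(lanes) - LANES)
    if unknown:
        problems.append("unknown lanes: " + ", ".join(unknown))
    if "N/A" in lanes and len(lanes) > 1:
        # Assumption: N/A is a deliberate "no lane" answer, not one lane among several.
        problems.append("N/A cannot be combined with real lanes")
    return problems
```

This only checks vocabulary. Whether the lane mix is the narrowest sufficient proof still needs a human reviewer.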
## Project Structure

@ -90,14 +90,17 @@ ## Proportionality Review *(mandatory when structural complexity is introduced)*
## Testing / Lane / Runtime Impact *(mandatory for runtime behavior changes)*
For docs-only changes, state `N/A` for each field.
For docs-only or template-only changes, state concise `N/A` or `none`. For runtime- or test-affecting work, classification MUST follow the proving purpose of the change rather than the file path or folder name.
- **Test purpose / classification**: [Unit / Feature / Heavy-Governance / Browser / N/A]
- **Validation lane(s)**: [fast-feedback / confidence / heavy-governance / browser / profiling / junit / N/A]
- **Why these lanes are sufficient**: [Why the narrowest listed lane(s) prove the change]
- **Why this classification and these lanes are sufficient**: [Why the narrowest listed lane(s) and chosen test type prove the change]
- **New or expanded test families**: [none / describe]
- **Fixture / helper cost impact**: [none / describe new defaults, factories, seeds, helpers, browser setup, etc.]
- **Heavy coverage justification**: [none / explain any heavy-governance or browser addition]
- **Fixture / helper cost impact**: [none / describe new defaults, factories, seeds, helpers, browser setup, provider setup, workspace or membership context, session state, etc.]
- **Heavy-family visibility / justification**: [none / explain any heavy-governance or browser addition and how it remains explicit in naming, lane choice, and review]
- **Reviewer handoff**: [What reviewers must confirm about lane fit, hidden cost, heavy-family visibility, and the exact proof command]
- **Budget / baseline / trend impact**: [none / expected drift + follow-up]
- **Escalation needed**: [none / document-in-feature / follow-up-spec / reject-or-split]
- **Planned validation commands**: [Exact minimal commands reviewers should run]
## User Scenarios & Testing *(mandatory)*
@ -188,10 +191,14 @@ ## Requirements *(mandatory)*
or taxonomy/classification system, the Proportionality Review section above is mandatory.
**Constitution alignment (TEST-GOV-001):** If this feature changes runtime behavior or tests, the spec MUST describe:
- the actual test-purpose classification (`Unit`, `Feature`, `Heavy-Governance`, or `Browser`) and why that classification matches the real proving purpose,
- the affected validation lane(s) and why they are the narrowest sufficient proof,
- any new or expanded heavy-governance or browser coverage,
- any fixture, helper, factory, seed, or default setup cost added or avoided,
- any fixture, helper, factory, seed, provider, workspace, membership, session, or default setup cost added or avoided,
- how any heavy family stays explicit rather than becoming accidental default breadth,
- the reviewer handoff for lane fit, hidden-cost checks, and the exact minimal validation commands,
- any expected budget, baseline, or trend impact,
- whether escalation stays inside this feature or resolves as `document-in-feature`, `follow-up-spec`, or `reject-or-split`,
- and the exact minimal validation commands reviewers should run.
**Constitution alignment (OPS-UX):** If this feature creates/reuses an `OperationRun`, the spec MUST:

@ -10,11 +10,13 @@ # Tasks: [FEATURE NAME]
**Tests**: For runtime behavior changes in this repo, tests are REQUIRED (Pest). Only docs-only changes may omit tests.
Runtime-changing features MUST also include tasks to:
- classify or confirm the affected validation lane(s),
- keep new helpers, factories, and seeds cheap by default or isolate expensive setup behind explicit opt-ins,
- justify any new heavy-governance or browser coverage,
- classify the actual test purpose (`Unit`, `Feature`, `Heavy-Governance`, `Browser`) and confirm the affected validation lane(s),
- keep fast or narrow lanes free of silent discovery, surface, workflow, or browser cost,
- keep new helpers, factories, seeds, providers, session state, and support defaults cheap by default or isolate expensive setup behind explicit opt-ins,
- make any new heavy-governance or browser family explicit in naming, lane assignment, and review notes,
- run the narrowest relevant lane before merge,
- and record budget, baseline, or trend follow-up when runtime cost shifts materially.
- record budget, baseline, or trend follow-up when runtime cost shifts materially,
- and document whether the change resolves as `document-in-feature`, `follow-up-spec`, or `reject-or-split`.
**Operations**: If this feature introduces long-running/remote/queued/scheduled work, include tasks to create/reuse and update a
canonical `OperationRun`, and ensure “View run” links route to the canonical Monitoring hub.
If security-relevant DB-only actions skip `OperationRun`, include tasks for `AuditLog` entries (before/after + actor + tenant).
@ -129,7 +131,17 @@ # Tasks: [FEATURE NAME]
- and adding tests around business consequences, permissions, lifecycle behavior, isolation, or audit responsibilities rather than thin indirection alone.
**Organization**: Tasks are grouped by user story to enable independent implementation and testing of each story.
Runtime behavior changes SHOULD include at least one explicit task for lane validation or runtime-impact review so upkeep stays inside the feature instead of becoming separate cleanup.
Runtime behavior or test-surface changes MUST include at least one explicit task for lane validation or runtime-impact review so upkeep stays inside the feature instead of becoming separate cleanup.
## Test Governance Checklist
Include this short checklist in generated task lists for runtime-changing or test-affecting work. Docs-only or template-only work may mark the items `N/A`.
- [ ] Lane assignment is named and is the narrowest sufficient proof for the changed behavior.
- [ ] New or changed tests stay in the smallest honest family, and any heavy-governance or browser addition is explicit.
- [ ] Shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default; any widening is isolated or documented.
- [ ] Planned validation commands cover the change without pulling in unrelated lane cost.
- [ ] Any material budget, baseline, trend, or escalation note is recorded in the active spec or PR.
## Format: `[ID] [P?] [Story] Description`

@ -81,9 +81,23 @@ ### Trend Summary Reading
### Workflow Expectation
- Every runtime-changing spec, plan, and task set MUST record the target validation lane(s), fixture-cost risks, any heavy-governance or browser expansion, and any budget/baseline follow-up.
- Every runtime-changing or test-affecting spec, plan, and task set MUST record actual test-purpose classification, target validation lane(s), fixture-cost risks, any heavy-governance or browser expansion, any heavy-family visibility change, and any budget/baseline/trend follow-up.
- Test classification follows the real proving purpose of the change, not the filename or folder.
- Minimal fixtures and minimal infrastructure are the default; database, Livewire, Filament, provider, workspace, membership, or session-heavy setup must stay explicit and opt-in.
- Review treats wrong lane fit, hidden default cost, accidental heavy-family growth, or undocumented runtime drift as merge issues, not later cleanup.
- Routine lane recalibration belongs inside the affecting feature spec or PR; open a dedicated follow-up spec only when recurring pain or structural lane changes justify it.
### Authoring And Review Guardrails
- Start with the smallest honest surface: `Unit` for isolated logic, `Feature` for HTTP, Livewire, Filament, jobs, or non-browser integration, `Heavy-Governance` for intentionally expensive governance scans, and `Browser` only for end-to-end workflow coverage.
- Specs and plans must state the affected lanes or a deliberate `N/A`, the family impact, the setup-cost impact, and the narrowest reviewer command.
- If database, Livewire, Filament, provider setup, workspace or membership context, session state, capability context, or browser coverage is required, say why a narrower proof is insufficient.
- Keep shared helpers, factories, seeds, fixtures, and defaults cheap by default. Full-context setup should stay behind explicit opt-ins instead of becoming the default path.
- Extend an existing heavy or browser family only when the behavior truly matches it. New heavy families, new browser scope, revived expensive defaults, or material lane-cost shifts require explicit escalation.
- Low-impact docs-only or template-only work may answer the governance prompts with `N/A` or `none`; do not invent runtime impact where none exists.
- Review should end with one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`.
- Use `document-in-feature` for contained drift or cost that belongs in the active feature. Use `follow-up-spec` for recurring pain or structural lane-model changes. Use `reject-or-split` when hidden cost is still unresolved.
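
The "smallest honest surface" guidance above can be sketched as a classification helper. The function and its parameter names are hypothetical; only the four family names and their descriptions come from the guardrails.

```python
def smallest_honest_family(*, needs_end_to_end_browser: bool = False,
                           is_expensive_governance_scan: bool = False,
                           needs_framework_integration: bool = False) -> str:
    """Pick the test family from what the test must actually prove."""
    if needs_end_to_end_browser:
        return "Browser"            # end-to-end workflow coverage only
    if is_expensive_governance_scan:
        return "Heavy-Governance"   # intentionally expensive governance scans
    if needs_framework_integration:
        return "Feature"            # HTTP, Livewire, Filament, jobs, non-browser integration
    return "Unit"                   # isolated logic


print(smallest_honest_family(needs_framework_integration=True))  # Feature
```

The inputs describe the proving purpose, not the file path, which mirrors the rule that classification must not follow directory names or filenames.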
### CI Trigger Matrix
- Pull requests (`opened`, `reopened`, `synchronize`) run only `./scripts/platform-test-lane fast-feedback` through `.gitea/workflows/test-pr-fast-feedback.yml` and block on test, wrapper, artifact, and mature Fast Feedback budget failures.

@ -0,0 +1,36 @@
# Specification Quality Checklist: Test Suite Authoring Constitution & Review Guardrails
**Purpose**: Validate specification completeness and quality before proceeding to planning
**Created**: 2026-04-18
**Feature**: [spec.md](../spec.md)
## Content Quality
- [x] No implementation details (languages, frameworks, APIs)
- [x] Focused on user value and business needs
- [x] Written for non-technical stakeholders
- [x] All mandatory sections completed
## Requirement Completeness
- [x] No [NEEDS CLARIFICATION] markers remain
- [x] Requirements are testable and unambiguous
- [x] Success criteria are measurable
- [x] Success criteria are technology-agnostic (no implementation details)
- [x] All acceptance scenarios are defined
- [x] Edge cases are identified
- [x] Scope is clearly bounded
- [x] Dependencies and assumptions identified
## Feature Readiness
- [x] All functional requirements have clear acceptance criteria
- [x] User scenarios cover primary flows
- [x] Feature meets measurable outcomes defined in Success Criteria
- [x] No implementation details leak into specification
## Notes
- Validation completed in one iteration.
- No unresolved clarification markers or template placeholders remain.
- The spec stays repository-process focused and aligns with the existing test-governance chain from Specs 206 through 211.

@ -0,0 +1,302 @@
openapi: 3.1.0
info:
  title: Test Authoring Constitution & Review Guardrails
  version: 1.0.0
  description: |
    Logical contract for the repository-owned authoring and review workflow
    introduced by Spec 212. This documents constitution, template, checklist,
    and escalation semantics. It is not a public HTTP API.
servers:
  - url: https://tenantatlas.local/logical
paths:
  /logical/test-governance/spec-impact/validate:
    post:
      summary: Validate one spec-level testing and lane impact block
      operationId: validateSpecImpactBlock
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/SpecImpactValidationRequest'
      responses:
        '200':
          description: Spec impact block evaluated
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/SpecImpactValidationResult'
  /logical/test-governance/plan-impact/validate:
    post:
      summary: Validate one planning-time test-governance block
      operationId: validatePlanImpactBlock
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/PlanImpactValidationRequest'
      responses:
        '200':
          description: Plan impact block evaluated
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/PlanImpactValidationResult'
  /logical/test-governance/tasks/checklist/evaluate:
    post:
      summary: Evaluate whether a task checklist keeps test-governance obligations visible
      operationId: evaluateTaskGovernanceChecklist
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TaskChecklistEvaluationRequest'
      responses:
        '200':
          description: Task-level governance checklist evaluation returned
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TaskChecklistEvaluationResult'
  /logical/test-governance/reviews/escalation-assessment:
    post:
      summary: Assess whether a test change requires governance escalation
      operationId: assessReviewEscalation
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/EscalationAssessmentRequest'
      responses:
        '200':
          description: Escalation assessment returned
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/EscalationAssessmentResult'
  /logical/test-governance/guidance:
    get:
      summary: Read the contributor guidance pack for test authoring decisions
      operationId: readContributorGuidance
      responses:
        '200':
          description: Contributor guidance returned
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ContributorGuidancePack'
components:
  schemas:
    SpecImpactValidationRequest:
      type: object
      additionalProperties: false
      required:
        - specPath
        - validationLanes
        - testFamilyImpact
        - heavySurfaceImpact
        - fixtureCostImpact
        - reviewValidationCommand
      properties:
        specPath:
          type: string
        validationLanes:
          oneOf:
            - type: string
              enum:
                - N/A
            - type: array
              items:
                type: string
                enum:
                  - fast-feedback
                  - confidence
                  - heavy-governance
                  - browser
                  - profiling
                  - junit
        testFamilyImpact:
          type: string
        heavySurfaceImpact:
          type: string
        fixtureCostImpact:
          type: string
        budgetTrendImpact:
          type: string
        reviewValidationCommand:
          type: string
        escalationNeeded:
          type: boolean
    SpecImpactValidationResult:
      type: object
      additionalProperties: false
      required:
        - status
        - findings
      properties:
        status:
          type: string
          enum:
            - complete
            - needs-revision
        findings:
          type: array
          items:
            type: string
        reviewerHandOff:
          type: string
    PlanImpactValidationRequest:
      type: object
      additionalProperties: false
      required:
        - planPath
        - changedTestTypes
        - helperFixtureImpact
        - laneReshapeImpact
        - closingValidation
      properties:
        planPath:
          type: string
        changedTestTypes:
          type: array
          items:
            type: string
        helperFixtureImpact:
          type: string
        laneReshapeImpact:
          type: string
        closingValidation:
          type: string
        driftDocumentationTarget:
          type: string
    PlanImpactValidationResult:
      type: object
      additionalProperties: false
      required:
        - status
        - findings
      properties:
        status:
          type: string
          enum:
            - complete
            - needs-revision
        findings:
          type: array
          items:
            type: string
        taskChecklistRequirements:
          type: array
          items:
            type: string
    TaskChecklistEvaluationRequest:
      type: object
      additionalProperties: false
      required:
        - checklistId
        - items
        - runtimeChange
      properties:
        checklistId:
          type: string
        items:
          type: array
          items:
            type: string
        runtimeChange:
          type: boolean
        evidenceTarget:
          type: string
    TaskChecklistEvaluationResult:
      type: object
      additionalProperties: false
      required:
        - status
        - missingCoverage
      properties:
        status:
          type: string
          enum:
            - complete
            - incomplete
        missingCoverage:
          type: array
          items:
            type: string
        notes:
          type: array
          items:
            type: string
    EscalationAssessmentRequest:
      type: object
      additionalProperties: false
      required:
        - changeRef
        - triggers
      properties:
        changeRef:
          type: string
        triggers:
          type: array
          items:
            type: string
            enum:
              - new-heavy-family
              - new-browser-coverage
              - material-lane-cost-shift
              - broad-filament-livewire-governance-surface
              - revived-expensive-default
              - budget-or-baseline-relevant-change
              - major-suite-reshaping
        contextNote:
          type: string
    EscalationAssessmentResult:
      type: object
      additionalProperties: false
      required:
        - outcome
        - reason
      properties:
        outcome:
          type: string
          enum:
            - none
            - document-in-feature
            - follow-up-spec
            - reject-or-split
        reason:
          type: string
        recordLocation:
          type:
            - string
            - 'null'
    ContributorGuidancePack:
      type: object
      additionalProperties: false
      required:
        - guidanceId
        - decisionPoints
        - entryPoints
        - sharedVocabulary
      properties:
        guidanceId:
          type: string
        decisionPoints:
          type: array
          items:
            type: string
        examplePatterns:
          type: array
          items:
            type: string
        entryPoints:
          type: array
          items:
            type: string
        sharedVocabulary:
          type: array
          items:
            type: string

@ -0,0 +1,304 @@
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://tenantatlas.local/specs/212/test-authoring-governance.schema.json",
"title": "Test Authoring Governance Pack",
"description": "Repository-owned contract for the constitution, authoring prompts, review guardrails, escalation policy, and validation scenarios introduced by Spec 212.",
"type": "object",
"additionalProperties": false,
"required": [
"schemaVersion",
"constitutionSection",
"specImpactPromptBlock",
"planImpactPromptBlock",
"taskGovernanceChecklist",
"reviewGuardrailChecklist",
"escalationPolicy",
"contributorGuidance",
"validationScenarios"
],
"properties": {
"schemaVersion": {
"type": "string"
},
"constitutionSection": {
"$ref": "#/$defs/constitutionSection"
},
"specImpactPromptBlock": {
"$ref": "#/$defs/specImpactPromptBlock"
},
"planImpactPromptBlock": {
"$ref": "#/$defs/planImpactPromptBlock"
},
"taskGovernanceChecklist": {
"$ref": "#/$defs/checklist"
},
"reviewGuardrailChecklist": {
"$ref": "#/$defs/reviewChecklist"
},
"escalationPolicy": {
"$ref": "#/$defs/escalationPolicy"
},
"contributorGuidance": {
"$ref": "#/$defs/contributorGuidance"
},
"validationScenarios": {
"type": "array",
"minItems": 2,
"items": {
"$ref": "#/$defs/validationScenario"
}
}
},
"$defs": {
"constitutionSection": {
"type": "object",
"additionalProperties": false,
"required": [
"sectionId",
"version",
"classificationRule",
"laneAwarenessRule",
"heavyJustificationRule",
"minimalFixtureRule",
"expensiveDefaultBanRule",
"reviewExpectationRule",
"escalationRule",
"linkedWorkflowSurfaces"
],
"properties": {
"sectionId": { "type": "string" },
"version": { "type": "string" },
"classificationRule": { "type": "string" },
"laneAwarenessRule": { "type": "string" },
"heavyJustificationRule": { "type": "string" },
"minimalFixtureRule": { "type": "string" },
"expensiveDefaultBanRule": { "type": "string" },
"reviewExpectationRule": { "type": "string" },
"escalationRule": { "type": "string" },
"linkedWorkflowSurfaces": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
}
}
},
"specImpactPromptBlock": {
"type": "object",
"additionalProperties": false,
"required": [
"blockId",
"requiredFields",
"narrowestProofRule",
"naAllowanceRule",
"escalationPrompt",
"reviewerHandOff"
],
"properties": {
"blockId": { "type": "string" },
"requiredFields": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
},
"narrowestProofRule": { "type": "string" },
"naAllowanceRule": { "type": "string" },
"escalationPrompt": { "type": "string" },
"reviewerHandOff": { "type": "string" }
}
},
"planImpactPromptBlock": {
"type": "object",
"additionalProperties": false,
"required": [
"blockId",
"changedTestTypes",
"helperOrFixtureImpact",
"laneReshapeQuestion",
"closingValidationRule",
"driftDocumentationRule"
],
"properties": {
"blockId": { "type": "string" },
"changedTestTypes": {
"type": "array",
"items": { "type": "string" }
},
"helperOrFixtureImpact": { "type": "string" },
"laneReshapeQuestion": { "type": "string" },
"closingValidationRule": { "type": "string" },
"driftDocumentationRule": { "type": "string" }
}
},
"checklist": {
"type": "object",
"additionalProperties": false,
"required": [
"checklistId",
"items",
"appliesWhen",
"evidenceTarget"
],
"properties": {
"checklistId": { "type": "string" },
"items": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
},
"appliesWhen": { "type": "string" },
"evidenceTarget": { "type": "string" }
}
},
"reviewChecklist": {
"type": "object",
"additionalProperties": false,
"required": [
"checklistId",
"questions",
"expectedOutcomeSet",
"maxReviewMinutes",
"escalationReference"
],
"properties": {
"checklistId": { "type": "string" },
"questions": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
},
"expectedOutcomeSet": {
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": [
"keep",
"split",
"document-in-feature",
"follow-up-spec",
"reject-or-split"
]
}
},
"maxReviewMinutes": {
"type": "integer",
"minimum": 1
},
"escalationReference": { "type": "string" }
}
},
"escalationPolicy": {
"type": "object",
"additionalProperties": false,
"required": [
"policyId",
"triggers",
"outcomes",
"followUpThresholdRule"
],
"properties": {
"policyId": { "type": "string" },
"triggers": {
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": [
"new-heavy-family",
"new-browser-coverage",
"material-lane-cost-shift",
"broad-filament-livewire-governance-surface",
"revived-expensive-default",
"budget-or-baseline-relevant-change",
"major-suite-reshaping"
]
}
},
"outcomes": {
"type": "array",
"minItems": 1,
"items": {
"type": "string",
"enum": [
"none",
"document-in-feature",
"follow-up-spec",
"reject-or-split"
]
}
},
"followUpThresholdRule": { "type": "string" }
}
},
"contributorGuidance": {
"type": "object",
"additionalProperties": false,
"required": [
"guidanceId",
"decisionPoints",
"examplePatterns",
"entryPoints",
"sharedVocabulary"
],
"properties": {
"guidanceId": { "type": "string" },
"decisionPoints": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
},
"examplePatterns": {
"type": "array",
"items": { "type": "string" }
},
"entryPoints": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
},
"sharedVocabulary": {
"type": "array",
"minItems": 1,
"items": { "type": "string" }
}
}
},
"validationScenario": {
"type": "object",
"additionalProperties": false,
"required": [
"scenarioId",
"scenarioType",
"representativeArtifact",
"expectedPromptPattern",
"expectedEscalationOutcome",
"status"
],
"properties": {
"scenarioId": { "type": "string" },
"scenarioType": {
"type": "string",
"enum": ["low-impact", "high-impact"]
},
"representativeArtifact": { "type": "string" },
"expectedPromptPattern": { "type": "string" },
"expectedEscalationOutcome": {
"type": "string",
"enum": [
"none",
"document-in-feature",
"follow-up-spec",
"reject-or-split"
]
},
"status": {
"type": "string",
"enum": ["planned", "validated", "needs-tuning"]
},
"notes": {
"type": "string"
}
}
}
}
}

@ -0,0 +1,178 @@
# Data Model: Test Suite Authoring Constitution & Review Guardrails
This feature adds repository-owned governance artifacts only. It does not add product database tables or runtime-owned entities. All objects below are implemented as constitution text, markdown prompt blocks, checklists, logical contracts, or validation notes.
## 1. TestAuthoringConstitutionSection
**Purpose**: Defines the standing rules contributors and reviewers must follow when new tests are introduced or existing tests expand in cost.
| Field | Type | Description |
|-------|------|-------------|
| `sectionId` | string | Stable identifier for the constitution section. |
| `version` | string | Version of the rule set. |
| `scope` | string | Repository workflow scope, always `workspace`. |
| `classificationRule` | string | Requires explicit classification of new or changed tests. |
| `laneAwarenessRule` | string | Requires authors to name affected lane or lanes. |
| `heavyJustificationRule` | string | Requires justification for database, Livewire, Filament, or browser use. |
| `minimalFixtureRule` | string | States that minimal fixtures and cheap defaults are the norm. |
| `expensiveDefaultBanRule` | string | Forbids hidden shared helper, factory, or seed cost growth without disclosure or escalation. |
| `reviewExpectationRule` | string | Requires the reviewer guardrail questions to be applied when tests change. |
| `escalationRule` | string | Defines when a change must be documented locally or raised as follow-up governance work. |
| `linkedWorkflowSurfaces` | array | Template, checklist, and contributor-doc surfaces that must remain aligned with the section. |
**Relationships**
- One `TestAuthoringConstitutionSection` governs one `SpecImpactPromptBlock`, one `PlanImpactPromptBlock`, one `TaskGovernanceChecklist`, one `ReviewGuardrailChecklist`, and one `ContributorGuidancePack`.
**Validation Rules**
- The rule set must stay short enough to be quoted or understood during routine authoring and review.
- The section must reuse existing lane vocabulary from Specs 206 through 211.
- The section must not invent new validation lanes or new runtime governance subsystems.
## 2. SpecImpactPromptBlock
**Purpose**: Defines the authoring-time questions every spec must answer about test, lane, and runtime cost impact.
| Field | Type | Description |
|-------|------|-------------|
| `blockId` | string | Stable identifier for the spec prompt block. |
| `requiredFields` | array | Required answers such as affected lanes, test-family impact, heavy-surface relevance, fixture-cost impact, budget or trend implications, and reviewer validation commands or `N/A`. |
| `narrowestProofRule` | string | Requires authors to name the narrowest sufficient validation path when runtime changes exist. |
| `naAllowanceRule` | string | Allows concise `N/A` or `none` answers for docs-only or low-impact work. |
| `escalationPrompt` | string | Direct question asking whether the change creates a new heavy family, new browser scope, or material lane-cost shift. |
| `reviewerHandOff` | string | States what reviewers should verify from the completed block. |
**Validation Rules**
- The block must be short enough for ordinary specs to complete quickly.
- The block must distinguish between “no impact” and “impact exists but is acceptable.”
- The block must not duplicate entire review checklist content; it only prepares the review handoff.
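
The lane vocabulary and the `N/A` allowance for this block can be illustrated with a small check. This is only a minimal Python sketch and not part of the repository: the function name `check_validation_lanes` is hypothetical, and the lane names are copied from the `validationLanes` enum in the logical OpenAPI contract.

```python
# Hypothetical helper for illustration only.
# Lane names mirror the validationLanes enum in the logical OpenAPI contract.
KNOWN_LANES = {
    "fast-feedback",
    "confidence",
    "heavy-governance",
    "browser",
    "profiling",
    "junit",
}


def check_validation_lanes(value):
    """Return a list of findings; an empty list means the answer is acceptable."""
    # Docs-only or low-impact work may answer with the literal string "N/A".
    if value == "N/A":
        return []
    if not isinstance(value, list):
        return ["validationLanes must be 'N/A' or a list of lane names"]
    # Anything outside the existing lane vocabulary is reported, not silently accepted.
    return [
        f"unknown lane: {lane}"
        for lane in value
        if not isinstance(lane, str) or lane not in KNOWN_LANES
    ]
```

A docs-only spec would pass `"N/A"`, while a runtime-changing spec would pass a list such as `["fast-feedback", "browser"]`.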
## 3. PlanImpactPromptBlock
**Purpose**: Defines the planning-time questions that convert the spec's declared impact into implementation-time guardrails.
| Field | Type | Description |
|-------|------|-------------|
| `blockId` | string | Stable identifier for the plan prompt block. |
| `changedTestTypes` | array | Test types being added or changed. |
| `helperOrFixtureImpact` | string | Whether helpers, factories, seeds, or defaults widen. |
| `laneReshapeQuestion` | string | Whether lane movement, heavy-family addition, or browser promotion is implicated. |
| `closingValidationRule` | string | Defines the minimum validation evidence to finish the feature. |
| `driftDocumentationRule` | string | States where material runtime drift or recalibration follow-up must be recorded. |
**Relationships**
- One `PlanImpactPromptBlock` operationalizes one `SpecImpactPromptBlock`.
- One `PlanImpactPromptBlock` informs one `TaskGovernanceChecklist`.
**Validation Rules**
- The block must make authoring decisions actionable in tasks, not merely restate the spec.
- The block must expose helper or fixture widening even when the local feature is otherwise small.
## 4. TaskGovernanceChecklist
**Purpose**: Provides a short implementation-time checklist that keeps lane fit, setup cost, and validation visible while tasks are broken down.
| Field | Type | Description |
|-------|------|-------------|
| `checklistId` | string | Stable identifier for the task checklist. |
| `items` | array | Required checks such as lane assignment confirmed, no unnecessary heavy cost, minimal fixtures used, relevant validation planned, and budget or trend notes recorded when needed. |
| `appliesWhen` | string | Scope rule for runtime-changing work versus docs-only work. |
| `evidenceTarget` | string | Where the resulting note or evidence must be recorded. |
**Validation Rules**
- The checklist must remain short enough to fit inside ordinary task planning.
- The checklist must not require runtime-lane execution for docs-only work.
## 5. ReviewGuardrailChecklist
**Purpose**: Gives reviewers a fast, repeatable decision aid for new or changed tests.
| Field | Type | Description |
|-------|------|-------------|
| `checklistId` | string | Stable identifier for the review checklist. |
| `questions` | array | Direct questions about lane fit, breadth, DB or UI-heavy necessity, setup cost, split need, escalation need, and budget or trend notes. |
| `expectedOutcomeSet` | array | Allowed reviewer outcomes: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`. |
| `maxReviewMinutes` | integer | Target application time for one representative change. |
| `escalationReference` | string | Link or pointer to the escalation policy used when a trigger is present. |
**Validation Rules**
- Questions must be phrased as decisions, not vague advice.
- The checklist must stay usable in under 3 minutes for a representative diff.
- The checklist must support both low-impact and high-impact changes.
## 6. EscalationAssessment
**Purpose**: Captures whether a change is ordinary test maintenance or a governance-significant event requiring extra documentation or follow-up.
| Field | Type | Description |
|-------|------|-------------|
| `assessmentId` | string | Stable identifier for one escalation assessment. |
| `triggerSet` | array | Detected triggers such as new heavy family, new browser scope, revived expensive defaults, material lane-cost shift, or broad suite reshaping. |
| `outcome` | enum | `none`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`. |
| `reason` | string | Human-readable explanation of why the outcome was chosen. |
| `recordLocation` | string | Active spec path or implementation PR location where the outcome is recorded. |
| `examples` | array | Example changes that should resolve to this outcome. |
**Validation Rules**
- Every trigger must map to a documented action.
- `follow-up-spec` is reserved for recurring pain or structural change, not ordinary recalibration.
- `none` is valid only when the change stays inside an existing lane and family without hidden-cost growth.
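
One way to read these rules is as a consistency check between the trigger set and the chosen outcome. The sketch below is a hedged illustration in Python, not repository code: `check_assessment` is a hypothetical name, and treating any detected trigger as incompatible with `none` is an assumption drawn from the last rule above.

```python
# Hypothetical helper for illustration only.
# Outcome names mirror the escalation outcome enum in the contracts.
ALLOWED_OUTCOMES = {"none", "document-in-feature", "follow-up-spec", "reject-or-split"}


def check_assessment(triggers, outcome):
    """Return a list of findings for one escalation assessment."""
    findings = []
    if outcome not in ALLOWED_OUTCOMES:
        findings.append(f"unknown outcome: {outcome}")
    # Assumption: a detected trigger means the change did not stay inside an
    # existing lane and family, so `none` is no longer a valid outcome.
    if triggers and outcome == "none":
        findings.append("triggers present but outcome is 'none'")
    return findings
```

An assessment with an empty trigger set and outcome `none` produces no findings; the same outcome with `new-heavy-family` in the trigger set is flagged.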
## 7. ContributorGuidancePack
**Purpose**: Gives contributors concise operational guidance for choosing the smallest justified test surface and recognizing escalation signals.
| Field | Type | Description |
|-------|------|-------------|
| `guidanceId` | string | Stable identifier for the contributor guidance pack. |
| `decisionPoints` | array | High-value decisions such as unit vs feature vs heavy vs browser, when DB is justified, and when a test is too broad. |
| `examplePatterns` | array | Brief examples of acceptable `N/A`, lane-specific justification, and escalation-worthy changes. |
| `entryPoints` | array | Documentation surfaces where the guidance appears. |
| `sharedVocabulary` | array | Canonical governance terms reused across constitution, templates, and review. |
**Validation Rules**
- Guidance must stay short and operational.
- Guidance must avoid duplicating long prose across multiple files.
- Guidance must reflect the same vocabulary used in the constitution and review checklist.
## 8. ValidationScenario
**Purpose**: Represents one dry-run scenario used to prove that the authoring and review workflow stays usable.
| Field | Type | Description |
|-------|------|-------------|
| `scenarioId` | string | Stable scenario identifier. |
| `scenarioType` | enum | `low-impact` or `high-impact`. |
| `representativeArtifact` | string | Spec, plan, or diff used in the dry run. |
| `expectedPromptPattern` | string | Expected answer style, such as `N/A` or a multi-lane justification. |
| `expectedEscalationOutcome` | enum | Expected escalation result for the scenario: `none`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`. |
| `status` | enum | `planned`, `validated`, or `needs-tuning`. |
| `notes` | string | What the dry run proved or what wording needs refinement. |
**Validation Rules**
- At least one `low-impact` and one `high-impact` scenario must be validated.
- `low-impact` scenarios must prove the workflow stays lightweight.
- `high-impact` scenarios must prove the escalation prompts catch the intended cost-center changes.
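
The first rule can be checked mechanically against a list of scenario records. The following is a minimal Python sketch under the assumption that each scenario is a dictionary carrying the `scenarioType` and `status` fields from the table above; the function name `missing_scenario_coverage` is hypothetical.

```python
# Hypothetical helper for illustration only.
REQUIRED_TYPES = {"low-impact", "high-impact"}


def missing_scenario_coverage(scenarios):
    """Return the scenario types that do not yet have a validated scenario."""
    validated = {
        scenario["scenarioType"]
        for scenario in scenarios
        if scenario["status"] == "validated"
    }
    # Sorted so the result is stable regardless of set ordering.
    return sorted(REQUIRED_TYPES - validated)
```

An empty result means both a `low-impact` and a `high-impact` dry run have reached `validated`.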
## State Transitions
### EscalationAssessment.outcome
- `none` -> `document-in-feature`: allowed when a review reveals governance-relevant cost or scope that should be explicitly recorded but does not justify a new spec.
- `document-in-feature` -> `follow-up-spec`: allowed when the discovered issue reflects recurring pain or structural lane change rather than one contained feature decision.
- Any state -> `reject-or-split`: allowed when the change is too broad, too hidden in cost, or insufficiently justified to merge as proposed.
### ValidationScenario.status
- `planned` -> `validated`: allowed when the scenario can be completed with the expected prompt pattern and escalation outcome.
- `planned` -> `needs-tuning`: allowed when wording or checklist structure creates unnecessary friction or misses the expected governance signal.
- `needs-tuning` -> `validated`: allowed after the relevant constitution, template, or checklist wording is refined.
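
The transitions listed above can be written down as two small lookup tables. This is a minimal Python sketch, not repository code: the function names are hypothetical, and treating every transition that is not listed as disallowed is an assumption, since this section only enumerates the allowed ones.

```python
# Hypothetical helpers for illustration only.
# Assumption: transitions not listed in the data model are treated as disallowed.
OUTCOME_TRANSITIONS = {
    ("none", "document-in-feature"),
    ("document-in-feature", "follow-up-spec"),
}

STATUS_TRANSITIONS = {
    ("planned", "validated"),
    ("planned", "needs-tuning"),
    ("needs-tuning", "validated"),
}


def outcome_transition_allowed(current, target):
    """EscalationAssessment.outcome: any state may move to reject-or-split."""
    return target == "reject-or-split" or (current, target) in OUTCOME_TRANSITIONS


def status_transition_allowed(current, target):
    """ValidationScenario.status: only the listed moves are allowed."""
    return (current, target) in STATUS_TRANSITIONS
```

Under this reading, `none` cannot jump straight to `follow-up-spec`; it passes through `document-in-feature` first.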

@ -0,0 +1,165 @@
# Implementation Plan: Test Suite Authoring Constitution & Review Guardrails
**Branch**: `212-test-authoring-guardrails` | **Date**: 2026-04-18 | **Spec**: `specs/212-test-authoring-guardrails/spec.md`
**Input**: Feature specification from `specs/212-test-authoring-guardrails/spec.md`
## Summary
Implement Spec 212 by tightening the existing `TEST-GOV-001` workflow surfaces in the constitution, SpecKit templates, and contributor-facing repository guidance so new tests must declare lane impact, justify heavy setup, trigger explicit escalation when new cost centers appear, and give reviewers a fast decision-grade checklist without introducing runtime tooling, bots, or a second governance subsystem.
## Technical Context
**Language/Version**: Markdown for repository governance artifacts, JSON Schema plus logical OpenAPI for planning contracts, and Bash-backed SpecKit scripts already present in the repo
**Primary Dependencies**: `.specify/memory/constitution.md`, `.specify/templates/spec-template.md`, `.specify/templates/plan-template.md`, `.specify/templates/tasks-template.md`, `.specify/templates/checklist-template.md`, `.specify/README.md`, `README.md`, and the existing Specs 206 through 211 governance vocabulary
**Storage**: Repository-owned markdown and contract artifacts under `.specify/`, `specs/212-test-authoring-guardrails/`, and root documentation files; no product database persistence
**Testing**: Document and template validation against representative low-impact and higher-cost feature flows, plus a checklist completeness review. No runtime Pest lane is executed because the feature is docs and workflow only
**Validation Lanes**: `N/A`
**Target Platform**: TenantAtlas monorepo with SpecKit-driven specification workflow, repository contributor guidance, and Gitea-backed code review
**Project Type**: Monorepo with a Laravel platform app and Astro website, but this feature is scoped strictly to repository governance and authoring workflow artifacts
**Performance Goals**: Keep low-impact feature answers to the new prompts completable in under 1 minute, keep representative review-guardrail application under 3 minutes, and avoid adding any new daily workflow surface beyond the existing constitution, templates, and contributor guidance entry points
**Constraints**: No new runtime dependencies, no CI bot requirement, no new product routes or persistence, no contradiction with Specs 206 through 211, no speculative governance framework, and no new documentation sprawl when an existing entry point can carry the guidance
**Scale/Scope**: One constitution section, four SpecKit templates, two contributor-facing guidance surfaces, one review-guardrail surface, one escalation policy set, one contributor guidance pack, and validation against at least two representative spec flows
### Filament v5 Implementation Notes
- **Livewire v4.0+ compliance**: Preserved. This feature only changes repository authoring and review artifacts and does not alter the Filament or Livewire runtime stack.
- **Provider registration location**: Unchanged. Existing panel providers remain registered in `bootstrap/providers.php`.
- **Global search rule**: No globally searchable resources are added or modified.
- **Destructive actions**: No runtime destructive actions are introduced. Existing confirmation and authorization behavior remain unchanged.
- **Asset strategy**: No panel or shared assets are added. Existing `filament:assets` deployment behavior remains unchanged.
- **Testing plan**: Validate the constitution, template prompts, checklist wording, escalation semantics, and contributor guidance against one docs-only `N/A` path and one higher-cost governed spec path; no runtime UI, action, or Livewire tests are added by this feature.
## Test Governance Check
- **Affected validation lanes**: `N/A`
- **Narrowest proving command(s)**: `N/A`. Validation is document and workflow based rather than runtime-lane based.
- **Fixture / helper cost risks**: None directly. The feature exists to prevent future hidden helper and fixture cost growth rather than to introduce new shared setup.
- **Heavy-family additions or promotions**: None. The intended change is earlier disclosure and escalation of heavy-family growth in future work.
- **Budget / baseline / trend follow-up**: None directly. The feature must stay consistent with current lane, budget, and trend vocabulary without mutating those contracts.
- **Why no dedicated follow-up spec is needed**: Spec 212 is itself the structural authoring and review guardrail feature. After rollout, routine upkeep should live inside ordinary feature specs unless recurring pain or another structural lane-model change appears.
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
- Inventory-first: PASS. No inventory, backup, or snapshot product truth changes.
- Read/write separation: PASS. This is repository-only governance work with no end-user mutations.
- Graph contract path: PASS. No Microsoft Graph calls or contract-registry changes.
- Deterministic capabilities: PASS. No capability resolver or authorization registry changes.
- RBAC-UX, workspace isolation, tenant isolation: PASS. No runtime routes, policies, or scope behavior change.
- Run observability and Ops-UX: PASS. No `OperationRun` or monitoring lifecycle changes.
- Data minimization: PASS. The new artifacts are repository-owned prompts and guidance only.
- Test governance (TEST-GOV-001): PASS WITH WORK. The feature intentionally strengthens authoring-time and review-time enforcement of lane choice, fixture-cost disclosure, heavy-family escalation, and runtime-drift documentation.
- Proportionality and bloat control: PASS WITH LIMITS. The implementation may touch several workflow entry points, but it must do so by sharpening existing sections rather than creating a new governance framework, parallel handbook, or automation layer.
- TEST-TRUTH-001: PASS WITH WORK. The added prompts and checklists must stay tied to real lane, cost, and escalation decisions instead of inventing abstract process overhead.
- Filament/UI constitutions: PASS / NOT APPLICABLE. No operator-facing runtime UI, action surfaces, or panels are changed.
**Phase 0 Gate Result**: PASS
- The feature stays bounded to repository constitution, templates, review prompts, and contributor guidance.
- No new product persistence, Graph seams, runtime routes, or authorization planes are introduced.
- The plan reuses existing `TEST-GOV-001` workflow surfaces instead of inventing a second governance mechanism.
## Project Structure
### Documentation (this feature)
```text
specs/212-test-authoring-guardrails/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── test-authoring-governance.schema.json
│   └── test-authoring-governance.logical.openapi.yaml
└── tasks.md
```
### Source Code (repository root)
```text
.specify/
├── memory/
│   └── constitution.md
├── templates/
│   ├── checklist-template.md
│   ├── plan-template.md
│   ├── spec-template.md
│   └── tasks-template.md
└── README.md
README.md
specs/
└── 212-test-authoring-guardrails/
    ├── spec.md
    └── checklists/requirements.md
```
**Structure Decision**: Keep all changes inside the existing constitution, SpecKit templates, and established contributor documentation entry points so the governance model becomes more explicit without creating a separate handbook, reviewer-only subsystem, or new runtime-owned code surface.
## Complexity Tracking
| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| None | Not applicable | Not applicable |
## Proportionality Review
- **Current operator problem**: Contributors can still introduce broad, expensive, or misclassified tests before review, and reviewers still lack one compact, repeatable checklist for catching accidental heavy cost and escalation triggers before merge.
- **Existing structure is insufficient because**: `TEST-GOV-001` and the current templates already mention lane/runtime impact, but they do not yet fully encode authoring-time classification discipline, a stable review checklist, escalation triggers, or lightweight contributor decision guidance.
- **Narrowest correct implementation**: Tighten the existing constitution, templates, and contributor docs with a small number of mandatory prompts and review questions instead of adding bots, runtime policy engines, or a standalone governance manual.
- **Ownership cost created**: The repo must maintain a concise shared vocabulary for escalation, authoring prompts, and review guardrails across constitution, template, and contributor docs.
- **Alternative intentionally rejected**: A new automation bot, PR scoring system, or separate governance handbook, because each would add process surface or drift risk beyond what the current delivery workflow needs.
- **Release truth**: Current-release repository truth needed to make the test-governance chain from Specs 206 through 211 durable in day-to-day authoring and review.
## Phase 0 — Research (complete)
- Output: [research.md](./research.md)
- Resolved key decisions:
  - Reuse and sharpen the existing `TEST-GOV-001` workflow surfaces instead of creating a new governance subsystem.
  - Keep contributor guidance in existing high-traffic documentation surfaces unless a new file proves necessary for clarity.
  - Model review guardrails as a short question set and explicit escalation outcomes rather than a lengthy rubric or approval board.
  - Treat escalation as a documented authoring and review decision, not a new automatic CI blocker.
  - Validate the workflow with one docs-only `N/A` path and one higher-cost governed-spec path so both minimal overhead and escalation behavior are proven.
  - Use logical contract artifacts to describe the expected prompt, checklist, and escalation semantics even though the feature adds no transport API, while treating those files as plan-time scaffolding rather than new maintained workflow surfaces.
## Phase 1 — Design & Contracts (complete)
- Output: [data-model.md](./data-model.md) formalizes the repository-owned governance objects: constitution rule set, spec and plan prompt blocks, task checklist, review checklist, escalation assessment, contributor guidance pack, and validation scenarios.
- Output: [contracts/test-authoring-governance.schema.json](./contracts/test-authoring-governance.schema.json) defines schema-first planning scaffolding for the governance pack the workflow must express; it is not an additional maintained reviewer-facing surface.
- Output: [contracts/test-authoring-governance.logical.openapi.yaml](./contracts/test-authoring-governance.logical.openapi.yaml) captures logical planning semantics for validating spec and plan impact blocks, evaluating a task checklist, assessing escalation, and serving contributor guidance; it is not an additional maintained reviewer-facing surface.
- Output: [quickstart.md](./quickstart.md) provides the implementation order, representative validation flows, and rollout checklist.
### Post-design Constitution Re-check
- PASS: No runtime routes, panels, Graph seams, or authorization planes are introduced.
- PASS: The design keeps all new truth repository-owned and documentation-first.
- PASS: The workflow surfaces stay inside existing constitution, template, and contributor entry points rather than creating a new process framework.
- PASS WITH WORK: Review guardrails and escalation wording must remain concise enough that low-impact features can still answer with `N/A` or `none` without friction.
- PASS WITH WORK: Any contributor guidance added to `README.md` or `.specify/README.md` must avoid duplicating the same rules in multiple long prose blocks that will drift.
## Phase 2 — Implementation Planning
`tasks.md` should cover:
- Auditing the current `TEST-GOV-001` constitution text, SpecKit templates, and contributor docs to isolate exactly which authoring-time and review-time gaps remain after Specs 206 through 211.
- Updating `.specify/memory/constitution.md` with a short, binding test authoring and review guardrail section that makes classification, minimal fixtures, explicit heavy justification, escalation triggers, and expensive-default bans unmistakable.
- Updating `.specify/templates/spec-template.md` so the existing `Testing / Lane / Runtime Impact` block explicitly asks for lane fit, heavy-surface justification, fixture-cost disclosure, and minimal reviewer validation in authoring-time language.
- Updating `.specify/templates/plan-template.md` so the `Test Governance Check` and technical planning surfaces make test type changes, helper widening, lane reshaping, escalation triggers, and closing validation explicit before implementation begins.
- Updating `.specify/templates/tasks-template.md` to standardize a short task-level governance checklist covering lane assignment, minimal setup, relevant validation, hidden-cost prevention, and documentation of material budget or trend impact.
- Updating `.specify/templates/checklist-template.md` as the canonical generated review-checklist surface, with `.specify/README.md` as the reviewer entry point, so reviewers get a stable, quick guardrail checklist with direct keep, split, or escalate questions.
- Updating `.specify/README.md` and `README.md` with concise contributor guidance showing how to answer `N/A` for low-impact work, when database or UI-heavy coverage is justified, and when a new heavy family or browser path requires escalation.
- Validating the updated workflow against one low-impact docs or template scenario and one higher-cost governed-spec scenario, confirming that the low-impact path stays fast and the higher-cost path surfaces the intended escalation questions.
- Recording the validation note inside the active spec or implementation PR so the workflow proof is durable and does not live only in casual commentary.
### Contract Implementation Note
- The JSON schema is repository-tooling oriented and describes the complete governance pack the repo must express during planning even if the first implementation lives mostly in markdown templates and checklists.
- The OpenAPI file is logical rather than transport-prescriptive. It documents workflow semantics for authoring and review interactions, not a public HTTP API.
- The design intentionally avoids new runtime services or CI bots. The contracts are plan-time alignment aids inside this spec set, not new long-term reviewer-facing workflow surfaces that must evolve independently from the markdown sources.
### Deployment Sequencing Note
- No database migration is planned.
- No asset publish step changes.
- Recommended rollout order: tighten constitution text first, then update spec and plan templates, then update task and review checklist surfaces, then update contributor guidance, then validate low-impact and higher-cost scenarios, and finally note any wording refinements needed to keep the process lightweight.


@ -0,0 +1,128 @@
# Quickstart: Test Suite Authoring Constitution & Review Guardrails
This feature is repository-governance only. It does not change application runtime behavior, validation lanes, or deployment infrastructure. The goal is to tighten the authoring and review workflow so future test changes declare cost and escalation signals earlier.
## 1. Confirm the implementation surfaces
Review the files that already carry test-governance workflow truth:
- `.specify/memory/constitution.md`
- `.specify/templates/spec-template.md`
- `.specify/templates/plan-template.md`
- `.specify/templates/tasks-template.md`
- `.specify/templates/checklist-template.md`
- `.specify/README.md`
- `README.md`
Do not create a parallel handbook unless one of these surfaces cannot carry the needed guidance cleanly.
## 2. Tighten the constitution first
Update the constitution so the standing rules are explicit about:
- authoring-time classification of new and changed tests
- naming affected lanes when runtime behavior or tests change
- justifying database, Livewire, Filament, or browser usage
- keeping fixtures, helpers, factories, and seeds cheap by default
- escalating new heavy families, new browser scope, revived expensive defaults, or material lane-cost shifts
- giving reviewers a stable decision-grade checklist target
Keep the language short and binding.
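The justification rule above can be illustrated with a small check. This is only a sketch under stated assumptions: the `Declaration` record, its field names, and the `unjustified_surfaces` helper are hypothetical and do not exist in the repository; the only thing taken from the text is the list of heavy surfaces (database, Livewire, Filament, browser).

```python
from dataclasses import dataclass, field

# Heavy surfaces named in the constitution guidance above.
HEAVY_SURFACES = {"database", "livewire", "filament", "browser"}


@dataclass
class Declaration:
    """Hypothetical record of what an author declares for one test."""
    name: str
    lane: str  # e.g. "unit" or "feature" (illustrative values)
    surfaces: set[str] = field(default_factory=set)
    justifications: dict[str, str] = field(default_factory=dict)


def unjustified_surfaces(decl: Declaration) -> list[str]:
    """Return heavy surfaces the test uses without a written justification."""
    used_heavy = decl.surfaces & HEAVY_SURFACES
    return sorted(
        surface
        for surface in used_heavy
        if not decl.justifications.get(surface, "").strip()
    )


example = Declaration(
    name="renders tenant list",
    lane="feature",
    surfaces={"database", "livewire"},
    justifications={"database": "query scoping must hit real tables"},
)
print(unjustified_surfaces(example))
```

In this sketch the Livewire surface would be flagged because it has no recorded justification. The spec treats this as a reviewer question rather than an automated gate, so the code is purely explanatory.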
## 3. Update the template surfaces in order
Apply the same vocabulary consistently across the authoring workflow:
1. `spec-template.md`: strengthen the existing `Testing / Lane / Runtime Impact` block with authoring-time classification and escalation prompts.
2. `plan-template.md`: strengthen the `Test Governance Check` so helper widening, lane reshaping, and closing validation are explicit before implementation.
3. `tasks-template.md`: standardize the short task-level governance checklist.
4. `checklist-template.md` (the canonical generated review-checklist surface, with `.specify/README.md` as the reviewer entry point): provide the fixed review guardrail questions and expected escalation outcomes.
Avoid asking the same question in three different ways across the templates.
## 4. Update contributor guidance
Keep the contributor-facing explanation concise and practical:
- how to answer `N/A` or `none` for low-impact work
- how to choose between unit, feature, heavy-governance, and browser coverage
- when database or UI-heavy coverage is justified
- when a test has become too broad
- when to extend an existing family versus introduce a new one
- when a change stays local versus needs escalation
Prefer updating `.specify/README.md` and `README.md` over adding a new long-lived documentation file.
## 5. Run the two required dry runs
### Low-impact validation path
Use a genuinely low-impact template-only or docs-only change, such as a change limited to `.specify/templates/checklist-template.md` and `.specify/README.md`, to prove that:
- the spec prompt can be completed with `N/A` or `none`
- the plan prompt does not demand runtime lanes
- the review checklist stays brief and still ends with an explicit `keep` outcome without forcing fake escalation
Expected authoring answers:
- affected validation lanes: `N/A`
- test purpose / family impact: `none`
- DB / Livewire / Filament / browser usage: `none`
- fixture / helper / factory / seed / context cost impact: `none`
- escalation triggers and outcome: `none`
Expected result: the workflow remains lightweight and completable in under 1 minute for the authoring prompts.
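As a rough illustration of what "answerable with `N/A` or `none`" means, the expected answers can be written down as data and checked. The prompt labels are copied from the list above; the `is_low_impact` helper and the dictionary layout are assumptions made for this sketch, not part of the templates.

```python
# Prompt labels and answers copied from the expected authoring answers above.
LOW_IMPACT_ANSWERS = {
    "affected validation lanes": "N/A",
    "test purpose / family impact": "none",
    "DB / Livewire / Filament / browser usage": "none",
    "fixture / helper / factory / seed / context cost impact": "none",
    "escalation triggers and outcome": "none",
}

# Answers that count as "no impact" in this sketch.
EMPTY_ANSWERS = {"n/a", "none"}


def is_low_impact(answers: dict[str, str]) -> bool:
    """True when every prompt is present and answered with N/A or none."""
    return bool(answers) and all(
        answer.strip().lower() in EMPTY_ANSWERS for answer in answers.values()
    )


print(is_low_impact(LOW_IMPACT_ANSWERS))
```

Any prompt answered with a real lane, family, or surface would make the change fall out of the low-impact path in this model, which is the point at which the fuller prompts apply.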
### Canonical review-checklist surface
The generated checklist based on `.specify/templates/checklist-template.md` should ask exactly these decision-grade questions:
- Is the declared validation lane the narrowest lane or lane mix that proves the change?
- Does the test stay in the smallest honest family (`Unit`, `Feature`, `Heavy-Governance`, `Browser`)?
- Is the changed or added test no broader than the behavior it proves?
- Is any database, Livewire, Filament, or browser surface justified over a narrower alternative?
- Do shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default?
- Is the minimal reviewer validation command written explicitly, and is any material drift note recorded?
- Does the reviewer choose one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`?
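The final question requires exactly one explicit outcome. A minimal sketch of that rule follows; the outcome strings come from the list above, while the `ReviewOutcome` enum and `parse_outcome` function are hypothetical names introduced only for illustration.

```python
from enum import Enum


class ReviewOutcome(Enum):
    """Outcome vocabulary as listed in the final checklist question."""
    KEEP = "keep"
    SPLIT = "split"
    DOCUMENT_IN_FEATURE = "document-in-feature"
    FOLLOW_UP_SPEC = "follow-up-spec"
    REJECT_OR_SPLIT = "reject-or-split"


def parse_outcome(recorded: list[str]) -> ReviewOutcome:
    """Accept a review only when exactly one known outcome is recorded."""
    if len(recorded) != 1:
        raise ValueError(f"expected exactly one outcome, got {len(recorded)}")
    # Enum lookup by value raises ValueError for unknown outcome strings.
    return ReviewOutcome(recorded[0].strip())


print(parse_outcome(["document-in-feature"]))
```

A review that records no outcome, several outcomes, or a word outside the vocabulary is rejected in this sketch, which mirrors the intent that the checklist always ends in a single disposition.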
### High-impact validation path
Use an existing governed spec such as `specs/211-runtime-trend-recalibration/` or a similar multi-lane runtime-governance feature to prove that:
- the spec prompt surfaces lane and heavy-surface choices clearly
- the plan prompt exposes helper, fixture, or lane-shape impact
- the review checklist can ask whether the change creates a new cost center
- the escalation rules distinguish between local documentation and a true follow-up spec
Expected review outcome for `specs/211-runtime-trend-recalibration/`: `document-in-feature`, because the spec already records its own validation lanes, bounded fixture risk, unchanged heavy/browser scope, and runtime drift follow-up inside the active feature.
Expected result: the workflow surfaces the intended escalation decisions without adding a new approval bureaucracy and keeps the representative higher-cost review under 3 minutes.
## 6. Record the validation note
Capture the dry-run outcome in the active spec or implementation PR with at least:
- low-impact scenario used
- high-impact scenario used
- whether any prompt wording was confusing
- which explicit review outcome was chosen
- whether any review question felt redundant or missing
- whether the process stayed lightweight enough for ordinary work
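One way to keep the note consistent between dry runs is to treat the six items above as a fixed record. The following is a sketch only: the `ValidationNote` class and its field names are assumptions, and the rendered bullet format is illustrative rather than a required layout.

```python
from dataclasses import dataclass


@dataclass
class ValidationNote:
    """Hypothetical record mirroring the six items listed above."""
    low_impact_scenario: str
    high_impact_scenario: str
    confusing_wording: str
    review_outcome: str
    redundant_or_missing_questions: str
    stayed_lightweight: bool

    def to_markdown(self) -> str:
        """Render the note as a markdown bullet list for a spec or PR."""
        lines = [
            f"- Low-impact scenario: {self.low_impact_scenario}",
            f"- High-impact scenario: {self.high_impact_scenario}",
            f"- Confusing prompt wording: {self.confusing_wording}",
            f"- Review outcome chosen: {self.review_outcome}",
            f"- Redundant or missing questions: {self.redundant_or_missing_questions}",
            f"- Stayed lightweight: {'yes' if self.stayed_lightweight else 'no'}",
        ]
        return "\n".join(lines)


note = ValidationNote(
    low_impact_scenario="template-only change",
    high_impact_scenario="specs/211-runtime-trend-recalibration/",
    confusing_wording="none",
    review_outcome="document-in-feature",
    redundant_or_missing_questions="none",
    stayed_lightweight=True,
)
print(note.to_markdown())
```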
## 7. Recorded dry-run results (2026-04-18)
- **Low-impact scenario**: `.specify/templates/checklist-template.md` plus `.specify/README.md`
  Result: the spec and plan prompts were answerable with `N/A` or `none` in under 1 minute, and the checklist still closed with a clear `keep` outcome.
- **Higher-cost scenario**: `specs/211-runtime-trend-recalibration/spec.md` plus `specs/211-runtime-trend-recalibration/plan.md`
  Result: the reviewer could reach `document-in-feature` in under 3 minutes because Spec 211 already documents lane fit, bounded helper cost, unchanged heavy/browser scope, and runtime-drift follow-up inside the active delivery artifact.
- **Document-in-feature example**: a feature widens evidence or trend reporting inside an existing governed lane family and records the runtime or recalibration note in its own spec or PR.
- **Follow-up-spec example**: a change introduces a new heavy family, normalizes browser coverage for a new workflow class, or revives an expensive shared default across multiple unrelated tests.
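The boundary between the two examples above can be sketched as a simple triage. This is an assumption-laden illustration, not the repository's rule engine (the spec explicitly avoids one): the trigger identifiers and the `suggest_outcome` function are hypothetical, and the mapping only encodes the examples given here. Reviewer judgment still decides the real outcome.

```python
# Hypothetical trigger identifiers derived from the follow-up-spec example above.
FOLLOW_UP_TRIGGERS = {
    "new-heavy-family",
    "browser-coverage-for-new-workflow-class",
    "revived-expensive-shared-default",
}


def suggest_outcome(triggers: set[str]) -> str:
    """Suggest a review outcome from the escalation triggers an author declared."""
    if triggers & FOLLOW_UP_TRIGGERS:
        # Structural signals go to a dedicated follow-up spec.
        return "follow-up-spec"
    if triggers:
        # Local signals stay inside the active spec or PR as a recorded note.
        return "document-in-feature"
    return "keep"


print(suggest_outcome({"widened-trend-reporting"}))
print(suggest_outcome({"new-heavy-family", "widened-trend-reporting"}))
print(suggest_outcome(set()))
```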
## 8. Completion checklist
- Constitution wording updated and aligned with `TEST-GOV-001`
- Spec, plan, task, and review surfaces use the same lane and escalation vocabulary
- Contributor guidance explains both low-impact and escalation-worthy cases
- Dry runs confirm the workflow is both usable and sufficiently strict
- The review checklist ends with one explicit outcome
- No new governance subsystem, bot, or duplicate handbook was introduced


@ -0,0 +1,49 @@
# Research: Test Suite Authoring Constitution & Review Guardrails
## Decision 1: Reuse and sharpen existing `TEST-GOV-001` workflow surfaces
- **Decision**: Build Spec 212 by extending the current constitution and SpecKit template surfaces that already carry lane/runtime governance instead of inventing a new governance subsystem.
- **Rationale**: The repository already contains `TEST-GOV-001`, a `Testing / Lane / Runtime Impact` section in the spec template, a `Test Governance Check` in the plan template, and runtime-governance language in the task template and repository guidance. The missing value is stronger authoring-time classification, review guardrails, and escalation prompts, not new infrastructure.
- **Alternatives considered**:
- Create a dedicated test-governance framework with its own configuration and commands. Rejected because it would add a second process surface and drift risk.
- Rely on CI and reviewer discretion alone. Rejected because the spec explicitly targets prevention at authoring and review time.
## Decision 2: Keep contributor guidance inside existing repository entry points
- **Decision**: Prefer `.specify/README.md`, `README.md`, the updated templates, and this feature's `quickstart.md` over introducing a new standalone contributor handbook.
- **Rationale**: These are the surfaces contributors already read during spec work and repository setup. Reusing them keeps the guidance discoverable without creating another long-lived document that can drift.
- **Alternatives considered**:
- Add a new `docs/test-authoring-governance.md` file. Rejected because it would split the guidance away from the authoring workflow and increase maintenance burden.
- Encode all guidance only in the constitution. Rejected because contributors need operational examples at the point of use, not just high-level rules.
## Decision 3: Make review guardrails question-based, not score-based
- **Decision**: Model the review surface as a short set of direct questions plus explicit escalation outcomes rather than a weighted scorecard or approval rubric.
- **Rationale**: Reviewers need a fast keep, split, or escalate decision aid. Direct questions about lane fit, breadth, database and UI-heavy justification, fixture cost, and escalation need are easier to apply in under 3 minutes than a scoring framework.
- **Alternatives considered**:
- A weighted review rubric. Rejected because it would slow down reviews and encourage ritual over judgment.
- A long prose checklist. Rejected because it would be harder to scan and easier to ignore.
## Decision 4: Escalation stays document-first, not CI-block-first
- **Decision**: New heavy families, new browser scope, revived expensive defaults, and material lane-cost changes should trigger explicit documentation and follow-up decisions in the active spec or PR, not a new automatic CI policy.
- **Rationale**: These are judgment-heavy signals. The right first move is to make them visible and attributable at authoring and review time, not to bolt on a new blocking system that would be brittle and hard to calibrate.
- **Alternatives considered**:
- Fail CI immediately on any detected heavy-surface expansion. Rejected because many legitimate changes still need human context and scoping decisions.
- Treat escalation as optional reviewer prose. Rejected because optional language is exactly what the spec is trying to harden.
## Decision 5: Validate both the low-friction and high-risk paths
- **Decision**: Validate the updated workflow against one docs-only or template-only `N/A` flow and one higher-cost governed-spec flow that touches multiple runtime governance concerns.
- **Rationale**: The low-impact path proves the process stays lightweight. The higher-cost path proves the workflow can surface lane, heavy, fixture, and escalation questions before implementation.
- **Alternatives considered**:
- Validate only against a higher-cost spec. Rejected because it would not prove that ordinary low-impact work stays fast.
- Validate only against hypothetical examples. Rejected because real repo artifacts are needed to check phrasing and friction.
## Decision 6: Use logical contract artifacts for workflow semantics
- **Decision**: Represent the design with one schema-first governance-pack contract and one logical OpenAPI contract even though the feature adds no transport API, and treat both artifacts as design-time scaffolding rather than new maintained workflow surfaces.
- **Rationale**: Neighboring governance specs already use logical OpenAPI plus JSON Schema to describe repository-owned workflow truth. Reusing that pattern keeps planning artifacts consistent and gives the later task-generation step structured inputs.
- **Alternatives considered**:
- Markdown-only planning notes. Rejected because they are less structured and less reusable for task generation and validation.
- A runtime API contract. Rejected because this feature does not introduce a runtime service or endpoint.


@ -0,0 +1,243 @@
# Feature Specification: Test Suite Authoring Constitution & Review Guardrails
**Feature Branch**: `212-test-authoring-guardrails`
**Created**: 2026-04-18
**Status**: Draft
**Input**: User description: "Spec 212 — Test Suite Authoring Constitution & Review Guardrails"
## Spec Candidate Check *(mandatory — SPEC-GATE-001)*
- **Problem**: TenantPilot can now measure, segment, and enforce test-suite cost, but contributors still lack a mandatory authoring and review routine that keeps new tests correctly classified, minimally provisioned, and lane-aware before they become permanent suite cost.
- **Today's failure**: New tests can still be written convenience-first, broaden shared helpers or fixtures, or expand heavy families without early disclosure, so the first strong signal often appears only after review fatigue or CI slowdown.
- **User-visible improvement**: Contributors and reviewers get lightweight, repeatable prompts that make lane impact, heavy risk, fixture cost, and escalation needs explicit while the change is still small and easy to redirect.
- **Smallest enterprise-capable version**: Extend the existing governance system with a short authoring constitution, mandatory test-impact prompts in spec and plan flows, a compact task checklist, a concise review checklist, explicit escalation rules, and contributor guidance validated against representative specs.
- **Explicit non-goals**: No new CI lanes, no new runtime-optimization program, no automatic PR bot, no broader coding constitution outside test authoring and review, and no attempt to replace human test design judgment with bureaucracy.
- **Permanent complexity imported**: A small set of governance prompts, reviewer questions, escalation vocabulary, contributor guidance, and maintenance responsibility for keeping those artifacts aligned with Specs 206 through 211.
- **Why now**: Specs 206 through 211 built the lane, budget, heavy-segmentation, CI, and trend foundation; without authoring-time and review-time guardrails, the suite can still drift back toward hidden cost growth at the exact point new tests are introduced.
- **Why not local**: Ad hoc reviewer discipline and tribal memory do not scale across contributors, feature specs, or future maintainers, and they do not leave a durable, reviewable record of why a costly test choice was accepted.
- **Approval class**: Cleanup
- **Red flags triggered**: New governance prompts across several authoring surfaces and some new shared vocabulary around escalation. Defense: the feature stays repository-scoped, avoids new runtime infrastructure, and intentionally closes an already active governance program instead of opening a new one.
- **Score**: Benefit: 2 | Urgency: 2 | Scope: 2 | Complexity: 1 | Product proximity: 1 | Reuse: 2 | **Total: 10/12**
- **Decision**: approve
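For readers checking the score line above, the total is a plain sum of the six criterion values. The sketch assumes each criterion is capped at 2 points, which is inferred from the six criteria and the 12-point denominator rather than stated in the spec.

```python
# Criterion values in the order they appear on the score line.
scores = [2, 2, 2, 1, 1, 2]

# Assumption: each criterion is capped at 2 points (6 criteria x 2 = 12).
MAX_PER_CRITERION = 2

total = sum(scores)
maximum = MAX_PER_CRITERION * len(scores)
print(f"{total}/{maximum}")
```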
## Spec Scope Fields *(mandatory)*
- **Scope**: workspace
- **Primary Routes**: No end-user HTTP routes change. The affected surfaces are repository-owned governance artifacts: the test authoring constitution, specification routine, planning routine, task routine, review checklist, and contributor guidance.
- **Data Ownership**: Workspace-owned authoring templates, governance rules, review prompts, and validation notes. No tenant-owned records or product runtime tables are introduced.
- **RBAC**: No product authorization behavior changes. The actors are contributors, reviewers, and maintainers applying repository governance.
## Proportionality Review *(mandatory when structural complexity is introduced)*
- **New source of truth?**: no
- **New persisted entity/table/artifact?**: yes, but only repository-owned governance artifacts such as constitution text, prompt blocks, checklists, and validation notes
- **New abstraction?**: no new software abstraction; only a documented decision routine for authoring and review
- **New enum/state/reason family?**: no
- **New cross-domain UI framework/taxonomy?**: no
- **Current operator problem**: Contributors can still introduce expensive or misclassified tests at authoring time, while reviewers lack a short, explicit checklist for catching avoidable suite-cost drift before merge.
- **Existing structure is insufficient because**: Lane budgets, CI enforcement, and trend reporting detect problems after a test already exists, but they do not reliably force contributors to justify heavy surfaces or shared setup cost while the change is still being designed.
- **Narrowest correct implementation**: Add lightweight governance text and prompt surfaces directly to the existing spec, plan, task, and review workflow instead of inventing new runtime tooling or a separate approval system.
- **Ownership cost**: Maintainers must keep the constitution text, prompt blocks, checklist language, and representative examples aligned as lane vocabulary and governance expectations evolve.
- **Alternative intentionally rejected**: Relying on informal reviewer comments, CI failures alone, or scattered contributor notes with no shared authoring contract.
- **Release truth**: Current-release repository truth needed to make the test-governance foundation from Specs 206 through 211 durable.
## Problem Statement
Specs 206 through 211 gave TenantPilot a strong technical foundation for test-suite governance:
- lane structure and runtime budgets exist
- shared fixture cost has been reduced
- heavy Filament and Livewire families have been segmented
- heavy-governance cost is treated explicitly
- CI runs the governed lanes and enforces their runtime expectations
- runtime trend and baseline logic make erosion visible over time
That foundation is strong, but it is still mostly reactive. The main remaining gap is the moment where new tests are conceived, written, and reviewed.
The biggest slowdown risks are still created at authoring time, when a contributor chooses whether a test stays narrow or immediately reaches for database, Livewire, Filament, browser, or broad shared helpers. Review is the last reliable checkpoint before those choices become permanent suite cost. If authoring and review lack explicit guardrails, the repository drifts back toward convenience-first testing and only learns about the damage after CI or runtime budgets start complaining.
This feature closes that gap by embedding the existing governance model directly into the daily workflow for specification, planning, tasking, authoring, and review.
## Dependencies
- Depends on Spec 206 — Test Suite Governance & Performance Foundation for lane vocabulary, cost awareness, and the original governance contract.
- Depends on Spec 207 — Shared Test Fixture Slimming for the expectation that common setup remains intentionally small.
- Depends on Spec 208 — Filament/Livewire Heavy Suite Segmentation for the definition and containment of expensive UI-driven families.
- Depends on Spec 209 — Heavy Governance Lane Cost Reduction for the principle that heavy governance must be deliberate rather than accidental.
- Depends on Spec 210 — CI Test Matrix & Runtime Budget Enforcement for enforced lane boundaries and runtime budget evidence.
- Depends on Spec 211 — Test Runtime Trend Reporting & Baseline Recalibration for the ability to see long-horizon cost drift and justify escalation.
- Recommended after the governed lanes, CI enforcement, and trend visibility are stable enough that the remaining problem is authoring and review behavior rather than missing infrastructure.
- Blocks durable, everyday embedding of the existing governance model at the point where new tests enter the suite.
- Does not block normal feature delivery when current reviewer discipline is already handling the risk manually.
## Goals
- Embed test-governance thinking directly into the normal development routine.
- Give contributors explicit rules for classifying and justifying new tests.
- Give reviewers concrete prompts that catch hidden suite-cost drift before merge.
- Require new specs, plans, and task lists to state their test and lane impact deliberately.
- Keep heavy-family creation, browser expansion, and shared setup cost from appearing silently.
- Prevent drift earlier than CI or budget failures.
- Close the open loop in the existing test-governance program.
## Non-Goals
- Creating another runtime-optimization or lane-segmentation spec.
- Expanding the CI matrix or adding new infrastructure by default.
- Replacing thoughtful test design with a rigid checklist ritual.
- Creating a universal engineering constitution for every domain outside test authoring and review.
- Introducing PR bots or fully automated review comments as a requirement for this slice.
- Reopening lane or budget design that Specs 206 through 211 already settled.
## Assumptions
- Specs 206 through 211 remain the authoritative source for lane vocabulary, heavy-family expectations, budget stewardship, and runtime-trend interpretation.
- The existing specification, planning, and task routines are the correct places to force early test-impact thinking.
- Reviewers will continue to use judgment; the checklist is meant to sharpen decisions, not replace them.
- Most feature work should still be able to satisfy the added prompts with concise answers rather than long essays.
## Key Decisions
- **Prevention is better than post-facto enforcement**: The cheapest place to control suite cost is before the test is committed, not after CI exposes the damage.
- **Constitution rules must stay lightweight but binding**: The authoring contract must be short enough to use every day and strong enough to matter when a costly choice appears.
- **Every spec must consider test impact explicitly**: New feature work should say which lane, family, and runtime implications it touches instead of leaving that question implicit.
- **Reviewers need decision-grade prompts**: Review guardrails should ask direct questions about lane fit, breadth, fixture cost, heavy-family creation, and escalation need rather than vague reminders to care about performance.
- **Classification must happen at authoring time**: Contributors should decide up front whether a test belongs in a narrow lane, a heavier lane, or a new heavy family.
- **New heavy cost centers must announce themselves**: New browser scope, new heavy families, major lane-cost movement, and revived expensive defaults require explicit escalation instead of silent normalization.
## Required Outcomes
### Test Authoring Constitution
The repository must gain a short, durable constitution section that states the standing rules for test classification, lane awareness, justified use of database or UI-heavy surfaces, minimal fixtures by default, and refusal of hidden shared-cost growth.
### Specification Routine Extension
Every new feature specification must answer a small, standard test-impact block that covers affected lanes, new or expanded test families, heavy or browser relevance, expected budget or trend effect, and the validation expected at review time.
### Planning Routine Extension
The planning workflow must make test-impact decisions visible before implementation by asking what test types change, whether helpers or fixtures widen, whether lane reshaping is needed, and what final validation is required.
### Task Routine Extension
Task lists must carry a short test-governance checklist that keeps lane assignment, minimal setup, relevant validation, and budget or trend disclosure visible while work is broken down.
### Review Guardrails
Reviewers must have a fast checklist that asks whether a test is in the right lane, whether it is unnecessarily broad, whether database or UI-heavy surfaces are actually required, whether setup is secretly expensive, whether the change should be split, and whether escalation is required. The canonical daily-use review surface is the generated checklist based on `.specify/templates/checklist-template.md`, with `.specify/README.md` acting as the reviewer entry point for how to apply it.
### Escalation Rules
The governance model must define when a change stops being an ordinary test delta and becomes a governance signal that needs explicit documentation or a follow-up spec, especially for new heavy families, new browser coverage, material lane-cost changes, revived expensive defaults, or broad suite reshaping.
### Contributor Guidance
Contributors must get short guidance that explains how to choose between narrow and heavy test surfaces, how to detect an overly broad test, when shared setup is justified, and when a change belongs inside an existing family versus creating a new one.
### Workflow Integration
The resulting rules must appear where they are used: in the constitution, specification routine, planning routine, task routine, review checklist, and lightweight contributor-facing guidance.
## Testing / Lane / Runtime Impact *(mandatory for runtime behavior changes)*
- **Validation lane(s)**: N/A
- **Why these lanes are sufficient**: N/A. This feature changes repository authoring and review artifacts rather than product runtime behavior.
- **New or expanded test families**: none
- **Fixture / helper cost impact**: none directly. The intended effect is future prevention of unnecessary shared setup cost rather than immediate new fixture or helper behavior.
- **Heavy coverage justification**: none
- **Budget / baseline / trend impact**: none directly. The feature should improve earlier disclosure of future drift, but it does not itself change lane membership, budgets, baselines, or runtime measurements.
- **Planned validation commands**: N/A. Validation is document-based and consists of applying the new prompts and guardrails to representative specs, plans, and task flows.
## Workflow Validation Notes (2026-04-18)
### Low-Impact Authoring Dry Run
- **Scenario**: Apply the updated prompts to a template-only change limited to `.specify/templates/checklist-template.md` and `.specify/README.md`.
- **Result**: The authoring flow can be completed with concise `N/A` or `none` answers in under 1 minute because the prompts only ask for runtime-specific detail when impact actually exists.
- **Wording adjustment captured**: The spec and plan templates now ask for a short reviewer handoff and an explicit escalation outcome so low-impact work stays lightweight while still ending in a clear review disposition.
### Higher-Cost Review Dry Run
- **Scenario**: Apply the updated review guardrails to `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/plan.md`.
- **Result**: The reviewer can confirm lane fit, bounded helper cost, no new heavy/browser promotion, and explicit validation commands in under 3 minutes. The correct outcome is `document-in-feature` because Spec 211 changes governed runtime-reporting behavior inside existing lane families and already records its own drift and recalibration notes.
- **Escalation boundary proved**: A true `follow-up-spec` remains reserved for recurring pain or structural lane-model changes, such as introducing a new heavy family, normalizing browser coverage for a new workflow class, or reviving an expensive shared default across unrelated tests.
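The escalation boundary above can be read as a small decision rule. The sketch below is illustrative only: the `ChangeImpact` field names, the `escalation_outcome` function, and the order of the checks are assumptions made for this example and do not exist in the repository. Only the outcome labels come from the governance vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeImpact:
    """Hypothetical summary of a change's test-cost signals."""

    new_heavy_family: bool = False
    browser_for_new_workflow_class: bool = False
    revives_expensive_shared_default: bool = False
    recurring_pain: bool = False
    changes_governed_behavior_in_existing_lanes: bool = False


def escalation_outcome(impact: ChangeImpact) -> str:
    """Map cost signals to an outcome label (assumed precedence)."""
    structural = (
        impact.new_heavy_family
        or impact.browser_for_new_workflow_class
        or impact.revives_expensive_shared_default
        or impact.recurring_pain
    )
    if structural:
        # Structural lane-model change or recurring pain: needs its own spec.
        return "follow-up-spec"
    if impact.changes_governed_behavior_in_existing_lanes:
        # Stays inside existing lane families: record notes in the feature.
        return "document-in-feature"
    return "keep"
```

Read this way, the Spec 211 dry run corresponds to `ChangeImpact(changes_governed_behavior_in_existing_lanes=True)`, which this sketch maps to `document-in-feature`.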
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Classify Test Impact While Authoring (Priority: P1)
As a contributor preparing a new feature spec or plan, I want the workflow to ask about lane impact, heavy coverage, and fixture cost before implementation begins so I choose the smallest justified test surface instead of defaulting to convenience-first coverage.
**Why this priority**: This is the earliest and cheapest place to stop avoidable suite-cost drift.
**Independent Test**: Apply the workflow to a genuinely low-impact docs-only or template-only scenario, such as a change limited to `.specify/templates/checklist-template.md` and `.specify/README.md`, and confirm that the author can answer with concise `N/A` or `none` responses while still making any affected lanes, new or expanded test families, heavy-surface justification, and minimal validation expectations explicit when they exist.
**Acceptance Scenarios**:
1. **Given** a feature spec that introduces or changes tests, **When** the author completes the required governance prompts, **Then** the spec states the affected lane or lanes, any family expansion, and the required validation scope explicitly.
2. **Given** a proposed test that reaches for database, Livewire, Filament, or browser coverage, **When** the author documents the approach, **Then** the justification and minimal-setup expectation are stated rather than assumed.
---
### User Story 2 - Reviewers Catch Hidden Suite Cost Before Merge (Priority: P1)
As a reviewer evaluating new or changed tests, I want a short guardrail checklist so I can quickly judge whether the test belongs in the chosen lane, whether the setup is too broad, and whether the change needs escalation instead of silent acceptance.
**Why this priority**: Review is the last reliable checkpoint before hidden cost becomes permanent repository truth.
**Independent Test**: Use the canonical generated review checklist on representative test changes and confirm that the reviewer can reach a clear keep, split, or escalate decision without relying on unwritten tribal knowledge.
**Acceptance Scenarios**:
1. **Given** a test that is broader than necessary for its intent, **When** the reviewer applies the checklist, **Then** the checklist makes the breadth and likely narrower alternative visible.
2. **Given** a change that quietly expands a heavy family or shared helper default, **When** the reviewer applies the checklist, **Then** the need for explicit escalation or follow-up governance is surfaced before merge.
---
### User Story 3 - Escalate New Cost Centers Deliberately (Priority: P2)
As a maintainer stewarding suite health, I want clear escalation rules so that new heavy families, new browser scope, or material lane-cost shifts are documented and evaluated explicitly instead of being normalized through drift.
**Why this priority**: The governance model stays durable only if major new cost centers announce themselves early and visibly.
**Independent Test**: Apply the escalation rules to representative examples involving new browser or heavy scope and confirm that the outcome is either a documented local exception or an explicit follow-up governance action.
**Acceptance Scenarios**:
1. **Given** a change that introduces a new heavy family or new browser coverage, **When** the escalation rules are applied, **Then** the change is classified as an explicit governance decision rather than a routine test edit.
2. **Given** a small test change that stays within an existing lane and family, **When** the escalation rules are applied, **Then** the workflow allows the change to proceed without forcing unnecessary process overhead.
### Edge Cases
- A feature has no meaningful runtime or test impact; the workflow must allow concise `N/A` or `none` answers instead of forcing boilerplate.
- One feature legitimately affects multiple existing lanes; the prompts must allow multi-lane disclosure without implying a new family.
- A seemingly small helper or factory default would silently broaden setup cost across many tests; the guardrails must treat this as a governance concern even if the local diff looks minor.
- A reviewer sees budget or baseline implications before CI is red; the escalation rules must allow early documentation rather than waiting for a hard failure.
- A single justified browser or heavy scenario must not automatically bless wider copy-paste expansion into nearby tests.
## Requirements *(mandatory)*
### Functional Requirements
- **FR-001**: The repository MUST define a permanent test authoring constitution that requires explicit test classification, deliberate lane awareness, justified use of database or UI-heavy surfaces, minimal fixtures by default, and rejection of hidden shared-cost growth.
- **FR-002**: The specification routine MUST require a standard test-impact section for every new spec that captures affected lane or lanes, new or expanded test families, heavy or browser relevance, expected budget or trend implications, and reviewer validation expectations, or explicit `N/A` or `none` answers when no such impact exists.
- **FR-003**: The planning routine MUST require a test-impact block that identifies which test types change, whether shared helpers, fixtures, factories, or defaults widen, whether lane reassignment or lane addition is implicated, and what final validation is required.
- **FR-004**: The task routine MUST include a short standardized checklist that confirms lane assignment, avoidance of unnecessary heavy cost, use of minimal fixtures or helpers, relevant validation, and documentation of budget or trend implications when present.
- **FR-005**: The review routine MUST provide a concise guardrail checklist that asks whether the test is in the correct lane, whether it is unnecessarily broad, whether database, Livewire, Filament, or browser usage is justified, whether setup is secretly expensive, whether the test should be split, and whether escalation is required. The canonical checklist surface is the generated review checklist based on `.specify/templates/checklist-template.md`, with `.specify/README.md` linking reviewers to its use.
- **FR-006**: The governance model MUST define explicit escalation rules for new heavy families, new browser coverage, material lane-cost change, broad new Filament or Livewire governance surfaces, revived expensive helper or factory defaults, budget- or baseline-relevant shifts, and major suite reshaping.
- **FR-007**: Contributor guidance MUST explain how to choose between narrow and heavy test surfaces, when database or UI-heavy coverage is justified, how to recognize an overly broad test, and when to extend an existing family versus introduce a new one.
- **FR-008**: The guardrails MUST be integrated into the everyday authoring and review surfaces used by contributors and reviewers, including the constitution, specification routine, planning routine, task routine, review checklist, and contributor guidance.
- **FR-009**: The added governance prompts MUST remain lightweight enough that an ordinary feature with little or no test impact can satisfy them with concise answers and without material process drag.
- **FR-010**: The completed guidance MUST be validated against at least one representative low-risk docs-only or template-only flow, such as a change limited to `.specify/templates/checklist-template.md` and `.specify/README.md`, and one representative higher-cost or multi-lane scenario to confirm that the rules are usable, do not contradict existing lane or budget governance, and catch the intended escalation cases.
- **FR-011**: The governance rules MUST explicitly forbid introducing new expensive shared helper, factory, seed, or fixture defaults without disclosing the cost impact and either containing the change locally or escalating it as governance-relevant work.
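The six reviewer questions in FR-005 can be pictured as a fixed record with a derived decision. This is a sketch under stated assumptions: `ChecklistAnswers`, `open_concerns`, and `review_decision` are hypothetical names, and collapsing every non-escalation concern into `split` is a simplification chosen for the example, not a rule taken from the checklist template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistAnswers:
    """Hypothetical record of a reviewer's answers to the FR-005 questions."""

    correct_lane: bool
    unnecessarily_broad: bool
    heavy_surface_justified: bool  # database, Livewire, Filament, or browser
    setup_secretly_expensive: bool
    should_be_split: bool
    escalation_required: bool


def open_concerns(answers: ChecklistAnswers) -> list[str]:
    """List the questions whose answers indicate a problem."""
    checks = [
        ("wrong lane", not answers.correct_lane),
        ("unnecessarily broad", answers.unnecessarily_broad),
        ("heavy surface not justified", not answers.heavy_surface_justified),
        ("setup secretly expensive", answers.setup_secretly_expensive),
        ("should be split", answers.should_be_split),
        ("escalation required", answers.escalation_required),
    ]
    return [label for label, failed in checks if failed]


def review_decision(answers: ChecklistAnswers) -> str:
    """Return keep, split, or escalate (assumed precedence: escalate first)."""
    if answers.escalation_required:
        return "escalate"
    return "split" if open_concerns(answers) else "keep"
```

The point of the fixed record is that a reviewer cannot skip a question silently: every field must be set before a decision is derived.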
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-001**: In dry runs on at least two representative feature specs, authors can complete the required test-impact prompts with no unanswered required field and with the affected lane or lanes, family impact, and validation scope made explicit.
- **SC-002**: In representative review exercises, reviewers can use the guardrail checklist to reach a clear keep, split, or escalate decision within 3 minutes for each sample change.
- **SC-003**: Every validation example that introduces new browser coverage, a new heavy family, or a material lane-cost shift is explicitly classified as either a documented local exception or a governance escalation; none remain implicit.
- **SC-004**: A representative low-impact docs-only or template-only scenario with no runtime or meaningful test change can satisfy the added governance prompts in under 1 minute using concise `N/A` or `none` answers.
- **SC-005**: Validation against representative specs, plans, and task flows shows no contradiction with the existing lane, budget, baseline, or runtime-trend model established by Specs 206 through 211.

@ -0,0 +1,169 @@
# Tasks: Test Suite Authoring Constitution & Review Guardrails
**Input**: Design documents from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/212-test-authoring-guardrails/`
**Prerequisites**: `plan.md` (required), `spec.md` (required), `research.md`, `data-model.md`, `contracts/`, `quickstart.md`
**Tests**: Not required. This feature is docs and workflow only, so validation is by representative low-impact and higher-cost dry runs, cross-artifact consistency review, and recording the outcomes in the active spec artifacts.
**Organization**: Tasks are grouped by user story so each story can be implemented and validated independently where possible.
## Phase 1: Setup (Shared Context)
**Purpose**: Freeze the real repository workflow surfaces before editing them.
- [X] T001 Audit `.specify/memory/constitution.md`, `.specify/templates/spec-template.md`, `.specify/templates/plan-template.md`, `.specify/templates/tasks-template.md`, `.specify/templates/checklist-template.md`, `.specify/README.md`, and `README.md` against `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/plan.md` to confirm the exact guardrail gaps this feature must close
---
## Phase 2: Foundational (Blocking Prerequisites)
**Purpose**: Establish the shared vocabulary that every later template and checklist update depends on.
**Critical**: No user story work should begin until this phase is complete.
- [X] T002 Update `.specify/memory/constitution.md` with the canonical test authoring and review guardrail rules for classification, lane awareness, heavy-surface justification, minimal fixtures, expensive-default bans, reviewer expectations, and escalation outcomes
**Checkpoint**: The shared governance vocabulary is stable enough for story-specific template and guidance updates.
---
## Phase 3: User Story 1 - Classify Test Impact While Authoring (Priority: P1) 🎯 MVP
**Goal**: Make contributors declare lane impact, heavy justification, and minimal proof while writing specs and plans.
**Independent Test**: Apply the updated spec and plan prompts to a genuinely low-impact template-only scenario limited to `.specify/templates/checklist-template.md` and `.specify/README.md`, confirm the low-impact path can be answered with concise `N/A` or `none` responses, and verify the required authoring questions are explicit.
### Implementation for User Story 1
- [X] T003 [P] [US1] Update `.specify/templates/spec-template.md` so `Testing / Lane / Runtime Impact` explicitly asks for affected lane fit, heavy-surface justification, fixture or helper cost disclosure, escalation triggers, and concise `N/A` or `none` handling
- [X] T004 [P] [US1] Update `.specify/templates/plan-template.md` so `Test Governance Check` explicitly asks for changed test types, helper or factory widening, lane reshaping, closing validation, and where material drift notes must be recorded
- [X] T005 [US1] Validate the authoring flow using a low-impact template-only scenario limited to `.specify/templates/checklist-template.md` and `.specify/README.md` as the representative `N/A` path, then record the outcome and any wording adjustments in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`
**Checkpoint**: Contributors can classify test impact during spec and plan authoring without extra workflow overhead.
---
## Phase 4: User Story 2 - Reviewers Catch Hidden Suite Cost Before Merge (Priority: P1)
**Goal**: Give reviewers a fixed, quick checklist that surfaces hidden test cost and points to clear outcomes.
**Independent Test**: Apply the updated review checklist to a representative higher-cost governed spec flow and confirm the reviewer can reach a keep, split, or escalate decision in under 3 minutes.
### Implementation for User Story 2
- [X] T006 [P] [US2] Update `.specify/templates/tasks-template.md` so generated task lists carry a short test-governance checklist for lane assignment, minimal setup, relevant validation, hidden-cost prevention, and budget or trend note visibility
- [X] T007 [P] [US2] Update `.specify/templates/checklist-template.md` as the canonical generated review-checklist surface with a fixed guardrail structure covering lane fit, breadth, DB or UI-heavy necessity, setup cost, split need, and escalation outcomes
- [X] T008 [US2] Update `.specify/README.md` as the reviewer entry point for applying the canonical review checklist and interpreting `keep`, `split`, `document-in-feature`, `follow-up-spec`, and `reject-or-split` outcomes
- [X] T009 [US2] Validate the review guardrails against `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/plan.md`, then record the representative review outcome and timing note in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`
**Checkpoint**: Reviewers have a stable guardrail surface that catches hidden suite cost before merge.
---
## Phase 5: User Story 3 - Escalate New Cost Centers Deliberately (Priority: P2)
**Goal**: Make new heavy families, new browser scope, revived expensive defaults, and material lane-cost shifts announce themselves explicitly.
**Independent Test**: Apply the escalation rules to a representative higher-cost multi-lane workflow and confirm the result distinguishes between local documentation and a true follow-up governance action.
### Implementation for User Story 3
- [X] T010 [P] [US3] Update `README.md` with concise contributor guidance for choosing unit vs feature vs heavy-governance vs browser coverage, justifying database or UI-heavy usage, recognizing over-broad tests, spotting escalation triggers early, and deciding when to extend an existing family versus introduce a new one
- [X] T011 [US3] Update `specs/212-test-authoring-guardrails/quickstart.md` with the canonical low-impact validation scenario, the canonical review-checklist surface, and explicit `document-in-feature` vs `follow-up-spec` escalation examples that match the live templates and docs
- [X] T012 [US3] Validate escalation handling against a representative higher-cost multi-lane flow using `specs/211-runtime-trend-recalibration/spec.md`, `specs/211-runtime-trend-recalibration/plan.md`, and the implemented guidance surfaces, then record `document-in-feature` vs `follow-up-spec` examples in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`
**Checkpoint**: Escalation-worthy test cost changes are explicit, documented, and consistently interpreted.
---
## Phase 6: Polish & Cross-Cutting Concerns
**Purpose**: Reconcile the finished workflow surfaces and remove drift between the templates, guidance, and active spec artifacts.
- [X] T013 Run the `specs/212-test-authoring-guardrails/quickstart.md` completion checklist against `.specify/memory/constitution.md`, `.specify/templates/spec-template.md`, `.specify/templates/plan-template.md`, `.specify/templates/tasks-template.md`, `.specify/templates/checklist-template.md`, `.specify/README.md`, `README.md`, and `specs/212-test-authoring-guardrails/spec.md`, then remove any duplicated or conflicting wording across the updated governance surfaces
---
## Dependencies & Execution Order
### Phase Dependencies
- **Setup (Phase 1)**: No dependencies and can start immediately.
- **Foundational (Phase 2)**: Depends on Phase 1 and blocks all user story work.
- **User Story 1 (Phase 3)**: Depends on Phase 2 and is the MVP slice.
- **User Story 2 (Phase 4)**: Depends on Phase 2 and can proceed independently of User Story 1 once the shared vocabulary is stable.
- **User Story 3 (Phase 5)**: Depends on Phase 2 and benefits from the implemented authoring and review surfaces from User Stories 1 and 2 before final escalation examples are recorded.
- **Polish (Phase 6)**: Depends on all desired user stories being complete.
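The phase ordering above forms a small dependency graph. As a hedged sketch, the standard-library `graphlib.TopologicalSorter` (Python 3.9+) can group the phases into waves that may run concurrently. The `PHASE_DEPENDENCIES` mapping and the `execution_waves` helper are illustrative names. The mapping models only the hard dependencies listed above, treats Polish as waiting on all three stories, and deliberately leaves out the softer "benefits from" relationship between User Story 3 and the first two stories.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it depends on (hard dependencies only).
PHASE_DEPENDENCIES = {
    "Setup": set(),
    "Foundational": {"Setup"},
    "User Story 1": {"Foundational"},
    "User Story 2": {"Foundational"},
    "User Story 3": {"Foundational"},
    "Polish": {"User Story 1", "User Story 2", "User Story 3"},
}


def execution_waves(dependencies):
    """Group phases into waves; phases within one wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(execution_waves(PHASE_DEPENDENCIES))
# [['Setup'], ['Foundational'],
#  ['User Story 1', 'User Story 2', 'User Story 3'], ['Polish']]
```

The third wave reflects that the user stories are unblocked together once Foundational is complete, even though User Story 3 is best finalized after the other two.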
### User Story Dependencies
- **User Story 1 (P1)**: Can begin immediately after Foundational and delivers the first usable authoring workflow increment.
- **User Story 2 (P1)**: Can begin immediately after Foundational and delivers a separate review workflow increment.
- **User Story 3 (P2)**: Reuses the stable vocabulary from Foundational and should finalize once the live authoring and review surfaces are in place.
### Within Each User Story
- Shared vocabulary changes in `.specify/memory/constitution.md` must land before any template or checklist wording is finalized.
- Template changes should be implemented before story-specific validation notes are recorded in `spec.md` and `quickstart.md`.
- Low-impact and higher-cost dry-run validation must complete before closing the corresponding story.
- Cross-artifact cleanup should happen only after all targeted workflow surfaces are updated.
### Parallel Opportunities
- T003 and T004 can run in parallel because they update different template surfaces for the same authoring flow.
- T006 and T007 can run in parallel because they update different checklist-producing template surfaces.
- T010 can run in parallel with recording the validation notes for the earlier stories, once the shared vocabulary is stable, because it targets root contributor guidance rather than the SpecKit templates.
---
## Parallel Example: User Story 1
```bash
# After T002 establishes the shared vocabulary, these can proceed in parallel:
Task: "Update .specify/templates/spec-template.md"
Task: "Update .specify/templates/plan-template.md"
```
---
## Parallel Example: User Story 2
```bash
# After T002 establishes the shared vocabulary, these can proceed in parallel:
Task: "Update .specify/templates/tasks-template.md"
Task: "Update .specify/templates/checklist-template.md"
```
---
## Implementation Strategy
### MVP First (User Story 1 Only)
1. Complete Phase 1: Setup.
2. Complete Phase 2: Foundational.
3. Complete Phase 3: User Story 1.
4. Validate the low-impact `N/A` path using a template-only scenario limited to `.specify/templates/checklist-template.md` and `.specify/README.md` before continuing.
### Incremental Delivery
1. Lock the shared constitution vocabulary first.
2. Deliver the authoring prompts for specs and plans.
3. Deliver the reviewer-facing task and checklist surfaces.
4. Add contributor guidance and explicit escalation examples.
5. Finish with cross-artifact cleanup and quickstart completion review.
### Parallel Team Strategy
1. One contributor can update the spec and plan templates while another prepares the task and checklist template changes after Foundational is done.
2. Reviewer guidance in `.specify/README.md` can follow once the checklist surface is stable.
3. Root `README.md` contributor guidance and final escalation examples can be completed in parallel with late-stage validation-note drafting.
---
## Notes
- `[P]` tasks operate on different files or independent workflow surfaces and can run in parallel once dependencies are satisfied.
- `[US1]`, `[US2]`, and `[US3]` map tasks directly to the user stories in `spec.md`.
- This feature is docs and workflow only, so validation is recorded in the active spec artifacts rather than by running Pest lanes.
- The final workflow must stay lightweight for low-impact work while still surfacing explicit escalation for new test cost centers.
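The `[P]` and `[USn]` markers described in these notes follow a regular shape, so a task line can be parsed mechanically. The sketch below is an assumption-laden illustration rather than an existing SpecKit utility: the `Task` dataclass, the `parse_task` function, and the regular expression are hypothetical, and the pattern only covers the line shape used in this file (`- [X] T003 [P] [US1] description`).

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed shape: "- [X] T003 [P] [US1] description"; [P] and [USn] are optional.
TASK_LINE = re.compile(
    r"^- \[(?P<done>[ xX])\] (?P<id>T\d{3})"
    r"(?P<parallel> \[P\])?"
    r"(?: \[(?P<story>US\d+)\])?"
    r" (?P<description>.+)$"
)


@dataclass(frozen=True)
class Task:
    task_id: str
    done: bool
    parallel: bool
    story: Optional[str]
    description: str


def parse_task(line: str) -> Optional[Task]:
    """Parse one task bullet; return None for lines that are not tasks."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    return Task(
        task_id=match.group("id"),
        done=match.group("done").lower() == "x",
        parallel=match.group("parallel") is not None,
        story=match.group("story"),
        description=match.group("description"),
    )
```

For example, `parse_task("- [X] T003 [P] [US1] Update the spec template")` yields a completed, parallelizable task attached to `US1`, while headings and checkpoint lines return `None`.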