## Summary
- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation
- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245
Quickstart: Test Suite Authoring Constitution & Review Guardrails
This feature is repository-governance only. It does not change application runtime behavior, validation lanes, or deployment infrastructure. The goal is to tighten the authoring and review workflow so future test changes declare cost and escalation signals earlier.
1. Confirm the implementation surfaces
Review the files that already carry test-governance workflow truth:
- .specify/memory/constitution.md
- .specify/templates/spec-template.md
- .specify/templates/plan-template.md
- .specify/templates/tasks-template.md
- .specify/templates/checklist-template.md
- .specify/README.md
- README.md
Do not create a parallel handbook unless one of these surfaces cannot carry the needed guidance cleanly.
2. Tighten the constitution first
Update the constitution so the standing rules are explicit about:
- authoring-time classification of new and changed tests
- naming affected lanes when runtime behavior or tests change
- justifying database, Livewire, Filament, or browser usage
- keeping fixtures, helpers, factories, and seeds cheap by default
- escalating new heavy families, new browser scope, revived expensive defaults, or material lane-cost shifts
- giving reviewers a stable decision-grade checklist target
Keep the language short and binding.
3. Update the template surfaces in order
Apply the same vocabulary consistently across the authoring workflow:
- spec-template.md: strengthen the existing Testing / Lane / Runtime Impact block with authoring-time classification and escalation prompts.
- plan-template.md: strengthen the Test Governance Check so helper widening, lane reshaping, and closing validation are explicit before implementation.
- tasks-template.md: standardize the short task-level governance checklist.
- checklist-template.md as the canonical generated review-checklist surface, with .specify/README.md as the reviewer entry point: provide the fixed review guardrail questions and expected escalation outcomes.
Avoid asking the same question in three different ways across the templates.
4. Update contributor guidance
Keep the contributor-facing explanation concise and practical:
- how to answer N/A or none for low-impact work
- how to choose between unit, feature, heavy-governance, and browser coverage
- when database or UI-heavy coverage is justified
- when a test has become too broad
- when to extend an existing family versus introduce a new one
- when a change stays local versus needs escalation
Prefer updating .specify/README.md and README.md over adding a new long-lived documentation file.
5. Run the two required dry runs
Low-impact validation path
Use a genuinely low-impact template-only or docs-only change, such as a change limited to .specify/templates/checklist-template.md and .specify/README.md, to prove that:
- the spec prompt can be completed with N/A or none
- the plan prompt does not demand runtime lanes
- the review checklist stays brief and still ends with an explicit keep outcome without forcing fake escalation
Expected authoring answers:
- affected validation lanes: N/A
- test purpose / family impact: none
- DB / Livewire / Filament / browser usage: none
- fixture / helper / factory / seed / context cost impact: none
- escalation triggers and outcome: none
Expected result: the workflow remains lightweight and completable in under 1 minute for the authoring prompts.
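The low-impact expectation above can be sketched as a small check. This is only an illustration: the dictionary keys and the `is_low_impact` helper are hypothetical names, since the quickstart lists the prompts but does not define a machine-readable schema for them.

```python
# Hypothetical sketch: treat a change as low-impact when every authoring
# prompt is answered with "N/A" or "none". Keys are illustrative only.
LOW_IMPACT_ANSWERS = {"n/a", "none"}


def is_low_impact(answers: dict[str, str]) -> bool:
    """Return True when all authoring prompts carry a low-impact answer."""
    return bool(answers) and all(
        value.strip().lower() in LOW_IMPACT_ANSWERS for value in answers.values()
    )


# The expected authoring answers for the low-impact dry run.
low_impact_example = {
    "affected_validation_lanes": "N/A",
    "test_purpose_family_impact": "none",
    "db_livewire_filament_browser_usage": "none",
    "fixture_helper_factory_seed_context_cost_impact": "none",
    "escalation_triggers_and_outcome": "none",
}
```

Any single answer other than N/A or none, for example naming a feature lane, makes the helper return False, which mirrors the point at which the fuller prompts would start to apply.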
Canonical review-checklist surface
The generated checklist based on .specify/templates/checklist-template.md should ask exactly these decision-grade questions:
- Is the declared validation lane the narrowest lane or lane mix that proves the change?
- Does the test stay in the smallest honest family (Unit, Feature, Heavy-Governance, Browser)?
- Is the changed or added test no broader than the behavior it proves?
- Is any database, Livewire, Filament, or browser surface justified over a narrower alternative?
- Do shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default?
- Is the minimal reviewer validation command written explicitly, and is any material drift note recorded?
- Does the reviewer choose one explicit outcome: keep, split, document-in-feature, follow-up-spec, or reject-or-split?
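As a rough sketch of how a completed checklist could be sanity-checked, the snippet below records the six yes/no guardrail answers plus the closing outcome. The outcome names come from the checklist itself; the short question keys, `ReviewRecord`, and `review_problems` are hypothetical and not part of the templates.

```python
from dataclasses import dataclass

# The five outcomes named in the checklist's closing question.
REVIEW_OUTCOMES = (
    "keep",
    "split",
    "document-in-feature",
    "follow-up-spec",
    "reject-or-split",
)

# Hypothetical short keys standing in for the six yes/no guardrail questions.
GUARDRAIL_KEYS = (
    "narrowest_lane",
    "smallest_honest_family",
    "no_broader_than_behavior",
    "heavy_surface_justified",
    "shared_defaults_cheap",
    "validation_command_and_drift_note",
)


@dataclass(frozen=True)
class ReviewRecord:
    answers: dict[str, bool]  # guardrail key -> reviewer's yes/no answer
    outcome: str              # must be exactly one of REVIEW_OUTCOMES


def review_problems(record: ReviewRecord) -> list[str]:
    """List what is missing or invalid; an empty list means the record is complete."""
    problems = [f"unanswered: {key}" for key in GUARDRAIL_KEYS if key not in record.answers]
    if record.outcome not in REVIEW_OUTCOMES:
        problems.append(f"unknown outcome: {record.outcome!r}")
    return problems
```

The check only enforces that every question was answered and that exactly one named outcome was chosen; it deliberately does not derive the outcome from the answers, because that judgment stays with the reviewer.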
High-impact validation path
Use an existing governed spec such as specs/211-runtime-trend-recalibration/ or a similar multi-lane runtime-governance feature to prove that:
- the spec prompt surfaces lane and heavy-surface choices clearly
- the plan prompt exposes helper, fixture, or lane-shape impact
- the review checklist can ask whether the change creates a new cost center
- the escalation rules distinguish between local documentation and a true follow-up spec
Expected review outcome for specs/211-runtime-trend-recalibration/: document-in-feature, because the spec already records its own validation lanes, bounded fixture risk, unchanged heavy/browser scope, and runtime drift follow-up inside the active feature.
Expected result: the workflow surfaces the intended escalation decisions without adding a new approval bureaucracy and keeps the representative higher-cost review under 3 minutes.
6. Record the validation note
Capture the dry-run outcome in the active spec or implementation PR with at least:
- low-impact scenario used
- high-impact scenario used
- whether any prompt wording was confusing
- which explicit review outcome was chosen
- whether any review question felt redundant or missing
- whether the process stayed lightweight enough for ordinary work
7. Recorded dry-run results (2026-04-18)
- Low-impact scenario: .specify/templates/checklist-template.md plus .specify/README.md
  Result: the spec and plan prompts were answerable with N/A or none in under 1 minute, and the checklist still closed with a clear keep outcome.
- Higher-cost scenario: specs/211-runtime-trend-recalibration/spec.md plus specs/211-runtime-trend-recalibration/plan.md
  Result: the reviewer could reach document-in-feature in under 3 minutes because Spec 211 already documents lane fit, bounded helper cost, unchanged heavy/browser scope, and runtime-drift follow-up inside the active delivery artifact.
- Document-in-feature example: a feature widens evidence or trend reporting inside an existing governed lane family and records the runtime or recalibration note in its own spec or PR.
- Follow-up-spec example: a change introduces a new heavy family, normalizes browser coverage for a new workflow class, or revives an expensive shared default across multiple unrelated tests.
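The distinction between the two examples above can be sketched as a simple decision helper. This is a simplification under stated assumptions: the `ChangeSignals` fields and `suggest_outcome` are hypothetical, the fallback to keep is an assumption rather than a recorded rule, and the split and reject-or-split outcomes are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSignals:
    # Follow-up-spec triggers taken from the recorded example.
    introduces_new_heavy_family: bool = False
    normalizes_browser_for_new_workflow_class: bool = False
    revives_expensive_shared_default: bool = False
    # Document-in-feature signal: the runtime or recalibration note is
    # recorded in the feature's own spec or PR.
    records_note_in_feature: bool = False


def suggest_outcome(signals: ChangeSignals) -> str:
    """Suggest a review outcome; the reviewer still makes the final call."""
    if (
        signals.introduces_new_heavy_family
        or signals.normalizes_browser_for_new_workflow_class
        or signals.revives_expensive_shared_default
    ):
        return "follow-up-spec"
    if signals.records_note_in_feature:
        return "document-in-feature"
    # Assumption: with no signals set, the change stays local.
    return "keep"
```

Checking the follow-up-spec triggers first reflects the idea that a new cost center outweighs a locally recorded note; a real review may weigh these signals differently.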
8. Completion checklist
- Constitution wording updated and aligned with TEST-GOV-001
- Spec, plan, task, and review surfaces use the same lane and escalation vocabulary
- Contributor guidance explains both low-impact and escalation-worthy cases
- Dry runs confirm the workflow is both usable and sufficiently strict
- The review checklist ends with one explicit outcome
- No new governance subsystem, bot, or duplicate handbook was introduced