TenantAtlas/specs/212-test-authoring-guardrails/quickstart.md
ahmido ea9ef9cb38
docs: add Spec 212 test authoring guardrails (#245)
## Summary

- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation

- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245
2026-04-18 10:08:00 +00:00

# Quickstart: Test Suite Authoring Constitution & Review Guardrails
This feature is repository-governance only. It does not change application runtime behavior, validation lanes, or deployment infrastructure. The goal is to tighten the authoring and review workflow so future test changes declare cost and escalation signals earlier.
## 1. Confirm the implementation surfaces
Review the files that already serve as the source of truth for the test-governance workflow:
- `.specify/memory/constitution.md`
- `.specify/templates/spec-template.md`
- `.specify/templates/plan-template.md`
- `.specify/templates/tasks-template.md`
- `.specify/templates/checklist-template.md`
- `.specify/README.md`
- `README.md`

Do not create a parallel handbook unless one of these surfaces cannot carry the needed guidance cleanly.
## 2. Tighten the constitution first
Update the constitution so the standing rules are explicit about:
- authoring-time classification of new and changed tests
- naming affected lanes when runtime behavior or tests change
- justifying database, Livewire, Filament, or browser usage
- keeping fixtures, helpers, factories, and seeds cheap by default
- escalating new heavy families, new browser scope, revived expensive defaults, or material lane-cost shifts
- giving reviewers a stable, decision-grade checklist to review against

Keep the language short and binding.
## 3. Update the template surfaces in order
Apply the same vocabulary consistently across the authoring workflow:
1. `spec-template.md`: strengthen the existing `Testing / Lane / Runtime Impact` block with authoring-time classification and escalation prompts.
2. `plan-template.md`: strengthen the `Test Governance Check` so helper widening, lane reshaping, and closing validation are explicit before implementation.
3. `tasks-template.md`: standardize the short task-level governance checklist.
4. `checklist-template.md` (the canonical generated review-checklist surface, with `.specify/README.md` as the reviewer entry point): provide the fixed review guardrail questions and the expected escalation outcomes.

Avoid asking the same question in three different ways across the templates.
## 4. Update contributor guidance
Keep the contributor-facing explanation concise and practical:
- how to answer `N/A` or `none` for low-impact work
- how to choose between unit, feature, heavy-governance, and browser coverage
- when database or UI-heavy coverage is justified
- when a test has become too broad
- when to extend an existing family versus introduce a new one
- when a change stays local versus needs escalation

Prefer updating `.specify/README.md` and `README.md` over adding a new long-lived documentation file.
## 5. Run the two required dry runs
### Low-impact validation path
Use a genuinely low-impact template-only or docs-only change, such as a change limited to `.specify/templates/checklist-template.md` and `.specify/README.md`, to prove that:
- the spec prompt can be completed with `N/A` or `none`
- the plan prompt does not demand runtime lanes
- the review checklist stays brief and still ends with an explicit `keep` outcome without forcing fake escalation

Expected authoring answers:
- affected validation lanes: `N/A`
- test purpose / family impact: `none`
- DB / Livewire / Filament / browser usage: `none`
- fixture / helper / factory / seed / context cost impact: `none`
- escalation triggers and outcome: `none`

Expected result: the workflow remains lightweight, and the authoring prompts can be completed in under 1 minute.
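
As an illustration only, the expected authoring answers above can be modelled as a small record. This sketch is not part of the SpecKit templates: the `AuthoringAnswers` class, its field names, and the `is_low_impact` helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, fields

# Answers that count as "no impact" in the low-impact path described above.
NO_IMPACT = {"n/a", "none"}


@dataclass(frozen=True)
class AuthoringAnswers:
    """Hypothetical record of the five authoring prompts (field names are assumptions)."""

    affected_lanes: str
    family_impact: str
    heavy_surface_usage: str  # DB / Livewire / Filament / browser
    shared_cost_impact: str   # fixture / helper / factory / seed / context
    escalation: str


def is_low_impact(answers: AuthoringAnswers) -> bool:
    """Return True when every prompt was answered with `N/A` or `none`."""
    return all(
        getattr(answers, f.name).strip().lower() in NO_IMPACT
        for f in fields(answers)
    )


docs_only = AuthoringAnswers("N/A", "none", "none", "none", "none")
print(is_low_impact(docs_only))  # True
```

A change that names a real lane (for example `Feature`) in any field would no longer count as low-impact under this sketch, which mirrors the intent that `N/A` and `none` are reserved for genuinely docs-only or template-only work.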
### Canonical review-checklist surface
The generated checklist based on `.specify/templates/checklist-template.md` should ask exactly these decision-grade questions:
- Is the declared validation lane the narrowest lane or lane mix that proves the change?
- Does the test stay in the smallest honest family (`Unit`, `Feature`, `Heavy-Governance`, `Browser`)?
- Is the changed or added test no broader than the behavior it proves?
- Is any database, Livewire, Filament, or browser surface justified over a narrower alternative?
- Do shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default?
- Is the minimal reviewer validation command written explicitly, and is any material drift note recorded?
- Does the reviewer choose one explicit outcome: `keep`, `split`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`?
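
The final question requires exactly one explicit outcome. A minimal sketch of that rule follows, assuming a reviewer's selections arrive as a list of strings; the `single_outcome` function is a hypothetical helper and not an existing SpecKit or repository API.

```python
# The five outcomes named in the final checklist question.
ALLOWED_OUTCOMES = (
    "keep",
    "split",
    "document-in-feature",
    "follow-up-spec",
    "reject-or-split",
)


def single_outcome(selected: list[str]) -> str:
    """Return the one chosen outcome, or raise if the review is ambiguous."""
    unknown = [o for o in selected if o not in ALLOWED_OUTCOMES]
    if unknown:
        raise ValueError(f"unknown outcome(s): {unknown}")
    if len(selected) != 1:
        raise ValueError(f"expected exactly one outcome, got {len(selected)}")
    return selected[0]


print(single_outcome(["document-in-feature"]))  # document-in-feature
```

Raising on zero or multiple selections keeps the check aligned with the checklist wording: a review without a single explicit outcome is treated as incomplete rather than silently defaulted.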
### High-impact validation path
Use an existing governed spec such as `specs/211-runtime-trend-recalibration/` or a similar multi-lane runtime-governance feature to prove that:
- the spec prompt surfaces lane and heavy-surface choices clearly
- the plan prompt exposes helper, fixture, or lane-shape impact
- the review checklist can ask whether the change creates a new cost center
- the escalation rules distinguish between local documentation and a true follow-up spec

Expected review outcome for `specs/211-runtime-trend-recalibration/`: `document-in-feature`, because the spec already records its own validation lanes, bounded fixture risk, unchanged heavy/browser scope, and runtime drift follow-up inside the active feature.

Expected result: the workflow surfaces the intended escalation decisions without adding a new approval bureaucracy and keeps the representative higher-cost review under 3 minutes.
## 6. Record the validation note
Capture the dry-run outcome in the active spec or implementation PR with at least:
- low-impact scenario used
- high-impact scenario used
- whether any prompt wording was confusing
- which explicit review outcome was chosen
- whether any review question felt redundant or missing
- whether the process stayed lightweight enough for ordinary work
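
One way to make sure a validation note covers all six items is a simple completeness check. The sketch below assumes the note is held as a plain dictionary; the key names and the `missing_fields` helper are hypothetical and chosen only to mirror the list above.

```python
# Keys mirror the six items a validation note should contain (names are assumptions).
REQUIRED_FIELDS = (
    "low_impact_scenario",
    "high_impact_scenario",
    "confusing_prompt_wording",
    "review_outcome",
    "redundant_or_missing_questions",
    "stayed_lightweight",
)


def missing_fields(note: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or blank in the note."""
    return [key for key in REQUIRED_FIELDS if not note.get(key, "").strip()]


draft = {
    "low_impact_scenario": "checklist-template.md plus .specify/README.md",
    "review_outcome": "keep",
}
print(missing_fields(draft))
```

For the partial `draft` above, the helper reports the four items that still need to be written down before the note is complete.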
## 7. Recorded dry-run results (2026-04-18)
- **Low-impact scenario**: `.specify/templates/checklist-template.md` plus `.specify/README.md`
  Result: the spec and plan prompts were answerable with `N/A` or `none` in under 1 minute, and the checklist still closed with a clear `keep` outcome.
- **Higher-cost scenario**: `specs/211-runtime-trend-recalibration/spec.md` plus `specs/211-runtime-trend-recalibration/plan.md`
  Result: the reviewer could reach `document-in-feature` in under 3 minutes because Spec 211 already documents lane fit, bounded helper cost, unchanged heavy/browser scope, and runtime-drift follow-up inside the active delivery artifact.
- **Document-in-feature example**: a feature widens evidence or trend reporting inside an existing governed lane family and records the runtime or recalibration note in its own spec or PR.
- **Follow-up-spec example**: a change introduces a new heavy family, normalizes browser coverage for a new workflow class, or revives an expensive shared default across multiple unrelated tests.
## 8. Completion checklist
- Constitution wording updated and aligned with `TEST-GOV-001`
- Spec, plan, task, and review surfaces use the same lane and escalation vocabulary
- Contributor guidance explains both low-impact and escalation-worthy cases
- Dry runs confirm the workflow is both usable and sufficiently strict
- The review checklist ends with one explicit outcome
- No new governance subsystem, bot, or duplicate handbook was introduced