docs: add Spec 212 test authoring guardrails (#245)

## Summary

- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation

- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245

2026-04-18 10:08:00 +00:00

Quickstart: Test Suite Authoring Constitution & Review Guardrails

This feature is repository-governance only. It does not change application runtime behavior, validation lanes, or deployment infrastructure. The goal is to tighten the authoring and review workflow so future test changes declare cost and escalation signals earlier.

1. Confirm the implementation surfaces

Review the files that already carry test-governance workflow truth:

.specify/memory/constitution.md
.specify/templates/spec-template.md
.specify/templates/plan-template.md
.specify/templates/tasks-template.md
.specify/templates/checklist-template.md
.specify/README.md
README.md

Do not create a parallel handbook unless one of these surfaces cannot carry the needed guidance cleanly.
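As a minimal sketch of the confirmation step above, the following Python helper reports which of the listed surfaces are absent from a checkout. The function name `missing_surfaces` and the idea of scripting this check are assumptions for illustration; the repository itself may not ship such a script.

```python
from pathlib import Path

# Paths copied from the list above; the repository root is supplied by the caller.
SURFACES = [
    ".specify/memory/constitution.md",
    ".specify/templates/spec-template.md",
    ".specify/templates/plan-template.md",
    ".specify/templates/tasks-template.md",
    ".specify/templates/checklist-template.md",
    ".specify/README.md",
    "README.md",
]


def missing_surfaces(repo_root: str) -> list[str]:
    """Return the listed surfaces that are not regular files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in SURFACES if not (root / rel).is_file()]
```

Calling `missing_surfaces(".")` from the repository root should return an empty list when every surface is present.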

2. Tighten the constitution first

Update the constitution so the standing rules are explicit about:

authoring-time classification of new and changed tests
naming affected lanes when runtime behavior or tests change
justifying database, Livewire, Filament, or browser usage
keeping fixtures, helpers, factories, and seeds cheap by default
escalating new heavy families, new browser scope, revived expensive defaults, or material lane-cost shifts
giving reviewers a stable decision-grade checklist target

Keep the language short and binding.

3. Update the template surfaces in order

Apply the same vocabulary consistently across the authoring workflow:

spec-template.md: strengthen the existing Testing / Lane / Runtime Impact block with authoring-time classification and escalation prompts.
plan-template.md: strengthen the Test Governance Check so helper widening, lane reshaping, and closing validation are explicit before implementation.
tasks-template.md: standardize the short task-level governance checklist.
checklist-template.md (the canonical generated review-checklist surface, with .specify/README.md as the reviewer entry point): provide the fixed review guardrail questions and expected escalation outcomes.

Avoid asking the same question in three different ways across the templates.
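One way to spot vocabulary drift across the templates is a simple term check. The sketch below is hypothetical: the helper `missing_terms` and the example terms are illustrative assumptions, not a vocabulary defined by `TEST-GOV-001`.

```python
def missing_terms(texts: dict[str, str], terms: list[str]) -> dict[str, list[str]]:
    """Map each surface name to the shared terms it never mentions.

    Matching is a case-insensitive substring test, so "validation lane"
    also matches "Validation lanes".
    """
    gaps: dict[str, list[str]] = {}
    for name, text in texts.items():
        lowered = text.lower()
        absent = [term for term in terms if term.lower() not in lowered]
        if absent:
            gaps[name] = absent
    return gaps
```

Given the template contents keyed by file name and a list such as `["validation lane", "escalation"]` (assumed terms), the result names only the surfaces that still lack a term. It does not detect the opposite problem of one question being asked several ways, which still needs a human read.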

4. Update contributor guidance

Keep the contributor-facing explanation concise and practical:

how to answer N/A or none for low-impact work
how to choose between unit, feature, heavy-governance, and browser coverage
when database or UI-heavy coverage is justified
when a test has become too broad
when to extend an existing family versus introduce a new one
when a change stays local versus needs escalation

Prefer updating .specify/README.md and README.md over adding a new long-lived documentation file.

5. Run the two required dry runs

Low-impact validation path

Use a genuinely low-impact template-only or docs-only change, such as a change limited to .specify/templates/checklist-template.md and .specify/README.md, to prove that:

the spec prompt can be completed with N/A or none
the plan prompt does not demand runtime lanes
the review checklist stays brief and still ends with an explicit keep outcome without forcing fake escalation

Expected authoring answers:

affected validation lanes: N/A
test purpose / family impact: none
DB / Livewire / Filament / browser usage: none
fixture / helper / factory / seed / context cost impact: none
escalation triggers and outcome: none

Expected result: the workflow remains lightweight and completable in under 1 minute for the authoring prompts.
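The expected low-impact answers can be written down as data, which makes the "N/A or none" rule easy to check. This is a sketch only: the prompt labels are copied from the list above, while the dictionary layout and the `is_low_impact` helper are assumptions for illustration.

```python
# Prompt labels and expected values copied from the low-impact list above.
LOW_IMPACT_ANSWERS = {
    "affected validation lanes": "N/A",
    "test purpose / family impact": "none",
    "DB / Livewire / Filament / browser usage": "none",
    "fixture / helper / factory / seed / context cost impact": "none",
    "escalation triggers and outcome": "none",
}


def is_low_impact(answers: dict[str, str]) -> bool:
    """True when every prompt is answered and each answer is N/A or none."""
    if set(answers) != set(LOW_IMPACT_ANSWERS):
        return False  # a prompt is missing or an unknown prompt was added
    return all(value.strip().lower() in {"n/a", "none"} for value in answers.values())
```

Any answer that names a lane, a surface, or a trigger makes `is_low_impact` return `False`, which is the point at which the fuller review path applies.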

Canonical review-checklist surface

The generated checklist based on .specify/templates/checklist-template.md should ask exactly these decision-grade questions:

Is the declared validation lane the narrowest lane or lane mix that proves the change?
Does the test stay in the smallest honest family (Unit, Feature, Heavy-Governance, Browser)?
Is the changed or added test no broader than the behavior it proves?
Is any database, Livewire, Filament, or browser surface justified over a narrower alternative?
Do shared helpers, factories, seeds, fixtures, and context defaults stay cheap by default?
Is the minimal reviewer validation command written explicitly, and is any material drift note recorded?
Does the reviewer choose one explicit outcome: keep, split, document-in-feature, follow-up-spec, or reject-or-split?
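The final question requires exactly one outcome from a fixed set. A minimal sketch of that rule follows; the outcome strings come from the question above, while the `ReviewOutcome` enum and `parse_outcome` helper are hypothetical names.

```python
from enum import Enum


class ReviewOutcome(Enum):
    # Values copied from the last checklist question above.
    KEEP = "keep"
    SPLIT = "split"
    DOCUMENT_IN_FEATURE = "document-in-feature"
    FOLLOW_UP_SPEC = "follow-up-spec"
    REJECT_OR_SPLIT = "reject-or-split"


def parse_outcome(recorded: list[str]) -> ReviewOutcome:
    """Return the single recorded outcome.

    Raises ValueError when zero or several outcomes are recorded,
    or when the recorded value is not one of the known outcomes.
    """
    if len(recorded) != 1:
        raise ValueError(f"expected exactly one outcome, got {len(recorded)}")
    return ReviewOutcome(recorded[0].strip().lower())
```

For example, `parse_outcome(["keep"])` returns `ReviewOutcome.KEEP`, while an empty list or two entries raises `ValueError`.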

High-impact validation path

Use an existing governed spec such as specs/211-runtime-trend-recalibration/ or a similar multi-lane runtime-governance feature to prove that:

the spec prompt surfaces lane and heavy-surface choices clearly
the plan prompt exposes helper, fixture, or lane-shape impact
the review checklist can ask whether the change creates a new cost center
the escalation rules distinguish between local documentation and a true follow-up spec

Expected review outcome for specs/211-runtime-trend-recalibration/: document-in-feature, because the spec already records its own validation lanes, bounded fixture risk, unchanged heavy/browser scope, and runtime drift follow-up inside the active feature.

Expected result: the workflow surfaces the intended escalation decisions without adding a new approval bureaucracy and keeps the representative higher-cost review under 3 minutes.

6. Record the validation note

Capture the dry-run outcome in the active spec or implementation PR with at least:

low-impact scenario used
high-impact scenario used
whether any prompt wording was confusing
which explicit review outcome was chosen
whether any review question felt redundant or missing
whether the process stayed lightweight enough for ordinary work
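The six items above can be treated as a small record so that an incomplete note is easy to detect. The sketch below assumes free-text fields; the `ValidationNote` class, its field names, and `empty_fields` are illustrative and not part of the spec.

```python
from dataclasses import dataclass, fields


@dataclass
class ValidationNote:
    # One field per item in the list above, in the same order.
    low_impact_scenario: str
    high_impact_scenario: str
    confusing_prompt_wording: str
    review_outcome: str
    redundant_or_missing_questions: str
    stayed_lightweight: str


def empty_fields(note: ValidationNote) -> list[str]:
    """Return the names of fields that are blank or whitespace only."""
    return [f.name for f in fields(note) if not getattr(note, f.name).strip()]
```

A note is ready to paste into the active spec or PR when `empty_fields(note)` returns an empty list.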

7. Recorded dry-run results (2026-04-18)

Low-impact scenario: .specify/templates/checklist-template.md plus .specify/README.md. Result: the spec and plan prompts were answerable with N/A or none in under 1 minute, and the checklist still closed with a clear keep outcome.
Higher-cost scenario: specs/211-runtime-trend-recalibration/spec.md plus specs/211-runtime-trend-recalibration/plan.md. Result: the reviewer could reach document-in-feature in under 3 minutes because Spec 211 already documents lane fit, bounded helper cost, unchanged heavy/browser scope, and runtime-drift follow-up inside the active delivery artifact.
Document-in-feature example: a feature widens evidence or trend reporting inside an existing governed lane family and records the runtime or recalibration note in its own spec or PR.
Follow-up-spec example: a change introduces a new heavy family, normalizes browser coverage for a new workflow class, or revives an expensive shared default across multiple unrelated tests.

8. Completion checklist

Constitution wording updated and aligned with TEST-GOV-001
Spec, plan, task, and review surfaces use the same lane and escalation vocabulary
Contributor guidance explains both low-impact and escalation-worthy cases
Dry runs confirm the workflow is both usable and sufficiently strict
The review checklist ends with one explicit outcome
No new governance subsystem, bot, or duplicate handbook was introduced
