docs: add Spec 212 test authoring guardrails (#245)

## Summary

- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation

- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245

2026-04-18 10:08:00 +00:00

Research: Test Suite Authoring Constitution & Review Guardrails

Decision 1: Reuse and sharpen existing `TEST-GOV-001` workflow surfaces

Decision: Build Spec 212 by extending the current constitution and SpecKit template surfaces that already carry lane/runtime governance instead of inventing a new governance subsystem.
Rationale: The repository already contains `TEST-GOV-001`, a Testing / Lane / Runtime Impact section in the spec template, a Test Governance Check in the plan template, and runtime-governance language in the task template and repository guidance. The missing value is stronger authoring-time classification, review guardrails, and escalation prompts, not new infrastructure.
Alternatives considered:
- Create a dedicated test-governance framework with its own configuration and commands. Rejected because it would add a second process surface and drift risk.
- Rely on CI and reviewer discretion alone. Rejected because the spec explicitly targets prevention at authoring and review time.

Decision 2: Keep contributor guidance inside existing repository entry points

Decision: Prefer `.specify/README.md`, `README.md`, the updated templates, and this feature's `quickstart.md` over introducing a new standalone contributor handbook.
Rationale: These are the surfaces contributors already read during spec work and repository setup. Reusing them keeps the guidance discoverable without creating another long-lived document that can drift.
Alternatives considered:
- Add a new `docs/test-authoring-governance.md` file. Rejected because it would split the guidance away from the authoring workflow and increase maintenance burden.
- Encode all guidance only in the constitution. Rejected because contributors need operational examples at the point of use, not just high-level rules.

Decision 3: Make review guardrails question-based, not score-based

Decision: Model the review surface as a short set of direct questions plus explicit escalation outcomes rather than a weighted scorecard or approval rubric.
Rationale: Reviewers need a fast aid for deciding whether to keep, split, or escalate. Direct questions about lane fit, breadth, database and UI-heavy justification, fixture cost, and escalation need are easier to apply in under 3 minutes than a scoring framework.
Alternatives considered:
- A weighted review rubric. Rejected because it would slow down reviews and encourage ritual over judgment.
- A long prose checklist. Rejected because it would be harder to scan and easier to ignore.
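
The question-based shape described in Decision 3 can be illustrated with a minimal sketch. Everything named here is hypothetical: the `Answers` fields, the `decide` function, and in particular the mapping from answers to outcomes are assumptions made for illustration, not the canonical checklist defined by the spec. The point is only that each answer leads directly to keep, split, or escalate, with no weights or totals.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    KEEP = "keep"
    SPLIT = "split"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Answers:
    """Reviewer answers to the direct questions (field names are hypothetical)."""

    lane_fits: bool                # is the test in the right lane?
    breadth_is_narrow: bool        # is the breadth appropriate?
    heavy_use_justified: bool      # is database or UI-heavy usage justified?
    fixture_cost_acceptable: bool  # is the fixture cost proportionate?
    needs_escalation: bool         # does the change need a follow-up decision?


def decide(a: Answers) -> Outcome:
    """Map answers straight to an outcome; this mapping is an assumption."""
    # Unjustified heavy usage or an explicit escalation need goes to escalate.
    if a.needs_escalation or not a.heavy_use_justified:
        return Outcome.ESCALATE
    # Any remaining "no" suggests the test should be split or reshaped.
    if not (a.lane_fits and a.breadth_is_narrow and a.fixture_cost_acceptable):
        return Outcome.SPLIT
    return Outcome.KEEP
```

A reviewer who answers "yes" to the first four questions and "no" to the escalation question gets `Outcome.KEEP`; a single "no" on breadth alone yields `Outcome.SPLIT`.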

Decision 4: Escalation stays document-first, not CI-block-first

Decision: New heavy families, new browser scope, revived expensive defaults, and material lane-cost changes should trigger explicit documentation and follow-up decisions in the active spec or PR, not a new automatic CI policy.
Rationale: These are judgment-heavy signals. The right first move is to make them visible and attributable at authoring and review time, not to bolt on a new blocking system that would be brittle and hard to calibrate.
Alternatives considered:
- Fail CI immediately on any detected heavy-surface expansion. Rejected because many legitimate changes still need human context and scoping decisions.
- Treat escalation as optional reviewer prose. Rejected because optional language is exactly what the spec is trying to harden.
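
As a rough sketch of what "document-first" could look like in practice, the helper below turns the four escalation triggers named in Decision 4 into a note intended for the active spec or PR. It does not fail a build. The trigger keys, the `escalation_note` function, and the note wording are all hypothetical and are not part of the repository's tooling.

```python
from __future__ import annotations

# Trigger labels follow Decision 4; the dictionary keys are hypothetical.
ESCALATION_TRIGGERS = {
    "new_heavy_family": "New heavy family",
    "new_browser_scope": "New browser scope",
    "revived_expensive_default": "Revived expensive default",
    "material_lane_cost_change": "Material lane-cost change",
}


def escalation_note(signals: set[str]) -> str:
    """Return a note for the spec or PR; an empty string means no escalation."""
    unknown = signals - set(ESCALATION_TRIGGERS)
    if unknown:
        raise ValueError(f"unknown escalation signals: {sorted(unknown)}")

    # Preserve the declaration order of the triggers for stable output.
    hits = [label for key, label in ESCALATION_TRIGGERS.items() if key in signals]
    if not hits:
        return ""

    lines = ["Escalation required (document in the active spec or PR):"]
    lines += [f"- {label}: record the justification and follow-up decision" for label in hits]
    return "\n".join(lines)
```

Returning text instead of raising on a trigger keeps the signal visible and attributable while leaving the scoping decision to a human.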

Decision 5: Validate both the low-friction and high-risk paths

Decision: Validate the updated workflow against one docs-only or template-only N/A flow and one higher-cost governed-spec flow that touches multiple runtime governance concerns.
Rationale: The low-impact path proves the process stays lightweight. The higher-cost path proves the workflow can surface lane, heavy, fixture, and escalation questions before implementation.
Alternatives considered:
- Validate only against a higher-cost spec. Rejected because it would not prove that ordinary low-impact work stays fast.
- Validate only against hypothetical examples. Rejected because real repo artifacts are needed to check phrasing and friction.

Decision 6: Use logical contract artifacts for workflow semantics

Decision: Represent the design with one schema-first governance-pack contract and one logical OpenAPI contract, even though the feature adds no transport API. Treat both artifacts as design-time scaffolding rather than new maintained workflow surfaces.
Rationale: Neighboring governance specs already use logical OpenAPI plus JSON Schema to describe repository-owned workflow truth. Reusing that pattern keeps planning artifacts consistent and gives the later task-generation step structured inputs.
Alternatives considered:
- Markdown-only planning notes. Rejected because they are less structured and less reusable for task generation and validation.
- A runtime API contract. Rejected because this feature does not introduce a runtime service or endpoint.
