docs: add Spec 212 test authoring guardrails (#245)

## Summary

- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation

- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245

2026-04-18 10:08:00 +00:00

Tasks: Test Suite Authoring Constitution & Review Guardrails

Input: Design documents from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/212-test-authoring-guardrails/

Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

Tests: Not required. This feature is docs and workflow only, so validation is by representative low-impact and higher-cost dry runs, cross-artifact consistency review, and recording the outcomes in the active spec artifacts.

Organization: Tasks are grouped by user story so each story can be implemented and validated independently where possible.

Phase 1: Setup (Shared Context)

Purpose: Freeze the real repository workflow surfaces before editing them.

T001 Audit .specify/memory/constitution.md, .specify/templates/spec-template.md, .specify/templates/plan-template.md, .specify/templates/tasks-template.md, .specify/templates/checklist-template.md, .specify/README.md, and README.md against specs/212-test-authoring-guardrails/spec.md and specs/212-test-authoring-guardrails/plan.md to confirm the exact guardrail gaps this feature must close

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Establish the shared vocabulary that every later template and checklist update depends on.

Critical: No user story work should begin until this phase is complete.

T002 Update .specify/memory/constitution.md with the canonical test authoring and review guardrail rules for classification, lane awareness, heavy-surface justification, minimal fixtures, expensive-default bans, reviewer expectations, and escalation outcomes

Checkpoint: The shared governance vocabulary is stable enough for story-specific template and guidance updates.

Phase 3: User Story 1 - Classify Test Impact While Authoring (Priority: P1) 🎯 MVP

Goal: Make contributors declare lane impact, heavy justification, and minimal proof while writing specs and plans.

Independent Test: Apply the updated spec and plan prompts to a genuinely low-impact template-only scenario limited to .specify/templates/checklist-template.md and .specify/README.md, confirm the low-impact path can be answered with concise N/A or none responses, and verify the required authoring questions are explicit.

Implementation for User Story 1

T003 [P] [US1] Update .specify/templates/spec-template.md so Testing / Lane / Runtime Impact explicitly asks for affected lane fit, heavy-surface justification, fixture or helper cost disclosure, escalation triggers, and concise N/A or none handling
T004 [P] [US1] Update .specify/templates/plan-template.md so Test Governance Check explicitly asks for changed test types, helper or factory widening, lane reshaping, closing validation, and where material drift notes must be recorded
T005 [US1] Validate the authoring flow using a low-impact template-only scenario limited to .specify/templates/checklist-template.md and .specify/README.md as the representative N/A path, then record the outcome and any wording adjustments in specs/212-test-authoring-guardrails/spec.md and specs/212-test-authoring-guardrails/quickstart.md

Checkpoint: Contributors can classify test impact during spec and plan authoring without extra workflow overhead.

Phase 4: User Story 2 - Reviewers Catch Hidden Suite Cost Before Merge (Priority: P1)

Goal: Give reviewers a fixed, quick checklist that surfaces hidden test cost and points to clear outcomes.

Independent Test: Apply the updated review checklist to a representative higher-cost governed spec flow and confirm the reviewer can reach a keep, split, or escalate decision in under 3 minutes.

Implementation for User Story 2

T006 [P] [US2] Update .specify/templates/tasks-template.md so generated task lists carry a short test-governance checklist for lane assignment, minimal setup, relevant validation, hidden-cost prevention, and budget or trend note visibility
T007 [P] [US2] Update .specify/templates/checklist-template.md as the canonical generated review-checklist surface with a fixed guardrail structure covering lane fit, breadth, DB or UI-heavy necessity, setup cost, split need, and escalation outcomes
T008 [US2] Update .specify/README.md as the reviewer entry point for applying the canonical review checklist and interpreting keep, split, document-in-feature, follow-up-spec, and reject-or-split outcomes
T009 [US2] Validate the review guardrails against specs/211-runtime-trend-recalibration/spec.md and specs/211-runtime-trend-recalibration/plan.md, then record the representative review outcome and timing note in specs/212-test-authoring-guardrails/spec.md and specs/212-test-authoring-guardrails/quickstart.md

Checkpoint: Reviewers have a stable guardrail surface that catches hidden suite cost before merge.
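
As an illustration of how a representative review outcome and timing note from T009 could be recorded, here is a minimal sketch. The outcome names come from T008 and the 3-minute budget from the Independent Test above; the `ReviewRecord` structure, its field names, and `REVIEW_BUDGET_SECONDS` are hypothetical and are not part of SpecKit or this repository.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewOutcome(Enum):
    """Outcome names as listed in T008."""
    KEEP = "keep"
    SPLIT = "split"
    DOCUMENT_IN_FEATURE = "document-in-feature"
    FOLLOW_UP_SPEC = "follow-up-spec"
    REJECT_OR_SPLIT = "reject-or-split"


# "Under 3 minutes" from the Independent Test for User Story 2.
REVIEW_BUDGET_SECONDS = 3 * 60


@dataclass(frozen=True)
class ReviewRecord:
    """Hypothetical record of one checklist review; not a real SpecKit type."""
    spec_path: str
    outcome: ReviewOutcome
    elapsed_seconds: float

    def within_budget(self) -> bool:
        # Strictly under the budget, matching "in under 3 minutes".
        return self.elapsed_seconds < REVIEW_BUDGET_SECONDS
```

A record built this way only mirrors what the spec artifacts already capture in prose; the authoritative outcome and timing note remain the ones written to spec.md and quickstart.md.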

Phase 5: User Story 3 - Escalate New Cost Centers Deliberately (Priority: P2)

Goal: Make new heavy families, new browser scope, revived expensive defaults, and material lane-cost shifts announce themselves explicitly.

Independent Test: Apply the escalation rules to a representative higher-cost multi-lane workflow and confirm the result distinguishes between local documentation and a true follow-up governance action.

Implementation for User Story 3

T010 [P] [US3] Update README.md with concise contributor guidance for choosing unit vs feature vs heavy-governance vs browser coverage, justifying database or UI-heavy usage, recognizing over-broad tests, spotting escalation triggers early, and deciding when to extend an existing family versus introduce a new one
T011 [US3] Update specs/212-test-authoring-guardrails/quickstart.md with the canonical low-impact validation scenario, the canonical review-checklist surface, and explicit document-in-feature vs follow-up-spec escalation examples that match the live templates and docs
T012 [US3] Validate escalation handling against a representative higher-cost multi-lane flow using specs/211-runtime-trend-recalibration/spec.md, specs/211-runtime-trend-recalibration/plan.md, and the implemented guidance surfaces, then record document-local vs follow-up-spec examples in specs/212-test-authoring-guardrails/spec.md and specs/212-test-authoring-guardrails/quickstart.md

Checkpoint: Escalation-worthy test cost changes are explicit, documented, and consistently interpreted.

Phase 6: Polish & Cross-Cutting Concerns

Purpose: Reconcile the finished workflow surfaces and remove drift between the templates, guidance, and active spec artifacts.

T013 Run the specs/212-test-authoring-guardrails/quickstart.md completion checklist against .specify/memory/constitution.md, .specify/templates/spec-template.md, .specify/templates/plan-template.md, .specify/templates/tasks-template.md, .specify/templates/checklist-template.md, .specify/README.md, README.md, and specs/212-test-authoring-guardrails/spec.md, then remove any duplicated or conflicting wording across the updated governance surfaces

Dependencies & Execution Order

Phase Dependencies

Setup (Phase 1): No dependencies and can start immediately.
Foundational (Phase 2): Depends on Phase 1 and blocks all user story work.
User Story 1 (Phase 3): Depends on Phase 2 and is the MVP slice.
User Story 2 (Phase 4): Depends on Phase 2 and can proceed independently of User Story 1 once the shared vocabulary is stable.
User Story 3 (Phase 5): Depends on Phase 2 and benefits from the implemented authoring and review surfaces from User Stories 1 and 2 before final escalation examples are recorded.
Polish (Phase 6): Depends on all desired user stories being complete.
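
The phase dependencies above form a small directed graph, and the following sketch shows one way to derive the execution waves from it. It assumes all three user stories are "desired" for Polish, and it treats User Story 3's reliance on User Stories 1 and 2 as a soft preference rather than a hard edge; the dictionary keys and the `execution_waves` helper are illustrative names only.

```python
from graphlib import TopologicalSorter

# Phase -> phases it depends on, transcribed from the list above.
# Assumption: Polish waits for all three user stories.
PHASE_DEPS = {
    "Phase 1: Setup": set(),
    "Phase 2: Foundational": {"Phase 1: Setup"},
    "Phase 3: User Story 1": {"Phase 2: Foundational"},
    "Phase 4: User Story 2": {"Phase 2: Foundational"},
    "Phase 5: User Story 3": {"Phase 2: Foundational"},
    "Phase 6: Polish": {
        "Phase 3: User Story 1",
        "Phase 4: User Story 2",
        "Phase 5: User Story 3",
    },
}


def execution_waves(deps):
    """Group phases into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(PHASE_DEPS), start=1):
        print(number, wave)
```

Under these assumptions the three user story phases land in the same wave, which matches the statement that they can proceed independently once Phase 2 is complete.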

User Story Dependencies

User Story 1 (P1): Can begin immediately after Foundational and delivers the first usable authoring workflow increment.
User Story 2 (P1): Can begin immediately after Foundational and delivers a separate review workflow increment.
User Story 3 (P2): Reuses the stable vocabulary from Foundational and should finalize once the live authoring and review surfaces are in place.

Within Each User Story

Shared vocabulary changes in .specify/memory/constitution.md must land before any template or checklist wording is finalized.
Template changes should be implemented before story-specific validation notes are recorded in spec.md and quickstart.md.
Low-impact and higher-cost dry-run validation must complete before closing the corresponding story.
Cross-artifact cleanup should happen only after all targeted workflow surfaces are updated.

Parallel Opportunities

T003 and T004 can run in parallel because they update different template surfaces for the same authoring flow.
T006 and T007 can run in parallel because they update different checklist-producing template surfaces.
T010 can run in parallel with recording the earlier stories' validation notes once the shared vocabulary is stable, because it targets root contributor guidance rather than the SpecKit templates.
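
The "different files" reasoning behind these parallel opportunities can be checked mechanically. The sketch below is a hypothetical helper, not part of SpecKit: it maps each [P] task to the file its description says it updates and reports any pair that would touch the same file.

```python
from itertools import combinations

# Files each [P] task updates, transcribed from the task descriptions.
TASK_FILES = {
    "T003": {".specify/templates/spec-template.md"},
    "T004": {".specify/templates/plan-template.md"},
    "T006": {".specify/templates/tasks-template.md"},
    "T007": {".specify/templates/checklist-template.md"},
    "T010": {"README.md"},
}


def conflicting_pairs(task_ids, task_files=TASK_FILES):
    """Return pairs of tasks that edit at least one common file."""
    return [
        (first, second)
        for first, second in combinations(sorted(task_ids), 2)
        if task_files[first] & task_files[second]
    ]


if __name__ == "__main__":
    print(conflicting_pairs(["T003", "T004"]))
    print(conflicting_pairs(["T006", "T007"]))
```

Disjoint files are only one precondition for running tasks in parallel; the phase dependencies still apply, so none of these tasks should start before T002 has landed.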

Parallel Example: User Story 1

# After T002 establishes the shared vocabulary, these can proceed in parallel:
Task: "Update .specify/templates/spec-template.md"
Task: "Update .specify/templates/plan-template.md"

Parallel Example: User Story 2

# After T002 establishes the shared vocabulary, these can proceed in parallel:
Task: "Update .specify/templates/tasks-template.md"
Task: "Update .specify/templates/checklist-template.md"

Implementation Strategy

MVP First (User Story 1 Only)

Complete Phase 1: Setup.
Complete Phase 2: Foundational.
Complete Phase 3: User Story 1.
Validate the low-impact N/A path using a template-only scenario limited to .specify/templates/checklist-template.md and .specify/README.md before continuing.

Incremental Delivery

Lock the shared constitution vocabulary first.
Deliver the authoring prompts for specs and plans.
Deliver the reviewer-facing task and checklist surfaces.
Add contributor guidance and explicit escalation examples.
Finish with cross-artifact cleanup and quickstart completion review.

Parallel Team Strategy

One contributor can update the spec and plan templates while another prepares the task and checklist template changes after Foundational is done.
Reviewer guidance in .specify/README.md can follow once the checklist surface is stable.
Root README.md contributor guidance and final escalation examples can be completed in parallel with late-stage validation-note drafting.

Notes

[P] tasks operate on different files or independent workflow surfaces and can run in parallel once dependencies are satisfied.
[US1], [US2], and [US3] map tasks directly to the user stories in spec.md.
This feature is docs and workflow only, so validation is recorded in the active spec artifacts rather than by running Pest lanes.
The final workflow must stay lightweight for low-impact work while still surfacing explicit escalation for new test cost centers.
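
For readers who want to process the task list programmatically, the marker convention described in these notes can be parsed with a short regular expression. This is a sketch under the assumption that every task line has the shape `T### [P] [USn] description` with both bracketed markers optional, as in this file; the `Task` class and `parse_task` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Task ID, optional [P] marker, optional [USn] marker, then the description.
TASK_LINE = re.compile(
    r"^(?P<id>T\d{3})\s+"
    r"(?P<parallel>\[P\]\s+)?"
    r"(?:\[(?P<story>US\d+)\]\s+)?"
    r"(?P<description>.+)$"
)


@dataclass
class Task:
    task_id: str
    parallel: bool          # True when the line carries [P]
    story: Optional[str]    # e.g. "US1", or None for shared tasks
    description: str


def parse_task(line: str) -> Optional[Task]:
    """Parse one task line; return None for lines that are not tasks."""
    match = TASK_LINE.match(line.strip())
    if match is None:
        return None
    return Task(
        task_id=match["id"],
        parallel=match["parallel"] is not None,
        story=match["story"],
        description=match["description"],
    )
```

Shared tasks such as T001, T002, and T013 carry no story marker, so they parse with `story` set to `None`.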
