docs: add Spec 212 test authoring guardrails (#245)

## Summary

- add Spec 212 planning artifacts for test authoring constitution and review guardrails
- expand `TEST-GOV-001` and sync the SpecKit spec/plan/tasks/checklist templates plus contributor guidance
- define the canonical review checklist outcomes and record low-impact and higher-cost validation examples

## Validation

- docs/workflow only; no runtime Pest or Sail test lanes were run
- validation is recorded in `specs/212-test-authoring-guardrails/spec.md` and `specs/212-test-authoring-guardrails/quickstart.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #245

2026-04-18 10:08:00 +00:00

Data Model: Test Suite Authoring Constitution & Review Guardrails

This feature adds repository-owned governance artifacts only. It does not add product database tables or runtime-owned entities. All objects below are implemented as constitution text, markdown prompt blocks, checklists, logical contracts, or validation notes.

1. TestAuthoringConstitutionSection

Purpose: Defines the standing rules contributors and reviewers must follow when new tests are introduced or existing tests expand in cost.

| Field | Type | Description |
|---|---|---|
| `sectionId` | string | Stable identifier for the constitution section. |
| `version` | string | Version of the rule set. |
| `scope` | string | Repository workflow scope, always `workspace`. |
| `classificationRule` | string | Requires explicit classification of new or changed tests. |
| `laneAwarenessRule` | string | Requires authors to name affected lane or lanes. |
| `heavyJustificationRule` | string | Requires justification for database, Livewire, Filament, or browser use. |
| `minimalFixtureRule` | string | States that minimal fixtures and cheap defaults are the norm. |
| `expensiveDefaultBanRule` | string | Forbids hidden shared helper, factory, or seed cost growth without disclosure or escalation. |
| `reviewExpectationRule` | string | Requires the reviewer guardrail questions to be applied when tests change. |
| `escalationRule` | string | Defines when a change must be documented locally or raised as follow-up governance work. |
| `linkedWorkflowSurfaces` | array | Template, checklist, and contributor-doc surfaces that must remain aligned with the section. |

Relationships

One TestAuthoringConstitutionSection governs one SpecImpactPromptBlock, one PlanImpactPromptBlock, one TaskGovernanceChecklist, one ReviewGuardrailChecklist, and one ContributorGuidancePack.

Validation Rules

- The rule set must stay short enough to be quoted or understood during routine authoring and review.
- The section must reuse existing lane vocabulary from Specs 206 through 211.
- The section must not invent new validation lanes or new runtime governance subsystems.

2. SpecImpactPromptBlock

Purpose: Defines the authoring-time questions every spec must answer about test, lane, and runtime cost impact.

| Field | Type | Description |
|---|---|---|
| `blockId` | string | Stable identifier for the spec prompt block. |
| `requiredFields` | array | Required answers such as affected lanes, test-family impact, heavy-surface relevance, fixture-cost impact, budget or trend implications, and reviewer validation commands or `N/A`. |
| `narrowestProofRule` | string | Requires authors to name the narrowest sufficient validation path when runtime changes exist. |
| `naAllowanceRule` | string | Allows concise `N/A` or `none` answers for docs-only or low-impact work. |
| `escalationPrompt` | string | Direct question asking whether the change creates a new heavy family, new browser scope, or material lane-cost shift. |
| `reviewerHandOff` | string | States what reviewers should verify from the completed block. |

Validation Rules

- The block must be short enough for ordinary specs to complete quickly.
- The block must distinguish between “no impact” and “impact exists but is acceptable.”
- The block must not duplicate entire review checklist content; it only prepares the review handoff.
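
The completeness and `N/A` rules above can be illustrated with a small check. This is a hedged sketch only: the feature ships no runtime code, and the field keys, the `check_spec_block` and `is_no_impact` helpers, and the treatment of blank answers are assumptions made for illustration, not identifiers defined by Spec 212.

```python
# Hypothetical field keys paraphrasing the `requiredFields` description;
# they are NOT names defined by the spec.
REQUIRED_FIELDS = (
    "affected_lanes",
    "test_family_impact",
    "heavy_surface_relevance",
    "fixture_cost_impact",
    "budget_or_trend_implications",
    "reviewer_validation_commands",
)

# Concise answers permitted for docs-only or low-impact work (naAllowanceRule).
NA_ANSWERS = {"n/a", "none"}


def check_spec_block(answers: dict[str, str]) -> list[str]:
    """Return the required fields that have no answer at all.

    An explicit `N/A` or `none` counts as an answer; a missing or blank
    value does not.
    """
    missing = []
    for field in REQUIRED_FIELDS:
        if not answers.get(field, "").strip():
            missing.append(field)
    return missing


def is_no_impact(answers: dict[str, str]) -> bool:
    """True only when every required field is answered with `N/A` or `none`."""
    return all(
        answers.get(field, "").strip().lower() in NA_ANSWERS
        for field in REQUIRED_FIELDS
    )
```

Keeping the two helpers separate mirrors the rule that "no impact" and "impact exists but is acceptable" are different answers: a block can be complete without being a no-impact block.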

3. PlanImpactPromptBlock

Purpose: Defines the planning-time questions that convert the spec's declared impact into implementation-time guardrails.

| Field | Type | Description |
|---|---|---|
| `blockId` | string | Stable identifier for the plan prompt block. |
| `changedTestTypes` | array | Test types being added or changed. |
| `helperOrFixtureImpact` | string | Whether helpers, factories, seeds, or defaults widen. |
| `laneReshapeQuestion` | string | Whether lane movement, heavy-family addition, or browser promotion is implicated. |
| `closingValidationRule` | string | Defines the minimum validation evidence to finish the feature. |
| `driftDocumentationRule` | string | States where material runtime drift or recalibration follow-up must be recorded. |

Relationships

- One PlanImpactPromptBlock operationalizes one SpecImpactPromptBlock.
- One PlanImpactPromptBlock informs one TaskGovernanceChecklist.

Validation Rules

- The block must make authoring decisions actionable in tasks, not merely restate the spec.
- The block must expose helper or fixture widening even when the local feature is otherwise small.

4. TaskGovernanceChecklist

Purpose: Provides a short implementation-time checklist that keeps lane fit, setup cost, and validation visible while tasks are broken down.

| Field | Type | Description |
|---|---|---|
| `checklistId` | string | Stable identifier for the task checklist. |
| `items` | array | Required checks such as lane assignment confirmed, no unnecessary heavy cost, minimal fixtures used, relevant validation planned, and budget or trend notes recorded when needed. |
| `appliesWhen` | string | Scope rule for runtime-changing work versus docs-only work. |
| `evidenceTarget` | string | Where the resulting note or evidence must be recorded. |

Validation Rules

- The checklist must remain short enough to fit inside ordinary task planning.
- The checklist must not require runtime-lane execution for docs-only work.

5. ReviewGuardrailChecklist

Purpose: Gives reviewers a fast, repeatable decision aid for new or changed tests.

| Field | Type | Description |
|---|---|---|
| `checklistId` | string | Stable identifier for the review checklist. |
| `questions` | array | Direct questions about lane fit, breadth, DB or UI-heavy necessity, setup cost, split need, escalation need, and budget or trend notes. |
| `expectedOutcomeSet` | array | Allowed reviewer outcomes such as `keep`, `split`, `document-local`, `follow-up-spec`, or `reject-drift`. |
| `maxReviewMinutes` | integer | Target application time for one representative change. |
| `escalationReference` | string | Link or pointer to the escalation policy used when a trigger is present. |

Validation Rules

- Questions must be phrased as decisions, not vague advice.
- The checklist must stay usable in under 3 minutes for a representative diff.
- The checklist must support both low-impact and high-impact changes.
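
A closed outcome set is easiest to see as an enumeration. The sketch below is illustrative only: it assumes the five outcomes named in `expectedOutcomeSet` are the complete set (the table says "such as", so the canonical list may differ), and `ReviewOutcome`, `parse_outcome`, and `within_review_target` are hypothetical names, not part of the repository.

```python
from enum import Enum


class ReviewOutcome(Enum):
    """Reviewer outcomes listed in `expectedOutcomeSet` (assumed complete)."""

    KEEP = "keep"
    SPLIT = "split"
    DOCUMENT_LOCAL = "document-local"
    FOLLOW_UP_SPEC = "follow-up-spec"
    REJECT_DRIFT = "reject-drift"


# Assumed reading of the "under 3 minutes" validation rule.
MAX_REVIEW_MINUTES = 3


def parse_outcome(raw: str) -> ReviewOutcome:
    """Map a recorded outcome string to the allowed set.

    Raises ValueError for anything outside the set, so a vague note cannot
    pass as a review decision.
    """
    return ReviewOutcome(raw.strip().lower())


def within_review_target(minutes_spent: float) -> bool:
    """True when applying the checklist stayed under the target time."""
    return minutes_spent < MAX_REVIEW_MINUTES
```

Rejecting unknown strings rather than defaulting to `keep` reflects the rule that questions resolve to decisions, not vague advice.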

6. EscalationAssessment

Purpose: Captures whether a change is ordinary test maintenance or a governance-significant event requiring extra documentation or follow-up.

| Field | Type | Description |
|---|---|---|
| `assessmentId` | string | Stable identifier for one escalation assessment. |
| `triggerSet` | array | Detected triggers such as new heavy family, new browser scope, revived expensive defaults, material lane-cost shift, or broad suite reshaping. |
| `outcome` | enum | `none`, `document-in-feature`, `follow-up-spec`, or `reject-or-split`. |
| `reason` | string | Human-readable explanation of why the outcome was chosen. |
| `recordLocation` | string | Active spec path or implementation PR location where the outcome is recorded. |
| `examples` | array | Example changes that should resolve to this outcome. |

Validation Rules

- Every trigger must map to a documented action.
- `follow-up-spec` is reserved for recurring pain or structural change, not ordinary recalibration.
- `none` is valid only when the change stays inside an existing lane and family without hidden-cost growth.
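
A consistency check over one assessment might look like the sketch below. It is an assumption-laden illustration: the `policy` mapping from trigger to action is supplied by the caller because this document does not define which trigger leads to which outcome, the trigger strings are hypothetical slugs, and treating any detected trigger as incompatible with `none` is a simplification of the third rule.

```python
# Outcome values taken from the `outcome` enum in the table above.
OUTCOMES = {"none", "document-in-feature", "follow-up-spec", "reject-or-split"}


def check_assessment(
    triggers: list[str], outcome: str, policy: dict[str, str]
) -> list[str]:
    """Return a list of problems; an empty list means the assessment is consistent.

    `policy` maps each known trigger to its documented action. Its contents
    are hypothetical here and would come from the escalation policy.
    """
    problems = []
    if outcome not in OUTCOMES:
        problems.append(f"unknown outcome: {outcome}")
    for trigger in triggers:
        # Rule: every trigger must map to a documented action.
        if trigger not in policy:
            problems.append(f"trigger has no documented action: {trigger}")
    # Simplified reading of the `none` rule: any detected trigger rules it out.
    if outcome == "none" and triggers:
        problems.append("outcome 'none' recorded although triggers were detected")
    return problems
```

Returning a list of problems instead of raising on the first one lets a reviewer see every inconsistency in a single pass.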

7. ContributorGuidancePack

Purpose: Gives contributors concise operational guidance for choosing the smallest justified test surface and recognizing escalation signals.

| Field | Type | Description |
|---|---|---|
| `guidanceId` | string | Stable identifier for the contributor guidance pack. |
| `decisionPoints` | array | High-value decisions such as unit vs feature vs heavy vs browser, when DB is justified, and when a test is too broad. |
| `examplePatterns` | array | Brief examples of acceptable `N/A`, lane-specific justification, and escalation-worthy changes. |
| `entryPoints` | array | Documentation surfaces where the guidance appears. |
| `sharedVocabulary` | array | Canonical governance terms reused across constitution, templates, and review. |

Validation Rules

- Guidance must stay short and operational.
- Guidance must avoid duplicating long prose across multiple files.
- Guidance must reflect the same vocabulary used in the constitution and review checklist.

8. ValidationScenario

Purpose: Represents one dry-run scenario used to prove that the authoring and review workflow stays usable.

| Field | Type | Description |
|---|---|---|
| `scenarioId` | string | Stable scenario identifier. |
| `scenarioType` | enum | `low-impact` or `high-impact`. |
| `representativeArtifact` | string | Spec, plan, or diff used in the dry run. |
| `expectedPromptPattern` | string | Expected answer style, such as `N/A` or a multi-lane justification. |
| `expectedEscalationOutcome` | string | Expected escalation result for the scenario. |
| `status` | enum | `planned`, `validated`, or `needs-tuning`. |
| `notes` | string | What the dry run proved or what wording needs refinement. |

Validation Rules

- At least one low-impact and one high-impact scenario must be validated.
- `low-impact` scenarios must prove the workflow stays lightweight.
- `high-impact` scenarios must prove the escalation prompts catch the intended cost-center changes.
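
The coverage rule (one validated scenario of each type) can be expressed as a short check. This is a sketch under stated assumptions: the dataclass carries only three of the fields from the table, uses snake_case names for Python style, and `coverage_gaps` is a hypothetical helper rather than anything the feature ships.

```python
from dataclasses import dataclass


@dataclass
class ValidationScenario:
    """Trimmed, illustrative view of the entity described in the table above."""

    scenario_id: str
    scenario_type: str  # "low-impact" or "high-impact"
    status: str  # "planned", "validated", or "needs-tuning"


def coverage_gaps(scenarios: list[ValidationScenario]) -> list[str]:
    """Return the scenario types that have no validated scenario yet."""
    validated = {s.scenario_type for s in scenarios if s.status == "validated"}
    return [t for t in ("low-impact", "high-impact") if t not in validated]
```

An empty result means the first validation rule is met; a non-empty result names the scenario type that still needs a validated dry run.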

State Transitions

EscalationAssessment.outcome

- `none` -> `document-in-feature`: allowed when a review reveals governance-relevant cost or scope that should be explicitly recorded but does not justify a new spec.
- `document-in-feature` -> `follow-up-spec`: allowed when the discovered issue reflects recurring pain or structural lane change rather than one contained feature decision.
- Any state -> `reject-or-split`: allowed when the change is too broad, too hidden in cost, or insufficiently justified to merge as proposed.
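
The outcome transitions listed above can be sketched as a lookup. The sketch assumes that any transition not listed is disallowed and reads "any state" literally, so `reject-or-split` may also be re-affirmed from itself; both are interpretations, and `can_change_outcome` is a hypothetical helper name.

```python
OUTCOME_STATES = {"none", "document-in-feature", "follow-up-spec", "reject-or-split"}

# Explicitly listed forward transitions.
OUTCOME_TRANSITIONS = {
    ("none", "document-in-feature"),
    ("document-in-feature", "follow-up-spec"),
}


def can_change_outcome(current: str, target: str) -> bool:
    """True when moving from `current` to `target` is a listed transition."""
    if current not in OUTCOME_STATES or target not in OUTCOME_STATES:
        return False
    if target == "reject-or-split":
        # Any state may move to reject-or-split.
        return True
    return (current, target) in OUTCOME_TRANSITIONS
```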

ValidationScenario.status

- `planned` -> `validated`: allowed when the scenario can be completed with the expected prompt pattern and escalation outcome.
- `planned` -> `needs-tuning`: allowed when wording or checklist structure creates unnecessary friction or misses the expected governance signal.
- `needs-tuning` -> `validated`: allowed after the relevant constitution, template, or checklist wording is refined.
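
The status lifecycle can be sketched the same way, here as a table of allowed next states. Treating `validated` as terminal is an assumption (the document lists no transition out of it), and `advance_status` is a hypothetical helper.

```python
# Allowed next states per current status; `validated` is assumed terminal.
STATUS_TRANSITIONS = {
    "planned": {"validated", "needs-tuning"},
    "needs-tuning": {"validated"},
    "validated": set(),
}


def advance_status(current: str, target: str) -> str:
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in STATUS_TRANSITIONS.get(current, set()):
        raise ValueError(f"transition not allowed: {current} -> {target}")
    return target
```

Raising on an unlisted move makes a skipped refinement step visible, for example a scenario going back from `validated` to `planned` without a recorded reason.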
