## Summary

- implement Spec 211 runtime trend reporting with bounded lane history, drift classification, hotspot trend output, and recalibration evidence handling
- extend the repo-truth governance seams and workflow wrappers for comparable-bundle hydration, trend artifact publication, and contract-backed reporting
- add the Spec 211 planning artifacts, data model, quickstart, tasks, and repository contract documents

## Validation

- parsed `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend-history.schema.json`
- parsed `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend.logical.openapi.yaml`
- re-ran cross-artifact consistency analysis for the Spec 211 artifact set until no material findings remained
- no application test suite was re-run as part of this final commit/push/PR step

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #244
# Data Model: Test Runtime Trend Reporting & Baseline Recalibration
This feature adds repository-owned governance artifacts only. It does not add product database tables. All objects below are implemented as manifest metadata, generated JSON payloads, markdown summaries, or guard-test fixtures derived from the existing lane report outputs.
## 1. LaneTrendPolicy

Purpose: Defines the lane-specific rules for bounded history retention, comparable-window evaluation, hotspot visibility, and recalibration guidance.

| Field | Type | Description |
|---|---|---|
| `laneId` | string | Canonical lane identifier (`fast-feedback`, `confidence`, `heavy-governance`, `browser`, `junit`, `profiling`). |
| `workflowProfile` | string | Workflow profile that owns the lane history source in CI. |
| `retentionLimit` | integer | Max history records retained for the lane. |
| `comparisonWindowSize` | integer | Number of recent comparable records used for drift evaluation. |
| `minimumComparableSamples` | integer | Required sample count before a stable non-`unstable` health class is allowed. |
| `varianceFloorSeconds` | integer | Minimum meaningful delta for the lane, aligned with current enforcement tolerance. |
| `nearBudgetHeadroomSeconds` | integer | Headroom threshold for `budget-near`. |
| `hotspotFamilyLimit` | integer | Max family deltas shown in readable summaries. |
| `hotspotFileLimit` | integer | Max file hotspots shown in readable summaries. |
| `slowestEntryRetention` | integer | Max slowest test entries retained in JSON evidence. |
| `recalibrationPolicy` | array | Rule summary for acceptable baseline and budget recalibration triggers. |

### Relationships

- One `LaneTrendPolicy` governs many `LaneTrendRecord` entries for the same lane.
- One `LaneTrendPolicy` informs one `TrendComparisonWindow`, one `LaneDriftAssessment`, and zero or more `RecalibrationDecisionRecord` entries per reporting cycle.

### Validation Rules

- `retentionLimit` must be greater than or equal to `comparisonWindowSize`.
- `minimumComparableSamples` must be at least 3.
- `varianceFloorSeconds` must align with or exceed the lane's existing enforcement tolerance.
- Primary lanes use a larger retention window than support lanes.
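The single-policy rules above could be checked with something like the following minimal sketch. The snake_case class, the `policy_violations` helper, and the `enforcement_tolerance_seconds` input are illustrative assumptions, not names from the repository. The cross-lane rule about primary versus support retention is left out because it needs more than one policy to evaluate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LaneTrendPolicy:
    """Subset of the policy fields needed for the single-policy checks."""
    lane_id: str
    retention_limit: int
    comparison_window_size: int
    minimum_comparable_samples: int
    variance_floor_seconds: int
    # Assumed input: the lane's existing enforcement tolerance, supplied by the caller.
    enforcement_tolerance_seconds: int


def policy_violations(policy: LaneTrendPolicy) -> list[str]:
    """Return readable violations; an empty list means the policy passes."""
    violations = []
    if policy.retention_limit < policy.comparison_window_size:
        violations.append("retentionLimit must be >= comparisonWindowSize")
    if policy.minimum_comparable_samples < 3:
        violations.append("minimumComparableSamples must be at least 3")
    if policy.variance_floor_seconds < policy.enforcement_tolerance_seconds:
        violations.append("varianceFloorSeconds must not be below the enforcement tolerance")
    return violations
```

Returning a list rather than raising on the first failure lets a guard test report every broken rule for a lane in one pass.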
## 2. LaneTrendRecord

Purpose: Captures the per-run evidence snapshot that can safely be compared over time.

| Field | Type | Description |
|---|---|---|
| `runRef` | string | Stable run reference from CI or local execution. |
| `laneId` | string | Governed lane identifier. |
| `workflowId` | string | Workflow profile or logical workflow owner for the run. |
| `triggerClass` | string | Pull request, mainline push, manual, scheduled, or local classification. |
| `generatedAt` | datetime | When the record was emitted. |
| `wallClockSeconds` | number | Current lane runtime in seconds. |
| `baselineSeconds` | number or null | Current comparison baseline for the lane if defined. |
| `baselineSource` | string | Manifest source or comparison source that supplied the baseline. |
| `budgetSeconds` | number | Current lane budget threshold in seconds. |
| `budgetStatus` | string | Current lane budget status from the existing budget evaluator. |
| `blockingStatus` | string | Whether the current CI context blocks on this outcome. |
| `comparisonFingerprint` | string | Hash or structured fingerprint capturing comparability boundaries. |
| `classificationTotals` | array | Runtime grouped by current classification totals. |
| `familyTotals` | array | Runtime grouped by current family totals. |
| `hotspotFiles` | array | Current dominant hotspot files. |
| `slowestEntries` | array | Current slowest test entries, capped by policy. |
| `artifactRefs` | array | References to the summary, report, budget, JUnit, and history artifacts backing the record. |

### Validation Rules

- A record must derive from the same lane's current `summary.md`, `report.json`, `budget.json`, and available JUnit output.
- `comparisonFingerprint` must be present for any record eligible for comparison.
- `wallClockSeconds`, `budgetSeconds`, and `generatedAt` are required.
- `slowestEntries` must not exceed the lane policy retention cap.
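One way to produce a hash-style `comparisonFingerprint` is to hash a canonical JSON encoding of whatever inputs define comparability. This document does not specify which inputs those are, so the `boundaries` dictionary and the `comparison_fingerprint` helper below are purely illustrative assumptions.

```python
import hashlib
import json


def comparison_fingerprint(boundaries: dict) -> str:
    """Hash a canonical encoding so key order never changes the fingerprint.

    `boundaries` is a hypothetical mapping of whatever values define
    comparability for a lane; the real set of inputs is not specified here.
    """
    canonical = json.dumps(boundaries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys before hashing means two records built from the same values in a different order still compare as equal, which is the property the comparison window depends on.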
## 3. TrendComparisonWindow

Purpose: Represents the bounded comparable history used to evaluate one lane in one reporting cycle.

| Field | Type | Description |
|---|---|---|
| `laneId` | string | Governed lane identifier. |
| `policyRef` | string | Reference to the governing `LaneTrendPolicy`. |
| `currentRecord` | object | The latest `LaneTrendRecord`. |
| `previousComparableRecord` | object or null | The most recent prior comparable record, if one exists. |
| `comparableRecords` | array | Ordered comparable records used for trend evaluation. |
| `excludedRecords` | array | Recent records skipped because of fingerprint mismatch or invalid evidence. |
| `windowStatus` | enum | `stable`, `insufficient-history`, `scope-changed`, or `noisy`. |
| `sampleCount` | integer | Number of comparable records in the active window. |

### Validation Rules

- Every comparable record must share the same `comparisonFingerprint`.
- `sampleCount` may not exceed `comparisonWindowSize`.
- `previousComparableRecord` must be the immediately preceding entry in `comparableRecords` when present.
- `windowStatus` becomes `insufficient-history` whenever `sampleCount` is below `minimumComparableSamples`.
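The window rules above can be illustrated with a small sketch. It assumes that `history` is ordered oldest to newest, that the current record is the last entry of `comparableRecords`, and that records are plain dictionaries. The `build_window` helper is hypothetical and only distinguishes `stable` from `insufficient-history`; detecting `scope-changed` or `noisy` windows is out of scope here.

```python
def build_window(current, history, window_size, minimum_samples):
    """Select comparable records for `current` from `history` (oldest first)."""
    fingerprint = current.get("comparisonFingerprint")
    comparable, excluded = [], []
    for record in history:
        if fingerprint is not None and record.get("comparisonFingerprint") == fingerprint:
            comparable.append(record)
        else:
            excluded.append(record)

    # Keep the newest prior records, leaving one slot for the current record.
    prior = comparable[-(window_size - 1):] if window_size > 1 else []
    records = prior + [current]
    status = "insufficient-history" if len(records) < minimum_samples else "stable"
    return {
        "currentRecord": current,
        "previousComparableRecord": prior[-1] if prior else None,
        "comparableRecords": records,
        "excludedRecords": excluded,
        "sampleCount": len(records),
        "windowStatus": status,
    }
```

Because the slice reserves one slot for the current record, `sampleCount` cannot exceed the window size, and the previous comparable record is always the entry immediately before the current one.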
## 4. LaneDriftAssessment

Purpose: Summarizes the current drift verdict for one lane using the bounded comparison window.

| Field | Type | Description |
|---|---|---|
| `laneId` | string | Governed lane identifier. |
| `healthClass` | enum | `healthy`, `budget-near`, `trending-worse`, `regressed`, or `unstable`. |
| `deltaToPreviousSeconds` | number or null | Current runtime delta vs previous comparable run. |
| `deltaToPreviousPercent` | number or null | Percent delta vs previous comparable run. |
| `deltaToBaselineSeconds` | number or null | Current runtime delta vs lane baseline. |
| `deltaToBaselinePercent` | number or null | Percent delta vs lane baseline. |
| `budgetHeadroomSeconds` | number | Remaining headroom before budget breach. |
| `worseningStreak` | integer | Count of recent comparable records showing meaningful worsening. |
| `varianceObservedSeconds` | number | Effective variance observed across the active window. |
| `recalibrationRecommendation` | enum | `none`, `investigate`, `review-baseline`, or `review-budget`. |
| `summaryLine` | string | Human-readable explanation emitted into markdown summaries. |

### Validation Rules

- `healthClass` may only be non-`unstable` when the comparison window has at least `minimumComparableSamples` comparable records.
- `recalibrationRecommendation` must remain separate from `healthClass`.
- `budgetHeadroomSeconds` may be negative only when the lane is over budget.
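The delta and headroom fields above follow from simple arithmetic. The sketch below assumes headroom is budget minus current runtime and that percentages are relative to the reference value; the `drift_deltas` helper is illustrative, not the repository's evaluator.

```python
def drift_deltas(current_seconds, previous_seconds, baseline_seconds, budget_seconds):
    """Compute the nullable delta fields and the budget headroom for one lane."""

    def delta(reference):
        # A missing reference yields null deltas rather than a fake zero.
        if reference is None:
            return None, None
        seconds = current_seconds - reference
        # Guard against a zero reference, where a percentage is undefined.
        percent = seconds * 100.0 / reference if reference else None
        return seconds, percent

    previous_delta, previous_percent = delta(previous_seconds)
    baseline_delta, baseline_percent = delta(baseline_seconds)
    return {
        "deltaToPreviousSeconds": previous_delta,
        "deltaToPreviousPercent": previous_percent,
        "deltaToBaselineSeconds": baseline_delta,
        "deltaToBaselinePercent": baseline_percent,
        # Negative headroom means the lane is over budget.
        "budgetHeadroomSeconds": budget_seconds - current_seconds,
    }
```

With headroom defined this way, a negative value can only occur when the current runtime exceeds the budget, which matches the last validation rule.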
## 5. HotspotTrendSnapshot

Purpose: Captures how the dominant runtime contributors changed between the current and previous comparable run.

| Field | Type | Description |
|---|---|---|
| `laneId` | string | Governed lane identifier. |
| `familyDeltas` | array | Top family-level deltas with current seconds, previous seconds, and delta values. |
| `fileHotspots` | array | Top file hotspots with current/previous runtime and rank movement. |
| `newEntrants` | array | Families or files newly entering the visible hotspot set. |
| `droppedEntrants` | array | Families or files leaving the visible hotspot set. |
| `evidenceAvailability` | enum | `available` or `unavailable`, used when JUnit or attribution evidence is missing. |

### Validation Rules

- Human-readable summaries must cap output at the policy's family/file limits.
- JSON evidence may retain more detail, but must not exceed `slowestEntryRetention`.
- If hotspot evidence is unavailable, the summary must say so explicitly.
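A rough sketch of how family deltas and entrant lists might be derived from two runs follows. Ranking by absolute delta, treating a missing family as zero seconds, and the helper names `family_deltas` and `entrants` are all assumptions made for illustration.

```python
def family_deltas(current, previous, limit):
    """Rank families by absolute runtime change, capped at `limit` rows.

    `current` and `previous` map a family name to its runtime in seconds.
    """
    rows = []
    for family in set(current) | set(previous):
        current_seconds = current.get(family, 0.0)
        previous_seconds = previous.get(family, 0.0)
        rows.append({
            "family": family,
            "currentSeconds": current_seconds,
            "previousSeconds": previous_seconds,
            "deltaSeconds": current_seconds - previous_seconds,
        })
    # Largest movement first; the name breaks ties so output is deterministic.
    rows.sort(key=lambda row: (-abs(row["deltaSeconds"]), row["family"]))
    return rows[:limit]


def entrants(current_visible, previous_visible):
    """Return (newEntrants, droppedEntrants) between two visible hotspot sets."""
    new = [name for name in current_visible if name not in previous_visible]
    dropped = [name for name in previous_visible if name not in current_visible]
    return new, dropped
```

Passing `hotspotFamilyLimit` as `limit` keeps the readable summary within the policy cap, while the uncapped rows could still be written to JSON evidence.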
## 6. RecalibrationDecisionRecord

Purpose: Records structured evidence for a proposed, approved, or rejected baseline/budget recalibration.

| Field | Type | Description |
|---|---|---|
| `laneId` | string | Governed lane identifier. |
| `targetType` | enum | `baseline` or `budget`. |
| `decisionStatus` | enum | `candidate`, `approved`, or `rejected`. |
| `evidenceRunRefs` | array | Comparable runs supporting the decision. |
| `previousValueSeconds` | number | Existing baseline or budget value. |
| `proposedValueSeconds` | number or null | Proposed replacement value. |
| `rationaleCode` | enum | `lane-scope-change`, `infrastructure-shift`, `post-improvement-reset`, `sustained-erosion`, `noise-rejected`, or `manual-hold`. |
| `recordedIn` | string | Active spec path or implementation PR reference where the decision is documented. |
| `notes` | string | Concise reviewer-facing explanation. |

### Validation Rules
- Approved baseline changes require at least one accepted rationale tied to scope or environment truth.
- Approved budget changes require a stronger evidence window than approved baseline changes.
- Rejected decisions must retain the rejection reason.
- The artifact may propose candidates, but approval remains human-controlled.
## 7. TrendSummaryCycle

Purpose: Represents one generated trend-aware reporting cycle across the relevant lanes.

| Field | Type | Description |
|---|---|---|
| `cycleId` | string | Reporting-cycle identifier, typically anchored to the current lane run or summary generation timestamp. |
| `generatedAt` | datetime | When the cycle summary was emitted. |
| `laneSummaries` | array | Per-lane summary entries containing `laneId`, current runtime, previous comparable runtime, baseline, budget, and the embedded drift assessment used by the readable summary surface. |
| `laneAssessments` | array | `LaneDriftAssessment` items for all relevant lanes. |
| `hotspotSnapshots` | array | `HotspotTrendSnapshot` items for lanes with available evidence. |
| `recalibrationDecisions` | array | Candidate, approved, or rejected recalibration records emitted for the cycle. |
| `artifactPublicationStatus` | array | Whether required current-run and history artifacts were published successfully. |
| `warnings` | array | Legibility notes such as missing comparable history or unavailable hotspot evidence. |

### Validation Rules

- Every relevant primary lane must have exactly one `laneSummaries` entry and exactly one `LaneDriftAssessment` per cycle.
- Each `laneSummaries` entry must expose the current runtime, previous comparable runtime, baseline, budget, and embedded health assessment needed by the readable summary surface.
- `warnings` must be explicit when any required evidence is unavailable.
- The cycle summary must stay readable without requiring a second dashboard surface.
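The "exactly one entry per primary lane" rule lends itself to a simple count-based check. In this sketch the caller supplies the list of primary lanes, since which lanes count as primary is not defined in this section; `cycle_violations` is a hypothetical helper name.

```python
from collections import Counter


def cycle_violations(cycle, primary_lanes):
    """Check that each primary lane has exactly one summary and one assessment."""
    summary_counts = Counter(entry["laneId"] for entry in cycle.get("laneSummaries", []))
    assessment_counts = Counter(entry["laneId"] for entry in cycle.get("laneAssessments", []))
    violations = []
    for lane in primary_lanes:
        # Counter returns 0 for lanes that never appear, so missing lanes are caught too.
        if summary_counts[lane] != 1:
            violations.append(f"{lane}: expected 1 laneSummaries entry, found {summary_counts[lane]}")
        if assessment_counts[lane] != 1:
            violations.append(f"{lane}: expected 1 laneAssessments entry, found {assessment_counts[lane]}")
    return violations
```

Counting rather than checking membership catches both missing and duplicated entries with the same comparison.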
## State Transitions

### LaneDriftAssessment.healthClass

- `unstable` -> `healthy`: allowed once there are enough comparable samples and the lane is comfortably below budget without sustained worsening.
- `unstable` -> `budget-near`: allowed once there are enough comparable samples and budget headroom falls inside the near-budget window.
- `unstable` -> `trending-worse`: allowed once there are enough comparable samples and worsening exceeds the lane variance floor across the bounded window.
- `healthy` <-> `budget-near`: allowed as headroom enters or leaves the near-budget band.
- `healthy` or `budget-near` -> `trending-worse`: allowed when sustained worsening appears without a budget breach.
- `trending-worse` -> `regressed`: allowed when the lane breaches budget or shows a materially worse repeated trend strong enough to stop calling it merely erosion.
- Any state -> `unstable`: allowed when comparability breaks, history is insufficient, or the window is too noisy to classify reliably.
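The `healthClass` transitions listed above can be written down as a lookup table. This sketch encodes only the listed edges and additionally assumes that remaining in the same class between cycles is always acceptable; `ALLOWED_HEALTH_TRANSITIONS` and `is_allowed_transition` are illustrative names.

```python
# Edges taken from the transition list; "any state -> unstable" is handled in code.
ALLOWED_HEALTH_TRANSITIONS = {
    "unstable": {"healthy", "budget-near", "trending-worse"},
    "healthy": {"budget-near", "trending-worse"},
    "budget-near": {"healthy", "trending-worse"},
    "trending-worse": {"regressed"},
    "regressed": set(),
}


def is_allowed_transition(previous, current):
    """Return True when moving from `previous` to `current` is a listed edge."""
    if previous not in ALLOWED_HEALTH_TRANSITIONS or current not in ALLOWED_HEALTH_TRANSITIONS:
        raise ValueError(f"unknown health class: {previous!r} -> {current!r}")
    # Assumption: staying in the same class is not treated as a transition.
    if current == previous or current == "unstable":
        return True
    return current in ALLOWED_HEALTH_TRANSITIONS[previous]
```

Under this reading a lane cannot jump straight from `healthy` to `regressed`, and the only listed edge out of `regressed` leads to `unstable`.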
### RecalibrationDecisionRecord.decisionStatus

- `candidate` -> `approved`: allowed only by explicit human review with structured evidence.
- `candidate` -> `rejected`: allowed when the evidence is noisy, incomplete, or policy says repository truth should not move.
- `approved` and `rejected`: terminal statuses for the recorded decision.
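The `decisionStatus` lifecycle is small enough to guard with one function. The sketch below models "explicit human review" as a boolean flag passed by the caller, which is an assumption about how review would be represented; `next_decision_status` is a hypothetical helper.

```python
TERMINAL_STATUSES = {"approved", "rejected"}


def next_decision_status(current, requested, human_reviewed):
    """Return the new status or raise ValueError for a disallowed move."""
    if current in TERMINAL_STATUSES:
        raise ValueError(f"{current} is terminal and cannot change")
    if current != "candidate" or requested not in TERMINAL_STATUSES:
        raise ValueError(f"unsupported transition: {current} -> {requested}")
    # Assumption: human review is represented as a flag supplied by the caller.
    if requested == "approved" and not human_reviewed:
        raise ValueError("approval requires explicit human review")
    return requested
```

Raising on every disallowed move keeps automated tooling limited to proposing candidates, leaving approval to a reviewer.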