# Data Model: Test Runtime Trend Reporting & Baseline Recalibration

This feature adds repository-owned governance artifacts only. It does not add product database tables. All objects below are implemented as manifest metadata, generated JSON payloads, markdown summaries, or guard-test fixtures derived from the existing lane report outputs.

## 1. LaneTrendPolicy

**Purpose**: Defines the lane-specific rules for bounded history retention, comparable-window evaluation, hotspot visibility, and recalibration guidance.

| Field | Type | Description |
|-------|------|-------------|
| `laneId` | string | Canonical lane identifier (`fast-feedback`, `confidence`, `heavy-governance`, `browser`, `junit`, `profiling`). |
| `workflowProfile` | string | Workflow profile that owns the lane history source in CI. |
| `retentionLimit` | integer | Max history records retained for the lane. |
| `comparisonWindowSize` | integer | Number of recent comparable records used for drift evaluation. |
| `minimumComparableSamples` | integer | Minimum comparable sample count required before any health class other than `unstable` may be assigned. |
| `varianceFloorSeconds` | integer | Minimum meaningful delta for the lane, aligned with current enforcement tolerance. |
| `nearBudgetHeadroomSeconds` | integer | Headroom threshold for `budget-near`. |
| `hotspotFamilyLimit` | integer | Max family deltas shown in readable summaries. |
| `hotspotFileLimit` | integer | Max file hotspots shown in readable summaries. |
| `slowestEntryRetention` | integer | Max slowest test entries retained in JSON evidence. |
| `recalibrationPolicy` | array | Rule summary for acceptable baseline and budget recalibration triggers. |

**Relationships**

- One `LaneTrendPolicy` governs many `LaneTrendRecord` entries for the same lane.
- One `LaneTrendPolicy` informs one `TrendComparisonWindow`, one `LaneDriftAssessment`, and zero or more `RecalibrationDecisionRecord` entries per reporting cycle.

**Validation Rules**

- `retentionLimit` must be greater than or equal to `comparisonWindowSize`.
- `minimumComparableSamples` must be at least 3.
- `varianceFloorSeconds` must align with or exceed the lane's existing enforcement tolerance.
- Primary lanes use a larger retention window than support lanes.

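The first three validation rules above can be checked mechanically. The following is a minimal Python sketch, not the repository's implementation: the `LaneTrendPolicy` dataclass carries only the fields those rules need, and `validate_policy` and its `enforcement_tolerance_seconds` parameter are hypothetical names introduced for illustration. The primary-versus-support retention rule is left out because it compares policies across lanes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaneTrendPolicy:
    """Subset of the policy fields needed by the validation rules."""
    lane_id: str
    retention_limit: int
    comparison_window_size: int
    minimum_comparable_samples: int
    variance_floor_seconds: int


def validate_policy(policy: LaneTrendPolicy, enforcement_tolerance_seconds: int) -> list[str]:
    """Return readable violations; an empty list means the policy passes.

    enforcement_tolerance_seconds is an assumed input standing in for the
    lane's existing enforcement tolerance.
    """
    errors = []
    if policy.retention_limit < policy.comparison_window_size:
        errors.append("retentionLimit must be >= comparisonWindowSize")
    if policy.minimum_comparable_samples < 3:
        errors.append("minimumComparableSamples must be at least 3")
    if policy.variance_floor_seconds < enforcement_tolerance_seconds:
        errors.append("varianceFloorSeconds must align with or exceed the enforcement tolerance")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a guard test report every violated rule in one pass.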
## 2. LaneTrendRecord

**Purpose**: Captures the per-run evidence snapshot that can safely be compared over time.

| Field | Type | Description |
|-------|------|-------------|
| `runRef` | string | Stable run reference from CI or local execution. |
| `laneId` | string | Governed lane identifier. |
| `workflowId` | string | Workflow profile or logical workflow owner for the run. |
| `triggerClass` | string | Trigger classification: pull request, mainline push, manual, scheduled, or local. |
| `generatedAt` | datetime | When the record was emitted. |
| `wallClockSeconds` | number | Current lane runtime in seconds. |
| `baselineSeconds` | number or null | Current comparison baseline for the lane if defined. |
| `baselineSource` | string | Manifest source or comparison source that supplied the baseline. |
| `budgetSeconds` | number | Current lane budget threshold in seconds. |
| `budgetStatus` | string | Current lane budget status from the existing budget evaluator. |
| `blockingStatus` | string | Whether the current CI context blocks on this outcome. |
| `comparisonFingerprint` | string | Hash or structured fingerprint capturing comparability boundaries. |
| `classificationTotals` | array | Current runtime totals grouped by classification. |
| `familyTotals` | array | Current runtime totals grouped by family. |
| `hotspotFiles` | array | Current dominant hotspot files. |
| `slowestEntries` | array | Current slowest test entries, capped by policy. |
| `artifactRefs` | array | References to the summary, report, budget, JUnit, and history artifacts backing the record. |

**Validation Rules**

- A record must derive from the same lane's current `summary.md`, `report.json`, `budget.json`, and available JUnit output.
- `comparisonFingerprint` must be present for any record eligible for comparison.
- `wallClockSeconds`, `budgetSeconds`, and `generatedAt` are required.
- `slowestEntries` must not exceed the lane policy retention cap.

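As an illustration of the record rules, the sketch below shows one way a fingerprint could be derived and a record checked. It is an assumption-laden example: the document does not say which boundaries feed `comparisonFingerprint` or how it is hashed, so the SHA-256 over canonical JSON and the function names `comparison_fingerprint`, `validate_record`, and `is_comparison_eligible` are hypothetical.

```python
import hashlib
import json

REQUIRED_FIELDS = ("wallClockSeconds", "budgetSeconds", "generatedAt")


def comparison_fingerprint(boundaries: dict) -> str:
    """Hash a dict of comparability boundaries into a stable hex digest.

    Sorting keys makes the digest independent of insertion order. Which
    boundaries belong in the dict is an assumption left to the caller.
    """
    canonical = json.dumps(boundaries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate_record(record: dict, slowest_entry_retention: int) -> list[str]:
    """Check the required fields and the slowest-entry cap."""
    errors = [f"{name} is required" for name in REQUIRED_FIELDS if record.get(name) is None]
    if len(record.get("slowestEntries", [])) > slowest_entry_retention:
        errors.append("slowestEntries exceeds the lane policy retention cap")
    return errors


def is_comparison_eligible(record: dict, slowest_entry_retention: int) -> bool:
    """A record is eligible only if it is valid and carries a fingerprint."""
    has_fingerprint = bool(record.get("comparisonFingerprint"))
    return has_fingerprint and not validate_record(record, slowest_entry_retention)
```

Keeping eligibility separate from validity mirrors the rule wording: a record without a fingerprint can still be stored, it just cannot join a comparison window.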
## 3. TrendComparisonWindow

**Purpose**: Represents the bounded comparable history used to evaluate one lane in one reporting cycle.

| Field | Type | Description |
|-------|------|-------------|
| `laneId` | string | Governed lane identifier. |
| `policyRef` | string | Reference to the governing `LaneTrendPolicy`. |
| `currentRecord` | object | The latest `LaneTrendRecord`. |
| `previousComparableRecord` | object or null | The most recent prior comparable record, if one exists. |
| `comparableRecords` | array | Ordered comparable records used for trend evaluation. |
| `excludedRecords` | array | Recent records skipped because of fingerprint mismatch or invalid evidence. |
| `windowStatus` | enum | `stable`, `insufficient-history`, `scope-changed`, or `noisy`. |
| `sampleCount` | integer | Number of comparable records in the active window. |

**Validation Rules**

- Every comparable record must share the same `comparisonFingerprint`.
- `sampleCount` may not exceed `comparisonWindowSize`.
- `previousComparableRecord` must be the immediately preceding entry in `comparableRecords` when present.
- `windowStatus` becomes `insufficient-history` whenever `sampleCount` is below `minimumComparableSamples`.

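The window rules above can be sketched as a small selection function. This is a hedged illustration rather than the actual implementation: it assumes `history` is ordered oldest to newest and excludes the current record, and that `comparableRecords` ends with the current record so that the previous comparable record is the entry just before it. The function name `build_window` is hypothetical, and the `scope-changed` and `noisy` statuses are not derived here because the document does not define their triggers.

```python
def build_window(history: list[dict], current: dict, window_size: int, min_samples: int) -> dict:
    """Select fingerprint-matching records and bound them to the window size."""
    fingerprint = current["comparisonFingerprint"]
    matching = [r for r in history if r.get("comparisonFingerprint") == fingerprint]
    excluded = [r for r in history if r.get("comparisonFingerprint") != fingerprint]

    # Keep the newest matching records, leaving one slot for the current record.
    prior = matching[-(window_size - 1):] if window_size > 1 else []
    comparable = prior + [current]
    sample_count = len(comparable)

    return {
        "laneId": current.get("laneId"),
        "currentRecord": current,
        "previousComparableRecord": comparable[-2] if sample_count >= 2 else None,
        "comparableRecords": comparable,
        "excludedRecords": excluded,
        # Only the insufficient-history rule is modelled; anything else is "stable".
        "windowStatus": "insufficient-history" if sample_count < min_samples else "stable",
        "sampleCount": sample_count,
    }
```

Because at most `window_size - 1` prior records are kept, `sampleCount` can never exceed `comparisonWindowSize` in this sketch.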
## 4. LaneDriftAssessment

**Purpose**: Summarizes the current drift verdict for one lane using the bounded comparison window.

| Field | Type | Description |
|-------|------|-------------|
| `laneId` | string | Governed lane identifier. |
| `healthClass` | enum | `healthy`, `budget-near`, `trending-worse`, `regressed`, or `unstable`. |
| `deltaToPreviousSeconds` | number or null | Current runtime delta vs previous comparable run. |
| `deltaToPreviousPercent` | number or null | Percent delta vs previous comparable run. |
| `deltaToBaselineSeconds` | number or null | Current runtime delta vs lane baseline. |
| `deltaToBaselinePercent` | number or null | Percent delta vs lane baseline. |
| `budgetHeadroomSeconds` | number | Remaining headroom before budget breach. |
| `worseningStreak` | integer | Count of recent comparable records showing meaningful worsening. |
| `varianceObservedSeconds` | number | Effective variance observed across the active window. |
| `recalibrationRecommendation` | enum | `none`, `investigate`, `review-baseline`, or `review-budget`. |
| `summaryLine` | string | Human-readable explanation emitted into markdown summaries. |

**Validation Rules**

- `healthClass` may only be non-`unstable` when the comparison window has at least `minimumComparableSamples` comparable records.
- `recalibrationRecommendation` must remain separate from `healthClass`.
- `budgetHeadroomSeconds` may be negative only when the lane is over budget.

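The numeric fields of the assessment follow from simple arithmetic on the window. The sketch below is one possible reading, with hypothetical helper names: deltas are `current - reference`, headroom is `budget - runtime`, and `worseningStreak` is interpreted as the number of consecutive most-recent run-over-run increases that exceed the variance floor. That streak definition is an assumption; the document only says the count covers records "showing meaningful worsening".

```python
from typing import Optional


def delta(current: float, reference: Optional[float]) -> tuple[Optional[float], Optional[float]]:
    """Return (seconds, percent) deltas, or (None, None) without a usable reference."""
    if reference is None or reference == 0:
        return None, None
    seconds = current - reference
    return round(seconds, 3), round(seconds / reference * 100, 2)


def budget_headroom(wall_clock_seconds: float, budget_seconds: float) -> float:
    """Positive while under budget; negative only when the lane is over budget."""
    return round(budget_seconds - wall_clock_seconds, 3)


def worsening_streak(runtimes: list[float], variance_floor_seconds: float) -> int:
    """Count trailing run-over-run increases larger than the variance floor.

    runtimes is ordered oldest to newest and ends with the current run.
    """
    streak = 0
    pairs = zip(reversed(runtimes[:-1]), reversed(runtimes[1:]))
    for older, newer in pairs:
        if newer - older > variance_floor_seconds:
            streak += 1
        else:
            break
    return streak
```

For a window of 100, 110, 125, and 140 seconds with a 12-second floor, the two most recent steps (15 seconds each) count and the oldest step (10 seconds) ends the streak.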
## 5. HotspotTrendSnapshot

**Purpose**: Captures how the dominant runtime contributors changed between the current and previous comparable run.

| Field | Type | Description |
|-------|------|-------------|
| `laneId` | string | Governed lane identifier. |
| `familyDeltas` | array | Top family-level deltas with current seconds, previous seconds, and delta values. |
| `fileHotspots` | array | Top file hotspots with current/previous runtime and rank movement. |
| `newEntrants` | array | Families or files newly entering the visible hotspot set. |
| `droppedEntrants` | array | Families or files leaving the visible hotspot set. |
| `evidenceAvailability` | enum | `available` or `unavailable`, used when JUnit or attribution evidence is missing. |

**Validation Rules**

- Human-readable summaries must cap output at the policy's family/file limits.
- JSON evidence may retain more detail, but must not exceed `slowestEntryRetention`.
- If hotspot evidence is unavailable, the summary must say so explicitly.

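To make the snapshot fields concrete, here is a small sketch of how family deltas and entrant lists could be computed from two runs. It assumes family totals arrive as plain name-to-seconds mappings and ranks by absolute change; both the input shape and the ranking rule are illustrative assumptions, as are the function names.

```python
def family_deltas(current: dict[str, float], previous: dict[str, float], limit: int) -> list[dict]:
    """Rank families by absolute runtime change and keep the top `limit`."""
    rows = []
    for family in current.keys() | previous.keys():
        cur = current.get(family, 0.0)
        prev = previous.get(family, 0.0)
        rows.append({
            "family": family,
            "currentSeconds": cur,
            "previousSeconds": prev,
            "deltaSeconds": round(cur - prev, 3),
        })
    # Largest movement first; the name breaks ties so output is deterministic.
    rows.sort(key=lambda row: (-abs(row["deltaSeconds"]), row["family"]))
    return rows[:limit]


def entrants(current_visible: list[str], previous_visible: list[str]) -> tuple[list[str], list[str]]:
    """Return (newEntrants, droppedEntrants) between two visible hotspot sets."""
    new = [name for name in current_visible if name not in previous_visible]
    dropped = [name for name in previous_visible if name not in current_visible]
    return new, dropped
```

Passing `hotspotFamilyLimit` as `limit` applies the readable-summary cap, while the uncapped list can still be written to JSON evidence.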
## 6. RecalibrationDecisionRecord

**Purpose**: Records structured evidence for a proposed, approved, or rejected baseline/budget recalibration.

| Field | Type | Description |
|-------|------|-------------|
| `laneId` | string | Governed lane identifier. |
| `targetType` | enum | `baseline` or `budget`. |
| `decisionStatus` | enum | `candidate`, `approved`, or `rejected`. |
| `evidenceRunRefs` | array | Comparable runs supporting the decision. |
| `previousValueSeconds` | number | Existing baseline or budget value. |
| `proposedValueSeconds` | number or null | Proposed replacement value. |
| `rationaleCode` | enum | `lane-scope-change`, `infrastructure-shift`, `post-improvement-reset`, `sustained-erosion`, `noise-rejected`, or `manual-hold`. |
| `recordedIn` | string | Active spec path or implementation PR reference where the decision is documented. |
| `notes` | string | Concise reviewer-facing explanation. |

**Validation Rules**

- Approved baseline changes require at least one accepted rationale tied to scope or environment truth.
- Approved budget changes require a stronger evidence window than approved baseline changes.
- Rejected decisions must retain the rejection reason.
- The artifact may propose candidates, but approval remains human-controlled.

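A guard for these rules might look like the sketch below. Several details are assumptions made only for illustration: treating `lane-scope-change` and `infrastructure-shift` as the rationales "tied to scope or environment truth", the evidence thresholds of 3 and 5 runs, and storing the rejection reason in `notes`. None of these specifics are stated in the document.

```python
# Assumed mapping: which rationale codes count as scope or environment truth.
BASELINE_RATIONALES = {"lane-scope-change", "infrastructure-shift"}
# Hypothetical thresholds; the only sourced constraint is budget > baseline.
MIN_EVIDENCE_RUNS = {"baseline": 3, "budget": 5}


def validate_decision(decision: dict) -> list[str]:
    """Return readable violations for one RecalibrationDecisionRecord."""
    errors = []
    status = decision["decisionStatus"]
    target = decision["targetType"]
    evidence = decision.get("evidenceRunRefs", [])

    if status == "approved":
        if target == "baseline" and decision.get("rationaleCode") not in BASELINE_RATIONALES:
            errors.append("approved baseline change needs a scope or environment rationale")
        if len(evidence) < MIN_EVIDENCE_RUNS[target]:
            errors.append(f"approved {target} change needs at least {MIN_EVIDENCE_RUNS[target]} evidence runs")
    if status == "rejected" and not decision.get("notes", "").strip():
        errors.append("rejected decisions must retain the rejection reason")
    return errors
```

The human-approval rule is deliberately not encoded here; a validator can confirm that evidence exists, but it cannot stand in for the reviewer.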
## 7. TrendSummaryCycle

**Purpose**: Represents one generated trend-aware reporting cycle across the relevant lanes.

| Field | Type | Description |
|-------|------|-------------|
| `cycleId` | string | Reporting-cycle identifier, typically anchored to the current lane run or summary generation timestamp. |
| `generatedAt` | datetime | When the cycle summary was emitted. |
| `laneSummaries` | array | Per-lane summary entries containing `laneId`, current runtime, previous comparable runtime, baseline, budget, and the embedded drift assessment used by the readable summary surface. |
| `laneAssessments` | array | `LaneDriftAssessment` items for all relevant lanes. |
| `hotspotSnapshots` | array | `HotspotTrendSnapshot` items for lanes with available evidence. |
| `recalibrationDecisions` | array | Candidate, approved, or rejected recalibration records emitted for the cycle. |
| `artifactPublicationStatus` | array | Whether required current-run and history artifacts were published successfully. |
| `warnings` | array | Legibility notes such as missing comparable history or unavailable hotspot evidence. |

**Validation Rules**

- Every relevant primary lane must have exactly one `laneSummaries` entry and exactly one `LaneDriftAssessment` per cycle.
- Each `laneSummaries` entry must expose the current runtime, previous comparable runtime, baseline, budget, and embedded health assessment needed by the readable summary surface.
- `warnings` must be explicit when any required evidence is unavailable.
- The cycle summary must stay readable without requiring a second dashboard surface.

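The "exactly one per primary lane" rule and the warnings rule lend themselves to a short structural check. The sketch below assumes the caller supplies the list of primary lanes, since the document does not enumerate them, and treats an `unavailable` hotspot snapshot as the signal that a warning is required. `validate_cycle` is a hypothetical name.

```python
from collections import Counter


def validate_cycle(cycle: dict, primary_lanes: list[str]) -> list[str]:
    """Check per-lane cardinality and that missing evidence produced a warning."""
    errors = []
    summaries = Counter(entry["laneId"] for entry in cycle.get("laneSummaries", []))
    assessments = Counter(entry["laneId"] for entry in cycle.get("laneAssessments", []))

    for lane in primary_lanes:
        if summaries[lane] != 1:
            errors.append(f"{lane}: expected exactly one laneSummaries entry, found {summaries[lane]}")
        if assessments[lane] != 1:
            errors.append(f"{lane}: expected exactly one LaneDriftAssessment, found {assessments[lane]}")

    unavailable = [
        snapshot["laneId"]
        for snapshot in cycle.get("hotspotSnapshots", [])
        if snapshot.get("evidenceAvailability") == "unavailable"
    ]
    if unavailable and not cycle.get("warnings"):
        errors.append("warnings must be explicit when required evidence is unavailable")
    return errors
```

`Counter` returns zero for lanes that never appear, so a missing entry and a duplicated entry are reported through the same comparison.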
## State Transitions

### LaneDriftAssessment.healthClass

- `unstable` -> `healthy`: allowed once there are enough comparable samples and the lane is comfortably below budget without sustained worsening.
- `unstable` -> `budget-near`: allowed once there are enough comparable samples and budget headroom falls inside the near-budget window.
- `unstable` -> `trending-worse`: allowed once there are enough comparable samples and worsening exceeds the lane variance floor across the bounded window.
- `healthy` <-> `budget-near`: allowed as headroom enters or leaves the near-budget band.
- `healthy` or `budget-near` -> `trending-worse`: allowed when sustained worsening appears without a budget breach.
- `trending-worse` -> `regressed`: allowed when the lane breaches budget or shows a materially worse repeated trend strong enough to stop calling it merely erosion.
- Any state -> `unstable`: allowed when comparability breaks, history is insufficient, or the window is too noisy to classify reliably.

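The transitions listed above can be encoded as a lookup table for guard tests. This sketch records only the transitions the list names explicitly; it treats staying in the same class as allowed and every unlisted move (for example `unstable` directly to `regressed`, or any recovery out of `trending-worse` other than via `unstable`) as disallowed. That strictness is an assumption and may be narrower than the intended policy.

```python
# Transitions named in the list above, keyed by the previous health class.
ALLOWED_HEALTH_TRANSITIONS = {
    "unstable": {"healthy", "budget-near", "trending-worse"},
    "healthy": {"budget-near", "trending-worse"},
    "budget-near": {"healthy", "trending-worse"},
    "trending-worse": {"regressed"},
    "regressed": set(),
}


def can_transition(previous: str, current: str) -> bool:
    """Return True when moving from `previous` to `current` is listed as allowed."""
    if previous not in ALLOWED_HEALTH_TRANSITIONS:
        raise ValueError(f"unknown health class: {previous}")
    # Any state may fall back to unstable; an unchanged class is not a transition.
    if current == "unstable" or current == previous:
        return True
    return current in ALLOWED_HEALTH_TRANSITIONS[previous]
```

The table checks only which moves are permitted; the conditions attached to each move (sample counts, headroom, variance floor) still have to be evaluated by the drift assessment itself.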
### RecalibrationDecisionRecord.decisionStatus

- `candidate` -> `approved`: allowed only by explicit human review with structured evidence.
- `candidate` -> `rejected`: allowed when the evidence is noisy, incomplete, or policy says repository truth should not move.
- `approved` and `rejected`: terminal statuses for the recorded decision.
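The decision lifecycle is small enough to sketch as a single guarded function. In this illustration the `reviewer` argument and the `reviewedBy` key are hypothetical; they stand in for whatever mechanism records that a human performed the review, which the document does not specify.

```python
from typing import Optional

TERMINAL_STATUSES = {"approved", "rejected"}


def transition_decision(decision: dict, new_status: str, reviewer: Optional[str] = None) -> dict:
    """Return a copy of the decision in its new status, or raise ValueError."""
    current = decision["decisionStatus"]
    if current in TERMINAL_STATUSES:
        raise ValueError(f"{current} is a terminal status")
    if new_status not in TERMINAL_STATUSES:
        raise ValueError("a candidate may only become approved or rejected")
    if new_status == "approved" and not reviewer:
        raise ValueError("approval requires an explicit human reviewer")
    # reviewedBy is an illustrative field, not part of the documented model.
    return {**decision, "decisionStatus": new_status, "reviewedBy": reviewer}
```

Returning a copy instead of mutating the input keeps the original candidate record available as evidence of what was proposed.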