TenantAtlas/specs/211-runtime-trend-recalibration/data-model.md
Ahmed Darrazi 97262787c9
feat: implement runtime trend recalibration reporting
2026-04-18 09:34:23 +02:00


Data Model: Test Runtime Trend Reporting & Baseline Recalibration

This feature adds repository-owned governance artifacts only. It does not add product database tables. All objects below are implemented as manifest metadata, generated JSON payloads, markdown summaries, or guard-test fixtures derived from the existing lane report outputs.

1. LaneTrendPolicy

Purpose: Defines the lane-specific rules for bounded history retention, comparable-window evaluation, hotspot visibility, and recalibration guidance.

| Field | Type | Description |
| --- | --- | --- |
| laneId | string | Canonical lane identifier (fast-feedback, confidence, heavy-governance, browser, junit, profiling). |
| workflowProfile | string | Workflow profile that owns the lane's history source in CI. |
| retentionLimit | integer | Maximum number of history records retained for the lane. |
| comparisonWindowSize | integer | Number of recent comparable records used for drift evaluation. |
| minimumComparableSamples | integer | Minimum number of comparable samples required before the lane may receive any health class other than unstable. |
| varianceFloorSeconds | integer | Smallest delta, in seconds, treated as meaningful for the lane; aligned with the current enforcement tolerance. |
| nearBudgetHeadroomSeconds | integer | Headroom threshold, in seconds, for the budget-near class. |
| hotspotFamilyLimit | integer | Maximum number of family deltas shown in readable summaries. |
| hotspotFileLimit | integer | Maximum number of file hotspots shown in readable summaries. |
| slowestEntryRetention | integer | Maximum number of slowest test entries retained in JSON evidence. |
| recalibrationPolicy | array | Summary of the rules defining acceptable baseline and budget recalibration triggers. |

Relationships

  • One LaneTrendPolicy governs many LaneTrendRecord entries for the same lane.
  • One LaneTrendPolicy informs one TrendComparisonWindow, one LaneDriftAssessment, and zero or more RecalibrationDecisionRecord entries per reporting cycle.

Validation Rules

  • retentionLimit must be greater than or equal to comparisonWindowSize.
  • minimumComparableSamples must be at least 3.
  • varianceFloorSeconds must be greater than or equal to the lane's existing enforcement tolerance.
  • Primary lanes use a larger retentionLimit than support lanes.
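These validation rules can be checked mechanically when the manifest is loaded. Below is a minimal Python sketch that assumes the policy is parsed into a dataclass. Several names are hypothetical:

- The snake_case field names.
- The `enforcement_tolerance_seconds` argument.
- The `primary_lane_ids` set.

They are needed because this document does not say where the tolerance is read from or which lanes count as primary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LaneTrendPolicy:
    # Field names mirror the LaneTrendPolicy table, converted to snake_case.
    lane_id: str
    workflow_profile: str
    retention_limit: int
    comparison_window_size: int
    minimum_comparable_samples: int
    variance_floor_seconds: int
    near_budget_headroom_seconds: int
    hotspot_family_limit: int
    hotspot_file_limit: int
    slowest_entry_retention: int
    recalibration_policy: tuple[str, ...] = ()

    def violations(self, enforcement_tolerance_seconds: int) -> list[str]:
        """Return rule violations for this policy; an empty list means it is valid.

        enforcement_tolerance_seconds is a hypothetical input standing in for
        the lane's existing budget enforcement tolerance.
        """
        problems: list[str] = []
        if self.retention_limit < self.comparison_window_size:
            problems.append("retentionLimit must be >= comparisonWindowSize")
        if self.minimum_comparable_samples < 3:
            problems.append("minimumComparableSamples must be at least 3")
        if self.variance_floor_seconds < enforcement_tolerance_seconds:
            problems.append("varianceFloorSeconds is below the enforcement tolerance")
        return problems


def primary_lanes_retain_more(policies: list[LaneTrendPolicy], primary_lane_ids: set[str]) -> bool:
    """True when every primary lane keeps more history than every support lane."""
    primary = [p.retention_limit for p in policies if p.lane_id in primary_lane_ids]
    support = [p.retention_limit for p in policies if p.lane_id not in primary_lane_ids]
    if not primary or not support:
        return True  # nothing to compare
    return min(primary) > max(support)
```

Returning a list of violations instead of raising on the first one lets a guard test report every broken rule in a single run.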

2. LaneTrendRecord

Purpose: Captures the per-run evidence snapshot that can safely be compared over time.

| Field | Type | Description |
| --- | --- | --- |
| runRef | string | Stable run reference from CI or local execution. |
| laneId | string | Governed lane identifier. |
| workflowId | string | Workflow profile or logical workflow owner for the run. |
| triggerClass | string | Trigger classification: pull request, mainline push, manual, scheduled, or local. |
| generatedAt | datetime | When the record was emitted. |
| wallClockSeconds | number | Current lane runtime in seconds. |
| baselineSeconds | number or null | Current comparison baseline for the lane, if one is defined. |
| baselineSource | string | Manifest source or comparison source that supplied the baseline. |
| budgetSeconds | number | Current lane budget threshold in seconds. |
| budgetStatus | string | Current lane budget status from the existing budget evaluator. |
| blockingStatus | string | Whether the current CI context blocks on this outcome. |
| comparisonFingerprint | string | Hash or structured fingerprint capturing comparability boundaries. |
| classificationTotals | array | Runtime grouped by current classification. |
| familyTotals | array | Runtime grouped by current family. |
| hotspotFiles | array | Current dominant hotspot files. |
| slowestEntries | array | Current slowest test entries, capped by policy. |
| artifactRefs | array | References to the summary, report, budget, JUnit, and history artifacts backing the record. |

Validation Rules

  • A record must derive from the same lane's current summary.md, report.json, budget.json, and available JUnit output.
  • comparisonFingerprint must be present for any record eligible for comparison.
  • wallClockSeconds, budgetSeconds, and generatedAt are required.
  • slowestEntries must not exceed the lane policy's slowestEntryRetention cap.
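One way to produce a comparisonFingerprint that stays stable across runs is to hash a canonical JSON encoding of the comparability boundaries. The sketch below pairs that with a check for the record-level rules. It rests on some assumptions:

- The contents of the `boundaries` dict are hypothetical, because this document does not enumerate them.
- Records are handled as plain dicts keyed by the field names above.
- The rule that a record must derive from the lane's own summary.md, report.json, budget.json, and JUnit output is not checked here.

```python
from __future__ import annotations

import hashlib
import json
from typing import Any

REQUIRED_FIELDS = ("wallClockSeconds", "budgetSeconds", "generatedAt")


def comparison_fingerprint(boundaries: dict[str, Any]) -> str:
    """Hash the comparability boundaries so that key order does not matter."""
    canonical = json.dumps(boundaries, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_violations(record: dict[str, Any], slowest_entry_retention: int, comparable: bool) -> list[str]:
    """Check one LaneTrendRecord payload against the validation rules."""
    problems = [f"{name} is required" for name in REQUIRED_FIELDS if record.get(name) is None]
    if comparable and not record.get("comparisonFingerprint"):
        problems.append("comparisonFingerprint is required for comparable records")
    if len(record.get("slowestEntries", [])) > slowest_entry_retention:
        problems.append("slowestEntries exceeds slowestEntryRetention")
    return problems
```

Sorting keys before hashing means two runs with the same boundaries always produce the same fingerprint, even if the producer emits keys in a different order.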

3. TrendComparisonWindow

Purpose: Represents the bounded comparable history used to evaluate one lane in one reporting cycle.

| Field | Type | Description |
| --- | --- | --- |
| laneId | string | Governed lane identifier. |
| policyRef | string | Reference to the governing LaneTrendPolicy. |
| currentRecord | object | The latest LaneTrendRecord. |
| previousComparableRecord | object or null | The most recent prior comparable record, if one exists. |
| comparableRecords | array | Ordered comparable records used for trend evaluation. |
| excludedRecords | array | Recent records skipped because of fingerprint mismatch or invalid evidence. |
| windowStatus | enum | stable, insufficient-history, scope-changed, or noisy. |
| sampleCount | integer | Number of comparable records in the active window. |

Validation Rules

  • Every comparable record must share the same comparisonFingerprint.
  • sampleCount may not exceed comparisonWindowSize.
  • previousComparableRecord must be the immediately preceding entry in comparableRecords when present.
  • windowStatus becomes insufficient-history whenever sampleCount is below minimumComparableSamples.
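Below is a minimal sketch of assembling the window from bounded history. It assumes:

- History is ordered oldest to newest, with the current run last.
- currentRecord is itself the last entry of comparableRecords; the document does not say this explicitly.
- The `Record` and `Window` types are hypothetical.

Only the insufficient-history and stable statuses are derived. Detection of scope-changed and noisy is left out because their thresholds are not defined here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    # Hypothetical, trimmed-down view of a LaneTrendRecord.
    run_ref: str
    fingerprint: Optional[str]
    wall_clock_seconds: float


@dataclass
class Window:
    current: Record
    previous_comparable: Optional[Record]
    comparable: list[Record]
    excluded: list[Record]
    status: str
    sample_count: int


def build_window(history: list[Record], window_size: int, minimum_samples: int) -> Window:
    """history is ordered oldest -> newest; its last entry is the current run."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    current = history[-1]
    comparable: list[Record] = []
    excluded: list[Record] = []
    for record in history:
        # Records without a fingerprint, or with a different one, are not comparable.
        if record.fingerprint and record.fingerprint == current.fingerprint:
            comparable.append(record)
        else:
            excluded.append(record)
    comparable = comparable[-window_size:]  # bound the window; current stays last
    sample_count = len(comparable)
    previous = comparable[-2] if sample_count >= 2 else None
    status = "insufficient-history" if sample_count < minimum_samples else "stable"
    return Window(current, previous, comparable, excluded, status, sample_count)
```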

4. LaneDriftAssessment

Purpose: Summarizes the current drift verdict for one lane using the bounded comparison window.

| Field | Type | Description |
| --- | --- | --- |
| laneId | string | Governed lane identifier. |
| healthClass | enum | healthy, budget-near, trending-worse, regressed, or unstable. |
| deltaToPreviousSeconds | number or null | Current runtime delta vs previous comparable run. |
| deltaToPreviousPercent | number or null | Percent delta vs previous comparable run. |
| deltaToBaselineSeconds | number or null | Current runtime delta vs lane baseline. |
| deltaToBaselinePercent | number or null | Percent delta vs lane baseline. |
| budgetHeadroomSeconds | number | Remaining headroom before budget breach. |
| worseningStreak | integer | Count of recent comparable records showing meaningful worsening. |
| varianceObservedSeconds | number | Effective variance observed across the active window. |
| recalibrationRecommendation | enum | none, investigate, review-baseline, or review-budget. |
| summaryLine | string | Human-readable explanation emitted into markdown summaries. |

Validation Rules

  • healthClass may take a value other than unstable only when the comparison window contains at least minimumComparableSamples comparable records.
  • recalibrationRecommendation must remain separate from healthClass.
  • budgetHeadroomSeconds may be negative only when the lane is over budget.
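The classification can be sketched as a priority-ordered check. The Python sketch below makes these choices:

- `runtimes` holds the comparable wall-clock seconds from the active window, oldest first.
- `WORSENING_STREAK_THRESHOLD` is a hypothetical value, not one from this spec.
- The streak definition (consecutive most-recent increases larger than varianceFloorSeconds) is also hypothetical.
- recalibrationRecommendation is deliberately not computed, so it stays separate from healthClass.
- Only a budget breach leads to regressed; the repeated-trend path is not modeled.

```python
from __future__ import annotations

from typing import Optional

WORSENING_STREAK_THRESHOLD = 2  # hypothetical: the spec does not fix this number


def percent(delta: Optional[float], reference: Optional[float]) -> Optional[float]:
    """Express delta as a percentage of reference; None when undefined."""
    if delta is None or not reference:
        return None
    return delta / reference * 100.0


def worsening_streak(runtimes: list[float], variance_floor: float) -> int:
    """Count consecutive most-recent steps whose increase exceeds the variance floor."""
    streak = 0
    for older, newer in reversed(list(zip(runtimes, runtimes[1:]))):
        if newer - older > variance_floor:
            streak += 1
        else:
            break
    return streak


def assess(runtimes: list[float], baseline: Optional[float], budget: float, window_stable: bool,
           minimum_samples: int, variance_floor: float, near_budget_headroom: float) -> dict:
    """runtimes: comparable wall-clock seconds, oldest -> newest, current run last."""
    current = runtimes[-1]
    previous = runtimes[-2] if len(runtimes) >= 2 else None
    delta_prev = current - previous if previous is not None else None
    delta_base = current - baseline if baseline is not None else None
    headroom = budget - current
    streak = worsening_streak(runtimes, variance_floor)

    # Checked in priority order: comparability, then budget breach, then trend, then headroom.
    if not window_stable or len(runtimes) < minimum_samples:
        health = "unstable"
    elif headroom < 0:
        health = "regressed"
    elif streak >= WORSENING_STREAK_THRESHOLD:
        health = "trending-worse"
    elif headroom <= near_budget_headroom:
        health = "budget-near"
    else:
        health = "healthy"

    return {
        "healthClass": health,
        "deltaToPreviousSeconds": delta_prev,
        "deltaToPreviousPercent": percent(delta_prev, previous),
        "deltaToBaselineSeconds": delta_base,
        "deltaToBaselinePercent": percent(delta_base, baseline),
        "budgetHeadroomSeconds": headroom,
        "worseningStreak": streak,
    }
```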

5. HotspotTrendSnapshot

Purpose: Captures how the dominant runtime contributors changed between the current and previous comparable run.

| Field | Type | Description |
| --- | --- | --- |
| laneId | string | Governed lane identifier. |
| familyDeltas | array | Top family-level deltas with current seconds, previous seconds, and delta values. |
| fileHotspots | array | Top file hotspots with current/previous runtime and rank movement. |
| newEntrants | array | Families or files newly entering the visible hotspot set. |
| droppedEntrants | array | Families or files leaving the visible hotspot set. |
| evidenceAvailability | enum | available or unavailable; unavailable when JUnit or attribution evidence is missing. |

Validation Rules

  • Human-readable summaries must cap output at the policy's family/file limits.
  • JSON evidence may retain more detail, but must not exceed slowestEntryRetention.
  • If hotspot evidence is unavailable, the summary must say so explicitly.
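The family side of the snapshot can be sketched by ranking each run's totals, capping at the policy limit, and diffing the two visible sets. This is an illustration under stated assumptions:

- Totals arrive as plain name-to-seconds dicts.
- A missing previous run is treated like missing attribution evidence and reported as unavailable.
- A family absent from the previous run counts as zero seconds there.

File hotspots and rank movement would follow the same pattern but are left out.

```python
from __future__ import annotations

from typing import Optional


def top_names(totals: dict[str, float], limit: int) -> list[str]:
    """Rank by runtime, highest first; ties break by name for deterministic output."""
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


def hotspot_snapshot(current: Optional[dict[str, float]], previous: Optional[dict[str, float]],
                     limit: int) -> dict:
    """Compare family totals between the current and previous comparable run."""
    if current is None or previous is None:
        # Missing attribution evidence: report it explicitly instead of guessing.
        return {"evidenceAvailability": "unavailable", "familyDeltas": [],
                "newEntrants": [], "droppedEntrants": []}
    current_top = top_names(current, limit)
    previous_top = top_names(previous, limit)
    deltas = [
        {
            "family": name,
            "currentSeconds": current[name],
            # A family absent from the previous run is treated as 0 seconds there.
            "previousSeconds": previous.get(name, 0.0),
            "deltaSeconds": current[name] - previous.get(name, 0.0),
        }
        for name in current_top
    ]
    return {
        "evidenceAvailability": "available",
        "familyDeltas": deltas,
        "newEntrants": [name for name in current_top if name not in previous_top],
        "droppedEntrants": [name for name in previous_top if name not in current_top],
    }
```

Capping before diffing matches the readable-summary rule. A family that merely moves within the hidden tail of the ranking never shows up as an entrant.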

6. RecalibrationDecisionRecord

Purpose: Records structured evidence for a proposed, approved, or rejected baseline/budget recalibration.

| Field | Type | Description |
| --- | --- | --- |
| laneId | string | Governed lane identifier. |
| targetType | enum | baseline or budget. |
| decisionStatus | enum | candidate, approved, or rejected. |
| evidenceRunRefs | array | Comparable runs supporting the decision. |
| previousValueSeconds | number | Existing baseline or budget value. |
| proposedValueSeconds | number or null | Proposed replacement value. |
| rationaleCode | enum | lane-scope-change, infrastructure-shift, post-improvement-reset, sustained-erosion, noise-rejected, or manual-hold. |
| recordedIn | string | Active spec path or implementation PR reference where the decision is documented. |
| notes | string | Concise reviewer-facing explanation. |

Validation Rules

  • Approved baseline changes require at least one accepted rationale tied to scope or environment truth.
  • Approved budget changes require a stronger evidence window than approved baseline changes.
  • Rejected decisions must retain the rejection reason.
  • The artifact may propose candidates, but approval remains human-controlled.
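The decision rules can be sketched as a validator over the record payload. The following are assumptions made for illustration, not values from the spec:

- Mapping "scope or environment truth" onto the lane-scope-change and infrastructure-shift rationale codes.
- The evidence-run thresholds in `MIN_EVIDENCE_RUNS`.
- Keeping the rejection reason in notes.

```python
from __future__ import annotations

# Assumed mapping of "scope or environment truth" onto rationaleCode values.
BASELINE_RATIONALES = {"lane-scope-change", "infrastructure-shift"}
# Hypothetical thresholds: budget changes need a longer evidence window than baselines.
MIN_EVIDENCE_RUNS = {"baseline": 3, "budget": 5}


def decision_violations(decision: dict) -> list[str]:
    """Check a RecalibrationDecisionRecord payload against the validation rules."""
    problems: list[str] = []
    status = decision["decisionStatus"]
    target = decision["targetType"]
    if status == "approved":
        if target == "baseline" and decision["rationaleCode"] not in BASELINE_RATIONALES:
            problems.append("approved baseline change lacks a scope or environment rationale")
        if len(decision["evidenceRunRefs"]) < MIN_EVIDENCE_RUNS[target]:
            problems.append(
                f"approved {target} change needs at least {MIN_EVIDENCE_RUNS[target]} evidence runs")
    if status == "rejected" and not decision.get("notes", "").strip():
        problems.append("rejected decision must keep its rejection reason in notes")
    return problems
```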

7. TrendSummaryCycle

Purpose: Represents one generated trend-aware reporting cycle across the relevant lanes.

| Field | Type | Description |
| --- | --- | --- |
| cycleId | string | Reporting-cycle identifier, typically anchored to the current lane run or summary generation timestamp. |
| generatedAt | datetime | When the cycle summary was emitted. |
| laneSummaries | array | Per-lane summary entries containing laneId, current runtime, previous comparable runtime, baseline, budget, and the embedded drift assessment used by the readable summary surface. |
| laneAssessments | array | LaneDriftAssessment items for all relevant lanes. |
| hotspotSnapshots | array | HotspotTrendSnapshot items for lanes with available evidence. |
| recalibrationDecisions | array | Candidate, approved, or rejected recalibration records emitted for the cycle. |
| artifactPublicationStatus | array | Whether required current-run and history artifacts were published successfully. |
| warnings | array | Legibility notes such as missing comparable history or unavailable hotspot evidence. |

Validation Rules

  • Every relevant primary lane must have exactly one laneSummaries entry and exactly one LaneDriftAssessment per cycle.
  • Each laneSummaries entry must expose the current runtime, previous comparable runtime, baseline, budget, and embedded health assessment needed by the readable summary surface.
  • warnings must be explicit when any required evidence is unavailable.
  • The cycle summary must stay readable without requiring a second dashboard surface.
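The per-cycle rules lend themselves to a guard-test helper. This sketch rests on some assumptions:

- The cycle is a plain dict.
- The summary key names in `REQUIRED_SUMMARY_KEYS` are hypothetical.
- The set of primary lanes is passed in, because this document does not list it.
- Warnings are required whenever a primary lane has no available hotspot snapshot.

The warning check only tests that warnings are non-empty. It does not verify what they say.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical key names for the per-lane summary fields listed above.
REQUIRED_SUMMARY_KEYS = ("currentSeconds", "previousComparableSeconds", "baselineSeconds",
                         "budgetSeconds", "assessment")


def cycle_violations(cycle: dict, primary_lanes: set[str]) -> list[str]:
    """Check one TrendSummaryCycle payload against the validation rules."""
    problems: list[str] = []
    summaries = Counter(entry["laneId"] for entry in cycle["laneSummaries"])
    assessments = Counter(entry["laneId"] for entry in cycle["laneAssessments"])
    for lane in sorted(primary_lanes):
        if summaries[lane] != 1:
            problems.append(f"{lane}: expected 1 laneSummaries entry, found {summaries[lane]}")
        if assessments[lane] != 1:
            problems.append(f"{lane}: expected 1 LaneDriftAssessment, found {assessments[lane]}")
    for entry in cycle["laneSummaries"]:
        # Values may be null (e.g. no previous comparable run), but keys must be present.
        missing = [key for key in REQUIRED_SUMMARY_KEYS if key not in entry]
        if missing:
            problems.append(f"{entry['laneId']}: summary missing {', '.join(missing)}")
    with_hotspots = {s["laneId"] for s in cycle["hotspotSnapshots"]
                     if s.get("evidenceAvailability") == "available"}
    if primary_lanes - with_hotspots and not cycle["warnings"]:
        problems.append("warnings must mention unavailable hotspot evidence")
    return problems
```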

State Transitions

LaneDriftAssessment.healthClass

  • unstable -> healthy: allowed once there are enough comparable samples and the lane is comfortably below budget without sustained worsening.
  • unstable -> budget-near: allowed once there are enough comparable samples and budget headroom falls inside the near-budget window.
  • unstable -> trending-worse: allowed once there are enough comparable samples and worsening exceeds the lane variance floor across the bounded window.
  • healthy <-> budget-near: allowed as headroom enters or leaves the near-budget band.
  • healthy or budget-near -> trending-worse: allowed when sustained worsening appears without a budget breach.
  • trending-worse -> regressed: allowed when the lane breaches budget, or when repeated worsening becomes severe enough that it can no longer be treated as mere erosion.
  • Any state -> unstable: allowed when comparability breaks, history is insufficient, or the window is too noisy to classify reliably.
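Read literally, the healthClass list defines an allowed-transition table. The sketch below encodes it in Python under two assumptions:

- The list is exhaustive, so for example regressed can only leave via unstable.
- Staying in the same class is always allowed.

The sample-count and headroom guards attached to each bullet are not re-checked here.

```python
HEALTH_CLASSES = {"healthy", "budget-near", "trending-worse", "regressed", "unstable"}

# Transitions listed in the spec, apart from "any state -> unstable".
ALLOWED = {
    ("unstable", "healthy"),
    ("unstable", "budget-near"),
    ("unstable", "trending-worse"),
    ("healthy", "budget-near"),
    ("budget-near", "healthy"),
    ("healthy", "trending-worse"),
    ("budget-near", "trending-worse"),
    ("trending-worse", "regressed"),
}


def transition_allowed(old: str, new: str) -> bool:
    """Return True when moving healthClass from old to new is permitted."""
    if old not in HEALTH_CLASSES or new not in HEALTH_CLASSES:
        raise ValueError(f"unknown health class: {old!r} -> {new!r}")
    if old == new or new == "unstable":
        return True  # staying put is assumed fine; any state may fall back to unstable
    return (old, new) in ALLOWED
```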

RecalibrationDecisionRecord.decisionStatus

  • candidate -> approved: allowed only by explicit human review with structured evidence.
  • candidate -> rejected: allowed when the evidence is noisy, incomplete, or policy says repository truth should not move.
  • approved and rejected: terminal statuses for the recorded decision.
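The decision lifecycle is a small state machine with two terminal states. In this sketch, two stand-ins are assumed:

- The `reviewer` argument is a hypothetical stand-in for explicit human review.
- A non-empty evidence run list stands in for "structured evidence".

```python
from __future__ import annotations

from typing import Optional, Sequence

TERMINAL = {"approved", "rejected"}


def next_status(current: str, target: str, reviewer: Optional[str],
                evidence_run_refs: Sequence[str]) -> str:
    """Validate a decisionStatus change and return the new status."""
    if current in TERMINAL:
        raise ValueError(f"{current} is a terminal status")
    if current != "candidate" or target not in TERMINAL:
        raise ValueError(f"unsupported transition {current} -> {target}")
    if target == "approved" and (not reviewer or not evidence_run_refs):
        # Approval is human-controlled and must cite structured evidence.
        raise ValueError("approval requires a human reviewer and evidence runs")
    return target
```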