Quickstart: Test Runtime Trend Reporting & Baseline Recalibration
Preconditions
- Specs 206 through 210 are already implemented and remain the governing baseline for lane selection, budgets, CI workflow routing, and artifact publication.
- Local validation runs from the repository root and uses Sail-backed commands for PHP and test execution.
- At least one prior comparable artifact bundle or prior lane `*-latest.trend-history.json` file is available when validating a non-`unstable` history window locally.
- No database migration, product route, Filament panel, or frontend asset step is required for this feature.
Planned Artifact Additions
- Extend the existing lane artifact set with `apps/platform/storage/logs/test-lanes/<lane>-latest.trend-history.json`.
- Extend the existing `summary.md`, `report.json`, and `budget.json` outputs with trend-aware sections and fields rather than creating a parallel human-readable artifact surface.
- Stage the new history artifact into the existing `.gitea-artifacts/<workflow-profile>` upload bundle for the owning lane.
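The path conventions above can be sketched as a small helper. This is an illustrative Python sketch, not the repository's `scripts/platform-test-artifacts` wrapper: the function names are hypothetical, and returning `None` when no history file exists yet is an assumed cold-start behavior.

```python
import shutil
from pathlib import Path
from typing import Optional

# Directory named in this quickstart for per-lane outputs.
LANE_LOG_DIR = Path("apps/platform/storage/logs/test-lanes")


def history_path(lane: str, root: Path = Path(".")) -> Path:
    """Return the trend-history location for a lane under the repository root."""
    return root / LANE_LOG_DIR / f"{lane}-latest.trend-history.json"


def stage_history(lane: str, workflow_profile: str, root: Path = Path(".")) -> Optional[Path]:
    """Copy the lane history file into .gitea-artifacts/<workflow-profile>.

    Returns the staged path, or None when no history file exists yet
    (assumed behavior for a cold start).
    """
    source = history_path(lane, root)
    if not source.is_file():
        return None
    bundle_dir = root / ".gitea-artifacts" / workflow_profile
    bundle_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(source, bundle_dir / source.name))
```

Keeping the history file inside the existing bundle directory, rather than in a directory of its own, mirrors the stated goal of not creating a parallel artifact surface.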
Recommended Implementation Order
- Extend `TestLaneManifest` with the lane trend policy, bounded retention limits, comparison-fingerprint inputs, and recalibration guidance anchors.
- Extend `TestLaneReport` so it can read a prior `*-latest.trend-history.json`, append the current `LaneTrendRecord`, trim to the lane retention limit, compute the trend window, emit drift status, and surface hotspot deltas.
- Extend `TestLaneBudget` with recalibration recommendation helpers that stay separate from current budget outcome.
- Extend `scripts/platform-test-report` so it refreshes trend-aware outputs after a prior history file has been hydrated into `apps/platform/storage/logs/test-lanes`.
- Extend `scripts/platform-test-artifacts` and the checked-in artifact contracts so the trend history file is staged and uploaded with the existing lane bundle.
- Update only the necessary Gitea workflow steps so each lane can hydrate the previous matching history artifact before report generation without widening lane execution.
- Add or update Pest guard coverage for trend history, drift classes, hotspot deltas, recalibration rules, and workflow/artifact publication contracts.
- Update `README.md` with reviewer guidance and capture representative validation evidence for the main trend cases.
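The read, append, and trim behavior described for the report step can be sketched as follows. This is a minimal Python illustration rather than the repository's implementation; it assumes the history file is a JSON array of records ordered oldest first, and the function names are hypothetical.

```python
import json
from pathlib import Path


def load_history(path: Path) -> list:
    """Read prior records; a missing file is treated as an empty history."""
    if not path.is_file():
        return []
    return json.loads(path.read_text())


def append_and_trim(history: list, record: dict, retention_limit: int) -> list:
    """Append the current record and keep only the newest `retention_limit` entries."""
    if retention_limit < 1:
        raise ValueError("retention_limit must be at least 1")
    # Slicing from the end drops the oldest records first.
    return (history + [record])[-retention_limit:]
```

Returning a new list instead of mutating the input keeps the prior history available for comparison until the refreshed file is written.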
Local Validation Flow
1. Generate current lane artifacts
./scripts/platform-test-lane fast-feedback
./scripts/platform-test-lane confidence
./scripts/platform-test-report fast-feedback --skip-latest-history
./scripts/platform-test-report confidence --skip-latest-history
2. Hydrate prior comparable history for a stable-window validation
Use the wrapper flags instead of manual artifact copying so local runs exercise the same hydration contract as CI.
./scripts/platform-test-report fast-feedback --history-file=/absolute/path/to/fast-feedback-latest.trend-history.json
./scripts/platform-test-report confidence --history-bundle=/absolute/path/to/comparable-bundle-or-zip
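Hydrated history only helps when the prior records are comparable with the current run. One way to express that check is sketched below in Python. The hashing scheme, the `fingerprint` record key, and both function names are assumptions for illustration; the repository's actual comparison-fingerprint inputs are defined by the lane manifest.

```python
import hashlib
import json


def comparison_fingerprint(inputs: dict) -> str:
    """Hash the inputs that must match for two runs to be comparable."""
    # Sorting keys makes the hash independent of dictionary ordering.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def comparable_records(history: list, current_inputs: dict) -> list:
    """Keep only prior records whose stored fingerprint matches the current run."""
    wanted = comparison_fingerprint(current_inputs)
    return [r for r in history if r.get("fingerprint") == wanted]
```

If the filtered list is shorter than the lane's minimum window, the report would be expected to fall back to `unstable` rather than compare mismatched runs.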
3. Rebuild workflow-shaped evidence without widening lane execution
./scripts/platform-test-report fast-feedback --workflow-id=pr-fast-feedback --trigger-class=pull-request --fetch-latest-history
./scripts/platform-test-report confidence --workflow-id=main-confidence --trigger-class=mainline-push --fetch-latest-history
./scripts/platform-test-report heavy-governance --workflow-id=heavy-governance --trigger-class=manual --skip-latest-history
./scripts/platform-test-report browser --workflow-id=browser-manual --trigger-class=manual --skip-latest-history
./scripts/platform-test-report profiling --skip-latest-history
./scripts/platform-test-report junit --skip-latest-history
4. Stage artifact bundles exactly as CI will publish them
./scripts/platform-test-artifacts fast-feedback .gitea-artifacts/pr-fast-feedback --workflow-id=pr-fast-feedback --trigger-class=pull-request
./scripts/platform-test-artifacts confidence .gitea-artifacts/main-confidence --workflow-id=main-confidence --trigger-class=mainline-push
5. Run focused guard coverage and formatting
(cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Guards)
(cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent)
6. Time-box one reviewer summary check
Use the generated summary only, set a two-minute timer, and verify that the reviewer can name the health class for each primary lane plus whether recalibration discussion is warranted before opening raw lane outputs.
Health Class Cheat Sheet
- `healthy`: the lane has enough comparable history, remains comfortably under budget, and recent variance stays below the lane noise floor.
- `budget-near`: the lane is still passing, but its headroom is inside the lane's warning band.
- `trending-worse`: multiple comparable samples are worsening above the documented variance floor.
- `regressed`: the lane is over budget or repeatedly worsening enough that the report should stop calling it normal erosion.
- `unstable`: the report is intentionally refusing a stronger label because the window is too short, too noisy, or no longer comparable.
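The ordering of these labels can be made concrete with a small classifier. The Python sketch below is illustrative only: the noise floor, the 5% near-budget band, the three-sample minimum, and the precedence of the checks are all assumptions, and it covers only the over-budget half of `regressed`.

```python
from typing import Sequence


def classify_health(
    runtimes: Sequence[float],        # comparable samples, oldest first, current last
    budget: float,                    # lane budget in seconds
    noise_floor: float = 5.0,         # assumed: deltas at or below this are ignored
    near_budget_band: float = 0.05,   # assumed: headroom under 5% of budget is "near"
    min_samples: int = 3,             # assumed minimum window length
) -> str:
    if len(runtimes) < min_samples:
        return "unstable"  # window too short

    deltas = [b - a for a, b in zip(runtimes, runtimes[1:])]
    significant = [d for d in deltas if abs(d) > noise_floor]
    if any(d > 0 for d in significant) and any(d < 0 for d in significant):
        return "unstable"  # large swings in both directions: too noisy to label

    current = runtimes[-1]
    if current > budget:
        return "regressed"
    if len(significant) >= 2 and all(d > 0 for d in significant):
        return "trending-worse"
    if (budget - current) / budget < near_budget_band:
        return "budget-near"
    return "healthy"
```

With hypothetical windows, `classify_health([175.0, 175.6, 176.1], 200)` returns `"healthy"`, `classify_health([428.0, 430.0, 433.0], 450)` returns `"budget-near"`, and `classify_health([176.0, 195.0, 170.0], 200)` returns `"unstable"` because the swing exceeds the noise floor in both directions.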
Recalibration is separate from health. The report can emit candidate, approved, or rejected baseline or budget decisions, but it never mutates repository truth automatically.
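A recommendation helper that respects this separation might look like the Python sketch below. It is an assumption-laden illustration: the five-sample minimum mirrors the five-run evidence window in the recorded evidence, `manual-hold` and `noise-rejected` are the rationale codes that appear there, and `sustained-over-budget` is a hypothetical code. The helper never returns `approved`, on the assumption that approval is a recorded human decision.

```python
from typing import Optional


def recommend_budget_recalibration(
    comparable_samples: int,
    window_is_noisy: bool,
    over_budget_runs: int,
    min_window: int = 5,  # assumed minimum evidence window
) -> Optional[dict]:
    """Return a recommendation only; nothing here edits budgets or baselines."""
    if comparable_samples < min_window:
        return {"decision": "rejected", "rationale": "manual-hold"}
    if window_is_noisy:
        return {"decision": "rejected", "rationale": "noise-rejected"}
    if over_budget_runs > 0:
        # Hypothetical rationale code; a reviewer decides whether to approve.
        return {"decision": "candidate", "rationale": "sustained-over-budget"}
    return None  # no recalibration recommended
```

Because the function takes no health class as input and returns plain data, the health label and the recalibration conclusion can be reported side by side without one overwriting the other.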
Recorded Evidence Snapshot (2026-04-17)
| Scenario | Lane | Runtime Window | Outcome |
|---|---|---|---|
| Live cold-start wrapper run | `fast-feedback` | current 120.29s, previous 120.29s, baseline 176.74s, budget 200s | `unstable`, hotspot evidence unavailable, budget recalibration rejected (`manual-hold`) because only two comparable samples existed |
| Stable healthy window | `fast-feedback` | current 176.10s, previous 175.60s, baseline 176.74s, budget 200s | `healthy`, no recalibration recommended |
| Stable budget-near window | `confidence` | current 433.00s, previous 430.00s, baseline 394.38s, budget 450s | `budget-near`, investigate before the lane becomes a repeated blocker |
| Noisy window | `fast-feedback` | current 170.00s, previous 195.00s, baseline 176.74s, budget 200s | `unstable` with `windowStatus=noisy`, so the spike is treated as noise instead of structural regression |
| Hotspot-stable example | `confidence` | current 394.38s, previous 401.12s, baseline 394.38s, budget 450s | `healthy`; dominant families stayed flat and the top files remained the baseline compare matrix pair plus onboarding-wizard enforcement |
| Approved baseline recalibration | `fast-feedback` | current 176.30s, previous 176.00s, baseline reset from 176.74s to 182.00s, budget 200s | baseline recalibration recorded as approved with rationale `post-improvement-reset` after the lane stabilized |
| Rejected budget recalibration | `fast-feedback` | current 193.00s, previous 176.00s, baseline 176.74s, budget 200s | `budget-near`, but budget recalibration stayed rejected with rationale `noise-rejected` |
| Candidate budget review | `confidence` | current 460.00s, previous 420.00s, baseline 394.38s, budget 450s | `regressed`, budget review emitted as a candidate only after a five-run evidence window |
| Primary-lane cold starts | `browser`, `heavy-governance` | 109.67s/150s and 228.34s/300s | both reported `unstable` on first refresh, which is the intended cold-start behavior |
| Support-lane path | `profiling`, `junit` | 2701.51s/3000s and 380.14s/450s | both wrappers now emit bounded `trend-history.json`; `junit` support-lane report refresh was repaired so the documented command actually works |
Representative Evidence Set
Capture at least one example for each of the following before calling the feature complete:
- Three sequential comparable samples for each primary lane: `fast-feedback`, `confidence`, `heavy-governance`, and `browser`.
- `healthy`: current runtime comfortably below budget with stable or improving recent comparable history.
- `budget-near`: current runtime remains under budget but inside the lane's near-budget headroom band.
- `trending-worse`: a bounded comparable window shows repeated worsening that is larger than the lane noise floor.
- `regressed`: a budget breach or materially repeated worsening is clearly visible.
- `unstable`: insufficient comparable history, fingerprint mismatch, or noisy evidence makes a stable label unsafe.
- Approved recalibration case: explicit evidence shows why repository truth should change.
- Rejected recalibration case: explicit evidence shows why repository truth should stay unchanged.
- One support-lane example from `junit` or `profiling` when it materially improves hotspot or comparison evidence.
Each recorded example should name the lane, current runtime, previous runtime, baseline, budget, health class, hotspot summary, and the recalibration conclusion when relevant.
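The required fields can be captured in a small record type so that incomplete examples fail early. This Python sketch is illustrative; the class and attribute names are hypothetical, and only the field list and the five health classes come from this quickstart.

```python
from dataclasses import dataclass
from typing import Optional

HEALTH_CLASSES = {"healthy", "budget-near", "trending-worse", "regressed", "unstable"}


@dataclass(frozen=True)
class EvidenceExample:
    lane: str
    current_seconds: float
    previous_seconds: float
    baseline_seconds: float
    budget_seconds: float
    health_class: str
    hotspot_summary: str
    recalibration_conclusion: Optional[str] = None  # only when relevant

    def __post_init__(self) -> None:
        # Reject labels outside the documented health classes.
        if self.health_class not in HEALTH_CLASSES:
            raise ValueError(f"unknown health class: {self.health_class}")

    def headroom_seconds(self) -> float:
        """Seconds remaining before the lane reaches its budget."""
        return self.budget_seconds - self.current_seconds


example = EvidenceExample(
    lane="confidence",
    current_seconds=394.38,
    previous_seconds=401.12,
    baseline_seconds=394.38,
    budget_seconds=450.0,
    health_class="healthy",
    hotspot_summary="dominant families stayed flat",
)
print(round(example.headroom_seconds(), 2))  # 55.62
```

The values in the usage example are taken from the hotspot-stable row of the recorded evidence snapshot.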
Material runtime drift, bundle-hydration caveats, and approved or rejected recalibration follow-up must be recorded in specs/211-runtime-trend-recalibration/spec.md or the active implementation PR. This quickstart may mirror the same evidence, but it does not replace the delivery record.
CI Rollout Notes
- CI should hydrate the previous matching `*-latest.trend-history.json` from the most recent comparable uploaded artifact bundle before the report refresh step.
- The uploaded bundle for each governed workflow must include the refreshed `*-latest.trend-history.json` so the next run only needs one prior bundle.
- The workflow-owned refresh steps now pass `--fetch-latest-history` together with `TENANTATLAS_GITEA_TOKEN` and top-level `actions: read` plus `contents: read` permissions so bundle discovery stays explicit.
- Pull request and `dev` push validation remain the narrowest proving paths; heavy/browser/manual/scheduled lanes provide representative cross-lane evidence and must not be widened.
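Selecting "the most recent comparable uploaded artifact bundle" reduces to a filter plus a maximum. The Python sketch below works on an already-fetched list and deliberately does not model any Gitea API; the dictionary keys and the function name are hypothetical, and `created_at` is assumed to be a uniformly formatted ISO 8601 UTC timestamp so that string comparison matches chronological order.

```python
from typing import Optional


def pick_prior_bundle(bundles: list, workflow_id: str, lane: str) -> Optional[dict]:
    """Return the newest bundle for the same workflow and lane, or None."""
    matches = [
        b for b in bundles
        if b["workflow_id"] == workflow_id and b["lane"] == lane
    ]
    # default=None covers the cold-start case where no prior bundle exists.
    return max(matches, key=lambda b: b["created_at"], default=None)
```

Because each uploaded bundle carries the refreshed history file, a single match is enough; the run never needs to walk further back through older bundles.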
Final Review Checklist
- Trend policy lives in repository truth, not workflow prose.
- `summary.md`, `report.json`, `budget.json`, and `*-latest.trend-history.json` agree on lane runtime and health class.
- Baseline and budget recalibration remain explicit, reviewable, and separate.
- Hotspot summaries stay readable and bounded.
- A timed reviewer dry run confirms the generated summary remains decidable within two minutes.
- The implementation does not add product persistence, routes, assets, or a second analytics surface.
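The agreement item in this checklist lends itself to an automated check. The Python sketch below is a hypothetical illustration: it assumes the lane runtime and health class have already been extracted from each artifact into a dictionary, and the key names `runtimeSeconds` and `healthClass` are invented for the example.

```python
def find_disagreements(artifacts: dict, keys=("runtimeSeconds", "healthClass")) -> list:
    """List (key, {artifact: value}) entries where artifacts do not all agree."""
    problems = []
    for key in keys:
        values = {name: data.get(key) for name, data in artifacts.items()}
        # More than one distinct value means at least one artifact is out of step.
        if len(set(values.values())) > 1:
            problems.append((key, values))
    return problems
```

An empty result means the four outputs tell the same story; any returned entry names the field and shows which artifact reported which value.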