# Quickstart: Test Runtime Trend Reporting & Baseline Recalibration

## Preconditions

- Specs 206 through 210 are already implemented and remain the governing baseline for lane selection, budgets, CI workflow routing, and artifact publication.
- Local validation runs from the repository root and uses Sail-backed commands for PHP and test execution.
- At least one prior comparable artifact bundle or prior lane `*-latest.trend-history.json` file is available when validating a non-`unstable` history window locally.
- No database migration, product route, Filament panel, or frontend asset step is required for this feature.

## Planned Artifact Additions

- Extend the existing lane artifact set with `apps/platform/storage/logs/test-lanes/-latest.trend-history.json`.
- Extend the existing `summary.md`, `report.json`, and `budget.json` outputs with trend-aware sections and fields rather than creating a parallel human-readable artifact surface.
- Stage the new history artifact into the existing `.gitea-artifacts/` upload bundle for the owning lane.

## Recommended Implementation Order

1. Extend `TestLaneManifest` with the lane trend policy, bounded retention limits, comparison-fingerprint inputs, and recalibration guidance anchors.
2. Extend `TestLaneReport` so it can read a prior `*-latest.trend-history.json`, append the current `LaneTrendRecord`, trim to the lane retention limit, compute the trend window, emit drift status, and surface hotspot deltas.
3. Extend `TestLaneBudget` with recalibration recommendation helpers that stay separate from current budget outcome.
4. Extend `scripts/platform-test-report` so it refreshes trend-aware outputs after a prior history file has been hydrated into `apps/platform/storage/logs/test-lanes`.
5. Extend `scripts/platform-test-artifacts` and the checked-in artifact contracts so the trend history file is staged and uploaded with the existing lane bundle.
6. Update only the necessary Gitea workflow steps so each lane can hydrate the previous matching history artifact before report generation without widening lane execution.
7. Add or update Pest guard coverage for trend history, drift classes, hotspot deltas, recalibration rules, and workflow/artifact publication contracts.
8. Update `README.md` with reviewer guidance and capture representative validation evidence for the main trend cases.

## Local Validation Flow

### 1. Generate current lane artifacts

```bash
./scripts/platform-test-lane fast-feedback
./scripts/platform-test-lane confidence
./scripts/platform-test-report fast-feedback --skip-latest-history
./scripts/platform-test-report confidence --skip-latest-history
```

### 2. Hydrate prior comparable history for a stable-window validation

Use the wrapper flags instead of manual artifact copying so local runs exercise the same hydration contract as CI.

```bash
./scripts/platform-test-report fast-feedback --history-file=/absolute/path/to/fast-feedback-latest.trend-history.json
./scripts/platform-test-report confidence --history-bundle=/absolute/path/to/comparable-bundle-or-zip
```

### 3. Rebuild workflow-shaped evidence without widening lane execution

```bash
./scripts/platform-test-report fast-feedback --workflow-id=pr-fast-feedback --trigger-class=pull-request --fetch-latest-history
./scripts/platform-test-report confidence --workflow-id=main-confidence --trigger-class=mainline-push --fetch-latest-history
./scripts/platform-test-report heavy-governance --workflow-id=heavy-governance --trigger-class=manual --skip-latest-history
./scripts/platform-test-report browser --workflow-id=browser-manual --trigger-class=manual --skip-latest-history
./scripts/platform-test-report profiling --skip-latest-history
./scripts/platform-test-report junit --skip-latest-history
```

### 4. Stage artifact bundles exactly as CI will publish them

```bash
./scripts/platform-test-artifacts fast-feedback .gitea-artifacts/pr-fast-feedback --workflow-id=pr-fast-feedback --trigger-class=pull-request
./scripts/platform-test-artifacts confidence .gitea-artifacts/main-confidence --workflow-id=main-confidence --trigger-class=mainline-push
```

### 5. Run focused guard coverage and formatting

```bash
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Guards
cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent
```

### 6. Time-box one reviewer summary check

Use the generated summary only, set a two-minute timer, and verify that the reviewer can name the health class for each primary lane plus whether recalibration discussion is warranted before opening raw lane outputs.

## Health Class Cheat Sheet

- `healthy`: the lane has enough comparable history, remains comfortably under budget, and recent variance stays below the lane noise floor.
- `budget-near`: the lane is still passing, but its headroom is inside the lane's warning band.
- `trending-worse`: multiple comparable samples are worsening above the documented variance floor.
- `regressed`: the lane is over budget or repeatedly worsening enough that the report should stop calling it normal erosion.
- `unstable`: the report is intentionally refusing a stronger label because the window is too short, too noisy, or no longer comparable.

Recalibration is separate from health. The report can emit `candidate`, `approved`, or `rejected` baseline or budget decisions, but it never mutates repository truth automatically.

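
The cheat sheet above can be read as an ordered decision. The sketch below is only an illustration of one possible ordering, not the `TestLaneReport` implementation: the function name `classify_health`, the thresholds (`MIN_SAMPLES`, `WARNING_BAND_PCT`, `MIN_WORSENING`), the `stable` window value, and the precedence between `trending-worse` and `budget-near` are all assumptions. The real values live in the lane trend policy.

```bash
# Hypothetical sketch of the cheat-sheet ordering; all thresholds are assumed.
MIN_SAMPLES=3        # assumed: fewer comparable samples yields "unstable"
WARNING_BAND_PCT=10  # assumed: headroom under 10% of budget yields "budget-near"
MIN_WORSENING=2      # assumed: this many worsening samples yields "trending-worse"

# classify_health CURRENT BUDGET SAMPLES WINDOW_STATUS [WORSENING_SAMPLES]
# WINDOW_STATUS is "noisy" or "stable" ("stable" is an assumed counterpart).
classify_health() {
  local current="$1" budget="$2" samples="$3" window_status="$4" worsening="${5:-0}"

  # Refuse a stronger label when the window is too short or too noisy.
  if [ "$samples" -lt "$MIN_SAMPLES" ] || [ "$window_status" = "noisy" ]; then
    echo "unstable"
    return 0
  fi

  # awk handles the decimal comparisons that plain shell arithmetic cannot.
  awk -v c="$current" -v b="$budget" -v w="$worsening" \
      -v band="$WARNING_BAND_PCT" -v mw="$MIN_WORSENING" 'BEGIN {
    if (c > b)                     { print "regressed";      exit }
    if (w >= mw)                   { print "trending-worse"; exit }
    if ((b - c) < b * band / 100)  { print "budget-near";    exit }
    print "healthy"
  }'
}
```

For example, `classify_health 433.00 450 5 stable` prints `budget-near` and `classify_health 460.00 450 5 stable` prints `regressed` under these assumed thresholds (the sample count of 5 is illustrative). The sketch only models the over-budget half of `regressed`; the "repeatedly worsening enough" half depends on policy details this quickstart does not state.
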
## Recorded Evidence Snapshot (2026-04-17)

| Scenario | Lane | Runtime Window | Outcome |
|----------|------|----------------|---------|
| Live cold-start wrapper run | `fast-feedback` | current `120.29s`, previous `120.29s`, baseline `176.74s`, budget `200s` | `unstable`, hotspot evidence unavailable, budget recalibration rejected (`manual-hold`) because only two comparable samples existed |
| Stable healthy window | `fast-feedback` | current `176.10s`, previous `175.60s`, baseline `176.74s`, budget `200s` | `healthy`, no recalibration recommended |
| Stable budget-near window | `confidence` | current `433.00s`, previous `430.00s`, baseline `394.38s`, budget `450s` | `budget-near`, investigate before the lane becomes a repeated blocker |
| Noisy window | `fast-feedback` | current `170.00s`, previous `195.00s`, baseline `176.74s`, budget `200s` | `unstable` with `windowStatus=noisy`, so the spike is treated as noise instead of structural regression |
| Hotspot-stable example | `confidence` | current `394.38s`, previous `401.12s`, baseline `394.38s`, budget `450s` | `healthy`; dominant families stayed flat and the top files remained the baseline compare matrix pair plus onboarding-wizard enforcement |
| Approved baseline recalibration | `fast-feedback` | current `176.30s`, previous `176.00s`, baseline reset from `176.74s` to `182.00s`, budget `200s` | baseline recalibration recorded as `approved` with rationale `post-improvement-reset` after the lane stabilized |
| Rejected budget recalibration | `fast-feedback` | current `193.00s`, previous `176.00s`, baseline `176.74s`, budget `200s` | `budget-near`, but budget recalibration stayed `rejected` with rationale `noise-rejected` |
| Candidate budget review | `confidence` | current `460.00s`, previous `420.00s`, baseline `394.38s`, budget `450s` | `regressed`, budget review emitted as a `candidate` only after a five-run evidence window |
| Primary-lane cold starts | `browser`, `heavy-governance` | `109.67s/150s` and `228.34s/300s` | both reported `unstable` on first refresh, which is the intended cold-start behavior |
| Support-lane path | `profiling`, `junit` | `2701.51s/3000s` and `380.14s/450s` | both wrappers now emit bounded `trend-history.json`; `junit` support-lane report refresh was repaired so the documented command actually works |

## Representative Evidence Set

Capture at least one example for each of the following before calling the feature complete:

1. Three sequential comparable samples for each primary lane: `fast-feedback`, `confidence`, `heavy-governance`, and `browser`.
2. `healthy`: current runtime comfortably below budget with stable or improving recent comparable history.
3. `budget-near`: current runtime remains under budget but inside the lane's near-budget headroom band.
4. `trending-worse`: a bounded comparable window shows repeated worsening that is larger than the lane noise floor.
5. `regressed`: a budget breach or materially repeated worsening is clearly visible.
6. `unstable`: insufficient comparable history, fingerprint mismatch, or noisy evidence makes a stable label unsafe.
7. Approved recalibration case: explicit evidence shows why repository truth should change.
8. Rejected recalibration case: explicit evidence shows why repository truth should stay unchanged.
9. One support-lane example from `junit` or `profiling` when it materially improves hotspot or comparison evidence.

Each recorded example should name the lane, current runtime, previous runtime, baseline, budget, health class, hotspot summary, and the recalibration conclusion when relevant.

Material runtime drift, bundle-hydration caveats, and approved or rejected recalibration follow-up must be recorded in `specs/211-runtime-trend-recalibration/spec.md` or the active implementation PR. This quickstart may mirror the same evidence, but it does not replace the delivery record.

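
To keep recorded examples consistent with the field list above, a small helper can render each one as a row in the snapshot table layout. This is a minimal sketch: `evidence_row` is a hypothetical function, not part of the repository scripts, and the `hotspots:` and `recalibration:` labels in the outcome cell are illustrative wording.

```bash
# Hypothetical helper: render one evidence example as a markdown table row.
# evidence_row SCENARIO LANE CURRENT PREVIOUS BASELINE BUDGET HEALTH HOTSPOT [RECALIBRATION]
# Runtimes are passed as bare seconds (for example 394.38); the "s" suffix is added here.
evidence_row() {
  if [ "$#" -lt 8 ]; then
    echo "usage: evidence_row SCENARIO LANE CURRENT PREVIOUS BASELINE BUDGET HEALTH HOTSPOT [RECALIBRATION]" >&2
    return 1
  fi

  local scenario="$1" lane="$2" current="$3" previous="$4"
  local baseline="$5" budget="$6" health="$7" hotspot="$8" recal="${9:-}"

  # The outcome cell always names the health class and the hotspot summary.
  local outcome="\`${health}\`; hotspots: ${hotspot}"

  # The recalibration conclusion is appended only when one was recorded.
  if [ -n "$recal" ]; then
    outcome="${outcome}; recalibration: ${recal}"
  fi

  printf '| %s | `%s` | current `%ss`, previous `%ss`, baseline `%ss`, budget `%ss` | %s |\n' \
    "$scenario" "$lane" "$current" "$previous" "$baseline" "$budget" "$outcome"
}
```

Calling `evidence_row "Hotspot-stable example" confidence 394.38 401.12 394.38 450 healthy "dominant families stayed flat"` produces a row with all required fields; calling it with fewer than eight arguments fails, so an example with a missing field is not recorded silently.
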
## CI Rollout Notes

- CI should hydrate the previous matching `*-latest.trend-history.json` from the most recent comparable uploaded artifact bundle before the report refresh step.
- The uploaded bundle for each governed workflow must include the refreshed `*-latest.trend-history.json` so the next run only needs one prior bundle.
- The workflow-owned refresh steps now pass `--fetch-latest-history` together with `TENANTATLAS_GITEA_TOKEN` and top-level `actions: read` plus `contents: read` permissions so bundle discovery stays explicit.
- Pull request and `dev` push validation remain the narrowest proving paths; heavy/browser/manual/scheduled lanes provide representative cross-lane evidence and must not be widened.

## Final Review Checklist

- Trend policy lives in repository truth, not workflow prose.
- `summary.md`, `report.json`, `budget.json`, and `*-latest.trend-history.json` agree on lane runtime and health class.
- Baseline and budget recalibration remain explicit, reviewable, and separate.
- Hotspot summaries stay readable and bounded.
- A timed reviewer dry run confirms the generated summary remains decidable within two minutes.
- The implementation does not add product persistence, routes, assets, or a second analytics surface.
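
Before walking the checklist, it can help to confirm that a staged bundle contains every artifact kind the checklist compares. The sketch below assumes a hypothetical `check_bundle` helper and assumes the staged files end in `summary.md`, `report.json`, `budget.json`, and `-latest.trend-history.json`; the exact file names inside a bundle are not stated in this quickstart. It checks presence only, not whether the files agree on runtime or health class.

```bash
# Hypothetical helper: verify a staged bundle directory holds each artifact kind.
# check_bundle BUNDLE_DIR
# Returns 0 when every pattern matches at least one file, 1 otherwise.
check_bundle() {
  local dir="$1" missing=0 pattern match

  # Assumed suffix patterns; adjust to the checked-in artifact contract.
  for pattern in '*summary.md' '*report.json' '*budget.json' '*-latest.trend-history.json'; do
    # Keep only the first match; an empty result means the artifact is absent.
    match="$(find "$dir" -type f -name "$pattern" 2>/dev/null | head -n 1)"
    if [ -z "$match" ]; then
      echo "missing artifact matching: $pattern" >&2
      missing=1
    fi
  done

  return "$missing"
}
```

A typical call would be `check_bundle .gitea-artifacts/pr-fast-feedback` right after the staging step; a non-zero exit means the next run would not find a history file to hydrate from that bundle.
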