TenantAtlas/specs/211-runtime-trend-recalibration/plan.md

# Implementation Plan: Test Runtime Trend Reporting & Baseline Recalibration

**Branch**: `211-runtime-trend-recalibration` | **Date**: 2026-04-17 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/211-runtime-trend-recalibration/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/211-runtime-trend-recalibration/spec.md`

## Summary

Implement Spec 211 by extending the existing repo-truth test-governance seams: `TestLaneManifest`, `TestLaneBudget`, `TestLaneReport`, the repo-root reporting wrapper, and the current CI artifact bundles. Each governed lane can then emit bounded runtime history, current-vs-previous-vs-baseline-vs-budget summaries, lane-first drift states, hotspot deltas, and explicit recalibration recommendations, all without introducing product database persistence or a second analytics platform.

## Technical Context

**Language/Version**: PHP 8.4.15 for repo-truth governance logic, Bash for repo-root wrappers, GitHub-compatible Gitea Actions workflow YAML under `.gitea/workflows/`, plus JSON Schema and logical OpenAPI for repository contracts
**Primary Dependencies**: Laravel 12, Pest v4, PHPUnit 12, Filament v5, Livewire v4, Laravel Sail, Gitea Actions backed by `act_runner`, uploaded artifact bundles, and the existing `Tests\Support\TestLaneManifest`, `TestLaneBudget`, and `TestLaneReport` seams
**Storage**: SQLite `:memory:` for lane execution, filesystem artifacts under `apps/platform/storage/logs/test-lanes`, staged CI bundles under `.gitea-artifacts/<workflow-profile>`, bounded derived trend/history artifacts adjacent to current lane artifacts, and no new product database persistence
**Testing**: Existing Pest lane and workflow guard suites, new repo-level trend/history/recalibration guard coverage, and representative local plus Gitea artifact sequences for primary lanes
**Validation Lanes**: `fast-feedback` and `confidence` for the narrowest proving path, with representative `heavy-governance`, `browser`, `junit`, and `profiling` evidence used only where hotspot attribution or cross-lane trend behavior needs proof
**Target Platform**: TenantAtlas monorepo on Gitea Actions with `act_runner`, Docker-isolated Sail jobs, repo-root lane/report wrappers, and local developer validation from the repository root
**Project Type**: Monorepo with a Laravel platform app and separate Astro website; this feature is scoped to repository/platform test governance only
**Performance Goals**: Produce lane summaries that remain understandable in under two minutes, classify drift from at least three comparable samples without duplicating full lane reruns, and keep hotspot trend visibility bounded to the dominant contributors rather than exhaustive historical detail
**Constraints**: Repo truth first; no product routes, panels, assets, or dependencies; no new product DB tables; lane-first reporting remains primary; baselines and budgets stay separate; recalibration is explicit; history stays bounded and lightweight; cross-run comparison must work from existing artifact bundles or explicit local inputs rather than assuming unlimited shared storage
**Scale/Scope**: Four primary governed lanes plus two support lanes, at least three comparable samples required for meaningful status, rolling bounded history per lane, and top hotspot visibility based on existing family/classification attribution and slowest-entry reporting

### Filament v5 Implementation Notes

- **Livewire v4.0+ compliance**: Preserved. This feature governs repository test-runtime reporting only and does not alter the Filament or Livewire runtime stack.
- **Provider registration location**: Unchanged. Existing panel providers remain registered in `bootstrap/providers.php`.
- **Global search rule**: No globally searchable resources are added or modified.
- **Destructive actions**: No runtime destructive actions are introduced. Existing confirmation and authorization behavior remain unchanged.
- **Asset strategy**: No panel or shared assets are added. Existing `filament:assets` deployment behavior remains unchanged.
- **Testing plan**: Add or update Pest guards for trend-history contracts, bundle discovery and hydration semantics, JSON schema plus logical OpenAPI contract sync validation, drift classification, recalibration evidence, hotspot delta output, wrapper/report integration, artifact staging/export behavior, timed review-speed acceptance, and representative multi-run evidence for the primary lanes.

## Test Governance Check

- **Affected validation lanes**: `fast-feedback` and `confidence` are the narrowest proving lanes; `heavy-governance`, `browser`, `junit`, and `profiling` remain evidence inputs only when the trend layer needs hotspot or cross-lane proof.
- **Narrowest proving command(s)**: `./scripts/platform-test-lane fast-feedback`, `./scripts/platform-test-report fast-feedback`, `./scripts/platform-test-lane confidence`, and `./scripts/platform-test-report confidence`.
- **Fixture / helper cost risks**: Low and bounded to repo-level report/history fixtures, manifest metadata, and guard helpers. The implementation must not add shared product fixtures, broaden default setup, or widen lane membership.
- **Heavy-family additions or promotions**: None. The feature consumes existing heavy/browser lanes as evidence sources and must not promote new coverage into them by accident.
- **Budget / baseline / trend follow-up**: Drift thresholds, bounded-history size, and any approved baseline or budget recalibration notes must be recorded in the active spec or implementation PR, with quickstart serving only as supplemental reproduction guidance rather than the delivery record.
- **Why no dedicated follow-up spec is needed**: Spec 211 is itself the structural trend-governance feature. After rollout, ordinary threshold upkeep should return to the normal feature-spec workflow unless recurring maintenance pain or another lane-model change appears.

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- Inventory-first: PASS. No inventory, backup, or snapshot product truth changes.
- Read/write separation: PASS. This is repository-only reporting and governance work with no end-user mutations.
- Graph contract path: PASS. No Microsoft Graph calls or contract-registry changes.
- Deterministic capabilities: PASS. No capability resolver, role mapping, or authorization registry changes.
- RBAC-UX, workspace isolation, tenant isolation: PASS. No runtime routes, policies, or tenant/workspace access behavior changes.
- Run observability and Ops-UX: PASS. Trend artifacts remain filesystem- and bundle-based and do not introduce `OperationRun` changes.
- Data minimization: PASS. Trend history must remain derived from summary/report/budget outputs and must not store secrets, tenant payloads, or raw environment detail.
- Test governance (TEST-GOV-001): PASS WITH WORK. The feature must keep the narrowest proving lane explicit, avoid widening heavy lanes, and document any threshold or recalibration follow-up as part of the active delivery artifact.
- Proportionality and bloat control: PASS WITH LIMITS. The new history artifact, drift states, and recalibration rules are justified because per-run evidence alone cannot support trend-based governance. The implementation must stay inside the existing lane/report seams and avoid turning trend logic into a generalized analytics framework.
- TEST-TRUTH-001: PASS WITH WORK. Trend output must remain derived from real lane artifacts and comparable evidence windows, not optimistic labels or hand-maintained spreadsheets.
- Filament/UI constitutions: PASS / NOT APPLICABLE. No operator-facing runtime UI, action surfaces, badges, or panels are changed.

**Phase 0 Gate Result**: PASS

- The feature stays bounded to repository test-governance artifacts, history windows, trend evaluation, and documentation.
- No new product database truth, Graph seams, runtime routes, or authorization planes are introduced.
- The implementation extends existing lane/report structures rather than inventing a separate monitoring subsystem.

## Project Structure

### Documentation (this feature)

```text
specs/211-runtime-trend-recalibration/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
│   ├── test-runtime-trend-history.schema.json
│   └── test-runtime-trend.logical.openapi.yaml
└── tasks.md
```

### Source Code (repository root)

```text
.gitea/
├── workflows/
│   ├── test-pr-fast-feedback.yml
│   ├── test-main-confidence.yml
│   ├── test-heavy-governance.yml
│   └── test-browser.yml
apps/
├── platform/
│   ├── tests/
│   │   ├── Feature/Guards/
│   │   └── Support/
│   │       ├── TestLaneManifest.php
│   │       ├── TestLaneBudget.php
│   │       └── TestLaneReport.php
│   └── storage/logs/test-lanes/
scripts/
├── platform-test-lane
├── platform-test-report
└── platform-test-artifacts
README.md
```

**Structure Decision**: Keep trend truth in the existing `TestLaneManifest` / `TestLaneBudget` / `TestLaneReport` seams, extend the repo-root reporting flow rather than adding a second execution surface, and keep historical evidence adjacent to the existing lane artifact root and CI bundles so no new database or generic analytics layer is introduced.

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|-----------|------------|-------------------------------------|
| Repo-level trend history artifact | Multi-run drift and recalibration cannot be justified from one run plus README prose | Comparing only current vs previous or current vs baseline cannot distinguish sustained erosion, noise, and scope-change boundaries |
| Repo-level drift health states | Reviewers need consistent intermediate states between healthy and hard failure | A binary green/red view hides budget-near erosion and treats one-off spikes like structural regression |
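
The intermediate states argued for in the table can be illustrated with a small classifier. This is a minimal sketch, not the planned `TestLaneReport` logic: the function name, the 25% noise, 10% baseline, and 90% budget thresholds, the order in which the checks apply, and the `insufficient-history` label are all assumptions made for illustration. Only the five state names and the three-sample minimum come from this plan.

```php
<?php

declare(strict_types=1);

/**
 * Illustrative drift classifier for one lane (hypothetical helper).
 *
 * All thresholds and the precedence of the checks are assumptions,
 * not values defined by Spec 211.
 *
 * @param list<float> $samples Comparable wall-clock seconds, oldest first.
 */
function classifyLaneDrift(array $samples, float $baselineSeconds, float $budgetSeconds): string
{
    // Fewer than three comparable samples: no stable health state yet.
    if (count($samples) < 3) {
        return 'insufficient-history';
    }

    // Only the three most recent comparable samples are evaluated here.
    [$first, $second, $current] = array_slice($samples, -3);
    $recent = [$first, $second, $current];

    $mean = array_sum($recent) / 3;
    // Relative spread of the window, used as a crude noise signal.
    $spread = $mean > 0 ? (max($recent) - min($recent)) / $mean : 0.0;
    $overBudget = count(array_filter($recent, fn (float $s): bool => $s > $budgetSeconds));
    $rising = $first < $second && $second < $current;

    return match (true) {
        $overBudget >= 2 => 'regressed',      // sustained breach, not a one-off spike
        $spread > 0.25 => 'unstable',         // too noisy to call a direction
        $rising && $current > $baselineSeconds * 1.10 => 'trending-worse',
        $current >= $budgetSeconds * 0.90 => 'budget-near',
        default => 'healthy',
    };
}
```

With these assumed thresholds, a single spike above budget inside an otherwise flat window is reported as `unstable` rather than `regressed`, which is the distinction a binary green/red view cannot make.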

## Proportionality Review

- **Current operator problem**: Maintainers can enforce budgets per run but cannot yet see whether runtime is eroding, whether a hotspot is becoming dominant, or whether baseline/budget recalibration is justified.
- **Existing structure is insufficient because**: Current lane reports describe one execution at a time and only include limited baseline comparison for narrow historical cases; they do not retain a bounded comparable window or policy-driven drift classification.
- **Narrowest correct implementation**: Extend the existing lane/report contract with bounded history, derived trend evaluation, and explicit recalibration guidance using the same lane artifact root and CI bundles.
- **Ownership cost created**: The repo must maintain history-window policy, drift thresholds, hotspot-delta output, recalibration guidance, and a small set of guard tests validating those semantics.
- **Alternative intentionally rejected**: A new database table, a git-tracked history file committed on every run, or a generalized analytics dashboard, because each would import more persistence or framework weight than the repository currently needs.
- **Release truth**: Current-release repository truth needed to make Specs 206 through 210 durable over time.

## Phase 0 — Research (complete)

- Output: [research.md](./research.md)
- Resolved key decisions:
  - Keep trend history and summaries adjacent to the existing lane artifact contract instead of creating a second storage system.
  - Treat uploaded Gitea artifact bundles as the shared CI history source, with explicit local artifact input as the fallback for local validation and reproducible examples.
  - Use a bounded rolling window per lane with a minimum comparable sample count before declaring stable health states.
  - Reuse existing family/classification attribution and slowest-entry output for hotspot trends instead of archiving exhaustive per-test history.
  - Separate lane health classification from recalibration recommendation so budgets and baselines do not collapse into a single status.
  - Extend the existing summary/report artifacts with trend-specific outputs and sections instead of creating a dashboard or parallel reporting surface.
  - Keep recalibration explicit and policy-driven, with different acceptable triggers for baseline changes and budget changes.
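
The bounded rolling window and minimum comparable sample count decided above might look like the following. This is a sketch under stated assumptions: the function names, the record fields (`runId`, `fingerprint`, `wallClockSeconds`), the default window size of 10, and the idea of a single comparability fingerprint per run are hypothetical and are not taken from `research.md` or the contracts.

```php
<?php

declare(strict_types=1);

/**
 * Append one run record and keep only the newest $maxRecords entries.
 * Record shape and window size are illustrative assumptions.
 *
 * @param list<array{runId: string, fingerprint: string, wallClockSeconds: float}> $history Oldest first.
 * @param array{runId: string, fingerprint: string, wallClockSeconds: float} $record
 * @return list<array{runId: string, fingerprint: string, wallClockSeconds: float}>
 */
function appendToBoundedHistory(array $history, array $record, int $maxRecords = 10): array
{
    $history[] = $record;

    // Drop the oldest entries once the window is full.
    return array_values(array_slice($history, -max(1, $maxRecords)));
}

/**
 * Return the trailing records that share the newest record's fingerprint.
 * A fingerprint change (for example a lane scope change) ends comparability.
 */
function comparableWindow(array $history): array
{
    if ($history === []) {
        return [];
    }

    $fingerprint = $history[array_key_last($history)]['fingerprint'];
    $window = [];

    foreach (array_reverse($history) as $record) {
        if ($record['fingerprint'] !== $fingerprint) {
            break;
        }
        array_unshift($window, $record); // keep oldest-first order
    }

    return $window;
}

/** True once the comparable window is large enough to declare a health state. */
function hasEnoughComparableSamples(array $history, int $minimum = 3): bool
{
    return count(comparableWindow($history)) >= $minimum;
}
```

Keeping the trim and the comparability check as separate steps means a scope change shortens the comparable window without discarding the older records from the bounded history.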

## Phase 1 — Design & Contracts (complete)

- Output: [data-model.md](./data-model.md) formalizes lane trend policy, trend records, comparison windows, drift assessments, hotspot trend snapshots, recalibration decisions, and cycle summaries.
- Output: [contracts/test-runtime-trend-history.schema.json](./contracts/test-runtime-trend-history.schema.json) defines the repository contract for bounded lane history, trend evaluation, hotspot deltas, and recalibration evidence.
- Output: [contracts/test-runtime-trend.logical.openapi.yaml](./contracts/test-runtime-trend.logical.openapi.yaml) captures the logical contract for updating one lane history window, evaluating one lane trend, evaluating recalibration, and emitting a cycle summary.
- Output: [quickstart.md](./quickstart.md) provides the implementation order, validation commands, and representative multi-run evidence checklist.

### Post-design Constitution Re-check

- PASS: No runtime routes, panels, Graph seams, or authorization planes are introduced.
- PASS: Trend history remains repository-owned and derived from existing lane artifacts rather than new product persistence.
- PASS: The design stays lane-first and keeps hotspot reporting supportive rather than dominant.
- PASS WITH WORK: The bounded history window must remain lightweight, and Gitea artifact hydration must stay optional so that local validation works without assuming unlimited external retention.
- PASS WITH WORK: Baseline and budget updates must remain explicit manifest/spec changes backed by evidence, not runtime self-mutation.
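
One way to keep recalibration explicit is to have the tooling return a recommendation with evidence and never write to the manifest. The sketch below assumes hypothetical names (`recommendRecalibration`, `medianSeconds`), a `keep` / `review` vocabulary, and illustrative triggers (15% median shift for baselines, 90% of budget for budgets, 25% spread as a noise cutoff). None of these values are defined by this plan.

```php
<?php

declare(strict_types=1);

/** Median of a non-empty list of durations. */
function medianSeconds(array $samples): float
{
    sort($samples);
    $n = count($samples);
    $mid = intdiv($n, 2);

    return $n % 2 === 1
        ? (float) $samples[$mid]
        : ($samples[$mid - 1] + $samples[$mid]) / 2;
}

/**
 * Recommend (never apply) baseline and budget reviews separately.
 * Triggers are illustrative assumptions, not Spec 211 policy.
 *
 * @param list<float> $samples Comparable wall-clock seconds.
 * @return array{baseline: string, budget: string, evidence: list<string>}
 */
function recommendRecalibration(array $samples, float $baselineSeconds, float $budgetSeconds): array
{
    $result = ['baseline' => 'keep', 'budget' => 'keep', 'evidence' => []];

    if (count($samples) < 3 || $baselineSeconds <= 0.0 || min($samples) <= 0.0) {
        $result['evidence'][] = 'not enough usable comparable samples';
        return $result;
    }

    $median = medianSeconds($samples);
    $spread = (max($samples) - min($samples)) / $median;

    // A noisy window justifies neither kind of change.
    if ($spread > 0.25) {
        $result['evidence'][] = 'window too noisy to justify recalibration';
        return $result;
    }

    // Baseline trigger: the stable median has moved away from the baseline.
    if (abs($median - $baselineSeconds) / $baselineSeconds > 0.15) {
        $result['baseline'] = 'review';
        $result['evidence'][] = sprintf('median %.1fs vs baseline %.1fs', $median, $baselineSeconds);
    }

    // Budget trigger: the stable median sits close to or above the budget.
    if ($median > $budgetSeconds * 0.90) {
        $result['budget'] = 'review';
        $result['evidence'][] = sprintf('median %.1fs vs budget %.1fs', $median, $budgetSeconds);
    }

    return $result;
}
```

Because the function only returns data, any actual baseline or budget change still has to land as a reviewed manifest or spec edit.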

## Phase 2 — Implementation Planning

`tasks.md` should cover:

- Auditing `TestLaneManifest`, `TestLaneBudget`, `TestLaneReport`, `scripts/platform-test-report`, and `scripts/platform-test-artifacts` as the only valid seams for trend history, drift policy, and artifact export.
- Extending `TestLaneManifest` with lane trend policy metadata, bounded-retention rules, comparability requirements, hotspot limits, and recalibration guidance anchors while keeping budgets and baselines distinct.
- Extending `TestLaneReport` so it can read current lane outputs plus a bounded prior artifact window, emit lane trend records, evaluate drift status, compute hotspot deltas, and write trend-aware summary/report/budget payloads.
- Extending `TestLaneBudget` with explicit recalibration recommendation helpers that assess baseline and budget policy separately from current budget outcome.
- Extending `scripts/platform-test-report` so it can discover, select, and hydrate the latest comparable prior history window from uploaded artifact bundles or explicit local artifact directories, then refresh trend-aware outputs without requiring a second full lane run.
- Extending `scripts/platform-test-artifacts` and the existing workflow artifact contracts so trend-specific files are staged and uploaded alongside the current summary/report/budget/JUnit bundle.
- Updating `.gitea/workflows/test-pr-fast-feedback.yml`, `.gitea/workflows/test-main-confidence.yml`, `.gitea/workflows/test-heavy-governance.yml`, and `.gitea/workflows/test-browser.yml` only as needed to pass history-source context and export the new trend files, without widening lane execution.
- Adding or updating Pest guards for bounded history contracts, comparability breaks, latest-comparable-bundle hydration, drift-state classification, hotspot delta legibility, recalibration recommendation rules, JSON schema and logical OpenAPI contract sync, and no accidental heavy/browser promotion.
- Updating `README.md` with concise contributor guidance for reading trend summaries, understanding `healthy` / `budget-near` / `trending-worse` / `regressed` / `unstable`, and knowing when recalibration discussion is appropriate.
- Recording representative evidence:
  - at least three sequential comparable samples for each primary lane;
  - one support-lane example from `junit` or `profiling`;
  - at least one healthy case, one budget-near case, one repeated worsening or regressed case, and one unstable/noisy case;
  - one justified and one rejected recalibration case;
  - one timed reviewer read proving the summary remains decidable within two minutes.
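
The hotspot delta output named in the task list could be computed from two per-family duration maps. This is a minimal sketch: the function name, the output field names, the default limit of three, and the family names in the usage example are hypothetical. The plan only states that hotspot visibility stays bounded to the dominant contributors and reuses existing family/classification attribution.

```php
<?php

declare(strict_types=1);

/**
 * Compare the dominant current families against the previous comparable run.
 * Output shape and the default limit are illustrative assumptions.
 *
 * @param array<string, float> $previous Family name => seconds in the prior run.
 * @param array<string, float> $current  Family name => seconds in the current run.
 * @return list<array{family: string, previousSeconds: ?float, currentSeconds: float, deltaSeconds: ?float}>
 */
function hotspotDeltas(array $previous, array $current, int $limit = 3): array
{
    // Dominant contributors first; everything past $limit is ignored.
    arsort($current);

    $rows = [];
    foreach (array_slice($current, 0, $limit, true) as $family => $seconds) {
        $before = $previous[$family] ?? null;

        $rows[] = [
            'family' => (string) $family,
            'previousSeconds' => $before,
            'currentSeconds' => $seconds,
            // A family with no prior entry has no delta rather than a fake zero.
            'deltaSeconds' => $before === null ? null : round($seconds - $before, 2),
        ];
    }

    return $rows;
}

// Usage with made-up family names and durations:
$rows = hotspotDeltas(
    ['filament' => 40.0, 'graph' => 20.0, 'misc' => 5.0],
    ['filament' => 46.5, 'graph' => 19.0, 'policy' => 30.0, 'misc' => 5.0],
);
```

Reporting a `null` delta for a newly dominant family keeps the output honest about missing prior evidence instead of implying the family grew from zero.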

### Contract Implementation Note

- The JSON schema is repository-tooling-oriented and defines the bounded history/trend contract even if the first implementation stores most of that truth in PHP arrays and generated JSON artifacts.
- The OpenAPI file is logical rather than transport-prescriptive. It documents how wrappers, support classes, and CI artifact inputs must interact, not a public HTTP API.
- The design intentionally reuses current lane report/budget artifacts as the canonical current-run evidence and layers bounded history on top.

### Deployment Sequencing Note

- No database migration is planned.
- No asset publish step changes.
- Recommended rollout order:
  1. Add trend policy metadata and contracts.
  2. Extend report generation to build trend outputs from explicit local inputs.
  3. Extend artifact staging and workflow export.
  4. Validate with local multi-run sequences for `fast-feedback` and `confidence`.
  5. Capture representative Gitea bundle sequences for the remaining primary lanes and document any approved recalibration evidence.