TenantAtlas/specs/211-runtime-trend-recalibration/tasks.md

# Tasks: Test Runtime Trend Reporting & Baseline Recalibration

**Input**: Design documents from `specs/211-runtime-trend-recalibration/`
**Prerequisites**: `plan.md` (required), `spec.md` (required), `research.md`, `data-model.md`, `contracts/`, `quickstart.md`

**Tests**: Required. This feature changes repository test-governance runtime behavior, so each user story includes Pest guard coverage plus focused lane and wrapper validation through Sail and the repo-root test-governance scripts.

**Organization**: Tasks are grouped by user story so each story can be implemented and validated independently where possible.

## Phase 1: Setup (Shared Context)

**Purpose**: Lock down the existing repository seams and artifact boundaries before implementation begins.

- [X] T001 [P] Audit `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneBudget.php`, `apps/platform/tests/Support/TestLaneReport.php`, `scripts/platform-test-report`, `scripts/platform-test-artifacts`, and `.gitea/workflows/*.yml` to confirm they are the only valid trend-history and runtime-governance seams before implementation begins

---

## Phase 2: Foundational (Blocking Prerequisites)

**Purpose**: Extend the shared manifest, artifact, and wrapper seams that every story depends on.

**Critical**: No user story work should begin until this phase is complete.

- [X] T002 Extend `apps/platform/tests/Support/TestLaneManifest.php` with lane trend policy metadata, retention and comparison-window defaults, comparison-fingerprint inputs, hotspot limits, and `trend-history.json` artifact contracts aligned to `specs/211-runtime-trend-recalibration/data-model.md`
- [X] T003 [P] Extend the artifact-path, read/write, and staging helpers in `apps/platform/tests/Support/TestLaneReport.php` so `apps/platform/storage/logs/test-lanes/<lane>-latest.trend-history.json` can be published alongside the existing summary, budget, report, and JUnit artifacts
- [X] T004 [P] Update `scripts/platform-test-report` and `scripts/platform-test-artifacts` to discover, select, and hydrate the latest comparable prior bundle or explicit local history input, then export the canonical `trend-history.json` artifact through the existing repo-root wrappers
- [X] T005 [P] Add or update shared guard coverage in `apps/platform/tests/Feature/Guards/TestLaneManifestTest.php`, `apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneHistoryHydrationContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneTrendContractSchemaTest.php`, and `apps/platform/tests/Feature/Guards/TestLaneTrendLogicalContractTest.php` to lock lane trend policy metadata, latest-comparable-bundle hydration semantics, JSON schema sync against `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend-history.schema.json`, logical contract sync against `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend.logical.openapi.yaml`, and staged bundle completeness for `trend-history.json`
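The comparable-bundle selection described in T002 and T004 can be illustrated with a minimal shell sketch. It assumes a hypothetical flat index of prior bundles with one `<ISO-8601 timestamp> <fingerprint> <bundle path>` line per bundle. It also assumes hypothetical fingerprint inputs passed as `key=value` pairs, such as `lane=confidence`. The real inputs and the `trend-history.json` shape are defined in `data-model.md` and the contract schema, not here. `sha256sum` comes from GNU coreutils; on macOS, `shasum -a 256` is the usual substitute.

```bash
# Sketch only: function names, index format, and fingerprint keys are hypothetical.

# Hash sorted key=value inputs so argument order does not change the fingerprint.
comparison_fingerprint() {
  printf '%s\n' "$@" | LC_ALL=C sort | sha256sum | cut -d' ' -f1
}

# Print the bundle path of the newest index entry whose fingerprint matches.
# ISO-8601 UTC timestamps in one fixed format sort correctly as plain strings.
# Prints nothing when no prior bundle is comparable.
latest_comparable_bundle() {
  local fingerprint=$1 index_file=$2
  # Concatenating "" forces a string (not numeric) comparison of fingerprints.
  awk -v fp="$fingerprint" '($2 "") == (fp "")' "$index_file" \
    | LC_ALL=C sort -k1,1r \
    | head -n 1 \
    | cut -d' ' -f3-
}
```

Bundles whose fingerprint differs are skipped rather than compared against. This mirrors the "latest comparable" wording in T004.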

**Checkpoint**: The shared trend-governance seams are ready for story-specific summary, recalibration, and hotspot work.

---

## Phase 3: User Story 1 - See Lane Drift Before It Becomes A Repeated Gate (Priority: P1) 🎯 MVP

**Goal**: Publish lane-first trend summaries that show current, previous, baseline, budget, and health status before a lane becomes a recurring blocker.

**Independent Test**: Review representative three-sample run sequences for `fast-feedback` and `confidence`, confirm the summary shows current, previous, baseline, and budget values, and verify that healthy, near-budget, worsening, and noisy cases are distinguishable without manual arithmetic.

### Tests for User Story 1

- [X] T006 [P] [US1] Add `apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php` and update `apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php` to assert bounded history windows and current, previous, baseline, and budget fields for `fast-feedback` and `confidence`
- [X] T007 [P] [US1] Add `apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php` to cover `healthy`, `budget-near`, `trending-worse`, `regressed`, and `unstable` outcomes, including one-off noisy spike handling
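The five outcomes that T007 covers can be made concrete with a sketch. It classifies a three-sample window, ordered oldest to newest, against a baseline and a budget. The precedence order and the thresholds are hypothetical placeholders, not the lane policy values from `TestLaneManifest.php`: `near=0.9` of budget, `tol=0.10` above baseline, and `spread=0.25` of baseline. The point is that a single noisy spike lands in `unstable` rather than in `regressed` or `trending-worse`.

```bash
# Sketch only: thresholds and precedence are hypothetical, not the manifest's lane policy.
# Usage: classify_lane_trend <baseline> <budget> <oldest> <previous> <current>
# Assumes all values are durations in seconds and baseline > 0.
classify_lane_trend() {
  awk -v b="$1" -v bud="$2" -v a="$3" -v m="$4" -v c="$5" \
      -v near=0.9 -v tol=0.10 -v spread=0.25 'BEGIN {
    b += 0; bud += 0; a += 0; m += 0; c += 0
    max = a; if (m > max) max = m; if (c > max) max = c
    min = a; if (m < min) min = m; if (c < min) min = c
    if (c > bud)                             { print "regressed"; exit }
    if (c >= bud * near)                     { print "budget-near"; exit }
    # Strictly rising samples that end above the tolerance band.
    if (a < m && m < c && c > b * (1 + tol)) { print "trending-worse"; exit }
    # Large spread without a consistent direction, e.g. a one-off spike.
    if ((max - min) / b > spread)            { print "unstable"; exit }
    print "healthy"
  }'
}
```

For example, `classify_lane_trend 100 200 100 160 105` prints `unstable`. The middle sample spikes, but the current run is back near the baseline.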

### Implementation for User Story 1

- [X] T008 [US1] Extend `apps/platform/tests/Support/TestLaneReport.php` with `LaneTrendRecord` generation, comparison-window evaluation, comparison fingerprints, and trend-aware `summary.md` plus `report.json` output for `fast-feedback` and `confidence`
- [X] T009 [US1] Update `apps/platform/tests/Support/TestLaneManifest.php`, `.gitea/workflows/test-pr-fast-feedback.yml`, and `.gitea/workflows/test-main-confidence.yml` so pull-request and mainline bundles discover and hydrate the latest comparable history bundle, then republish the refreshed `trend-history.json` artifact without widening lane execution
- [X] T010 [US1] Update `README.md` and `specs/211-runtime-trend-recalibration/quickstart.md` with reviewer guidance and local validation steps for reading lane health summaries across `fast-feedback` and `confidence`
- [X] T011 [US1] Run the narrowest proving path with `./scripts/platform-test-lane fast-feedback`, `./scripts/platform-test-report fast-feedback`, `./scripts/platform-test-lane confidence`, and `./scripts/platform-test-report confidence`, then record representative three-sample `healthy`, `budget-near`, and `unstable` evidence in `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/quickstart.md`
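The bounded history windows that T006 asserts and T008 evaluates reduce to append-then-trim. This sketch makes two simplifying assumptions. It uses a hypothetical one-sample-per-line text file instead of the real `trend-history.json` structure. It also takes the retention count from the caller rather than from the manifest's retention defaults.

```bash
# Sketch only: a one-sample-per-line file stands in for trend-history.json.
# Usage: append_bounded_history <history-file> <sample> <retention>
append_bounded_history() {
  local history_file=$1 sample=$2 retention=$3 tmp
  # Temp file in the same directory so the final mv is a same-filesystem rename.
  tmp=$(mktemp "${history_file}.XXXXXX") || return 1
  {
    if [ -f "$history_file" ]; then cat "$history_file"; fi
    printf '%s\n' "$sample"
  } | tail -n "$retention" > "$tmp"
  mv "$tmp" "$history_file"
}
```

Writing to a temp file and renaming it means a reader never sees a half-written history.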

**Checkpoint**: At this point, lane drift visibility for the main contributor lanes should be independently functional and reviewable.

---

## Phase 4: User Story 2 - Decide Recalibration With Evidence Instead Of Habit (Priority: P1)

**Goal**: Separate baseline and budget recalibration from ordinary health status and make every recalibration decision evidence-backed.

**Independent Test**: Review one justified recalibration case and one rejected recalibration case, and confirm that the report and the policy together make each outcome understandable without private notes.

### Tests for User Story 2

- [X] T012 [P] [US2] Add `apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php` to assert baseline-vs-budget separation, evidence-window requirements, and approved versus rejected rationale handling
- [X] T013 [P] [US2] Add `apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php` to assert candidate, approved, and rejected recalibration records together with explicit summary disclosure for recalibration outcomes

### Implementation for User Story 2

- [X] T014 [US2] Extend `apps/platform/tests/Support/TestLaneBudget.php` with recalibration recommendation helpers, lane-specific tolerance reuse, and explicit baseline plus budget review rules aligned to `specs/211-runtime-trend-recalibration/data-model.md`
- [X] T015 [US2] Extend `apps/platform/tests/Support/TestLaneManifest.php` and `apps/platform/tests/Support/TestLaneReport.php` to emit structured recalibration policy metadata, decision records, evidence run references, and `recordedIn` guidance pointing to `specs/211-runtime-trend-recalibration/spec.md` or the implementation PR without mutating manifest truth automatically
- [X] T016 [US2] Update `README.md` and `specs/211-runtime-trend-recalibration/quickstart.md` with the approved and rejected recalibration policy, required evidence windows, and reviewer follow-up rules
- [X] T017 [US2] Run recalibration validation with `./scripts/platform-test-report fast-feedback` and `./scripts/platform-test-report confidence` against seeded prior histories, then record one approved and one rejected recalibration example in `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/quickstart.md`
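The evidence-window rule behind T012 and T014 can be sketched as follows. Propose a baseline recalibration only when every sample in the evidence window sits above the tolerance band, and report insufficient evidence when the window is too short. The function name, its arguments, and the single upward-shift rule are hypothetical. The sketch only emits a recommendation and never edits budgets or baselines, which is consistent with T015's requirement not to mutate manifest truth automatically.

```bash
# Sketch only: rule, names, and arguments are hypothetical.
# Usage: recalibration_candidate <baseline> <tolerance> <min-samples> <sample>...
# Prints a recommendation; it never edits baselines or budgets.
recalibration_candidate() {
  local baseline=$1 tolerance=$2 min_samples=$3
  shift 3
  if [ "$#" -lt "$min_samples" ]; then
    echo "insufficient-evidence"
    return 0
  fi
  # BEGIN-only awk program: the samples are read from ARGV, not opened as files.
  awk -v b="$baseline" -v tol="$tolerance" 'BEGIN {
    for (i = 1; i < ARGC; i++) {
      # Any sample inside the tolerance band means the shift is not sustained.
      if (ARGV[i] + 0 <= b * (1 + tol)) { print "none"; exit }
    }
    print "candidate"
  }' "$@"
}
```

For example, `recalibration_candidate 100 0.10 3 120 105 130` prints `none` because the middle sample falls back inside the band. That is the kind of rejected case T017 asks to record.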

**Checkpoint**: At this point, recalibration guidance should be independently testable and clearly separated from ordinary lane health.

---

## Phase 5: User Story 3 - Track Dominant Hotspots Over Time (Priority: P2)

**Goal**: Surface persistent, worsening, and newly dominant hotspots so follow-up optimization work targets the real cost drivers.

**Independent Test**: Review representative hotspot summaries for each primary lane across multiple runs and confirm that persistent, worsening, newly dominant, and unavailable hotspot states are visible.

### Tests for User Story 3

- [X] T018 [P] [US3] Add `apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php` to assert top family and file delta output, new or dropped hotspot detection, and explicit unavailable-hotspot disclosure
- [X] T019 [P] [US3] Update `apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php`, `apps/platform/tests/Feature/Guards/FastFeedbackLaneContractTest.php`, `apps/platform/tests/Feature/Guards/ConfidenceLaneContractTest.php`, `apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php`, `apps/platform/tests/Feature/Guards/BrowserLaneIsolationTest.php`, and `apps/platform/tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php` to assert support-lane hotspot evidence and hotspot visibility for all primary lanes plus the chosen `junit` or `profiling` support example

### Implementation for User Story 3

- [X] T020 [US3] Extend `apps/platform/tests/Support/TestLaneReport.php` with hotspot delta computation from `classificationTotals`, `familyTotals`, `hotspotFiles`, and `slowestEntries`, capping readable output to the policy limits defined in `apps/platform/tests/Support/TestLaneManifest.php`
- [X] T021 [US3] Update `apps/platform/tests/Support/TestLaneManifest.php`, `.gitea/workflows/test-heavy-governance.yml`, and `.gitea/workflows/test-browser.yml` so heavy and browser bundles retain hotspot-supporting history context and surface missing hotspot evidence explicitly
- [X] T022 [US3] Update `README.md` and `specs/211-runtime-trend-recalibration/quickstart.md` with hotspot investigation guidance, `profiling` and `junit` support-lane usage, and examples of persistent versus newly dominant hotspots
- [X] T023 [US3] Run representative hotspot validation with `./scripts/platform-test-report fast-feedback`, `./scripts/platform-test-report confidence`, `./scripts/platform-test-lane heavy-governance`, `./scripts/platform-test-report heavy-governance`, `./scripts/platform-test-lane browser`, `./scripts/platform-test-report browser`, and one support-lane report path from `./scripts/platform-test-report profiling` or `./scripts/platform-test-report junit`, then record persistent, worsening, newly dominant, and unavailable hotspot evidence for each primary lane in `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/quickstart.md`
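T020's hotspot delta computation can be illustrated over two snapshots of per-family totals. The sketch assumes a hypothetical `<family> <seconds>` text format for `familyTotals` and labels each entry `new`, `dropped`, `worsening`, or `persistent`. It ranks entries by absolute change so that dropped hotspots are not cut off by the output cap. The `limit` argument stands in for the policy limits in `TestLaneManifest.php`.

```bash
# Sketch only: input format, status labels, and ranking rule are hypothetical.
# Usage: hotspot_deltas <previous-file> <current-file> [limit]
hotspot_deltas() {
  local previous=$1 current=$2 limit=${3:-5}
  awk '
    # FILENAME check (not NR == FNR) stays correct when the previous file is empty.
    FILENAME == ARGV[1] { prev[$1] = $2 + 0; next }
    { cur[$1] = $2 + 0 }
    END {
      for (f in cur) {
        if (!(f in prev)) { d = cur[f]; status = "new" }
        else { d = cur[f] - prev[f]; status = (d > 0 ? "worsening" : "persistent") }
        # Leading column is |delta|, used only for ranking and stripped below.
        printf "%.2f %s %.2f %s\n", (d < 0 ? -d : d), f, d, status
      }
      for (f in prev)
        if (!(f in cur)) printf "%.2f %s %.2f dropped\n", prev[f], f, -prev[f]
    }
  ' "$previous" "$current" \
    | LC_ALL=C sort -k1,1nr \
    | head -n "$limit" \
    | cut -d' ' -f2-
}
```

With a previous snapshot of `ui 10`, `db 20`, `legacy 5` and a current snapshot of `ui 18`, `db 19`, `api 7`, a limit of 3 yields `ui` as worsening, `api` as new, and `legacy` as dropped.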

**Checkpoint**: At this point, hotspot trend visibility should be independently functional without depending on recalibration rollout evidence.

---

## Phase 6: Polish & Cross-Cutting Concerns

**Purpose**: Validate the full trend-governance slice, record evidence, and finish formatting.

- [X] T024 Run focused Pest coverage for `apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php`, `apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php`, `apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneHistoryHydrationContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneTrendContractSchemaTest.php`, `apps/platform/tests/Feature/Guards/TestLaneTrendLogicalContractTest.php`, `apps/platform/tests/Feature/Guards/TestLaneManifestTest.php`, `apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php`, `apps/platform/tests/Feature/Guards/FastFeedbackLaneContractTest.php`, `apps/platform/tests/Feature/Guards/ConfidenceLaneContractTest.php`, `apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php`, `apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php`, `apps/platform/tests/Feature/Guards/BrowserLaneIsolationTest.php`, and `apps/platform/tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php` with `cd apps/platform && ./vendor/bin/sail artisan test --compact ...`
- [X] T025 [P] Execute the representative local and Gitea evidence set across `.gitea/workflows/test-pr-fast-feedback.yml`, `.gitea/workflows/test-main-confidence.yml`, `.gitea/workflows/test-heavy-governance.yml`, and `.gitea/workflows/test-browser.yml`, capture at least three sequential comparable samples for each primary lane, include one support-lane example from `junit` or `profiling`, time-box a reviewer dry run to confirm the summary remains decidable within two minutes, and record lane, health class, hotspot availability, recalibration outcome, and any material runtime drift follow-up in `specs/211-runtime-trend-recalibration/spec.md` and `specs/211-runtime-trend-recalibration/quickstart.md`
- [X] T026 Run `cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent` for changes in `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneBudget.php`, `apps/platform/tests/Support/TestLaneReport.php`, and the new or updated guard tests under `apps/platform/tests/Feature/Guards/`

---

## Dependencies & Execution Order

### Phase Dependencies

- **Setup (Phase 1)**: Has no dependencies and can start immediately.
- **Foundational (Phase 2)**: Depends on Phase 1 and blocks all user story work.
- **User Story 1 (Phase 3)**: Depends on Phase 2 only and is the MVP slice.
- **User Story 2 (Phase 4)**: Depends on Phase 2 and benefits from the trend-history infrastructure completed for User Story 1.
- **User Story 3 (Phase 5)**: Depends on Phase 2 and should follow User Story 1 because hotspot deltas reuse the same history and assessment outputs.
- **Polish (Phase 6)**: Depends on all desired user stories being complete.

### User Story Dependencies

- **User Story 1 (P1)**: Can begin immediately after Foundational and delivers the first usable runtime-trend surface.
- **User Story 2 (P1)**: Requires the same history contract as User Story 1 but remains independently valuable once that contract exists.
- **User Story 3 (P2)**: Reuses the bounded history from User Story 1 and the policy limits from Foundational, but does not need User Story 2 to be useful.

### Within Each User Story

- Story-specific guard tests should be written first and should fail before implementation begins.
- Manifest and wrapper contract changes should be in place before finalizing report output, schema validation, and comparable-bundle hydration steps.
- README and quickstart guidance should land after the corresponding runtime behavior exists.
- Lane validation and evidence capture should complete before closing a story.

### Parallel Opportunities

- T003, T004, and T005 can proceed in parallel once T002 fixes the shared manifest shape.
- In User Story 1, T006 and T007 can run in parallel because they cover separate guard surfaces.
- In User Story 2, T012 and T013 can run in parallel because policy rules and evidence-record assertions are independent tests.
- In User Story 3, T018 and T019 can run in parallel because they touch separate guard suites.
- T025 can run in parallel with final formatting once all implementation and guard work is stable.

---

## Parallel Example: User Story 1

```bash
# After T002-T005 establish the shared history contract, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php and update TestLaneArtifactsContractTest.php"
Task: "Add apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php"
```

---

## Parallel Example: User Story 2

```bash
# After User Story 1 exposes comparable history, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php"
Task: "Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php"
```

---

## Parallel Example: User Story 3

```bash
# After the shared hotspot-ready report shape exists, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php"
Task: "Update apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php and apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php"
```

---

## Implementation Strategy

### MVP First (User Story 1 Only)

1. Complete Phase 1: Setup.
2. Complete Phase 2: Foundational.
3. Complete Phase 3: User Story 1.
4. Validate `fast-feedback` and `confidence` trend summaries independently before continuing.

### Incremental Delivery

1. Deliver bounded history and lane health summaries first.
2. Add explicit recalibration policy and evidence records next.
3. Add hotspot delta visibility for heavy, browser, and support-lane-assisted investigations last.
4. Finish with focused guard validation, real evidence capture, and formatting.

### Parallel Team Strategy

1. One contributor can extend `apps/platform/tests/Support/TestLaneManifest.php` and wrapper scripts while another prepares the new guard suites.
2. After Foundational completes, User Story 1 test work and workflow hydration changes can be split across contributors.
3. User Story 2 recalibration logic and User Story 3 hotspot logic can proceed separately once the history contract is stable.

---

## Notes

- `[P]` tasks operate on different files or independent guard suites and can run in parallel once dependencies are satisfied.
- `[US1]`, `[US2]`, and `[US3]` map tasks directly to the user stories in `spec.md`.
- This feature changes runtime-governance behavior, so the narrowest relevant lane reruns and evidence capture remain part of the definition of done.
- Live Gitea validation remains required because local wrapper tests alone cannot prove cross-run artifact hydration and uploaded bundle behavior.