feat: implement runtime trend recalibration reporting (#244)

## Summary
- implement Spec 211 runtime trend reporting with bounded lane history, drift classification, hotspot trend output, and recalibration evidence handling
- extend the repo-truth governance seams and workflow wrappers for comparable-bundle hydration, trend artifact publication, and contract-backed reporting
- add the Spec 211 planning artifacts, data model, quickstart, tasks, and repository contract documents

## Validation
- parsed `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend-history.schema.json`
- parsed `specs/211-runtime-trend-recalibration/contracts/test-runtime-trend.logical.openapi.yaml`
- re-ran cross-artifact consistency analysis for the Spec 211 artifact set until no material findings remained
- no application test suite was re-run as part of this final commit/push/PR step

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #244

2026-04-18 07:36:05 +00:00

Tasks: Test Runtime Trend Reporting & Baseline Recalibration

Input: Design documents from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/211-runtime-trend-recalibration/

Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

Tests: Required. This feature changes repository test-governance runtime behavior, so each user story includes Pest guard coverage plus focused lane and wrapper validation through Sail and the repo-root test-governance scripts.

Organization: Tasks are grouped by user story so each story can be implemented and validated independently where possible.

Phase 1: Setup (Shared Context)

Purpose: Freeze the real repo-truth seams and artifact boundaries before implementation begins.

T001 [P] Audit apps/platform/tests/Support/TestLaneManifest.php, apps/platform/tests/Support/TestLaneBudget.php, apps/platform/tests/Support/TestLaneReport.php, scripts/platform-test-report, scripts/platform-test-artifacts, and .gitea/workflows/*.yml as the only valid trend-history and runtime-governance seams before implementation

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Extend the shared manifest, artifact, and wrapper seams that every story depends on.

Critical: No user story work should begin until this phase is complete.

T002 Extend apps/platform/tests/Support/TestLaneManifest.php with lane trend policy metadata, retention and comparison-window defaults, comparison-fingerprint inputs, hotspot limits, and trend-history.json artifact contracts aligned to specs/211-runtime-trend-recalibration/data-model.md
T003 [P] Extend the artifact path, read/write, and staging helpers in apps/platform/tests/Support/TestLaneReport.php so apps/platform/storage/logs/test-lanes/<lane>-latest.trend-history.json can be published alongside the existing summary, budget, report, and JUnit artifacts
T004 [P] Update scripts/platform-test-report and scripts/platform-test-artifacts to discover, select, and hydrate the latest comparable prior bundle or explicit local history input, then export the canonical trend-history.json artifact through the existing repo-root wrappers
T005 [P] Add or update shared guard coverage in apps/platform/tests/Feature/Guards/TestLaneManifestTest.php, apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php, apps/platform/tests/Feature/Guards/TestLaneHistoryHydrationContractTest.php, apps/platform/tests/Feature/Guards/TestLaneTrendContractSchemaTest.php, and apps/platform/tests/Feature/Guards/TestLaneTrendLogicalContractTest.php to lock lane trend policy metadata, latest-comparable-bundle hydration semantics, JSON schema sync against specs/211-runtime-trend-recalibration/contracts/test-runtime-trend-history.schema.json, logical contract sync against specs/211-runtime-trend-recalibration/contracts/test-runtime-trend.logical.openapi.yaml, and staged bundle completeness for trend-history.json
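
The "latest comparable prior bundle" selection in T004 and T005 can be illustrated with a minimal Python sketch. It is not the repository's PHP implementation: the field names (`inputs`, `finishedAt`, `runId`) and the choice of SHA-256 over canonical JSON are assumptions made only for this example.

```python
import hashlib
import json
from typing import Optional


def comparison_fingerprint(inputs: dict) -> str:
    """Hash the inputs that must match for two runs to be comparable.

    Hypothetical helper: the real fingerprint inputs live in the lane manifest.
    """
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def latest_comparable_bundle(bundles: list, current_inputs: dict) -> Optional[dict]:
    """Return the newest prior bundle with a matching fingerprint, or None."""
    wanted = comparison_fingerprint(current_inputs)
    matching = [b for b in bundles if comparison_fingerprint(b["inputs"]) == wanted]
    # ISO-8601 UTC timestamps in one format sort correctly as plain strings.
    return max(matching, key=lambda b: b["finishedAt"], default=None)


current_inputs = {"lane": "confidence", "workflow": "main", "suite": "v1"}
prior_bundles = [
    {"runId": "a", "finishedAt": "2026-04-15T07:00:00Z",
     "inputs": {"lane": "confidence", "workflow": "main", "suite": "v1"}},
    {"runId": "b", "finishedAt": "2026-04-16T07:00:00Z",
     "inputs": {"suite": "v1", "workflow": "main", "lane": "confidence"}},
    {"runId": "c", "finishedAt": "2026-04-17T07:00:00Z",
     "inputs": {"lane": "confidence", "workflow": "main", "suite": "v2"}},
]
selected = latest_comparable_bundle(prior_bundles, current_inputs)
```

Here run `c` is newer but not comparable because one fingerprint input differs, so run `b` is selected. When nothing matches, the function returns `None`, which lets a caller report that no comparable history exists instead of comparing against an unrelated run.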

Checkpoint: The shared trend-governance seams are ready for story-specific summary, recalibration, and hotspot work.

Phase 3: User Story 1 - See Lane Drift Before It Becomes A Repeated Gate (Priority: P1) 🎯 MVP

Goal: Publish lane-first trend summaries that show current, previous, baseline, budget, and health status before a lane becomes a recurring blocker.

Independent Test: Review representative three-sample run sequences for fast-feedback and confidence, confirm the summary shows current, previous, baseline, and budget values, and verify that healthy, budget-near, trending-worse, and unstable (noisy) cases are distinguishable without manual arithmetic.

Tests for User Story 1

T006 [P] [US1] Add apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php and update apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php to assert bounded history windows and current, previous, baseline, and budget fields for fast-feedback and confidence
T007 [P] [US1] Add apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php to cover healthy, budget-near, trending-worse, regressed, and unstable outcomes, including one-off noisy spike handling
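
To make the five health classes in T007 concrete, here is a small Python sketch of one possible classification rule over a three-sample window. The thresholds (`near_ratio`, `drift_ratio`, `spike_ratio`) and the order of the checks are assumptions for illustration only; the actual rules belong to the lane policy in TestLaneManifest.php and TestLaneBudget.php.

```python
from statistics import median


def classify_lane(samples, baseline, budget,
                  near_ratio=0.9, drift_ratio=1.1, spike_ratio=1.3):
    """Classify lane health from wall-clock seconds, oldest first.

    All thresholds are hypothetical defaults chosen for this sketch.
    """
    if len(samples) < 3:
        raise ValueError("need at least three comparable samples")
    window = samples[-3:]
    oldest, previous, current = window

    # Two consecutive over-budget runs are treated as a real regression.
    if previous > budget and current > budget:
        return "regressed"
    # A single outlier inside the window is noise, not a trend.
    if max(window) > spike_ratio * median(window):
        return "unstable"
    # Steady growth that has moved clearly past the baseline.
    if oldest < previous < current and current > drift_ratio * baseline:
        return "trending-worse"
    if current >= near_ratio * budget:
        return "budget-near"
    return "healthy"


examples = {
    "healthy": classify_lane([100, 102, 101], baseline=100, budget=150),
    "unstable": classify_lane([100, 180, 101], baseline=100, budget=150),
    "trending-worse": classify_lane([105, 112, 120], baseline=100, budget=150),
    "budget-near": classify_lane([138, 136, 140], baseline=100, budget=150),
    "regressed": classify_lane([140, 155, 160], baseline=100, budget=150),
}
```

Checking the spike before the trend means a one-off noisy run is reported as `unstable` rather than as a worsening trend, which matches the intent of the one-off spike handling named in T007.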

Implementation for User Story 1

T008 [US1] Extend apps/platform/tests/Support/TestLaneReport.php with LaneTrendRecord generation, comparison-window evaluation, comparison fingerprints, and trend-aware summary.md plus report.json output for fast-feedback and confidence
T009 [US1] Update apps/platform/tests/Support/TestLaneManifest.php, .gitea/workflows/test-pr-fast-feedback.yml, and .gitea/workflows/test-main-confidence.yml so pull-request and mainline bundles discover and hydrate the latest comparable history bundle, then republish the refreshed trend-history.json artifact without widening lane execution
T010 [US1] Update README.md and specs/211-runtime-trend-recalibration/quickstart.md with reviewer guidance and local validation steps for reading lane health summaries across fast-feedback and confidence
T011 [US1] Run the narrowest proving path with ./scripts/platform-test-lane fast-feedback, ./scripts/platform-test-report fast-feedback, ./scripts/platform-test-lane confidence, and ./scripts/platform-test-report confidence, then record representative three-sample healthy, budget-near, and unstable evidence in specs/211-runtime-trend-recalibration/spec.md and specs/211-runtime-trend-recalibration/quickstart.md
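
The bounded history and the current, previous, baseline, and budget fields described in T006 and T008 could look roughly like the Python sketch below. The record keys (`wallClockSeconds`, `deltaVsPrevious`, `budgetUsedPercent`) are hypothetical and are not taken from the Spec 211 schema.

```python
def append_run(history, run, max_entries):
    """Return a new history list capped at max_entries, newest last."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    return (history + [run])[-max_entries:]


def trend_record(history, baseline_seconds, budget_seconds):
    """Build a summary record from a bounded history (hypothetical shape)."""
    if not history:
        raise ValueError("history is empty")
    current = history[-1]["wallClockSeconds"]
    previous = history[-2]["wallClockSeconds"] if len(history) > 1 else None
    return {
        "current": current,
        "previous": previous,
        "baseline": baseline_seconds,
        "budget": budget_seconds,
        "deltaVsPrevious": None if previous is None else round(current - previous, 1),
        "budgetUsedPercent": round(100 * current / budget_seconds, 1),
    }


history = []
for run_id, seconds in [("r1", 110.0), ("r2", 114.0), ("r3", 118.0), ("r4", 120.0)]:
    run = {"runId": run_id, "wallClockSeconds": seconds}
    history = append_run(history, run, max_entries=3)

record = trend_record(history, baseline_seconds=100.0, budget_seconds=150.0)
```

With a retention limit of three, the oldest run `r1` falls out of the window, and the record exposes the comparison values directly so a reviewer does not need to do manual arithmetic.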

Checkpoint: At this point, lane drift visibility for the main contributor lanes should be independently functional and reviewable.

Phase 4: User Story 2 - Decide Recalibration With Evidence Instead Of Habit (Priority: P1)

Goal: Separate baseline and budget recalibration from ordinary health status and make every recalibration decision evidence-backed.

Independent Test: Review one justified recalibration case and one rejected recalibration case, and confirm the report plus policy make the outcome understandable without private notes.

Tests for User Story 2

T012 [P] [US2] Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php to assert baseline-vs-budget separation, evidence-window requirements, and approved versus rejected rationale handling
T013 [P] [US2] Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php to assert candidate, approved, and rejected recalibration records together with explicit summary disclosure for recalibration outcomes

Implementation for User Story 2

T014 [US2] Extend apps/platform/tests/Support/TestLaneBudget.php with recalibration recommendation helpers, lane-specific tolerance reuse, and explicit baseline plus budget review rules aligned to specs/211-runtime-trend-recalibration/data-model.md
T015 [US2] Extend apps/platform/tests/Support/TestLaneManifest.php and apps/platform/tests/Support/TestLaneReport.php to emit structured recalibration policy metadata, decision records, evidence run references, and recordedIn guidance pointing to specs/211-runtime-trend-recalibration/spec.md or the implementation PR without mutating manifest truth automatically
T016 [US2] Update README.md and specs/211-runtime-trend-recalibration/quickstart.md with the approved and rejected recalibration policy, required evidence windows, and reviewer follow-up rules
T017 [US2] Run recalibration validation with ./scripts/platform-test-report fast-feedback and ./scripts/platform-test-report confidence against seeded prior histories, then record one approved and one rejected recalibration example in specs/211-runtime-trend-recalibration/spec.md and specs/211-runtime-trend-recalibration/quickstart.md
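
As a hedged illustration of the evidence-backed decision records in T013 to T015, the Python sketch below approves or rejects a proposed baseline or budget change based on an evidence window. The minimum run count, the spread threshold, and every field name are assumptions; the sketch only returns a decision record and never changes any manifest value, mirroring the "without mutating manifest truth automatically" rule.

```python
from dataclasses import dataclass
from statistics import median
from typing import Tuple


@dataclass(frozen=True)
class RecalibrationDecision:
    """Hypothetical decision record; not the Spec 211 schema."""
    lane: str
    target: str            # "baseline" or "budget", reviewed separately
    status: str            # "approved" or "rejected"
    proposed_seconds: float
    evidence_run_ids: Tuple[str, ...]
    rationale: str


def review_recalibration(lane, target, proposed_seconds, evidence,
                         min_evidence_runs=3, max_spread=0.10):
    """Decide on a recalibration candidate from comparable evidence runs."""
    if target not in ("baseline", "budget"):
        raise ValueError("target must be 'baseline' or 'budget'")
    run_ids = tuple(run["runId"] for run in evidence)

    def decide(status, rationale):
        return RecalibrationDecision(lane, target, status, proposed_seconds,
                                     run_ids, rationale)

    if len(evidence) < min_evidence_runs:
        return decide("rejected", f"needs at least {min_evidence_runs} comparable runs")
    seconds = [run["wallClockSeconds"] for run in evidence]
    spread = (max(seconds) - min(seconds)) / median(seconds)
    if spread > max_spread:
        return decide("rejected", "evidence window is too noisy")
    return decide("approved", "stable evidence window supports the change")


stable = [
    {"runId": "r1", "wallClockSeconds": 130.0},
    {"runId": "r2", "wallClockSeconds": 132.0},
    {"runId": "r3", "wallClockSeconds": 131.0},
]
approved = review_recalibration("confidence", "baseline", 131.0, stable)
rejected = review_recalibration("confidence", "baseline", 131.0, stable[:2])
```

Both outcomes carry the evidence run references and a rationale, so an approved and a rejected case can be recorded side by side and understood without private notes.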

Checkpoint: At this point, recalibration guidance should be independently testable and clearly separated from ordinary lane health.

Phase 5: User Story 3 - Track Dominant Hotspots Over Time (Priority: P2)

Goal: Surface persistent, worsening, and newly dominant hotspots so follow-up optimization work targets the real cost drivers.

Independent Test: Review representative hotspot summaries for each primary lane across multiple runs and confirm that persistent, worsening, newly dominant, and unavailable hotspot states are visible.

Tests for User Story 3

T018 [P] [US3] Add apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php to assert top family and file delta output, new or dropped hotspot detection, and explicit unavailable-hotspot disclosure
T019 [P] [US3] Update apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php, apps/platform/tests/Feature/Guards/FastFeedbackLaneContractTest.php, apps/platform/tests/Feature/Guards/ConfidenceLaneContractTest.php, apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php, apps/platform/tests/Feature/Guards/BrowserLaneIsolationTest.php, and apps/platform/tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php to assert support-lane hotspot evidence and hotspot visibility for all primary lanes plus the chosen junit or profiling support example

Implementation for User Story 3

T020 [US3] Extend apps/platform/tests/Support/TestLaneReport.php with hotspot delta computation from classificationTotals, familyTotals, hotspotFiles, and slowestEntries, capping readable output to the policy limits defined in apps/platform/tests/Support/TestLaneManifest.php
T021 [US3] Update apps/platform/tests/Support/TestLaneManifest.php, .gitea/workflows/test-heavy-governance.yml, and .gitea/workflows/test-browser.yml so heavy and browser bundles retain hotspot-supporting history context and surface missing hotspot evidence explicitly
T022 [US3] Update README.md and specs/211-runtime-trend-recalibration/quickstart.md with hotspot investigation guidance, profiling and junit support-lane usage, and examples of persistent versus newly dominant hotspots
T023 [US3] Run representative hotspot validation with ./scripts/platform-test-report fast-feedback, ./scripts/platform-test-report confidence, ./scripts/platform-test-lane heavy-governance, ./scripts/platform-test-report heavy-governance, ./scripts/platform-test-lane browser, ./scripts/platform-test-report browser, and one support-lane report path from ./scripts/platform-test-report profiling or ./scripts/platform-test-report junit, then record persistent, worsening, newly dominant, and unavailable hotspot evidence for each primary lane in specs/211-runtime-trend-recalibration/spec.md and specs/211-runtime-trend-recalibration/quickstart.md
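
The hotspot states named in this phase (persistent, worsening, newly dominant, dropped, and unavailable) can be sketched in Python as a comparison of two name-to-seconds maps. The input shape, the `limit` cap, and the 10% worsening threshold are assumptions for illustration and are not derived from the actual familyTotals or hotspotFiles structures.

```python
def hotspot_trend(previous, current, limit=3, worsening_ratio=1.1):
    """Compare the top hotspots of two runs (hypothetical shapes).

    previous/current map a family or file name to seconds, or are None
    when the run produced no hotspot evidence.
    """
    if previous is None or current is None:
        # Missing evidence is disclosed explicitly instead of being hidden.
        return {"status": "unavailable", "entries": [], "dropped": []}

    top_previous = set(sorted(previous, key=previous.get, reverse=True)[:limit])
    top_current = sorted(current, key=current.get, reverse=True)[:limit]

    entries = []
    for name in top_current:
        if name not in top_previous:
            state = "newly-dominant"
        elif current[name] > worsening_ratio * previous[name]:
            state = "worsening"
        else:
            state = "persistent"
        entries.append({"name": name, "seconds": current[name], "state": state})

    dropped = sorted(top_previous - set(top_current))
    return {"status": "available", "entries": entries, "dropped": dropped}


previous_run = {"Baselines": 40.0, "Findings": 30.0, "Policies": 10.0, "Misc": 5.0}
current_run = {"Baselines": 41.0, "Findings": 45.0, "Exports": 20.0, "Policies": 9.0}
trend = hotspot_trend(previous_run, current_run)
unavailable = hotspot_trend(previous_run, None)
```

Capping both sides at `limit` keeps the readable output short, and returning an explicit `unavailable` status covers lanes where hotspot evidence is missing.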

Checkpoint: At this point, hotspot trend visibility should be independently functional without depending on recalibration rollout evidence.

Phase 6: Polish & Cross-Cutting Concerns

Purpose: Validate the full trend-governance slice, record evidence, and finish formatting.

T024 Run focused Pest coverage for apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php, apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php, apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php, apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php, apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php, apps/platform/tests/Feature/Guards/TestLaneHistoryHydrationContractTest.php, apps/platform/tests/Feature/Guards/TestLaneTrendContractSchemaTest.php, apps/platform/tests/Feature/Guards/TestLaneTrendLogicalContractTest.php, apps/platform/tests/Feature/Guards/TestLaneManifestTest.php, apps/platform/tests/Feature/Guards/TestLaneArtifactsContractTest.php, apps/platform/tests/Feature/Guards/FastFeedbackLaneContractTest.php, apps/platform/tests/Feature/Guards/ConfidenceLaneContractTest.php, apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php, apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php, apps/platform/tests/Feature/Guards/BrowserLaneIsolationTest.php, and apps/platform/tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php with cd apps/platform && ./vendor/bin/sail artisan test --compact ...
T025 [P] Execute the representative local and Gitea evidence set across .gitea/workflows/test-pr-fast-feedback.yml, .gitea/workflows/test-main-confidence.yml, .gitea/workflows/test-heavy-governance.yml, and .gitea/workflows/test-browser.yml, capture at least three sequential comparable samples for each primary lane, include one support-lane example from junit or profiling, time-box a reviewer dry run to confirm the summary remains decidable within two minutes, and record lane, health class, hotspot availability, recalibration outcome, and any material runtime drift follow-up in specs/211-runtime-trend-recalibration/spec.md and specs/211-runtime-trend-recalibration/quickstart.md
T026 Run cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent for changes in apps/platform/tests/Support/TestLaneManifest.php, apps/platform/tests/Support/TestLaneBudget.php, apps/platform/tests/Support/TestLaneReport.php, and the new or updated guard tests under apps/platform/tests/Feature/Guards/

Dependencies & Execution Order

Phase Dependencies

Setup (Phase 1): No dependencies and can start immediately.
Foundational (Phase 2): Depends on Phase 1 and blocks all user story work.
User Story 1 (Phase 3): Depends on Phase 2 only and is the MVP slice.
User Story 2 (Phase 4): Depends on Phase 2 and benefits from the trend-history infrastructure completed for User Story 1.
User Story 3 (Phase 5): Depends on Phase 2 and should follow User Story 1 because hotspot deltas reuse the same history and assessment outputs.
Polish (Phase 6): Depends on all desired user stories being complete.
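
The phase ordering above can also be read as a small dependency graph. The Python sketch below encodes it with the standard-library `graphlib` module (Python 3.9+). Treating "User Story 3 should follow User Story 1" as a hard edge, and requiring all three stories before Polish, are simplifying assumptions of this example.

```python
from graphlib import TopologicalSorter

# Each key depends on every phase in its value set.
PHASE_DEPENDENCIES = {
    "setup": set(),
    "foundational": {"setup"},
    "us1": {"foundational"},
    "us2": {"foundational"},
    "us3": {"foundational", "us1"},  # assumption: modeled as a hard dependency
    "polish": {"us1", "us2", "us3"},
}


def execution_order(dependencies):
    """Return one valid order in which the phases can be completed."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(PHASE_DEPENDENCIES)
```

Any order returned starts with `setup` and `foundational` and ends with `polish`; the relative position of `us2` is free, which reflects that User Story 2 does not block User Story 3.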

User Story Dependencies

User Story 1 (P1): Can begin immediately after Foundational and delivers the first usable runtime-trend surface.
User Story 2 (P1): Requires the same history contract as User Story 1 but remains independently valuable once that contract exists.
User Story 3 (P2): Reuses the bounded history from User Story 1 and the policy limits from Foundational, but does not need User Story 2 to be useful.

Within Each User Story

Story-specific guard tests should be written first and should fail before implementation begins.
Manifest and wrapper contract changes should be in place before finalizing report output, schema validation, and comparable-bundle hydration steps.
README and quickstart guidance should land after the corresponding runtime behavior exists.
Lane validation and evidence capture should complete before closing a story.

Parallel Opportunities

T003, T004, and T005 can proceed in parallel once T002 settles the shared manifest shape.
In User Story 1, T006 and T007 can run in parallel because they cover separate guard surfaces.
In User Story 2, T012 and T013 can run in parallel because policy rules and evidence-record assertions are independent tests.
In User Story 3, T018 and T019 can run in parallel because they touch separate guard suites.
T025 can run in parallel with final formatting once all implementation and guard work is stable.

Parallel Example: User Story 1

# After T002-T005 establish the shared history contract, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneTrendSummaryContractTest.php and update TestLaneArtifactsContractTest.php"
Task: "Add apps/platform/tests/Feature/Guards/TestLaneTrendClassificationTest.php"

Parallel Example: User Story 2

# After User Story 1 exposes comparable history, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationPolicyTest.php"
Task: "Add apps/platform/tests/Feature/Guards/TestLaneRecalibrationEvidenceContractTest.php"

Parallel Example: User Story 3

# After the shared hotspot-ready report shape exists, these can proceed in parallel:
Task: "Add apps/platform/tests/Feature/Guards/TestLaneHotspotTrendContractTest.php"
Task: "Update apps/platform/tests/Feature/Guards/ProfileLaneContractTest.php and apps/platform/tests/Feature/Guards/HeavyGovernanceLaneContractTest.php"

Implementation Strategy

MVP First (User Story 1 Only)

Complete Phase 1: Setup.
Complete Phase 2: Foundational.
Complete Phase 3: User Story 1.
Validate fast-feedback and confidence trend summaries independently before continuing.

Incremental Delivery

Deliver bounded history and lane health summaries first.
Add explicit recalibration policy and evidence records next.
Add hotspot delta visibility for heavy, browser, and support-lane-assisted investigations last.
Finish with focused guard validation, real evidence capture, and formatting.

Parallel Team Strategy

One contributor can extend apps/platform/tests/Support/TestLaneManifest.php and wrapper scripts while another prepares the new guard suites.
After Foundational completes, User Story 1 test work and workflow hydration changes can be split across contributors.
User Story 2 recalibration logic and User Story 3 hotspot logic can proceed separately once the history contract is stable.

Notes

[P] tasks operate on different files or independent guard suites and can run in parallel once dependencies are satisfied.
[US1], [US2], and [US3] map tasks directly to the user stories in spec.md.
This feature changes runtime-governance behavior, so the narrowest relevant lane reruns and evidence capture remain part of the definition of done.
Live Gitea validation remains required because local wrapper tests alone cannot prove cross-run artifact hydration and uploaded bundle behavior.