TenantAtlas/specs/295-full-suite-ci-baseline/plan.md
ahmido f03555eae1 Spec 295: full suite CI lane baseline (#350)
## Summary
- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract
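Here is a minimal sketch of the argv contract described above. The function `stage_artifacts` and its three inputs are hypothetical. `sh -c` stands in for the embedded PHP script, which the real wrapper runs through Sail. The inputs travel as positional arguments instead of being spliced into the script text or left in host-side environment variables, so they arrive unchanged even when they contain spaces or quotes.

```shell
# Hypothetical sketch of the argv contract: the embedded script reads its
# inputs only from positional parameters. `sh -c` stands in for the embedded
# PHP script that the real wrapper runs through Sail.
stage_artifacts() {
    lane="$1"
    source_dir="$2"
    target_dir="$3"

    # The script body is single-quoted, so the outer shell expands nothing in
    # it; "stage-artifacts" becomes $0 and the inputs become $1..$3 verbatim.
    sh -c '
        printf "lane=%s\n" "$1"
        printf "src=%s\n" "$2"
        printf "dst=%s\n" "$3"
    ' stage-artifacts "$lane" "$source_dir" "$target_dir"
}
```

For example, `stage_artifacts confidence "storage/lane reports" artifacts/confidence` prints `lane=confidence`, `src=storage/lane reports`, and `dst=artifacts/confidence` on separate lines.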

## Scope guards
- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes
- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350
2026-05-11 11:14:56 +00:00

# Implementation Plan: Full Suite Failure Classification & CI Lane Baseline
**Branch**: `295-full-suite-ci-baseline` | **Date**: 2026-05-11 | **Spec**: `specs/295-full-suite-ci-baseline/spec.md`
**Input**: Feature specification from `specs/295-full-suite-ci-baseline/spec.md`
## Summary
Spec `295` determines whether the full TenantPilot platform suite is once again a reliable CI signal after Specs `293` and `294`. The implementation must run the raw full suite when its output is classifiable and fall back to the existing explicit lane wrappers when it is not. It must also classify every red group in `failure-classification.md`, validate the report, artifact, and budget failure classes, and fix only small CI/lane contract defects. Product/runtime failures are assigned follow-up ownership instead of being repaired here.
## Technical Context
**Language/Version**: PHP 8.4.15, Laravel 12.52.0
**Primary Dependencies**: Pest 4.3.1, PHPUnit 12.5.4, Laravel Sail 1.52.0, Filament 5.2.1, Livewire 4.1.4
**Storage**: no application storage changes; spec-local `failure-classification.md` only
**Testing**: Pest via Sail-first commands and existing lane wrappers
**Validation Lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support, profiling only if classification needs it
**Target Platform**: local Sail and Gitea-compatible CI wrappers
**Project Type**: Laravel monolith under `apps/platform` with repo-root CI helper scripts
**Performance Goals**: classify the existing suite signal without creating a new permanent lane or widening lane cost
**Constraints**: no broad suite repair, no legacy `/admin/t/...`, no TenantPanelProvider restoration, no runtime persistence, no new test family by default
**Scale/Scope**: complete platform test suite signal plus existing CI lane/report/artifact contracts
## UI / Surface Guardrail Plan
- **Guardrail scope**: no operator-facing surface change
- **Native vs custom classification summary**: N/A
- **Shared-family relevance**: CI/test-governance workflow only
- **State layers in scope**: none
- **Audience modes in scope**: N/A
- **Decision/diagnostic/raw hierarchy plan**: N/A for product UI; classification output keeps summary first and raw failure detail in row notes
- **Raw/support gating plan**: N/A
- **One-primary-action / duplicate-truth control**: one final readiness decision in `failure-classification.md`
- **Handling modes by drift class or surface**: CI/lane contract drift may be fixed; product/runtime drift becomes `follow-up-spec-required` or `product-runtime-or-test-regression`
- **Repository-signal treatment**: review-mandatory for every failing group; hard-stop if a group remains unclassified
- **Special surface test profiles**: `browser-smoke`, `surface-guard`, `discovery-heavy`, `global-context-shell`
- **Required tests or manual smoke**: existing Pest lane wrappers and the raw full-suite command; no in-app browser smoke test unless the implementation changes visible UI, which is out of scope
- **Exception path and spread control**: any repair outside CI/lane contract correction triggers follow-up-spec classification
- **Active feature PR close-out entry**: `FullSuiteClassification`
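The hard-stop rule for unclassified groups could be enforced mechanically. This sketch assumes a hypothetical layout for `failure-classification.md`: a Markdown table whose first column is the failure group and whose second column is the category. The allowed values are the pinned categories from the Test Governance Check. `check_classification` is an illustrative name, not an existing repository helper.

```shell
# Pinned categories from the Test Governance Check.
PINNED_CATEGORIES="ci-signal-restored ci-wrapper-or-manifest-regression artifact-publication-regression budget-or-trend-baseline-drift product-runtime-or-test-regression browser-lane-regression flaky-or-environment follow-up-spec-required resolved-or-not-needed"

# check_classification FILE
# Hypothetical layout: Markdown table rows "| failure group | category | ... |".
# Prints each row whose category is empty or not pinned and returns 1 if any
# such row exists; header and separator rows are skipped.
check_classification() {
    awk -F'|' -v pinned="$PINNED_CATEGORIES" '
        BEGIN {
            bad = 0
            n = split(pinned, list, " ")
            for (i = 1; i <= n; i++) ok[list[i]] = 1
        }
        /^[ \t]*\|/ {
            group = $2
            category = $3
            gsub(/^[ \t]+|[ \t]+$/, "", group)
            gsub(/^[ \t]+|[ \t]+$/, "", category)
            gsub(/`/, "", category)
            # Skip the header row and the |---| separator row.
            if (tolower(category) == "category" || category ~ /^:?-+:?$/) next
            if (!(category in ok)) {
                print "unclassified: " group " (" category ")"
                bad = 1
            }
        }
        END { exit bad }
    ' "$1"
}
```

A non-zero return identifies the rows that still block the readiness decision. Each offending row is printed with its group name, which keeps the summary-first ordering intact.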
## Shared Pattern & System Fit
- **Cross-cutting feature marker**: yes
- **Systems touched**: `scripts/platform-test-lane`, `scripts/platform-test-report`, `scripts/platform-test-artifacts`, `apps/platform/composer.json`, `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneReport.php`, `apps/platform/tests/Support/TestLaneBudget.php`, CI guard tests under `apps/platform/tests/Feature/Guards/`
- **Shared abstractions reused**: `TestLaneManifest`, `TestLaneReport`, `TestLaneBudget`, existing wrapper scripts and composer scripts
- **New abstraction introduced? why?**: none
- **Why the existing abstraction was sufficient or insufficient**: existing lane and failure-class contracts are the current source of truth; this spec proves or minimally corrects them instead of adding another layer
- **Bounded deviation / spread control**: product/runtime failures must be classified and split rather than repaired here
## OperationRun UX Impact
- **Touches OperationRun start/completion/link UX?**: no
- **Central contract reused**: N/A
- **Delegated UX behaviors**: N/A
- **Surface-owned behavior kept local**: N/A
- **Queued DB-notification policy**: N/A
- **Terminal notification path**: N/A
- **Exception path**: none
## Provider Boundary & Portability Fit
- **Shared provider/platform boundary touched?**: no product provider boundary change
- **Provider-owned seams**: provider/verification test failures may be classified, but runtime repair is out of scope unless it is strictly CI/lane contract drift
- **Platform-core seams**: CI lane/report/artifact contract only
- **Neutral platform terms / contracts preserved**: `workspace`, `managed environment`, `provider connection`, `lane`, `failure group`, `CI signal`
- **Retained provider-specific semantics and why**: none added
- **Bounded extraction or follow-up path**: a follow-up spec for any provider/verification runtime debt that remains after Spec `294`
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
- Inventory-first: PASS. No inventory or snapshot runtime behavior changes.
- Read/write separation: PASS. No application write/change function is introduced.
- Graph contract path: PASS. No Microsoft Graph calls are introduced or changed.
- Deterministic capabilities: PASS. Capability derivation is not changed.
- RBAC-UX: PASS. Existing RBAC tests may fail and be classified, but this spec does not change authorization behavior; any such change belongs to a follow-up spec.
- Workspace isolation: PASS. Workspace/managed-environment isolation failures are product/runtime debt, not CI-wrapper debt.
- Tenant isolation: PASS. No tenant-plane route or compatibility behavior is restored.
- Run observability: PASS. No new `OperationRun`, queue, scheduled work, or terminal notification policy is introduced.
- Test governance (TEST-GOV-001): PASS. The spec explicitly names proving purpose, lane mix, fixture cost boundaries, heavy/browser visibility, budget/trend treatment, and split decisions.
- Proportionality (PROP-001): PASS. The only new structure is one spec-local classification artifact needed for current CI readiness.
- No premature abstraction (ABSTR-001): PASS. No new CI framework or lane abstraction is introduced.
- Persisted truth (PERSIST-001): PASS. No application persistence; spec artifact is not runtime truth.
- Behavioral state (STATE-001): PASS. The classification vocabulary controls implementation workflow only and does not become product state.
- Shared pattern first (XCUT-001): PASS. Existing `TestLaneManifest`, `TestLaneReport`, wrapper scripts, and guard tests remain the shared path.
- Provider boundary (PROV-001): PASS. No provider runtime or vocabulary boundary is changed.
- V1 explicitness / few layers (V1-EXP-001, LAYER-001): PASS. Use direct classification and existing helpers.
- Spec discipline / bloat check (SPEC-DISC-001, BLOAT-001): PASS with proportionality review in `spec.md`.
- Filament-native UI (UI-FIL-001): PASS. No operator-facing Filament UI change.
- Filament v5 / Livewire v4: PASS. Current app info confirms Filament 5.2.1 and Livewire 4.1.4; this spec does not change either version.
- Provider registration: PASS. No panel provider changes; Laravel provider registration remains in `apps/platform/bootstrap/providers.php`.
**Post-design re-check**: PASS while categories, seams, planned commands, and out-of-scope boundaries remain aligned across `spec.md`, `plan.md`, `research.md`, `data-model.md`, `quickstart.md`, `tasks.md`, `checklists/requirements.md`, and `failure-classification.md`.
## Test Governance Check
- **Pinned categories**: `ci-signal-restored`, `ci-wrapper-or-manifest-regression`, `artifact-publication-regression`, `budget-or-trend-baseline-drift`, `product-runtime-or-test-regression`, `browser-lane-regression`, `flaky-or-environment`, `follow-up-spec-required`, `resolved-or-not-needed`
- **Pinned seams**: `raw-full-suite`, `fast-feedback-lane`, `confidence-lane`, `heavy-governance-lane`, `browser-lane`, `profiling-or-junit-support`, `lane-reporting`, `artifact-publication`, `budget-trend-baseline`, `legacy-cutover-regression-guard`, `provider-verification-regression-guard`
- **Test purpose / classification by changed surface**: full-suite classification, CI lane contract verification, and optional CI/lane guard tests only
- **Affected validation lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support
- **Why this lane mix is the narrowest sufficient proof**: raw full suite answers the main readiness question; explicit lane split keeps classification possible when the raw run is too noisy; report/artifact commands validate CI interpretability
- **Narrowest proving command(s)**:
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser`
  - corresponding `./scripts/platform-test-report <lane>` commands for report/artifact classification
- **Fixture / helper / factory / seed / context cost risks**: no new defaults; classify fixture-heavy failures instead of widening setup by default
- **Expensive defaults or shared helper growth introduced?**: no
- **Heavy-family additions, promotions, or visibility changes**: none by default
- **Surface-class relief / special coverage rule**: browser/heavy lane output is classification-only unless the active fix scope explicitly owns it
- **Closing validation and reviewer handoff**: reviewers should confirm no unclassified failing group, no hidden budget relaxation, no new lane family, and no legacy cutover behavior restoration
- **Budget / baseline / trend follow-up**: classify in `failure-classification.md`; only adjust a baseline when the row explains why current evidence supports it
- **Review-stop questions**: lane fit, hidden fixture cost, product repair scope creep, browser scope creep, budget baseline relaxation
- **Escalation path**: `document-in-feature` for CI/lane contract corrections, `follow-up-spec` for product/runtime failures
- **Active feature PR close-out entry**: `FullSuiteClassification`
- **Why no dedicated follow-up spec is needed**: this spec is itself the bounded classification pass. Follow-up specs are created only for classified product/runtime groups.
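The lane fallback can be sketched as a loop that keeps going after a red lane. That way one failure does not hide the others, and every red group still reaches classification. This minimal sketch assumes that each wrapper's exit status and combined output give enough raw input for a classification row. `run_lanes`, `LANE_CMD`, and `LANE_LOG_DIR` are hypothetical names.

```shell
# run_lanes LANE...
# Hypothetical sketch: run each lane through the existing wrapper, keep going
# after a red lane, save each lane's output to its own log, and print one
# "lane<TAB>exit-code" line per lane. Returns 1 if any lane was red.
run_lanes() {
    lane_cmd="${LANE_CMD:-./scripts/platform-test-lane}"
    log_dir="${LANE_LOG_DIR:-lane-logs}"
    mkdir -p "$log_dir"
    overall=0
    for lane in "$@"; do
        rc=0
        # Capture the exit code without aborting the loop on failure.
        "$lane_cmd" "$lane" >"$log_dir/$lane.log" 2>&1 || rc=$?
        printf '%s\t%s\n' "$lane" "$rc"
        if [ "$rc" -ne 0 ]; then overall=1; fi
    done
    return "$overall"
}
```

For example, `LANE_LOG_DIR=lane-logs run_lanes fast-feedback confidence heavy-governance browser`, run from the repository root, prints one status line per lane, and the per-lane logs keep raw failure detail for row notes. This sketch deliberately avoids stopping at the first failure (as `set -e` would), because the plan needs every red group, not only the first one.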
## Project Structure
### Documentation (this feature)
```text
specs/295-full-suite-ci-baseline/
├── checklists/
│   └── requirements.md
├── data-model.md
├── failure-classification.md
├── plan.md
├── quickstart.md
├── research.md
├── spec.md
└── tasks.md
```
### Source Code (repository root)
```text
scripts/
├── platform-test-artifacts
├── platform-test-lane
└── platform-test-report
apps/platform/
├── composer.json
└── tests/
    ├── Feature/Guards/
    └── Support/
```
**Structure Decision**: implementation should touch only the documentation artifacts above unless classification proves a small CI/lane contract defect in the listed scripts/support/guard-test surfaces. Runtime application code, migrations, models, Filament resources, routes, views, and provider services are out of scope.
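The Structure Decision can also be checked mechanically against a branch diff. This minimal sketch assumes the allowed surfaces are exactly the spec directory plus the scripts and test locations listed above. `out_of_scope_paths` and `ALLOWED_PREFIXES` are hypothetical names. Entries ending in `/` match as directory prefixes, and all other entries must match exactly.

```shell
# Allowed surfaces from the Structure Decision (hypothetical encoding).
ALLOWED_PREFIXES="specs/295-full-suite-ci-baseline/ scripts/platform-test-artifacts scripts/platform-test-lane scripts/platform-test-report apps/platform/composer.json apps/platform/tests/Feature/Guards/ apps/platform/tests/Support/"

# out_of_scope_paths: read changed paths on stdin (one per line), print the
# ones outside the allowed surfaces, and return 1 if any were found.
out_of_scope_paths() {
    found=0
    while IFS= read -r changed; do
        [ -n "$changed" ] || continue
        allowed=0
        for entry in $ALLOWED_PREFIXES; do
            case "$entry" in
                */)
                    # Directory entry: match anything underneath it.
                    case "$changed" in
                        "$entry"*) allowed=1 ;;
                    esac
                    ;;
                *)
                    # File entry: require an exact match.
                    if [ "$changed" = "$entry" ]; then allowed=1; fi
                    ;;
            esac
            if [ "$allowed" -eq 1 ]; then break; fi
        done
        if [ "$allowed" -eq 0 ]; then
            printf '%s\n' "$changed"
            found=1
        fi
    done
    return "$found"
}
```

From the repository root, `git diff --name-only main...HEAD | out_of_scope_paths` lists the changed files outside those surfaces. The base branch name `main` is an assumption.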
## Complexity Tracking
| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| Spec-local failure-classification vocabulary | The full-suite readiness decision needs one bounded way to classify all red groups after Specs `293` and `294` | Raw terminal notes would not preserve ownership, lane, or follow-up decisions |
## Proportionality Review
- **Current operator problem**: maintainers cannot safely decide whether CI is restored without a classified full-suite baseline.
- **Existing structure is insufficient because**: targeted green lanes and raw full-suite output answer different questions; neither alone assigns follow-up ownership.
- **Narrowest correct implementation**: one spec-local classification artifact and existing lane wrappers.
- **Ownership cost**: temporary classification upkeep during implementation and possibly small lane contract guard adjustments.
- **Alternative intentionally rejected**: new full-suite CI framework or fix-all suite cleanup.
- **Release truth**: current-release test governance and CI readiness.
## Phase 0: Research Output
See `specs/295-full-suite-ci-baseline/research.md`.
## Phase 1: Design Output
- `specs/295-full-suite-ci-baseline/data-model.md`
- `specs/295-full-suite-ci-baseline/quickstart.md`
- `specs/295-full-suite-ci-baseline/failure-classification.md`
## Phase 2: Task Planning Output
See `specs/295-full-suite-ci-baseline/tasks.md`.