# Implementation Plan: Full Suite Failure Classification & CI Lane Baseline

**Branch**: `295-full-suite-ci-baseline` | **Date**: 2026-05-11 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`

## Summary

Spec `295` determines whether the full TenantPilot platform suite is again a reliable CI signal after Specs `293` and `294`. The implementation must run the raw full suite when classifiable, fall back to explicit existing lane wrappers when needed, classify every red group in `failure-classification.md`, validate report/artifact/budget failure classes, and only fix small CI/lane contract defects. Product/runtime failures are split into follow-up ownership instead of repaired here.

## Technical Context

**Language/Version**: PHP 8.4.15, Laravel 12.52.0
**Primary Dependencies**: Pest 4.3.1, PHPUnit 12.5.4, Laravel Sail 1.52.0, Filament 5.2.1, Livewire 4.1.4
**Storage**: no application storage changes; spec-local `failure-classification.md` only
**Testing**: Pest via Sail-first commands and existing lane wrappers
**Validation Lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support, profiling only if classification needs it
**Target Platform**: local Sail and Gitea-compatible CI wrappers
**Project Type**: Laravel monolith under `apps/platform` with repo-root CI helper scripts
**Performance Goals**: classify the existing suite signal without creating a new permanent lane or widening lane cost
**Constraints**: no broad suite repair, no legacy `/admin/t/...`, no TenantPanelProvider restoration, no runtime persistence, no new test family by default
**Scale/Scope**: complete platform test suite signal plus existing CI lane/report/artifact contracts

## UI / Surface Guardrail Plan

- **Guardrail scope**: no operator-facing surface change
- **Native vs custom classification summary**: N/A
- **Shared-family relevance**: CI/test-governance workflow only
- **State layers in scope**: none
- **Audience modes in scope**: N/A
- **Decision/diagnostic/raw hierarchy plan**: N/A for product UI; classification output keeps summary first and raw failure detail in row notes
- **Raw/support gating plan**: N/A
- **One-primary-action / duplicate-truth control**: one final readiness decision in `failure-classification.md`
- **Handling modes by drift class or surface**: CI/lane contract drift may be fixed; product/runtime drift becomes `follow-up-spec-required` or `product-runtime-or-test-regression`
- **Repository-signal treatment**: review-mandatory for every failing group; hard-stop if a group remains unclassified
- **Special surface test profiles**: `browser-smoke`, `surface-guard`, `discovery-heavy`, `global-context-shell`
- **Required tests or manual smoke**: existing Pest lane wrappers and raw full-suite command; no in-app Browser smoke unless implementation later changes visible UI, which is out of scope
- **Exception path and spread control**: any repair outside CI/lane contract correction triggers follow-up-spec classification
- **Active feature PR close-out entry**: `FullSuiteClassification`

## Shared Pattern & System Fit

- **Cross-cutting feature marker**: yes
- **Systems touched**: `scripts/platform-test-lane`, `scripts/platform-test-report`, `scripts/platform-test-artifacts`, `apps/platform/composer.json`, `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneReport.php`, `apps/platform/tests/Support/TestLaneBudget.php`, CI guard tests under `apps/platform/tests/Feature/Guards/`
- **Shared abstractions reused**: `TestLaneManifest`, `TestLaneReport`, `TestLaneBudget`, existing wrapper scripts and composer scripts
- **New abstraction introduced? why?**: none
- **Why the existing abstraction was sufficient or insufficient**: existing lane and failure-class contracts are the current source of truth; this spec proves or minimally corrects them instead of adding another layer
- **Bounded deviation / spread control**: product/runtime failures must be classified and split rather than repaired here

## OperationRun UX Impact

- **Touches OperationRun start/completion/link UX?**: no
- **Central contract reused**: N/A
- **Delegated UX behaviors**: N/A
- **Surface-owned behavior kept local**: N/A
- **Queued DB-notification policy**: N/A
- **Terminal notification path**: N/A
- **Exception path**: none

## Provider Boundary & Portability Fit

- **Shared provider/platform boundary touched?**: no product provider boundary change
- **Provider-owned seams**: provider/verification test failures may be classified, but runtime repair is out of scope unless it is strictly CI/lane contract drift
- **Platform-core seams**: CI lane/report/artifact contract only
- **Neutral platform terms / contracts preserved**: `workspace`, `managed environment`, `provider connection`, `lane`, `failure group`, `CI signal`
- **Retained provider-specific semantics and why**: none added
- **Bounded extraction or follow-up path**: follow-up-spec for any real provider/verification runtime debt after Spec `294`

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- Inventory-first: PASS. No inventory or snapshot runtime behavior changes.
- Read/write separation: PASS. No application write/change function is introduced.
- Graph contract path: PASS. No Microsoft Graph calls are introduced or changed.
- Deterministic capabilities: PASS. Capability derivation is not changed.
- RBAC-UX: PASS. Existing RBAC tests may fail and be classified, but authorization behavior is not changed by this spec unless a future follow-up owns it.
- Workspace isolation: PASS. Workspace/managed-environment isolation failures are product/runtime debt, not CI-wrapper debt.
- Tenant isolation: PASS. No tenant-plane route or compatibility behavior is restored.
- Run observability: PASS. No new `OperationRun`, queue, scheduled work, or terminal notification policy is introduced.
- Test governance (TEST-GOV-001): PASS. The spec explicitly names proving purpose, lane mix, fixture cost boundaries, heavy/browser visibility, budget/trend treatment, and split decisions.
- Proportionality (PROP-001): PASS. The only new structure is one spec-local classification artifact needed for current CI readiness.
- No premature abstraction (ABSTR-001): PASS. No new CI framework or lane abstraction is introduced.
- Persisted truth (PERSIST-001): PASS. No application persistence; spec artifact is not runtime truth.
- Behavioral state (STATE-001): PASS. The classification vocabulary controls implementation workflow only and does not become product state.
- Shared pattern first (XCUT-001): PASS. Existing `TestLaneManifest`, `TestLaneReport`, wrapper scripts, and guard tests remain the shared path.
- Provider boundary (PROV-001): PASS. No provider runtime or vocabulary boundary is changed.
- V1 explicitness / few layers (V1-EXP-001, LAYER-001): PASS. Use direct classification and existing helpers.
- Spec discipline / bloat check (SPEC-DISC-001, BLOAT-001): PASS with proportionality review in `spec.md`.
- Filament-native UI (UI-FIL-001): PASS. No operator-facing Filament UI change.
- Filament v5 / Livewire v4: PASS. Current app info confirms Filament 5.2.1 and Livewire 4.1.4; this spec does not alter that relationship.
- Provider registration: PASS. No panel provider changes; Laravel provider registration remains in `apps/platform/bootstrap/providers.php`.

**Post-design re-check**: PASS while categories, seams, planned commands, and out-of-scope boundaries remain aligned across `spec.md`, `plan.md`, `research.md`, `data-model.md`, `quickstart.md`, `tasks.md`, `checklists/requirements.md`, and `failure-classification.md`.

## Test Governance Check

- **Pinned categories**: `ci-signal-restored`, `ci-wrapper-or-manifest-regression`, `artifact-publication-regression`, `budget-or-trend-baseline-drift`, `product-runtime-or-test-regression`, `browser-lane-regression`, `flaky-or-environment`, `follow-up-spec-required`, `resolved-or-not-needed`
- **Pinned seams**: `raw-full-suite`, `fast-feedback-lane`, `confidence-lane`, `heavy-governance-lane`, `browser-lane`, `profiling-or-junit-support`, `lane-reporting`, `artifact-publication`, `budget-trend-baseline`, `legacy-cutover-regression-guard`, `provider-verification-regression-guard`
- **Test purpose / classification by changed surface**: full-suite classification, CI lane contract verification, and optional CI/lane guard tests only
- **Affected validation lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support
- **Why this lane mix is the narrowest sufficient proof**: raw full suite answers the main readiness question; explicit lane split keeps classification possible when the raw run is too noisy; report/artifact commands validate CI interpretability
- **Narrowest proving command(s)**:
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser`
  - corresponding `./scripts/platform-test-report` commands for report/artifact classification
- **Fixture / helper / factory / seed / context cost risks**: no new defaults; classify fixture-heavy failures instead of widening setup by default
- **Expensive defaults or shared helper growth introduced?**: no
- **Heavy-family additions, promotions, or visibility changes**: none by default
- **Surface-class relief / special coverage rule**: browser/heavy lane output is classification-only unless active fix scope explicitly owns it
- **Closing validation and reviewer handoff**: reviewers should confirm no unclassified failing group, no hidden budget relaxation, no new lane family, and no legacy cutover behavior restoration
- **Budget / baseline / trend follow-up**: classify in `failure-classification.md`; only adjust a baseline when the row explains why current evidence supports it
- **Review-stop questions**: lane fit, hidden fixture cost, product repair scope creep, browser scope creep, budget baseline relaxation
- **Escalation path**: `document-in-feature` for CI/lane contract corrections, `follow-up-spec` for product/runtime failures
- **Active feature PR close-out entry**: `FullSuiteClassification`
- **Why no dedicated follow-up spec is needed**: this spec is itself the bounded classification pass. Follow-up specs are created only for classified product/runtime groups.

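
When the raw full-suite run is too noisy to classify, the lane wrappers listed under the narrowest proving commands can be run one after another without stopping at the first red lane. The following is a minimal sketch of that fallback, not the repository's actual tooling: `run_lanes` and `LANE_RUNNER` are hypothetical names, and it assumes the wrapper exits non-zero when a lane is red, which this plan does not confirm.

```shell
#!/bin/sh
# Sketch only: LANE_RUNNER and run_lanes are hypothetical names, not repo APIs.
# The default points at the wrapper script named in this plan.
LANE_RUNNER="${LANE_RUNNER:-./scripts/platform-test-lane}"

# Run every lane passed as an argument and print one "<lane> green|red" line
# per lane. A red lane does not stop the loop, so all groups can be
# classified from a single pass.
run_lanes() {
    red=0
    for lane in "$@"; do
        # Assumption: the wrapper exits non-zero when the lane is red.
        # Lane output goes to stderr so stdout holds only the summary lines.
        if "$LANE_RUNNER" "$lane" >&2; then
            echo "$lane green"
        else
            echo "$lane red"
            red=$((red + 1))
        fi
    done
    # Non-zero when any lane was red, so a calling CI step still fails.
    [ "$red" -eq 0 ]
}

# Example: run_lanes fast-feedback confidence heavy-governance browser
```

Each `red` line in the summary would then correspond to at least one row in `failure-classification.md`; the sketch deliberately does not decide the category, since that remains a review step.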
## Project Structure

### Documentation (this feature)

```text
specs/295-full-suite-ci-baseline/
├── checklists/
│   └── requirements.md
├── data-model.md
├── failure-classification.md
├── plan.md
├── quickstart.md
├── research.md
├── spec.md
└── tasks.md
```

### Source Code (repository root)

```text
scripts/
├── platform-test-artifacts
├── platform-test-lane
└── platform-test-report

apps/platform/
├── composer.json
└── tests/
    ├── Feature/Guards/
    └── Support/
```

**Structure Decision**: implementation should touch only the documentation artifacts above unless classification proves a small CI/lane contract defect in the listed scripts/support/guard-test surfaces. Runtime application code, migrations, models, Filament resources, routes, views, and provider services are out of scope.

## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| Spec-local failure-classification vocabulary | The full-suite readiness decision needs one bounded way to classify all red groups after Specs `293` and `294` | Raw terminal notes would not preserve ownership, lane, or follow-up decisions |

## Proportionality Review

- **Current operator problem**: maintainers cannot safely decide whether CI is restored without a classified full-suite baseline.
- **Existing structure is insufficient because**: targeted green lanes and raw full-suite output answer different questions; neither alone assigns follow-up ownership.
- **Narrowest correct implementation**: one spec-local classification artifact and existing lane wrappers.
- **Ownership cost**: temporary classification upkeep during implementation and possibly small lane contract guard adjustments.
- **Alternative intentionally rejected**: new full-suite CI framework or fix-all suite cleanup.
- **Release truth**: current-release test governance and CI readiness.

## Phase 0: Research Output

See `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/research.md`.

## Phase 1: Design Output

- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/data-model.md`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/quickstart.md`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/failure-classification.md`

## Phase 2: Task Planning Output

See `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/tasks.md`.
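
The hard-stop rule in this plan (no failing group may remain unclassified) could be checked mechanically before close-out. The sketch below assumes the rows of `failure-classification.md` have been exported as tab-separated `group<TAB>category` lines; that export format and the helper names `is_pinned_category` and `check_rows` are hypothetical, while the category list is copied from the pinned categories in this plan.

```shell
#!/bin/sh
# Pinned categories copied from this plan's Test Governance Check.
PINNED_CATEGORIES="ci-signal-restored
ci-wrapper-or-manifest-regression
artifact-publication-regression
budget-or-trend-baseline-drift
product-runtime-or-test-regression
browser-lane-regression
flaky-or-environment
follow-up-spec-required
resolved-or-not-needed"

# Succeeds only when the argument is non-empty and exactly equals one
# pinned category (fixed-string, whole-line match).
is_pinned_category() {
    [ -n "$1" ] && printf '%s\n' "$PINNED_CATEGORIES" | grep -Fxq -- "$1"
}

# Reads "group<TAB>category" rows on stdin (hypothetical export format),
# prints every group whose category is missing or not pinned, and returns
# non-zero if at least one such group was found.
check_rows() {
    bad=0
    while IFS="$(printf '\t')" read -r group category; do
        [ -n "$group" ] || continue
        if ! is_pinned_category "$category"; then
            echo "unclassified: $group"
            bad=1
        fi
    done
    [ "$bad" -eq 0 ]
}
```

A reviewer could pipe the exported rows into `check_rows` and treat a non-zero exit as the hard stop; how the rows are extracted from the Markdown file is left open here.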