## Summary

- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards

- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes

- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350

# Implementation Plan: Full Suite Failure Classification & CI Lane Baseline

**Branch**: `295-full-suite-ci-baseline` | **Date**: 2026-05-11 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`

## Summary

Spec `295` determines whether the full TenantPilot platform suite is again a reliable CI signal after Specs `293` and `294`. The implementation must run the raw full suite when classifiable, fall back to explicit existing lane wrappers when needed, classify every red group in `failure-classification.md`, validate report/artifact/budget failure classes, and only fix small CI/lane contract defects. Product/runtime failures are split into follow-up ownership instead of repaired here.

## Technical Context

**Language/Version**: PHP 8.4.15, Laravel 12.52.0
**Primary Dependencies**: Pest 4.3.1, PHPUnit 12.5.4, Laravel Sail 1.52.0, Filament 5.2.1, Livewire 4.1.4
**Storage**: no application storage changes; spec-local `failure-classification.md` only
**Testing**: Pest via Sail-first commands and existing lane wrappers
**Validation Lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support, profiling only if classification needs it
**Target Platform**: local Sail and Gitea-compatible CI wrappers
**Project Type**: Laravel monolith under `apps/platform` with repo-root CI helper scripts
**Performance Goals**: classify the existing suite signal without creating a new permanent lane or widening lane cost
**Constraints**: no broad suite repair, no legacy `/admin/t/...`, no TenantPanelProvider restoration, no runtime persistence, no new test family by default
**Scale/Scope**: complete platform test suite signal plus existing CI lane/report/artifact contracts

## UI / Surface Guardrail Plan

- **Guardrail scope**: no operator-facing surface change
- **Native vs custom classification summary**: N/A
- **Shared-family relevance**: CI/test-governance workflow only
- **State layers in scope**: none
- **Audience modes in scope**: N/A
- **Decision/diagnostic/raw hierarchy plan**: N/A for product UI; classification output keeps summary first and raw failure detail in row notes
- **Raw/support gating plan**: N/A
- **One-primary-action / duplicate-truth control**: one final readiness decision in `failure-classification.md`
- **Handling modes by drift class or surface**: CI/lane contract drift may be fixed; product/runtime drift becomes `follow-up-spec-required` or `product-runtime-or-test-regression`
- **Repository-signal treatment**: review-mandatory for every failing group; hard-stop if a group remains unclassified
- **Special surface test profiles**: `browser-smoke`, `surface-guard`, `discovery-heavy`, `global-context-shell`
- **Required tests or manual smoke**: existing Pest lane wrappers and raw full-suite command; no in-app Browser smoke unless implementation later changes visible UI, which is out of scope
- **Exception path and spread control**: any repair outside CI/lane contract correction triggers follow-up-spec classification
- **Active feature PR close-out entry**: `FullSuiteClassification`

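The hard-stop rule for unclassified groups can be checked mechanically. The following is a minimal sketch, not part of the existing scripts: `unclassified_groups` is a hypothetical helper, and the four-column row shape (`| Group | Seam | Category | Notes |`) is an assumption about `failure-classification.md`, whose real layout may differ.

```shell
# Hypothetical helper: list failure groups whose Category cell is empty.
# Assumes rows shaped like "| Group | Seam | Category | Notes |" on stdin;
# the real column layout of failure-classification.md may differ.
unclassified_groups() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    /^\|/ {
      group = trim($2); category = trim($4)
      # Skip the header row and the |---| separator row.
      if (group == "Group" || group ~ /^-+$/) next
      if (category == "") { print group; missing = 1 }
    }
    END { exit missing }
  '
}
```

Feeding the table to the helper on stdin prints each group that still lacks a category and exits non-zero when at least one is found, so a wrapper or reviewer could treat a non-zero status as the hard stop.
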
## Shared Pattern & System Fit

- **Cross-cutting feature marker**: yes
- **Systems touched**: `scripts/platform-test-lane`, `scripts/platform-test-report`, `scripts/platform-test-artifacts`, `apps/platform/composer.json`, `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneReport.php`, `apps/platform/tests/Support/TestLaneBudget.php`, CI guard tests under `apps/platform/tests/Feature/Guards/`
- **Shared abstractions reused**: `TestLaneManifest`, `TestLaneReport`, `TestLaneBudget`, existing wrapper scripts and composer scripts
- **New abstraction introduced? why?**: none
- **Why the existing abstraction was sufficient or insufficient**: existing lane and failure-class contracts are the current source of truth; this spec proves or minimally corrects them instead of adding another layer
- **Bounded deviation / spread control**: product/runtime failures must be classified and split rather than repaired here

## OperationRun UX Impact

- **Touches OperationRun start/completion/link UX?**: no
- **Central contract reused**: N/A
- **Delegated UX behaviors**: N/A
- **Surface-owned behavior kept local**: N/A
- **Queued DB-notification policy**: N/A
- **Terminal notification path**: N/A
- **Exception path**: none

## Provider Boundary & Portability Fit

- **Shared provider/platform boundary touched?**: no product provider boundary change
- **Provider-owned seams**: provider/verification test failures may be classified, but runtime repair is out of scope unless it is strictly CI/lane contract drift
- **Platform-core seams**: CI lane/report/artifact contract only
- **Neutral platform terms / contracts preserved**: `workspace`, `managed environment`, `provider connection`, `lane`, `failure group`, `CI signal`
- **Retained provider-specific semantics and why**: none added
- **Bounded extraction or follow-up path**: follow-up-spec for any real provider/verification runtime debt after Spec `294`

## Constitution Check

*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*

- Inventory-first: PASS. No inventory or snapshot runtime behavior changes.
- Read/write separation: PASS. No application write/change function is introduced.
- Graph contract path: PASS. No Microsoft Graph calls are introduced or changed.
- Deterministic capabilities: PASS. Capability derivation is not changed.
- RBAC-UX: PASS. Existing RBAC tests may fail and be classified, but authorization behavior is not changed by this spec unless a future follow-up owns it.
- Workspace isolation: PASS. Workspace/managed-environment isolation failures are product/runtime debt, not CI-wrapper debt.
- Tenant isolation: PASS. No tenant-plane route or compatibility behavior is restored.
- Run observability: PASS. No new `OperationRun`, queue, scheduled work, or terminal notification policy is introduced.
- Test governance (TEST-GOV-001): PASS. The spec explicitly names proving purpose, lane mix, fixture cost boundaries, heavy/browser visibility, budget/trend treatment, and split decisions.
- Proportionality (PROP-001): PASS. The only new structure is one spec-local classification artifact needed for current CI readiness.
- No premature abstraction (ABSTR-001): PASS. No new CI framework or lane abstraction is introduced.
- Persisted truth (PERSIST-001): PASS. No application persistence; spec artifact is not runtime truth.
- Behavioral state (STATE-001): PASS. The classification vocabulary controls implementation workflow only and does not become product state.
- Shared pattern first (XCUT-001): PASS. Existing `TestLaneManifest`, `TestLaneReport`, wrapper scripts, and guard tests remain the shared path.
- Provider boundary (PROV-001): PASS. No provider runtime or vocabulary boundary is changed.
- V1 explicitness / few layers (V1-EXP-001, LAYER-001): PASS. Use direct classification and existing helpers.
- Spec discipline / bloat check (SPEC-DISC-001, BLOAT-001): PASS with proportionality review in `spec.md`.
- Filament-native UI (UI-FIL-001): PASS. No operator-facing Filament UI change.
- Filament v5 / Livewire v4: PASS. Current app info confirms Filament 5.2.1 and Livewire 4.1.4; this spec does not alter that relationship.
- Provider registration: PASS. No panel provider changes; Laravel provider registration remains in `apps/platform/bootstrap/providers.php`.

**Post-design re-check**: PASS while categories, seams, planned commands, and out-of-scope boundaries remain aligned across `spec.md`, `plan.md`, `research.md`, `data-model.md`, `quickstart.md`, `tasks.md`, `checklists/requirements.md`, and `failure-classification.md`.

## Test Governance Check

- **Pinned categories**: `ci-signal-restored`, `ci-wrapper-or-manifest-regression`, `artifact-publication-regression`, `budget-or-trend-baseline-drift`, `product-runtime-or-test-regression`, `browser-lane-regression`, `flaky-or-environment`, `follow-up-spec-required`, `resolved-or-not-needed`
- **Pinned seams**: `raw-full-suite`, `fast-feedback-lane`, `confidence-lane`, `heavy-governance-lane`, `browser-lane`, `profiling-or-junit-support`, `lane-reporting`, `artifact-publication`, `budget-trend-baseline`, `legacy-cutover-regression-guard`, `provider-verification-regression-guard`
- **Test purpose / classification by changed surface**: full-suite classification, CI lane contract verification, and optional CI/lane guard tests only
- **Affected validation lanes**: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support
- **Why this lane mix is the narrowest sufficient proof**: raw full suite answers the main readiness question; explicit lane split keeps classification possible when the raw run is too noisy; report/artifact commands validate CI interpretability
- **Narrowest proving command(s)**:
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser`
  - corresponding `./scripts/platform-test-report <lane>` commands for report/artifact classification
- **Fixture / helper / factory / seed / context cost risks**: no new defaults; classify fixture-heavy failures instead of widening setup by default
- **Expensive defaults or shared helper growth introduced?**: no
- **Heavy-family additions, promotions, or visibility changes**: none by default
- **Surface-class relief / special coverage rule**: browser/heavy lane output is classification-only unless active fix scope explicitly owns it
- **Closing validation and reviewer handoff**: reviewers should confirm no unclassified failing group, no hidden budget relaxation, no new lane family, and no legacy cutover behavior restoration
- **Budget / baseline / trend follow-up**: classify in `failure-classification.md`; only adjust a baseline when the row explains why current evidence supports it
- **Review-stop questions**: lane fit, hidden fixture cost, product repair scope creep, browser scope creep, budget baseline relaxation
- **Escalation path**: `document-in-feature` for CI/lane contract corrections, `follow-up-spec` for product/runtime failures
- **Active feature PR close-out entry**: `FullSuiteClassification`
- **Why no dedicated follow-up spec is needed**: this spec is itself the bounded classification pass. Follow-up specs are created only for classified product/runtime groups.

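When the raw full-suite run is too noisy to classify, the lane wrappers listed above can be run back to back. The following is a minimal sketch, assuming a red lane should not stop the remaining lanes so that every failing group can still be classified. `run_all_lanes` is a hypothetical helper, not an existing script; only the wrapper path and the four lane names come from this plan.

```shell
# Sketch only: run every lane wrapper and keep going after a red lane,
# so each failing group can still be classified afterwards.
# The runner defaults to the wrapper named in this plan; pass another
# command as $1 to dry-run the loop.
run_all_lanes() {
  lane_runner="${1:-./scripts/platform-test-lane}"
  lanes_failed=0
  for lane in fast-feedback confidence heavy-governance browser; do
    if "$lane_runner" "$lane"; then
      echo "$lane: green"
    else
      echo "$lane: red, classify in failure-classification.md"
      lanes_failed=1
    fi
  done
  # Non-zero when at least one lane was red.
  return "$lanes_failed"
}
```

Called without arguments from the repository root, the helper uses `./scripts/platform-test-lane`; the `PATH` export from the proving commands would still need to be applied first.
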
## Project Structure

### Documentation (this feature)

```text
specs/295-full-suite-ci-baseline/
├── checklists/
│   └── requirements.md
├── data-model.md
├── failure-classification.md
├── plan.md
├── quickstart.md
├── research.md
├── spec.md
└── tasks.md
```

### Source Code (repository root)

```text
scripts/
├── platform-test-artifacts
├── platform-test-lane
└── platform-test-report

apps/platform/
├── composer.json
└── tests/
    ├── Feature/Guards/
    └── Support/
```

**Structure Decision**: implementation should touch only the documentation artifacts above unless classification proves a small CI/lane contract defect in the listed scripts/support/guard-test surfaces. Runtime application code, migrations, models, Filament resources, routes, views, and provider services are out of scope.

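The structure decision lends itself to a quick scope check before review. The following is a minimal sketch under stated assumptions: `out_of_scope_paths` is a hypothetical helper, and the allow-list is simply transcribed from the two trees above rather than taken from any existing guard test.

```shell
# Hypothetical scope guard: print changed paths outside the surfaces this plan allows.
# Reads one path per line on stdin (for example from `git diff --name-only`).
out_of_scope_paths() {
  while IFS= read -r path; do
    case "$path" in
      # Spec-local documentation artifacts.
      specs/295-full-suite-ci-baseline/*) ;;
      # Repo-root CI helper scripts.
      scripts/platform-test-lane|scripts/platform-test-report|scripts/platform-test-artifacts) ;;
      # Composer scripts plus lane support and guard tests.
      apps/platform/composer.json) ;;
      apps/platform/tests/Support/*|apps/platform/tests/Feature/Guards/*) ;;
      # Anything else is outside the planned surfaces.
      *) printf '%s\n' "$path" ;;
    esac
  done
}
```

Piping the output of `git diff --name-only` for the branch into the helper prints nothing when the change set stays inside the planned surfaces. Which base revision to diff against is left open here, since the plan does not name one.
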
## Complexity Tracking

| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| Spec-local failure-classification vocabulary | The full-suite readiness decision needs one bounded way to classify all red groups after Specs `293` and `294` | Raw terminal notes would not preserve ownership, lane, or follow-up decisions |

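Because the classification vocabulary is closed, a row's category can be validated against the pinned list from the Test Governance Check. The following is a minimal sketch; `is_pinned_category` is a hypothetical helper, and only the nine category names are taken from this plan.

```shell
# Hypothetical check: succeeds (status 0) only when $1 is one of the
# nine categories pinned in this plan, and fails (status 1) otherwise.
is_pinned_category() {
  case "$1" in
    ci-signal-restored|ci-wrapper-or-manifest-regression|artifact-publication-regression) return 0 ;;
    budget-or-trend-baseline-drift|product-runtime-or-test-regression|browser-lane-regression) return 0 ;;
    flaky-or-environment|follow-up-spec-required|resolved-or-not-needed) return 0 ;;
    *) return 1 ;;
  esac
}
```

A misspelled or ad hoc category such as `flaky` is rejected, which keeps ownership and follow-up decisions comparable across rows.
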
## Proportionality Review

- **Current operator problem**: maintainers cannot safely decide whether CI is restored without a classified full-suite baseline.
- **Existing structure is insufficient because**: targeted green lanes and raw full-suite output answer different questions; neither alone assigns follow-up ownership.
- **Narrowest correct implementation**: one spec-local classification artifact and existing lane wrappers.
- **Ownership cost**: temporary classification upkeep during implementation and possibly small lane contract guard adjustments.
- **Alternative intentionally rejected**: new full-suite CI framework or fix-all suite cleanup.
- **Release truth**: current-release test governance and CI readiness.

## Phase 0: Research Output

See `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/research.md`.

## Phase 1: Design Output

- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/data-model.md`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/quickstart.md`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/failure-classification.md`

## Phase 2: Task Planning Output

See `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/tasks.md`.