## Summary

- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards

- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes

- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350
# Implementation Plan: Full Suite Failure Classification & CI Lane Baseline

Branch: `295-full-suite-ci-baseline` | Date: 2026-05-11 | Spec: `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`

Input: Feature specification from `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/spec.md`
## Summary
Spec 295 determines whether the full TenantPilot platform suite is again a reliable CI signal after Specs 293 and 294. The implementation must run the raw full suite first, fall back to the explicit existing lane wrappers when the raw output is too noisy to classify, record every red group in `failure-classification.md`, validate the report/artifact/budget failure classes, and fix only small CI/lane contract defects. Product/runtime failures are split into follow-up ownership instead of being repaired here.
## Technical Context
Language/Version: PHP 8.4.15, Laravel 12.52.0
Primary Dependencies: Pest 4.3.1, PHPUnit 12.5.4, Laravel Sail 1.52.0, Filament 5.2.1, Livewire 4.1.4
Storage: no application storage changes; spec-local failure-classification.md only
Testing: Pest via Sail-first commands and existing lane wrappers
Validation Lanes: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support, profiling only if classification needs it
Target Platform: local Sail and Gitea-compatible CI wrappers
Project Type: Laravel monolith under apps/platform with repo-root CI helper scripts
Performance Goals: classify the existing suite signal without creating a new permanent lane or widening lane cost
Constraints: no broad suite repair, no legacy /admin/t/..., no TenantPanelProvider restoration, no runtime persistence, no new test family by default
Scale/Scope: complete platform test suite signal plus existing CI lane/report/artifact contracts
## UI / Surface Guardrail Plan
- Guardrail scope: no operator-facing surface change
- Native vs custom classification summary: N/A
- Shared-family relevance: CI/test-governance workflow only
- State layers in scope: none
- Audience modes in scope: N/A
- Decision/diagnostic/raw hierarchy plan: N/A for product UI; classification output keeps summary first and raw failure detail in row notes
- Raw/support gating plan: N/A
- One-primary-action / duplicate-truth control: one final readiness decision in `failure-classification.md`
- Handling modes by drift class or surface: CI/lane contract drift may be fixed; product/runtime drift becomes `follow-up-spec-required` or `product-runtime-or-test-regression`
- Repository-signal treatment: review-mandatory for every failing group; hard-stop if a group remains unclassified
- Special surface test profiles: `browser-smoke`, `surface-guard`, `discovery-heavy`, `global-context-shell`
- Required tests or manual smoke: existing Pest lane wrappers and raw full-suite command; no in-app Browser smoke unless implementation later changes visible UI, which is out of scope
- Exception path and spread control: any repair outside CI/lane contract correction triggers follow-up-spec classification
- Active feature PR close-out entry: `FullSuiteClassification`
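The hard-stop rule above (no failing group may remain unclassified) could be checked mechanically. The following is a minimal sketch, not the project's actual tooling: it assumes a hypothetical export of one `group|category` row per failing group, because the real layout of `failure-classification.md` is not shown in this plan. The function names and the group names in the usage note are illustrative; the category list is copied from the pinned categories in the Test Governance Check.

```shell
#!/bin/sh
# Pinned categories, copied from the Test Governance Check in this plan.
PINNED_CATEGORIES="ci-signal-restored ci-wrapper-or-manifest-regression artifact-publication-regression budget-or-trend-baseline-drift product-runtime-or-test-regression browser-lane-regression flaky-or-environment follow-up-spec-required resolved-or-not-needed"

# is_pinned_category CATEGORY -> exit 0 when CATEGORY is in the pinned list.
is_pinned_category() {
    for pinned in $PINNED_CATEGORIES; do
        [ "$pinned" = "$1" ] && return 0
    done
    return 1
}

# check_classification reads hypothetical "group|category" rows from stdin,
# prints every group whose category is empty or not pinned, and returns 1
# (the hard-stop) when at least one such group exists.
check_classification() {
    rc=0
    while IFS='|' read -r group category || [ -n "$group" ]; do
        [ -z "$group" ] && continue
        if ! is_pinned_category "$category"; then
            printf 'UNCLASSIFIED %s\n' "$group"
            rc=1
        fi
    done
    return "$rc"
}
```

Piping rows such as `legacy-cutover|follow-up-spec-required` and `report-json|` into `check_classification` would print `UNCLASSIFIED report-json` and return a non-zero status, which a reviewer or CI step could treat as the review stop.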
## Shared Pattern & System Fit
- Cross-cutting feature marker: yes
- Systems touched: `scripts/platform-test-lane`, `scripts/platform-test-report`, `scripts/platform-test-artifacts`, `apps/platform/composer.json`, `apps/platform/tests/Support/TestLaneManifest.php`, `apps/platform/tests/Support/TestLaneReport.php`, `apps/platform/tests/Support/TestLaneBudget.php`, CI guard tests under `apps/platform/tests/Feature/Guards/`
- Shared abstractions reused: `TestLaneManifest`, `TestLaneReport`, `TestLaneBudget`, existing wrapper scripts and composer scripts
- New abstraction introduced? why?: none
- Why the existing abstraction was sufficient or insufficient: existing lane and failure-class contracts are the current source of truth; this spec proves or minimally corrects them instead of adding another layer
- Bounded deviation / spread control: product/runtime failures must be classified and split rather than repaired here
## OperationRun UX Impact
- Touches OperationRun start/completion/link UX?: no
- Central contract reused: N/A
- Delegated UX behaviors: N/A
- Surface-owned behavior kept local: N/A
- Queued DB-notification policy: N/A
- Terminal notification path: N/A
- Exception path: none
## Provider Boundary & Portability Fit
- Shared provider/platform boundary touched?: no product provider boundary change
- Provider-owned seams: provider/verification test failures may be classified, but runtime repair is out of scope unless it is strictly CI/lane contract drift
- Platform-core seams: CI lane/report/artifact contract only
- Neutral platform terms / contracts preserved: `workspace`, `managed environment`, `provider connection`, `lane`, `failure group`, `CI signal`
- Retained provider-specific semantics and why: none added
- Bounded extraction or follow-up path: follow-up-spec for any real provider/verification runtime debt after Spec 294
## Constitution Check
GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.
- Inventory-first: PASS. No inventory or snapshot runtime behavior changes.
- Read/write separation: PASS. No application write/change function is introduced.
- Graph contract path: PASS. No Microsoft Graph calls are introduced or changed.
- Deterministic capabilities: PASS. Capability derivation is not changed.
- RBAC-UX: PASS. Existing RBAC tests may fail and be classified, but authorization behavior is not changed by this spec unless a future follow-up owns it.
- Workspace isolation: PASS. Workspace/managed-environment isolation failures are product/runtime debt, not CI-wrapper debt.
- Tenant isolation: PASS. No tenant-plane route or compatibility behavior is restored.
- Run observability: PASS. No new `OperationRun`, queue, scheduled work, or terminal notification policy is introduced.
- Test governance (TEST-GOV-001): PASS. The spec explicitly names proving purpose, lane mix, fixture cost boundaries, heavy/browser visibility, budget/trend treatment, and split decisions.
- Proportionality (PROP-001): PASS. The only new structure is one spec-local classification artifact needed for current CI readiness.
- No premature abstraction (ABSTR-001): PASS. No new CI framework or lane abstraction is introduced.
- Persisted truth (PERSIST-001): PASS. No application persistence; spec artifact is not runtime truth.
- Behavioral state (STATE-001): PASS. The classification vocabulary controls implementation workflow only and does not become product state.
- Shared pattern first (XCUT-001): PASS. Existing `TestLaneManifest`, `TestLaneReport`, wrapper scripts, and guard tests remain the shared path.
- Provider boundary (PROV-001): PASS. No provider runtime or vocabulary boundary is changed.
- V1 explicitness / few layers (V1-EXP-001, LAYER-001): PASS. Use direct classification and existing helpers.
- Spec discipline / bloat check (SPEC-DISC-001, BLOAT-001): PASS with proportionality review in `spec.md`.
- Filament-native UI (UI-FIL-001): PASS. No operator-facing Filament UI change.
- Filament v5 / Livewire v4: PASS. Current app info confirms Filament 5.2.1 and Livewire 4.1.4; this spec does not alter that relationship.
- Provider registration: PASS. No panel provider changes; Laravel provider registration remains in `apps/platform/bootstrap/providers.php`.
Post-design re-check: PASS while categories, seams, planned commands, and out-of-scope boundaries remain aligned across spec.md, plan.md, research.md, data-model.md, quickstart.md, tasks.md, checklists/requirements.md, and failure-classification.md.
## Test Governance Check
- Pinned categories: `ci-signal-restored`, `ci-wrapper-or-manifest-regression`, `artifact-publication-regression`, `budget-or-trend-baseline-drift`, `product-runtime-or-test-regression`, `browser-lane-regression`, `flaky-or-environment`, `follow-up-spec-required`, `resolved-or-not-needed`
- Pinned seams: `raw-full-suite`, `fast-feedback-lane`, `confidence-lane`, `heavy-governance-lane`, `browser-lane`, `profiling-or-junit-support`, `lane-reporting`, `artifact-publication`, `budget-trend-baseline`, `legacy-cutover-regression-guard`, `provider-verification-regression-guard`
- Test purpose / classification by changed surface: full-suite classification, CI lane contract verification, and optional CI/lane guard tests only
- Affected validation lanes: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support
- Why this lane mix is the narrowest sufficient proof: raw full suite answers the main readiness question; explicit lane split keeps classification possible when the raw run is too noisy; report/artifact commands validate CI interpretability
- Narrowest proving command(s):
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance`
  - `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser`
  - corresponding `./scripts/platform-test-report <lane>` commands for report/artifact classification
- Fixture / helper / factory / seed / context cost risks: no new defaults; classify fixture-heavy failures instead of widening setup by default
- Expensive defaults or shared helper growth introduced?: no
- Heavy-family additions, promotions, or visibility changes: none by default
- Surface-class relief / special coverage rule: browser/heavy lane output is classification-only unless active fix scope explicitly owns it
- Closing validation and reviewer handoff: reviewers should confirm no unclassified failing group, no hidden budget relaxation, no new lane family, and no legacy cutover behavior restoration
- Budget / baseline / trend follow-up: classify in `failure-classification.md`; only adjust a baseline when the row explains why current evidence supports it
- Review-stop questions: lane fit, hidden fixture cost, product repair scope creep, browser scope creep, budget baseline relaxation
- Escalation path: `document-in-feature` for CI/lane contract corrections, `follow-up-spec` for product/runtime failures
- Active feature PR close-out entry: `FullSuiteClassification`
- Why no dedicated follow-up spec is needed: this spec is itself the bounded classification pass. Follow-up specs are created only for classified product/runtime groups.
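The lane fallback described in this section (run each explicit lane so that red groups stay attributable to a lane) might be driven by a small loop. This is a sketch under stated assumptions rather than a script from the repository: the lane names and the `./scripts/platform-test-lane` wrapper come from the proving commands above, while `run_lanes` and the `LANE_RUNNER` override are hypothetical names introduced here so the loop can be dry-run without Sail.

```shell
#!/bin/sh
# Lane names as listed in this plan, in the order of the proving commands.
LANES="fast-feedback confidence heavy-governance browser"

# Command that runs one lane. The default is the wrapper named in this plan;
# the LANE_RUNNER override is a hypothetical hook for dry runs and tests.
LANE_RUNNER="${LANE_RUNNER:-./scripts/platform-test-lane}"

# run_lanes runs every lane even after a failure, prints one
# "lane<TAB>result" row per lane on stdout, and returns 1 if any lane was red.
# Lane output is sent to stderr so the summary on stdout stays clean.
run_lanes() {
    rc=0
    for lane in $LANES; do
        if "$LANE_RUNNER" "$lane" >&2; then
            printf '%s\tgreen\n' "$lane"
        else
            printf '%s\tred\n' "$lane"
            rc=1
        fi
    done
    return "$rc"
}
```

Continuing past a red lane is deliberate in this sketch: the goal stated above is classification of every failing group, so stopping at the first failure would hide later lanes from the classification pass.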
## Project Structure

### Documentation (this feature)
specs/295-full-suite-ci-baseline/
├── checklists/
│ └── requirements.md
├── data-model.md
├── failure-classification.md
├── plan.md
├── quickstart.md
├── research.md
├── spec.md
└── tasks.md
### Source Code (repository root)
scripts/
├── platform-test-artifacts
├── platform-test-lane
└── platform-test-report
apps/platform/
├── composer.json
└── tests/
    ├── Feature/Guards/
    └── Support/
Structure Decision: implementation should touch only the documentation artifacts above unless classification proves a small CI/lane contract defect in the listed scripts/support/guard-test surfaces. Runtime application code, migrations, models, Filament resources, routes, views, and provider services are out of scope.
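The structure decision above can be read as an allow-list over changed paths. The sketch below illustrates that reading and is not an existing guard in the repository: `in_scope` and `check_scope` are hypothetical names, and the allowed paths are taken from the "Systems touched" list and the documentation tree in this plan.

```shell
#!/bin/sh
# in_scope FILE -> exit 0 when FILE is inside a surface this plan allows.
in_scope() {
    case "$1" in
        specs/295-full-suite-ci-baseline/*) return 0 ;;
        scripts/platform-test-artifacts|scripts/platform-test-lane|scripts/platform-test-report) return 0 ;;
        apps/platform/composer.json) return 0 ;;
        apps/platform/tests/Support/TestLaneManifest.php) return 0 ;;
        apps/platform/tests/Support/TestLaneReport.php) return 0 ;;
        apps/platform/tests/Support/TestLaneBudget.php) return 0 ;;
        apps/platform/tests/Feature/Guards/*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_scope reads one changed path per line from stdin, prints each
# out-of-scope path, and returns 1 if it printed any.
check_scope() {
    rc=0
    while IFS= read -r changed || [ -n "$changed" ]; do
        [ -z "$changed" ] && continue
        if ! in_scope "$changed"; then
            printf 'OUT-OF-SCOPE %s\n' "$changed"
            rc=1
        fi
    done
    return "$rc"
}
```

One way to feed it would be `git diff --name-only "$BASE"...HEAD | check_scope`, where `$BASE` stands for whichever base branch the PR targets (an assumption; the plan does not name it). Any printed path would point at runtime code, migrations, or other surfaces the plan declares out of scope.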
## Complexity Tracking
| Violation | Why Needed | Simpler Alternative Rejected Because |
|---|---|---|
| Spec-local failure-classification vocabulary | The full-suite readiness decision needs one bounded way to classify all red groups after Specs 293 and 294 | Raw terminal notes would not preserve ownership, lane, or follow-up decisions |
## Proportionality Review
- Current operator problem: maintainers cannot safely decide whether CI is restored without a classified full-suite baseline.
- Existing structure is insufficient because: targeted green lanes and raw full-suite output answer different questions; neither alone assigns follow-up ownership.
- Narrowest correct implementation: one spec-local classification artifact and existing lane wrappers.
- Ownership cost: temporary classification upkeep during implementation and possibly small lane contract guard adjustments.
- Alternative intentionally rejected: new full-suite CI framework or fix-all suite cleanup.
- Release truth: current-release test governance and CI readiness.
## Phase 0: Research Output
See /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/research.md.
## Phase 1: Design Output

- /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/data-model.md
- /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/quickstart.md
- /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/failure-classification.md
## Phase 2: Task Planning Output
See /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/295-full-suite-ci-baseline/tasks.md.