ahmido f03555eae1 Spec 295: full suite CI lane baseline (#350)

## Summary
- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract
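The argv contract in the second bullet can be illustrated without PHP. In this hypothetical sketch, `sh -c` stands in for the embedded PHP script. `stage_artifacts` and its three inputs are illustrative names, not the script's real interface. Values passed after the script text arrive as positional parameters, so they do not depend on host environment variables reaching the Sail container. That is presumably why argv was chosen.

```shell
#!/bin/sh
# Hypothetical sketch: `sh -c` stands in for the embedded PHP script that
# scripts/platform-test-artifacts runs through Sail. stage_artifacts and its
# three inputs are illustrative names, not the script's real interface.
set -eu

stage_artifacts() {
  local lane="$1" source_dir="$2" target_dir="$3"
  # Arguments after the script text become argv of the embedded script:
  # "stage" is its $0, and the three inputs arrive as $1, $2, and $3.
  # Quoting each argument keeps values that contain spaces intact.
  sh -c 'printf "lane=%s source=%s target=%s\n" "$1" "$2" "$3"' \
    stage "$lane" "$source_dir" "$target_dir"
}

stage_artifacts fast-feedback in-dir out-dir
# prints: lane=fast-feedback source=in-dir target=out-dir
```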

## Scope guards
- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes
- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350

2026-05-11 11:14:56 +00:00

Implementation Plan: Full Suite Failure Classification & CI Lane Baseline

Branch: 295-full-suite-ci-baseline | Date: 2026-05-11 | Spec: specs/295-full-suite-ci-baseline/spec.md
Input: Feature specification from specs/295-full-suite-ci-baseline/spec.md

Summary

Spec 295 determines whether the full TenantPilot platform suite is once again a reliable CI signal after Specs 293 and 294. The implementation must run the raw full suite first and fall back to the existing lane wrappers when the raw output is too noisy to classify. It must classify every failing group in failure-classification.md, validate the report, artifact, and budget failure classes, and fix only small CI/lane contract defects. Product/runtime failures are assigned to follow-up owners instead of being repaired here.
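The raw-suite-first, lane-fallback flow can be sketched in shell. Everything here is hypothetical scaffolding. `RAW_SUITE`, `LANE_RUNNER`, and `LOG_DIR` are illustrative overrides, and `classification_pass` simplifies "when needed" to "whenever the raw run is red". The default commands mirror the raw full-suite and lane-wrapper commands listed in the Test Governance Check.

```shell
#!/bin/sh
# Hypothetical sketch of the raw-suite-first, lane-fallback flow.
# RAW_SUITE, LANE_RUNNER, and LOG_DIR are illustrative overrides; the
# defaults mirror the commands listed in this plan.
set -u

RAW_SUITE="${RAW_SUITE:-run_raw_suite}"
LANE_RUNNER="${LANE_RUNNER:-./scripts/platform-test-lane}"
LOG_DIR="${LOG_DIR:-$(mktemp -d)}"

run_raw_suite() {
  (cd apps/platform && ./vendor/bin/sail artisan test --compact)
}

# Runs a command with its output saved to $LOG_DIR/<name>.log, prints one
# "<name><TAB>green|red (exit N)" row, and returns the command's status.
run_logged() {
  local name="$1" status=0
  shift
  if "$@" >"$LOG_DIR/$name.log" 2>&1; then
    printf '%s\tgreen\n' "$name"
  else
    status=$?
    printf '%s\tred (exit %d)\n' "$name" "$status"
  fi
  return "$status"
}

# Returns 0 when the raw suite is green; otherwise runs every lane, keeps
# going past red lanes so each one can be classified, and returns 1.
classification_pass() {
  local lane
  if run_logged raw-full-suite "$RAW_SUITE"; then
    return 0
  fi
  for lane in fast-feedback confidence heavy-governance browser; do
    run_logged "$lane" "$LANE_RUNNER" "$lane" || true
  done
  return 1
}
```

Run from the repository root, `classification_pass` prints one status row per command. The full logs stay in `$LOG_DIR` as raw material for the classification rows.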

Technical Context

Language/Version: PHP 8.4.15, Laravel 12.52.0
Primary Dependencies: Pest 4.3.1, PHPUnit 12.5.4, Laravel Sail 1.52.0, Filament 5.2.1, Livewire 4.1.4
Storage: no application storage changes; spec-local failure-classification.md only
Testing: Pest via Sail-first commands and existing lane wrappers
Validation Lanes: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support, profiling only if classification needs it
Target Platform: local Sail and Gitea-compatible CI wrappers
Project Type: Laravel monolith under apps/platform with repo-root CI helper scripts
Performance Goals: classify the existing suite signal without creating a new permanent lane or widening lane cost
Constraints: no broad suite repair, no legacy /admin/t/..., no TenantPanelProvider restoration, no runtime persistence, no new test family by default
Scale/Scope: complete platform test suite signal plus existing CI lane/report/artifact contracts

UI / Surface Guardrail Plan

Guardrail scope: no operator-facing surface change
Native vs custom classification summary: N/A
Shared-family relevance: CI/test-governance workflow only
State layers in scope: none
Audience modes in scope: N/A
Decision/diagnostic/raw hierarchy plan: N/A for product UI; classification output keeps summary first and raw failure detail in row notes
Raw/support gating plan: N/A
One-primary-action / duplicate-truth control: one final readiness decision in failure-classification.md
Handling modes by drift class or surface: CI/lane contract drift may be fixed; product/runtime drift becomes follow-up-spec-required or product-runtime-or-test-regression
Repository-signal treatment: review-mandatory for every failing group; hard-stop if a group remains unclassified
Special surface test profiles: browser-smoke, surface-guard, discovery-heavy, global-context-shell
Required tests or manual smoke: existing Pest lane wrappers and raw full-suite command; no in-app Browser smoke unless implementation later changes visible UI, which is out of scope
Exception path and spread control: any repair outside CI/lane contract correction triggers follow-up-spec classification
Active feature PR close-out entry: FullSuiteClassification
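The handling modes above, together with the escalation path in the Test Governance Check, can be sketched as a lookup from category to handling mode. This is an illustrative sketch, not project tooling. `handling_mode` and its output labels are hypothetical, and the category names are the pinned ones from this plan. Treating `artifact-publication-regression` as CI/lane contract drift is an assumption. Categories whose handling the plan does not fix are routed to `review`. An empty or unknown category is a hard stop, mirroring the repository-signal rule.

```shell
#!/bin/sh
# Hypothetical sketch: map a failure group's category to a handling mode.
# Category names are the pinned ones from this plan; the mode labels are
# illustrative, and treating artifact publication as CI/lane contract
# drift is an assumption.
set -u

handling_mode() {
  case "$1" in
    ci-wrapper-or-manifest-regression|artifact-publication-regression)
      echo "fix-in-feature" ;;   # CI/lane contract drift may be fixed here
    product-runtime-or-test-regression|follow-up-spec-required)
      echo "follow-up-spec" ;;   # product/runtime drift is split out
    ci-signal-restored|budget-or-trend-baseline-drift|browser-lane-regression|flaky-or-environment|resolved-or-not-needed)
      echo "review" ;;           # pinned, but handling is a reviewer call
    *)
      echo "hard-stop: unclassified or unknown category '$1'" >&2
      return 1 ;;
  esac
}
```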

Shared Pattern & System Fit

Cross-cutting feature marker: yes
Systems touched: scripts/platform-test-lane, scripts/platform-test-report, scripts/platform-test-artifacts, apps/platform/composer.json, apps/platform/tests/Support/TestLaneManifest.php, apps/platform/tests/Support/TestLaneReport.php, apps/platform/tests/Support/TestLaneBudget.php, CI guard tests under apps/platform/tests/Feature/Guards/
Shared abstractions reused: TestLaneManifest, TestLaneReport, TestLaneBudget, existing wrapper scripts and composer scripts
New abstraction introduced? why?: none
Why the existing abstraction was sufficient or insufficient: existing lane and failure-class contracts are the current source of truth; this spec proves or minimally corrects them instead of adding another layer
Bounded deviation / spread control: product/runtime failures must be classified and split rather than repaired here

OperationRun UX Impact

Touches OperationRun start/completion/link UX?: no
Central contract reused: N/A
Delegated UX behaviors: N/A
Surface-owned behavior kept local: N/A
Queued DB-notification policy: N/A
Terminal notification path: N/A
Exception path: none

Provider Boundary & Portability Fit

Shared provider/platform boundary touched?: no product provider boundary change
Provider-owned seams: provider/verification test failures may be classified, but runtime repair is out of scope unless it is strictly CI/lane contract drift
Platform-core seams: CI lane/report/artifact contract only
Neutral platform terms / contracts preserved: workspace, managed environment, provider connection, lane, failure group, CI signal
Retained provider-specific semantics and why: none added
Bounded extraction or follow-up path: follow-up-spec for any real provider/verification runtime debt after Spec 294

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Inventory-first: PASS. No inventory or snapshot runtime behavior changes.
Read/write separation: PASS. No application write/change function is introduced.
Graph contract path: PASS. No Microsoft Graph calls are introduced or changed.
Deterministic capabilities: PASS. Capability derivation is not changed.
RBAC-UX: PASS. Existing RBAC tests may fail and be classified, but this spec does not change authorization behavior; any such change belongs to a follow-up spec.
Workspace isolation: PASS. Workspace/managed-environment isolation failures are product/runtime debt, not CI-wrapper debt.
Tenant isolation: PASS. No tenant-plane route or compatibility behavior is restored.
Run observability: PASS. No new OperationRun, queue, scheduled work, or terminal notification policy is introduced.
Test governance (TEST-GOV-001): PASS. The spec explicitly names proving purpose, lane mix, fixture cost boundaries, heavy/browser visibility, budget/trend treatment, and split decisions.
Proportionality (PROP-001): PASS. The only new structure is one spec-local classification artifact needed for current CI readiness.
No premature abstraction (ABSTR-001): PASS. No new CI framework or lane abstraction is introduced.
Persisted truth (PERSIST-001): PASS. No application persistence; spec artifact is not runtime truth.
Behavioral state (STATE-001): PASS. The classification vocabulary controls implementation workflow only and does not become product state.
Shared pattern first (XCUT-001): PASS. Existing TestLaneManifest, TestLaneReport, wrapper scripts, and guard tests remain the shared path.
Provider boundary (PROV-001): PASS. No provider runtime or vocabulary boundary is changed.
V1 explicitness / few layers (V1-EXP-001, LAYER-001): PASS. Use direct classification and existing helpers.
Spec discipline / bloat check (SPEC-DISC-001, BLOAT-001): PASS with proportionality review in spec.md.
Filament-native UI (UI-FIL-001): PASS. No operator-facing Filament UI change.
Filament v5 / Livewire v4: PASS. Current app info confirms Filament 5.2.1 and Livewire 4.1.4; this spec does not alter that relationship.
Provider registration: PASS. No panel provider changes; Laravel provider registration remains in apps/platform/bootstrap/providers.php.

Post-design re-check: PASS while categories, seams, planned commands, and out-of-scope boundaries remain aligned across spec.md, plan.md, research.md, data-model.md, quickstart.md, tasks.md, checklists/requirements.md, and failure-classification.md.

Test Governance Check

Pinned categories: ci-signal-restored, ci-wrapper-or-manifest-regression, artifact-publication-regression, budget-or-trend-baseline-drift, product-runtime-or-test-regression, browser-lane-regression, flaky-or-environment, follow-up-spec-required, resolved-or-not-needed
Pinned seams: raw-full-suite, fast-feedback-lane, confidence-lane, heavy-governance-lane, browser-lane, profiling-or-junit-support, lane-reporting, artifact-publication, budget-trend-baseline, legacy-cutover-regression-guard, provider-verification-regression-guard
Test purpose / classification by changed surface: full-suite classification, CI lane contract verification, and optional CI/lane guard tests only
Affected validation lanes: raw full suite, fast-feedback, confidence, heavy-governance, browser, junit/report support
Why this lane mix is the narrowest sufficient proof: raw full suite answers the main readiness question; explicit lane split keeps classification possible when the raw run is too noisy; report/artifact commands validate CI interpretability
Narrowest proving command(s):
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser
- corresponding ./scripts/platform-test-report <lane> commands for report/artifact classification
Fixture / helper / factory / seed / context cost risks: no new defaults; classify fixture-heavy failures instead of widening setup by default
Expensive defaults or shared helper growth introduced?: no
Heavy-family additions, promotions, or visibility changes: none by default
Surface-class relief / special coverage rule: browser/heavy lane output is classification-only unless active fix scope explicitly owns it
Closing validation and reviewer handoff: reviewers should confirm no unclassified failing group, no hidden budget relaxation, no new lane family, and no legacy cutover behavior restoration
Budget / baseline / trend follow-up: classify in failure-classification.md; only adjust a baseline when the row explains why current evidence supports it
Review-stop questions: lane fit, hidden fixture cost, product repair scope creep, browser scope creep, budget baseline relaxation
Escalation path: document-in-feature for CI/lane contract corrections, follow-up-spec for product/runtime failures
Active feature PR close-out entry: FullSuiteClassification
Why no dedicated follow-up spec is needed: this spec is itself the bounded classification pass. Follow-up specs are created only for classified product/runtime groups.
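The reviewer handoff and budget rules above require every baseline change to be explained in its classification row. A minimal check could flag `budget-or-trend-baseline-drift` rows that have no rationale. It assumes a hypothetical row format of `group<TAB>category<TAB>rationale`; the real failure-classification.md layout may differ.

```shell
#!/bin/sh
# Hypothetical sketch: flag budget/trend rows that lack a written rationale.
# Assumed row format: "group<TAB>category<TAB>rationale", one row per line.
# Tab is an IFS whitespace character, so empty middle fields collapse; the
# sketch assumes group and category are always filled in.
set -u

TAB="$(printf '\t')"

check_baseline_rationale() {
  local group="" category="" rationale="" rc=0
  while IFS="$TAB" read -r group category rationale || [ -n "$group" ]; do
    if [ "$category" = "budget-or-trend-baseline-drift" ] && [ -z "$rationale" ]; then
      printf 'missing baseline rationale: %s\n' "$group"
      rc=1
    fi
  done
  return "$rc"
}
```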

Project Structure

Documentation (this feature)

specs/295-full-suite-ci-baseline/
├── checklists/
│   └── requirements.md
├── data-model.md
├── failure-classification.md
├── plan.md
├── quickstart.md
├── research.md
├── spec.md
└── tasks.md

Source Code (repository root)

scripts/
├── platform-test-artifacts
├── platform-test-lane
└── platform-test-report

apps/platform/
├── composer.json
└── tests/
    ├── Feature/Guards/
    └── Support/

Structure Decision: implementation should touch only the documentation artifacts above unless classification proves a small CI/lane contract defect in the listed scripts/support/guard-test surfaces. Runtime application code, migrations, models, Filament resources, routes, views, and provider services are out of scope.
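The structure decision can be checked mechanically against a list of changed paths, for example the output of `git diff --name-only`. The sketch below is hypothetical. `in_scope` and `check_scope` are illustrative names, and the allowed surfaces are copied from the plan. The three scripts and composer.json must match exactly, while the spec, Support, and Guards directories match everything below them. The check cannot judge whether a script or guard-test change was justified by classification; it only reports paths outside the listed surfaces.

```shell
#!/bin/sh
# Hypothetical sketch: flag changed paths outside the surfaces this plan
# allows. Reads one repo-relative path per line on stdin, for example from
#   git diff --name-only <base>...HEAD
# It cannot judge whether a script or guard-test change was justified by
# classification; it only reports paths outside the listed surfaces.
set -u

in_scope() {
  case "$1" in
    specs/295-full-suite-ci-baseline/*) return 0 ;;
    scripts/platform-test-artifacts|scripts/platform-test-lane|scripts/platform-test-report) return 0 ;;
    apps/platform/composer.json) return 0 ;;
    apps/platform/tests/Support/*|apps/platform/tests/Feature/Guards/*) return 0 ;;
    *) return 1 ;;
  esac
}

check_scope() {
  local path="" rc=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -n "$path" ] || continue   # ignore blank lines
    if ! in_scope "$path"; then
      printf 'out of scope: %s\n' "$path"
      rc=1
    fi
  done
  return "$rc"
}
```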

Complexity Tracking

Violation: spec-local failure-classification vocabulary
Why needed: the full-suite readiness decision needs one bounded way to classify all red groups after Specs 293 and 294
Simpler alternative rejected because: raw terminal notes would not preserve ownership, lane, or follow-up decisions

Proportionality Review

Current operator problem: maintainers cannot safely decide whether CI is restored without a classified full-suite baseline.
Existing structure is insufficient because: targeted green lanes and raw full-suite output answer different questions; neither alone assigns follow-up ownership.
Narrowest correct implementation: one spec-local classification artifact and existing lane wrappers.
Ownership cost: temporary classification upkeep during implementation and possibly small lane contract guard adjustments.
Alternative intentionally rejected: new full-suite CI framework or fix-all suite cleanup.
Release truth: current-release test governance and CI readiness.

Phase 0: Research Output

See specs/295-full-suite-ci-baseline/research.md.

Phase 1: Design Output

specs/295-full-suite-ci-baseline/data-model.md
specs/295-full-suite-ci-baseline/quickstart.md
specs/295-full-suite-ci-baseline/failure-classification.md

Phase 2: Task Planning Output

See specs/295-full-suite-ci-baseline/tasks.md.
