ahmido f03555eae1 Spec 295: full suite CI lane baseline (#350)

## Summary
- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract
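The argv fix can be illustrated with a minimal shell analogy. This is a sketch under stated assumptions, not the actual `scripts/platform-test-artifacts` code. A child process started through `env -i` (an empty environment) stands in for the PHP process inside the Sail container, and `LANE`, `via_env`, and `via_argv` are hypothetical names. A host-side variable that is not explicitly forwarded arrives empty, which is consistent with the `Unknown test lane []` error recorded during staging. A positional argument, by contrast, survives the boundary.

```bash
#!/usr/bin/env bash
# Analogy only: `env -i` starts the child with an empty environment, standing in
# for the container-side PHP process that never received the host variables.

LANE="fast-feedback"   # hypothetical lane input, exported on the "host" side
export LANE

# Environment-based handoff: LANE does not cross the env -i boundary,
# so the child prints an empty value.
via_env() {
  env -i /bin/sh -c 'printf "lane=[%s]\n" "$LANE"'
}

# argv-based handoff: the value travels as positional argument $1,
# so it arrives even though the child's environment is empty.
via_argv() {
  env -i /bin/sh -c 'printf "lane=[%s]\n" "$1"' sh "$LANE"
}

via_env    # → lane=[]
via_argv   # → lane=[fast-feedback]
```

Passing inputs as arguments makes the contract explicit at the call site. Forwarding environment variables depends on how the container wrapper is invoked.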

## Scope guards
- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes
- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350

2026-05-11 11:14:56 +00:00


# Failure Classification: Full Suite Failure Classification & CI Lane Baseline

## Purpose

Use this artifact during the implementation of Spec 295 to classify the complete platform test-suite signal after Specs 293 and 294.

This artifact is spec-local workflow truth only. It is not application runtime truth.

## Implementation Scope Lock

- Date: 2026-05-11
- Branch: `295-full-suite-ci-baseline`
- Baseline commit: `eb85b76e` (Added Skill for Codex)
- Pre-run working tree: only the active untracked spec directory `specs/295-full-suite-ci-baseline/` is present; `git diff --stat` is empty.
- Scope confirmation: the following are out of scope unless classification evidence proves a narrow CI/lane contract defect: runtime application code, Filament UI, routes, provider runtime, `TenantPanelProvider` behavior, `/admin/t/...` behavior, and completed Spec 293 / 294 artifacts.
- Forbidden repair confirmation: product/runtime failures, browser UI behavior failures, and provider/verification runtime failures are classification and follow-up candidates only. The exception is an observed failure directly caused by an existing lane wrapper, manifest, report, artifact, or budget/trend contract.

## Pinned Failure-Classification Categories

| Category | Meaning |
| --- | --- |
| `ci-signal-restored` | Full suite or lane split is green and usable as a CI signal |
| `ci-wrapper-or-manifest-regression` | Wrapper, composer script, workflow binding, or lane manifest no longer invokes the intended lane |
| `artifact-publication-regression` | Required report/JUnit/budget/profile/trend artifacts are not generated or staged as contracted |
| `budget-or-trend-baseline-drift` | Tests pass or mostly pass, but runtime budget/trend baseline output is stale or no longer interpretable |
| `product-runtime-or-test-regression` | A real app/test behavior failure outside the CI wrapper/report/artifact contract |
| `browser-lane-regression` | Existing browser lane or smoke failure needing browser-specific follow-up, unless it is a CI artifact issue |
| `flaky-or-environment` | Nondeterministic, local container, browser runtime, database, queue, or runner issue |
| `follow-up-spec-required` | Confirmed out-of-scope failure needing a separate spec/lane owner |
| `resolved-or-not-needed` | Initially suspected group that no longer needs work after a rerun or adjacent classification |

## Pinned CI / Suite Seams

| Seam | Meaning |
| --- | --- |
| `raw-full-suite` | Direct `sail artisan test --compact` complete suite signal |
| `fast-feedback-lane` | Existing fast-feedback wrapper and manifest selection |
| `confidence-lane` | Existing confidence wrapper and manifest selection |
| `heavy-governance-lane` | Existing heavy-governance wrapper and manifest selection |
| `browser-lane` | Existing browser wrapper and smoke selection |
| `profiling-or-junit-support` | Support lanes used for profiling or durable machine-readable output |
| `lane-reporting` | `scripts/platform-test-report` and `TestLaneReport` output |
| `artifact-publication` | `scripts/platform-test-artifacts` and lane artifact contracts |
| `budget-trend-baseline` | `TestLaneBudget`, lane thresholds, and trend-history classification |
| `legacy-cutover-regression-guard` | Failures that appear to challenge the retired route/panel baseline from Specs `287` to `293` |
| `provider-verification-regression-guard` | Failures that appear to challenge Spec `294` provider/verification semantics |

## Baseline Commands

Primary command:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)
```

Fallback lane split:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser
```

Report commands:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report browser
```
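The fallback lane split and its reports can be chained so that one red lane does not stop the rest. The sketch below is hedged in three ways:

- It assumes `platform-test-lane` exits non-zero for a red lane. That is typical for test runners but not confirmed here.
- `RUNNER_DIR` and `run_lane_split` are hypothetical names, added so the loop can be exercised with stub scripts.
- It continues after a red lane so every lane still gets a report, which mirrors how the run queue records all four lanes.

```bash
#!/usr/bin/env bash
# Sketch: run each fallback lane, then its report, recording failures instead of
# aborting. RUNNER_DIR is a hypothetical override (defaults to ./scripts).
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

RUNNER_DIR="${RUNNER_DIR:-./scripts}"
LANES="fast-feedback confidence heavy-governance browser"

run_lane_split() {
  failed=""
  for lane in $LANES; do
    # Assumed: a red lane exits non-zero. Record it and keep going.
    if ! "$RUNNER_DIR/platform-test-lane" "$lane"; then
      failed="$failed $lane"
    fi
    # Render the report even when the lane itself was red.
    "$RUNNER_DIR/platform-test-report" "$lane" || failed="$failed $lane-report"
  done
  # Print the red lanes (or "none") and return non-zero if anything failed.
  printf 'red:%s\n' "${failed:- none}"
  [ -z "$failed" ]
}
```

From the repository root, define the function and call `run_lane_split`. It prints the red lanes and returns non-zero if any lane or report command failed.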

## Baseline Run Queue

The implementation must run or explicitly skip each of these targets, then add classified failure or success rows to the classification table below.

| Run Target | Expected Command | Expected Seam | Current Status |
| --- | --- | --- | --- |
| `raw-full-suite` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | red; 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s; output too broad/truncated for complete group ownership, so the fallback lane split is required |
| `fast-feedback-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | red; 82 failed, 1743 passed, 12151 assertions, 164.11s; report wall clock 171.551792s, within the 200s warning budget |
| `confidence-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | red; 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s; report wall clock 622.531394s, over the 450s warning budget |
| `heavy-governance-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | red; 21 failed, 319 passed, 2443 assertions, 314.28s; report wall clock 314.828382s, within the 315s warning budget |
| `browser-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | red; 20 failed, 29 passed, 417 assertions, 285.32s; report wall clock 285.719479s, over the 150s warning budget |
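The budget notes in the status column compare each report's wall clock against its warning budget. The following is a minimal sketch of that comparison, assuming a plain threshold check. Some details are assumptions:

- `budget_status` and the `over-warning-budget` label are hypothetical. Only `within-budget` appears in the report output quoted in this artifact.
- The real `TestLaneBudget` logic may differ, including how a value exactly at the budget is treated.

```bash
#!/usr/bin/env bash
# Sketch: classify a report wall clock (seconds) against a warning budget
# (seconds). Treating "equal to budget" as within budget is an assumption.
budget_status() {
  wall_clock="$1"
  warning_budget="$2"
  # awk does the fractional comparison that the shell's [ ] cannot.
  awk -v t="$wall_clock" -v b="$warning_budget" 'BEGIN {
    if (t + 0 <= b + 0) { print "within-budget" } else { print "over-warning-budget" }
  }'
}

budget_status 171.551792 200   # fast-feedback    → within-budget
budget_status 622.531394 450   # confidence       → over-warning-budget
budget_status 314.828382 315   # heavy-governance → within-budget
budget_status 285.719479 150   # browser          → over-warning-budget
```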

## Classification Table

The implementation must append one row per failing group, or a single `ci-signal-restored` row for a fully green signal.

| Group | Observed Command | Seam | Category | Observed Failure | Candidate Owner | Fix In 295? | Follow-up | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `raw-full-suite-red-baseline` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | `follow-up-spec-required` | Raw full suite completed but is not a restored CI signal: 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s. Failure output includes unit RBAC/capability assertions, provider boundary/start-gate failures, route generation errors for workspace-aware operation routes, Filament panel URL generation errors, and browser smoke failures. The full output was too broad/truncated to classify every group from raw output alone. | suite/lane ownership classification | no | run the fallback lane split and classify by lane/report artifacts before any repair | classified |
| `fast-feedback-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 82 failed, 1743 passed, 12151 assertions, 164.11s. JUnit and console output show workspace-aware operation route URL generation without `workspace`, Filament `hasTenancy()` calls with no panel context, authorization expectation drift, RBAC/UI action assertions, provider boundary/start-gate assertions, and failures on monitoring/required-permissions surfaces. | workspace route and Filament panel-context follow-up; RBAC/authorization follow-up; provider verification follow-up | no | split product/test failures into focused follow-up specs; keep 295 limited to lane/report/artifact contracts | classified |
| `confidence-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s. Failure groups include the same workspace-route and Filament panel-context errors, missing/renamed Filament resource routes, bulk-action test helpers returning null actions, deny-as-not-found expectation drift, and legacy admin URL assumptions. | confidence-lane product/runtime owners by resource area | no | create follow-up ownership slices before any product repair; do not absorb broad application repair into 295 | classified |
| `heavy-governance-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 21 failed, 319 passed, 2443 assertions, 314.28s. Failures are concentrated in canonical operation detail/list tests missing the `workspace` route parameter, Filament URL generation without panel context, one tenant sync summary-count assertion, and RBAC relation-manager UI enforcement. | operations canonical viewer/list follow-up; tenant sync summary follow-up; RBAC UI follow-up | no | follow-up specs should repair these product/test contracts independently of the 295 lane baseline | classified |
| `browser-lane-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | `browser-lane-regression` | Browser lane completed through the wrapper but is red: 20 failed, 29 passed, 417 assertions, 285.32s. Failures include smoke-login pages not showing `Dashboard`, workspace-aware operation route URL generation errors, Filament `hasTenancy()` panel-context errors, a tenant dashboard layout assertion, a Spec 279 `/admin/t/...` path expectation, and a tenant membership page copy/action expectation. | browser smoke/product UI follow-up owners | no | split browser repairs separately; do not treat the lane as green or restore retired tenant routes in 295 | classified |
| `legacy-cutover-route-expectations` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` and `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `legacy-cutover-regression-guard` | `follow-up-spec-required` | Browser Spec 279 still expects `/admin/t/spec-279-production`, while the current path is `/admin/workspaces/{workspace}/environments/{environment}`. Confidence output also includes older expectations around `/admin/operations` and admin operation URLs. | tenant cutover regression guard owner | no | create a follow-up only if current cutover truth should change; do not restore `/admin/t/...`, `TenantPanelProvider` behavior, or historical compatibility routes in 295 | classified |
| `provider-verification-regression-guard` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` and raw full suite | `provider-verification-regression-guard` | `follow-up-spec-required` | Raw and fast-feedback output include provider boundary/status assertions such as unexpected `provider.capability_registry`, `review_required` vs `blocked`, and provider operation start-gate dispatch count drift. These are provider/runtime semantics, not lane wrapper failures. | Spec 294 provider/verification follow-up owner | no | open a provider verification follow-up if the current runtime semantics are wrong; do not rewrite Spec 294 artifacts under 295 | classified |
| `lane-reporting-all-lanes` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, `confidence`, `heavy-governance`, `browser` | `lane-reporting` | `resolved-or-not-needed` | All four report commands exited 0 and rendered `summary.md`, `report.json`, `budget.json`, `junit.xml`, and `trend-history.json` references. Reports preserved the red lane status and exposed budget/trend metadata instead of crashing. | `TestLaneReport` and report wrapper | no | none for report rendering | classified |
| `budget-trend-baseline-status` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, `confidence`, `heavy-governance`, `browser` | `budget-trend-baseline` | `resolved-or-not-needed` | Fast-feedback reported `within-budget` at 171.551792s, under 200s. Heavy-governance reported `within-budget` at 314.828382s, under 315s. Confidence and browser reported warning-level budget output (622.531394s over 450s and 285.719479s over 150s, respectively), with warning enforcement and no hard budget-blocking failure. Trend windows were either stable, scope-changed, or insufficient-history, as documented. | `TestLaneBudget`/trend baseline owner | no | investigate confidence/browser runtime separately if desired; no 295 budget relaxation or baseline rewrite was justified | classified |
| `artifact-publication-env-forwarding` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, and browser | `artifact-publication` | `artifact-publication-regression` | Initial artifact staging failed for every lane with `Unknown test lane []`. The cause: `scripts/platform-test-artifacts` passed lane and staging inputs to the Sail PHP process via host environment variables, which were empty inside the container. | `scripts/platform-test-artifacts` | yes | fixed by passing lane, staging directory, and artifact directory as PHP argv through Sail and adding a guard test | resolved |
| `artifact-publication-after-fix` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, and browser | `artifact-publication` | `resolved-or-not-needed` | After the wrapper fix, all four artifact commands exited 0 and staged all five required artifacts with `complete: true`, `primaryFailureClassId: null`, and no missing required artifacts. | `scripts/platform-test-artifacts` and `TestLaneReport` artifact contract | yes | none for artifact publication | classified |
| `junit-support-output` | not run separately; lane wrappers produced `apps/platform/storage/logs/test-lanes/*-latest.junit.xml` | `profiling-or-junit-support` | `resolved-or-not-needed` | A separate `./scripts/platform-test-lane junit` run was not needed because the fast-feedback, confidence, heavy-governance, and browser wrappers already produced the machine-readable JUnit artifacts used for classification. | `TestLaneManifest` JUnit support | no | none unless a future follow-up needs the dedicated JUnit lane | classified |
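A quick local spot check of a staged artifact directory could look like the sketch below. It rests on these assumptions:

- The five artifacts named in the `lane-reporting-all-lanes` row are staged under those bare file names. The real staged names may carry a lane prefix, as the wrapper's `*-latest.junit.xml` output does.
- `missing_artifacts` and `REQUIRED_ARTIFACTS` are hypothetical names.
- The authoritative signal remains the `complete` and missing-artifact fields reported by `scripts/platform-test-artifacts`.

```bash
#!/usr/bin/env bash
# Sketch: report which assumed artifact names are absent from a staging dir.
# Adjust REQUIRED_ARTIFACTS to the real staged names of the artifact contract.
REQUIRED_ARTIFACTS="summary.md report.json budget.json junit.xml trend-history.json"

missing_artifacts() {
  dir="$1"
  missing=""
  for name in $REQUIRED_ARTIFACTS; do
    [ -f "$dir/$name" ] || missing="$missing $name"
  done
  # Print "complete" or the missing names; return 1 when anything is missing.
  if [ -z "$missing" ]; then
    echo "complete"
  else
    echo "missing:$missing"
    return 1
  fi
}
```

For example: `missing_artifacts /tmp/tenantpilot-295-fast-feedback-artifacts`.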

## Final Readiness Decision

Current decision: `classified-follow-up-required`

Allowed values:

- `restored-ci-signal`
- `classified-follow-up-required`
- `blocked-by-environment`

## Classification Rules

- Every red group must have exactly one pinned category and one pinned seam.
- Do not use `ci-signal-restored` for a partially red lane.
- Fix a group in 295 only if it is directly tied to CI wrapper, manifest, report, artifact, or budget/trend contract drift.
- Split product/runtime failures into follow-up ownership.
- Do not restore `TenantPanelProvider`, `/admin/t/...`, or retired tenant-scoped fallback routes.
- Do not rewrite completed Specs 293 or 294.
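The first two rules are mechanical enough to lint. The sketch below assumes a hypothetical, simplified tab-separated projection of the classification table, with four columns: group, seam, category, and a `red`/`green` flag. `validate_rows` and that input format are illustrative, not an existing script. The sketch flags four problems:

- rows with the wrong number of columns;
- seams that are not pinned;
- categories that are not pinned;
- red rows marked `ci-signal-restored`.

```bash
#!/usr/bin/env bash
# Sketch: lint "group<TAB>seam<TAB>category<TAB>red|green" rows (hypothetical
# format) against the pinned vocabularies. Exits 1 if any row is invalid.
PINNED_SEAMS="raw-full-suite fast-feedback-lane confidence-lane heavy-governance-lane browser-lane profiling-or-junit-support lane-reporting artifact-publication budget-trend-baseline legacy-cutover-regression-guard provider-verification-regression-guard"
PINNED_CATEGORIES="ci-signal-restored ci-wrapper-or-manifest-regression artifact-publication-regression budget-or-trend-baseline-drift product-runtime-or-test-regression browser-lane-regression flaky-or-environment follow-up-spec-required resolved-or-not-needed"

validate_rows() {
  awk -F '\t' -v seams="$PINNED_SEAMS" -v cats="$PINNED_CATEGORIES" '
    BEGIN {
      n = split(seams, s, " "); for (i = 1; i <= n; i++) seam[s[i]] = 1
      n = split(cats, c, " "); for (i = 1; i <= n; i++) category[c[i]] = 1
    }
    {
      # Exactly four columns means exactly one seam and one category per row.
      if (NF != 4) { print "bad-shape: " $0; bad = 1; next }
      if (!($2 in seam)) { print $1 ": unpinned seam " $2; bad = 1 }
      if (!($3 in category)) { print $1 ": unpinned category " $3; bad = 1 }
      # A red group may not claim a restored CI signal.
      if ($3 == "ci-signal-restored" && $4 == "red") {
        print $1 ": red row marked ci-signal-restored"; bad = 1
      }
    }
    END { exit bad }
  '
}
```

Pipe the projected rows into `validate_rows`. It prints one line per violation and exits non-zero if any row breaks a rule.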
