## Summary

- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards

- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes

- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350
# Failure Classification: Full Suite Failure Classification & CI Lane Baseline

## Purpose
Use this artifact during implementation of Spec 295 to classify the complete platform suite signal after Specs 293 and 294.
This artifact is spec-local workflow truth only. It is not application runtime truth.
## Implementation Scope Lock

- Date: 2026-05-11
- Branch: `295-full-suite-ci-baseline`
- Baseline commit: `eb85b76e` Added Skill for Codex
- Pre-run working tree: the only change present is the active untracked spec directory `specs/295-full-suite-ci-baseline/`; `git diff --stat` is empty.
- Scope confirmation: the following are out of scope unless classification evidence proves a narrow CI/lane contract defect:
  - runtime application code
  - Filament UI
  - routes
  - provider runtime
  - `TenantPanelProvider` behavior
  - `/admin/t/...` behavior
  - completed Spec `293`/`294` artifacts
- Forbidden repair confirmation: product/runtime failures, browser UI behavior failures, and provider/verification runtime failures are only classification and follow-up candidates. The exception is an observed failure directly caused by an existing lane wrapper, manifest, report, artifact, or budget/trend contract.
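The pre-run working-tree condition can be checked mechanically. The following is a minimal sketch that assumes `git status --porcelain` output is piped in on stdin. `worktree_scope_ok` is a hypothetical helper, not part of the repository.

The check is slightly stricter than an empty `git diff --stat`. Porcelain status also reports staged changes and any other untracked files.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `git status --porcelain` output on stdin and
# succeeds only if every reported entry is the untracked Spec 295 directory.
worktree_scope_ok() {
  local allowed="?? specs/295-full-suite-ci-baseline/"
  local line
  # `|| [[ -n "$line" ]]` also handles a final line without a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines.
    [[ -z "$line" ]] && continue
    # Quoted right-hand side: compare literally, so "??" is not treated as a glob.
    if [[ "$line" != "$allowed" ]]; then
      echo "unexpected working-tree entry: $line" >&2
      return 1
    fi
  done
  return 0
}
```

Typical use from the repository root: `git status --porcelain | worktree_scope_ok && echo "scope lock holds"`.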
## Pinned Failure-Classification Categories

| Category | Meaning |
|---|---|
| `ci-signal-restored` | Full suite or lane split is green and usable as a CI signal |
| `ci-wrapper-or-manifest-regression` | Wrapper, composer script, workflow binding, or lane manifest no longer invokes the intended lane |
| `artifact-publication-regression` | Required report/JUnit/budget/profile/trend artifacts are not generated or staged as contracted |
| `budget-or-trend-baseline-drift` | Tests pass or mostly pass, but runtime budget/trend baseline output is stale or no longer interpretable |
| `product-runtime-or-test-regression` | A real app/test behavior failure outside the CI wrapper/report/artifact contract |
| `browser-lane-regression` | Existing browser lane or smoke failure needing browser-specific follow-up, unless it is a CI artifact issue |
| `flaky-or-environment` | Nondeterministic, local container, browser runtime, database, queue, or runner issue |
| `follow-up-spec-required` | Confirmed out-of-scope failure needing a separate spec/lane owner |
| `resolved-or-not-needed` | Initially suspected group that no longer needs work after rerun or adjacent classification |
## Pinned CI / Suite Seams

| Seam | Meaning |
|---|---|
| `raw-full-suite` | Direct `sail artisan test --compact` complete suite signal |
| `fast-feedback-lane` | Existing fast-feedback wrapper and manifest selection |
| `confidence-lane` | Existing confidence wrapper and manifest selection |
| `heavy-governance-lane` | Existing heavy-governance wrapper and manifest selection |
| `browser-lane` | Existing browser wrapper and smoke selection |
| `profiling-or-junit-support` | Support lanes used for profiling or durable machine-readable output |
| `lane-reporting` | `scripts/platform-test-report` and `TestLaneReport` output |
| `artifact-publication` | `scripts/platform-test-artifacts` and lane artifact contracts |
| `budget-trend-baseline` | `TestLaneBudget`, lane thresholds, and trend-history classification |
| `legacy-cutover-regression-guard` | Failures that appear to challenge the retired route/panel baseline from Specs 287 to 293 |
| `provider-verification-regression-guard` | Failures that appear to challenge Spec 294 provider/verification semantics |
## Baseline Commands

Primary command:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)
```

Fallback lane split:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser
```

Report commands:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report browser
```
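The fallback split above repeats the same `PATH` export for each lane and report. The following minimal sketch drives all four lanes and their reports in one pass, assuming it runs from the repository root. `run`, `run_baseline`, and the `DRY_RUN` toggle are hypothetical names, not part of the repository's scripts.

Red lanes are expected at this baseline. A non-zero lane exit is therefore logged, and the loop still renders that lane's report.

```bash
#!/usr/bin/env bash
set -uo pipefail

export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

LANES=(fast-feedback confidence heavy-governance browser)

# With DRY_RUN=1 (the default here) commands are printed instead of executed.
run() {
  if [[ "${DRY_RUN:-1}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_baseline() {
  local lane
  for lane in "${LANES[@]}"; do
    # A red lane exits non-zero; log it and keep going so the report still runs.
    run ./scripts/platform-test-lane "$lane" || echo "lane $lane exited non-zero" >&2
    run ./scripts/platform-test-report "$lane"
  done
}
```

Calling `run_baseline` only prints the eight commands in lane order. `DRY_RUN=0 run_baseline` would execute them.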
## Baseline Run Queue
The implementation must run or explicitly skip these targets and then add classified failure or success rows in the classification table below.
| Run Target | Expected Command | Expected Seam | Current Status |
|---|---|---|---|
| `raw-full-suite` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | red; 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s; output too broad/truncated for complete group ownership, fallback lane split required |
| `fast-feedback-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | red; 82 failed, 1743 passed, 12151 assertions, 164.11s; report wall clock 171.551792s within 200s warning budget |
| `confidence-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | red; 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s; report wall clock 622.531394s over 450s warning budget |
| `heavy-governance-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | red; 21 failed, 319 passed, 2443 assertions, 314.28s; report wall clock 314.828382s within 315s warning budget |
| `browser-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | red; 20 failed, 29 passed, 417 assertions, 285.32s; report wall clock 285.719479s over 150s warning budget |
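The budget notes above compare each report's wall clock against a per-lane warning budget. The following small sketch reproduces that comparison using the figures from the table.

`budget_status` and its two labels are hypothetical. The real classification lives in `TestLaneBudget`, whose thresholds, labels, and tie handling may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: "within-budget" when the wall clock is at or under the
# warning budget, otherwise "over-warning-budget". Counting an exact tie as
# within budget is an assumption.
budget_status() {
  local wall_clock="$1" warning_budget="$2"
  awk -v w="$wall_clock" -v b="$warning_budget" \
    'BEGIN { if (w + 0 <= b + 0) print "within-budget"; else print "over-warning-budget" }'
}

# Wall-clock seconds and warning budgets copied from the Baseline Run Queue.
while read -r lane wall_clock warning_budget; do
  printf '%-17s %s\n' "$lane" "$(budget_status "$wall_clock" "$warning_budget")"
done <<'EOF'
fast-feedback 171.551792 200
confidence 622.531394 450
heavy-governance 314.828382 315
browser 285.719479 150
EOF
```

Run as-is, the sketch reports the following, matching the report rows:

- fast-feedback and heavy-governance: within budget.
- confidence and browser: over their warning budgets.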
## Classification Table
The implementation must append one row per failing group or one ci-signal-restored row for a fully green signal.
| Group | Observed Command | Seam | Category | Observed Failure | Candidate Owner | Fix In 295? | Follow-up | Status |
|---|---|---|---|---|---|---|---|---|
| `raw-full-suite-red-baseline` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | `follow-up-spec-required` | Raw full suite completed but is not a restored CI signal: 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s. Failure output includes unit RBAC/capability assertions, provider boundary/start-gate failures, route generation errors for workspace-aware operation routes, Filament panel URL generation errors, and browser smoke failures. The full output was too broad/truncated to classify every group from raw output alone. | suite/lane ownership classification | no | run the fallback lane split and classify by lane/report artifacts before any repair | classified |
| `fast-feedback-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 82 failed, 1743 passed, 12151 assertions, 164.11s. JUnit and console output show workspace-aware operation route URL generation without a workspace, Filament `hasTenancy()` calls with no panel context, authorization expectation drift, RBAC/UI action assertions, provider boundary/start-gate assertions, and failures on monitoring/required-permissions surfaces. | workspace route and Filament panel-context follow-up; RBAC/authorization follow-up; provider verification follow-up | no | split product/test failures into focused follow-up specs; keep 295 limited to lane/report/artifact contracts | classified |
| `confidence-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s. Failure groups include the same workspace-route and Filament panel-context errors, missing/renamed Filament resource routes, bulk-action test helpers returning null actions, deny-as-not-found expectation drift, and legacy admin URL assumptions. | confidence-lane product/runtime owners by resource area | no | create follow-up ownership slices before any product repair; do not absorb broad application repair into 295 | classified |
| `heavy-governance-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 21 failed, 319 passed, 2443 assertions, 314.28s. Failures are concentrated in canonical operation detail/list tests missing the workspace route parameter, Filament URL generation without panel context, one tenant sync summary-count assertion, and RBAC relation-manager UI enforcement. | operations canonical viewer/list follow-up; tenant sync summary follow-up; RBAC UI follow-up | no | follow-up specs should repair these product/test contracts independently of the 295 lane baseline | classified |
| `browser-lane-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | `browser-lane-regression` | Browser lane completed through the wrapper but is red: 20 failed, 29 passed, 417 assertions, 285.32s. Failures include smoke-login pages not showing Dashboard, workspace-aware operation route URL generation errors, Filament `hasTenancy()` panel-context errors, a tenant dashboard layout assertion, a Spec 279 `/admin/t/...` path expectation, and a tenant membership page copy/action expectation. | browser smoke/product UI follow-up owners | no | split browser repairs separately; do not treat the lane as green or restore retired tenant routes in 295 | classified |
| `legacy-cutover-route-expectations` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` and `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `legacy-cutover-regression-guard` | `follow-up-spec-required` | Browser Spec 279 still expects `/admin/t/spec-279-production`, while the current path is `/admin/workspaces/{workspace}/environments/{environment}`. Confidence output also includes older expectations around `/admin/operations` and admin operation URLs. | tenant cutover regression guard owner | no | create a follow-up only if current cutover truth should change; do not restore `/admin/t/...`, `TenantPanelProvider` behavior, or historical compatibility routes in 295 | classified |
| `provider-verification-regression-guard` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` and the raw full suite | `provider-verification-regression-guard` | `follow-up-spec-required` | Raw and fast-feedback output include provider boundary/status assertions such as an unexpected `provider.capability_registry`, `review_required` vs `blocked`, and provider operation start-gate dispatch count drift. These are provider/runtime semantics, not lane wrapper failures. | Spec 294 provider/verification follow-up owner | no | open a provider verification follow-up if the current runtime semantics are wrong; do not rewrite Spec 294 artifacts under 295 | classified |
| `lane-reporting-all-lanes` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report <lane>` for `fast-feedback`, `confidence`, `heavy-governance`, `browser` | `lane-reporting` | `resolved-or-not-needed` | All four report commands exited 0 and rendered `summary.md`, `report.json`, `budget.json`, `junit.xml`, and `trend-history.json` references. Reports preserved the red lane status and exposed budget/trend metadata instead of crashing. | `TestLaneReport` and report wrapper | no | none for report rendering | classified |
| `budget-trend-baseline-status` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report <lane>` for `fast-feedback`, `confidence`, `heavy-governance`, `browser` | `budget-trend-baseline` | `resolved-or-not-needed` | Fast-feedback reported within-budget at 171.551792s against a 200s budget, and heavy-governance reported within-budget at 314.828382s against 315s. Confidence and browser reported warning-level budget output (622.531394s over 450s and 285.719479s over 150s, respectively). Enforcement was warning-only, with no hard budget-blocking failure. Trend windows were stable, scope-changed, or insufficient-history, as documented in the reports. | `TestLaneBudget`/trend baseline owner | no | investigate confidence/browser runtime separately if desired; no 295 budget relaxation or baseline rewrite was justified | classified |
| `artifact-publication-env-forwarding` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for `confidence`, `heavy-governance`, `browser` | `artifact-publication` | `artifact-publication-regression` | Initial artifact staging failed for every lane with `Unknown test lane []`. The cause: `scripts/platform-test-artifacts` passed the lane and staging inputs to the Sail PHP process via host environment variables, and those variables were empty inside the container. | `scripts/platform-test-artifacts` | yes | fixed by passing lane, staging directory, and artifact directory as PHP argv through Sail and adding a guard test | resolved |
| `artifact-publication-after-fix` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for `confidence`, `heavy-governance`, `browser` | `artifact-publication` | `resolved-or-not-needed` | After the wrapper fix, all four artifact commands exited 0 and staged all five required artifacts. Each reported `complete: true` and `primaryFailureClassId: null`, with no missing required artifacts. | `scripts/platform-test-artifacts` and `TestLaneReport` artifact contract | yes | none for artifact publication | classified |
| `junit-support-output` | not run separately; lane wrappers produced `apps/platform/storage/logs/test-lanes/*-latest.junit.xml` | `profiling-or-junit-support` | `resolved-or-not-needed` | A separate `./scripts/platform-test-lane junit` run was not needed. The fast-feedback, confidence, heavy-governance, and browser wrappers already produced the machine-readable JUnit artifacts used for classification. | `TestLaneManifest` JUnit support | no | none unless a future follow-up needs the dedicated JUnit lane | classified |
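The `artifact-publication-env-forwarding` failure is easiest to see in isolation. The following sketch reproduces it with plain `/bin/sh`. Here `env -i` stands in for the Sail/Docker boundary: it starts the child with an empty environment, just as the host-only variables were not visible inside the container.

A few points are assumptions:

- The variable names `LANE` and `STAGING_DIR` are hypothetical.
- The real script's third input (the artifact directory) is omitted for brevity.
- In the actual fix, the embedded PHP script would presumably read these values from `$argv`.

```bash
#!/usr/bin/env bash
set -euo pipefail

lane="fast-feedback"
staging_dir="/tmp/tenantpilot-295-fast-feedback-artifacts"

# Before: the child reads its inputs from environment variables that never
# cross the boundary (env -i clears them), so both values arrive empty.
via_env() {
  LANE="$lane" STAGING_DIR="$staging_dir" env -i /bin/sh -c \
    'echo "lane=[${LANE:-}] staging=[${STAGING_DIR:-}]"'
}

# After: the same inputs are passed as positional arguments ($1, $2), which the
# child receives regardless of which environment variables are forwarded.
via_argv() {
  env -i /bin/sh -c 'echo "lane=[$1] staging=[$2]"' sh "$lane" "$staging_dir"
}

via_env   # prints: lane=[] staging=[]
via_argv  # prints: lane=[fast-feedback] staging=[/tmp/tenantpilot-295-fast-feedback-artifacts]
```

The empty `lane=[]` mirrors the observed `Unknown test lane []`. Argv travels with the command itself, so the fixed wrapper does not depend on which variables Sail chooses to forward. Pinning that input contract is what the guard test is for.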
## Final Readiness Decision

Current decision: `classified-follow-up-required`

Allowed values:

- `restored-ci-signal`
- `classified-follow-up-required`
- `blocked-by-environment`
## Classification Rules

- Every red group must have exactly one pinned category and one pinned seam.
- Do not use `ci-signal-restored` for a partially red lane.
- Fix in `295` only if the group is directly tied to CI wrapper, manifest, report, artifact, or budget/trend contract drift.
- Split product/runtime failures to follow-up ownership.
- Do not restore `TenantPanelProvider`, `/admin/t/...`, or retired tenant-scoped fallback routes.
- Do not rewrite completed Specs `293` or `294`.
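The first two rules, together with the allowed readiness values, are mechanical enough to check per row. The following is a minimal validator sketch. The pinned sets are copied from the category and seam tables.

Several parts are assumptions:

- `validate_row`, `validate_decision`, and `in_set` are hypothetical names.
- The `green`/`red` status argument is an assumed input, because the table records lane status in prose.
- The "Fix In 295?" rule is not encoded, since it depends on judgment about contract drift.

```bash
#!/usr/bin/env bash
CATEGORIES=" ci-signal-restored ci-wrapper-or-manifest-regression artifact-publication-regression budget-or-trend-baseline-drift product-runtime-or-test-regression browser-lane-regression flaky-or-environment follow-up-spec-required resolved-or-not-needed "
SEAMS=" raw-full-suite fast-feedback-lane confidence-lane heavy-governance-lane browser-lane profiling-or-junit-support lane-reporting artifact-publication budget-trend-baseline legacy-cutover-regression-guard provider-verification-regression-guard "
DECISIONS=" restored-ci-signal classified-follow-up-required blocked-by-environment "

# Succeeds when $1 is exactly one non-empty token found in the space-delimited set $2.
in_set() {
  [[ -n "$1" && "$1" != *[[:space:]]* && "$2" == *" $1 "* ]]
}

# Arguments: seam, category, status ("green" or "red").
validate_row() {
  local seam="$1" category="$2" status="$3"
  if ! in_set "$seam" "$SEAMS"; then
    echo "not exactly one pinned seam: '$seam'" >&2
    return 1
  fi
  if ! in_set "$category" "$CATEGORIES"; then
    echo "not exactly one pinned category: '$category'" >&2
    return 1
  fi
  # ci-signal-restored is reserved for a fully green signal.
  if [[ "$category" == "ci-signal-restored" && "$status" != "green" ]]; then
    echo "ci-signal-restored used for a non-green group" >&2
    return 1
  fi
  return 0
}

# The final readiness decision must be one of the allowed values.
validate_decision() {
  in_set "$1" "$DECISIONS"
}
```

For example, `validate_row browser-lane browser-lane-regression red` passes. `validate_row confidence-lane ci-signal-restored red` fails because that lane is red.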