## Summary

- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards

- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes

- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350

# Failure Classification: Full Suite Failure Classification & CI Lane Baseline

## Purpose

Use this artifact during implementation of Spec `295` to classify the complete platform suite signal after Specs `293` and `294`.

This artifact is spec-local workflow truth only. It is not application runtime truth.

## Implementation Scope Lock

- Date: 2026-05-11
- Branch: `295-full-suite-ci-baseline`
- Baseline commit: `eb85b76e Added Skill for Codex`
- Pre-run working tree: only the active untracked spec directory `specs/295-full-suite-ci-baseline/` is present; `git diff --stat` is empty.
- Scope confirmation: no runtime application code, Filament UI, routes, provider runtime, TenantPanelProvider behavior, `/admin/t/...` behavior, or completed Spec `293` / `294` artifacts are in scope unless a narrow CI/lane contract defect is proven by classification evidence.
- Forbidden repair confirmation: product/runtime failures, browser UI behavior failures, and provider/verification runtime failures are classification and follow-up candidates only unless the observed failure is directly caused by an existing lane wrapper, manifest, report, artifact, or budget/trend contract.

## Pinned Failure-Classification Categories

| Category | Meaning |
|---|---|
| `ci-signal-restored` | Full suite or lane split is green and usable as a CI signal |
| `ci-wrapper-or-manifest-regression` | Wrapper, composer script, workflow binding, or lane manifest no longer invokes the intended lane |
| `artifact-publication-regression` | Required report/JUnit/budget/profile/trend artifacts are not generated or staged as contracted |
| `budget-or-trend-baseline-drift` | Tests pass or mostly pass, but runtime budget/trend baseline output is stale or no longer interpretable |
| `product-runtime-or-test-regression` | A real app/test behavior failure outside the CI wrapper/report/artifact contract |
| `browser-lane-regression` | Existing browser lane or smoke failure needing browser-specific follow-up unless it is a CI artifact issue |
| `flaky-or-environment` | Nondeterministic, local container, browser runtime, database, queue, or runner issue |
| `follow-up-spec-required` | Confirmed out-of-scope failure needing a separate spec/lane owner |
| `resolved-or-not-needed` | Initially suspected group that no longer needs work after rerun or adjacent classification |

## Pinned CI / Suite Seams

| Seam | Meaning |
|---|---|
| `raw-full-suite` | Direct `sail artisan test --compact` complete suite signal |
| `fast-feedback-lane` | Existing fast-feedback wrapper and manifest selection |
| `confidence-lane` | Existing confidence wrapper and manifest selection |
| `heavy-governance-lane` | Existing heavy-governance wrapper and manifest selection |
| `browser-lane` | Existing browser wrapper and smoke selection |
| `profiling-or-junit-support` | Support lanes used for profiling or durable machine-readable output |
| `lane-reporting` | `scripts/platform-test-report` and `TestLaneReport` output |
| `artifact-publication` | `scripts/platform-test-artifacts` and lane artifact contracts |
| `budget-trend-baseline` | `TestLaneBudget`, lane thresholds, and trend-history classification |
| `legacy-cutover-regression-guard` | Failures that appear to challenge the retired route/panel baseline from Specs `287` to `293` |
| `provider-verification-regression-guard` | Failures that appear to challenge Spec `294` provider/verification semantics |

## Baseline Commands

Primary command:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)
```

Fallback lane split:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser
```

Report commands:

```bash
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report confidence
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report heavy-governance
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report browser
```

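
The four lane commands and four report commands above differ only in the lane name, so they can be driven from one loop. The following is a minimal sketch, not part of the repository: `run_all_lanes` and the `LANES` variable are hypothetical names, and the sketch assumes a runner exits non-zero when its lane is red, which this document does not state.

```bash
# Sketch only: run every pinned lane through one runner and report which lanes are red.
# Assumption: the runner exits non-zero when its lane is red.
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

LANES="fast-feedback confidence heavy-governance browser"

# run_all_lanes [RUNNER]
# RUNNER defaults to the lane wrapper; pass ./scripts/platform-test-report
# (or a stub function) to reuse the same loop.
run_all_lanes() {
    runner="${1:-./scripts/platform-test-lane}"
    overall=0
    for lane in $LANES; do
        if "$runner" "$lane"; then
            echo "$lane: green"
        else
            echo "$lane: red"
            overall=1  # remember the failure but keep running the remaining lanes
        fi
    done
    return "$overall"
}
```

Calling `run_all_lanes` with no argument would run the lane wrappers; `run_all_lanes ./scripts/platform-test-report` would run the report commands. Continuing past a red lane is deliberate, since every lane needs its own row in the classification table.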
## Baseline Run Queue

The implementation must run or explicitly skip these targets and then add classified failure or success rows in the classification table below.

| Run Target | Expected Command | Expected Seam | Current Status |
|---|---|---|---|
| `raw-full-suite` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | red; 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s; output too broad/truncated for complete group ownership, fallback lane split required |
| `fast-feedback-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | red; 82 failed, 1743 passed, 12151 assertions, 164.11s; report wall clock 171.551792s within 200s warning budget |
| `confidence-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | red; 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s; report wall clock 622.531394s over 450s warning budget |
| `heavy-governance-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | red; 21 failed, 319 passed, 2443 assertions, 314.28s; report wall clock 314.828382s within 315s warning budget |
| `browser-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | red; 20 failed, 29 passed, 417 assertions, 285.32s; report wall clock 285.719479s over 150s warning budget |

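
The budget verdicts in the table above come down to comparing each report wall clock against the lane's warning budget. The sketch below reproduces that comparison with `awk`. It is illustrative only: `budget_status` is a hypothetical helper, not the `TestLaneBudget` implementation, the `over-warning-budget` label is invented for this example, and treating a wall clock exactly equal to the budget as within budget is an assumption.

```bash
# budget_status WALL_CLOCK_SECONDS WARNING_BUDGET_SECONDS
# Prints a label followed by the absolute difference in seconds.
# Assumption: a wall clock equal to the budget still counts as within budget.
budget_status() {
    awk -v wall="$1" -v budget="$2" 'BEGIN {
        wall += 0; budget += 0            # force numeric comparison
        if (wall <= budget) {
            printf "within-budget %.6f\n", budget - wall
        } else {
            printf "over-warning-budget %.6f\n", wall - budget
        }
    }'
}
```

For example, `budget_status 314.828382 315` lands on the within-budget side with 0.171618 s of headroom, which shows how close heavy-governance sits to its threshold, while `budget_status 622.531394 450` puts confidence 172.531394 s over.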
## Classification Table

The implementation must append one row per failing group or one `ci-signal-restored` row for a fully green signal.

| Group | Observed Command | Seam | Category | Observed Failure | Candidate Owner | Fix In 295? | Follow-up | Status |
|---|---|---|---|---|---|---|---|---|
| `raw-full-suite-red-baseline` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | `follow-up-spec-required` | Raw full suite completed but is not a restored CI signal: 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s. Failure output includes unit RBAC/capability assertions, provider boundary/start gate failures, route generation errors for workspace-aware operation routes, Filament panel URL generation errors, and browser smoke failures; full output was too broad/truncated to classify every group from raw output alone. | suite/lane ownership classification | no | run fallback lane split and classify by lane/report artifacts before any repair | classified |
| `fast-feedback-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 82 failed, 1743 passed, 12151 assertions, 164.11s. JUnit and console output show workspace-aware operation route URL generation without `workspace`, Filament `hasTenancy()` calls with no panel context, authorization expectation drift, RBAC/UI action assertions, provider boundary/start-gate assertions, and monitoring/required-permissions surfaces. | workspace route and Filament panel-context follow-up; RBAC/authorization follow-up; provider verification follow-up | no | split product/test failures into focused follow-up specs; keep 295 limited to lane/report/artifact contracts | classified |
| `confidence-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s. Failure groups include the same workspace-route and Filament panel-context errors, missing/renamed Filament resource routes, bulk-action test helpers returning null actions, deny-as-not-found expectation drift, and legacy admin URL assumptions. | confidence-lane product/runtime owners by resource area | no | create follow-up ownership slices before any product repair; do not absorb broad application repair into 295 | classified |
| `heavy-governance-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 21 failed, 319 passed, 2443 assertions, 314.28s. Failures are concentrated in canonical operation detail/list tests missing the `workspace` route parameter, Filament URL generation without panel context, one tenant sync summary-count assertion, and RBAC relation-manager UI enforcement. | operations canonical viewer/list follow-up; tenant sync summary follow-up; RBAC UI follow-up | no | follow-up specs should repair these product/test contracts independently of the 295 lane baseline | classified |
| `browser-lane-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | `browser-lane-regression` | Browser lane completed through the wrapper but is red: 20 failed, 29 passed, 417 assertions, 285.32s. Failures include smoke-login pages not showing `Dashboard`, workspace-aware operation route URL generation errors, Filament `hasTenancy()` panel-context errors, a tenant dashboard layout assertion, a Spec 279 `/admin/t/...` path expectation, and a tenant membership page copy/action expectation. | browser smoke/product UI follow-up owners | no | split browser repairs separately; do not treat the lane as green or restore retired tenant routes in 295 | classified |
| `legacy-cutover-route-expectations` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` and `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `legacy-cutover-regression-guard` | `follow-up-spec-required` | Browser Spec 279 still expects `/admin/t/spec-279-production`, while the current path is `/admin/workspaces/{workspace}/environments/{environment}`. Confidence output also includes older expectations around `/admin/operations` and admin operation URLs. | tenant cutover regression guard owner | no | create follow-up only if current cutover truth should change; do not restore `/admin/t/...`, TenantPanelProvider behavior, or historical compatibility routes in 295 | classified |
| `provider-verification-regression-guard` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` and raw full suite | `provider-verification-regression-guard` | `follow-up-spec-required` | Raw and fast-feedback output include provider boundary/status assertions such as unexpected `provider.capability_registry`, `review_required` vs `blocked`, and provider operation start-gate dispatch count drift. These are provider/runtime semantics, not lane wrapper failures. | Spec 294 provider/verification follow-up owner | no | open a provider verification follow-up if the current runtime semantics are wrong; do not rewrite Spec 294 artifacts under 295 | classified |
| `lane-reporting-all-lanes` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, `confidence`, `heavy-governance`, `browser` | `lane-reporting` | `resolved-or-not-needed` | All four report commands exited 0 and rendered `summary.md`, `report.json`, `budget.json`, `junit.xml`, and `trend-history.json` references. Reports preserved the red lane status and exposed budget/trend metadata instead of crashing. | TestLaneReport and report wrapper | no | none for report rendering | classified |
| `budget-trend-baseline-status` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, `confidence`, `heavy-governance`, `browser` | `budget-trend-baseline` | `resolved-or-not-needed` | Fast-feedback reported `within-budget` at 171.551792s under 200s. Heavy-governance reported `within-budget` at 314.828382s under 315s. Confidence and browser reported warning-level budget output, respectively 622.531394s over 450s and 285.719479s over 150s, with warning enforcement and no hard budget-blocking failure. Trend windows were either stable, scope-changed, or insufficient-history as documented. | TestLaneBudget/trend baseline owner | no | investigate confidence/browser runtime separately if desired; no 295 budget relaxation or baseline rewrite was justified | classified |
| `artifact-publication-env-forwarding` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, browser | `artifact-publication` | `artifact-publication-regression` | Initial artifact staging failed for every lane with `Unknown test lane []` because `scripts/platform-test-artifacts` passed lane and staging inputs to the Sail PHP process via host environment variables that were empty inside the container. | `scripts/platform-test-artifacts` | yes | fixed by passing lane, staging directory, and artifact directory as PHP argv through Sail and adding a guard test | resolved |
| `artifact-publication-after-fix` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, browser | `artifact-publication` | `resolved-or-not-needed` | After the wrapper fix, all four artifact commands exited 0 and staged all five required artifacts with `complete: true`, `primaryFailureClassId: null`, and no missing required artifacts. | `scripts/platform-test-artifacts` and TestLaneReport artifact contract | yes | none for artifact publication | classified |
| `junit-support-output` | not run separately; lane wrappers produced `apps/platform/storage/logs/test-lanes/*-latest.junit.xml` | `profiling-or-junit-support` | `resolved-or-not-needed` | Separate `./scripts/platform-test-lane junit` was not needed because fast-feedback, confidence, heavy-governance, and browser wrappers already produced machine-readable JUnit artifacts used for classification. | TestLaneManifest JUnit support | no | none unless a future follow-up needs the dedicated JUnit lane | classified |

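
The `artifact-publication-env-forwarding` row describes a general failure mode: a value set in the host environment is empty in a child process that does not inherit that environment, whereas a positional argument crosses the boundary intact. The sketch below simulates this with `env -i`, which starts a child with an empty environment. It is a stand-in for the container boundary only; it does not reproduce `scripts/platform-test-artifacts` or Sail itself, and both function names and the `LANE` variable name are hypothetical.

```bash
# Simulation only: `env -i` stands in for a process boundary that drops host variables.

# Child reads the lane from its environment; the host's LANE never arrives.
read_lane_from_env() {
    env -i /bin/sh -c 'printf "lane=[%s]\n" "${LANE:-}"'
}

# Child reads the lane from argv; $1 inside the child is the value passed here.
read_lane_from_argv() {
    env -i /bin/sh -c 'printf "lane=[%s]\n" "$1"' sh "$1"
}
```

With `export LANE=fast-feedback` on the host, `read_lane_from_env` prints `lane=[]`, the same empty brackets as in `Unknown test lane []`, while `read_lane_from_argv "$LANE"` prints `lane=[fast-feedback]`.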
## Final Readiness Decision

Current decision: `classified-follow-up-required`

Allowed values:

- `restored-ci-signal`
- `classified-follow-up-required`
- `blocked-by-environment`

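
This document lists the allowed decision values but not how one is selected. One plausible reading, shown purely as an assumption, is that an environment blocker takes precedence, a run with no red groups restores the signal, and anything else requires follow-up. `readiness_decision` and its two inputs are hypothetical.

```bash
# readiness_decision ENV_BLOCKED RED_GROUP_COUNT
# ENV_BLOCKED is "yes" or "no"; RED_GROUP_COUNT is a non-negative integer.
# Assumption: this precedence is an interpretation, not a rule stated in the spec.
readiness_decision() {
    env_blocked="$1"
    red_groups="$2"
    if [ "$env_blocked" = "yes" ]; then
        echo "blocked-by-environment"
    elif [ "$red_groups" -eq 0 ]; then
        echo "restored-ci-signal"
    else
        echo "classified-follow-up-required"
    fi
}
```

Under this reading, the current baseline (runs completed, several red groups classified) maps to `classified-follow-up-required`, matching the decision recorded above.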
## Classification Rules

- Every red group must have exactly one pinned category and one pinned seam.
- Do not use `ci-signal-restored` for a partially red lane.
- Fix in `295` only if the group is directly tied to CI wrapper, manifest, report, artifact, or budget/trend contract drift.
- Split product/runtime failures to follow-up ownership.
- Do not restore TenantPanelProvider, `/admin/t/...`, or retired tenant-scoped fallback routes.
- Do not rewrite completed Specs `293` or `294`.

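
The first two rules above are mechanical enough to check before a row is appended. The following sketch validates one row against the pinned category and seam names from this document. `in_list` and `check_row` are hypothetical helpers, and the `red`/`green` lane status argument is an assumption about how a caller would describe the lane.

```bash
# Pinned names copied from the category and seam tables in this document.
PINNED_CATEGORIES="ci-signal-restored ci-wrapper-or-manifest-regression
artifact-publication-regression budget-or-trend-baseline-drift
product-runtime-or-test-regression browser-lane-regression
flaky-or-environment follow-up-spec-required resolved-or-not-needed"

PINNED_SEAMS="raw-full-suite fast-feedback-lane confidence-lane
heavy-governance-lane browser-lane profiling-or-junit-support lane-reporting
artifact-publication budget-trend-baseline legacy-cutover-regression-guard
provider-verification-regression-guard"

# in_list NEEDLE ITEM...: succeeds when NEEDLE equals one of the items.
in_list() {
    needle="$1"
    shift
    for item in "$@"; do
        [ "$item" = "$needle" ] && return 0
    done
    return 1
}

# check_row SEAM CATEGORY LANE_STATUS (LANE_STATUS is "red" or "green")
check_row() {
    # The lists are expanded unquoted on purpose so they split into words.
    in_list "$1" $PINNED_SEAMS || { echo "unpinned seam: $1"; return 1; }
    in_list "$2" $PINNED_CATEGORIES || { echo "unpinned category: $2"; return 1; }
    if [ "$2" = "ci-signal-restored" ] && [ "$3" != "green" ]; then
        echo "ci-signal-restored requires a fully green lane"
        return 1
    fi
    echo "ok"
}
```

For instance, `check_row browser-lane browser-lane-regression red` passes, while `check_row browser-lane ci-signal-restored red` is rejected. The remaining rules concern scope and ownership and still need human judgment.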