TenantAtlas/specs/295-full-suite-ci-baseline/failure-classification.md
ahmido f03555eae1 Spec 295: full suite CI lane baseline (#350)
## Summary
- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards
- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes
- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350
2026-05-11 11:14:56 +00:00


Failure Classification: Full Suite Failure Classification & CI Lane Baseline

Purpose

Use this artifact during implementation of Spec 295 to classify the complete platform suite signal after Specs 293 and 294.

This artifact is spec-local workflow truth only. It is not application runtime truth.

Implementation Scope Lock

  • Date: 2026-05-11
  • Branch: 295-full-suite-ci-baseline
  • Baseline commit: eb85b76e Added Skill for Codex
  • Pre-run working tree: only the active untracked spec directory specs/295-full-suite-ci-baseline/ is present; git diff --stat is empty.
  • Scope confirmation: no runtime application code, Filament UI, routes, provider runtime, TenantPanelProvider behavior, /admin/t/... behavior, or completed Spec 293 / 294 artifacts are in scope unless a narrow CI/lane contract defect is proven by classification evidence.
  • Forbidden repair confirmation: product/runtime failures, browser UI behavior failures, and provider/verification runtime failures are classification and follow-up candidates only unless the observed failure is directly caused by an existing lane wrapper, manifest, report, artifact, or budget/trend contract.

Pinned Failure-Classification Categories

| Category | Meaning |
| --- | --- |
| `ci-signal-restored` | Full suite or lane split is green and usable as a CI signal |
| `ci-wrapper-or-manifest-regression` | Wrapper, composer script, workflow binding, or lane manifest no longer invokes the intended lane |
| `artifact-publication-regression` | Required report/JUnit/budget/profile/trend artifacts are not generated or staged as contracted |
| `budget-or-trend-baseline-drift` | Tests pass or mostly pass, but runtime budget/trend baseline output is stale or no longer interpretable |
| `product-runtime-or-test-regression` | A real app/test behavior failure outside the CI wrapper/report/artifact contract |
| `browser-lane-regression` | Existing browser lane or smoke failure needing browser-specific follow-up unless it is a CI artifact issue |
| `flaky-or-environment` | Nondeterministic, local container, browser runtime, database, queue, or runner issue |
| `follow-up-spec-required` | Confirmed out-of-scope failure needing a separate spec/lane owner |
| `resolved-or-not-needed` | Initially suspected group that no longer needs work after rerun or adjacent classification |

Pinned CI / Suite Seams

| Seam | Meaning |
| --- | --- |
| `raw-full-suite` | Direct `sail artisan test --compact` complete suite signal |
| `fast-feedback-lane` | Existing fast-feedback wrapper and manifest selection |
| `confidence-lane` | Existing confidence wrapper and manifest selection |
| `heavy-governance-lane` | Existing heavy-governance wrapper and manifest selection |
| `browser-lane` | Existing browser wrapper and smoke selection |
| `profiling-or-junit-support` | Support lanes used for profiling or durable machine-readable output |
| `lane-reporting` | `scripts/platform-test-report` and `TestLaneReport` output |
| `artifact-publication` | `scripts/platform-test-artifacts` and lane artifact contracts |
| `budget-trend-baseline` | `TestLaneBudget`, lane thresholds, and trend-history classification |
| `legacy-cutover-regression-guard` | Failures that appear to challenge the retired route/panel baseline from Specs 287 to 293 |
| `provider-verification-regression-guard` | Failures that appear to challenge Spec 294 provider/verification semantics |

Baseline Commands

Primary command:

    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)

Fallback lane split:

    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser

Report commands:

    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report confidence
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report heavy-governance
    export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report browser
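
The fallback lane split and the report commands above can be driven from a single loop. The following is a minimal dry-run sketch, assuming it is started from the repository root. `run_or_print`, `run_lane_split`, and the `DRY_RUN` variable are hypothetical names introduced here, and running each report directly after its lane is an assumption about ordering rather than something this artifact prescribes.

```shell
#!/bin/sh
# Dry-run sketch: by default this only prints the commands listed above.
# Set DRY_RUN=0 to execute them from the repository root.
export PATH="/bin:/usr/bin:/usr/local/bin:$PATH"

LANES="fast-feedback confidence heavy-governance browser"

# Print the command in dry-run mode, otherwise execute it.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run_lane_split() {
  for lane in $LANES; do
    # A red lane must not stop the loop: its report is still needed
    # for classification, so the exit status is deliberately ignored.
    run_or_print ./scripts/platform-test-lane "$lane" || true
    run_or_print ./scripts/platform-test-report "$lane" || true
  done
}

run_lane_split
```

Ignoring the lane exit status is a deliberate choice for a classification run, where every lane is expected to be red; a CI job that gates merges would want the opposite behavior.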

Baseline Run Queue

The implementation must run or explicitly skip each of these targets, then record a classified failure or success row for it in the Classification Table below.

| Run Target | Expected Command | Expected Seam | Current Status |
| --- | --- | --- | --- |
| `raw-full-suite` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | red; 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s; output too broad/truncated for complete group ownership, fallback lane split required |
| `fast-feedback-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | red; 82 failed, 1743 passed, 12151 assertions, 164.11s; report wall clock 171.551792s within 200s warning budget |
| `confidence-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | red; 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s; report wall clock 622.531394s over 450s warning budget |
| `heavy-governance-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | red; 21 failed, 319 passed, 2443 assertions, 314.28s; report wall clock 314.828382s within 315s warning budget |
| `browser-lane` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | red; 20 failed, 29 passed, 417 assertions, 285.32s; report wall clock 285.719479s over 150s warning budget |
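
The budget wording in the status column comes down to comparing the report wall clock against the lane's warning budget. The sketch below reproduces that comparison for the four figures recorded above. It is not the `TestLaneBudget` implementation: `budget_status` is a hypothetical helper, the `over-warning-budget` label is invented for illustration, and treating a wall clock exactly equal to the budget as within budget is an assumption.

```shell
#!/bin/sh
# budget_status WALL_CLOCK_SECONDS WARNING_BUDGET_SECONDS
# Prints "within-budget" or "over-warning-budget" (hypothetical label).
# awk is used because the wall clock values are fractional seconds.
budget_status() {
  awk -v wall="$1" -v budget="$2" 'BEGIN {
    if (wall + 0 <= budget + 0) print "within-budget"
    else print "over-warning-budget"
  }'
}

budget_status 171.551792 200   # fast-feedback
budget_status 622.531394 450   # confidence
budget_status 314.828382 315   # heavy-governance
budget_status 285.719479 150   # browser
```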

Classification Table

The implementation must append one row per failing group or one ci-signal-restored row for a fully green signal.

| Group | Observed Command | Seam | Category | Observed Failure | Candidate Owner | Fix In 295? | Follow-up | Status |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `raw-full-suite-red-baseline` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)` | `raw-full-suite` | `follow-up-spec-required` | Raw full suite completed but is not a restored CI signal: 450 failed, 8 skipped, 4194 passed, 28831 assertions, 4686.08s. Failure output includes unit RBAC/capability assertions, provider boundary/start gate failures, route generation errors for workspace-aware operation routes, Filament panel URL generation errors, and browser smoke failures; full output was too broad/truncated to classify every group from raw output alone. | suite/lane ownership classification | no | run fallback lane split and classify by lane/report artifacts before any repair | classified |
| `fast-feedback-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` | `fast-feedback-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 82 failed, 1743 passed, 12151 assertions, 164.11s. JUnit and console output show workspace-aware operation route URL generation without workspace, Filament `hasTenancy()` calls with no panel context, authorization expectation drift, RBAC/UI action assertions, provider boundary/start-gate assertions, and monitoring/required-permissions surfaces. | workspace route and Filament panel-context follow-up; RBAC/authorization follow-up; provider verification follow-up | no | split product/test failures into focused follow-up specs; keep 295 limited to lane/report/artifact contracts | classified |
| `confidence-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `confidence-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 409 failed, 8 skipped, 3853 passed, 25994 assertions, 605.10s. Failure groups include the same workspace-route and Filament panel-context errors, missing/renamed Filament resource routes, bulk-action test helpers returning null actions, deny-as-not-found expectation drift, and legacy admin URL assumptions. | confidence-lane product/runtime owners by resource area | no | create follow-up ownership slices before any product repair; do not absorb broad application repair into 295 | classified |
| `heavy-governance-lane-product-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance` | `heavy-governance-lane` | `product-runtime-or-test-regression` | Lane completed through the wrapper but is red: 21 failed, 319 passed, 2443 assertions, 314.28s. Failures are concentrated in canonical operation detail/list tests missing the workspace route parameter, Filament URL generation without panel context, one tenant sync summary-count assertion, and RBAC relation-manager UI enforcement. | operations canonical viewer/list follow-up; tenant sync summary follow-up; RBAC UI follow-up | no | follow-up specs should repair these product/test contracts independently of the 295 lane baseline | classified |
| `browser-lane-red` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` | `browser-lane` | `browser-lane-regression` | Browser lane completed through the wrapper but is red: 20 failed, 29 passed, 417 assertions, 285.32s. Failures include smoke-login pages not showing Dashboard, workspace-aware operation route URL generation errors, Filament `hasTenancy()` panel-context errors, a tenant dashboard layout assertion, a Spec 279 `/admin/t/...` path expectation, and a tenant membership page copy/action expectation. | browser smoke/product UI follow-up owners | no | split browser repairs separately; do not treat the lane as green or restore retired tenant routes in 295 | classified |
| `legacy-cutover-route-expectations` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser` and `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence` | `legacy-cutover-regression-guard` | `follow-up-spec-required` | Browser Spec 279 still expects `/admin/t/spec-279-production`, while the current path is `/admin/workspaces/{workspace}/environments/{environment}`. Confidence output also includes older expectations around `/admin/operations` and admin operation URLs. | tenant cutover regression guard owner | no | create follow-up only if current cutover truth should change; do not restore `/admin/t/...`, `TenantPanelProvider` behavior, or historical compatibility routes in 295 | classified |
| `provider-verification-regression-guard` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback` and raw full suite | `provider-verification-regression-guard` | `follow-up-spec-required` | Raw and fast-feedback output include provider boundary/status assertions such as unexpected `provider.capability_registry`, `review_required` vs `blocked`, and provider operation start-gate dispatch count drift. These are provider/runtime semantics, not lane wrapper failures. | Spec 294 provider/verification follow-up owner | no | open a provider verification follow-up if the current runtime semantics are wrong; do not rewrite Spec 294 artifacts under 295 | classified |
| `lane-reporting-all-lanes` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, confidence, heavy-governance, browser | `lane-reporting` | `resolved-or-not-needed` | All four report commands exited 0 and rendered `summary.md`, `report.json`, `budget.json`, `junit.xml`, and `trend-history.json` references. Reports preserved the red lane status and exposed budget/trend metadata instead of crashing. | `TestLaneReport` and report wrapper | no | none for report rendering | classified |
| `budget-trend-baseline-status` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback`, confidence, heavy-governance, browser | `budget-trend-baseline` | `resolved-or-not-needed` | Fast-feedback reported within-budget at 171.551792s under 200s. Heavy-governance reported within-budget at 314.828382s under 315s. Confidence and browser reported warning-level budget output, respectively 622.531394s over 450s and 285.719479s over 150s, with warning enforcement and no hard budget-blocking failure. Trend windows were either stable, scope-changed, or insufficient-history as documented. | `TestLaneBudget`/trend baseline owner | no | investigate confidence/browser runtime separately if desired; no 295 budget relaxation or baseline rewrite was justified | classified |
| `artifact-publication-env-forwarding` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, browser | `artifact-publication` | `artifact-publication-regression` | Initial artifact staging failed for every lane with `Unknown test lane []` because `scripts/platform-test-artifacts` passed lane and staging inputs to the Sail PHP process via host environment variables that were empty inside the container. | `scripts/platform-test-artifacts` | yes | fixed by passing lane, staging directory, and artifact directory as PHP argv through Sail and adding a guard test | resolved |
| `artifact-publication-after-fix` | `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-artifacts fast-feedback /tmp/tenantpilot-295-fast-feedback-artifacts` and matching commands for confidence, heavy-governance, browser | `artifact-publication` | `resolved-or-not-needed` | After the wrapper fix, all four artifact commands exited 0 and staged all five required artifacts with `complete: true`, `primaryFailureClassId: null`, and no missing required artifacts. | `scripts/platform-test-artifacts` and `TestLaneReport` artifact contract | yes | none for artifact publication | classified |
| `junit-support-output` | not run separately; lane wrappers produced `apps/platform/storage/logs/test-lanes/*-latest.junit.xml` | `profiling-or-junit-support` | `resolved-or-not-needed` | Separate `./scripts/platform-test-lane junit` was not needed because fast-feedback, confidence, heavy-governance, and browser wrappers already produced machine-readable JUnit artifacts used for classification. | `TestLaneManifest` JUnit support | no | none unless a future follow-up needs the dedicated JUnit lane | classified |
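
The `artifact-publication-env-forwarding` row describes a general failure mode: a value set in the host environment is not visible to a child process that starts with a different environment, whereas a positional argument always crosses that boundary. The sketch below illustrates this under loud assumptions. It does not reproduce `scripts/platform-test-artifacts`, whose contents are not shown here; `env -i` merely stands in for the container boundary, a shell one-liner stands in for the embedded PHP script, and `via_env` and `via_argv` are hypothetical names.

```shell
#!/bin/sh
# Stand-ins for the embedded script: one reads the lane from the
# environment, the other reads it from its first positional argument.
READ_FROM_ENV='echo "lane=[${LANE:-}]"'
READ_FROM_ARGV='echo "lane=[${1:-}]"'

# The caller sets LANE, but `env -i` starts the child with an empty
# environment, so the child sees no lane at all.
via_env() {
  LANE="$1" env -i /bin/sh -c "$READ_FROM_ENV"
}

# The lane travels as an argument, so it survives the cleared environment.
# "lane-script" fills $0; the lane becomes $1 inside the child.
via_argv() {
  env -i /bin/sh -c "$READ_FROM_ARGV" lane-script "$1"
}

via_env fast-feedback    # prints: lane=[]
via_argv fast-feedback   # prints: lane=[fast-feedback]
```

The empty brackets in the first call mirror the shape of the `Unknown test lane []` message: the consumer ran, but its input never arrived.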

Final Readiness Decision

Current decision: classified-follow-up-required

Allowed values:

  • restored-ci-signal
  • classified-follow-up-required
  • blocked-by-environment

Classification Rules

  • Every red group must have exactly one pinned category and one pinned seam.
  • Do not use ci-signal-restored for a partially red lane.
  • Fix in 295 only if the group is directly tied to CI wrapper, manifest, report, artifact, or budget/trend contract drift.
  • Split product/runtime failures to follow-up ownership.
  • Do not restore TenantPanelProvider, /admin/t/..., or retired tenant-scoped fallback routes.
  • Do not rewrite completed Specs 293 or 294.
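
The first two rules above are mechanical enough to check before a row is appended. The following is a minimal sketch of such a check, assuming a row is reduced to its category and the failed-test count of its lane. `is_pinned_category` and `check_row` are hypothetical helpers, not part of the repository; the category names are copied from the pinned list in this artifact.

```shell
#!/bin/sh
# Pinned categories, copied from this artifact (backslash-newline inside
# double quotes joins the lines into one space-separated string).
PINNED_CATEGORIES="ci-signal-restored \
ci-wrapper-or-manifest-regression \
artifact-publication-regression \
budget-or-trend-baseline-drift \
product-runtime-or-test-regression \
browser-lane-regression \
flaky-or-environment \
follow-up-spec-required \
resolved-or-not-needed"

# Succeeds only if $1 is non-empty and matches one pinned category exactly.
is_pinned_category() {
  [ -n "$1" ] || return 1
  case " $PINNED_CATEGORIES " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# check_row CATEGORY FAILED_COUNT
check_row() {
  if ! is_pinned_category "$1"; then
    echo "unpinned category: $1"
    return 1
  fi
  # ci-signal-restored is reserved for a fully green signal.
  if [ "$1" = "ci-signal-restored" ] && [ "$2" -gt 0 ]; then
    echo "ci-signal-restored is not allowed for a red lane"
    return 1
  fi
  echo "ok"
}

check_row product-runtime-or-test-regression 82
check_row ci-signal-restored 20 || true
```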