ahmido f03555eae1 Spec 295: full suite CI lane baseline (#350 )

## Summary
- add the Spec 295 artifacts for full-suite failure classification and CI lane baseline work
- fix `scripts/platform-test-artifacts` so Sail passes artifact staging inputs into the embedded PHP script via argv
- add a guard test covering the artifact staging input contract

## Scope guards
- no browser screenshot baselines included
- no generated test artifacts included
- no runtime application code changes included

## Notes
- classification evidence and follow-up ownership are documented in `specs/295-full-suite-ci-baseline/failure-classification.md`
- this PR is intentionally limited to the CI/lane/artifact contract slice for Spec 295

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #350

2026-05-11 11:14:56 +00:00

32 KiB

Raw Permalink Blame History

Feature Specification: Full Suite Failure Classification & CI Lane Baseline

Feature Branch: 295-full-suite-ci-baseline
Created: 2026-05-11
Status: Ready
Input: User description: "Spec 295 - Full Suite Failure Classification & CI Lane Baseline. After Specs 293 and 294, run a full-suite classification to determine whether the full platform suite is again a reliable CI signal or whether remaining failures must be classified into separate follow-up specs or lanes. Do not blindly fix the full suite, do not scope-creep, do not reopen tenant cutover, do not restore legacy /admin/t/... or TenantPanelProvider behavior, and perform only small clearly in-scope fixes."

Spec Candidate Check (mandatory - SPEC-GATE-001)

Problem: Specs 293 and 294 closed the known post-cutover route/action-surface and ProviderConnections/Verification failure blocks, but the complete platform suite has not yet been classified as a restored CI signal. Maintainers need one bounded pass that distinguishes green signal, CI wrapper or lane baseline failures, remaining product regressions, flaky or environment failures, and follow-up-spec debt.
Today's failure: targeted lanes can be green while the raw full suite or CI lane wrappers may still fail for unrelated product debt, wrapper/report/artifact drift, budget baseline changes, browser-specific fallout, or environment-only failures. Without classification, future work cannot tell whether a red run means "fix this PR", "rerun because infrastructure failed", "update lane baseline", or "open a follow-up spec".
User-visible improvement: maintainers get an attributable CI readiness decision: either the complete platform suite is a reliable blocking signal again, or every remaining red group is explicitly assigned to the right lane, owner, and follow-up path without reviving retired tenant routes or reopening Specs 293 and 294.
Smallest enterprise-capable version: one classification-first package that runs the raw full suite or its explicit fallback lane split, records every failing group in failure-classification.md, validates existing lane wrappers/report/artifact contracts, applies only small CI-signal fixes when the failure is clearly in scope, and records all product/runtime failures as follow-up candidates instead of absorbing them.
Explicit non-goals: no broad full-suite repair, no tenant-cutover rework, no TenantPanelProvider reactivation, no /admin/t/... route restoration, no provider/verification runtime expansion beyond Spec 294, no new CI framework, no new permanent test lane by default, no new browser family, no new runtime persistence, no UI redesign, no product feature work, no unrelated failing-test cleanup, and no historical-spec rewrites.
Permanent complexity imported: one spec-local failure-classification.md artifact, one bounded failure-category inventory, one bounded CI/lane seam inventory, and focused tasks against existing test lane scripts, lane manifest/report support, and current Pest lane commands. No runtime table, model, enum, provider abstraction, Filament resource, or product surface is introduced.
Why now: after 293 and 294, the next quality question is no longer one known red cluster. It is whether CI can be trusted again as a whole. If this is not classified now, later specs will either over-trust a partially red suite or keep rediscovering unrelated failures as local surprises.
Why not local: the signal spans raw Pest execution, scripts/platform-test-lane, scripts/platform-test-report, scripts/platform-test-artifacts, Tests\Support\TestLaneManifest, Tests\Support\TestLaneReport, browser isolation, heavy-governance budget/reporting, and current workflow profiles. A one-file patch would not prove CI readiness.
Approval class: Cleanup
Red flags triggered: full-suite scope, cross-cutting test governance, and possible temptation to repair unrelated product failures. Defense: this spec is classification-first, uses existing lane/failure-class contracts, imports only a spec-local artifact, and forbids broad repair or legacy route restoration.
Score: Nutzen: 2 | Dringlichkeit: 2 | Scope: 2 | Komplexitaet: 1 | Produktnaehe: 1 | Wiederverwendung: 2 | Gesamt: 10/12
Decision: approve

Review Outcome

Outcome class: acceptable-special-case
Workflow outcome: keep
Test-governance outcome: keep
Reason: full-suite work is normally too broad, but this package is justified because it is a classification and CI-signal baseline pass after two completed stabilization slices, not a fix-all implementation.
Workflow result: Ready for implementation as one bounded suite-signal classification package after Specs 293 and 294.

Candidate Selection Gate

Selected candidate: Full Suite Failure Classification & CI Lane Baseline
Source location: explicit user-provided manual follow-up after specs/293-post-cutover-suite-stabilization/ and specs/294-provider-verification-runtime-semantics/
Why selected now: the known cutover and provider/verification red blocks have been stabilized, so the remaining decision is whether the full platform suite and lane wrappers now form a trustworthy CI signal.
Why close alternatives were deferred:
- reopening Spec 293 would blur route/action-surface cutover cleanup with full-suite CI readiness
- reopening Spec 294 would blur provider/verification runtime semantics with unrelated suite failures
- starting Package Execution, Guided Operations, Microsoft Starter Pack, or Virtual Consultant would hide CI uncertainty under new product work
- creating a new permanent full-suite lane would import CI framework complexity before proving the existing lanes are insufficient
- fixing every failing test in one pass would scope-creep beyond classification and make follow-up ownership unclear
Roadmap relationship: test-governance and platform quality follow-through under TEST-GOV-001; this is not a new product roadmap lane and not an automatic active queue promotion.
Completed-spec guardrail result: Specs 293 and 294 are context only and are excluded from refresh. Spec 294 carries implementation close-out evidence. Spec 293 is treated as the completed post-cutover baseline described by the user and its failure-classification history is preserved; this spec does not rewrite 293 tasks or close-out history. Specs 287 and 288 remain prior cutover and no-legacy guard context only.
Smallest viable implementation slice: run the full suite or explicit lane split, classify every remaining failure group, validate CI wrapper/report/artifact contracts, and perform only small CI-signal fixes that do not change product behavior.
Proposed concise feature description to feed into specify: Classify the full platform test suite after Specs 293 and 294 and establish whether existing CI lanes provide a trustworthy baseline, while splitting unrelated failures into explicit follow-up ownership instead of repairing the suite blindly.

Pinned CI / Suite Seams

raw-full-suite
fast-feedback-lane
confidence-lane
heavy-governance-lane
browser-lane
profiling-or-junit-support
lane-reporting
artifact-publication
budget-trend-baseline
legacy-cutover-regression-guard
provider-verification-regression-guard

Spec Scope Fields (mandatory)

Scope: repository / CI test-governance workflow
Primary Routes: N/A - no application routes or operator-facing navigation are added or restored. Retired /admin/t/... routes and TenantPanelProvider behavior remain forbidden.
Data Ownership:
- no new application persistence is introduced
- no runtime source of truth is introduced
- failure-classification.md is a spec-local implementation artifact and is not product/runtime truth
- existing test lane truth remains in apps/platform/tests/Support/TestLaneManifest.php, apps/platform/tests/Support/TestLaneReport.php, and the wrapper scripts under scripts/
RBAC:
- no authorization model changes are introduced
- existing workspace and managed-environment isolation tests remain ordinary suite participants
- if a failing group concerns RBAC, it must be classified as product/runtime debt or a follow-up spec unless it is clearly only a stale CI/lane assertion

For canonical-view specs, the spec MUST define:

Default filter behavior when tenant-context is active: N/A - no canonical-view application surface is added or changed.
Explicit entitlement checks preventing cross-tenant leakage: N/A for this prep package. Any suite failure suggesting leakage must be classified as product-runtime debt and not hidden as a lane issue.

Cross-Cutting / Shared Pattern Reuse (mandatory when the feature touches notifications, status messaging, action links, header actions, dashboard signals/cards, alerts, navigation entry points, evidence/report viewers, or any other existing shared operator interaction family; otherwise write `N/A - no shared interaction family touched`)

Cross-cutting feature?: yes
Interaction class(es): CI lane execution, full-suite signal classification, lane report generation, artifact publication, budget/trend baseline review, and follow-up-spec routing
Systems touched:
- scripts/platform-test-lane
- scripts/platform-test-report
- scripts/platform-test-artifacts
- apps/platform/composer.json
- apps/platform/tests/Support/TestLaneManifest.php
- apps/platform/tests/Support/TestLaneReport.php
- apps/platform/tests/Support/TestLaneBudget.php
- apps/platform/tests/Feature/Guards/TestLaneManifestTest.php
- apps/platform/tests/Feature/Guards/CiLaneFailureClassificationContractTest.php
- apps/platform/tests/Feature/Guards/CiFastFeedbackWorkflowContractTest.php
- apps/platform/tests/Feature/Guards/CiConfidenceWorkflowContractTest.php
- apps/platform/tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php
- existing lane-selected Pest tests and browser smoke files only as classification inputs unless a small CI-signal fix is proven
Existing pattern(s) to extend: existing TestLaneManifest lane definitions, existing TestLaneReport failure classes, existing lane wrapper scripts, existing Gitea workflow profile metadata, existing report/artifact publication contracts
Shared contract / presenter / builder / renderer to reuse: TestLaneManifest::lanes(), TestLaneManifest::workflowProfiles(), TestLaneManifest::failureClasses(), TestLaneReport::classifyPrimaryFailure(), TestLaneReport::buildCiSummary(), TestLaneReport::artifactPublicationStatus(), and scripts/platform-test-*
Why the existing shared path is sufficient or insufficient: the repo already has explicit lane, failure-class, artifact, and budget contracts. Spec 295 must prove whether they are currently enough and fix only small contract drift; it must not create a new CI orchestration layer before existing contracts are classified.
Allowed deviation and why: only a bounded CI/lane contract correction is allowed when a wrapper, manifest, report, artifact, or budget baseline defect prevents classification. Product/runtime failures must be classified and split instead of fixed here.
Consistency impact: raw suite output, lane wrapper output, report artifacts, budget/trend summaries, and final follow-up classification must tell the same story about whether the suite is green, blocked, flaky, or split.
Review focus: reviewers must verify that this spec does not become a general failing-test cleanup, does not restore tenant-cutover legacy behavior, and does not add a new permanent lane unless the artifacts explicitly prove existing lanes are insufficient.

OperationRun UX Impact (mandatory when the feature creates, queues, deduplicates, resumes, blocks, completes, or deep-links to an `OperationRun`; otherwise write `N/A - no OperationRun start or link semantics touched`)

Touches OperationRun start/completion/link UX?: no
Shared OperationRun UX contract/layer reused: N/A
Delegated start/completion UX behaviors: N/A
Local surface-owned behavior that remains: N/A
Queued DB-notification policy: N/A
Terminal notification path: N/A
Exception required?: none

Provider Boundary / Platform Core Check (mandatory when the feature changes shared provider/platform seams, identity scope, governed-subject taxonomy, compare strategy selection, provider connection descriptors, or operator vocabulary that may leak provider-specific semantics into platform-core truth; otherwise write `N/A - no shared provider/platform boundary touched`)

Shared provider/platform boundary touched?: no product boundary change
Boundary classification: N/A
Seams affected: provider and verification tests may fail during classification, but this spec may only classify them as regression or follow-up debt unless the failure is purely a CI/lane contract issue.
Neutral platform terms preserved or introduced: workspace, managed environment, provider connection, operation, lane, failure group, CI signal
Provider-specific semantics retained and why: N/A
Why this does not deepen provider coupling accidentally: Spec 295 does not change provider runtime, provider identity, target-scope semantics, or provider copy. It treats provider-specific failures as test/runtime debt requiring explicit follow-up unless they are already covered by the completed Spec 294 seam and proven to be a small regression in the CI contract.
Follow-up path: any real provider/verification product failure after Spec 294 must become a follow-up spec or explicitly named failure group, not hidden in this classification pass.

UI / Surface Guardrail Impact (mandatory when operator-facing surfaces are changed; otherwise write `N/A`)

N/A - no operator-facing surface change. Browser tests may be run as a lane signal only; visible UI repair is out of scope unless a later implementation explicitly stops and opens a follow-up spec.

Decision-First Surface Role (mandatory when operator-facing surfaces are changed)

N/A - no application decision surface is added or changed.

Audience-Aware Disclosure (mandatory when operator-facing surfaces are changed)

N/A - no application disclosure layer is added or changed.

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

N/A - no Filament screen, table, widget, relation manager, or resource is added or materially refactored.

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

N/A - no operator-facing page contract is introduced.

Proportionality Review (mandatory when structural complexity is introduced)

New source of truth?: no runtime source of truth
New persisted entity/table/artifact?: no application persistence; one spec-local failure-classification.md artifact is added for implementation tracking only
New abstraction?: no
New enum/state/reason family?: yes, one spec-local failure-classification category set used only inside this spec package
New cross-domain UI framework/taxonomy?: no
Current operator problem: maintainers need one reliable answer to whether the full suite is a usable CI signal after Specs 293 and 294, and if not, exactly which lane or follow-up owns the remaining failures.
Existing structure is insufficient because: targeted green lanes do not prove full-suite readiness, while raw red output without classification does not tell maintainers whether to fix, split, rerun, or update lane baseline artifacts.
Narrowest correct implementation: add one spec-local failure-classification artifact, use existing lane wrappers and support classes, classify all remaining groups, and fix only small CI-signal defects that block classification.
Ownership cost: low to moderate; maintain one temporary classification artifact and any small lane contract correction made during implementation.
Alternative intentionally rejected: a new full-suite framework, broad test rewrite, or permanent new lane. Those options import durable complexity before the existing lane system is proven insufficient.
Release truth: current-release CI/test-governance readiness only

Compatibility posture

This feature assumes a pre-production environment.

Backward compatibility, legacy aliases, route shims, TenantPanelProvider restoration, and compatibility-specific tests are out of scope. Canonical replacement remains preferred over preservation.

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

Test purpose / classification: Heavy-Governance, Feature, Browser, Support/JUnit, and full-suite classification
Validation lane(s): raw full suite, fast-feedback, confidence, heavy-governance, browser, profiling/support when needed, junit/report/artifact publication when needed
Why this classification and these lanes are sufficient: the goal is not one feature behavior. The proving purpose is whether the complete platform suite and existing CI lanes produce a trustworthy pass/fail signal after the known stabilization work.
New or expanded test families: none by default. Any new test must be limited to a small CI/lane contract guard if a wrapper/report/artifact regression is proven.
Fixture / helper cost impact: no new expensive fixture defaults are allowed. If fixture drift appears in the full suite, classify it by failing family and split to follow-up unless a one-line lane/guard baseline is the direct cause.
Heavy-family visibility / justification: explicit. Heavy-governance and browser lanes are signal inputs, not automatic repair ownership.
Special surface test profile: global-context-shell, standard-native-filament, shared-detail-family, browser-smoke, surface-guard, discovery-heavy
Standard-native relief or required special coverage: no UI coverage expansion; browser lane reruns are used only to classify the existing smoke baseline.
Reviewer handoff: reviewers must confirm that Livewire remains v4.0+, Filament remains v5, provider registration stays in apps/platform/bootstrap/providers.php, globally searchable resources are not changed, destructive actions are not changed, no assets are registered, every remaining failure is classified, and any in-scope fix is tied directly to a CI/lane contract defect.
Budget / baseline / trend impact: the classification may update the documented status of budget or trend baseline drift, but it must not silently relax lane budgets or create a new baseline without an explicit row in failure-classification.md.
Escalation needed: document-in-feature for contained lane baseline findings; follow-up-spec for product/runtime failures, fixture-family debt, new heavy cost centers, browser fallout, or any repair that exceeds CI/lane contract correction.
Active feature PR close-out entry: FullSuiteClassification
Planned validation commands:
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && git status --short --branch
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && git diff --stat
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail artisan test --compact)
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane fast-feedback
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane confidence
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane heavy-governance
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane browser
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report fast-feedback
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report confidence
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report heavy-governance
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-report browser
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && ./scripts/platform-test-lane junit
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && (cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent)

User Scenarios & Testing (mandatory)

User Story 1 - Classify the Full Suite Before Any Repair (Priority: P1)

As a maintainer, I want the complete platform suite run or explicit fallback lane split classified before any fixes so the project knows whether CI is green, blocked, flaky, or split into follow-up work.

Why this priority: without classification first, Spec 295 would become an uncontrolled full-suite repair pass.

Independent Test: Run the raw full suite or fallback lane split and prove every failing group has exactly one category, one seam, one owner/follow-up decision, and one status row in failure-classification.md.

Acceptance Scenarios:

Given the repo after Specs 293 and 294, When the raw full suite passes, Then failure-classification.md records ci-signal-restored with the command, date, and pass counts.
Given the raw full suite fails, When the failure groups are reviewed, Then each group is classified before any repair is attempted.
Given a failing group points at /admin/t/..., TenantPanelProvider, or legacy tenant route behavior, When it is classified, Then the remedy must not restore that behavior and must be split or fixed only through current workspace-first truth.

User Story 2 - Validate CI Lane and Artifact Signal (Priority: P1)

As a maintainer, I want each existing CI lane wrapper, report, artifact, and failure class to produce a trustworthy signal so Gitea CI failures can be interpreted without reading raw terminal output first.

Why this priority: a green or red Pest run is not enough if wrapper, report, artifact, budget, or failure-class summaries are stale.

Independent Test: Run the existing lane wrappers and report commands, then verify each lane either passes with complete artifacts or fails with the correct primary failure class.

Acceptance Scenarios:

Given a lane fails because tests fail, When its report summary is generated, Then the primary failure class is test-failure rather than wrapper, artifact, or infrastructure failure.
Given a lane wrapper or manifest no longer resolves to the intended lane, When the lane is classified, Then it is marked ci-wrapper-or-manifest-regression and may be fixed in 295.
Given required report artifacts are missing after a lane run, When publication is checked, Then it is classified as artifact-publication-regression and may be fixed in 295.

User Story 3 - Split Product Failures Instead of Absorbing Them (Priority: P1)

As a maintainer, I want remaining product/runtime failures to become explicit follow-up ownership instead of being silently fixed under a CI-baseline spec.

Why this priority: this protects scope discipline and keeps test-governance decisions attributable.

Independent Test: Review every non-CI failure group and prove it either has a targeted follow-up recommendation or is demonstrably flaky/environmental.

Acceptance Scenarios:

Given a failing group requires a runtime product fix, When classification finishes, Then it is marked follow-up-spec-required or product-runtime-or-test-regression and not repaired under 295 unless the user explicitly starts that implementation scope later.
Given a failing group belongs to browser-only behavior, When classification finishes, Then it is marked browser-lane-regression with the existing smoke file and follow-up path.
Given a failing group disappears on rerun or is environment-specific, When classification finishes, Then it is marked flaky-or-environment with rerun evidence instead of treated as restored CI.

User Story 4 - Publish the Final CI Readiness Decision (Priority: P2)

As a maintainer, I want a final readiness statement that says whether the full suite can be used as a CI baseline now, and what exact follow-up remains if it cannot.

Why this priority: the output must be actionable for future specs and Gitea workflows, not just a local debugging note.

Independent Test: Inspect failure-classification.md, lane report outputs, and final validation commands to confirm there are no unclassified failure groups and no hidden scope expansion.

Acceptance Scenarios:

Given all raw suite and lane signals pass, When close-out is prepared, Then the readiness decision is restored-ci-signal.
Given any group remains red, When close-out is prepared, Then the readiness decision is classified-follow-up-required and each group has an owner/follow-up.
Given a small CI/lane contract fix was applied, When final validation runs, Then the directly affected lane/report/artifact guard passes and unrelated failures remain classified rather than hidden.

Edge Cases

The raw full suite times out or produces output too large to classify directly.
A lane passes tests but fails report or artifact publication.
A lane fails only because budget/trend baselines drifted, not because tests failed.
Browser lane failures expose stale screenshots or environment-specific browser state.
A failure appears to touch Spec 293 or 294 seams but would require reopening retired legacy behavior.
A failure disappears on rerun, suggesting flaky or environment-only behavior.
A small lane manifest fix changes which tests run in a lane, which could accidentally widen CI cost.

Requirements (mandatory)

Constitution alignment (required): This spec introduces no Microsoft Graph calls, no write/change behavior, no long-running application work, and no new OperationRun. It must preserve workspace/tenant isolation expectations while classifying test failures. Any failure suggesting isolation, RBAC, or audit regressions must be classified as product/runtime debt and not hidden as a CI wrapper issue.

Constitution alignment (PROP-001 / ABSTR-001 / PERSIST-001 / STATE-001 / BLOAT-001): The only structural addition is one spec-local failure-classification vocabulary and artifact. It solves the current CI readiness problem after two stabilization specs; no runtime persistence, CI framework, test engine, or new lane abstraction is introduced.

Constitution alignment (TEST-GOV-001): Spec 295 must explicitly classify the proving purpose of every lane run, preserve the existing lane family boundaries, keep expensive fixture/context setup opt-in, and end with one review outcome: keep, split, document-in-feature, follow-up-spec, or reject-or-split.

Functional Requirements

FR-295-001: The implementation MUST run the raw full suite once when feasible using cd apps/platform && ./vendor/bin/sail artisan test --compact.
FR-295-002: If the raw full suite is too slow, noisy, or environment-blocked to classify reliably, the implementation MUST run the explicit fallback lane split: fast-feedback, confidence, heavy-governance, and browser.
FR-295-003: Every failing group MUST be recorded in failure-classification.md with exactly one pinned category, one pinned seam, observed command, candidate owner, fix-in-295 decision, follow-up decision, and status.
FR-295-004: Lane wrapper, report, artifact, budget, and failure-class problems MAY be fixed in 295 only when the failure is clearly isolated to scripts/platform-test-lane, scripts/platform-test-report, scripts/platform-test-artifacts, TestLaneManifest, TestLaneReport, TestLaneBudget, or their guard tests.
FR-295-005: Product/runtime failures MUST NOT be repaired under 295 unless they are also a small, proven CI/lane contract defect; otherwise they must be assigned to a follow-up spec or classified as unrelated existing debt.
FR-295-006: Any failure related to Specs 293 or 294 MUST be classified without rewriting those completed specs or restoring legacy behavior.
FR-295-007: The implementation MUST NOT restore TenantPanelProvider, /admin/t/..., tenant-scoped provider fallback routes, or other retired cutover behavior.
FR-295-008: The implementation MUST validate existing lane failure classes: test-failure, wrapper-failure, budget-breach, artifact-publication-failure, and infrastructure-failure.
FR-295-009: The implementation MUST produce a final CI readiness decision in failure-classification.md: restored-ci-signal, classified-follow-up-required, or blocked-by-environment.
FR-295-010: Any new or changed tests MUST be limited to CI/lane contract proof and must use Pest.

Non-Functional Requirements

NFR-295-001: No new runtime persistence, queue, model, service abstraction, provider registry, Filament resource, or browser family is introduced.
NFR-295-002: Test lane classification must follow actual proving purpose, not file location.
NFR-295-003: Existing lane budget and trend baselines must not be relaxed silently.
NFR-295-004: Classification output must be concise enough for future implementers to route work without re-running the entire suite first.
NFR-295-005: The final package must preserve Filament v5 / Livewire v4 compatibility and must not change panel provider registration.

Key Entities (include if feature involves data)

Failure Group: one failing test file, failing assertion cluster, wrapper error, artifact error, budget breach, or environment failure sharing one cause and one owner.
CI Lane Signal: the pass/fail/report/artifact/budget outcome for one lane in TestLaneManifest.
Classification Decision: the spec-local row assigning one category, seam, owner, fix-in-295 decision, and follow-up path.
Readiness Decision: the final status of the full suite and lane baseline after classification.

Success Criteria (mandatory)

SC-295-001: failure-classification.md exists and contains the pinned category and seam definitions.
SC-295-002: Raw full suite output or fallback lane split output is represented by classified groups with no unclassified red group remaining.
SC-295-003: Existing lane wrappers and report/artifact contracts either pass or have a classified failure class and fix/follow-up decision.
SC-295-004: No implementation step restores TenantPanelProvider, /admin/t/..., or retired tenant-scoped fallback behavior.
SC-295-005: The final readiness decision is explicit and actionable: restored-ci-signal, classified-follow-up-required, or blocked-by-environment.
SC-295-006: If a product/runtime failure remains, the classification identifies a separate follow-up owner instead of treating the full suite as green.

Assumptions

Specs 293 and 294 have completed the targeted stabilization work described by the user and are context only.
The repo's existing Gitea-compatible lane system remains the preferred CI shape.
Local implementation will use Sail-first commands unless a non-Docker fallback is explicitly needed.
Full-suite execution may be expensive; lane split is an allowed fallback only when the raw full suite is not classifiable.

Risks

Full-suite output may be too large or slow to classify directly.
Environment-specific Sail/browser failures may obscure real suite status.
A tempting product fix may be small locally but still outside this CI-baseline scope.
Budget/trend drift may be real but not appropriate to fix by silently raising thresholds.
Multiple failing groups may share a fixture root cause and need careful grouping to avoid duplicate follow-up specs.

Open Questions

None blocking preparation. During implementation, actual failing groups determine whether follow-up specs are needed.

32 KiB Raw Permalink Blame History

Feature Specification: Full Suite Failure Classification & CI Lane Baseline

Spec Candidate Check (mandatory - SPEC-GATE-001)

Review Outcome

Candidate Selection Gate

Pinned Failure-Classification Categories

Pinned CI / Suite Seams

Spec Scope Fields (mandatory)

OperationRun UX Impact (mandatory when the feature creates, queues, deduplicates, resumes, blocks, completes, or deep-links to an OperationRun; otherwise write N/A - no OperationRun start or link semantics touched)

UI / Surface Guardrail Impact (mandatory when operator-facing surfaces are changed; otherwise write N/A)

Decision-First Surface Role (mandatory when operator-facing surfaces are changed)

Audience-Aware Disclosure (mandatory when operator-facing surfaces are changed)

UI/UX Surface Classification (mandatory when operator-facing surfaces are changed)

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

Proportionality Review (mandatory when structural complexity is introduced)

Compatibility posture

Testing / Lane / Runtime Impact (mandatory for runtime behavior changes)

User Scenarios & Testing (mandatory)

User Story 1 - Classify the Full Suite Before Any Repair (Priority: P1)

User Story 2 - Validate CI Lane and Artifact Signal (Priority: P1)

User Story 3 - Split Product Failures Instead of Absorbing Them (Priority: P1)

User Story 4 - Publish the Final CI Readiness Decision (Priority: P2)

Edge Cases

Requirements (mandatory)

Functional Requirements

Non-Functional Requirements

Key Entities (include if feature involves data)

Success Criteria (mandatory)

Assumptions

Risks

Open Questions

32 KiB

Raw Permalink Blame History

OperationRun UX Impact (mandatory when the feature creates, queues, deduplicates, resumes, blocks, completes, or deep-links to an `OperationRun`; otherwise write `N/A - no OperationRun start or link semantics touched`)

UI / Surface Guardrail Impact (mandatory when operator-facing surfaces are changed; otherwise write `N/A`)