TenantAtlas/specs/210-ci-matrix-budget-enforcement/research.md
ahmido bf38ec1780
Spec 210: implement CI test matrix budget enforcement (#243)
## Summary
- add explicit Gitea workflow files for PR Fast Feedback, `dev` Confidence, Heavy Governance, and Browser lanes
- extend the repo-truth lane support seams with workflow profiles, trigger-aware budget enforcement, artifact publication contracts, CI summaries, and failure classification
- add deterministic artifact staging, new CI governance guard coverage, and Spec 210 planning/contracts/docs updates

## Validation
- `cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent`
- `cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Guards/CiFastFeedbackWorkflowContractTest.php tests/Feature/Guards/CiConfidenceWorkflowContractTest.php tests/Feature/Guards/CiHeavyBrowserWorkflowContractTest.php tests/Feature/Guards/CiLaneFailureClassificationContractTest.php tests/Feature/Guards/FastFeedbackLaneContractTest.php tests/Feature/Guards/ConfidenceLaneContractTest.php tests/Feature/Guards/HeavyGovernanceLaneContractTest.php tests/Feature/Guards/BrowserLaneIsolationTest.php tests/Feature/Guards/FixtureLaneImpactBudgetTest.php tests/Feature/Guards/TestLaneManifestTest.php tests/Feature/Guards/TestLaneArtifactsContractTest.php tests/Feature/Guards/TestLaneCommandContractTest.php`
- `./scripts/platform-test-lane fast-feedback`
- `./scripts/platform-test-lane confidence`
- `./scripts/platform-test-lane heavy-governance`
- `./scripts/platform-test-lane browser`
- `./scripts/platform-test-report fast-feedback`
- `./scripts/platform-test-report confidence`

## Notes
- scheduled Heavy Governance and Browser workflows stay gated behind `TENANTATLAS_ENABLE_HEAVY_GOVERNANCE_SCHEDULE=1` and `TENANTATLAS_ENABLE_BROWSER_SCHEDULE=1`
- the remaining rollout evidence task is capturing the live Gitea run set this PR enables: PR Fast Feedback, `dev` Confidence, manual and scheduled Heavy Governance, and manual and scheduled Browser runs

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #243
2026-04-17 18:04:35 +00:00


Research: CI Test Matrix & Runtime Budget Enforcement

Decision 1: Use explicit workflow files per trigger class

  • Decision: Implement four explicit Gitea workflow files under .gitea/workflows/: one for pull request Fast Feedback, one for dev push Confidence, one for Heavy Governance, and one for Browser.
  • Rationale: The repo currently has no checked-in CI workflows, and Gitea Actions ignores or limits several GitHub workflow features that would otherwise encourage a single dynamic matrix file (concurrency, continue-on-error, timeout-minutes, complex multi-label runs-on, problem matchers, and annotation-heavy UX). Separate workflows keep trigger policy, lane ownership, and failure semantics legible.
  • Alternatives considered:
    • A single GitHub-style matrix workflow with conditional jobs: rejected because it requires more workflow indirection and relies on features that Gitea supports only weakly.
    • A reusable workflow plus caller stubs: rejected because the repo does not yet need that abstraction and the initial rollout benefits from explicit files.

Decision 2: Treat dev as the mainline confidence branch

  • Decision: Use push to dev as the mainline Confidence trigger and keep pull request validation limited to Fast Feedback by default.
  • Rationale: The repository process already uses dev as the integration branch. Gitea pull request refs point to refs/pull/:number/head, not a merge-preview ref, so a pull request run validates only the branch head. The pull request path should therefore stay fast and deterministic, while dev push Confidence validates the integrated state.
  • Alternatives considered:
    • Run Confidence on every pull request: rejected because it would widen the highest-frequency path and duplicate the broader validation already expected on integration.
    • Run Confidence on every branch push: rejected because it would over-trigger broad validation on work-in-progress branches and dilute the meaning of mainline evidence.
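
The trigger policy from Decisions 1 and 2 can be sketched as a small routing function. This is an illustrative Python sketch only: `lane_for_event` is a hypothetical name, the real policy lives in the workflow files' trigger blocks, and the event names assume the usual `pull_request` and `push` Actions events.

```python
from typing import Optional


def lane_for_event(event: str, ref: str = "") -> Optional[str]:
    """Return the lane a CI event should run, or None when nothing should run.

    Hypothetical helper: it mirrors the trigger policy described above and is
    not read from TestLaneManifest.
    """
    if event == "pull_request":
        # Pull requests stay on the fast, deterministic path.
        return "fast-feedback"
    if event == "push" and ref == "refs/heads/dev":
        # Only pushes to the integration branch produce mainline evidence.
        return "confidence"
    # Pushes to any other branch do not trigger broad validation.
    return None
```

Under this sketch a push to a work-in-progress branch maps to no lane at all, which is the behavior the second rejected alternative would have changed.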

Decision 3: Keep repo wrappers and TestLaneManifest as the execution source of truth

  • Decision: Make scripts/platform-test-lane and scripts/platform-test-report the only CI execution entry points and extend TestLaneManifest, TestLaneBudget, and TestLaneReport for CI policy metadata.
  • Rationale: The repo already owns lane membership, command refs, budgets, report building, and artifact paths in checked-in support code. CI should consume that truth rather than rebuilding lane selection or budget logic in YAML.
  • Alternatives considered:
    • Inline sail composer run ... commands in workflow files: rejected because it would duplicate repo truth and invite lane drift.
    • A second CI-only manifest file: rejected because it would create two governance sources for the same lane model.

Decision 4: Publish JUnit from the existing lane run, not a second CI rerun

  • Decision: Use the JUnit XML already emitted by the governing lane run, especially for Confidence, instead of adding a second dedicated junit CI execution by default.
  • Rationale: Fast Feedback, Confidence, Browser, and Heavy Governance already write JUnit XML alongside summary and budget artifacts. Re-running the same non-browser scope only for machine-readable output would inflate CI cost without increasing trust.
  • Alternatives considered:
    • Always run the dedicated junit support lane in CI: rejected because it duplicates Confidence-shaped scope and adds avoidable runtime.
    • Publish terminal output only: rejected because Spec 210 requires machine-readable artifacts as part of the CI contract.
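
As a rough illustration of consuming the JUnit XML that a lane run already wrote, the Python sketch below totals test counts from an existing file's contents instead of re-running anything. The function name `junit_totals` is hypothetical, and the sketch assumes the common JUnit layout in which each top-level `<testsuite>` carries aggregate `tests`, `failures`, and `errors` attributes.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def junit_totals(xml_text: str) -> Dict[str, int]:
    """Sum tests, failures, and errors across the top-level suites."""
    root = ET.fromstring(xml_text)
    # The root may be <testsuites> wrapping suites, or a single <testsuite>.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    totals = {"tests": 0, "failures": 0, "errors": 0}
    for suite in suites:
        # Only direct children are read, so nested suites whose counts are
        # already aggregated into their parent are not counted twice.
        for key in totals:
            totals[key] += int(suite.get(key, 0))
    return totals
```

A CI summary step could call this on the lane's existing JUnit output and publish the totals without a second test execution.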

Decision 5: Stage per-lane artifacts into a CI export directory before upload

  • Decision: Add one narrow repo-root helper, scripts/platform-test-artifacts, to copy lane-local *-latest.* outputs into a deterministic per-run export directory with stable upload names.
  • Rationale: The local artifact contract intentionally uses per-lane *-latest.* files under apps/platform/storage/logs/test-lanes. CI upload needs a per-run directory and naming scheme that remains comparable and lane-specific without changing the local contract or duplicating file-copy logic in every workflow.
  • Alternatives considered:
    • Upload the raw *-latest.* files directly: rejected because upload naming would stay ambiguous and harder to compare across runs.
    • Persist artifacts into a database table: rejected because the feature is repository CI governance, not product persistence.
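
The staging step might look roughly like the following Python sketch. It is not the actual scripts/platform-test-artifacts helper: the function name, the `<lane>-latest.<suffix>` file naming, the `<lane>-<run_id>` directory layout, and the renamed outputs are all assumptions made for illustration.

```python
import shutil
from pathlib import Path
from typing import List


def stage_lane_artifacts(source_dir: Path, export_dir: Path,
                         lane: str, run_id: str) -> List[Path]:
    """Copy one lane's *-latest.* files into a per-run export directory."""
    # Assumed layout: export_dir/<lane>-<run_id>/<lane>.<suffix>
    target = export_dir / f"{lane}-{run_id}"
    target.mkdir(parents=True, exist_ok=True)
    prefix = f"{lane}-latest."
    staged = []
    # Sorting keeps the staging order deterministic across runs.
    for src in sorted(source_dir.glob(f"{prefix}*")):
        # Drop the "-latest" marker so upload names are stable per lane.
        dest = target / f"{lane}.{src.name[len(prefix):]}"
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

Because the sources are copied rather than moved, the local *-latest.* contract is left untouched, and only files belonging to the requested lane are exported.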

Decision 6: Make budget enforcement trigger-aware and tolerance-aware

  • Decision: Add trigger-aware budget enforcement profiles that derive from the existing lane budgets or heavy-governance contract and apply a documented CI variance allowance before classifying outcomes as hard-fail, soft-warn, or trend-only.
  • Rationale: All existing budgets are warn-only today and were measured under local or pre-CI conditions. Fast Feedback can become blocking only if CI runner noise is accounted for explicitly; Confidence, Heavy Governance, and Browser need softer rollout semantics until their CI baselines are proven stable.
  • Alternatives considered:
    • Hard-fail every budget from day one: rejected because it would overreact to runner variance and immature heavy-lane baselines.
    • Keep all budget results as warnings forever: rejected because it would fail to institutionalize the new governance model.
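
A minimal Python sketch of tolerance-aware classification follows. The `BudgetProfile` fields, the `within-budget` label, and the numeric values in the usage note are hypothetical; the real profiles are meant to derive from TestLaneBudget, and the source does not state the size of the variance allowance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetProfile:
    threshold_seconds: float   # lane budget taken from the existing contract
    variance_allowance: float  # fractional CI tolerance, e.g. 0.15 (assumed)
    enforcement: str           # "hard-fail", "soft-warn", or "trend-only"


def classify_budget(elapsed_seconds: float, profile: BudgetProfile) -> str:
    """Classify a lane runtime against its budget plus the CI allowance."""
    # The allowance widens the threshold so runner noise alone cannot fail a lane.
    effective = profile.threshold_seconds * (1 + profile.variance_allowance)
    if elapsed_seconds <= effective:
        return "within-budget"
    # Over budget: the outcome depends on how strict this trigger's profile is.
    return profile.enforcement
```

For example, with an assumed 200-second Fast Feedback budget and a 15% allowance, a 220-second run stays within budget while a 231-second run is classified as `hard-fail`; the same overrun on a `soft-warn` profile would only warn.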

Decision 7: Encode failure classes in repo-produced artifacts, not UI annotations

  • Decision: Emit a single primary failure classification per non-success run through repo-produced summary and JSON artifacts, distinguishing test failure, wrapper or manifest failure, budget breach, artifact publication failure, and infrastructure failure.
  • Rationale: Gitea ignores GitHub problem matchers and annotation-centric UX, so the failure contract must be legible through uploaded artifacts and ordinary job logs. This also keeps the classification logic versioned in the repo.
  • Alternatives considered:
    • Depend on GitHub-style annotations: rejected because Gitea ignores them.
    • Treat every non-success as a generic failed run: rejected because Spec 210 requires failure classes to remain distinguishable.
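
Choosing a single primary class when several problems occur in one run could be sketched as below. The class identifiers and, in particular, the precedence order are assumptions: the decision lists the classes but does not say which one wins when more than one applies.

```python
from typing import Iterable, Optional

# Assumed precedence, most fundamental cause first. The ordering is
# illustrative and is not taken from Spec 210.
PRECEDENCE = [
    "infrastructure-failure",
    "wrapper-failure",
    "test-failure",
    "artifact-publication-failure",
    "budget-breach",
]


def primary_failure_class(observed: Iterable[str]) -> Optional[str]:
    """Return the highest-precedence failure class that was observed.

    Returns None for a successful run, or when only unknown classes are given.
    """
    seen = set(observed)
    for failure_class in PRECEDENCE:
        if failure_class in seen:
            return failure_class
    return None
```

The returned value could then be written into the summary and JSON artifacts, so the classification is readable from uploaded files and ordinary job logs rather than from UI annotations.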

Decision 8: Keep profiling out of the initial CI matrix and schedule heavy lanes separately

  • Decision: Leave profiling as a manual or follow-up support lane, while Heavy Governance and Browser get their own manual plus scheduled workflows rather than sharing PR or mainline triggers.
  • Rationale: Profiling exists to explain drift, not to gate ordinary contributor flow. Heavy Governance and Browser are important but intentionally expensive, so they should be visible and reproducible in CI without silently widening the fast path.
  • Alternatives considered:
    • Add profiling to the first CI rollout: rejected because it adds extra cost without being part of the core acceptance path for Spec 210.
    • Fold Heavy Governance or Browser into PR or dev workflows: rejected because it would reintroduce the very cost drift that Specs 206 through 209 separated.
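
The manual-plus-scheduled policy for the heavy lanes can be illustrated with a small gate. The two environment variable names come from the commit notes; everything else in this Python sketch (the function name, the lane keys, and the use of the `workflow_dispatch` and `schedule` event names) is an assumption about how the workflows express the gate.

```python
import os
from typing import Mapping, Optional

# Opt-in flags for scheduled runs, as named in the commit notes.
SCHEDULE_FLAGS = {
    "heavy-governance": "TENANTATLAS_ENABLE_HEAVY_GOVERNANCE_SCHEDULE",
    "browser": "TENANTATLAS_ENABLE_BROWSER_SCHEDULE",
}


def heavy_lane_should_run(lane: str, event: str,
                          env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether a heavy lane runs for a manual or scheduled event."""
    env = os.environ if env is None else env
    if lane not in SCHEDULE_FLAGS:
        return False  # not a heavy lane; handled by PR or dev workflows
    if event == "workflow_dispatch":
        return True   # manual runs are always allowed
    if event == "schedule":
        # Scheduled runs stay off until the flag is explicitly set to "1".
        return env.get(SCHEDULE_FLAGS[lane]) == "1"
    return False      # never piggyback on pull request or push triggers
```

Keeping the schedule behind an explicit flag lets the workflow files ship before the heavy lanes have stable CI baselines.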