# Research: Test Suite Governance & Performance Foundation

## Decision 1: Use Sail-wrapped `artisan test` as the canonical lane runner

- Decision: Checked-in execution paths should be expressed as Sail-compatible repository commands that ultimately run through Laravel's `artisan test` entry point.
- Rationale: Repo rules already require Sail-first execution for PHP and Artisan commands, and the current stack exposes parallel and profiling support through Laravel's test runner. Reusing the same runner keeps output, environment boot, and contributor ergonomics aligned with existing conventions.
- Alternatives considered:
  - Raw `vendor/bin/pest` as the primary interface: rejected because it would bypass the repo's standard Sail/Artisan workflow and fragment contributor guidance.
  - Repo-root JavaScript scripts: rejected because PHP test orchestration in this repo currently lives in `apps/platform`, not in the root package manager flow.
  - CI-only wrappers: rejected because the spec is explicitly about local defaults and shared authoring discipline, not just future CI.

## Decision 2: Build lane membership from a hybrid of suites, directories, files, and curated groups

- Decision: The lane model should combine existing PHPUnit suites (`Unit`, `Feature`, `Browser`, `Pgsql`), directory selectors, explicit file selectors, and curated Pest groups instead of forcing a one-time folder reorganization.
- Rationale: The repository already has clear suite directories, natural browser separation, and domain-heavy subfolders, but existing Pest grouping is sparse. A hybrid manifest lets the repo separate lanes immediately while leaving broad classification cleanup for follow-up slices.
- Alternatives considered:
  - Full directory reorganization before lane rollout: rejected because it adds high churn and delays the governance layer the spec is meant to establish.
  - Group-only lane control: rejected because only a small part of the suite currently uses groups, so the immediate payoff would be too limited.
  - Suite-only lane control: rejected because `Feature` is too large and heterogeneous to stand in for both fast-feedback and confidence lanes.
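The hybrid model can be sketched as a small lane manifest. Each selector can match on suite, directory prefix, explicit file, or Pest group, and a lane is its include selector minus its exclude selector. Python is used here only as neutral notation. The lane names, the `tests/Feature/Architecture` directory, the `smoke` group, and the `LaneSelector`/`Lane`/`lanes_for` names are all hypothetical and are not the repository's actual manifest format.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class LaneSelector:
    """One selector mixing the four membership sources; every field is optional."""
    suites: frozenset[str] = frozenset()   # PHPUnit suite names, e.g. "Browser"
    directories: tuple[str, ...] = ()      # path prefixes, matched by whole path segments
    files: frozenset[str] = frozenset()    # explicit test file paths
    groups: frozenset[str] = frozenset()   # curated Pest group names

    def matches(self, path: str, suite: str, groups: frozenset[str]) -> bool:
        test_path = PurePosixPath(path)
        return (
            suite in self.suites
            or any(test_path.is_relative_to(d) for d in self.directories)
            or path in self.files
            or not self.groups.isdisjoint(groups)
        )


@dataclass(frozen=True)
class Lane:
    name: str
    include: LaneSelector
    exclude: LaneSelector = field(default_factory=LaneSelector)


def lanes_for(path: str, suite: str, groups: frozenset[str], lanes: list[Lane]) -> list[str]:
    """Return every lane whose include selector matches and whose exclude selector does not."""
    return [
        lane.name
        for lane in lanes
        if lane.include.matches(path, suite, groups)
        and not lane.exclude.matches(path, suite, groups)
    ]


# Illustrative manifest only: names and selectors are hypothetical.
HEAVY = LaneSelector(
    suites=frozenset({"Browser"}),
    directories=("tests/Feature/Architecture",),
    groups=frozenset({"smoke"}),
)
LANES = [
    Lane("fast-feedback", include=LaneSelector(suites=frozenset({"Unit", "Feature"})), exclude=HEAVY),
    Lane("heavy", include=HEAVY),
]

print(lanes_for("tests/Feature/Architecture/LayersTest.php", "Feature", frozenset(), LANES))  # ['heavy']
```

Exclusion is expressed as a selector rather than as a moved file. A family can therefore be pulled out of fast feedback without reorganizing directories, which is the point of the hybrid approach.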

## Decision 3: Keep fast-feedback and confidence lanes parallel, but require serial profiling

- Decision: Fast-feedback and confidence lanes should default to parallel execution, while the profiling lane must remain serial.
- Rationale: Laravel's test runner supports `--parallel`, which directly addresses wall-clock time for the most common contributor flows. At the same time, the local Pest runtime explicitly rejects `--profile` when `--parallel` is present, so profiling must be its own serial command path.
- Alternatives considered:
  - Keep all lanes serial for simplicity: rejected because it would ignore an officially supported performance lever and preserve the current feedback bottleneck.
  - Run profiling in parallel anyway: rejected because the local Pest runtime forbids the combination.
  - Make every lane parallel by default, including browser: rejected for the first slice because browser cost and environment contention deserve their own governance boundary.
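The serial-profiling rule can be enforced at the point where lane commands are assembled, so a misconfigured lane fails before anything runs. The sketch below assumes lanes are invoked as `scripts/platform-sail artisan test ...` and that the wrapper forwards its arguments unchanged. That invocation shape and the `LaneRun`/`build_command` names are assumptions for illustration.

```python
from dataclasses import dataclass

# Assumption: the wrapper forwards its arguments to Sail's artisan entry point.
WRAPPER = "scripts/platform-sail"


@dataclass(frozen=True)
class LaneRun:
    name: str
    parallel: bool
    profile: bool = False
    extra_args: tuple[str, ...] = ()


def build_command(run: LaneRun) -> list[str]:
    """Build the argv for one lane, refusing the parallel + profile combination."""
    if run.parallel and run.profile:
        # Mirrors the Pest restriction: --profile is rejected alongside --parallel.
        raise ValueError(f"lane {run.name!r}: --profile must run without --parallel")
    argv = [WRAPPER, "artisan", "test"]
    if run.parallel:
        argv.append("--parallel")
    if run.profile:
        argv.append("--profile")
    argv.extend(run.extra_args)
    return argv


print(build_command(LaneRun("fast-feedback", parallel=True)))
print(build_command(LaneRun("profiling", parallel=False, profile=True)))
```

The check lives in the builder rather than in documentation. A lane definition that combines both flags therefore surfaces immediately, instead of as a confusing runner error partway through a session.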

## Decision 4: Keep `RefreshDatabase` as the first-slice reset default and optimize fixture cost first

- Decision: The first slice should leave the global `RefreshDatabase` bootstrap in place for `tests/Feature` and `tests/Browser`, while helper and factory defaults are slimmed before any attempt at suite-wide reset-strategy changes.
- Rationale: `tests/Pest.php` currently applies `RefreshDatabase` across the broadest and most expensive suites, and Laravel documents heavier full-reset strategies as slower than `RefreshDatabase`. The more urgent current cost driver is unnecessary fixture work layered on top of the reset strategy.
- Alternatives considered:
  - Replace `RefreshDatabase` with heavier full-reset traits such as `DatabaseMigrations` or `DatabaseTruncation`: rejected because Laravel documents those as significantly slower and they would worsen the near-term problem.
  - Remove the global reset default immediately: rejected because the suite is too large and would likely break in noisy, low-signal ways before cheap defaults are fixed.
  - Ignore DB strategy entirely: rejected because the spec explicitly requires written guidance for DB-backed tests, seeds, and later schema-baseline evaluation.

## Decision 5: Split the dominant shared tenant-user helper into minimal and explicit heavy profiles

- Decision: `createUserWithTenant()` should gain a truly cheap default profile and an explicitly named heavy or provider-enabled path, instead of continuing to provision provider connection state by default.
- Rationale: The helper is referenced by roughly 607 test files. Unless told otherwise, it creates a user, tenant, workspace, workspace membership, session context, and tenant membership link, clears capability caches, and provisions a default Microsoft provider connection. That makes it the highest-leverage fixture seam in the current suite.
- Alternatives considered:
  - Keep the current helper and rely on optional boolean flags: rejected because the expensive behavior remains the default and the cost stays visually hidden.
  - Inline setup in every test: rejected because that would duplicate fixture logic and reduce consistency.
  - Introduce a generic fixture framework first: rejected because PROP-001 and ABSTR-001 favor the smallest targeted change that fixes the current cost hotspot.
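The real helper is PHP, so the following Python only models the shape of the split. A named profile selects a provisioning plan, which makes the expensive provider path visible at the call site instead of hiding it behind a boolean default. The step names paraphrase the rationale for this decision. The profile names, and the assumption that only the provider connection leaves the minimal profile, are hypothetical.

```python
from enum import Enum


class TenantUserProfile(Enum):
    MINIMAL = "minimal"
    PROVIDER_ENABLED = "provider-enabled"


# Assumption: everything except the provider connection stays in the cheap default.
MINIMAL_STEPS = (
    "create_user",
    "create_tenant",
    "create_workspace",
    "create_workspace_membership",
    "set_session_context",
    "link_tenant_membership",
    "clear_capability_caches",
)


def provisioning_plan(profile: TenantUserProfile = TenantUserProfile.MINIMAL) -> list[str]:
    """Return the ordered setup steps a given helper profile would perform."""
    steps = list(MINIMAL_STEPS)
    if profile is TenantUserProfile.PROVIDER_ENABLED:
        # The expensive step is only reachable by naming the heavy profile explicitly.
        steps.append("create_microsoft_provider_connection")
    return steps
```

The default argument is the minimal profile. A test that needs provider state has to ask for it by name, which is the discipline the decision calls for.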

## Decision 6: Introduce explicit minimal and heavy factory states for cascading graphs

- Decision: At least one additional factory cluster beyond the shared tenant-user helper must expose documented minimal and heavy states, with the minimal state becoming the preferred default for new tests.
- Rationale: The suite already uses stateful factories heavily, and expensive object graphs often hide behind convenient defaults. The governance model needs one concrete factory discipline seam in addition to the shared helper split so new tests stop inheriting unnecessary relationships by accident.
- Alternatives considered:
  - Leave factory defaults untouched and only optimize helpers: rejected because helpers are not the only source of hidden cost.
  - Rewrite all factories at once: rejected because it is too broad for the first slice and would create review noise.
  - Document factory guidance without changing any states: rejected because the spec requires at least the largest known setup paths to gain minimal modes.

## Decision 7: Standardize report artifacts under `apps/platform/storage/logs/test-lanes`

- Decision: Lane reports, JUnit XML, slow-test summaries, and budget evaluations should be written under `apps/platform/storage/logs/test-lanes`.
- Rationale: Existing ad-hoc test logs already live under `apps/platform/storage/logs`, and `scripts/platform-sail` changes the working directory into `apps/platform` before execution. Keeping artifacts under the app's existing log tree avoids new repo roots, matches current workflow, and keeps path handling predictable.
- Alternatives considered:
  - Repo-root `tmp/` or custom top-level directories: rejected because repo guidance discourages new base folders and because the wrapper already anchors execution inside `apps/platform`.
  - CI-only artifact paths: rejected because the spec requires everyday local visibility.
  - Writing artifacts into committed spec directories: rejected because runtime artifacts are ephemeral execution outputs, not planning artifacts.
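A small path helper can keep artifact locations predictable from both the repo root and the app directory that the wrapper switches into. Only the `apps/platform/storage/logs/test-lanes` root comes from the decision itself. The per-lane file names, the lane-name pattern, and the `artifact_paths`/`in_app_path` helpers are hypothetical.

```python
import re
from pathlib import PurePosixPath

PLATFORM_ROOT = PurePosixPath("apps/platform")
ARTIFACT_DIR = PLATFORM_ROOT / "storage" / "logs" / "test-lanes"

# Hypothetical naming rule: lowercase slugs only, so lane names cannot escape the directory.
_LANE_NAME = re.compile(r"[a-z0-9][a-z0-9-]*")


def artifact_paths(lane: str) -> dict[str, PurePosixPath]:
    """Repo-root-relative artifact paths for one lane (file names are illustrative)."""
    if not _LANE_NAME.fullmatch(lane):
        raise ValueError(f"unsafe lane name: {lane!r}")
    return {
        "junit": ARTIFACT_DIR / f"{lane}.junit.xml",
        "report": ARTIFACT_DIR / f"{lane}.report.json",
        "slow_tests": ARTIFACT_DIR / f"{lane}.slow-tests.json",
        "budget": ARTIFACT_DIR / f"{lane}.budget.json",
    }


def in_app_path(path: PurePosixPath) -> PurePosixPath:
    """The same path as seen after the wrapper has changed into apps/platform."""
    return path.relative_to(PLATFORM_ROOT)


print(in_app_path(artifact_paths("fast-feedback")["junit"]))
```

The in-app form is the path a command running inside `apps/platform` would use, for example as the value passed to PHPUnit's `--log-junit` option.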

## Decision 8: Treat schema-baseline adoption as a follow-up evaluation, not a first-slice requirement

- Decision: The first slice should document when a prebuilt schema baseline may help but defer actual schema-dump adoption to a later hardening step.
- Rationale: The default test configuration uses in-memory SQLite, where migration history cost behaves differently than persistent database setups. The PostgreSQL configuration is currently isolated to a single suite, so schema-baseline work is not the highest-leverage first move.
- Alternatives considered:
  - Adopt schema dumps immediately: rejected because the current default runner is not primarily constrained by the dedicated PostgreSQL path.
  - Rule out schema dumps entirely: rejected because Laravel documents them as relevant to growing suites and the spec explicitly calls for guidance.
  - Expand PostgreSQL usage first and then revisit: rejected because that would increase cost before governance improves the current defaults.

## Decision 9: Initial heavy lanes should start with obviously separable families, then harden through profiling

- Decision: The first heavy selectors should start with the already separate browser suite plus clearly broad scan or governance families such as architecture, deprecation, and high-fan-out guard or smoke clusters, then tighten based on measured profiling output.
- Rationale: The repository already isolates browser tests by directory, and some governance-oriented families are visibly broader than ordinary authoring loops. Starting from obvious boundaries gives the first slice immediate value without pretending the full heavy inventory is already perfect.
- Alternatives considered:
  - Reclassify every current file before lane rollout: rejected because the spec explicitly avoids a full-suite rewrite in one pass.
  - Guess heavy families only from intuition: rejected because the spec requires slow-test observability and measurable drift control.
  - Treat only browser as heavy: rejected because broad discovery, smoke, and guard suites can also dominate wall-clock time.
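Profiling-driven hardening needs per-file cost, not just per-test cost. The sketch below sums `<testcase>` durations by file from a JUnit XML report and ranks the slowest files as heavy-lane candidates. It assumes each `<testcase>` carries `file` and `time` attributes, as PHPUnit's JUnit logger emits. Stripping a `::` suffix is a defensive assumption for reporters that append a test description to the file path. The sample report and its file names are made up.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def slowest_files(junit_xml: str, top: int = 10) -> list[tuple[str, float]]:
    """Sum <testcase> times per source file and return the `top` slowest files."""
    root = ET.fromstring(junit_xml.strip())
    totals: defaultdict[str, float] = defaultdict(float)
    for case in root.iter("testcase"):
        file_attr = case.get("file")
        if not file_attr:
            continue  # skip cases the reporter could not attribute to a file
        path = file_attr.split("::", 1)[0]
        totals[path] += float(case.get("time", "0"))
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Made-up sample report for illustration.
SAMPLE = """
<testsuites>
  <testsuite name="Feature">
    <testcase name="a" file="tests/Feature/Architecture/LayersTest.php" time="1.5"/>
    <testcase name="b" file="tests/Feature/Architecture/LayersTest.php" time="0.25"/>
    <testcase name="c" file="tests/Feature/Tenants/CreateTest.php" time="0.5"/>
    <testcase name="d" file="tests/Browser/LoginTest.php::it logs in" time="2.0"/>
  </testsuite>
</testsuites>
"""

for path, seconds in slowest_files(SAMPLE):
    print(f"{seconds:6.2f}s  {path}")
```

Ranking by file rather than by test matches how the lane selectors work, since files and directories are the units that get moved into heavy lanes.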

## Decision 10: Initial budgets should start as documented report thresholds, not immediate hard CI gates

- Decision: The first runtime budgets should be documented and emitted in machine-readable reports before they become hard CI failure conditions.
- Rationale: The repo needs shared visibility and stable baselines before hard-failing contributors on performance regressions. This keeps the first slice actionable while leaving room for later CI enforcement.
- Alternatives considered:
  - Hard-fail budgets immediately: rejected because current lane baselines are not yet stable, so hard gates would produce noisy failures and put adoption at risk.
  - Avoid budgets until CI exists: rejected because the spec explicitly requires runtime budgets and early regression visibility.
  - Use purely narrative budgets with no report output: rejected because that would not create measurable governance.
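Report-only budgets can still be machine-readable. The sketch below evaluates each lane's measured duration against its budget, emits JSON, and keeps the exit code at zero unless enforcement is switched on later. The lane names, budget values, report fields, and the `enforce` flag are hypothetical placeholders, not agreed numbers.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetResult:
    lane: str
    duration_seconds: float
    budget_seconds: float

    @property
    def over_budget(self) -> bool:
        return self.duration_seconds > self.budget_seconds


def evaluate_budgets(durations: dict[str, float], budgets: dict[str, float]) -> list[BudgetResult]:
    """Compare measured lane durations with budgets; lanes without a measurement are skipped."""
    return [
        BudgetResult(lane, durations[lane], budgets[lane])
        for lane in sorted(budgets)
        if lane in durations
    ]


def budget_report(results: list[BudgetResult], enforce: bool = False) -> str:
    """Serialize results as JSON; the mode field records whether overruns would fail."""
    return json.dumps(
        {
            "mode": "enforced" if enforce else "report-only",
            "lanes": [
                {
                    "lane": r.lane,
                    "duration_seconds": r.duration_seconds,
                    "budget_seconds": r.budget_seconds,
                    "over_budget": r.over_budget,
                }
                for r in results
            ],
        },
        indent=2,
    )


def exit_code(results: list[BudgetResult], enforce: bool = False) -> int:
    """Report-only by default: overruns fail the run only when `enforce` is True."""
    if enforce and any(r.over_budget for r in results):
        return 1
    return 0


results = evaluate_budgets(
    durations={"fast-feedback": 95.0, "confidence": 400.0},
    budgets={"fast-feedback": 120.0, "confidence": 300.0},  # placeholder budgets
)
print(budget_report(results))
```

Setting `enforce=True` in a future CI job would turn the same data into a hard gate without changing the report format.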