Quickstart: Heavy Governance Lane Cost Reduction

Goal

Stabilize the heavy-governance lane so that its dominant costs are visible and intentionally sliced, and so that the lane is either brought back within the authoritative heavy-lane budget (300s before normalization) or consciously recalibrated with evidence.

Current Outcome

The latest honest rerun ends in explicit recalibration rather than recovery.

Signal	Current value	Meaning
Final wall clock	`329.305382s`	Current heavy-governance lane runtime after the slimming pass
Final authoritative threshold	`330s`	Normalized threshold used consistently by summary, budget, and report artifacts
Outcome	`recalibrated`	The lane no longer has dual active thresholds, but it still needs a slightly higher honest contract
Baseline delta	`+11.008420s` (`+3.458905%`)	Current rerun versus the preserved pre-slimming baseline
Legacy drift signal	`200s`	Preserved as historical detailed-budget evidence only
Pre-normalization summary threshold	`300s`	Preserved as the rollout acceptance contract before normalization

The final reconciled rationale: workflow-heavy duplication was reduced, but the settled lane still retains intentional surface-guard depth plus the residual helper cost of workspace settings, so the contract is now `330s`.
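The outcome in the table above can be reproduced with a few lines of arithmetic. The sketch below is illustrative Python, not the repository's PHP lane tooling; the helper name `classify_outcome` and the `over-budget` label are assumptions made for this example.

```python
# Values copied from the Current Outcome table.
FINAL_WALL_CLOCK_S = 329.305382
PRE_NORMALIZATION_THRESHOLD_S = 300.0
FINAL_THRESHOLD_S = 330.0


def classify_outcome(wall_clock: float, original: float, final: float) -> str:
    """Hypothetical helper: label a lane run against its old and new contract."""
    if wall_clock <= original:
        return "recovered"      # back within the rollout acceptance contract
    if wall_clock <= final:
        return "recalibrated"   # fits only the raised, normalized threshold
    return "over-budget"        # assumed label; not taken from the lane artifacts


outcome = classify_outcome(
    FINAL_WALL_CLOCK_S, PRE_NORMALIZATION_THRESHOLD_S, FINAL_THRESHOLD_S
)
headroom = round(FINAL_THRESHOLD_S - FINAL_WALL_CLOCK_S, 6)
print(outcome, headroom)
```

With the recorded values the run is labeled `recalibrated`, with roughly 0.69s of headroom under the 330s contract.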

Implementation Order

Capture a fresh heavy-governance baseline through the existing lane wrappers and preserve the current summary, report, and budget artifacts. The checked-in wrappers now support --capture-baseline for heavy-governance baseline copies.
Build or refresh the hotspot inventory for the current top 5 families by runtime, or enough families to explain at least 80% of lane runtime, whichever set is larger.
Decompose the primary ui-workflow hotspots first: baseline-profile-start-surfaces, findings-workflow-surfaces, and finding-bulk-actions-workflow.
Decide per family whether the right move is to split it, centralize repeated work, trim duplicate assertions, or retain it as intentionally heavy.
Audit second-wave surface-guard families such as action-surface-contract and ops-ux-governance only after the workflow-heavy hotspots are understood.
Extend or adjust manifest and report seams so decomposition, residual causes, and the final budget outcome remain visible.
Once the honest lane shape is established, normalize the heavy-governance budget contract so that the authoritative pre-normalization 300s summary threshold and the legacy 200s budget-target evaluation are replaced by one intentional rule.
Rerun the focused hotspot packs and the full heavy-governance lane.
Record the final outcome as budget recovery or explicit recalibration and add short reviewer guidance for future heavy tests.

Suggested Code Touches

apps/platform/tests/Support/TestLaneBudget.php
apps/platform/tests/Support/TestLaneManifest.php
apps/platform/tests/Support/TestLaneReport.php
apps/platform/tests/Feature/Baselines/*
apps/platform/tests/Feature/Filament/BaselineActionAuthorizationTest.php
apps/platform/tests/Feature/Filament/BaselineProfileCaptureStartSurfaceTest.php
apps/platform/tests/Feature/Filament/BaselineProfileCompareStartSurfaceTest.php
apps/platform/tests/Feature/Findings/*
apps/platform/tests/Feature/Guards/ActionSurfaceContractTest.php
apps/platform/tests/Feature/OpsUx/*
apps/platform/tests/Feature/SettingsFoundation/WorkspaceSettingsManageTest.php
scripts/platform-test-lane
scripts/platform-test-report

Validation Flow

Use the existing checked-in lane wrappers first:

./scripts/platform-test-report heavy-governance --capture-baseline
./scripts/platform-test-lane heavy-governance --capture-baseline
./scripts/platform-test-report heavy-governance
./scripts/platform-test-lane heavy-governance
./scripts/platform-test-report heavy-governance
cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent

Keep the implementation loop tight with the most relevant focused suites before rerunning the whole lane:

cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament --filter=BaselineProfileCaptureStartSurfaceTest
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament --filter=BaselineProfileCompareStartSurfaceTest
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Filament --filter=BaselineActionAuthorizationTest
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Findings --filter=FindingBulkActionsTest
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Findings --filter=FindingWorkflow
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Guards --filter=ActionSurfaceContractTest
cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/OpsUx

Current Baseline

Use the checked-in heavy-governance artifacts under apps/platform/storage/logs/test-lanes as the starting point.

Signal	Current value	Planning note
Lane wall clock	`318.296962s`	Pre-slimming baseline; 18.296962s over the 300s summary threshold
Lane summary threshold	`300s`	Authoritative pre-normalization contract for Spec 209 acceptance
Budget target evaluation threshold	`200s`	Legacy drift evidence that must remain visible until the contract is normalized
`ui-workflow` total	`190.606431s`	Dominant class; first slimming target
`surface-guard` total	`106.845887s`	Second-wave analysis target
`discovery-heavy` total	`0.863003s`	Already bounded; not the main cost problem
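To see why `ui-workflow` is the first slimming target, the class totals can be expressed as shares of the baseline wall clock. This is a minimal Python sketch over the numbers in the table above; the variable names are illustrative, and the source does not say what the remaining time consists of.

```python
# Values copied from the Current Baseline table.
BASELINE_WALL_CLOCK_S = 318.296962
CLASS_TOTALS_S = {
    "ui-workflow": 190.606431,
    "surface-guard": 106.845887,
    "discovery-heavy": 0.863003,
}

# Share of the baseline lane runtime per class, in percent.
shares = {
    name: 100 * total / BASELINE_WALL_CLOCK_S
    for name, total in CLASS_TOTALS_S.items()
}
# Time not attributed to any of the three classes listed in the table.
unattributed = round(BASELINE_WALL_CLOCK_S - sum(CLASS_TOTALS_S.values()), 6)

for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
print(f"not attributed to these classes: {unattributed}s")
```

In this data `ui-workflow` accounts for roughly 60% of the baseline and `surface-guard` for roughly a third, which matches the ordering of the two analysis waves.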

Current Canonical Inventory

The canonical inventory now covers six families because the top five alone do not clear the required 80% runtime threshold.

Family	Baseline measured time	Current status	Driver
`baseline-profile-start-surfaces`	`98.112193s`	`slimmed`	workflow-heavy
`action-surface-contract`	`40.841552s`	`retained`	intentionally-heavy
`ops-ux-governance`	`38.794861s`	`retained`	intentionally-heavy
`findings-workflow-surfaces`	`36.459493s`	`slimmed`	workflow-heavy
`finding-bulk-actions-workflow`	`26.491446s`	`slimmed`	redundant
`workspace-settings-slice-management`	`21.740839s`	`follow-up`	helper-driven

Measured on the latest rerun (times listed under Latest Rerun Hotspots), these six families together explain 263.617244s, or 80.052516%, of the 329.305382s heavy-governance runtime.
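The coverage rule (top 5 families, or enough families to explain at least 80% of lane runtime, whichever set is larger) can be sketched as follows. This is illustrative Python using the latest rerun times listed under Latest Rerun Hotspots; `select_inventory` is a hypothetical helper, not part of the checked-in lane tooling.

```python
LANE_WALL_CLOCK_S = 329.305382  # final wall clock of the latest rerun

# Latest measured time per family, in seconds.
FAMILY_TIMES_S = {
    "baseline-profile-start-surfaces": 101.895415,
    "action-surface-contract": 38.323501,
    "ops-ux-governance": 36.497049,
    "findings-workflow-surfaces": 35.990272,
    "finding-bulk-actions-workflow": 30.145259,
    "workspace-settings-slice-management": 20.765748,
}


def select_inventory(times, lane_total, min_families=5, min_share=0.80):
    """Take families by descending runtime until both minimums are met.

    If the known families run out first, all of them are returned.
    """
    ranked = sorted(times.items(), key=lambda item: item[1], reverse=True)
    selected, covered = [], 0.0
    for name, seconds in ranked:
        if len(selected) >= min_families and covered / lane_total >= min_share:
            break
        selected.append(name)
        covered += seconds
    return selected, covered


selected, covered = select_inventory(FAMILY_TIMES_S, LANE_WALL_CLOCK_S)
top_five = sum(sorted(FAMILY_TIMES_S.values(), reverse=True)[:5])
print(f"top five share: {100 * top_five / LANE_WALL_CLOCK_S:.2f}%")
print(f"{len(selected)} families cover {covered:.6f}s "
      f"({100 * covered / LANE_WALL_CLOCK_S:.6f}%)")
```

The top five families reach only about 73.75% of the lane runtime, so the sixth family is needed to clear the 80% threshold.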

Latest Rerun Hotspots

Family	Latest measured time	Current intent
`baseline-profile-start-surfaces`	`101.895415s`	Still dominant after slimming; trust retained
`action-surface-contract`	`38.323501s`	Intentionally heavy and retained
`ops-ux-governance`	`36.497049s`	Intentionally heavy and retained
`findings-workflow-surfaces`	`35.990272s`	Slimmed, but still a meaningful workflow-heavy slice
`finding-bulk-actions-workflow`	`30.145259s`	Slimmed fixture fan-out, still a top single test family
`workspace-settings-slice-management`	`20.765748s`	Recorded as explicit follow-up debt
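Comparing the table above with the baseline inventory family by family makes the effect of the slimming pass easier to read. The following Python sketch only subtracts the two published measurements; it does not model run-to-run noise, and the variable names are illustrative.

```python
# Baseline measured time per family (Current Canonical Inventory), in seconds.
BASELINE_S = {
    "baseline-profile-start-surfaces": 98.112193,
    "action-surface-contract": 40.841552,
    "ops-ux-governance": 38.794861,
    "findings-workflow-surfaces": 36.459493,
    "finding-bulk-actions-workflow": 26.491446,
    "workspace-settings-slice-management": 21.740839,
}
# Latest measured time per family (Latest Rerun Hotspots), in seconds.
LATEST_S = {
    "baseline-profile-start-surfaces": 101.895415,
    "action-surface-contract": 38.323501,
    "ops-ux-governance": 36.497049,
    "findings-workflow-surfaces": 35.990272,
    "finding-bulk-actions-workflow": 30.145259,
    "workspace-settings-slice-management": 20.765748,
}

# Positive delta means the family measured slower in the latest rerun.
deltas = {name: round(LATEST_S[name] - BASELINE_S[name], 6) for name in BASELINE_S}
net = round(sum(LATEST_S.values()) - sum(BASELINE_S.values()), 6)

for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {delta:+.6f}s")
print(f"net across the six families: {net:+.6f}s")
```

In this data two families measure slower than their baseline and four measure faster, for a net change of about +1.18s across the inventory.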

Decomposition Checklist

For each primary hotspot family, answer these questions before changing file structure:

What governance trust does this family deliver?
What breadth is genuinely required for that trust?
Which repeated work sources dominate runtime?
Is the main cost family-breadth, helper-driven setup, or fixture-driven setup?
Is the correct fix a split, a centralization, a duplicate-assertion trim, or intentional retention?
What focused tests and lane reruns prove the change did not hollow out governance trust?
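One way to keep the answers to these questions reviewable is to capture them in a small record per family. The sketch below is a hypothetical Python shape, not the structure used by `TestLaneManifest`; every field name and the example values are assumptions chosen to mirror the checklist.

```python
from dataclasses import dataclass

# Allowed answers, taken from the wording of the checklist questions.
COST_DRIVERS = {"family-breadth", "helper-driven", "fixture-driven"}
DECISIONS = {"split", "centralize", "trim-duplicate-assertions", "retain"}


@dataclass(frozen=True)
class DecompositionRecord:
    """Hypothetical per-family answer sheet for the decomposition checklist."""

    family: str
    trust_delivered: str
    required_breadth: str
    dominant_repeated_work: tuple[str, ...]
    cost_driver: str
    decision: str
    proof: tuple[str, ...]  # focused tests and lane reruns

    def __post_init__(self) -> None:
        if self.cost_driver not in COST_DRIVERS:
            raise ValueError(f"unknown cost driver: {self.cost_driver}")
        if self.decision not in DECISIONS:
            raise ValueError(f"unknown decision: {self.decision}")
        if not self.proof:
            raise ValueError("at least one focused test or lane rerun is required")


# Placeholder values for illustration only; not the recorded decision.
example = DecompositionRecord(
    family="finding-bulk-actions-workflow",
    trust_delivered="<what this family proves>",
    required_breadth="<breadth genuinely required>",
    dominant_repeated_work=("<repeated setup>",),
    cost_driver="fixture-driven",
    decision="trim-duplicate-assertions",
    proof=("tests/Feature/Findings --filter=FindingBulkActionsTest",),
)
print(example.family, example.decision)
```

Rejecting a record without proof mirrors the last question: a structural change is only acceptable once focused tests and lane reruns show that governance trust was preserved.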

Reviewer Guidance Targets

The implementation should leave behind short rules that cover:

When a new heavy family is justified.
When a test should join an existing heavy family instead.
When discovery, workflow, and surface trust must be separated.
When a family should stay intentionally heavy.
When a helper or fixture cost must be recorded as residual debt instead of disguised as family improvement.

The canonical reviewer rules now live in TestLaneManifest::heavyGovernanceAuthorGuidance() and are:

heavy-family-reuse-before-creation
heavy-family-create-only-for-new-trust
split-discovery-workflow-surface-concerns
retain-intentional-heavy-depth-explicitly
record-helper-or-fixture-residuals

Exit Criteria

The heavy-governance budget contract is normalized to one authoritative threshold, and the summary, budget, and report artifacts do not disagree about it.
The primary hotspot families have decomposition records and explicit slimming decisions.
The heavy-governance lane has fresh before and after evidence in the standard artifact paths, including inventory coverage for the top 5 families or at least 80% of runtime, whichever is larger.
The final outcome is explicit: recovered within the authoritative threshold for the rollout or consciously recalibrated.
Reviewer guidance exists for future heavy-family authoring.
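The first exit criterion amounts to a simple agreement check across artifacts. A minimal Python sketch, assuming each artifact's threshold has already been read into a dictionary; the function name and the dictionary keys are hypothetical and do not describe the real artifact format.

```python
def authoritative_threshold(artifact_thresholds: dict[str, float]) -> float:
    """Return the single shared threshold, or fail if the artifacts disagree."""
    distinct = set(artifact_thresholds.values())
    if len(distinct) != 1:
        raise ValueError(f"artifacts disagree on the threshold: {artifact_thresholds}")
    return distinct.pop()


# After normalization, all three artifacts use the 330s contract.
normalized = {"summary": 330.0, "budget": 330.0, "report": 330.0}
print(authoritative_threshold(normalized))

# Before normalization, the summary (300s) and budget-target (200s) thresholds differed.
pre_normalization = {"summary": 300.0, "budget": 200.0}
try:
    authoritative_threshold(pre_normalization)
except ValueError as error:
    print(error)
```

The pre-normalization pair fails the check, which is the dual-threshold state that normalization removes.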
