Remove Findings lifecycle backfill operational surface (controls slice) (#280)

Removes the Findings lifecycle backfill from the Operational Controls UI and OperationalControlCatalog.

This patch is a safe, controls-only change; runbooks, jobs and other runtime artifacts are NOT removed yet. Follow-up work will delete the runbook service/scope, jobs, commands, and update tests.

Files changed:
- apps/platform/app/Filament/System/Pages/Ops/Controls.php
- apps/platform/app/Support/OperationalControls/OperationalControlCatalog.php
- apps/platform/tests/Feature/System/OpsControls/OperationalControlManagementTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #280

2026-04-26 15:43:47 +00:00

Implementation Plan: Operational Controls

Branch: 242-operational-controls | Date: 2026-04-26 | Spec: /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

Replace the ad-hoc allow_admin_maintenance_actions environment gate with one product-owned operational-control path for the first-slice keys findings.lifecycle.backfill and restore.execute.
Introduce one platform-operated activation record plus one shared evaluator that plugs into the existing system runbook, tenant findings-maintenance, and restore-execution start seams without becoming a generic experimentation platform.
Reuse existing enforcement and UX seams - UiEnforcement, ProviderOperationStartGate, OperationRunService, OperationUxPresenter, ProviderOperationStartResultPresenter, AuditRecorder, WorkspaceAuditLogger, and AuditActionId - so the slice stays small, auditable, and server-side enforced.

Technical Context

Language/Version: PHP 8.4 (Laravel 12)
Primary Dependencies: Laravel 12 + Filament v5 + Livewire v4 + Pest; existing UiEnforcement, ProviderOperationStartGate, OperationRunService, AuditRecorder, WorkspaceAuditLogger, AuditActionId, PlatformCapabilities
Storage: PostgreSQL via existing product tables plus one new platform-operated operational_control_activations table; no tenant-owned control tables
Testing: Pest unit + feature tests only
Validation Lanes: fast-feedback, confidence
Target Platform: Sail-backed Laravel admin surfaces under /admin/t/{tenant} and system surfaces under /system
Project Type: web
Performance Goals: effective-control resolution remains DB-only and cheap at action start time, adds no outbound HTTP, and blocks in-scope starts before queue or provider execution begins
Constraints: no generic feature-flag platform, no new browser or heavy-governance suite, no break-glass bypass in v1, no parallel env gate for in-scope controls, global pauses win over workspace pauses, preserve 404 vs 403 semantics, keep provider-specific restore behavior out of platform-core control vocabulary
Scale/Scope: 2 control keys, 2 scope levels (global and workspace), 1 system management surface, and 3 concrete enforcement families across 4 touched UI surfaces

UI / Surface Guardrail Plan

Guardrail scope: changed surfaces
Native vs custom classification summary: native Filament + shared start/result primitives
Shared-family relevance: header actions, runbook launch actions, provider-backed start results, audit-backed control changes
State layers in scope: page, detail, action/modal
Handling modes by drift class or surface: review-mandatory
Repository-signal treatment: review-mandatory
Special surface test profiles: standard-native-filament, monitoring-state-page
Required tests or manual smoke: functional-core, state-contract
Exception path and spread control: none; v1 must not allow a second local runtime-control dialect
Active feature PR close-out entry: Guardrail

Shared Pattern & System Fit

Cross-cutting feature marker: yes
Systems touched: App\Filament\System\Pages\Ops\Runbooks, new system ops controls page, App\Filament\Resources\FindingResource\Pages\ListFindings, App\Filament\Resources\RestoreRunResource, App\Support\Rbac\UiEnforcement, App\Services\Providers\ProviderOperationStartGate, App\Support\OpsUx\OperationUxPresenter, App\Support\OpsUx\ProviderOperationStartResultPresenter, App\Services\Audit\AuditRecorder, App\Services\Audit\WorkspaceAuditLogger, App\Support\Audit\AuditActionId
Shared abstractions reused: UiEnforcement, ProviderOperationStartGate, ProviderOperationStartResultPresenter, OperationRunService, OperationUxPresenter, OpsUxBrowserEvents, OperationRunLinks, SystemOperationRunLinks, AuditRecorder, WorkspaceAuditLogger
New abstraction introduced? why?: one bounded OperationalControlCatalog plus one OperationalControlEvaluator are justified because the feature now has two real concrete control keys that must evaluate consistently across system-plane and tenant-plane start paths. No registry lattice, provider strategy system, or customer-facing flag DSL is introduced.
Why the existing abstraction was sufficient or insufficient: existing abstractions already own auth, queue start UX, and audit writing; they are insufficient because none presently carries a reusable runtime-safety decision that can pause an action before it starts, and WorkspaceAuditLogger alone cannot truthfully own global platform-plane mutations.
Bounded deviation / spread control: no deviation is allowed for in-scope controls; every affected surface must route through the shared evaluator rather than direct config(...) reads or page-local booleans.
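
The audit ownership point above (WorkspaceAuditLogger cannot truthfully own global platform-plane mutations) can be pictured as a small routing rule. The following is a minimal sketch in Python rather than the project's PHP, assuming that a missing workspace id marks global scope; `ControlAuditEvent` and `choose_audit_writer` are hypothetical names used only for illustration, and the real writers' signatures are not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlAuditEvent:
    """Hypothetical audit payload; the field names are assumptions."""
    action: str
    control_key: str
    workspace_id: Optional[int]  # None is assumed to mean global scope


def choose_audit_writer(event: ControlAuditEvent) -> str:
    """Return the name of the writer that should own this event."""
    if event.workspace_id is None:
        # A global change has no workspace owner, so it must not be
        # attributed to a workspace-scoped logger.
        return "AuditRecorder"
    return "WorkspaceAuditLogger"
```

Under this assumption a global pause is never recorded with a borrowed workspace id, which is the "false workspace ownership" case the plan wants to rule out.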

OperationRun UX Impact

Touches OperationRun start/completion/link UX?: yes
Central contract reused: shared OperationRun start UX plus provider-start result helpers
Delegated UX behaviors: queued toast, Open operation / View run links, run-enqueued browser event, dedupe-or-blocked messaging, and tenant/workspace-safe URL resolution remain on existing shared paths
Surface-owned behavior kept local: initiation inputs, confirmation copy, and control-management forms only
Queued DB-notification policy: unchanged explicit opt-in only
Terminal notification path: existing central lifecycle mechanism for starts that are allowed
Exception path: none

Provider Boundary & Portability Fit

Shared provider/platform boundary touched?: yes
Provider-owned seams: provider-backed restore.execute dispatch, provider binding resolution, provider reason translation, existing restore safety and dry-run behavior
Platform-core seams: operational-control vocabulary, scope/effective-state evaluation, control management surface, audit labels, blocked-state semantics
Neutral platform terms / contracts preserved: operational control, activation, effective state, scope, reason, expiry, blocked execution
Retained provider-specific semantics and why: restore.execute remains Microsoft-specific provider behavior in the current release because the control feature governs only start allowance, not provider execution semantics
Bounded extraction or follow-up path: none in this slice; future catalog growth or provider-neutral expansions require a follow-up spec instead of implicit widening here

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

Read/write separation: PASS - control management is an explicit platform-plane mutation with confirmation, audit, and focused tests; blocked execution paths remain non-mutating except for audit logging.
RBAC-UX: PASS - platform management stays on /system; tenant/admin execution surfaces stay on /admin/t/{tenant}; cross-plane access remains 404; entitled-but-paused users get explicit control feedback while membership and capability failures keep 404/403 semantics.
Workspace isolation / tenant isolation: PASS - workspace-targeted controls apply only within the chosen workspace; tenant surfaces still resolve tenant/workspace entitlement before control-state disclosure.
Run observability / Ops-UX: PASS - allowed starts reuse existing OperationRun paths; blocked starts create no run and no new lifecycle dialect; later control activation does not retroactively mutate already accepted runs; shared start/result helpers remain authoritative.
Shared path reuse / XCUT-001: PASS - the design extends existing UI enforcement, provider-start gating, audit logging, and operation start UX instead of introducing page-local flags.
Provider boundary / PROV-001: PASS - control language stays provider-neutral while restore execution remains provider-owned.
Proportionality / PROP-001 and ABSTR-001: PASS - the only new structure is justified by two current-release controls and three existing enforcement families; no experimentation platform or generalized remote-config system is planned.
Persisted truth / PERSIST-001: PASS - active control activations represent independent runtime-safety truth with their own scope, reason, expiry, and audit obligations; convenience UI state remains derived.
Behavioral state / STATE-001: PASS - paused/enabled semantics change whether execution may start and therefore justify one bounded effective-state model.
Filament-native UI / UI-FIL-001: PASS - all touched surfaces remain native Filament pages/resources/actions; no custom UI framework is introduced.
Global search rule: N/A - no new globally searchable resource is added.
Panel/provider registration: PASS - Filament v5 remains on Livewire v4 and no new panel/provider registration is required; Laravel 12 provider registration stays in bootstrap/providers.php if any provider change becomes necessary.
Test governance / TEST-GOV-001: PASS - proof stays in focused unit and feature lanes with no browser or heavy-governance expansion.
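
The RBAC-UX and isolation checks above imply an ordering: membership is resolved first, then capability, and only then is control state disclosed. The following is a minimal sketch of that ordering in Python (not the project's PHP); `Outcome` and `resolve_start_outcome` are hypothetical names, and the three booleans stand in for whatever the real enforcement seams compute.

```python
from enum import Enum


class Outcome(Enum):
    NOT_FOUND = 404      # non-member: existence is not disclosed
    FORBIDDEN = 403      # member without the required capability
    PAUSED = "paused"    # entitled, but an operational control is active
    ALLOWED = "allowed"


def resolve_start_outcome(is_member: bool, has_capability: bool, is_paused: bool) -> Outcome:
    """Apply the checks in the order the plan describes."""
    if not is_member:
        return Outcome.NOT_FOUND
    if not has_capability:
        return Outcome.FORBIDDEN
    # Control state is only revealed to users who passed both checks.
    if is_paused:
        return Outcome.PAUSED
    return Outcome.ALLOWED
```

With this ordering, a paused control never changes what a non-member or an under-privileged member sees.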

Test Governance Check

Test purpose / classification by changed surface: Unit for catalog/evaluator/scope precedence/expiry logic; Feature for system control management, runbook enforcement, findings header-action enforcement, restore-execution enforcement, audit logging, and 404/403 semantics
Affected validation lanes: fast-feedback, confidence
Why this lane mix is the narrowest sufficient proof: the business truth is server-side effective-state resolution plus enforcement at existing Filament and service seams. Browser tests would duplicate modal choreography without proving additional runtime safety truth.
Narrowest proving command(s):
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php
Fixture / helper / factory / seed / context cost risks: add one local factory for active control activations plus platform-user and workspace-scoped setup helpers reused only by operational-control tests; avoid new shared browser or provider-fixture defaults
Expensive defaults or shared helper growth introduced?: no; control fixtures stay opt-in and local to the new test family
Heavy-family additions, promotions, or visibility changes: none
Surface-class relief / special coverage rule: standard-native-filament and monitoring-state-page relief are sufficient; assert disabled/blocked behavior and no side effects instead of browser-only choreography
Closing validation and reviewer handoff: reviewers should rerun the targeted unit/feature commands and then:
- verify the env gate is removed from the in-scope findings action
- confirm restore execution is blocked before queue/provider start
- confirm blocked-execution audit entries exist for runbook/findings/restore paths
- confirm global control changes are audited without false workspace ownership
- confirm /system/ops/controls returns 403 for system users missing platform.ops.controls.manage
- confirm non-members still receive 404 while missing capabilities still receive 403 with the existing capability-denied UX rather than paused-state helper text
Budget / baseline / trend follow-up: low-to-moderate increase in focused unit/feature coverage only
Review-stop questions: did implementation add a second control persistence shape, leave the env gate in place, introduce a local blocked-state dialect, or widen into browser/heavy-governance lanes?
Escalation path: reject-or-split if the implementation widens into generic feature-flagging or customer-managed controls; document-in-feature for small shared-helper extensions that remain local to this slice
Active feature PR close-out entry: Guardrail
Why no dedicated follow-up spec is needed: the planned new model, evaluator, and tests stay local to the first-slice control family; recurring growth beyond the two bounded control keys would require its own follow-up spec

Project Structure

Documentation (this feature)

specs/242-operational-controls/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── checklists/
│   └── requirements.md
├── contracts/
│   └── operational-controls.contract.yaml
└── tasks.md

Source Code (repository root)

apps/platform/
├── app/
│   ├── Filament/System/Pages/Ops/
│   │   ├── Controls.php
│   │   └── Runbooks.php
│   ├── Filament/Resources/FindingResource/Pages/ListFindings.php
│   ├── Filament/Resources/RestoreRunResource.php
│   ├── Models/
│   │   └── OperationalControlActivation.php
│   ├── Services/Audit/AuditRecorder.php
│   ├── Services/Audit/WorkspaceAuditLogger.php
│   ├── Services/Providers/ProviderOperationStartGate.php
│   ├── Support/Audit/AuditActionId.php
│   ├── Support/Auth/PlatformCapabilities.php
│   └── Support/OperationalControls/
│       ├── OperationalControlCatalog.php
│       ├── OperationalControlDecision.php
│       └── OperationalControlEvaluator.php
├── database/
│   ├── factories/
│   │   └── OperationalControlActivationFactory.php
│   └── migrations/
│       └── *_create_operational_control_activations_table.php
└── tests/
    ├── Feature/
    │   ├── Findings/OperationalControlFindingsBackfillGateTest.php
    │   ├── OperationalControls/
    │   │   ├── NoAdHocOperationalControlBypassTest.php
    │   │   └── OperationalControlAuthorizationSemanticsTest.php
    │   ├── Restore/OperationalControlRestoreExecutionGateTest.php
    │   ├── System/OpsControls/OperationalControlManagementTest.php
    │   └── System/OpsRunbooks/OperationalControlRunbookGateTest.php
    └── Unit/Support/OperationalControls/
        ├── OperationalControlCatalogTest.php
        ├── OperationalControlEvaluatorTest.php
        └── OperationalControlScopeResolutionTest.php

Structure Decision: Single Laravel web application. The feature adds one bounded platform-operated model and one small support namespace for operational-control evaluation, then plugs that into existing system and tenant Filament surfaces.

Complexity Tracking

No unapproved constitution violations are required. The only new persistence and abstraction are the justified control-activation record plus evaluator/catalog pair described below.

Proportionality Review

Current operator problem: founders and platform operators need a safe runtime way to pause already-existing risky actions without editing environment variables or relying on inconsistent per-surface logic.
Existing structure is insufficient because: UiEnforcement decides RBAC, ProviderOperationStartGate decides provider readiness, and env flags decide hidden page-local runtime behavior. None of those alone gives one auditable runtime-safety truth across both system and tenant surfaces.
Narrowest correct implementation: persist only explicit active control activations, derive the enabled state from absence of an activation, evaluate one effective decision through a shared catalog/evaluator, and wire that into the three concrete existing start paths.
Ownership cost created: one new table/model/factory, one small support namespace, one system page, new audit action IDs and capability constants, and focused unit/feature coverage.
Alternative intentionally rejected: keep env/config flags, reuse workspace settings, or build a generalized feature-flag system. Env/config flags are invisible product truth, workspace settings do not cleanly represent one global control truth, and a generic flag platform is far too broad.
Release truth: current-release truth
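
The "narrowest correct implementation" above can be illustrated with a small model: only pause rows exist, enabled is the absence of a live row, and a global pause wins over a workspace pause. This is a sketch in Python rather than the project's PHP. The `Activation` and `Decision` shapes, their field names, and the rule that a row stops applying once `now` reaches `expires_at` are assumptions, not the actual `OperationalControlEvaluator` contract.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional

# The two first-slice keys named in the plan.
CONTROL_KEYS = frozenset({"findings.lifecycle.backfill", "restore.execute"})


@dataclass(frozen=True)
class Activation:
    control_key: str
    workspace_id: Optional[int]          # None is assumed to mean global
    reason: str
    expires_at: Optional[datetime] = None


@dataclass(frozen=True)
class Decision:
    control_key: str
    paused: bool
    scope: Optional[str] = None          # "global" or "workspace" when paused
    reason: Optional[str] = None


def evaluate(control_key: str, workspace_id: Optional[int],
             activations: Iterable[Activation], now: datetime) -> Decision:
    """Resolve the effective state for one control in one workspace."""
    if control_key not in CONTROL_KEYS:
        raise KeyError(f"unknown control key: {control_key}")
    live = [a for a in activations
            if a.control_key == control_key
            and (a.expires_at is None or a.expires_at > now)]
    # Global pauses are checked first so they always win.
    for a in live:
        if a.workspace_id is None:
            return Decision(control_key, True, "global", a.reason)
    for a in live:
        if workspace_id is not None and a.workspace_id == workspace_id:
            return Decision(control_key, True, "workspace", a.reason)
    # No live activation: enabled is derived, never stored.
    return Decision(control_key, False)
```

Because the enabled state is derived, resuming a control in this model amounts to removing its row; there is no separate "enabled" record to keep in sync.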

Phase 0 — Research (output: `research.md`)

See: /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/research.md

Goals:

Confirm the narrowest persistence shape for runtime-safety truth and explicitly reject env-only or workspace-settings-only alternatives.
Confirm the smallest shared seam where control evaluation belongs for system runbooks, tenant findings lifecycle backfill, and provider-backed restore execution.
Define v1 scoping, global-first precedence, expiry, and audit expectations without inventing a generic flag taxonomy.
Document the v1 decision that break-glass and broad platform capabilities do not bypass an active operational control.

Phase 1 — Design & Contracts (outputs: `data-model.md`, `contracts/`, `quickstart.md`)

See:

/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/data-model.md
/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/contracts/operational-controls.contract.yaml
/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/quickstart.md

Design focus:

Add one platform-operated activation record that can pause a control globally or for one workspace, with optional expiry, auditable reason, global-first precedence, and partial unique indexes that enforce one active global row per control and one active workspace row per control/workspace pair; the write path deletes expired conflicting rows before inserting a new activation, and this table is not used as an archive.
Add one new system ops controls page that lists the two bounded control keys, their effective state, scope, owner, expiry, change actions, and on-demand audit history links, and uses a staged scope-impact preview before control mutations are confirmed.
Use OperationalControlDecision as the shared control-state presentation primitive for controls, runbooks, findings, and restore surfaces.
Route findings.lifecycle.backfill through the new evaluator in both ListFindings and Runbooks, removing the existing env gate.
Route findings.lifecycle.backfill through FindingsLifecycleBackfillRunbookService::start() so the system runbooks page, tenant findings page, CLI command, and deploy-hook command all honor the same control decision.
Route restore.execute through the same evaluator before provider-backed or non-provider-backed queued restore execution is created.
Add dedicated audit action IDs and a dedicated platform capability for control management, using AuditRecorder for global control changes and blocked system-plane all-tenant attempts, and WorkspaceAuditLogger for workspace/tenant-scoped changes and blocked-execution evidence with concrete scope.
Keep blocked-state messaging on existing shared start/result helpers and avoid custom control-state UI frameworks.
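
The activation-record design above (partial unique indexes plus a write path that deletes expired conflicting rows before inserting) can be tried out in isolation. The sketch below uses Python with an in-memory SQLite database as a stand-in for PostgreSQL; the column names, the index names, the use of a NULL `workspace_id` for global scope, and integer epoch seconds for `expires_at` are all assumptions, not the real migration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE operational_control_activations (
    id INTEGER PRIMARY KEY,
    control_key TEXT NOT NULL,
    workspace_id INTEGER,   -- NULL is assumed to mean global scope
    reason TEXT NOT NULL,
    expires_at INTEGER      -- epoch seconds; NULL means no expiry
);
-- At most one global row per control.
CREATE UNIQUE INDEX one_global_row_per_control
    ON operational_control_activations (control_key)
    WHERE workspace_id IS NULL;
-- At most one row per control/workspace pair.
CREATE UNIQUE INDEX one_row_per_control_and_workspace
    ON operational_control_activations (control_key, workspace_id)
    WHERE workspace_id IS NOT NULL;
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def activate(conn, control_key, workspace_id, reason, expires_at, now):
    """Delete an expired conflicting row, then insert the new activation."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "DELETE FROM operational_control_activations "
            "WHERE control_key = ? AND workspace_id IS ? "
            "AND expires_at IS NOT NULL AND expires_at <= ?",
            (control_key, workspace_id, now),
        )
        conn.execute(
            "INSERT INTO operational_control_activations "
            "(control_key, workspace_id, reason, expires_at) "
            "VALUES (?, ?, ?, ?)",
            (control_key, workspace_id, reason, expires_at),
        )
```

In this model a second pause for the same control and scope fails with `sqlite3.IntegrityError` while the first is still live, and succeeds once the first has expired, because the stale row is removed in the same transaction.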

Phase 1 — Agent Context Update

After Phase 1 artifacts are generated, update Copilot context from the plan:

/Users/ahmeddarrazi/Documents/projects/wt-plattform/.specify/scripts/bash/update-agent-context.sh copilot

Phase 2 — Implementation Outline (tasks created in `/speckit.tasks`)

Add the operational_control_activations persistence, model, and local factory for active pause records.
Introduce the bounded operational-controls support namespace (OperationalControlCatalog, OperationalControlDecision, OperationalControlEvaluator) and keep enabled-state derived from active rows.
Add the dedicated controls-manage capability and its local grant path in the seeded platform operator setup.
Add the system-plane controls page and wire it into the existing system ops navigation with staged preview-plus-confirm pause/resume actions, audit logging, and on-demand audit history links.
Replace the findings env gate with evaluator-driven control checks on the tenant findings header action and the system runbooks start path.
Integrate the same evaluator into restore execution before any queued execution OperationRun, queued execution RestoreRun, queue dispatch, or provider-backed execution starts.
Add focused unit and feature tests, plus a guard test that blocks new ad-hoc runtime-control bypasses for in-scope controls and one proving path that activating a control does not rewrite previously accepted runs.
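
The ordering required for restore execution in the outline above (evaluate first, create nothing when blocked, leave accepted runs alone) can be sketched as follows. This is Python rather than the project's PHP; `FakeRuntime` and `start_restore` are hypothetical stand-ins for the real OperationRun, queue, and audit services.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeRuntime:
    """In-memory stand-in for run storage, the queue, and the audit log."""
    runs: List[str] = field(default_factory=list)
    queued_jobs: List[str] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)


def start_restore(runtime: FakeRuntime, restore_id: str,
                  paused: bool, reason: str = "") -> bool:
    """Check the control decision before any run or job is created."""
    if paused:
        # A blocked start leaves only audit evidence: no run, no job.
        runtime.audit_log.append(
            f"blocked restore.execute for {restore_id}: {reason}")
        return False
    runtime.runs.append(restore_id)
    runtime.queued_jobs.append(restore_id)
    return True
```

A run accepted before the pause stays in `runs` and `queued_jobs` untouched; the pause only affects starts attempted afterwards.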

Constitution Check (Post-Design)

Re-check target: PASS. The post-design shape must still use one bounded control catalog, one active-row persistence model, one evaluator, existing auth/start/audit helpers, and no second runtime-control dialect.

Implementation Close-out

Delivered the bounded operational-controls slice end-to-end: one operational_control_activations truth model, one catalog/evaluator/decision support path, a new /system/ops/controls management page, findings lifecycle enforcement through FindingsLifecycleBackfillRunbookService::start(), and restore execution blocking before any queued execution OperationRun, queued execution RestoreRun, job dispatch, or provider-backed start.
Runtime cleanup landed with the in-scope findings env gate removed from config/tenantpilot.php, a source-scanning guard against ad-hoc bypasses, and workspace-isolation proof showing a workspace-scoped pause blocks only the targeted workspace while a second workspace remains unaffected.
Validation passed on the narrow feature lane (20 passed, 253 assertions):
- export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php tests/Feature/Filament/Spec113/AdminFindingsNoMaintenanceActionsTest.php tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php
Formatting passed with export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent.
Manual smoke passed in the integrated browser: the staged pause/resume flow on /system/ops/controls for Findings lifecycle backfill rendered scope-impact previews, applied the global pause, and returned to Enabled inside the SC-001 budget after bringing the local database up to date.
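
The source-scanning guard mentioned in this close-out can be pictured as a plain text scan for the retired gate name. The sketch below is Python rather than the project's Pest test, and it assumes the guard looks for the `allow_admin_maintenance_actions` identifier; the patterns and file set checked by the real `NoAdHocOperationalControlBypassTest` are not shown in this plan, and `find_bypasses` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Assumed pattern: the name of the env gate this feature replaces.
FORBIDDEN = re.compile(r"allow_admin_maintenance_actions")


def find_bypasses(sources: Dict[str, str]) -> List[Tuple[str, int]]:
    """Return (path, line number) for every line that mentions the old gate."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(sources):
        for line_no, line in enumerate(sources[path].splitlines(), start=1):
            if FORBIDDEN.search(line):
                hits.append((path, line_no))
    return hits
```

A guard test built this way would fail as soon as `find_bypasses` returns a non-empty list for the scanned source files.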
