TenantAtlas/specs/242-operational-controls/plan.md
ahmido d96abc65fb
Remove Findings lifecycle backfill operational surface (controls slice) (#280)
Removes the Findings lifecycle backfill from the Operational Controls UI and OperationalControlCatalog.

This patch is a safe, controls-only change; runbooks, jobs and other runtime artifacts are NOT removed yet. Follow-up work will delete the runbook service/scope, jobs, commands, and update tests.

Files changed:
- apps/platform/app/Filament/System/Pages/Ops/Controls.php
- apps/platform/app/Support/OperationalControls/OperationalControlCatalog.php
- apps/platform/tests/Feature/System/OpsControls/OperationalControlManagementTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #280
2026-04-26 15:43:47 +00:00

Implementation Plan: Operational Controls

Branch: 242-operational-controls | Date: 2026-04-26 | Spec: /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md
Input: Feature specification from /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md

Note: This template is filled in by the /speckit.plan command. See .specify/scripts/ for helper scripts.

Summary

  • Replace the ad-hoc allow_admin_maintenance_actions environment gate with one product-owned operational-control path for the first-slice keys findings.lifecycle.backfill and restore.execute.
  • Introduce one platform-operated activation record plus one shared evaluator that plugs into the existing system runbook, tenant findings-maintenance, and restore-execution start seams without becoming a generic experimentation platform.
  • Reuse existing enforcement and UX seams - UiEnforcement, ProviderOperationStartGate, OperationRunService, OperationUxPresenter, ProviderOperationStartResultPresenter, AuditRecorder, WorkspaceAuditLogger, and AuditActionId - so the slice stays small, auditable, and server-side enforced.

Technical Context

Language/Version: PHP 8.4 (Laravel 12)
Primary Dependencies: Laravel 12 + Filament v5 + Livewire v4 + Pest; existing UiEnforcement, ProviderOperationStartGate, OperationRunService, AuditRecorder, WorkspaceAuditLogger, AuditActionId, PlatformCapabilities
Storage: PostgreSQL via existing product tables plus one new platform-operated operational_control_activations table; no tenant-owned control tables
Testing: Pest unit + feature tests only
Validation Lanes: fast-feedback, confidence
Target Platform: Sail-backed Laravel admin surfaces under /admin/t/{tenant} and system surfaces under /system
Project Type: web
Performance Goals: effective-control resolution remains DB-only and cheap at action start time, adds no outbound HTTP, and blocks in-scope starts before queue or provider execution begins
Constraints: no generic feature-flag platform, no new browser or heavy-governance suite, no break-glass bypass in v1, no parallel env gate for in-scope controls, global pauses win over workspace pauses, preserve 404 vs 403 semantics, keep provider-specific restore behavior out of platform-core control vocabulary
Scale/Scope: 2 control keys, 2 scope levels (global and workspace), 1 system management surface, and 3 concrete enforcement families across 4 touched UI surfaces
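
The bounded scale above (two control keys, two scope levels) can be pictured with a minimal sketch. The class name mirrors the plan, but the method names, the ControlScope enum, and the labels are assumptions made for illustration, not the delivered code.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: the two scope levels named in the plan.
enum ControlScope: string
{
    case Global = 'global';
    case Workspace = 'workspace';
}

// Hypothetical sketch of a closed catalog; method names and labels are assumptions.
final class OperationalControlCatalog
{
    /** The two first-slice keys named in this plan, mapped to illustrative labels. */
    private const CONTROLS = [
        'findings.lifecycle.backfill' => 'Findings lifecycle backfill',
        'restore.execute' => 'Restore execution',
    ];

    /** @return list<string> */
    public static function keys(): array
    {
        return array_keys(self::CONTROLS);
    }

    public static function has(string $key): bool
    {
        return array_key_exists($key, self::CONTROLS);
    }

    public static function label(string $key): string
    {
        if (! self::has($key)) {
            // Unknown keys are rejected so the catalog cannot widen implicitly.
            throw new InvalidArgumentException("Unknown operational control: {$key}");
        }

        return self::CONTROLS[$key];
    }
}
```

Keeping the keys in one closed list is one way to make "no generic feature-flag platform" checkable: any new key has to be added here deliberately.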

UI / Surface Guardrail Plan

  • Guardrail scope: changed surfaces
  • Native vs custom classification summary: native Filament + shared start/result primitives
  • Shared-family relevance: header actions, runbook launch actions, provider-backed start results, audit-backed control changes
  • State layers in scope: page, detail, action/modal
  • Handling modes by drift class or surface: review-mandatory
  • Repository-signal treatment: review-mandatory
  • Special surface test profiles: standard-native-filament, monitoring-state-page
  • Required tests or manual smoke: functional-core, state-contract
  • Exception path and spread control: none; v1 must not allow a second local runtime-control dialect
  • Active feature PR close-out entry: Guardrail

Shared Pattern & System Fit

  • Cross-cutting feature marker: yes
  • Systems touched: App\Filament\System\Pages\Ops\Runbooks, new system ops controls page, App\Filament\Resources\FindingResource\Pages\ListFindings, App\Filament\Resources\RestoreRunResource, App\Support\Rbac\UiEnforcement, App\Services\Providers\ProviderOperationStartGate, App\Support\OpsUx\OperationUxPresenter, App\Support\OpsUx\ProviderOperationStartResultPresenter, App\Services\Audit\AuditRecorder, App\Services\Audit\WorkspaceAuditLogger, App\Support\Audit\AuditActionId
  • Shared abstractions reused: UiEnforcement, ProviderOperationStartGate, ProviderOperationStartResultPresenter, OperationRunService, OperationUxPresenter, OpsUxBrowserEvents, OperationRunLinks, SystemOperationRunLinks, AuditRecorder, WorkspaceAuditLogger
  • New abstraction introduced? why?: one bounded OperationalControlCatalog plus one OperationalControlEvaluator are justified because the feature now has two real concrete control keys that must evaluate consistently across system-plane and tenant-plane start paths. No registry lattice, provider strategy system, or customer-facing flag DSL is introduced.
  • Why the existing abstraction was sufficient or insufficient: existing abstractions already own auth, queue start UX, and audit writing; they are insufficient because none presently carries a reusable runtime-safety decision that can pause an action before it starts, and WorkspaceAuditLogger alone cannot truthfully own global platform-plane mutations.
  • Bounded deviation / spread control: no deviation is allowed for in-scope controls; every affected surface must route through the shared evaluator rather than direct config(...) reads or page-local booleans.
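
The audit-ownership point above (WorkspaceAuditLogger cannot truthfully own global platform-plane mutations) amounts to a small routing rule. The sketch below is only an illustration: AuditSink and auditSinkFor() are hypothetical stand-ins, and the real AuditRecorder and WorkspaceAuditLogger signatures are not shown in this plan.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the two existing audit writers; only the routing
// rule itself is taken from the plan.
enum AuditSink: string
{
    case AuditRecorder = 'AuditRecorder';               // platform-plane, no workspace owner
    case WorkspaceAuditLogger = 'WorkspaceAuditLogger'; // concrete workspace scope
}

/**
 * Choose the audit writer for a control change or a blocked attempt.
 * A null workspace id is assumed to mean a global change or an all-tenant
 * system-plane attempt.
 */
function auditSinkFor(?int $workspaceId): AuditSink
{
    return $workspaceId === null
        ? AuditSink::AuditRecorder
        : AuditSink::WorkspaceAuditLogger;
}
```

Routing on the presence of a workspace id keeps global entries from being attributed to a workspace that did not own the change.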

OperationRun UX Impact

  • Touches OperationRun start/completion/link UX?: yes
  • Central contract reused: shared OperationRun start UX plus provider-start result helpers
  • Delegated UX behaviors: queued toast, Open operation / View run links, run-enqueued browser event, dedupe-or-blocked messaging, and tenant/workspace-safe URL resolution remain on existing shared paths
  • Surface-owned behavior kept local: initiation inputs, confirmation copy, and control-management forms only
  • Queued DB-notification policy: unchanged explicit opt-in only
  • Terminal notification path: existing central lifecycle mechanism for starts that are allowed
  • Exception path: none

Provider Boundary & Portability Fit

  • Shared provider/platform boundary touched?: yes
  • Provider-owned seams: provider-backed restore.execute dispatch, provider binding resolution, provider reason translation, existing restore safety and dry-run behavior
  • Platform-core seams: operational-control vocabulary, scope/effective-state evaluation, control management surface, audit labels, blocked-state semantics
  • Neutral platform terms / contracts preserved: operational control, activation, effective state, scope, reason, expiry, blocked execution
  • Retained provider-specific semantics and why: restore.execute remains Microsoft-specific provider behavior in the current release because the control feature governs only start allowance, not provider execution semantics
  • Bounded extraction or follow-up path: none in this slice; future catalog growth or provider-neutral expansions require a follow-up spec instead of implicit widening here

Constitution Check

GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.

  • Read/write separation: PASS - control management is an explicit platform-plane mutation with confirmation, audit, and focused tests; blocked execution paths remain non-mutating except for audit logging.
  • RBAC-UX: PASS - platform management stays on /system; tenant/admin execution surfaces stay on /admin/t/{tenant}; cross-plane access remains 404; entitled-but-paused users get explicit control feedback while membership and capability failures keep 404/403 semantics.
  • Workspace isolation / tenant isolation: PASS - workspace-targeted controls apply only within the chosen workspace; tenant surfaces still resolve tenant/workspace entitlement before control-state disclosure.
  • Run observability / Ops-UX: PASS - allowed starts reuse existing OperationRun paths; blocked starts create no run and no new lifecycle dialect; later control activation does not retroactively mutate already accepted runs; shared start/result helpers remain authoritative.
  • Shared path reuse / XCUT-001: PASS - the design extends existing UI enforcement, provider-start gating, audit logging, and operation start UX instead of introducing page-local flags.
  • Provider boundary / PROV-001: PASS - control language stays provider-neutral while restore execution remains provider-owned.
  • Proportionality / PROP-001 and ABSTR-001: PASS - the only new structure is justified by two current-release controls and three existing enforcement families; no experimentation platform or generalized remote-config system is planned.
  • Persisted truth / PERSIST-001: PASS - active control activations represent independent runtime-safety truth with their own scope, reason, expiry, and audit obligations; convenience UI state remains derived.
  • Behavioral state / STATE-001: PASS - paused/enabled semantics change whether execution may start and therefore justify one bounded effective-state model.
  • Filament-native UI / UI-FIL-001: PASS - all touched surfaces remain native Filament pages/resources/actions; no custom UI framework is introduced.
  • Global search rule: N/A - no new globally searchable resource is added.
  • Panel/provider registration: PASS - Filament v5 remains on Livewire v4 and no new panel/provider registration is required; Laravel 12 provider registration stays in bootstrap/providers.php if any provider change becomes necessary.
  • Test governance / TEST-GOV-001: PASS - proof stays in focused unit and feature lanes with no browser or heavy-governance expansion.
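
The RBAC-UX gate above implies an ordering: membership is resolved first, capability second, and control state is disclosed only to entitled users. A minimal sketch of that ordering follows; StartOutcome and resolveStartOutcome() are hypothetical names, not existing helpers.

```php
<?php
declare(strict_types=1);

// Hypothetical outcome type for an attempted start on a tenant surface.
enum StartOutcome: string
{
    case NotFound = '404';   // non-member or cross-plane access
    case Forbidden = '403';  // member without the required capability
    case Paused = 'paused';  // entitled, but an operational control is active
    case Allowed = 'allowed';
}

/**
 * Resolve the outcome in a fixed order so that paused-state feedback is never
 * shown to someone who would otherwise receive 404 or 403.
 */
function resolveStartOutcome(bool $isMember, bool $hasCapability, bool $controlPaused): StartOutcome
{
    return match (true) {
        ! $isMember => StartOutcome::NotFound,
        ! $hasCapability => StartOutcome::Forbidden,
        $controlPaused => StartOutcome::Paused,
        default => StartOutcome::Allowed,
    };
}
```

Because the first matching arm wins, an active pause cannot turn a 404 or 403 into paused-state helper text.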

Test Governance Check

  • Test purpose / classification by changed surface: Unit for catalog/evaluator/scope precedence/expiry logic; Feature for system control management, runbook enforcement, findings header-action enforcement, restore-execution enforcement, audit logging, and 404/403 semantics
  • Affected validation lanes: fast-feedback, confidence
  • Why this lane mix is the narrowest sufficient proof: the business truth is server-side effective-state resolution plus enforcement at existing Filament and service seams. Browser tests would duplicate modal choreography without proving additional runtime safety truth.
  • Narrowest proving command(s):
    • export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php
    • export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php
    • export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php
  • Fixture / helper / factory / seed / context cost risks: add one local factory for active control activations plus platform-user and workspace-scoped setup helpers reused only by operational-control tests; avoid new shared browser or provider-fixture defaults
  • Expensive defaults or shared helper growth introduced?: no; control fixtures stay opt-in and local to the new test family
  • Heavy-family additions, promotions, or visibility changes: none
  • Surface-class relief / special coverage rule: standard-native-filament and monitoring-state-page relief are sufficient; assert disabled/blocked behavior and no side effects instead of browser-only choreography
  • Closing validation and reviewer handoff: reviewers should
    • rerun the targeted unit/feature commands
    • verify the env gate is removed from the in-scope findings action
    • confirm restore execution is blocked before queue/provider start
    • confirm blocked-execution audit entries exist for runbook/findings/restore paths
    • confirm global control changes audit without false workspace ownership
    • confirm /system/ops/controls returns 403 for system users missing platform.ops.controls.manage
    • confirm non-members still receive 404 while missing capabilities still receive 403 with the existing capability-denied UX rather than paused-state helper text
  • Budget / baseline / trend follow-up: low-to-moderate increase in focused unit/feature coverage only
  • Review-stop questions: did implementation add a second control persistence shape, leave the env gate in place, introduce a local blocked-state dialect, or widen into browser/heavy-governance lanes?
  • Escalation path: reject-or-split if the implementation widens into generic feature-flagging or customer-managed controls; document-in-feature for small shared-helper extensions that remain local to this slice
  • Active feature PR close-out entry: Guardrail
  • Why no dedicated follow-up spec is needed: the planned new model, evaluator, and tests stay local to the first-slice control family; recurring growth beyond the two bounded control keys would require its own follow-up spec
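
One of the review-stop questions above is whether the env gate was left in place. A source-scanning guard for that could look roughly like the sketch below. It is not the content of NoAdHocOperationalControlBypassTest: the function name is hypothetical, and it takes file contents as an array so the example stays self-contained.

```php
<?php
declare(strict_types=1);

/**
 * Return the paths whose contents still mention the retired env gate.
 * The default needle is the gate name from the Summary; how the real guard
 * collects files and what patterns it matches are not described in this plan.
 *
 * @param array<string, string> $sources file path => file contents
 * @return list<string>
 */
function findAdHocGateReferences(array $sources, string $needle = 'allow_admin_maintenance_actions'): array
{
    $offenders = [];

    foreach ($sources as $path => $contents) {
        if (str_contains($contents, $needle)) {
            $offenders[] = $path;
        }
    }

    sort($offenders); // stable, readable failure output

    return $offenders;
}
```

A guard test would then assert that the returned list is empty for the in-scope surfaces, which fails as soon as someone reintroduces a direct read of the old gate.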

Project Structure

Documentation (this feature)

specs/242-operational-controls/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── checklists/
│   └── requirements.md
├── contracts/
│   └── operational-controls.contract.yaml
└── tasks.md

Source Code (repository root)

apps/platform/
├── app/
│   ├── Filament/System/Pages/Ops/
│   │   ├── Controls.php
│   │   └── Runbooks.php
│   ├── Filament/Resources/FindingResource/Pages/ListFindings.php
│   ├── Filament/Resources/RestoreRunResource.php
│   ├── Models/
│   │   └── OperationalControlActivation.php
│   ├── Services/Audit/AuditRecorder.php
│   ├── Services/Audit/WorkspaceAuditLogger.php
│   ├── Services/Providers/ProviderOperationStartGate.php
│   ├── Support/Audit/AuditActionId.php
│   ├── Support/Auth/PlatformCapabilities.php
│   └── Support/OperationalControls/
│       ├── OperationalControlCatalog.php
│       ├── OperationalControlDecision.php
│       └── OperationalControlEvaluator.php
├── database/
│   ├── factories/
│   │   └── OperationalControlActivationFactory.php
│   └── migrations/
│       └── *_create_operational_control_activations_table.php
└── tests/
    ├── Feature/
    │   ├── Findings/OperationalControlFindingsBackfillGateTest.php
    │   ├── OperationalControls/
    │   │   ├── NoAdHocOperationalControlBypassTest.php
    │   │   └── OperationalControlAuthorizationSemanticsTest.php
    │   ├── Restore/OperationalControlRestoreExecutionGateTest.php
    │   ├── System/OpsControls/OperationalControlManagementTest.php
    │   └── System/OpsRunbooks/OperationalControlRunbookGateTest.php
    └── Unit/Support/OperationalControls/
        ├── OperationalControlCatalogTest.php
        ├── OperationalControlEvaluatorTest.php
        └── OperationalControlScopeResolutionTest.php

Structure Decision: Single Laravel web application. The feature adds one bounded platform-operated model and one small support namespace for operational-control evaluation, then plugs that into existing system and tenant Filament surfaces.

Complexity Tracking

No unapproved constitution violations are required. The only new persistence and abstraction are the justified control-activation record plus evaluator/catalog pair described below.

Proportionality Review

  • Current operator problem: founders and platform operators need a safe runtime way to pause already-existing risky actions without editing environment variables or relying on inconsistent per-surface logic.
  • Existing structure is insufficient because: UiEnforcement decides RBAC, ProviderOperationStartGate decides provider readiness, and env flags decide hidden page-local runtime behavior. None of those alone gives one auditable runtime-safety truth across both system and tenant surfaces.
  • Narrowest correct implementation: persist only explicit active control activations, derive the enabled state from absence of an activation, evaluate one effective decision through a shared catalog/evaluator, and wire that into the three concrete existing start paths.
  • Ownership cost created: one new table/model/factory, one small support namespace, one system page, new audit action IDs and capability constants, and focused unit/feature coverage.
  • Alternative intentionally rejected: keep env/config flags, reuse workspace settings, or build a generalized feature-flag system. Env/config flags are invisible product truth, workspace settings do not cleanly represent one global control truth, and a generic flag platform is far too broad.
  • Release truth: current-release truth
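
The "narrowest correct implementation" above can be sketched in a few lines: the enabled state is simply the absence of an unexpired activation, and a global activation takes precedence over a workspace one. The Activation class and effectivePause() below are hypothetical shapes for illustration; the delivered evaluator and its OperationalControlDecision type may differ.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory shape of one activation row.
final class Activation
{
    public function __construct(
        public readonly string $controlKey,
        public readonly ?int $workspaceId, // null means global scope
        public readonly string $reason,
        public readonly ?DateTimeImmutable $expiresAt = null,
    ) {
    }

    public function isActiveAt(DateTimeImmutable $now): bool
    {
        // No expiry means the pause stays active until it is lifted.
        return $this->expiresAt === null || $this->expiresAt > $now;
    }
}

/**
 * Return the activation that blocks the control, or null when it is enabled.
 *
 * @param list<Activation> $activations
 */
function effectivePause(array $activations, string $controlKey, int $workspaceId, DateTimeImmutable $now): ?Activation
{
    $workspaceMatch = null;

    foreach ($activations as $activation) {
        if ($activation->controlKey !== $controlKey || ! $activation->isActiveAt($now)) {
            continue;
        }

        if ($activation->workspaceId === null) {
            return $activation; // global pauses win over workspace pauses
        }

        if ($activation->workspaceId === $workspaceId) {
            $workspaceMatch = $activation;
        }
    }

    return $workspaceMatch;
}
```

Nothing stores an "enabled" flag here; a null result is the enabled state, which is what keeps the persistence shape to active rows only.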

Phase 0 — Research (output: research.md)

See: /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/research.md

Goals:

  • Confirm the narrowest persistence shape for runtime-safety truth and explicitly reject env-only or workspace-settings-only alternatives.
  • Confirm the smallest shared seam where control evaluation belongs for system runbooks, tenant findings lifecycle backfill, and provider-backed restore execution.
  • Define v1 scoping, global-first precedence, expiry, and audit expectations without inventing a generic flag taxonomy.
  • Document the v1 decision that break-glass and broad platform capabilities do not bypass an active operational control.

Phase 1 — Design & Contracts (outputs: data-model.md, contracts/, quickstart.md)

See:

  • /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/data-model.md
  • /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/contracts/operational-controls.contract.yaml
  • /Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/quickstart.md

Design focus:

  • Add one platform-operated activation record that can pause a control globally or for one workspace, with optional expiry, auditable reason, global-first precedence, and partial unique indexes that enforce one active global row per control and one active workspace row per control/workspace pair; the write path deletes expired conflicting rows before inserting a new activation, and this table is not used as an archive.
  • Add one new system ops controls page that lists the two bounded control keys, their effective state, scope, owner, expiry, change actions, and on-demand audit history links, and uses a staged scope-impact preview before control mutations are confirmed.
  • Use OperationalControlDecision as the shared control-state presentation primitive for controls, runbooks, findings, and restore surfaces.
  • Route findings.lifecycle.backfill through the new evaluator in both ListFindings and Runbooks, removing the existing env gate.
  • Route findings.lifecycle.backfill through FindingsLifecycleBackfillRunbookService::start() so the system runbooks page, tenant findings page, CLI command, and deploy-hook command all honor the same control decision.
  • Route restore.execute through the same evaluator before provider-backed or non-provider-backed queued restore execution is created.
  • Add dedicated audit action IDs and a dedicated platform capability for control management, using AuditRecorder for global control changes and blocked system-plane all-tenant attempts, and WorkspaceAuditLogger for workspace/tenant-scoped changes and blocked-execution evidence with concrete scope.
  • Keep blocked-state messaging on existing shared start/result helpers and avoid custom control-state UI frameworks.
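
The activation write path described in the first design-focus item (delete expired conflicting rows, then insert, with at most one active row per control and scope) can be simulated in memory. ActivationStore is a hypothetical stand-in for the table and its partial unique indexes; the real migration and model are not reproduced here.

```php
<?php
declare(strict_types=1);

// Hypothetical in-memory stand-in for operational_control_activations.
final class ActivationStore
{
    /** @var list<array{key: string, workspaceId: ?int, expiresAt: ?DateTimeImmutable}> */
    private array $rows = [];

    public function activate(string $key, ?int $workspaceId, ?DateTimeImmutable $expiresAt, DateTimeImmutable $now): void
    {
        // Step 1: delete expired rows occupying the same (control, scope) slot,
        // since the table is not used as an archive.
        $this->rows = array_values(array_filter(
            $this->rows,
            fn (array $row): bool => ! (
                $row['key'] === $key
                && $row['workspaceId'] === $workspaceId
                && $row['expiresAt'] !== null
                && $row['expiresAt'] <= $now
            ),
        ));

        // Step 2: mimic the unique index by refusing a second row in the slot.
        foreach ($this->rows as $row) {
            if ($row['key'] === $key && $row['workspaceId'] === $workspaceId) {
                throw new DomainException('An active activation already exists for this control and scope.');
            }
        }

        // Step 3: insert the new activation.
        $this->rows[] = ['key' => $key, 'workspaceId' => $workspaceId, 'expiresAt' => $expiresAt];
    }

    public function count(): int
    {
        return count($this->rows);
    }
}
```

A null workspace id stands for the global slot, so a global pause and a workspace pause for the same control can coexist while duplicates within one slot are refused.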

Phase 1 — Agent Context Update

After Phase 1 artifacts are generated, update Copilot context from the plan:

  • /Users/ahmeddarrazi/Documents/projects/wt-plattform/.specify/scripts/bash/update-agent-context.sh copilot

Phase 2 — Implementation Outline (tasks created in /speckit.tasks)

  • Add the operational_control_activations persistence, model, and local factory for active pause records.
  • Introduce the bounded operational-controls support namespace (OperationalControlCatalog, OperationalControlDecision, OperationalControlEvaluator) and keep the enabled state derived from the absence of an active row.
  • Add the dedicated controls-manage capability and its local grant path in the seeded platform operator setup.
  • Add the system-plane controls page and wire it into the existing system ops navigation with staged preview-plus-confirm pause/resume actions, audit logging, and on-demand audit history links.
  • Replace the findings env gate with evaluator-driven control checks on the tenant findings header action and the system runbooks start path.
  • Integrate the same evaluator into restore execution before any queued execution OperationRun, queued execution RestoreRun, queue dispatch, or provider-backed execution starts.
  • Add focused unit and feature tests, plus a guard test that blocks new ad-hoc runtime-control bypasses for in-scope controls and one test proving that activating a control does not rewrite previously accepted runs.
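
Two of the outline items above describe ordering guarantees: the control check runs before any run record or queue dispatch exists, and a later pause leaves accepted runs untouched. The sketch below illustrates both with a hypothetical RestoreStarter; it does not reflect the real OperationRunService or RestoreRun APIs.

```php
<?php
declare(strict_types=1);

// Hypothetical start seam: run records and dispatches are plain arrays here.
final class RestoreStarter
{
    /** @var list<string> statuses of accepted runs, indexed by run id */
    public array $runs = [];

    /** @var list<int> run ids handed to the queue */
    public array $dispatched = [];

    /** @param Closure(): bool $isPaused stand-in for the shared evaluator */
    public function __construct(private readonly Closure $isPaused)
    {
    }

    /** Return the new run id, or null when the start was blocked. */
    public function start(): ?int
    {
        // The control check comes first: a blocked start creates nothing.
        if (($this->isPaused)()) {
            return null;
        }

        $this->runs[] = 'queued';
        $runId = array_key_last($this->runs);
        $this->dispatched[] = $runId;

        return $runId;
    }
}
```

Since the gate only guards start(), pausing a control afterwards has no code path that could rewrite rows already in $runs.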

Constitution Check (Post-Design)

Re-check target: PASS. The post-design shape must still use one bounded control catalog, one active-row persistence model, one evaluator, existing auth/start/audit helpers, and no second runtime-control dialect.

Implementation Close-out

  • Delivered the bounded operational-controls slice end-to-end: one operational_control_activations truth model, one catalog/evaluator/decision support path, a new /system/ops/controls management page, findings lifecycle enforcement through FindingsLifecycleBackfillRunbookService::start(), and restore execution blocking before any queued execution OperationRun, queued execution RestoreRun, job dispatch, or provider-backed start.
  • Runtime cleanup landed with the in-scope findings env gate removed from config/tenantpilot.php, a source-scanning guard against ad-hoc bypasses, and workspace-isolation proof showing a workspace-scoped pause blocks only the targeted workspace while a second workspace remains unaffected.
  • Validation passed on the narrow feature lane: export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php tests/Feature/Filament/Spec113/AdminFindingsNoMaintenanceActionsTest.php tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php with 20 passed (253 assertions).
  • Formatting passed with export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent.
  • Manual smoke passed in the integrated browser once the local database had been brought up to date: the staged pause/resume flow on /system/ops/controls for Findings lifecycle backfill rendered scope-impact previews, applied the global pause, and returned to Enabled within the SC-001 budget.