TenantAtlas/specs/242-operational-controls/plan.md
ahmido d96abc65fb
Remove Findings lifecycle backfill operational surface (controls slice) (#280)
Removes the Findings lifecycle backfill from the Operational Controls UI and OperationalControlCatalog.

This patch is a safe, controls-only change; runbooks, jobs and other runtime artifacts are NOT removed yet. Follow-up work will delete the runbook service/scope, jobs, commands, and update tests.

Files changed:
- apps/platform/app/Filament/System/Pages/Ops/Controls.php
- apps/platform/app/Support/OperationalControls/OperationalControlCatalog.php
- apps/platform/tests/Feature/System/OpsControls/OperationalControlManagementTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php
- apps/platform/tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #280
2026-04-26 15:43:47 +00:00

# Implementation Plan: Operational Controls
**Branch**: `242-operational-controls` | **Date**: 2026-04-26 | **Spec**: `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md`
**Input**: Feature specification from `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/spec.md`
**Note**: This template is filled in by the `/speckit.plan` command. See `.specify/scripts/` for helper scripts.
## Summary
- Replace the ad-hoc `allow_admin_maintenance_actions` environment gate with one product-owned operational-control path for the first-slice keys `findings.lifecycle.backfill` and `restore.execute`.
- Introduce one platform-operated activation record plus one shared evaluator that plugs into the existing system runbook, tenant findings-maintenance, and restore-execution start seams without becoming a generic experimentation platform.
- Reuse existing enforcement and UX seams - `UiEnforcement`, `ProviderOperationStartGate`, `OperationRunService`, `OperationUxPresenter`, `ProviderOperationStartResultPresenter`, `AuditRecorder`, `WorkspaceAuditLogger`, and `AuditActionId` - so the slice stays small, auditable, and server-side enforced.
## Technical Context
**Language/Version**: PHP 8.4 (Laravel 12)
**Primary Dependencies**: Laravel 12 + Filament v5 + Livewire v4 + Pest; existing `UiEnforcement`, `ProviderOperationStartGate`, `OperationRunService`, `AuditRecorder`, `WorkspaceAuditLogger`, `AuditActionId`, `PlatformCapabilities`
**Storage**: PostgreSQL via existing product tables plus one new platform-operated `operational_control_activations` table; no tenant-owned control tables
**Testing**: Pest unit + feature tests only
**Validation Lanes**: fast-feedback, confidence
**Target Platform**: Sail-backed Laravel admin surfaces under `/admin/t/{tenant}` and system surfaces under `/system`
**Project Type**: web
**Performance Goals**: effective-control resolution remains DB-only and cheap at action start time, adds no outbound HTTP, and blocks in-scope starts before queue or provider execution begins
**Constraints**: no generic feature-flag platform, no new browser or heavy-governance suite, no break-glass bypass in v1, no parallel env gate for in-scope controls, global pauses win over workspace pauses, preserve 404 vs 403 semantics, keep provider-specific restore behavior out of platform-core control vocabulary
**Scale/Scope**: 2 control keys, 2 scope levels (global and workspace), 1 system management surface, and 3 concrete enforcement families across 4 touched UI surfaces
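The precedence constraint above (global pauses win over workspace pauses, and the enabled state is simply the absence of an active row) can be illustrated with a minimal, language-neutral sketch. This is not the project's PHP evaluator: `Activation`, `Decision`, and `evaluate` are hypothetical names chosen for illustration, and expiry handling is assumed to mean "an activation whose expiry has passed no longer blocks".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class Activation:
    """Hypothetical stand-in for one active pause record."""
    control_key: str
    workspace_id: Optional[int]          # None means global scope
    reason: str
    expires_at: Optional[datetime] = None


@dataclass(frozen=True)
class Decision:
    """Hypothetical stand-in for the effective-state result."""
    allowed: bool
    scope: Optional[str] = None
    reason: Optional[str] = None


def is_live(activation: Activation, now: datetime) -> bool:
    # An activation without expiry stays live until it is removed.
    return activation.expires_at is None or activation.expires_at > now


def evaluate(control_key: str, workspace_id: Optional[int],
             activations: Sequence[Activation], now: datetime) -> Decision:
    live = [a for a in activations
            if a.control_key == control_key and is_live(a, now)]
    # Global pauses are checked first so they always win.
    for a in live:
        if a.workspace_id is None:
            return Decision(False, "global", a.reason)
    # Workspace pauses only apply to the workspace they target.
    for a in live:
        if workspace_id is not None and a.workspace_id == workspace_id:
            return Decision(False, "workspace", a.reason)
    # No live activation: the control is enabled by absence.
    return Decision(True)
```

Because the function only reads the rows it is handed, the sketch stays consistent with the performance goal that resolution is DB-only and makes no outbound calls.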
## UI / Surface Guardrail Plan
- **Guardrail scope**: changed surfaces
- **Native vs custom classification summary**: native Filament + shared start/result primitives
- **Shared-family relevance**: header actions, runbook launch actions, provider-backed start results, audit-backed control changes
- **State layers in scope**: page, detail, action/modal
- **Handling modes by drift class or surface**: review-mandatory
- **Repository-signal treatment**: review-mandatory
- **Special surface test profiles**: standard-native-filament, monitoring-state-page
- **Required tests or manual smoke**: functional-core, state-contract
- **Exception path and spread control**: none; v1 must not allow a second local runtime-control dialect
- **Active feature PR close-out entry**: Guardrail
## Shared Pattern & System Fit
- **Cross-cutting feature marker**: yes
- **Systems touched**: `App\Filament\System\Pages\Ops\Runbooks`, new system ops controls page, `App\Filament\Resources\FindingResource\Pages\ListFindings`, `App\Filament\Resources\RestoreRunResource`, `App\Support\Rbac\UiEnforcement`, `App\Services\Providers\ProviderOperationStartGate`, `App\Support\OpsUx\OperationUxPresenter`, `App\Support\OpsUx\ProviderOperationStartResultPresenter`, `App\Services\Audit\AuditRecorder`, `App\Services\Audit\WorkspaceAuditLogger`, `App\Support\Audit\AuditActionId`
- **Shared abstractions reused**: `UiEnforcement`, `ProviderOperationStartGate`, `ProviderOperationStartResultPresenter`, `OperationRunService`, `OperationUxPresenter`, `OpsUxBrowserEvents`, `OperationRunLinks`, `SystemOperationRunLinks`, `AuditRecorder`, `WorkspaceAuditLogger`
- **New abstraction introduced? why?**: one bounded `OperationalControlCatalog` plus one `OperationalControlEvaluator` are justified because the feature now has two real concrete control keys that must evaluate consistently across system-plane and tenant-plane start paths. No registry lattice, provider strategy system, or customer-facing flag DSL is introduced.
- **Why the existing abstraction was sufficient or insufficient**: existing abstractions already own auth, queue start UX, and audit writing; they are insufficient because none presently carries a reusable runtime-safety decision that can pause an action before it starts, and `WorkspaceAuditLogger` alone cannot truthfully own global platform-plane mutations.
- **Bounded deviation / spread control**: no deviation is allowed for in-scope controls; every affected surface must route through the shared evaluator rather than direct `config(...)` reads or page-local booleans.
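The audit-ownership point above (a workspace-scoped logger cannot truthfully own global platform-plane mutations) reduces to a small routing rule. The sketch below is an assumption-laden illustration, not the real `AuditRecorder` or `WorkspaceAuditLogger` API: `AuditEntry`, `route_audit`, and the action string are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    writer: str                    # "platform" or "workspace" audit path
    action: str
    workspace_id: Optional[int]    # stays None for global changes


def route_audit(action: str, workspace_id: Optional[int]) -> AuditEntry:
    """Pick the audit path from the scope of the control change."""
    if workspace_id is None:
        # Global change: record on the platform path with no workspace
        # attached, so no workspace is falsely shown as the owner.
        return AuditEntry("platform", action, None)
    # Workspace-scoped change: record against the concrete workspace.
    return AuditEntry("workspace", action, workspace_id)
```

For example, `route_audit("operational_control.paused", None)` yields an entry with `writer == "platform"` and no workspace, while passing a workspace id routes to the workspace path.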
## OperationRun UX Impact
- **Touches OperationRun start/completion/link UX?**: yes
- **Central contract reused**: shared OperationRun start UX plus provider-start result helpers
- **Delegated UX behaviors**: queued toast, `Open operation` / `View run` links, run-enqueued browser event, dedupe-or-blocked messaging, and tenant/workspace-safe URL resolution remain on existing shared paths
- **Surface-owned behavior kept local**: initiation inputs, confirmation copy, and control-management forms only
- **Queued DB-notification policy**: unchanged explicit opt-in only
- **Terminal notification path**: existing central lifecycle mechanism for starts that are allowed
- **Exception path**: none
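A blocked start is expected to return before any run record or queue dispatch exists, while allowed starts continue on the shared start path. A minimal sketch of that ordering follows, assuming a hypothetical in-memory run store and dispatcher; none of these names come from `OperationRunService`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class StartResult:
    status: str                    # "queued" or "blocked"
    run_id: Optional[int] = None
    reason: Optional[str] = None


@dataclass
class FakeRunStore:
    """Hypothetical in-memory replacement for run persistence."""
    runs: List[str] = field(default_factory=list)

    def create(self, operation: str) -> int:
        self.runs.append(operation)
        return len(self.runs)


def start_operation(operation: str, paused_reason: Optional[str],
                    runs: FakeRunStore,
                    dispatch: Callable[[int], None]) -> StartResult:
    # The control check comes first: a paused control returns a blocked
    # result without creating a run or touching the queue.
    if paused_reason is not None:
        return StartResult("blocked", reason=paused_reason)
    run_id = runs.create(operation)
    dispatch(run_id)
    return StartResult("queued", run_id=run_id)
```

Keeping the check ahead of `runs.create` is what makes "blocked starts create no run" something a test can assert directly on the store and the dispatcher.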
## Provider Boundary & Portability Fit
- **Shared provider/platform boundary touched?**: yes
- **Provider-owned seams**: provider-backed `restore.execute` dispatch, provider binding resolution, provider reason translation, existing restore safety and dry-run behavior
- **Platform-core seams**: operational-control vocabulary, scope/effective-state evaluation, control management surface, audit labels, blocked-state semantics
- **Neutral platform terms / contracts preserved**: operational control, activation, effective state, scope, reason, expiry, blocked execution
- **Retained provider-specific semantics and why**: `restore.execute` remains Microsoft-specific provider behavior in the current release because the control feature governs only start allowance, not provider execution semantics
- **Bounded extraction or follow-up path**: none in this slice; future catalog growth or provider-neutral expansions require a follow-up spec instead of implicit widening here
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
- Read/write separation: PASS - control management is an explicit platform-plane mutation with confirmation, audit, and focused tests; blocked execution paths remain non-mutating except for audit logging.
- RBAC-UX: PASS - platform management stays on `/system`; tenant/admin execution surfaces stay on `/admin/t/{tenant}`; cross-plane access remains 404; entitled-but-paused users get explicit control feedback while membership and capability failures keep 404/403 semantics.
- Workspace isolation / tenant isolation: PASS - workspace-targeted controls apply only within the chosen workspace; tenant surfaces still resolve tenant/workspace entitlement before control-state disclosure.
- Run observability / Ops-UX: PASS - allowed starts reuse existing `OperationRun` paths; blocked starts create no run and no new lifecycle dialect; later control activation does not retroactively mutate already accepted runs; shared start/result helpers remain authoritative.
- Shared path reuse / `XCUT-001`: PASS - the design extends existing UI enforcement, provider-start gating, audit logging, and operation start UX instead of introducing page-local flags.
- Provider boundary / `PROV-001`: PASS - control language stays provider-neutral while restore execution remains provider-owned.
- Proportionality / `PROP-001` and `ABSTR-001`: PASS - the only new structure is justified by two current-release controls and three existing enforcement families; no experimentation platform or generalized remote-config system is planned.
- Persisted truth / `PERSIST-001`: PASS - active control activations represent independent runtime-safety truth with their own scope, reason, expiry, and audit obligations; convenience UI state remains derived.
- Behavioral state / `STATE-001`: PASS - paused/enabled semantics change whether execution may start and therefore justify one bounded effective-state model.
- Filament-native UI / `UI-FIL-001`: PASS - all touched surfaces remain native Filament pages/resources/actions; no custom UI framework is introduced.
- Global search rule: N/A - no new globally searchable resource is added.
- Panel/provider registration: PASS - Filament v5 remains on Livewire v4 and no new panel/provider registration is required; Laravel 12 provider registration stays in `bootstrap/providers.php` if any provider change becomes necessary.
- Test governance / `TEST-GOV-001`: PASS - proof stays in focused unit and feature lanes with no browser or heavy-governance expansion.
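The RBAC-UX and isolation checks above imply an evaluation order: entitlement is resolved before any control state is disclosed. A minimal sketch of that order, with hypothetical names (`Outcome`, `resolve_start`) and boolean inputs standing in for the real membership, capability, and evaluator lookups:

```python
from enum import Enum


class Outcome(Enum):
    NOT_FOUND = 404      # non-member or cross-plane access
    FORBIDDEN = 403      # member without the required capability
    PAUSED = "paused"    # entitled, but the control blocks the start
    ALLOWED = "allowed"


def resolve_start(is_member: bool, has_capability: bool,
                  control_paused: bool) -> Outcome:
    # Membership first: non-members must not learn anything, including
    # whether a control is currently paused.
    if not is_member:
        return Outcome.NOT_FOUND
    # Capability second: keep the existing capability-denied UX rather
    # than paused-state helper text.
    if not has_capability:
        return Outcome.FORBIDDEN
    # Only entitled users see explicit control feedback.
    if control_paused:
        return Outcome.PAUSED
    return Outcome.ALLOWED
```

With this ordering, `resolve_start(False, True, True)` still yields `Outcome.NOT_FOUND`, so a pause never leaks to users who lack entitlement.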
## Test Governance Check
- **Test purpose / classification by changed surface**: Unit for catalog/evaluator/scope precedence/expiry logic; Feature for system control management, runbook enforcement, findings header-action enforcement, restore-execution enforcement, audit logging, and `404`/`403` semantics
- **Affected validation lanes**: fast-feedback, confidence
- **Why this lane mix is the narrowest sufficient proof**: the business truth is server-side effective-state resolution plus enforcement at existing Filament and service seams. Browser tests would duplicate modal choreography without proving additional runtime safety truth.
- **Narrowest proving command(s)**:
- `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php`
- `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php`
- `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php`
- **Fixture / helper / factory / seed / context cost risks**: add one local factory for active control activations plus platform-user and workspace-scoped setup helpers reused only by operational-control tests; avoid new shared browser or provider-fixture defaults
- **Expensive defaults or shared helper growth introduced?**: no; control fixtures stay opt-in and local to the new test family
- **Heavy-family additions, promotions, or visibility changes**: none
- **Surface-class relief / special coverage rule**: standard-native-filament and monitoring-state-page relief are sufficient; assert disabled/blocked behavior and no side effects instead of browser-only choreography
- **Closing validation and reviewer handoff**: reviewers should rerun the targeted unit/feature commands, verify the env gate is removed from the in-scope findings action, confirm restore execution is blocked before queue/provider start, confirm blocked-execution audit entries exist for runbook/findings/restore paths, confirm global control changes audit without false workspace ownership, confirm `/system/ops/controls` returns 403 for system users missing `platform.ops.controls.manage`, and confirm non-members still receive 404 while missing capabilities still receive 403 with the existing capability-denied UX rather than paused-state helper text
- **Budget / baseline / trend follow-up**: low-to-moderate increase in focused unit/feature coverage only
- **Review-stop questions**: did implementation add a second control persistence shape, leave the env gate in place, introduce a local blocked-state dialect, or widen into browser/heavy-governance lanes?
- **Escalation path**: `reject-or-split` if the implementation widens into generic feature-flagging or customer-managed controls; `document-in-feature` for small shared-helper extensions that remain local to this slice
- **Active feature PR close-out entry**: Guardrail
- **Why no dedicated follow-up spec is needed**: the planned new model, evaluator, and tests stay local to the first-slice control family; recurring growth beyond the two bounded control keys would require its own follow-up spec
## Project Structure
### Documentation (this feature)
```text
specs/242-operational-controls/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── checklists/
│ └── requirements.md
├── contracts/
│ └── operational-controls.contract.yaml
└── tasks.md
```
### Source Code (repository root)
```text
apps/platform/
├── app/
│ ├── Filament/System/Pages/Ops/
│ │ ├── Controls.php
│ │ └── Runbooks.php
│ ├── Filament/Resources/FindingResource/Pages/ListFindings.php
│ ├── Filament/Resources/RestoreRunResource.php
│ ├── Models/
│ │ └── OperationalControlActivation.php
│ ├── Services/Audit/AuditRecorder.php
│ ├── Services/Audit/WorkspaceAuditLogger.php
│ ├── Services/Providers/ProviderOperationStartGate.php
│ ├── Support/Audit/AuditActionId.php
│ ├── Support/Auth/PlatformCapabilities.php
│ └── Support/OperationalControls/
│ ├── OperationalControlCatalog.php
│ ├── OperationalControlDecision.php
│ └── OperationalControlEvaluator.php
├── database/
│ ├── factories/
│ │ └── OperationalControlActivationFactory.php
│ └── migrations/
│ └── *_create_operational_control_activations_table.php
└── tests/
├── Feature/
│ ├── Findings/OperationalControlFindingsBackfillGateTest.php
│ ├── OperationalControls/
│ │ ├── NoAdHocOperationalControlBypassTest.php
│ │ └── OperationalControlAuthorizationSemanticsTest.php
│ ├── Restore/OperationalControlRestoreExecutionGateTest.php
│ ├── System/OpsControls/OperationalControlManagementTest.php
│ └── System/OpsRunbooks/OperationalControlRunbookGateTest.php
└── Unit/Support/OperationalControls/
├── OperationalControlCatalogTest.php
├── OperationalControlEvaluatorTest.php
└── OperationalControlScopeResolutionTest.php
```
**Structure Decision**: Single Laravel web application. The feature adds one bounded platform-operated model and one small support namespace for operational-control evaluation, then plugs that into existing system and tenant Filament surfaces.
## Complexity Tracking
No unapproved constitution violations are required. The only new persistence and abstraction are the justified control-activation record plus evaluator/catalog pair described below.
## Proportionality Review
- **Current operator problem**: founders and platform operators need a safe runtime way to pause already-existing risky actions without editing environment variables or relying on inconsistent per-surface logic.
- **Existing structure is insufficient because**: `UiEnforcement` decides RBAC, `ProviderOperationStartGate` decides provider readiness, and env flags decide hidden page-local runtime behavior. None of those alone gives one auditable runtime-safety truth across both system and tenant surfaces.
- **Narrowest correct implementation**: persist only explicit active control activations, derive the enabled state from absence of an activation, evaluate one effective decision through a shared catalog/evaluator, and wire that into the three concrete existing start paths.
- **Ownership cost created**: one new table/model/factory, one small support namespace, one system page, new audit action IDs and capability constants, and focused unit/feature coverage.
- **Alternative intentionally rejected**: keep env/config flags, reuse workspace settings, or build a generalized feature-flag system. Env/config flags are invisible product truth, workspace settings do not cleanly represent one global control truth, and a generic flag platform is far too broad.
- **Release truth**: current-release truth
## Phase 0 — Research (output: `research.md`)
See: `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/research.md`
Goals:
- Confirm the narrowest persistence shape for runtime-safety truth and explicitly reject env-only or workspace-settings-only alternatives.
- Confirm the smallest shared seam where control evaluation belongs for system runbooks, tenant findings lifecycle backfill, and provider-backed restore execution.
- Define v1 scoping, global-first precedence, expiry, and audit expectations without inventing a generic flag taxonomy.
- Document the v1 decision that break-glass and broad platform capabilities do not bypass an active operational control.
## Phase 1 — Design & Contracts (outputs: `data-model.md`, `contracts/`, `quickstart.md`)
See:
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/data-model.md`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/contracts/operational-controls.contract.yaml`
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/specs/242-operational-controls/quickstart.md`
Design focus:
- Add one platform-operated activation record that can pause a control globally or for one workspace, with optional expiry, auditable reason, and global-first precedence. Partial unique indexes enforce one active global row per control and one active workspace row per control/workspace pair. The write path deletes expired conflicting rows before inserting a new activation; the table is not used as an archive.
- Add one new system ops controls page that lists the two bounded control keys, their effective state, scope, owner, expiry, change actions, and on-demand audit history links, and uses a staged scope-impact preview before control mutations are confirmed.
- Use `OperationalControlDecision` as the shared control-state presentation primitive for controls, runbooks, findings, and restore surfaces.
- Route `findings.lifecycle.backfill` through the new evaluator in both `ListFindings` and `Runbooks`, removing the existing env gate.
- Enforce the `findings.lifecycle.backfill` decision inside `FindingsLifecycleBackfillRunbookService::start()` so the system runbooks page, tenant findings page, CLI command, and deploy-hook command all honor the same control decision.
- Route `restore.execute` through the same evaluator before provider-backed or non-provider-backed queued restore execution is created.
- Add dedicated audit action IDs and a dedicated platform capability for control management, using `AuditRecorder` for global control changes and blocked system-plane all-tenant attempts, and `WorkspaceAuditLogger` for workspace/tenant-scoped changes and blocked-execution evidence with concrete scope.
- Keep blocked-state messaging on existing shared start/result helpers and avoid custom control-state UI frameworks.
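The activation-record design above (partial unique indexes per scope, with expired conflicting rows deleted before a new insert) can be sketched against an in-memory database. The sketch uses SQLite rather than the project's PostgreSQL, and the column names, index names, and `activate` helper are assumptions for illustration; only the table name comes from the plan.

```python
import sqlite3

SCHEMA = """
CREATE TABLE operational_control_activations (
    id INTEGER PRIMARY KEY,
    control_key TEXT NOT NULL,
    workspace_id INTEGER,          -- NULL means global scope
    reason TEXT NOT NULL,
    expires_at TEXT                -- ISO-8601 UTC string, NULL = no expiry
);
-- At most one global row per control.
CREATE UNIQUE INDEX one_global_per_control
    ON operational_control_activations (control_key)
    WHERE workspace_id IS NULL;
-- At most one row per control/workspace pair.
CREATE UNIQUE INDEX one_workspace_per_control
    ON operational_control_activations (control_key, workspace_id)
    WHERE workspace_id IS NOT NULL;
"""


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def activate(db, control_key, workspace_id, reason, now, expires_at=None):
    """Insert a pause, first clearing an expired row in the same scope."""
    with db:  # one transaction: commit on success, roll back on error
        db.execute(
            "DELETE FROM operational_control_activations "
            "WHERE control_key = ? AND workspace_id IS ? "
            "AND expires_at IS NOT NULL AND expires_at <= ?",
            (control_key, workspace_id, now),
        )
        db.execute(
            "INSERT INTO operational_control_activations "
            "(control_key, workspace_id, reason, expires_at) "
            "VALUES (?, ?, ?, ?)",
            (control_key, workspace_id, reason, expires_at),
        )
```

A second live activation in the same scope raises `sqlite3.IntegrityError`, while an expired one is replaced silently. Timestamps are compared as strings here, which only works when every value uses the same ISO-8601 format.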
## Phase 1 — Agent Context Update
After Phase 1 artifacts are generated, update Copilot context from the plan:
- `/Users/ahmeddarrazi/Documents/projects/wt-plattform/.specify/scripts/bash/update-agent-context.sh copilot`
## Phase 2 — Implementation Outline (tasks created in `/speckit.tasks`)
- Add the `operational_control_activations` persistence, model, and local factory for active pause records.
- Introduce the bounded operational-controls support namespace (`OperationalControlCatalog`, `OperationalControlDecision`, `OperationalControlEvaluator`) and keep the enabled state derived from the absence of an active row.
- Add the dedicated controls-manage capability and its local grant path in the seeded platform operator setup.
- Add the system-plane controls page and wire it into the existing system ops navigation with staged preview-plus-confirm pause/resume actions, audit logging, and on-demand audit history links.
- Replace the findings env gate with evaluator-driven control checks on the tenant findings header action and the system runbooks start path.
- Integrate the same evaluator into restore execution before any queued execution `OperationRun`, queued execution `RestoreRun`, queue dispatch, or provider-backed execution starts.
- Add focused unit and feature tests, plus a guard test that blocks new ad-hoc runtime-control bypasses for in-scope controls and one test proving that activating a control does not rewrite previously accepted runs.
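The guard test in the last item is essentially a source scan for the retired gate. A minimal sketch of that idea follows, assuming the only forbidden token is the `allow_admin_maintenance_actions` key named in the Summary; the function names and the default `.php` suffix are illustrative, and the real Pest guard may scan differently.

```python
import re
from pathlib import Path
from typing import List, Tuple

# The retired env gate; direct reads of it count as an ad-hoc bypass.
FORBIDDEN = re.compile(r"allow_admin_maintenance_actions")


def scan_text(name: str, text: str) -> List[Tuple[str, int]]:
    """Return (file name, line number) for every forbidden reference."""
    return [
        (name, lineno)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if FORBIDDEN.search(line)
    ]


def scan_tree(root: str, suffix: str = ".php") -> List[Tuple[str, int]]:
    """Scan every matching source file under root."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        hits.extend(scan_text(str(path), path.read_text(encoding="utf-8")))
    return hits
```

A guard test would then assert that `scan_tree` over the application source returns an empty list, so any reintroduced env-gate read fails with the offending file and line.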
## Constitution Check (Post-Design)
Re-check target: PASS. The post-design shape must still use one bounded control catalog, one active-row persistence model, one evaluator, existing auth/start/audit helpers, and no second runtime-control dialect.
## Implementation Close-out
- Delivered the bounded operational-controls slice end-to-end: one `operational_control_activations` truth model, one catalog/evaluator/decision support path, a new `/system/ops/controls` management page, findings lifecycle enforcement through `FindingsLifecycleBackfillRunbookService::start()`, and restore execution blocking before any queued execution `OperationRun`, queued execution `RestoreRun`, job dispatch, or provider-backed start.
- Runtime cleanup landed with the in-scope findings env gate removed from `config/tenantpilot.php`, a source-scanning guard against ad-hoc bypasses, and workspace-isolation proof showing a workspace-scoped pause blocks only the targeted workspace while a second workspace remains unaffected.
- Validation passed on the narrow feature lane: `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail artisan test --compact tests/Unit/Support/OperationalControls/OperationalControlCatalogTest.php tests/Unit/Support/OperationalControls/OperationalControlEvaluatorTest.php tests/Unit/Support/OperationalControls/OperationalControlScopeResolutionTest.php tests/Feature/Filament/Spec113/AdminFindingsNoMaintenanceActionsTest.php tests/Feature/System/OpsControls/OperationalControlManagementTest.php tests/Feature/System/OpsRunbooks/OperationalControlRunbookGateTest.php tests/Feature/Findings/OperationalControlFindingsBackfillGateTest.php tests/Feature/Restore/OperationalControlRestoreExecutionGateTest.php tests/Feature/OperationalControls/OperationalControlAuthorizationSemanticsTest.php tests/Feature/OperationalControls/NoAdHocOperationalControlBypassTest.php` with `20 passed (253 assertions)`.
- Formatting passed with `export PATH="/bin:/usr/bin:/usr/local/bin:$PATH" && cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent`.
- Manual smoke passed in the integrated browser once the local database was brought up to date: the staged pause/resume flow on `/system/ops/controls` for `Findings lifecycle backfill` rendered scope-impact previews, applied the global pause, and returned to `Enabled` inside the SC-001 budget.