Description: Task list for Spec 113 implementation

Tasks: Platform Ops Runbooks (Operator Control Plane)

Input: Design documents from specs/113-platform-ops-runbooks/

Prerequisites: specs/113-platform-ops-runbooks/plan.md, specs/113-platform-ops-runbooks/spec.md, plus specs/113-platform-ops-runbooks/research.md, specs/113-platform-ops-runbooks/data-model.md, specs/113-platform-ops-runbooks/contracts/system-ops-runbooks.openapi.yaml, specs/113-platform-ops-runbooks/quickstart.md.

Tests: REQUIRED (Pest) for all runtime behavior changes.


Phase 1: Setup (Shared Infrastructure)

Purpose: Confirm touch points and keep spec artifacts aligned.

  • T001 Confirm spec UI Action Matrix is complete in specs/113-platform-ops-runbooks/spec.md
  • T002 Confirm System panel provider registration in bootstrap/providers.php (Laravel 11+/12 provider registration)
  • T003 [P] Capture current legacy /admin trigger location in app/Filament/Resources/FindingResource/Pages/ListFindings.php ("Backfill findings lifecycle" header action)
  • T004 [P] Review existing single-tenant backfill pipeline entry points in app/Console/Commands/TenantpilotBackfillFindingLifecycle.php and app/Jobs/BackfillFindingLifecycleJob.php

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Security semantics, session isolation, and auth hardening that block all user stories.

  • T005 Add platform runbook capability constants to app/Support/Auth/PlatformCapabilities.php (e.g., platform.ops.view, platform.runbooks.view, platform.runbooks.run, platform.runbooks.findings.lifecycle_backfill)

  • T006 Update System panel access control to use capability registry constants in app/Providers/Filament/SystemPanelProvider.php (keep ACCESS_SYSTEM_PANEL gate, add per-page capability checks)

  • T007 Change platform capability denial semantics to 403 (member-but-missing-capability) in app/Http/Middleware/EnsurePlatformCapability.php (keep wrong-plane 404 handled by ensure-correct-guard)

  • T008 [P] Add SR-002 regression tests for 404 vs 403 semantics in tests/Feature/System/Spec113/AuthorizationSemanticsTest.php (tenant user -> 404 on /system/*, platform user without capability -> 403, platform user with capability -> 200)

  • T009 Define and enforce the “allowed tenant universe” for System runbooks in app/Services/System/AllowedTenantUniverse.php (v1: exclude platform tenant; provide tenant query for pickers and runtime guard)

  • T010 [P] Add allowed tenant universe tests in tests/Feature/System/Spec113/AllowedTenantUniverseTest.php (picker excludes platform tenant; attempts to target excluded tenant are rejected; no OperationRun created)

  • T011 Create System session cookie isolation middleware in app/Http/Middleware/UseSystemSessionCookie.php (set dedicated session cookie name before StartSession)

  • T012 Wire System session cookie middleware before StartSession in app/Providers/Filament/SystemPanelProvider.php (SR-004)

  • T013 [P] Add System session isolation test in tests/Feature/System/Spec113/SystemSessionIsolationTest.php (assert response sets the System session cookie name for /system)

  • T014 Implement /system/login throttling (10/min per IP + username key) in app/Filament/System/Pages/Auth/Login.php (SR-003; use RateLimiter and clear on success)

  • T015 [P] Add /system/login throttling tests in tests/Feature/System/Spec113/SystemLoginThrottleTest.php (assert throttled after N failures; ensure failures still emit audit via AuditLogger)
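
The 404 vs 403 split in T007/T008 can be pictured as a small decision function. This is a minimal plain-PHP sketch, not the middleware itself: the guard names `platform` and `web` and the function `systemPanelStatus()` are assumptions made for illustration, and unauthenticated requests are left out because the tasks above do not describe them.

```php
<?php

declare(strict_types=1);

// Hypothetical guard names; the real ones are defined by the application.
const GUARD_PLATFORM = 'platform';
const GUARD_TENANT = 'web';

/**
 * Resolve the HTTP status for an authenticated request to /system/*.
 *
 * @param string   $guard        guard the user is authenticated on
 * @param string[] $capabilities capabilities held by the user
 * @param string   $required     capability required by the requested page
 */
function systemPanelStatus(string $guard, array $capabilities, string $required): int
{
    // Wrong plane (tenant user): answer 404 so the System panel is not disclosed.
    if ($guard !== GUARD_PLATFORM) {
        return 404;
    }

    // Correct plane but missing capability: answer 403.
    if (!in_array($required, $capabilities, true)) {
        return 403;
    }

    return 200;
}
```

The order of the checks matters: the plane check runs first, so a tenant user never learns whether a capability exists.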


Phase 3: User Story 1 — Operator runs a runbook safely (Priority: P1) 🎯 MVP

Goal: /system/ops/runbooks supports preflight + explicit confirmation + reason capture + typed confirmation for all-tenants; starts a tracked OperationRun and links to “View run”.

Independent Test: Visit /system/ops/runbooks, run preflight, start run, follow “View run” to /system/ops/runs/{id}, and confirm audit/run records exist.

Tests for User Story 1

  • T016 [P] [US1] Add runbook preflight tests in tests/Feature/System/OpsRunbooks/FindingsLifecycleBackfillPreflightTest.php (single tenant + all tenants preflight returns affected_count)
  • T017 [P] [US1] Add runbook start/confirmation tests in tests/Feature/System/OpsRunbooks/FindingsLifecycleBackfillStartTest.php (typed confirmation + reason required for all_tenants; disabled when affected_count=0)
  • T018 [P] [US1] Add break-glass reason enforcement + recording tests in tests/Feature/System/OpsRunbooks/FindingsLifecycleBackfillBreakGlassTest.php (reason required when break-glass active; break-glass marker and reason recorded on run + audit)
  • T019 [P] [US1] Add Ops-UX feedback contract test for start surface in tests/Feature/System/OpsRunbooks/OpsUxStartSurfaceContractTest.php (toast intent-only + “View run” link; no DB queued/running notifications)
  • T020 [P] [US1] Add audit fail-safe test in tests/Feature/System/OpsRunbooks/FindingsLifecycleBackfillAuditFailSafeTest.php (audit logger failure does not crash run; run still records failure outcome)

Implementation for User Story 1

  • T021 [US1] Create runbook service app/Services/Runbooks/FindingsLifecycleBackfillRunbookService.php with methods preflight(scope) and start(scope, initiator, reason, source)

  • T022 [P] [US1] Create runbook scope/value objects in app/Services/Runbooks/FindingsLifecycleBackfillScope.php and app/Services/Runbooks/RunbookReason.php (validate reason_code and reason_text max 500 chars; include break-glass reason requirements)

  • T023 [US1] Add audit events for preflight/start/completed/failed using AuditLogger in app/Services/Runbooks/FindingsLifecycleBackfillRunbookService.php (action IDs per specs/113-platform-ops-runbooks/data-model.md; must be fail-safe)

  • T024 [US1] Record break-glass marker + reason on OperationRun context and audit in app/Services/Runbooks/FindingsLifecycleBackfillRunbookService.php (SR-005)

  • T025 [US1] Implement all-tenants orchestration job in app/Jobs/BackfillFindingLifecycleWorkspaceJob.php (create/lock workspace-scoped OperationRun; dispatch tenant fan-out; set summary_counts[tenants/total/processed])

  • T026 [US1] Implement tenant worker job that updates the shared workspace run in app/Jobs/BackfillFindingLifecycleTenantIntoWorkspaceRunJob.php (chunk writes; increment summary_counts keys from OperationSummaryKeys::all(); append failures; call maybeCompleteBulkRun())

  • T027 [US1] Ensure scope-level lock prevents concurrent all-tenants runs in app/Services/Runbooks/FindingsLifecycleBackfillRunbookService.php (lock key includes workspace + scope)

  • T028 [US1] Enable platform in-app notifications for run completion/failure by turning on database notifications in app/Providers/Filament/SystemPanelProvider.php (ensure terminal notification is OperationRunCompleted, initiator-only)

  • T029 [P] [US1] Add System “View run” URL helper in app/Support/System/SystemOperationRunLinks.php and use it for UI + alerts/notifications (avoid admin-plane links)

  • T030 [US1] Dispatch Alerts event on failure using app/Services/Alerts/AlertDispatchService.php from app/Services/Runbooks/FindingsLifecycleBackfillRunbookService.php (event_type operations.run.failed; include System “View run” URL)

  • T031 [US1] Create System runbooks page class app/Filament/System/Pages/Ops/Runbooks.php (capability-gated; scope selector uses AllowedTenantUniverse; Preflight action; Run action with confirmation + typed confirm + reason)

  • T032 [P] [US1] Create System runbooks page view resources/views/filament/system/pages/ops/runbooks.blade.php (operator warning; show preflight results + disable Run when nothing to do)

  • T033 [US1] Create System runs list page class app/Filament/System/Pages/Ops/Runs.php (table listing operation runs for runbook types; default sort newest)

  • T034 [P] [US1] Create System runs list view resources/views/filament/system/pages/ops/runs.blade.php (record inspection affordance: clickable row -> run detail)

  • T035 [US1] Create System run detail page class app/Filament/System/Pages/Ops/ViewRun.php (infolist rendering of OperationRun; show scope/actor/counts/failures)

  • T036 [P] [US1] Create System run detail view resources/views/filament/system/pages/ops/view-run.blade.php
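
The reason rules from T017, T018 and T022 (reason required for all_tenants and whenever break-glass is active; reason_text capped at 500 characters) could look roughly like the sketch below. The class name `RunbookReasonSketch`, the `single_tenant` scope label and the reason codes are hypothetical; the real codes belong to specs/113-platform-ops-runbooks/data-model.md.

```php
<?php

declare(strict_types=1);

/** A reason is mandatory for all-tenants runs and whenever break-glass is active. */
function reasonRequired(string $scopeMode, bool $breakGlassActive): bool
{
    return $scopeMode === 'all_tenants' || $breakGlassActive;
}

final class RunbookReasonSketch
{
    public const MAX_TEXT_LENGTH = 500;

    // Hypothetical codes, used only to make the sketch runnable.
    public const CODES = ['DATA_REPAIR', 'INCIDENT', 'SUPPORT'];

    private function __construct(
        public readonly string $code,
        public readonly string $text,
    ) {
    }

    /** Returns null when no reason was given and none is required. */
    public static function from(?string $code, ?string $text, bool $required): ?self
    {
        $code = trim((string) $code);
        $text = trim((string) $text);

        if ($code === '' && $text === '') {
            if ($required) {
                throw new InvalidArgumentException('A reason is required for this run.');
            }

            return null;
        }

        if (!in_array($code, self::CODES, true)) {
            throw new InvalidArgumentException('Unknown reason_code.');
        }

        if ($text === '') {
            throw new InvalidArgumentException('reason_text must not be empty.');
        }

        // strlen() counts bytes; a character-based count (e.g. mb_strlen) would
        // treat multibyte input more fairly. Kept simple here on purpose.
        if (strlen($text) > self::MAX_TEXT_LENGTH) {
            throw new InvalidArgumentException('reason_text exceeds 500 characters.');
        }

        return new self($code, $text);
    }
}
```

Validating in a value object means the System page, the CLI command and the deploy hook would all reject a bad reason in the same way before any OperationRun is created.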


Phase 4: User Story 2 — Customers never see maintenance actions (Priority: P1)

Goal: No /admin maintenance/backfill affordances by default; tenant users cannot access /system/* (404).

Independent Test: As a tenant user, /system/* returns 404; in /admin Findings list there is no backfill action when the feature flag is defaulted off.

Tests for User Story 2

  • T037 [P] [US2] Add regression test asserting /admin Findings list has no backfill action by default in tests/Feature/Filament/Spec113/AdminFindingsNoMaintenanceActionsTest.php (targets app/Filament/Resources/FindingResource/Pages/ListFindings.php)
  • T038 [P] [US2] Add tenant-plane 404 test for /system/ops/runbooks in tests/Feature/System/Spec113/TenantPlaneCannotAccessSystemTest.php

Implementation for User Story 2

  • T039 [US2] Remove or feature-flag off the legacy header action in app/Filament/Resources/FindingResource/Pages/ListFindings.php (FR-001; default off in production-like envs)
  • T040 [US2] Add a config-backed feature flag defaulting to false in config/tenantpilot.php (e.g., allow_admin_maintenance_actions) and wire it in app/Filament/Resources/FindingResource/Pages/ListFindings.php
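
To illustrate the default-off behaviour of T039/T040, here is a minimal sketch in plain PHP. The flag name `allow_admin_maintenance_actions` comes from T040; the environment variable name, both function names and the action identifier are assumptions, and a real config/tenantpilot.php would use Laravel's own helpers rather than a passed-in array.

```php
<?php

declare(strict_types=1);

/**
 * Sketch of the values config/tenantpilot.php might return.
 *
 * @param array<string, mixed> $env environment values (hypothetical variable name)
 * @return array<string, bool>
 */
function tenantpilotConfigSketch(array $env): array
{
    return [
        // Absent or unparsable values resolve to false, so the flag is off by default.
        'allow_admin_maintenance_actions' => filter_var(
            $env['TENANTPILOT_ALLOW_ADMIN_MAINTENANCE_ACTIONS'] ?? false,
            FILTER_VALIDATE_BOOLEAN
        ),
    ];
}

/**
 * Maintenance header actions the /admin Findings list would expose.
 *
 * @param array<string, bool> $config
 * @return string[]
 */
function maintenanceHeaderActions(array $config): array
{
    if (($config['allow_admin_maintenance_actions'] ?? false) !== true) {
        return [];
    }

    return ['backfill_findings_lifecycle'];
}
```

With an empty environment the list is empty, which is the state the T037 regression test is meant to pin down.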

Phase 5: User Story 3 — Same logic for deploy-time and operator re-run (Priority: P2)

Goal: One implementation path for preflight/start that is reused by System UI, CLI, and deploy-time automation.

Independent Test: Run the runbook twice with the same scope; second run produces updated_count=0; deploy-time entry point calls the same service.

Tests for User Story 3

  • T041 [P] [US3] Add idempotency test in tests/Feature/System/OpsRunbooks/FindingsLifecycleBackfillIdempotencyTest.php (second run updated=0 and/or preflight affected_count=0)
  • T042 [P] [US3] Add deploy-time entry point test in tests/Feature/Console/Spec113/DeployRunbooksCommandTest.php (command delegates to FindingsLifecycleBackfillRunbookService)

Implementation for User Story 3

  • T043 [US3] Refactor CLI command to call shared runbook service in app/Console/Commands/TenantpilotBackfillFindingLifecycle.php (single-tenant scope, source=cli)
  • T044 [US3] Add deploy-time runbooks command in app/Console/Commands/TenantpilotRunDeployRunbooks.php (source=deploy_hook; initiator null; uses FindingsLifecycleBackfillRunbookService)
  • T045 [US3] Ensure System UI uses the same runbook service start() call path in app/Filament/System/Pages/Ops/Runbooks.php (source=system_ui)
  • T046 [US3] Ensure initiator-null runs do not emit terminal DB notification in app/Services/OperationRunService.php (system-run behavior; audit/alerts still apply)
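
The "one implementation path" goal of T043 to T046 can be sketched as thin entry points that differ only in the arguments they pass to a single start() call. The source values system_ui, cli and deploy_hook come from the tasks above; the class, enum and function names, the array shape standing in for an OperationRun, and the integer initiator ID are assumptions made for this plain-PHP illustration.

```php
<?php

declare(strict_types=1);

// Source values are taken from T043-T045; the enum name is hypothetical.
enum RunbookSourceSketch: string
{
    case SystemUi = 'system_ui';
    case Cli = 'cli';
    case DeployHook = 'deploy_hook';
}

final class RunbookServiceSketch
{
    /** @var array<int, array<string, mixed>> stand-in for persisted OperationRun rows */
    public array $runs = [];

    /** @return array<string, mixed> */
    public function start(string $scope, ?int $initiatorId, RunbookSourceSketch $source): array
    {
        $run = [
            'scope' => $scope,
            'source' => $source->value,
            'initiator' => $initiatorId,
            // Initiator-null runs get no terminal DB notification (T046);
            // audit and alerts are unaffected by this flag.
            'notify_initiator' => $initiatorId !== null,
        ];

        $this->runs[] = $run;

        return $run;
    }
}

/** System UI entry point: always has a platform user as initiator. */
function startFromSystemUi(RunbookServiceSketch $service, int $platformUserId, string $scope): array
{
    return $service->start($scope, $platformUserId, RunbookSourceSketch::SystemUi);
}

/** Deploy-time entry point: no initiator, same service call. */
function startFromDeployHook(RunbookServiceSketch $service, string $scope): array
{
    return $service->start($scope, null, RunbookSourceSketch::DeployHook);
}
```

Because every entry point funnels into the same start() method, an idempotency test such as T041 only needs to exercise the service once to cover all three callers.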

Phase 6: Polish & Cross-Cutting Concerns

  • T047 [P] Run new Spec 113 tests via vendor/bin/sail artisan test --compact tests/Feature/System/Spec113/ (ensure all new tests pass)
  • T048 [P] Run Ops Runbooks tests via vendor/bin/sail artisan test --compact tests/Feature/System/OpsRunbooks/ (ensure US1/US3 tests pass)
  • T049 [P] Run formatting on touched files via vendor/bin/sail bin pint --dirty --format agent (targets app/Http/Middleware/, app/Filament/System/Pages/, app/Services/Runbooks/, tests/Feature/System/)

Dependencies & Execution Order

Phase Dependencies

  • Setup (Phase 1): no dependencies
  • Foundational (Phase 2): depends on Setup; BLOCKS all story work
  • US1 (Phase 3): depends on Foundational
  • US2 (Phase 4): depends on Foundational
  • US3 (Phase 5): depends on US1 shared runbook service (T021) + Foundational
  • Polish (Phase 6): depends on desired stories being complete

User Story Dependencies

  • US1 (P1): foundational security + session isolation + login throttle must be in place first
  • US2 (P1): can be implemented after Foundational; independent of US1 UI
  • US3 (P2): depends on the shared runbook service created in US1

Parallel Execution Examples

US1 parallelizable tasks

  • T016, T017, T018, T019, T020 can be drafted in parallel (tests in separate files under tests/Feature/System/OpsRunbooks/)
  • T031/T032, T033/T034, and T035/T036 can be built in parallel (separate System page classes/views)
  • T025 and T026 can be built in parallel once the service contract (T021) is agreed

US2 parallelizable tasks

  • T037 and T038 can run in parallel (tests)
  • T039 and T040 both touch app/Filament/Resources/FindingResource/Pages/ListFindings.php; they can be developed in parallel only if T040 (feature flag) merges first, otherwise keep them sequential

US3 parallelizable tasks

  • T041 and T042 can run in parallel (tests)
  • T043 and T044 can be implemented in parallel once T021 exists

Implementation Strategy (MVP First)

  1. Complete Phase 1 (setup) and Phase 2 (security semantics + session isolation + login throttle)
  2. Deliver US1 (System runbooks page + OperationRun tracking + System runs detail)
  3. Deliver US2 (remove/disable /admin maintenance UI)
  4. Deliver US3 (shared logic reused by CLI + deploy-time automation)