TenantAtlas/specs/160-operation-lifecycle-guarantees/tasks.md
2026-03-23 22:52:37 +01:00

Tasks: Operation Lifecycle Guarantees & Queue-to-Domain Failure Reconciliation

Input: Design documents from /specs/160-operation-lifecycle-guarantees/
Prerequisites: plan.md (required), spec.md (required), research.md, data-model.md, contracts/, quickstart.md

Tests: Runtime behavior changes in this repo require Pest coverage. This feature changes queue lifecycle handling, Monitoring semantics, authorization-adjacent Monitoring truth, and Ops-UX guarantees, so tests are required for every user story.

Operations: This feature hardens long-running and queued OperationRun execution. Tasks below preserve the Ops-UX 3-surface feedback contract, keep terminal truth service-owned through OperationRunService, keep summary_counts numeric-only, prevent queued or running DB notifications, preserve initiator-null notification behavior for system runs, and keep canonical View run navigation pointed at /admin/operations/{run}.

RBAC: This feature changes Monitoring truth semantics in the admin /admin plane. Tasks below preserve deny-as-not-found for non-entitled workspace or tenant access, keep capability denial as 403 where applicable, continue using the capability registry, and add positive and negative authorization coverage for canonical run viewing.

UI Naming: Lifecycle copy must use operator-safe domain language such as stale, reconciled, and infrastructure failure in primary UI surfaces and keep low-level queue exceptions in diagnostics only.

Filament UI Action Surfaces: This feature modifies existing Filament Monitoring pages and resources without changing their core action inventory. Tasks below preserve existing inspect affordances, keep destructive-action rules unchanged, and retrofit lifecycle semantics into current header, row, and empty-state behavior.

Filament UI UX-001: This feature is not a layout redesign. Tasks below keep the current Operations layouts intact while updating badges, diagnostics, and operator-first truth messaging inside the existing pages.

Badges: Status-like semantics must continue to flow through BadgeCatalog-backed domain badge mappers in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Badges/Domains/OperationRunStatusBadge.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Badges/Domains/OperationRunOutcomeBadge.php.

Contract Artifact: /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/160-operation-lifecycle-guarantees/contracts/operation-run-lifecycle.openapi.yaml is an internal Monitoring contract for freshness and reconciliation semantics, not a requirement to add new public controller endpoints.
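Two of the Operations constraints above (numeric-only summary_counts, and no DB notifications while a run is queued or running) can be read as small guard predicates. The sketch below is language-neutral Python rather than the repo's PHP. The function names are hypothetical, and it assumes that "initiator-null notification behavior" means a system run with no initiator produces no DB notification; the source does not spell that out.

```python
from typing import Optional


def summary_counts_are_numeric(summary_counts: dict) -> bool:
    """Return True when every value is an int or float (bools are rejected)."""
    return all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in summary_counts.values()
    )


def should_send_db_notification(status: str, initiator_id: Optional[int]) -> bool:
    """Allow a DB notification only for terminal runs that have an initiator.

    Assumption: "completed" is the only terminal status, and system runs
    (initiator_id is None) never notify.
    """
    if status != "completed":
        return False
    return initiator_id is not None
```

A guard test in the spirit of T029 could assert both predicates against representative runs so that a regression in either rule fails fast.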

Organization: Tasks are grouped by user story so each story can be implemented and tested independently.

Phase 1: Setup (Shared Infrastructure)

Purpose: Prepare the regression targets and touchpoints for lifecycle hardening.

  • T001 [P] Create or extend lifecycle service and middleware regression targets in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceStaleQueuedRunTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/TrackOperationRunMiddlewareTest.php
  • T002 [P] Create or extend reconciliation and scheduler regression targets in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileBackupScheduleOperationRunsCommandTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OpsUx/AdapterRunReconcilerTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/ReconcileAdapterRunsJobTrackingTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleReconciliationTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileOperationRunsCommandTest.php
  • T003 [P] Create or extend Monitoring and badge regression targets in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/MonitoringOperationsTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/OperationsDbOnlyRenderTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Unit/Badges/OperationRunBadgesTest.php
  • T004 [P] Create or extend authorization, queued-intent, canonical View run, and Ops-UX guard targets in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/TenantlessOperationRunViewerTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/RunAuthorizationTenantIsolationTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Notifications/OperationRunNotificationTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OpsUx/QueuedToastCopyTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OpsUx/NotificationViewRunLinkTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Guards/OperationLifecycleOpsUxGuardTest.php

Phase 2: Foundational (Blocking Prerequisites)

Purpose: Build the shared lifecycle policy and reconciliation infrastructure that all user stories depend on.

⚠️ CRITICAL: No user story work should begin until this phase is complete.

  • T005 Define the config-backed lifecycle coverage, terminal-truth-path matrix, and threshold registry for baseline_capture, baseline_compare, inventory_sync, policy.sync, policy.sync_one, entra_group_sync, directory_role_definitions.sync, backup_schedule_run, restore.execute, tenant.review_pack.generate, tenant.review.compose, and tenant.evidence.snapshot.generate in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/tenantpilot.php and align queue timing defaults in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/queue.php
  • T006 Create shared lifecycle policy and freshness support types in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Operations/OperationLifecyclePolicy.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Operations/OperationRunFreshnessState.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Operations/LifecycleReconciliationReason.php
  • T007 Extend /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/OperationRunService.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Models/OperationRun.php with generic stale-running assessment, standardized reconciliation metadata, and idempotent service-owned force-fail helpers for non-terminal runs
  • T008 Create the generic active-run reconciler in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Operations/OperationLifecycleReconciler.php and reuse existing legitimacy signals from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Operations/QueuedExecutionLegitimacyGate.php
  • T009 Register the generic reconciliation entry point in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Console/Commands/TenantpilotReconcileOperationRuns.php and schedule it from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/routes/console.php
  • T010 Add foundational coverage for lifecycle policy parsing, stale-running service transitions, and idempotent reconciliation in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceStaleQueuedRunTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleReconciliationTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileOperationRunsCommandTest.php

Checkpoint: Foundation ready. The repo has one shared lifecycle policy, one generic reconciliation seam, and service-owned APIs that stories can adopt independently.
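As an illustration of the shared policy and stale assessment described in T005 through T007, the following is a minimal sketch in Python, not the PHP implementation. The threshold values, the Run shape, and the names Freshness and assess_freshness are assumptions made for the example; the real registry belongs in config/tenantpilot.php.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Freshness(Enum):
    FRESH = "fresh"
    STALE = "stale"
    TERMINAL = "terminal"


# Hypothetical per-type thresholds in seconds, keyed by non-terminal status.
THRESHOLDS = {"inventory_sync": {"queued": 300, "running": 900}}
DEFAULT_THRESHOLD = {"queued": 300, "running": 900}


@dataclass
class Run:
    type: str
    status: str  # assumed values: "queued", "running", "completed"
    created_at: datetime
    started_at: Optional[datetime] = None


def assess_freshness(run: Run, now: datetime) -> Freshness:
    """Classify a run without mutating it; anything not active is terminal."""
    if run.status not in ("queued", "running"):
        return Freshness.TERMINAL
    limits = THRESHOLDS.get(run.type, DEFAULT_THRESHOLD)
    # Running runs age from their start; queued runs age from creation.
    reference = run.started_at if run.status == "running" and run.started_at else run.created_at
    age_seconds = (now - reference).total_seconds()
    return Freshness.STALE if age_seconds > limits[run.status] else Freshness.FRESH


now = datetime(2026, 3, 23, 12, 0, tzinfo=timezone.utc)
stuck = Run("inventory_sync", "queued", created_at=now - timedelta(minutes=10))
print(assess_freshness(stuck, now).value)  # prints "stale"
```

Keeping the assessment pure (no writes) lets both the reconciler and the Monitoring presentation layer reuse it.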


Phase 3: User Story 1 - Force Terminal Truth For Orphaned Runs (Priority: P1) 🎯 MVP

Goal: Ensure every covered queued operation converges to deterministic terminal truth when normal queue cleanup does not.

Independent Test: Create covered OperationRun records in the queued and running states, prevent normal finalization, advance time past the configured threshold, run the generic reconciler or direct failure bridge, and verify the run becomes completed/failed with operator-safe reconciliation evidence.

Tests for User Story 1

  • T011 [P] [US1] Add stale queued, stale running, fresh-run non-interference, and idempotency coverage in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceStaleQueuedRunTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleReconciliationTest.php
  • T012 [P] [US1] Add direct queue-failure bridge coverage for exhausted-attempt and timeout paths in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/TrackOperationRunMiddlewareTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationRunFailedJobBridgeTest.php
  • T013 [P] [US1] Add reconciliation command and coexistence coverage for scheduled healing paths in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileOperationRunsCommandTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileBackupScheduleOperationRunsCommandTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OpsUx/AdapterRunReconcilerTest.php

Implementation for User Story 1

  • T014 [US1] Implement service-owned stale queued and stale running force-fail transitions in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/OperationRunService.php
  • T015 [US1] Implement the generic lifecycle reconciliation flow and structured reconciliation payloads in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Operations/OperationLifecycleReconciler.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Console/Commands/TenantpilotReconcileOperationRuns.php
  • T016 [US1] Integrate the new generic reconciler with existing type-specific healing in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Console/Commands/TenantpilotReconcileBackupScheduleOperationRuns.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/AdapterRunReconciler.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/ReconcileAdapterRunsJob.php
  • T017 [US1] Create a reusable failed-job bridge in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/Concerns/BridgesFailedOperationRun.php, normalize the existing direct bridge in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/BulkBackupSetRestoreJob.php, and add missing direct bridges for /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CaptureBaselineSnapshotJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CompareBaselineToTenantJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/BulkTenantSyncJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/SyncPoliciesJob.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/ComposeTenantReviewJob.php while preserving scheduled reconciliation for covered types marked fallback-only in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/tenantpilot.php
  • T018 [US1] Preserve queued-intent, canonical View run, completion, and initiator-only notification guarantees across representative start surfaces in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Pages/BaselineCompareLanding.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/BackupScheduleResource.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/RestoreRunResource.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/InventoryItemResource/Pages/ListInventoryItems.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/ReviewPackResource.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/Middleware/TrackOperationRun.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Notifications/OperationRunCompleted.php

Checkpoint: User Story 1 is complete when orphaned covered runs no longer require manual DB repair and always converge to terminal failed truth through direct failure bridging or reconciliation.
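The idempotency requirement in T011 and T014 can be pictured with a small force-fail helper, shown here as a Python sketch rather than the service's PHP code. It assumes that "completed/failed" means status completed with outcome failed, and the reconciliation payload keys (reason_code, reconciled_at, source) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    id: int
    status: str  # assumed values: "queued", "running", "completed"
    outcome: Optional[str] = None
    context: dict = field(default_factory=dict)


def force_fail_stale(run: Run, reason_code: str, reconciled_at: str) -> bool:
    """Move a non-terminal run to terminal failed truth exactly once.

    Returns True if the run was changed, False if it was already terminal.
    """
    if run.status == "completed":
        return False  # idempotent: never overwrite existing terminal truth
    run.status = "completed"
    run.outcome = "failed"
    run.context["reconciliation"] = {
        "reason_code": reason_code,
        "reconciled_at": reconciled_at,
        "source": "lifecycle_reconciler",
    }
    return True


orphan = Run(id=1, status="running")
print(force_fail_stale(orphan, "run.stale_running", "2026-03-23T12:00:00Z"))  # prints True
print(force_fail_stale(orphan, "run.stale_running", "2026-03-23T12:05:00Z"))  # prints False
```

The second call leaves the first reconciliation record untouched, which is the property the coexistence tests in T013 rely on when the generic reconciler and a type-specific healer see the same run.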


Phase 4: User Story 2 - Show Honest Liveness In Monitoring (Priority: P2)

Goal: Make Monitoring distinguish fresh activity, likely stale activity, and reconciled failure without implying indefinite normal progress.

Independent Test: Seed fresh active runs, stale runs, and reconciled-failed runs, then verify the Operations index and canonical run detail show distinct operator-safe semantics while canonical authorization remains intact.

Tests for User Story 2

  • T019 [P] [US2] Add Operations index, aggregate reconciliation visibility, and run-detail truth-semantics coverage in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/MonitoringOperationsTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/OperationLifecycleAggregateVisibilityTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php
  • T020 [P] [US2] Add freshness-state and badge mapping coverage in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Unit/Badges/OperationRunBadgesTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php
  • T021 [P] [US2] Add positive and negative canonical Monitoring authorization coverage for stale or reconciled runs in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/TenantlessOperationRunViewerTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/RunAuthorizationTenantIsolationTest.php

Implementation for User Story 2

  • T022 [US2] Extend centralized lifecycle presentation in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/OpsUx/OperationUxPresenter.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/OpsUx/RunDurationInsights.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/ReasonTranslation/ReasonPresenter.php
  • T023 [US2] Implement fresh, stale, and reconciled badge semantics in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Badges/Domains/OperationRunStatusBadge.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Badges/Domains/OperationRunOutcomeBadge.php
  • T024 [US2] Update Operations list filtering, minimal aggregate reconciliation visibility, query semantics, and default-visible lifecycle truth in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Pages/Monitoring/Operations.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/OperationRunResource.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/resources/views/filament/pages/monitoring/operations.blade.php
  • T025 [US2] Update canonical run detail messaging and diagnostics disclosure in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Pages/Operations/TenantlessOperationRunViewer.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/resources/views/filament/pages/operations/tenantless-operation-run-viewer.blade.php
  • T026 [US2] Keep /admin/operations and /admin/operations/{run} DB-only and canonical-navigation-safe while exposing lifecycle truth in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Pages/Monitoring/Operations.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Pages/Operations/TenantlessOperationRunViewer.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Filament/Resources/OperationRunResource.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/OperationsDbOnlyRenderTest.php

Checkpoint: User Story 2 is complete when operators can distinguish normal active work from stale or reconciled runs on Monitoring surfaces without losing canonical authorization or DB-only rendering guarantees.
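One way to picture the three presentation states this story distinguishes (fresh activity, likely stale activity, reconciled failure) is a single mapping function. This is a hedged Python sketch: the label strings and the lifecycle_label name are invented for illustration, and the actual mapping is expected to live in the BadgeCatalog-backed mappers named in T023.

```python
from typing import Optional


def lifecycle_label(
    status: str,
    outcome: Optional[str],
    is_stale: bool,
    was_reconciled: bool,
) -> str:
    """Pick an operator-facing label; queue exception detail stays in diagnostics."""
    if status == "completed":
        if outcome == "failed" and was_reconciled:
            return "Failed (reconciled)"
        return (outcome or "completed").capitalize()
    # Non-terminal runs: staleness takes priority over the raw status.
    if is_stale:
        return "Likely stale"
    return status.capitalize()


print(lifecycle_label("running", None, False, False))    # prints "Running"
print(lifecycle_label("running", None, True, False))     # prints "Likely stale"
print(lifecycle_label("completed", "failed", False, True))  # prints "Failed (reconciled)"
```

Because the function takes only stored fields plus a precomputed staleness flag, it fits the DB-only rendering guarantee in T026.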


Phase 5: User Story 3 - Prevent Repeat Incidents Through Lifecycle Contracts (Priority: P3)

Goal: Enforce explicit lifecycle policy, timeout strategy, and guardrails so covered jobs cannot silently drift back into ambiguous run truth.

Independent Test: Verify that the covered lifecycle policy rejects misaligned timeout versus retry_after settings, covered jobs declare explicit lifecycle behavior, and Ops-UX guard tests fail if service ownership or notification constraints regress.

Tests for User Story 3

  • T027 [P] [US3] Add lifecycle policy and timeout invariant coverage in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleTimingGuardTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Unit/Operations/OperationLifecyclePolicyValidatorTest.php
  • T028 [P] [US3] Add covered-job lifecycle contract coverage in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/BaselineOperationRunGuardTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Inventory/RunInventorySyncJobTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/BackupScheduling/RunBackupScheduleJobCompatibilityTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/TenantReview/TenantReviewOperationsUxTest.php
  • T029 [P] [US3] Add Ops-UX regression guard coverage for service-owned transitions, notification discipline, and initiator-null behavior in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Notifications/OperationRunNotificationTest.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Guards/OperationLifecycleOpsUxGuardTest.php

Implementation for User Story 3

  • T030 [US3] Implement lifecycle policy validation and timeout-versus-retry_after enforcement for the exact covered V1 operation set in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Services/Operations/OperationLifecyclePolicyValidator.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/tenantpilot.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config/queue.php
  • T031 [US3] Align covered job timeout and failure-contract declarations in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CaptureBaselineSnapshotJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/CompareBaselineToTenantJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/BulkBackupSetRestoreJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/BulkTenantSyncJob.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/SyncPoliciesJob.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Jobs/ComposeTenantReviewJob.php
  • T032 [US3] Preserve canonical Monitoring authorization and capability semantics for reconciled lifecycle states in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Policies/OperationRunPolicy.php and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/Operations/OperationRunCapabilityResolver.php
  • T033 [US3] Normalize operator-safe lifecycle copy and diagnostics boundaries in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/ReasonTranslation/ReasonPresenter.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app/Support/OpsUx/OperationUxPresenter.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/resources/views/filament/pages/monitoring/operations.blade.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/resources/views/filament/pages/operations/tenantless-operation-run-viewer.blade.php

Checkpoint: User Story 3 is complete when covered jobs and runtime settings have explicit lifecycle contracts and guard tests catch timing or ownership regressions before they reintroduce orphaned runs.
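The timeout-versus-retry_after invariant from T027 and T030 follows the Laravel queue guidance that a job's timeout should be several seconds shorter than the connection's retry_after, so a job is not retried while a worker is still processing it. Below is a minimal Python sketch of such a check. The function name, the 10-second margin, and the example timeout values are assumptions, not values taken from the repo.

```python
from typing import Dict, List


def find_timing_violations(
    job_timeouts: Dict[str, int],
    retry_after: int,
    margin: int = 10,
) -> List[str]:
    """Return one message per operation whose timeout is too close to retry_after."""
    violations = []
    for operation, timeout in sorted(job_timeouts.items()):
        if timeout + margin > retry_after:
            violations.append(
                f"{operation}: timeout {timeout}s plus {margin}s margin "
                f"exceeds retry_after {retry_after}s"
            )
    return violations


# Hypothetical timeouts for two covered operation types.
timeouts = {"baseline_capture": 300, "restore.execute": 600}
for message in find_timing_violations(timeouts, retry_after=600):
    print(message)
```

With these example values only restore.execute is reported, because 600 + 10 exceeds a retry_after of 600. Raising retry_after to 700 clears the violation.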


Phase 6: Polish & Cross-Cutting Concerns

Purpose: Validate the full feature slice, format touched files, and complete the manual smoke pass.

  • T034 [P] Run the focused Pest suites from /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/160-operation-lifecycle-guarantees/quickstart.md covering /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/OperationRunServiceStaleQueuedRunTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/TrackOperationRunMiddlewareTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleReconciliationTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Console/ReconcileOperationRunsCommandTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationRunFailedJobBridgeTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Monitoring/MonitoringOperationsTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/TenantlessOperationRunViewerTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/RunAuthorizationTenantIsolationTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Feature/Operations/OperationLifecycleTimingGuardTest.php, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Unit/Badges/OperationRunBadgesTest.php, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests/Unit/Operations/OperationLifecyclePolicyValidatorTest.php
  • T035 Run formatting for touched files under /Users/ahmeddarrazi/Documents/projects/TenantAtlas/app, /Users/ahmeddarrazi/Documents/projects/TenantAtlas/config, and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/tests with vendor/bin/sail bin pint --dirty --format agent
  • T036 [P] Validate the manual smoke checklist in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/160-operation-lifecycle-guarantees/quickstart.md against /admin/operations, /admin/operations/{run}, and the affected operation start surfaces for baseline capture, baseline compare, restore execution, backup schedule execution, inventory sync, and tenant review generation
  • T037 [P] Document worker timeout, retry_after, queue:restart, and stop-wait expectations in /Users/ahmeddarrazi/Documents/projects/TenantAtlas/specs/160-operation-lifecycle-guarantees/quickstart.md and /Users/ahmeddarrazi/Documents/projects/TenantAtlas/docs/HANDOVER.md

Dependencies & Execution Order

Phase Dependencies

  • Phase 1: Setup has no dependencies and can start immediately.
  • Phase 2: Foundational depends on Phase 1 and blocks all user story work.
  • Phase 3: User Story 1 depends on Phase 2 and delivers the MVP.
  • Phase 4: User Story 2 depends on Phase 2 and benefits from User Story 1 because it needs reconciled lifecycle evidence to display.
  • Phase 5: User Story 3 depends on Phase 2 and is safest after User Story 1 because it formalizes the lifecycle policy used by the reconciliation and failed-job bridge paths.
  • Phase 6: Polish depends on all desired user stories being complete.

User Story Dependencies

  • User Story 1 (P1) can start immediately after the foundational phase and is the MVP slice.
  • User Story 2 (P2) can start after the foundational phase but is easiest once User Story 1 provides the reconciled lifecycle states to present.
  • User Story 3 (P3) can start after the foundational phase but should land after User Story 1 to avoid validating contracts against outdated bridge behavior.

Within Each User Story

  • Write or extend tests first and confirm they fail before implementation.
  • Policy and support-layer changes should land before command, job, and UI adoption.
  • Reconciliation flow should stabilize before UI semantics consume its metadata.
  • Story-level regression coverage should pass before moving to the next priority story.

Parallel Opportunities

  • T001, T002, T003, and T004 can run in parallel because they prepare separate regression targets.
  • T005 and T006 can run in parallel before the service and reconciler wiring tasks.
  • T011, T012, and T013 can run in parallel within User Story 1.
  • T019, T020, and T021 can run in parallel within User Story 2.
  • T027, T028, and T029 can run in parallel within User Story 3.
  • T034, T036, and T037 can run in parallel after implementation is complete.

Parallel Example: User Story 1

# Run the P1 regression additions together:
Task: "Add stale queued, stale running, fresh-run non-interference, and idempotency coverage in tests/Feature/OperationRunServiceStaleQueuedRunTest.php and tests/Feature/Operations/OperationLifecycleReconciliationTest.php"
Task: "Add direct queue-failure bridge coverage for exhausted-attempt and timeout paths in tests/Feature/TrackOperationRunMiddlewareTest.php and tests/Feature/Operations/OperationRunFailedJobBridgeTest.php"
Task: "Add reconciliation command and coexistence coverage for scheduled healing paths in tests/Feature/Console/ReconcileOperationRunsCommandTest.php, tests/Feature/Console/ReconcileBackupScheduleOperationRunsCommandTest.php, and tests/Feature/OpsUx/AdapterRunReconcilerTest.php"

Parallel Example: User Story 2

# Split list/detail truth semantics, badge semantics, and auth coverage:
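Task: "Add Operations index, aggregate reconciliation visibility, and run-detail truth-semantics coverage in tests/Feature/Monitoring/MonitoringOperationsTest.php, tests/Feature/Monitoring/OperationLifecycleAggregateVisibilityTest.php, and tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php"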
Task: "Add freshness-state and badge mapping coverage in tests/Unit/Badges/OperationRunBadgesTest.php and tests/Feature/Monitoring/OperationLifecycleFreshnessPresentationTest.php"
Task: "Add positive and negative canonical Monitoring authorization coverage for stale or reconciled runs in tests/Feature/Operations/TenantlessOperationRunViewerTest.php and tests/Feature/RunAuthorizationTenantIsolationTest.php"

Parallel Example: User Story 3

# Split policy, job-contract, and Ops-UX guard work:
Task: "Add lifecycle policy and timeout invariant coverage in tests/Feature/Operations/OperationLifecycleTimingGuardTest.php and tests/Unit/Operations/OperationLifecyclePolicyValidatorTest.php"
Task: "Add covered-job lifecycle contract coverage in tests/Feature/Operations/BaselineOperationRunGuardTest.php, tests/Feature/Inventory/RunInventorySyncJobTest.php, tests/Feature/BackupScheduling/RunBackupScheduleJobCompatibilityTest.php, and tests/Feature/TenantReview/TenantReviewOperationsUxTest.php"
Task: "Add Ops-UX regression guard coverage for service-owned transitions, notification discipline, and initiator-null behavior in tests/Feature/Notifications/OperationRunNotificationTest.php and tests/Feature/Guards/OperationLifecycleOpsUxGuardTest.php"

Implementation Strategy

MVP First

  1. Complete Phase 1: Setup.
  2. Complete Phase 2: Foundational.
  3. Complete Phase 3: User Story 1.
  4. Stop and validate that orphaned queued and running runs now converge to terminal failed truth without manual intervention.

Incremental Delivery

  1. Deliver User Story 1 to close the integrity gap and establish direct failure bridging plus stale-run healing.
  2. Deliver User Story 2 to make Monitoring surfaces reflect that lifecycle truth honestly.
  3. Deliver User Story 3 to formalize covered-job contracts and timing guards so the same incident class does not recur.
  4. Finish with Phase 6 regression execution, formatting, and manual smoke validation.

Team Strategy

  1. One engineer should own the foundational lifecycle policy and reconciliation seam in app/Services/OperationRunService.php, app/Services/Operations/OperationLifecycleReconciler.php, and config/tenantpilot.php.
  2. A second engineer can prepare User Story 1 regression coverage in parallel during the foundational phase.
  3. Monitoring and badge semantics for User Story 2 can be developed separately once the reconciled metadata contract is stable.

Notes

  • [P] tasks touch separate files and can be executed in parallel.
  • Each user story remains independently testable after the foundational phase.
  • This feature does not require a schema migration in the first slice.
  • Keep lifecycle truth service-owned and operator-facing copy domain-safe across every touched surface.