## Summary
- add canonical onboarding lifecycle and checkpoint fields plus optimistic locking versioning for managed tenant onboarding drafts
- introduce centralized onboarding lifecycle and mutation services and route wizard mutations through version-checked writes
- convert Verify Access and Bootstrap into live checkpoint-driven wizard states with conditional polling and updated browser/feature/unit coverage
- add Spec Kit artifacts for feature 140, including spec, plan, tasks, research, data model, quickstart, checklist, and logical contract

## Validation
- branch was committed and pushed cleanly
- focused tests and formatting were updated during implementation work
- full validation was not re-run as part of this final git/PR step

## Notes
- base branch: `dev`
- feature branch: `140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp`
- outstanding follow-up items, if any, remain tracked in `specs/140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp/tasks.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #169
Feature Specification: Onboarding Lifecycle, Operation Checkpoints & Concurrency MVP
Feature Branch: 140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp
Created: 2026-03-14
Status: Draft
Input: User description: "Keep the existing Filament wizard as the primary onboarding shell, convert Verify Access and Bootstrap into operation-backed checkpoints, introduce a pragmatic action-oriented lifecycle model, and add optimistic locking as the concurrency MVP. Do not redesign the whole flow or introduce lease/takeover in this spec."
Spec Scope Fields (mandatory)
- Scope: workspace
- Primary Routes:
- `/admin/onboarding`
- `/admin/onboarding/{onboardingDraft}`
- Data Ownership:
- Managed tenant onboarding drafts remain workspace-scoped workflow records.
- Managed tenants remain tenant/workspace-owned domain records.
- Provider connections remain tenant-bound operational records.
- Verification and bootstrap runs remain `OperationRun` records and continue to own execution truth.
- This spec introduces no parallel onboarding-only operation backend.
- RBAC:
- Existing onboarding capabilities continue to govern access to view, edit, resume, cancel, verify, bootstrap, and activate onboarding drafts.
- Existing owner-level activation restrictions remain in effect.
- Existing provider-connection and tenant authorization semantics remain the source of truth.
- Non-members or actors outside workspace scope remain deny-as-not-found.
- This spec must not weaken current authorization boundaries.
User Scenarios & Testing (mandatory)
User Story 1 - Track a trustworthy lifecycle in one wizard (Priority: P1)
As an onboarding operator, I want the existing wizard to show a clear, canonical onboarding lifecycle so that I can tell whether a draft is still being prepared, actively verifying, blocked, bootstrapping, ready for activation, or already closed.
Why this priority: Lifecycle ambiguity is the root trust problem. Operators cannot safely resume or triage onboarding work if the system does not expose a single queryable workflow truth.
Independent Test: Create and resume onboarding drafts across normal, blocked, active, ready, completed, and cancelled states, then confirm each draft persists the expected lifecycle and checkpoint metadata without changing the current routing model.
Acceptance Scenarios:
- Given a new onboarding draft is created, When the draft is first persisted, Then it starts in `draft` with checkpoint metadata that reflects the current stage of progress.
- Given a draft has a relevant verification run in progress, When the operator views the wizard, Then the draft is represented as `verifying` rather than as a passive form state.
- Given verification or bootstrap becomes blocked, stale, failed, or otherwise unsafe for progression, When the workflow is recalculated, Then the draft moves to `action_required` with machine-readable blocker context.
User Story 2 - Monitor Verify Access and Bootstrap as live checkpoints (Priority: P1)
As an onboarding operator, I want Verify Access and Bootstrap to behave like live operation-backed checkpoints so that I do not need to manually refresh or guess whether a background run is still active, finished successfully, or now requires attention.
Why this priority: Verify Access and Bootstrap already coordinate background work. Treating them like ordinary form pages undermines operator trust and makes recovery slower.
Independent Test: Start verification and bootstrap from the existing wizard, keep the page open while the relevant runs transition, and confirm the step state updates automatically from backend truth without leaving the wizard or replacing Spec 139's in-step assist behavior.
Acceptance Scenarios:
- Given a relevant verification run is queued or running, When the operator remains on Step 3, Then the step polls and renders current backend-derived checkpoint status until the run becomes terminal.
- Given selected bootstrap runs are queued or running, When the operator remains on Step 4, Then the step polls and renders per-operation status until all relevant runs become terminal.
- Given the relevant operation reaches a terminal result while the wizard is open, When polling refreshes the step, Then the lifecycle and next-step guidance update without manual refresh.
User Story 3 - Prevent silent overwrite across tabs and operators (Priority: P1)
As an onboarding operator, I want stale mutations to be rejected clearly so that another tab or operator cannot silently overwrite newer onboarding draft state.
Why this priority: Last-write-wins behavior is not acceptable for a workflow coordinating provider connections, background runs, and activation readiness.
Independent Test: Open the same onboarding draft in two tabs or sessions, commit a mutation in one tab, then submit a stale mutation in the other and confirm the second action is rejected atomically with clear refresh guidance and no false success state.
Acceptance Scenarios:
- Given two sessions hold the same onboarding draft, When one session saves a newer draft version first, Then a stale submission from the other session is rejected atomically.
- Given a stale mutation is rejected, When the operator remains on the wizard, Then the UI shows a clear conflict message and does not pretend the stale action succeeded.
- Given a draft becomes completed or cancelled in one session, When another stale session attempts a mutation, Then the mutation is rejected and the closed draft remains non-editable.
User Story 4 - Activate only when backend truth is actually ready (Priority: P2)
As an operator with activation authority, I want the final activation step to re-evaluate backend truth before committing so that activation cannot proceed on stale assumptions from an outdated page state.
Why this priority: Activation is the irreversible checkpoint. It must be gated by current backend truth rather than optimistic UI state.
Independent Test: Bring a draft to a seemingly ready state, invalidate one of the readiness conditions in another session or via operation state changes, then attempt activation and confirm the system blocks activation until the lifecycle returns to a valid ready state.
Acceptance Scenarios:
- Given verification is current and sufficient and bootstrap is either not selected or complete, When the operator activates the draft, Then activation succeeds and the draft moves to `completed`.
- Given one of the readiness gates becomes invalid before activation commits, When activation is attempted, Then activation is rejected and the draft remains in a non-completed lifecycle state.
- Given activation succeeds, When the workflow closes, Then the draft becomes a historical non-editable record.
Edge Cases
- Draft exists but provider connection has never been selected.
- Verification run exists but belongs to a no-longer-selected provider connection.
- Verification completed successfully, then provider connection changes.
- Verification report is stale for the current connection.
- Verification is rerun while old failed results are still visible.
- Bootstrap was not selected at all.
- Bootstrap was selected, one run succeeded, another failed.
- Activation page is open when a background run result changes state.
- Stale tab attempts to start verification after another tab changed provider connection.
- Stale tab attempts activation after another session cancelled the draft.
- Stale tab attempts bootstrap after verification was invalidated.
- Draft is cancelled while another tab remains open.
- Polling sees a terminal run and must stop cleanly.
- Conflict occurs during a mutation that also changes lifecycle state.
- Existing deep-dive assist behavior from Spec 139 remains open or usable while Step 3 lifecycle changes.
- Refresh occurs mid-verification or mid-bootstrap.
- Operator resumes next day with a terminal run already complete.
Requirements (mandatory)
Constitution alignment (required): This feature reuses the existing onboarding wizard, verification operation path, bootstrap operation path, activation flow, and OperationRun backend. It does not introduce new Microsoft Graph call surfaces, new onboarding routes, or a parallel operation backend. Verification and bootstrap remain the canonical execution truth, while the draft gains canonical lifecycle metadata for queryability and gating.
Constitution alignment (OPS-UX): Verify Access and Bootstrap explicitly reuse existing OperationRun behavior and must continue to follow the Ops-UX 3-surface contract: intent feedback when the operation is started, a live progress surface inside the wizard while the relevant run is active, and existing Monitoring or terminal-notification behavior for operation completion. OperationRun.status and OperationRun.outcome remain service-owned. Any summary data surfaced from those runs must continue to use the existing allowed summary key set and numeric-only summary values. No new passive poll cycle may create duplicate notifications or audit spam. Regression coverage must prove lifecycle, checkpoint rendering, and readiness evaluation remain consistent with the existing operation service contract.
Constitution alignment (RBAC-UX): This feature stays in the workspace-admin plane under /admin. Non-members or actors outside workspace scope remain 404; in-scope members missing the relevant onboarding capability remain 403. Authorization must continue to be enforced server-side for view, edit, verify, bootstrap, activate, cancel, and any persisted override or checkpoint mutation. No raw capability strings or role checks may be introduced. Positive and negative authorization coverage must prove the lifecycle and concurrency hardening does not leak tenant existence or expand authority.
Constitution alignment (OPS-EX-AUTH-001): No new /auth/* behavior is introduced. This spec does not create a new exception path around operation monitoring or lifecycle persistence.
Constitution alignment (BADGE-001): Any lifecycle, checkpoint, or status badges added or adjusted in the wizard must keep using centralized state semantics for queued, running, action-required, ready, completed, cancelled, stale, and warning conditions. The feature must not introduce ad-hoc badge mappings in page code.
Constitution alignment (UI-NAMING-001): Operator-facing labels must stay task-oriented and consistent across the wizard, notifications, monitoring references, and any conflict or checkpoint feedback. Primary terms include Verify access, Bootstrap, Ready for activation, Action required, Activation blocked, Cancel onboarding, and the existing Spec 139 labels such as View required permissions. Implementation-first terms such as version mismatch, lifecycle resolver, checkpoint metadata, or stale payload must not become primary operator copy.
Constitution alignment (Filament v5 / Livewire v4): Livewire v4.0+ remains the compatibility target for this onboarding surface. The existing Filament wizard remains the primary shell, no new panel is introduced, and provider registration remains unchanged in bootstrap/providers.php. No new global-searchable Resource behavior is introduced; this spec only changes wizard-step lifecycle semantics.
Constitution alignment (Filament Action Surfaces): The Action Surface Contract is satisfied with an explicit exemption for Step 3 and Step 4 because they are composite in-step checkpoint surfaces rather than CRUD list or table surfaces. The wizard shell remains intact, and Spec 139's Verify Access assist remains additive inside Step 3 rather than becoming a separate view.
Constitution alignment (UX-001 — Layout & Information Architecture): The feature extends the existing onboarding wizard instead of adding new create, edit, or view pages. Step 3 and Step 4 may increase emphasis on status, running-state explanation, and next actions, but they remain part of the same wizard flow and do not require a new route or new information architecture. Any conflict banner or inline checkpoint state must remain subordinate to the wizard layout and preserve continuity.
Constitution alignment (destructive actions and confirmation): No new destructive actions are introduced. The existing Cancel onboarding action remains the only destructive action in scope and must continue to execute through an explicit action callback with confirmation and existing authorization. Verify, rerun, bootstrap, and activate are not destructive, but they must remain server-authorized and lifecycle-gated.
Constitution alignment (asset strategy): This feature should reuse existing panel assets and Livewire behavior. It must not require new shared frontend assets or a custom package asset pipeline. Deployment continues to run php artisan filament:assets in the existing deploy process when Filament-registered assets are present, but this spec should not add a new asset requirement.
Objective
Harden managed tenant onboarding into a trustworthy enterprise workflow without replacing the current Filament wizard shell. This spec establishes a canonical onboarding lifecycle model that is action-oriented and queryable, operation-backed checkpoint semantics for Verify Access and Bootstrap, a clear boundary between UI state, persisted draft state, backend operation truth, and activation state, and optimistic locking as the MVP concurrency guardrail across all onboarding draft mutations.
Why
The current onboarding flow is already functionally strong, but it still behaves too much like a classic form wizard even though it is actually coordinating background operations and cross-step workflow state. The main gaps are lifecycle ambiguity, mixed state ownership across UI and backend truth, operation steps that feel like form pages, manual-refresh trust gaps during active runs, silent multi-tab or multi-operator overwrite risk, and the absence of an action-oriented lifecycle that can answer which drafts need action now.
In Scope
- Keep onboarding in the existing Filament wizard shell.
- Convert Step 3 `Verify Access` into an operation-backed checkpoint.
- Convert Step 4 `Bootstrap` into an optional operation-backed checkpoint.
- Introduce canonical onboarding lifecycle states: `draft`, `verifying`, `action_required`, `bootstrapping`, `ready_for_activation`, `completed`, and `cancelled`.
- Introduce supporting precision fields: `current_checkpoint`, `last_completed_checkpoint`, `reason_code`, and `blocking_reason_code` where relevant.
- Add polling-based active-session refresh for operation-backed steps.
- Add optimistic locking through numeric draft versioning.
- Apply version checks to all relevant onboarding draft mutations.
- Preserve current routing, wizard continuity, authorization, operation backend, and activation flow.
- Add regression coverage for lifecycle transitions, polling-compatible checkpoint behavior, mutation conflict protection, and Spec 139 compatibility.
Out of Scope
- Replacing the wizard with a separate checklist or new onboarding application.
- Introducing edit leases, takeovers, or claimed-by timers.
- Introducing new routes for onboarding checkpoints.
- Introducing WebSocket or SSE infrastructure.
- Replacing existing `OperationRun` monitoring pages.
- Rebuilding Required Permissions or other verification deep-dive pages.
- Broad onboarding copy or visual redesign beyond what lifecycle clarity requires.
- Introducing a new onboarding dashboard.
- Broad RBAC redesign.
Current Problems
- Lifecycle ambiguity: the workflow state is mostly inferred from JSON and runs rather than represented canonically.
- Mixed state concerns: transient form state, persisted draft state, background operation truth, and activation truth are blurred together.
- Operation steps feel like forms: Verify Access and Bootstrap start and monitor background work but still read too much like normal pages.
- Manual-refresh trust gap: active operations can finish while the operator keeps seeing stale results.
- Silent overwrite risk: last-write-wins remains possible across stale tabs or multiple operators.
- No queryable action-oriented lifecycle: list or triage views cannot reliably answer which drafts are waiting, blocked, or ready without reconstructing workflow semantics.
Target Architecture
The onboarding experience remains one wizard shell, but not every step behaves like a normal form page.
- Step 1: Identify Managed Tenant → form step.
- Step 2: Connect Provider → form step.
- Step 3: Verify Access → operation-backed checkpoint.
- Step 4: Bootstrap → optional operation-backed checkpoint.
- Step 5: Complete / Activate → final gated checkpoint.
The implementation must explicitly respect these state layers:
- UI state: Livewire or Filament local state, open panels, selected local values not yet committed.
- Persisted draft state: the saved onboarding draft record, lifecycle state, checkpoint metadata, selected provider connection, selected bootstrap options, and version.
- Backend operation truth: `OperationRun` status, run context, verification report, and bootstrap run outcomes.
- Completion or activation truth: tenant activation state, completion timestamp, and activation override semantics.
The wizard may render local state, but checkpoint status and progression rules must be based on persisted draft state plus backend operation truth, not optimistic local assumptions.
Lifecycle Model
Canonical Lifecycle States
- `draft`: the onboarding draft exists, no active governing checkpoint operation is running, and the draft is not yet ready for activation.
- `verifying`: a relevant Verify Access operation is queued or running for the currently selected provider connection.
- `action_required`: the onboarding cannot safely proceed without operator intervention.
- `bootstrapping`: one or more selected bootstrap operations are queued or running and remain relevant to the current draft.
- `ready_for_activation`: all required gating conditions are satisfied and the workflow is waiting for final activation.
- `completed`: activation succeeded and the onboarding draft is now a historical workflow record.
- `cancelled`: the draft was intentionally cancelled and is now a historical workflow record.
Supporting Precision Fields
- `current_checkpoint`: one of `identify`, `connect_provider`, `verify_access`, `bootstrap`, or `complete_activate`.
- `last_completed_checkpoint`: one of the same controlled values, or `null` if no checkpoint has been satisfied yet.
- `reason_code`: nullable machine-readable precision for the current lifecycle state.
- `blocking_reason_code`: nullable machine-readable blocker code when forward progress is explicitly blocked.
Examples of supported precision include verification_blocked_permissions, verification_failed, provider_connection_changed, verification_result_stale, bootstrap_failed, bootstrap_partial_failure, and owner_activation_required.
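The controlled values above can be captured as enums plus one small value object. The following is a minimal, language-neutral sketch in Python, not the product's implementation; the class names `LifecycleState`, `Checkpoint`, and `LifecycleSnapshot` and the helper `is_editable` are hypothetical, while the string values are taken from this spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LifecycleState(str, Enum):
    """Canonical lifecycle states named in this spec."""
    DRAFT = "draft"
    VERIFYING = "verifying"
    ACTION_REQUIRED = "action_required"
    BOOTSTRAPPING = "bootstrapping"
    READY_FOR_ACTIVATION = "ready_for_activation"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class Checkpoint(str, Enum):
    """Controlled values for current_checkpoint / last_completed_checkpoint."""
    IDENTIFY = "identify"
    CONNECT_PROVIDER = "connect_provider"
    VERIFY_ACCESS = "verify_access"
    BOOTSTRAP = "bootstrap"
    COMPLETE_ACTIVATE = "complete_activate"


# Closed drafts are historical records and must stay non-editable.
CLOSED_STATES = frozenset({LifecycleState.COMPLETED, LifecycleState.CANCELLED})


@dataclass(frozen=True)
class LifecycleSnapshot:
    """Hypothetical value object grouping the top-level lifecycle fields."""
    lifecycle_state: LifecycleState
    current_checkpoint: Optional[Checkpoint] = None
    last_completed_checkpoint: Optional[Checkpoint] = None
    reason_code: Optional[str] = None
    blocking_reason_code: Optional[str] = None

    def is_editable(self) -> bool:
        return self.lifecycle_state not in CLOSED_STATES
```

Constructing `LifecycleState("action_required")` from a stored string raises `ValueError` for any value outside the controlled set, which is the main benefit of enum semantics over free-form strings here.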
Lifecycle Transition Rules
- Enter `draft` when a new draft is created, the operator remains in Step 1 or Step 2, verification has not started for the current provider connection, or prior valid checkpoint output has been invalidated and no active or blocked state should govern yet.
- Enter `verifying` when the operator starts Verify Access and a relevant verification run is created or reused for the current provider connection.
- Exit `verifying` when the relevant run becomes terminal, transitioning to `ready_for_activation`, `bootstrapping`, `action_required`, or back to `draft` depending on the run outcome and current relevance.
- Enter `action_required` when verification or bootstrap finishes blocked or failed, when verification becomes stale or mismatched for the current provider connection, or when another explicit progression blocker exists.
- Exit `action_required` only when the blocking cause is actually resolved, transitioning to `verifying`, `bootstrapping`, `ready_for_activation`, or `draft`.
- Enter `bootstrapping` when verification is sufficiently passed, one or more bootstrap operations are selected, and those runs are successfully dispatched and active.
- Exit `bootstrapping` when all relevant selected bootstrap runs are terminal, transitioning to `ready_for_activation`, `action_required`, or `draft` if bootstrap intent was reset before actual dispatch.
- Enter `ready_for_activation` only when verification is current and sufficient for the selected provider connection, no relevant verification run remains active, all selected bootstrap operations are complete if bootstrap was chosen, no unresolved blockers remain, and the draft is not already completed or cancelled.
- Enter `completed` only when activation succeeds.
- Enter `cancelled` only through an explicit cancel action on an editable draft.
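The transition rules can be read as one deterministic function from current facts to a lifecycle state. The sketch below is a simplified Python illustration under stated assumptions: `DraftFacts` and `resolve_lifecycle` are hypothetical names, the outcome strings (`passed`, `blocked`, `failed`, `succeeded`) are placeholders rather than real `OperationRun` values, and two ambiguous cases are resolved by assumption (stale verification maps to `action_required`, and selected-but-undispatched bootstrap stays in `draft`).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DraftFacts:
    """Hypothetical inputs gathered from the draft and its runs."""
    closed_as: Optional[str] = None             # "completed" | "cancelled" | None
    verification_active: bool = False           # relevant run queued or running
    verification_outcome: Optional[str] = None  # "passed" | "blocked" | "failed" | None
    verification_current: bool = True           # False when stale or mismatched
    bootstrap_selected: bool = False
    bootstrap_active: bool = False              # any selected run queued or running
    bootstrap_outcomes: Tuple[str, ...] = ()    # terminal outcomes of selected runs


def resolve_lifecycle(f: DraftFacts) -> Tuple[str, Optional[str]]:
    """Return (lifecycle_state, blocking_reason_code) for the given facts."""
    if f.closed_as in ("completed", "cancelled"):
        return f.closed_as, None                # closed drafts never reopen
    if f.verification_active:
        return "verifying", None
    if f.verification_outcome is None:
        return "draft", None                    # verification not started yet
    if not f.verification_current:
        return "action_required", "verification_result_stale"
    if f.verification_outcome == "blocked":
        return "action_required", "verification_blocked_permissions"
    if f.verification_outcome == "failed":
        return "action_required", "verification_failed"
    if f.bootstrap_selected:
        if f.bootstrap_active:
            return "bootstrapping", None
        if not f.bootstrap_outcomes:
            return "draft", None                # assumption: selected, not dispatched
        failed = [o for o in f.bootstrap_outcomes if o != "succeeded"]
        if failed:
            all_failed = len(failed) == len(f.bootstrap_outcomes)
            code = "bootstrap_failed" if all_failed else "bootstrap_partial_failure"
            return "action_required", code
    return "ready_for_activation", None
```

Because the function is pure, the same facts always yield the same state, so it can be called from mutations, polls, and reloads alike without each caller carrying its own transition logic.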
Checkpoint Semantics for Step 3 and Step 4
Step 3 - Verify Access
Step 3 remains visually a wizard step but functionally becomes an operation-backed checkpoint.
- Starting verification creates or reuses a relevant operation run.
- While the relevant run is queued or running, the step behaves as active checkpoint monitoring.
- The step must surface backend-derived status rather than only last-rendered UI state.
- Polling starts when lifecycle is `verifying` and the relevant run is queued or running.
- Polling stops when the relevant run becomes terminal, the draft is no longer on the checkpoint, or the page leaves the relevant editing context.
- During active monitoring the step must show running state, current checkpoint meaning, and live-updating next-step or remediation messaging.
- On completion the step must resolve into passed/current, blocked or failed, stale or mismatched, or otherwise `action_required`.
- Rerun keeps using the existing verification path, returns the lifecycle to `verifying`, and remains subject to version checks when the draft mutation is persisted.
- Spec 139's in-step Required Permissions assist remains additive and must continue to work with this checkpoint model, including its new-tab deep-dive continuity.
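The polling start and stop conditions for Step 3 reduce to a single predicate evaluated on every render. This is a minimal Python sketch, assuming a hypothetical function `verification_poll_interval` and assuming run statuses are exposed as the strings `queued` and `running`; the 5-second value mirrors the cadence this spec requires for active runs.

```python
from typing import Optional

ACTIVE_RUN_STATUSES = frozenset({"queued", "running"})
POLL_INTERVAL_SECONDS = 5  # cadence required by this spec for active runs


def verification_poll_interval(
    lifecycle_state: str,
    current_checkpoint: Optional[str],
    run_status: Optional[str],
    run_connection_id: Optional[int],
    selected_connection_id: Optional[int],
) -> Optional[int]:
    """Return the poll interval in seconds, or None when polling must stop."""
    # A run only counts if it belongs to the currently selected connection.
    relevant = (
        run_connection_id is not None
        and run_connection_id == selected_connection_id
    )
    active = run_status in ACTIVE_RUN_STATUSES
    on_checkpoint = current_checkpoint == "verify_access"
    if lifecycle_state == "verifying" and on_checkpoint and relevant and active:
        return POLL_INTERVAL_SECONDS
    return None
```

Returning `None` for every non-matching case means polling stops by default, which covers terminal runs, a changed provider connection, and drafts that have left the checkpoint without a separate stop rule for each.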
Step 4 - Bootstrap
Step 4 remains visually a wizard step but functionally becomes an optional operation-backed checkpoint.
- Selecting bootstrap operation types remains a normal draft mutation.
- Starting bootstrap changes the step into active monitoring of relevant selected runs.
- The step must show backend truth for all selected runs relevant to this draft.
- Polling starts when lifecycle is `bootstrapping` and one or more relevant bootstrap runs are queued or running.
- Polling stops when all relevant selected runs are terminal or the draft leaves the monitoring state.
- During active monitoring the step must show selected operations, per-operation status, whether the workflow is still waiting, and the next valid operator action.
- If bootstrap was not selected, the operator may continue toward activation once all non-bootstrap gates are satisfied.
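For Step 4, the per-operation view and the polling decision can both be derived from the selected operation types and the latest run status for each. The Python sketch below is illustrative only: `summarize_bootstrap`, the `not_started` placeholder, and the operation type names used in the usage note are hypothetical.

```python
from typing import Dict, List, Optional

ACTIVE_RUN_STATUSES = frozenset({"queued", "running"})


def summarize_bootstrap(
    selected: List[str],
    run_status_by_type: Dict[str, str],
) -> Dict[str, object]:
    """Build the Step 4 monitoring view from backend run statuses.

    Only operation types selected on the draft are considered; runs for
    unselected types are ignored as irrelevant to this draft.
    """
    per_operation = {
        op: run_status_by_type.get(op, "not_started") for op in selected
    }
    still_waiting = any(
        status in ACTIVE_RUN_STATUSES for status in per_operation.values()
    )
    poll_interval: Optional[int] = 5 if still_waiting else None
    return {
        "per_operation": per_operation,
        "still_waiting": still_waiting,
        "poll_interval_seconds": poll_interval,
    }
```

For example, `summarize_bootstrap(["sync_policies", "sync_groups"], {"sync_policies": "running", "sync_groups": "succeeded"})` reports that the step is still waiting, while an empty selection never polls, matching the rule that unselected bootstrap does not hold up activation.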
Concurrency and Optimistic Locking Model
This spec adopts optimistic locking as mandatory MVP behavior and explicitly does not introduce leases, claimed-by timers, takeover UI, or collaborative editing.
Versioning
- The onboarding draft must receive a numeric `version` column.
- Version starts at `1` on create.
- Version increments on every successful relevant mutation.
- Stale version submissions must be rejected atomically.
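One common way to make the version check atomic is a conditional update that matches on both the record id and the expected version, then inspects the affected row count. The sketch below demonstrates that technique with Python's built-in `sqlite3` module; the table name `onboarding_drafts`, the `provider_connection_id` column, and the names `StaleDraftVersion` and `save_with_version_check` are assumptions for illustration, not the product's schema or API.

```python
import sqlite3


class StaleDraftVersion(Exception):
    """Raised when the expected version no longer matches the stored one."""


def save_with_version_check(conn, draft_id, expected_version, provider_connection_id):
    """Apply one mutation only if the draft is still at expected_version."""
    with conn:  # commits on success, rolls back if an exception escapes
        cur = conn.execute(
            "UPDATE onboarding_drafts "
            "SET provider_connection_id = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (provider_connection_id, draft_id, expected_version),
        )
        if cur.rowcount != 1:
            # No row matched: another session already moved the version on.
            raise StaleDraftVersion(
                f"draft {draft_id} is no longer at version {expected_version}"
            )
    return expected_version + 1


# In-memory demo data: one draft, created at version 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE onboarding_drafts ("
    "id INTEGER PRIMARY KEY, "
    "provider_connection_id INTEGER, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO onboarding_drafts (id) VALUES (1)")
conn.commit()
```

If two tabs both loaded version `1`, the first call to `save_with_version_check(conn, 1, 1, 10)` succeeds and moves the draft to version `2`; the second tab's call with the same expected version matches no row and raises `StaleDraftVersion` without writing anything.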
Relevant Mutations Requiring Version Checks
- Identify step commit.
- Provider connection selection or change.
- Inline provider connection creation when it mutates draft state.
- Verification start or rerun.
- Bootstrap selection changes.
- Bootstrap start.
- Activation.
- Cancel draft.
- Persisted override toggles or reasons.
- Any mutation that changes lifecycle state, checkpoint fields, reason codes, or selected run references.
Conflict Behavior
When the submitted expected version does not match the persisted draft version:
- the mutation must be rejected atomically,
- no partial save may occur,
- the UI must show a clear conflict message,
- the UI must not imply success,
- the user must be prompted to refresh and retry.
Recommended operator copy:
This onboarding draft was changed by another session before your action could be saved. Refresh the page to load the latest state, then retry.
Where practical, the conflict feedback may also include who last updated the draft and when.
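The conflict rules can be enforced at the boundary where a wizard action calls a version-checked write. This is a hedged Python sketch of that boundary: `StaleDraftVersion`, `MutationResult`, and `run_mutation` are hypothetical names, and the message text is the recommended operator copy quoted above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

CONFLICT_MESSAGE = (
    "This onboarding draft was changed by another session before your action "
    "could be saved. Refresh the page to load the latest state, then retry."
)


class StaleDraftVersion(Exception):
    """Hypothetical error raised by a version-checked write."""


@dataclass(frozen=True)
class MutationResult:
    saved: bool
    message: Optional[str] = None
    redirect_to: Optional[str] = None


def run_mutation(
    mutate: Callable[[], None],
    success_redirect: Optional[str] = None,
) -> MutationResult:
    """Run one draft mutation and translate a conflict into operator feedback."""
    try:
        mutate()
    except StaleDraftVersion:
        # Stay on the page, report the conflict, and never signal success.
        return MutationResult(saved=False, message=CONFLICT_MESSAGE)
    return MutationResult(saved=True, redirect_to=success_redirect)
```

Keeping the redirect target out of the conflict result makes it structurally impossible for a rejected mutation to navigate away, and the operator-facing text avoids implementation terms such as version mismatch.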
Data Model Changes
The onboarding draft table must gain the following top-level fields:
- `version`: bigint or integer, not null, default `1`
- `lifecycle_state`: controlled string or enum, not null, default `draft`
- `current_checkpoint`: nullable controlled string or enum
- `last_completed_checkpoint`: nullable controlled string or enum
- `reason_code`: nullable string
- `blocking_reason_code`: nullable string
Controlled enum or value-object semantics should be used for lifecycle_state, current_checkpoint, and last_completed_checkpoint wherever practical. Existing JSON state may remain for low-level step data, selected options, run references, report reference metadata, and other detailed context, but the new top-level lifecycle fields become the canonical queryable workflow truth.
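To illustrate why top-level columns make the lifecycle queryable, the sketch below creates the fields in an in-memory SQLite table and answers "which drafts need action now" without touching any JSON state. It is written in Python with the built-in `sqlite3` module purely for illustration; the table name `onboarding_drafts`, the `CHECK` constraints, and the helper `drafts_needing_action` are assumptions and not the actual migration.

```python
import sqlite3

LIFECYCLE_STATES = (
    "draft", "verifying", "action_required", "bootstrapping",
    "ready_for_activation", "completed", "cancelled",
)
CHECKPOINTS = (
    "identify", "connect_provider", "verify_access",
    "bootstrap", "complete_activate",
)


def sql_list(values):
    """Render a tuple of trusted constants as a SQL IN (...) list."""
    return ", ".join(f"'{v}'" for v in values)


SCHEMA = f"""
CREATE TABLE onboarding_drafts (
    id INTEGER PRIMARY KEY,
    version INTEGER NOT NULL DEFAULT 1,
    lifecycle_state TEXT NOT NULL DEFAULT 'draft'
        CHECK (lifecycle_state IN ({sql_list(LIFECYCLE_STATES)})),
    current_checkpoint TEXT
        CHECK (current_checkpoint IS NULL
               OR current_checkpoint IN ({sql_list(CHECKPOINTS)})),
    last_completed_checkpoint TEXT
        CHECK (last_completed_checkpoint IS NULL
               OR last_completed_checkpoint IN ({sql_list(CHECKPOINTS)})),
    reason_code TEXT,
    blocking_reason_code TEXT
)
"""


def drafts_needing_action(conn):
    """Return (id, blocking_reason_code) for drafts that need an operator."""
    return conn.execute(
        "SELECT id, blocking_reason_code FROM onboarding_drafts "
        "WHERE lifecycle_state = 'action_required' ORDER BY id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

A triage view built this way depends only on indexed scalar columns, so the existing JSON state can keep holding step-level detail without being parsed for list queries.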
UI and UX Behavior
- The UI remains one onboarding wizard.
- Step 1 and Step 2 remain form-driven steps.
- Step 3 and Step 4 remain wizard steps visually, but render as checkpoint-oriented surfaces that distinguish running, action-required, ready, and rerun states.
- Step 3 and Step 4 must use an active `5s` polling cadence while their relevant runs remain queued or running, matching the existing active-operation detail patterns already used in the product.
- The UI must make it understandable that verification and bootstrap are being monitored live and that results update automatically while the page remains open.
- Unsaved local form edits must not be mistaken for committed state.
- Activation must evaluate current backend truth rather than stale visible assumptions.
- If bootstrap is optional and not selected, activation can become ready after successful current verification.
- If bootstrap is selected, activation cannot become ready until the selected runs complete successfully.
- On optimistic locking conflict, the page must not redirect away, must not mutate visible state as if saved, and must encourage refresh.
Backend Behavior
Lifecycle Recalculation
The system must consistently maintain top-level lifecycle fields whenever relevant draft or operation state changes. The recalculation logic must be deterministic and shared through a centralized lifecycle recalculation mechanism rather than being scattered ad hoc across page methods.
Operation-Backed Checkpoints
- Verification start must dispatch the existing operation path, persist the relevant run reference, set lifecycle and checkpoint fields appropriately, and increment version if draft state changed.
- When a verification run result changes, the next poll or reload must derive and persist the correct lifecycle state if needed, and stale or mismatched results must never continue masquerading as current.
- Bootstrap start must persist selected bootstrap intent if needed, dispatch the existing bootstrap operations, set lifecycle and checkpoint fields appropriately, and increment version.
- When bootstrap results change, the next poll or reload must derive the correct lifecycle transition.
Activation
Activation must perform a fresh backend-truth gate evaluation, reject activation if lifecycle gates are no longer satisfied, persist the completed state when activation succeeds, finalize the onboarding record, and increment version if applicable before the workflow closes.
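The activation gate can be sketched as a function that reloads backend truth at commit time and refuses to proceed if any readiness condition has lapsed. The Python below is a simplified illustration: `BackendTruth`, `activation_blockers`, `activate`, and the blocker strings are hypothetical, and the version increment and record finalization are left to the `commit` callback.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class BackendTruth:
    """Hypothetical snapshot loaded fresh at activation time."""
    lifecycle_state: str
    verification_current_and_sufficient: bool
    verification_run_active: bool
    selected_bootstrap_statuses: Tuple[str, ...]  # empty if bootstrap not selected
    unresolved_blockers: Tuple[str, ...] = ()


def activation_blockers(truth: BackendTruth) -> List[str]:
    """List every reason activation must be rejected; empty means ready."""
    blockers: List[str] = []
    if truth.lifecycle_state in ("completed", "cancelled"):
        blockers.append("draft_closed")
    if not truth.verification_current_and_sufficient:
        blockers.append("verification_not_current")
    if truth.verification_run_active:
        blockers.append("verification_still_running")
    if any(s != "succeeded" for s in truth.selected_bootstrap_statuses):
        blockers.append("bootstrap_incomplete")
    blockers.extend(truth.unresolved_blockers)
    return blockers


def activate(
    load_truth: Callable[[], BackendTruth],
    commit: Callable[[], None],
) -> Tuple[bool, List[str]]:
    """Re-evaluate the gates from freshly loaded truth, then commit or reject."""
    blockers = activation_blockers(load_truth())  # never trust page state
    if blockers:
        return False, blockers
    commit()
    return True, []
```

Passing a loader instead of a pre-built snapshot makes it harder for a caller to hand in state that was captured when the page first rendered.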
Audit and Observability
This spec does not require new audit events for every passive poll or render. It must continue auditing meaningful state changes including draft creation or resume, provider connection changes, verification start, bootstrap start, activation, cancel, and override-related actions. A conflict-rejected mutation audit event is recommended where practical. Lifecycle transitions should only create audit noise when they are already part of a meaningful user or service workflow event. Lifecycle state changes must be inspectable in the database and tests, relevant run references must remain traceable, stale or mismatched results must be reproducible in focused tests, and conflict cases must remain diagnosable.
Policies and Authorization Implications
- This spec preserves existing policy boundaries.
- Starting verification does not grant new authority.
- Starting bootstrap does not grant new authority.
- Activation remains subject to current activation rules.
- Cancel remains subject to current edit and cancel rules.
- Optimistic locking is mutation safety, not authorization; policy checks still run first.
- Completed and cancelled drafts remain read-only historical records.
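The ordering implied above (authorization first, then terminal immutability, then the version check) can be sketched as a single guard. This Python example is illustrative only: `guard_mutation`, the dictionary shape of `draft`, and the capability value in the usage note are hypothetical, and a real implementation would go through the existing capability and policy layer rather than compare raw strings. The version comparison here shows ordering only; the atomic check still belongs in the write itself.

```python
from typing import Dict, Set


def guard_mutation(
    actor_workspace_ids: Set[int],
    actor_capabilities: Set[str],
    draft: Dict[str, object],
    required_capability: str,
    expected_version: int,
) -> str:
    """Return the first failing check for a draft mutation, or "ok"."""
    # 1. Out-of-scope actors must not learn that the draft exists (404).
    if draft["workspace_id"] not in actor_workspace_ids:
        return "not_found"
    # 2. In-scope members without the capability are denied (403).
    if required_capability not in actor_capabilities:
        return "forbidden"
    # 3. Closed drafts are historical records and stay read-only.
    if draft["lifecycle_state"] in ("completed", "cancelled"):
        return "read_only"
    # 4. Only now does mutation safety (optimistic locking) apply.
    if draft["version"] != expected_version:
        return "conflict"
    return "ok"
```

With this ordering, an actor outside the workspace who submits a stale version still receives `not_found`, so conflict feedback cannot be used to probe for the existence of drafts in other workspaces.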
Functional Requirements
- FR-140-01 Wizard continuity: The onboarding experience must remain inside the existing Filament wizard shell.
- FR-140-02 Verify checkpoint model: Verify Access must behave as an operation-backed checkpoint rather than a passive form page.
- FR-140-03 Bootstrap checkpoint model: Bootstrap must behave as an optional operation-backed checkpoint rather than a passive option page after dispatch.
- FR-140-04 Canonical lifecycle state: Each onboarding draft must have a persisted canonical `lifecycle_state`.
- FR-140-05 Checkpoint precision: Each onboarding draft must persist `current_checkpoint` and support `last_completed_checkpoint`.
- FR-140-06 Reason precision: The model must support machine-readable `reason_code` and `blocking_reason_code` where relevant.
- FR-140-07 Draft entry state: New or resettable onboarding drafts must default to `draft`.
- FR-140-08 Verifying transition: Starting a relevant verification run must move the draft into `verifying`.
- FR-140-09 Action-required transition: Blocked, failed, stale, or mismatched verification or bootstrap conditions must move the draft into `action_required`.
- FR-140-10 Bootstrapping transition: Starting selected bootstrap operations must move the draft into `bootstrapping`.
- FR-140-11 Ready transition: The draft must move to `ready_for_activation` only when verification is current and sufficient, selected bootstrap operations are complete if applicable, no relevant runs are still active, and no blocker remains.
- FR-140-12 Completed transition: Successful activation must move the draft to `completed`.
- FR-140-13 Cancelled transition: Explicit cancellation must move the draft to `cancelled`.
- FR-140-14 Polling start verification: Step 3 must poll while a relevant verification run is queued or running.
- FR-140-15 Polling stop verification: Verification polling must stop when the relevant run is terminal or no longer active or relevant.
- FR-140-16 Polling start bootstrap: Step 4 must poll while relevant bootstrap runs are queued or running.
- FR-140-17 Polling stop bootstrap: Bootstrap polling must stop when relevant selected runs are terminal or no longer active or relevant.
- FR-140-18 Backend-truth status: Checkpoint rendering must use backend-derived operation truth rather than only stale rendered UI state.
- FR-140-19 Activation truth check: Activation must perform a fresh backend-truth gate evaluation before committing.
- FR-140-20 Optional bootstrap handling: If bootstrap was not selected, successful current verification alone may be sufficient for `ready_for_activation`.
- FR-140-21 Selected-bootstrap gating: If bootstrap was selected, activation readiness must wait for selected bootstrap completion.
- FR-140-22 Version column: The onboarding draft must persist a numeric version for optimistic locking.
- FR-140-23 Versioned mutations: All relevant draft mutations must be rejected atomically on stale version mismatch.
- FR-140-24 No silent overwrite: The UI must never silently overwrite a newer draft version from another session.
- FR-140-25 Conflict feedback: Conflict rejection must present a clear user-visible error.
- FR-140-26 Terminal immutability: Completed and cancelled drafts must remain non-editable.
- FR-140-27 Queryable lifecycle: Lifecycle state must be queryable without re-deriving full workflow semantics from JSON state.
- FR-140-28 Additive architecture: The implementation must reuse existing routes, wizard shell, operation backend, asset strategy, and authorization model.
- FR-140-29 Spec-139 compatibility: The Verify Access checkpoint must remain compatible with Spec 139's additive permissions assist and must not require same-tab deep-dive navigation.
- FR-140-30 No takeover scope: This spec must not implement lease or takeover behavior.
Non-Goals
- Multi-user presence indicators.
- Claimed-by editing banners.
- Forced takeover.
- Real-time push infrastructure.
- A new onboarding dashboard.
- Full visual redesign of the wizard.
- Full autosave.
- Replacing existing operation-monitoring pages.
- Replacing Spec 139's required permissions assist with a different recovery surface.
Assumptions
- Existing verification and bootstrap operations already produce sufficient execution truth through `OperationRun`.
- Existing onboarding authorization is already robust and should be preserved.
- Polling is an acceptable V1 and V2 enterprise compromise versus heavier real-time infrastructure.
- Optimistic locking is sufficient MVP protection against silent overwrite.
- The existing wizard shell is stable enough to keep as the primary operator experience.
Dependencies
- Existing onboarding wizard page or component.
- Existing onboarding draft model.
- Existing provider connection and verification flows.
- Existing bootstrap operation dispatch paths.
- Existing activation flow.
- Existing `OperationRun` model and stored contexts.
- Existing browser and feature test infrastructure.
- Spec 139's additive Verify Access permissions assist behavior.
Relationship to Spec 139
- Spec 139 adds an in-step Verify Access permissions-recovery assist and new-tab deep-dive continuity.
- Spec 140 defines the broader lifecycle and checkpoint semantics within which Step 3 operates.
- Spec 139 remains additive and does not own the canonical verification lifecycle or operation-state model.
- Spec 140 must not remove or invalidate the Spec 139 assist pattern.
- The Step 3 permissions assist must compose cleanly with polling-based checkpoint rendering.
Risks and Tradeoffs
- State recalculation sprawl would reintroduce truth drift if lifecycle logic is spread across too many methods.
- Partial version coverage would leave silent-overwrite side doors.
- Polling and rendering complexity could make Step 3 and Step 4 brittle if tightly coupled to current layout details.
- Overeager lifecycle transitions could thrash state between `draft`, `action_required`, and `ready_for_activation`.
- Verify Access rendering must remain compatible with Spec 139's assist behavior.
UI Action Matrix (mandatory when Filament is changed)
| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Onboarding wizard: Identify | `/admin/onboarding`, `/admin/onboarding/{onboardingDraft}` | Existing wizard actions only | Not a table surface | None | None | Existing start state | Not applicable | Existing step navigation | Existing behavior | Form-driven checkpoint preparation step. |
| Onboarding wizard: Connect Provider | `/admin/onboarding`, `/admin/onboarding/{onboardingDraft}` | Existing wizard actions and connection-related actions | Not a table surface | None | None | Existing contextual actions | Not applicable | Existing step navigation | Existing behavior | Form-driven provider selection step. |
| Onboarding wizard: Verify Access | `/admin/onboarding`, `/admin/onboarding/{onboardingDraft}` | Existing verify or rerun actions; Spec 139 assist remains additive | Not a table surface | None | None | Existing verification start path | Not applicable | Existing step navigation | Existing operation audit only | Operation-backed checkpoint; polling required while active. |
| Onboarding wizard: Bootstrap | `/admin/onboarding`, `/admin/onboarding/{onboardingDraft}` | Existing bootstrap start actions | Not a table surface | None | None | Existing optional skip path | Not applicable | Existing step navigation | Existing operation audit only | Optional operation-backed checkpoint; polling required while active. |
| Onboarding wizard: Complete / Activate | `/admin/onboarding`, `/admin/onboarding/{onboardingDraft}` | Existing activate action | Not a table surface | None | None | None | Not applicable | Existing confirmation flow | Existing activation audit | Final gated checkpoint; readiness must come from backend truth. |
| Conflict surface | Inline within wizard | No new header actions; refresh guidance only | Not a table surface | None | None | Refresh and retry guidance | Not applicable | Mutation rejected | Recommended conflict audit | No new route; optimistic-locking rejection surface only. |
Key Entities (include if feature involves data)
- Onboarding Draft Lifecycle: The canonical top-level workflow record describing whether a draft is in preparation, actively verifying, blocked, bootstrapping, ready for activation, completed, or cancelled.
- Checkpoint Metadata: The persisted `current_checkpoint`, `last_completed_checkpoint`, `reason_code`, and `blocking_reason_code` values that make lifecycle state precise and queryable.
- Relevant Operation Run: The existing verification or bootstrap `OperationRun` record that owns execution truth for a checkpoint.
- Draft Version: The monotonically increasing optimistic-locking value used to reject stale mutations.
Success Criteria (mandatory)
Measurable Outcomes
- SC-140-01 Lifecycle clarity: In focused regression coverage, 100% of onboarding drafts under tested scenarios persist a valid canonical lifecycle state.
- SC-140-02 Queryable lifecycle: Operators and tests can filter drafts by top-level lifecycle state without reconstructing workflow meaning from JSON state.
- SC-140-03 Verify active-session trust: In browser coverage, verification results update without manual refresh while the relevant run remains active.
- SC-140-04 Bootstrap active-session trust: In browser coverage, bootstrap status updates without manual refresh while selected runs remain active.
- SC-140-05 No silent overwrite: In focused concurrency coverage, 100% of stale mutations are rejected and 0 stale mutations silently overwrite newer draft state.
- SC-140-06 Activation gating correctness: In focused coverage, `ready_for_activation` is reached only when all defined gating conditions are satisfied.
- SC-140-07 Additive architecture: The completed implementation introduces 0 new onboarding routes and 0 new operation backends.
- SC-140-08 Terminal integrity: Completed and cancelled drafts remain non-editable in focused regression coverage.
Acceptance Criteria
Lifecycle and State Model
- Every onboarding draft persists one canonical `lifecycle_state`.
- Every onboarding draft persists `current_checkpoint`.
- The model supports `last_completed_checkpoint`.
- The model supports machine-readable `reason_code` and `blocking_reason_code`.
- A newly created onboarding draft starts in `draft`.
- Starting verification moves the draft to `verifying`.
- Verification blocked, failed, stale, or mismatched moves the draft to `action_required`.
- Starting selected bootstrap operations moves the draft to `bootstrapping`.
- A draft becomes `ready_for_activation` only when verification is current and sufficient, no relevant verification run is active, selected bootstrap operations are complete if applicable, and no blocker remains.
- Successful activation moves the draft to `completed`.
- Cancelling an editable draft moves it to `cancelled`.
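The transitions above can be sketched as a single pure recalculation function, which is one way to keep lifecycle truth in one place. This is a language-agnostic illustration in Python, not the project's Laravel implementation. The `DraftFacts` fields and the precedence order (terminal states first, then active runs, then blockers) are assumptions made for the sketch, as is the fallback to `draft` when bootstrap is selected but not yet started; only the seven state names come from this spec.

```python
from dataclasses import dataclass
from enum import Enum


class LifecycleState(str, Enum):
    DRAFT = "draft"
    VERIFYING = "verifying"
    ACTION_REQUIRED = "action_required"
    BOOTSTRAPPING = "bootstrapping"
    READY_FOR_ACTIVATION = "ready_for_activation"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


@dataclass(frozen=True)
class DraftFacts:
    """Hypothetical snapshot of backend truth for one draft (names assumed)."""
    activated: bool = False
    cancelled: bool = False
    verification_active: bool = False  # a relevant run is queued or running
    verification_ok: bool = False      # verification is current and sufficient
    has_blocker: bool = False          # blocked, failed, stale, or mismatched
    bootstrap_selected: bool = False
    bootstrap_active: bool = False     # a selected run is queued or running
    bootstrap_complete: bool = False   # all selected runs finished successfully


def recalculate_lifecycle(facts: DraftFacts) -> LifecycleState:
    # Terminal states win: completed and cancelled drafts never move again.
    if facts.activated:
        return LifecycleState.COMPLETED
    if facts.cancelled:
        return LifecycleState.CANCELLED
    # Active runs take precedence over older blockers (assumed ordering).
    if facts.verification_active:
        return LifecycleState.VERIFYING
    if facts.bootstrap_active:
        return LifecycleState.BOOTSTRAPPING
    if facts.has_blocker:
        return LifecycleState.ACTION_REQUIRED
    # Bootstrap only gates readiness when it was selected.
    bootstrap_done = (not facts.bootstrap_selected) or facts.bootstrap_complete
    if facts.verification_ok and bootstrap_done:
        return LifecycleState.READY_FOR_ACTIVATION
    return LifecycleState.DRAFT
```

Because the function depends only on its input, every mutation path can call it after writing and persist the result, rather than setting `lifecycle_state` ad hoc.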
Verify Access Checkpoint
- Step 3 polls while a relevant verification run is queued or running.
- Step 3 stops polling when the relevant verification run reaches terminal state.
- Step 3 renders backend-derived current status.
- Step 3 does not require manual refresh to surface terminal run outcome while the page remains active.
- Step 3 remains compatible with Spec 139's required-permissions assist.
Bootstrap Checkpoint
- Step 4 polls while selected relevant bootstrap runs are queued or running.
- Step 4 stops polling when all relevant selected bootstrap runs are terminal.
- Step 4 shows per-selected-operation current status from backend truth.
- If no bootstrap was selected, the draft can still become `ready_for_activation` after successful verification and no blockers.
- If bootstrap was selected and not yet complete, the draft must not become `ready_for_activation`.
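The polling rules for Step 3 and Step 4 reduce to the same predicate: poll while at least one relevant run is active, and stop otherwise. A minimal sketch, assuming run statuses are available as plain strings; `queued` and `running` come from this spec, while the terminal status names used in the usage note are hypothetical.

```python
from typing import Iterable

# Statuses that count as "still active" per the checkpoint rules above.
ACTIVE_STATUSES = frozenset({"queued", "running"})


def should_poll(relevant_run_statuses: Iterable[str]) -> bool:
    """Return True while at least one relevant run is queued or running.

    Step 3 would pass the status of the single relevant verification run;
    Step 4 would pass the statuses of the selected bootstrap runs only.
    An empty input (no relevant run) means polling stays off.
    """
    return any(status in ACTIVE_STATUSES for status in relevant_run_statuses)
```

For example, `should_poll(["succeeded", "queued"])` keeps Step 4 polling because one selected run is still pending, while `should_poll(["succeeded", "failed"])` stops it.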
Concurrency
- The onboarding draft has a `version` column.
- Every relevant draft mutation checks expected version before commit.
- Stale mutations are rejected atomically.
- Conflict rejection shows a clear user-facing error.
- No silent last-write-wins remains for covered draft mutations.
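One common way to make the version check atomic is to fold it into the write itself, so the database decides whether the mutation applies. The sketch below demonstrates the idea with Python's built-in `sqlite3`; the table name `onboarding_drafts`, the `provider` column, and `StaleDraftError` are hypothetical and do not describe the project's actual schema or services.

```python
import sqlite3


class StaleDraftError(Exception):
    """Raised when the expected version no longer matches the stored one."""


def update_provider(conn, draft_id, expected_version, provider):
    # The version check and the write share one UPDATE statement, so a stale
    # writer matches zero rows instead of overwriting newer state.
    cursor = conn.execute(
        "UPDATE onboarding_drafts "
        "SET provider = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (provider, draft_id, expected_version),
    )
    conn.commit()
    if cursor.rowcount == 0:
        # Zero rows also covers a missing draft; a real service would
        # distinguish the two cases before reporting a conflict.
        raise StaleDraftError(f"draft {draft_id} changed since version {expected_version}")
    return expected_version + 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE onboarding_drafts ("
    "id INTEGER PRIMARY KEY, provider TEXT, version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute("INSERT INTO onboarding_drafts (id, provider) VALUES (1, 'none')")
conn.commit()

# Two tabs both loaded the draft at version 1.
new_version = update_provider(conn, 1, expected_version=1, provider="tab-a")
try:
    update_provider(conn, 1, expected_version=1, provider="tab-b")
    conflict = False
except StaleDraftError:
    conflict = True  # the UI would surface this as a visible conflict error
```

The first tab's write succeeds and bumps the version; the second tab's write is rejected without touching the row, which is the "no silent overwrite" behavior the criteria above require.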
Activation
- Activation re-evaluates backend truth at commit time.
- Activation is rejected if the draft is no longer actually ready.
- Completed drafts remain non-editable afterward.
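The commit-time gate can be illustrated as follows: the stored `ready_for_activation` value is treated as a hint, and the decision is made from freshly loaded truth. This is a sketch under assumptions; `load_fresh_truth`, the dictionary keys, and the reason strings are hypothetical names chosen for the example.

```python
class ActivationRejected(Exception):
    """Raised when fresh backend truth says the draft is not ready."""


def activate(draft, load_fresh_truth):
    """Activate a draft only if freshly loaded backend truth allows it.

    `draft` is a plain dict standing in for the draft record, and
    `load_fresh_truth` is a callable that re-reads operation state at
    commit time instead of trusting what the page last rendered.
    """
    if draft["lifecycle_state"] in ("completed", "cancelled"):
        raise ActivationRejected("terminal drafts are not editable")

    truth = load_fresh_truth()
    problems = []
    if not truth["verification_current"]:
        problems.append("verification_not_current")
    if truth["active_run_count"] > 0:
        problems.append("runs_still_active")
    if truth["bootstrap_selected"] and not truth["bootstrap_complete"]:
        problems.append("bootstrap_incomplete")
    if truth["blocking_reason_code"] is not None:
        problems.append(truth["blocking_reason_code"])
    if problems:
        raise ActivationRejected(", ".join(problems))

    draft["lifecycle_state"] = "completed"
    return draft
```

A draft that looked ready when the page rendered is still rejected if a selected bootstrap run was restarted in the meantime, and the draft record is left unchanged in that case.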
Terminal Drafts
- Cancelled drafts remain non-editable afterward.
- Completed and cancelled drafts remain historical workflow records.
Testing Requirements
Core Regression Matrix
- New draft enters `draft`.
- Identify step commit keeps or sets correct checkpoint metadata.
- Provider selection keeps or sets correct checkpoint metadata.
- Verification start moves draft to `verifying`.
- Verification success without bootstrap selected moves draft to `ready_for_activation`.
- Verification success with bootstrap selected and started moves draft to `bootstrapping`.
- Verification blocked moves draft to `action_required`.
- Verification failed moves draft to `action_required`.
- Verification result stale after provider change moves draft to `action_required` or `draft` according to the defined rule.
- Bootstrap success after selected runs complete moves draft to `ready_for_activation`.
- Bootstrap failure moves draft to `action_required`.
- Cancel moves draft to `cancelled`.
- Activation moves draft to `completed`.
- Completed draft cannot be edited.
- Cancelled draft cannot be edited.
- Verify Access polling stops correctly on terminal run.
- Bootstrap polling stops correctly on terminal runs.
- Step 3 remains usable with Spec 139 assist present.
- Activation remains blocked when selected bootstrap is still active.
- Activation remains blocked when a blocker reason persists.
- Positive authorization coverage proves allowed actors can continue to use the lifecycle and checkpoint flow.
- Negative authorization coverage proves non-members remain `404` and in-scope users missing capability remain `403`.
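The negative authorization expectation can be written down as a small decision function that tests assert against. This is only an illustration of the ordering; the function name and boolean inputs are hypothetical, and the existing authorization model remains the source of truth.

```python
def expected_status(is_workspace_member: bool, has_capability: bool) -> int:
    """Return the HTTP status the onboarding routes are expected to produce."""
    # Membership is checked first so non-members cannot learn that a draft
    # exists; only in-scope users ever see a capability failure.
    if not is_workspace_member:
        return 404
    if not has_capability:
        return 403
    return 200
```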
Concurrency Matrix
- Same user, two tabs: stale provider-change mutation rejected.
- Same user, two tabs: stale verification start rejected.
- Same user, two tabs: stale bootstrap start rejected.
- Same user, two tabs: stale activation rejected.
- Two operators, same workspace: stale mutation rejected.
- Conflict path shows visible error and no false success.
- Completed or cancelled transition in one tab invalidates edit mutation in another tab.
Browser-Level Validation
- Verify Access updates while the run completes in the background.
- Bootstrap updates while selected runs complete in the background.
- No manual refresh is required for visible active-step status updates.
- Conflict notification is visible after stale submit.
- Wizard continuity is preserved after a conflict.
- Step 3 permissions assist from Spec 139 remains usable under polling-compatible rendering.
Testing Plan
- Feature tests: lifecycle transitions, checkpoint metadata updates, activation gating, version mismatch rejection, immutable terminal drafts, and RBAC allow or deny coverage.
- Browser tests: polling behavior for verification, polling behavior for bootstrap, stale-tab conflict UX, active-session transition to ready-for-activation, and compatibility with Spec 139 assist plus new-tab continuity.
- Unit or service tests: lifecycle recalculation, reason-code mapping, readiness evaluation, and versioned mutation guard behavior.
Implementation Notes and Sequencing
Phase 1 - Data Model Foundation
- Add schema columns for `version`, `lifecycle_state`, `current_checkpoint`, `last_completed_checkpoint`, `reason_code`, and `blocking_reason_code`.
- Add controlled casts, enums, or equivalent value semantics.
Phase 2 - Lifecycle Service Foundation
- Introduce a centralized lifecycle recalculation mechanism.
- Stop scattering lifecycle truth ad hoc through page logic.
- Define readiness evaluation in one canonical place.
Phase 3 - Versioned Mutation Guard
- Add expected-version handling for all relevant mutation paths.
- Reject stale writes atomically.
- Add conflict notification plumbing.
Phase 4 - Verify Checkpoint Hardening
- Add conditional polling for Step 3.
- Render backend-truth status while polling.
- Ensure terminal resolution updates lifecycle correctly.
- Preserve Spec 139 compatibility.
Phase 5 - Bootstrap Checkpoint Hardening
- Add conditional polling for Step 4.
- Render per-selected-run backend truth.
- Resolve lifecycle correctly on terminal outcomes.
Phase 6 - Activation Hardening
- Ensure the final activation gate reads canonical readiness plus backend truth.
- Confirm terminal immutability behavior.
Phase 7 - Regression Hardening
- Add focused feature, browser, and concurrency coverage.
Definition of Done
- Migration adds the required lifecycle and version fields.
- Controlled model values or casts are in place.
- A centralized lifecycle recalculation mechanism exists.
- Verify Access behaves as a polling operation-backed checkpoint.
- Bootstrap behaves as a polling optional operation-backed checkpoint.
- `ready_for_activation` is only reachable under explicitly defined conditions.
- All relevant draft mutations are protected by optimistic locking.
- Stale mutations are rejected with clear user feedback.
- Completed and cancelled drafts remain immutable.
- Focused feature and browser coverage proves lifecycle transitions, checkpoint behavior, and concurrency protection.
- Spec 139 compatibility is preserved.
- No new onboarding routes were introduced.
- No lease or takeover model was introduced.
Final Architectural Notes
This spec is intentionally pragmatic. It does not reinvent onboarding as a separate application, overbuild collaborative editing, or replace the existing wizard shell. It gives the current onboarding flow the smallest meaningful hardening slice it needs now: a canonical lifecycle model, truthful operation-backed checkpoints, and MVP-safe concurrency protection.