ahmido d2f2c55ead feat: add onboarding lifecycle checkpoints and locking (#169)

## Summary
- add canonical onboarding lifecycle and checkpoint fields plus optimistic locking versioning for managed tenant onboarding drafts
- introduce centralized onboarding lifecycle and mutation services and route wizard mutations through version-checked writes
- convert Verify Access and Bootstrap into live checkpoint-driven wizard states with conditional polling and updated browser/feature/unit coverage
- add Spec Kit artifacts for feature 140, including spec, plan, tasks, research, data model, quickstart, checklist, and logical contract

## Validation
- branch was committed and pushed cleanly
- focused tests and formatting were updated during implementation work
- full validation was not re-run as part of this final git/PR step

## Notes
- base branch: `dev`
- feature branch: `140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp`
- outstanding follow-up items, if any, remain tracked in `specs/140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp/tasks.md`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #169

2026-03-14 11:02:29 +00:00

Research: Onboarding Lifecycle, Operation Checkpoints & Concurrency MVP

Decision 1: Extend the existing onboarding session table instead of introducing a new workflow table

Decision: Add `version`, `lifecycle_state`, `current_checkpoint`, `last_completed_checkpoint`, `reason_code`, and `blocking_reason_code` directly to `managed_tenant_onboarding_sessions`.
Rationale: The current onboarding draft is already persisted in TenantOnboardingSession, and the constitution explicitly allows this workflow record to remain workspace-scoped and coordination-oriented. First-class lifecycle fields make the workflow queryable without duplicating draft identity or adding a second coordination backend.
Alternatives considered:
- Create a separate onboarding lifecycle table: rejected because it would split draft truth across two workflow records and add sync complexity without improving authorization boundaries.
- Keep lifecycle fully derived from JSON and runs: rejected because the current ambiguity is the problem this feature is meant to solve.

Decision 2: Replace ad hoc stage inference with a centralized onboarding lifecycle recalculation service

Decision: Introduce a dedicated onboarding lifecycle service or resolver that owns canonical lifecycle transitions, checkpoint precision, and readiness evaluation, while OnboardingDraftStageResolver becomes a presentation-oriented wrapper or consumer.
Rationale: Current wizard progression is inferred from current_step, JSON state, and run inspection in page logic plus OnboardingDraftStageResolver. A shared lifecycle service keeps one deterministic source of workflow truth and prevents drift between rendering, activation gating, and background refresh.
Alternatives considered:
- Keep logic in the page class: rejected because lifecycle sprawl is already visible and would worsen with checkpoint polling and concurrency rules.
- Use model accessors only: rejected because the feature needs explicit write-time recalculation and conflict-aware mutation orchestration, not only read-time convenience.
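
As a rough, language-agnostic illustration of what a single deterministic recalculation could look like, the sketch below derives the lifecycle fields from the latest run status of each checkpoint. It is written in Python for brevity and is not the repository's implementation: the state names (`draft`, `in_progress`, `action_required`, `ready_for_activation`), the run status strings, the `*_failed` blocker codes, and the `recalculate_lifecycle` function are all hypothetical, and the checkpoint order assumes Verify Access precedes Bootstrap.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed checkpoint order: Verify Access first, then Bootstrap.
CHECKPOINTS = ("verify_access", "bootstrap")


@dataclass(frozen=True)
class LifecycleSnapshot:
    lifecycle_state: str
    current_checkpoint: Optional[str]
    last_completed_checkpoint: Optional[str]
    blocking_reason_code: Optional[str]


def recalculate_lifecycle(run_status: Dict[str, str]) -> LifecycleSnapshot:
    """Derive lifecycle fields from the latest run status per checkpoint.

    All state names and status strings here are illustrative placeholders.
    """
    last_completed = None
    for checkpoint in CHECKPOINTS:
        status = run_status.get(checkpoint)
        if status == "succeeded":
            last_completed = checkpoint
            continue
        if status == "failed":
            # A failed checkpoint blocks progress and records why.
            return LifecycleSnapshot(
                "action_required", checkpoint, last_completed, f"{checkpoint}_failed"
            )
        if status == "running":
            return LifecycleSnapshot("in_progress", checkpoint, last_completed, None)
        # No run yet for this checkpoint: the draft is waiting on it.
        return LifecycleSnapshot("draft", checkpoint, last_completed, None)
    # Every checkpoint succeeded.
    return LifecycleSnapshot("ready_for_activation", None, last_completed, None)
```

Because rendering, activation gating, and background refresh would all call the same function with the same inputs, they cannot disagree about the current checkpoint, which is the drift the rationale above is concerned with.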

Decision 3: Reuse existing OperationRun infrastructure for checkpoint execution truth

Decision: Keep Verify Access on ProviderOperationStartGate plus provider.connection.check and keep Bootstrap on the existing operation dispatch paths and OperationRunService.
Rationale: The repo already enforces Ops-UX lifecycle ownership, dedupe, notifications, and monitoring through OperationRunService. Reusing those paths satisfies the constitution and avoids accidental creation of an onboarding-only operations stack.
Alternatives considered:
- Introduce onboarding-specific async job records: rejected because it would violate the spec's additive architecture constraint.
- Perform verification or bootstrap inline in the wizard: rejected because the constitution requires long-running and remote work to remain observable through OperationRun.

Decision 4: Add conditional Livewire polling to the existing wizard instead of introducing real-time push infrastructure

Decision: Use conditional wire:poll for Step 3 and Step 4 while relevant runs are active, following existing patterns used by operation viewers and progress widgets.
Rationale: The repo already uses conditional polling in BulkOperationProgress and the tenantless operation run viewer. Polling matches the feature scope, avoids WebSocket or SSE complexity, and satisfies the active-session trust requirement.
Alternatives considered:
- WebSockets or SSE: rejected because the spec explicitly excludes real-time push infrastructure.
- Continue manual refresh: rejected because the feature's trust goal requires the checkpoint to refresh while the page stays open.
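
The "conditional" part of this decision can be reduced to a small predicate: poll only on the two checkpoint steps, and only while a relevant run is still active. The Python sketch below is an illustration under assumptions rather than the wizard's code: the `poll_interval` helper, the `queued` and `running` status names, and the five-second interval are hypothetical. In a Livewire view, a value like this would typically decide whether a `wire:poll` attribute is rendered at all.

```python
from typing import Iterable, Optional

POLLED_STEPS = {3, 4}                    # Step 3 and Step 4 of the wizard
ACTIVE_STATUSES = {"queued", "running"}  # hypothetical run status names


def poll_interval(step: int, run_statuses: Iterable[str]) -> Optional[str]:
    """Return a polling interval while polling is warranted, else None."""
    if step not in POLLED_STEPS:
        return None
    if any(status in ACTIVE_STATUSES for status in run_statuses):
        return "5s"  # assumed interval, not taken from the spec
    return None
```

Returning `None` once every run has settled means an idle wizard page stops generating requests, while an open page with an active run keeps refreshing without manual reloads.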

Decision 5: Implement optimistic locking through version-checked draft mutations rather than leases or takeovers

Decision: Add a numeric version column and require all relevant onboarding mutations to submit and compare an expected version before commit.
Rationale: The spec explicitly chooses optimistic locking as the MVP concurrency mechanism. It prevents silent overwrites with minimal information architecture change and keeps the user on the existing wizard page when conflicts occur.
Alternatives considered:
- Claimed-by or lease model: rejected because it is explicitly out of scope.
- Last-write-wins with warnings: rejected because it does not prevent data loss.
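
A version-checked write is usually expressed as a conditional update: the row changes only if its stored version still equals the version the caller read. The sketch below shows the pattern in Python with an in-memory SQLite table as a stand-in for the real database. The `update_draft` function, the `StaleDraftError` exception, the step values, and the column types are assumptions for illustration; only the table name and the `version` and `current_step` column names come from this document.

```python
import sqlite3


class StaleDraftError(Exception):
    """Raised when the draft changed since the caller last read it."""


def update_draft(conn, draft_id, expected_version, current_step):
    # The version predicate in WHERE makes compare-and-write a single
    # atomic statement: no row matches if another writer got there first.
    cur = conn.execute(
        "UPDATE managed_tenant_onboarding_sessions "
        "SET current_step = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (current_step, draft_id, expected_version),
    )
    conn.commit()
    if cur.rowcount == 0:
        # Also reached when the row does not exist; treated as a conflict here.
        raise StaleDraftError(
            f"draft {draft_id} is no longer at version {expected_version}"
        )
    return expected_version + 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE managed_tenant_onboarding_sessions ("
    "id INTEGER PRIMARY KEY, current_step TEXT, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
conn.execute(
    "INSERT INTO managed_tenant_onboarding_sessions (id, current_step) "
    "VALUES (1, 'connect')"
)

# First writer read version 1 and succeeds, moving the row to version 2.
new_version = update_draft(conn, 1, expected_version=1, current_step="verify_access")

# A second tab still holding version 1 is rejected instead of overwriting.
try:
    update_draft(conn, 1, expected_version=1, current_step="bootstrap")
    conflict = False
except StaleDraftError:
    conflict = True
```

The rejected writer loses nothing silently: the caller can catch the conflict, reload the draft at its new version, and let the user decide how to proceed on the same page.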

Decision 6: Preserve Spec 139 as a Step 3 assist layered on top of canonical checkpoint state

Decision: Treat the required-permissions assist as an additive Step 3 recovery surface that is driven by current verification lifecycle and report truth rather than as a separate owner of checkpoint semantics.
Rationale: Spec 139 already established in-step recovery and new-tab deep-dive continuity. Feature 140 needs broader lifecycle semantics, but must not replace that recovery surface or force same-tab navigation.
Alternatives considered:
- Fold the assist into a new full verification dashboard: rejected because it would redesign the flow and conflict with Spec 139's additive scope.
- Ignore the assist during lifecycle planning: rejected because Step 3 rendering must remain compatible with it under polling.

Decision 7: Keep top-level lifecycle values controlled and machine-readable

Decision: Represent lifecycle state, checkpoints, and blocker precision through enums or equivalent controlled values in the onboarding support layer.
Rationale: The current OnboardingDraftStatus enum only models draft, completed, and cancelled. Feature 140 needs additional workflow truth that can be queried, rendered consistently, and tested without ad hoc strings spread across page code.
Alternatives considered:
- Raw string literals in the page and model: rejected because it undermines determinism and testability.
- Nested JSON-only codes: rejected because the feature explicitly needs top-level queryable lifecycle state.
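
To make "controlled and machine-readable" concrete, the sketch below contrasts the existing three-value draft status with a separate lifecycle enum and a strict parser. It is written in Python purely as an illustration: `OnboardingDraftStatus` and its three values come from the rationale above, while `OnboardingLifecycleState`, its members, and `parse_lifecycle_state` are hypothetical names chosen for this example.

```python
from enum import Enum


class OnboardingDraftStatus(str, Enum):
    # The three values the current enum is described as modelling.
    DRAFT = "draft"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class OnboardingLifecycleState(str, Enum):
    # Hypothetical lifecycle values; the spec defines the real set.
    DRAFT = "draft"
    IN_PROGRESS = "in_progress"
    ACTION_REQUIRED = "action_required"
    READY_FOR_ACTIVATION = "ready_for_activation"


def parse_lifecycle_state(raw: str) -> OnboardingLifecycleState:
    """Convert a stored string into the enum, rejecting unknown values."""
    try:
        return OnboardingLifecycleState(raw)
    except ValueError:
        raise ValueError(f"unknown lifecycle_state: {raw!r}") from None
```

Routing every read and write through the enum means a typo such as `"action-required"` fails loudly at the boundary instead of producing a state that no query or view recognizes.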
