# Feature Specification: Managed Tenant Onboarding Wizard UI (v2)

**Feature Branch**: `069-tenant-onboarding-wizard-v2`  
**Created**: 2026-01-31  
**Status**: Draft  
**Input**: User description: "Spec 069 v2 — Managed Tenant Onboarding Wizard UI"

## Clarifications

### Session 2026-01-31

- Q: What is the v2 concurrency rule for starting onboarding tasks? → A: One active run per (tenant_id, task_type).
- Q: For v2, is Evidence the source-of-truth for step/task statuses, or are cached fields authoritative? → A: Evidence is the source-of-truth; cached fields (if any) are derived.
- Q: Which Provider Connection auth_mode(s) are in scope for v2? → A: `client_secret` only.
- Q: Where should the Task Board become visible in v2? → A: Starting Step 4.
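
The concurrency rule clarified above can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the repository's `OperationRunService`; `RunRegistry`, `try_start`, and `finish` are hypothetical names, and the task type strings are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RunRegistry:
    """In-memory stand-in for active-run tracking (hypothetical, not the real service)."""

    # Keys of runs that are currently active: (tenant_id, task_type).
    active: set = field(default_factory=set)

    def try_start(self, tenant_id: int, task_type: str) -> bool:
        """Mark the run active and return True, or return False if one is already active."""
        key = (tenant_id, task_type)
        if key in self.active:
            return False
        self.active.add(key)
        return True

    def finish(self, tenant_id: int, task_type: str) -> None:
        """Mark the run as no longer active so a rerun can start."""
        self.active.discard((tenant_id, task_type))


registry = RunRegistry()
first = registry.try_start(1, "verify_permissions")       # allowed
second = registry.try_start(1, "verify_permissions")      # blocked: same (tenant_id, task_type)
other = registry.try_start(1, "connection_diagnostics")   # allowed: different task_type
```

A real implementation would need the check-and-insert to be atomic across workers (for example via a database constraint or lock); that part is outside the scope of this sketch.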

## Terminology & Routing

This repository uses Filament multi-tenancy. For this spec:

- **Managed Tenant** = `App\Models\Tenant` (current Filament tenant scope).
- **Tenant-scoped / tenant membership** = tenant plane routing under `/admin/t/{tenant}`.
- **`tenant_id`** (in concurrency rules) = the internal tenant primary key (`tenants.id`) used by `OperationRunService`.

There is no separate “Workspace” model in the current codebase; any prior “workspace” wording refers to the current tenant scope.

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Onboard a managed tenant with a provider connection (Priority: P1)

As a tenant Owner, I want to onboard a Managed Tenant using a dedicated Provider Connection, so that credentials are managed separately from tenant metadata and onboarding can be resumed and operated safely.

**Why this priority**: This is the primary “front door” into tenant onboarding and must support enterprise workflows (separation of concerns, repeatable checks, and recovery) from day one.

**Independent Test**: Can be fully tested by creating an onboarding session, linking/choosing a Provider Connection, and completing at least one verification task that produces stored evidence and updates visible status.

**Acceptance Scenarios**:

1. **Given** a tenant member with Owner permissions and no existing onboarding session, **When** they start onboarding for a new Managed Tenant, **Then** the system creates a session and shows a guided stepper with a preview of the onboarding plan (tasks + prerequisites) and role expectations.
2. **Given** an onboarding session in progress, **When** the user resumes it, **Then** the system shows step statuses based on stored evidence and session state (not transient run output).
3. **Given** a Managed Tenant without a linked Provider Connection, **When** the Owner selects an existing Provider Connection or creates a new one and assigns it, **Then** the tenant becomes linked to that connection and the UI never reveals secrets.
4. **Given** the user opens the consent/permissions step, **When** they run “verify permissions”, **Then** a run record is created, evidence is captured, and the step/task status updates to reflect the latest evidence.

---

### User Story 2 - Operate and recover using a task board (Priority: P2)

As an Owner or Operator, I want a persistent onboarding task board with task history, reruns, and safe fix guidance, so that onboarding can be completed iteratively without resetting or manual database work.

**Why this priority**: Enterprise onboarding rarely succeeds in a single linear pass; users need repeatable tasks, clear “why failing?”, and safe recovery actions.

**Independent Test**: Can be fully tested by running at least two onboarding tasks (one failing, one succeeding), verifying task state changes, and observing fix hints derived from standardized reasons.

**Acceptance Scenarios**:

1. **Given** onboarding has reached Step 4 (Consent & Permissions) or later, **When** the user opens the onboarding task board, **Then** they can see each task’s latest status (OK/Warn/Fail/Unknown), prerequisites, and last run metadata.
2. **Given** a task is blocked by prerequisites, **When** the user tries to run it, **Then** the UI disables the action and explains why (without leaking sensitive details).
3. **Given** a task fails, **When** the user views the task’s details, **Then** they see a sanitized reason and “fix hints” with recommended next actions.
4. **Given** a previously failed task is rerun after the user fixes the issue, **When** the rerun completes, **Then** the new evidence supersedes the prior status and the history remains viewable.
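
The prerequisite check in scenario 2 might look like the following sketch. The task names, the `blocked_by` helper, and the rule that only an `OK` status satisfies a prerequisite are assumptions for illustration; the spec does not state whether `Warn` should also unblock a task.

```python
def blocked_by(task: str, prerequisites: dict, statuses: dict) -> list:
    """Return the prerequisites of `task` whose latest status is not OK.

    A prerequisite with no recorded status is treated as "Unknown" and
    therefore still blocks the task (assumed behavior).
    """
    return [
        prereq
        for prereq in prerequisites.get(task, [])
        if statuses.get(prereq, "Unknown") != "OK"
    ]


# Hypothetical plan: the initial sync depends on two verification tasks.
prerequisites = {"initial_sync": ["verify_permissions", "connection_diagnostics"]}
statuses = {"verify_permissions": "OK", "connection_diagnostics": "Fail"}

blockers = blocked_by("initial_sync", prerequisites, statuses)
can_run = not blockers  # the UI would disable the action and list the blockers
```

Returning the list of blockers rather than a bare boolean lets the UI explain why an action is disabled using task names only, without exposing provider details.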

---

### User Story 3 - Collaborate safely across multiple users (Priority: P3)

As an Owner, I want to hand off onboarding work to another tenant member and prevent conflicting edits, so that multiple users can collaborate without corrupting session state.

**Why this priority**: Collaboration and support escalation are common in enterprise environments; session locking and handoff reduce risk and clarify responsibility.

**Independent Test**: Can be fully tested by having two users access the same onboarding session and verifying lock visibility, read-only behavior, and the takeover/handoff behavior under capability constraints.

**Acceptance Scenarios**:

1. **Given** an onboarding session is active, **When** User A opens it, **Then** the session is locked for a short time window and User A is shown as the active editor.
2. **Given** the session is locked by User A, **When** User B opens the same session, **Then** User B sees a “locked by User A” banner and cannot perform mutating actions unless they have takeover permission.
3. **Given** takeover is permitted, **When** User B takes over the session, **Then** the lock transfers, the event is auditable, and User A sees the session become read-only.
4. **Given** handoff is permitted, **When** the Owner hands off the session to another user, **Then** the assignee is clearly shown in the UI and the handoff is auditable.
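
The lock and takeover scenarios above can be sketched as follows. `SessionLock`, `acquire`, and the 5-minute `LOCK_TTL` are assumptions; the spec only says the lock lasts a "short time window", and audit logging is omitted here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

LOCK_TTL = timedelta(minutes=5)  # assumed value; the spec does not fix a duration


@dataclass
class SessionLock:
    holder: str | None = None
    expires_at: datetime | None = None

    def is_held(self, now: datetime) -> bool:
        """True while a holder exists and the lock has not expired."""
        return self.holder is not None and self.expires_at is not None and now < self.expires_at

    def acquire(self, user: str, now: datetime, can_takeover: bool = False) -> bool:
        """Take or refresh the lock; other users need takeover permission while it is held."""
        if self.is_held(now) and self.holder != user and not can_takeover:
            return False
        self.holder = user
        self.expires_at = now + LOCK_TTL
        return True


t0 = datetime(2026, 1, 31, 12, 0)
lock = SessionLock()
a_opens = lock.acquire("user_a", t0)                           # User A becomes the active editor
b_opens = lock.acquire("user_b", t0 + timedelta(minutes=1))    # denied: User B is read-only
b_takes_over = lock.acquire("user_b", t0 + timedelta(minutes=2), can_takeover=True)
```

Passing `now` explicitly keeps the expiry logic deterministic and easy to test; a production version would also record an auditable event on every takeover or handoff.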

---

### User Story 4 - Review onboarding evidence and history (Priority: P4)

As a Readonly/Auditor user, I want to view onboarding evidence and run history without being able to mutate anything, so that compliance and troubleshooting are possible without elevated access.

**Why this priority**: Evidence-driven onboarding is a core v2 goal; audit visibility builds trust and reduces support load.

**Independent Test**: Can be fully tested by opening a Managed Tenant’s onboarding view as a read-only user and verifying evidence visibility with all actions disabled.

**Acceptance Scenarios**:

1. **Given** a Managed Tenant has onboarding evidence, **When** a read-only user views onboarding, **Then** they can see the latest evidence per evidence type and the run history metadata.
2. **Given** the read-only user does not have operation/run permissions, **When** they view tasks, **Then** they cannot start runs and all mutation actions are disabled.

### Edge Cases

- Multiple onboarding sessions exist for the same Managed Tenant: the user is guided to resume the active session or open the task board without creating conflicting sessions.
- Evidence is missing (first-time onboarding) or stale: statuses render as Unknown and the UI encourages running the relevant tasks.
- A session lock expires while a user is mid-flow: the UI refreshes state and prevents silent overwrites.
- A user loses membership in the tenant while viewing onboarding: access becomes deny-as-not-found (404 semantics).
- Permissions/consent is partially satisfied: the UI shows a degraded state (Warn/Fail) with fix guidance rather than a generic error.
- Provider connection becomes disabled/invalid after initial success: tasks become runnable again and status changes reflect the latest evidence.
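
The "missing or stale evidence renders as Unknown" case can be sketched as a pure function over stored evidence. The `Evidence` shape, the `derive_status` helper, and the `max_age` staleness threshold are assumptions; the spec does not define when evidence counts as stale.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Evidence:
    evidence_type: str
    outcome: str  # "OK", "Warn" or "Fail"
    recorded_at: datetime


def derive_status(records: list[Evidence], evidence_type: str,
                  now: datetime, max_age: timedelta) -> str:
    """Derive a status from the latest evidence of one type; Unknown if missing or stale."""
    matching = [r for r in records if r.evidence_type == evidence_type]
    if not matching:
        return "Unknown"
    latest = max(matching, key=lambda r: r.recorded_at)
    if now - latest.recorded_at > max_age:
        return "Unknown"
    return latest.outcome


t0 = datetime(2026, 1, 31, 9, 0)
records = [
    Evidence("permission_snapshot", "Fail", t0),
    Evidence("permission_snapshot", "OK", t0 + timedelta(hours=1)),  # rerun after a fix
]
day = timedelta(hours=24)
fresh = derive_status(records, "permission_snapshot", t0 + timedelta(hours=2), day)
missing = derive_status(records, "consent_state", t0 + timedelta(hours=2), day)
stale = derive_status(records, "permission_snapshot", t0 + timedelta(days=3), day)
```

Because older records are kept and only the latest one drives the status, the history stays viewable while the newest evidence supersedes prior results.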

## Requirements *(mandatory)*

**Constitution alignment (required):** If this feature introduces any Microsoft Graph calls, any write/change behavior,
or any long-running/queued/scheduled work, the spec MUST describe contract registry updates, safety gates
(preview/confirmation/audit), tenant isolation, run observability (`OperationRun` type/identity/visibility), and tests.
If security-relevant DB-only actions intentionally skip `OperationRun`, the spec MUST describe `AuditLog` entries.

**Constitution alignment (RBAC-UX):** If this feature introduces or changes authorization behavior, the spec MUST:
- state which authorization plane(s) are involved (tenant `/admin/t/{tenant}` vs platform `/system`),
- ensure any cross-plane access is deny-as-not-found (404),
- explicitly define 404 vs 403 semantics:
  - non-member / not entitled to tenant scope → 404 (deny-as-not-found)
  - member but missing capability → 403
- describe how authorization is enforced server-side (Gates/Policies) for every mutation/operation-start/credential change,
- reference the canonical capability registry (no raw capability strings; no role-string checks in feature code),
- ensure global search is tenant-scoped and non-member-safe (no hints; inaccessible results treated as 404 semantics),
- ensure destructive-like actions require confirmation (`->requiresConfirmation()`),
- include at least one positive and one negative authorization test, and note any RBAC regression tests added/updated.
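
The 404 vs 403 semantics listed above reduce to a check order: membership first, capability second. The sketch below is illustrative Python, not the repository's Gates/Policies; the `Capability` enum stands in for the canonical capability registry and its single member is hypothetical.

```python
from enum import Enum


class Capability(Enum):
    # Hypothetical registry entry; real capabilities live in the canonical registry.
    START_ONBOARDING_RUN = "start_onboarding_run"


def http_status_for(is_member: bool, granted: frozenset, required: Capability) -> int:
    """Return the HTTP status implied by the tenant-plane authorization rules."""
    if not is_member:
        return 404  # deny-as-not-found: do not reveal that the tenant exists
    if required not in granted:
        return 403  # member, but missing the capability
    return 200


needed = Capability.START_ONBOARDING_RUN
non_member = http_status_for(False, frozenset({needed}), needed)
member_without = http_status_for(True, frozenset(), needed)
member_with = http_status_for(True, frozenset({needed}), needed)
```

Checking membership before capabilities matters: a non-member receives 404 even if a capability would otherwise match, so the response never hints that the tenant exists.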

**Constitution alignment (OPS-EX-AUTH-001):** OIDC/SAML login handshakes may perform synchronous outbound HTTP (e.g., token exchange)
on `/auth/*` endpoints without an `OperationRun`. This MUST NOT be used for Monitoring/Operations pages.

**Constitution alignment (BADGE-001):** If this feature changes status-like badges (status/outcome/severity/risk/availability/boolean),
the spec MUST describe how badge semantics stay centralized (no ad-hoc mappings) and which tests cover any new/changed values.

### Functional Requirements

- **FR-001 (Tenant-scoped onboarding entry)**: System MUST provide prominent entry points to start/resume onboarding for the current tenant scope (e.g., from the Tenant view, Provider Connection view where linked, and tenant creation flow).

- **FR-002 (Two UI modes)**: System MUST provide (a) a guided 5-step flow and (b) a persistent onboarding task board view for the same onboarding session.

  Task board placement (clarified): The task board MUST be visible starting Step 4 (Consent & Permissions) and remain available through Step 5.

- **FR-003 (Provider connection as first-class object)**: System MUST allow creating and managing a Provider Connection as a first-class object that can be linked to a Managed Tenant.

- **FR-004 (Secrets never disclosed)**: System MUST never display stored secrets or secret material in the UI, logs, notifications, or share links.

- **FR-005 (Role-aware guidance)**: System MUST clearly communicate which roles/capabilities are required for each onboarding action and which actions are restricted.

- **FR-006 (Evidence-driven statuses)**: Step statuses and task statuses MUST be derived from stored evidence and session state, not transient run output.

  Evidence authority (clarified): Evidence records are the source of truth for statuses. Any status-like fields on other entities are optional caches derived from evidence.

- **FR-007 (Evidence types)**: System MUST store and display evidence for, at minimum: consent state, permission snapshot, connection diagnostics, and inventory/sync coverage (where applicable).

- **FR-008 (Latest evidence by type)**: For each evidence type, the UI MUST primarily surface the latest evidence with access to historical evidence/runs.

- **FR-009 (Onboarding plan preview)**: System MUST show a preview of the onboarding plan (tasks, ordering, prerequisites) early in the flow, before actions are executed.

- **FR-010 (Duplicate handling)**: If a Managed Tenant already exists or is already onboarded, the system MUST prevent duplicate onboarding and offer safe navigation to the existing tenant and/or its task board.

- **FR-011 (Provider connection selection/creation)**: Users MUST be able to choose an existing Provider Connection or create a new one within onboarding (subject to authorization). “Create within onboarding” may be satisfied either via an inline create flow or by navigating to the Provider Connection create page and returning to onboarding.

- **FR-011a (Provider connection auth modes v2 scope)**: Provider Connections MUST support `client_secret` auth mode in v2. Other auth modes (e.g., certificate, workload identity) are out of scope for v2.

- **FR-012 (Consent and permissions tasks)**: System MUST provide actions to (a) guide users to complete consent steps and (b) verify permissions via a repeatable task that produces stored evidence.

- **FR-013 (Task board tasks and reruns)**: System MUST provide a task board containing repeatable tasks with prerequisites, last status, and ability to run/rerun where authorized.

- **FR-014 (Default task set)**: The onboarding plan MUST include, at minimum, tasks equivalent to: permission verification, connection diagnostics, and an initial synchronization task (where supported).

- **FR-015 (Standardized reasons and fix hints)**: System MUST present failure reasons using standardized reason codes and display curated fix hints, without exposing raw provider error payloads.

- **FR-016 (Run safety and observability)**: Starting any provider-affecting onboarding task MUST be observable via a run record and MUST produce auditable events identifying who started what and when.

- **FR-017 (Rate and concurrency protections)**: System MUST guard against excessive or conflicting task execution for the same Managed Tenant (e.g., repeated starts or overlapping runs) and provide clear UX feedback when blocked.

  Concurrency rule (clarified): The system MUST allow at most one active run per `(tenant_id, task_type)`. Attempts to start a second run while one is active MUST be blocked (and the UI SHOULD disable the action where possible).

- **FR-018 (Session locking)**: System MUST prevent conflicting edits via session locking with an expiry, showing clear UI banners to non-lock holders.

- **FR-019 (Takeover and handoff)**: System MUST support capability-gated session takeover and owner/manager handoff with clear UI signaling and auditability.

- **FR-020 (Resume everywhere)**: System MUST surface “Onboarding incomplete → Resume” prompts from the Managed Tenant view and the Provider Connection view (where linked).

- **FR-021 (Compatibility with v1 sessions)**: System MUST allow existing v1 onboarding sessions to be resumed in v2, including a safe migration path for legacy credential placement into a Provider Connection.

- **FR-022 (Strict legacy entry point behavior)**: Direct creation of a Managed Tenant outside the onboarding flow MUST be prevented; the onboarding flow is the canonical path.

- **FR-023 (Global search and navigation constraints)**: Onboarding sessions MUST NOT be discoverable via global search. Tenants MAY be discoverable within the current tenant scope without leaking non-member information.

- **FR-024 (RBAC UX semantics)**: Authorization MUST follow tenant membership semantics:
  - non-member / not entitled to tenant scope → deny-as-not-found behavior
  - member but missing capability → forbidden behavior

  The UI MUST reflect capability absence via disabled controls and tooltips, while server-side enforcement remains authoritative.

- **FR-025 (Centralized badge semantics)**: Status labels (OK/Warn/Fail/Unknown) for steps and tasks MUST follow a centralized mapping and be covered by tests for any new/changed values.
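
One way to keep badge semantics centralized, as FR-025 requires, is a single mapping that every step and task view consults. The sketch below is an assumption-laden illustration: the `Status` enum, the `badge_for` helper, and the color names are hypothetical and not taken from the codebase.

```python
from enum import Enum


class Status(Enum):
    OK = "OK"
    WARN = "Warn"
    FAIL = "Fail"
    UNKNOWN = "Unknown"


# Single source of truth for status presentation (color names are assumed).
BADGE_COLOR = {
    Status.OK: "success",
    Status.WARN: "warning",
    Status.FAIL: "danger",
    Status.UNKNOWN: "gray",
}


def badge_for(raw: str) -> tuple:
    """Map a raw status string to (label, color); unrecognized values fall back to Unknown."""
    try:
        status = Status(raw)
    except ValueError:
        status = Status.UNKNOWN
    return status.value, BADGE_COLOR[status]


fail_badge = badge_for("Fail")
odd_badge = badge_for("something-else")
```

A test asserting that the mapping covers every `Status` member is a cheap way to catch a new status value that was added without a badge.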

### Assumptions

- The onboarding flow operates within the tenant plane admin area (`/admin/t/{tenant}`), not the platform plane (`/system`).
- Provider Connections support the `client_secret` auth mode in v2 (see FR-011a); additional auth modes may be added in later versions.
- Evidence is considered the source of truth for onboarding status; derived “status fields” (if present) are treated as cache-only.

### Out of Scope (v2)

- Azure infrastructure deployment and app registration automation beyond guidance/validation.
- Provider Connection certificate/workload identity auth modes (future).

### Key Entities *(include if feature involves data)*

- **Managed Tenant**: The current Filament tenant scope (`App\Models\Tenant`); the primary subject of onboarding.
- **Provider Connection**: A reusable, managed connection configuration for a provider (credentials + metadata), linkable to a Managed Tenant.
- **Onboarding Session**: A resumable, collaborative record representing onboarding progress, locks, and completion state.
- **Onboarding Task**: A repeatable unit of onboarding work with prerequisites and outcomes.
- **Evidence Record**: A stored snapshot/result of a task/check that drives UI status.
- **Reason Code**: A standardized explanation category used to generate safe user-facing guidance.
- **Run Record (`OperationRun`)**: An observable record for user-triggered operations that affect providers or tenant state.
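
To make the Reason Code entity concrete, here is a minimal sketch of turning a failure into safe user-facing guidance. The reason codes, hint texts, and the `present_failure` helper are all hypothetical; the point is only that the raw provider payload never reaches the output.

```python
# Hypothetical curated hints keyed by standardized reason code.
FIX_HINTS = {
    "consent_missing": "Ask a directory administrator to grant admin consent, then rerun the check.",
    "permission_missing": "Add the missing permission to the app registration, then rerun the check.",
    "credentials_invalid": "Update the client secret on the Provider Connection, then rerun the check.",
}
FALLBACK_CODE = "unknown_error"
FALLBACK_HINT = "Rerun the task; if it keeps failing, contact an administrator."


def present_failure(reason_code: str, raw_provider_error: str) -> dict:
    """Build the user-facing failure view.

    `raw_provider_error` is accepted but deliberately never copied into the
    result, so provider payloads cannot leak into the UI.
    """
    if reason_code not in FIX_HINTS:
        return {"reason_code": FALLBACK_CODE, "fix_hint": FALLBACK_HINT}
    return {"reason_code": reason_code, "fix_hint": FIX_HINTS[reason_code]}


raw = '{"error": "invalid_client", "client_secret": "s3cr3t"}'
known = present_failure("consent_missing", raw)
unknown = present_failure("made_up_code", raw)
```

Falling back to a generic code for unrecognized reasons keeps the set of user-visible reasons closed, which makes the hints easier to review and test.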

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: Tenant Owners can complete an initial onboarding pass (through Step 4, where the task board becomes available) in under 10 minutes in a healthy environment.
- **SC-002**: At least 90% of onboarding recoveries after a failed permission/consent check are completed without manual admin intervention beyond following provided fix hints.
- **SC-003**: For onboarded tenants, 95% of task status views render with clear, actionable status (OK/Warn/Fail/Unknown) and at least one evidence record per completed task type.
- **SC-004**: Session collaboration reduces conflicting-edit incidents to near-zero (no silent overwrites; lock/takeover/handoff events are visible and auditable).
- **SC-005**: No secret material is exposed in user-visible surfaces during onboarding (UI, downloadable artifacts, share links, notifications).