TenantAtlas/specs/073-unified-managed-tenant-onboarding-wizard/spec.md

# Feature Specification: Managed Tenant Onboarding Wizard V1 (Enterprise)

**Feature Branch**: `073-unified-managed-tenant-onboarding-wizard`
**Created**: 2026-02-04
**Status**: Draft
**Input**: User description: "Spec 073 — Managed Tenant Onboarding Wizard V1 (Enterprise): single workspace-first wizard as source of truth, tenantless until activation; legacy entry points removed; strict 404/403 semantics; verification checklist with tenantless run page; optional bootstrap; enterprise-grade UX and regression tests."

## Clarifications

### Session 2026-02-04

- Q: Capability granularity for the wizard? → A: Per-step/per-action capabilities (least-privilege). Activation is owner-only; bootstrap actions are separately gated.
- Q: For members without capability, should actions be hidden or disabled? → A: Visible but disabled, with tooltip/explanation; server-side remains authoritative.
- Q: What is the tenantless “View run” URL pattern? → A: `/admin/operations/{run}` (no workspace in path), access-controlled by run.workspace membership (non-member → 404), no auto workspace switching.
- Q: What is the canonical onboarding entry point URL? → A: `/admin/onboarding` (sole entry point in V1; no aliases).

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Start onboarding from a single entry point (Priority: P1)

As a workspace member, I can open a single onboarding entry point and start (or resume) onboarding for a Managed Tenant in the currently selected workspace, so that tenant onboarding is consistent, workspace-first, and safe.

**Why this priority**: This is the foundation for all onboarding work and replaces fragmented legacy flows.

**Independent Test**: Can be fully tested by visiting `/admin/onboarding` with and without a selected workspace, completing Step 1, and verifying that a single tenant is created or resumed without duplicates.

**Acceptance Scenarios**:

1. **Given** no workspace is selected, **When** a user visits `/admin/onboarding`, **Then** they are redirected to choose a workspace.
2. **Given** a workspace is selected and has no active tenants, **When** a user visits the onboarding entry point, **Then** the onboarding wizard opens directly.
3. **Given** a workspace is selected and has at least one active tenant, **When** a user visits the onboarding entry point, **Then** the onboarding wizard is still reachable via an “Add managed tenant” call-to-action.
4. **Given** the user identifies a tenant using an Entra Tenant ID that already exists in the same workspace, **When** they submit Step 1 again, **Then** the wizard stays on Step 1 and shows a notification that the tenant already exists with a link to open it.
5. **Given** the user provides an Entra Tenant ID that exists in a different workspace, **When** they submit Step 1, **Then** the system responds with deny-as-not-found behavior and the UI shows a generic “Not found” notification (no details leaked).
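The Step 1 outcomes above, together with FR-010 and FR-011, can be sketched as a single resolution function. This is a minimal Python sketch under stated assumptions, not the implementation. The in-memory `registry` stands in for the global uniqueness constraint, and `identify_tenant`, `Step1Outcome`, and `ManagedTenant` are hypothetical names. Entra Tenant IDs are assumed to be compared case-insensitively. The sketch also assumes that a tenant still in `draft`/`onboarding` is resumed, while an `active`/`archived` one triggers the “already exists” notification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Step1Outcome(Enum):
    CREATED = "created"                # new draft tenant bound to this workspace
    RESUMED = "resumed"                # same workspace, onboarding still in progress
    ALREADY_EXISTS = "already_exists"  # same workspace, already activated/archived
    NOT_FOUND = "not_found"            # bound to another workspace: deny-as-not-found


@dataclass
class ManagedTenant:
    entra_tenant_id: str
    workspace_id: int
    status: str  # one of "draft", "onboarding", "active", "archived" (FR-012)


def identify_tenant(
    registry: Dict[str, ManagedTenant], workspace_id: int, entra_tenant_id: str
) -> Tuple[Step1Outcome, Optional[ManagedTenant]]:
    """Resolve a Step 1 submission without ever creating a duplicate tenant.

    `registry` maps a normalized Entra Tenant ID to its single ManagedTenant,
    modelling global uniqueness (FR-011). Hypothetical helper, not a real API.
    """
    # Assumption: Entra Tenant IDs (GUIDs) are compared case-insensitively.
    key = entra_tenant_id.strip().lower()
    existing = registry.get(key)
    if existing is None:
        tenant = ManagedTenant(key, workspace_id, "draft")
        registry[key] = tenant
        return Step1Outcome.CREATED, tenant
    if existing.workspace_id != workspace_id:
        # Generic "Not found": never reveal that the ID belongs to another workspace.
        return Step1Outcome.NOT_FOUND, None
    # Assumption: in-progress onboarding is resumed; finished tenants are reported.
    if existing.status in ("draft", "onboarding"):
        return Step1Outcome.RESUMED, existing
    return Step1Outcome.ALREADY_EXISTS, existing
```

Returning the outcome instead of raising keeps the UI mapping explicit. `NOT_FOUND` carries no tenant, so the notification has nothing to leak.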

---

### User Story 2 - Attach or create a provider connection safely (Priority: P2)

As a workspace member, I can choose an existing provider connection or create a new one during onboarding, so that the system has a valid technical connection without exposing secret material.

**Why this priority**: Without a valid connection, verification and activation cannot be completed safely.

**Independent Test**: Can be tested by selecting “Use existing connection” vs “Create new connection”, ensuring secrets are masked and never displayed again, and verifying that onboarding state stores no secrets.

**Acceptance Scenarios**:

1. **Given** the user chooses “Use existing connection”, **When** they select a connection and proceed, **Then** onboarding records the chosen connection and continues.
2. **Given** the user chooses “Create new connection”, **When** they input connection details, **Then** any secret input is masked and is not retrievable from the UI later.
3. **Given** the user starts Step 2 but leaves before finishing, **When** they resume onboarding later, **Then** only non-secret inputs are prefilled and secret material is never shown.
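One way to satisfy scenario 3 and FR-014 is to persist Step 2 drafts through an allowlist of non-secret fields. That way a secret cannot reach onboarding session state even if a new form field is added later. This is a minimal Python sketch. The field names and both helpers are hypothetical, and the secret itself is assumed to go to a separate secure store, which is not shown.

```python
from typing import Any, Dict, Mapping

# Hypothetical field names: only these keys may ever be written to session state.
NON_SECRET_CONNECTION_FIELDS = frozenset({"display_name", "client_id", "auth_method"})


def session_safe_draft(form_input: Mapping[str, Any]) -> Dict[str, Any]:
    """Return the subset of Step 2 input that may be persisted for resume.

    Allowlist rather than denylist: an unknown (possibly secret) field is dropped.
    """
    return {k: v for k, v in form_input.items() if k in NON_SECRET_CONNECTION_FIELDS}


def prefill_for_resume(session_draft: Mapping[str, Any]) -> Dict[str, Any]:
    """Build form prefill on resume; the secret input always starts empty."""
    prefill = dict(session_draft)
    prefill["client_secret"] = ""  # hypothetical secret field, never prefilled
    return prefill
```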

---

### User Story 3 - Verify access and review results without tenant-scoped context (Priority: P3)

As a workspace member, I can start a verification run, manually refresh its status, and view a stored checklist report (including a tenantless “View run” page), so that verification works even before the tenant is activated and without using tenant-scoped routes.

**Why this priority**: Verification is the safety gate that enables activation, and it must work in empty workspaces and pre-activation flows.

**Independent Test**: Can be tested by starting verification, asserting idempotent dedupe while a run is active, verifying the viewer renders using stored data only, and verifying the “View run” link is tenantless.

**Acceptance Scenarios**:

1. **Given** verification has not been started, **When** the user clicks “Start verification”, **Then** a new verification run is started and the UI shows that verification is in progress.
2. **Given** a verification run is active, **When** the user clicks “Start verification” again, **Then** the system dedupes the request and does not create a second active run.
3. **Given** a verification run is active, **When** the user clicks “Refresh”, **Then** the UI updates status using stored run state.
4. **Given** verification completes with any blocking failures, **When** the report is shown, **Then** the step status is “Blocked”.
5. **Given** verification completes with warnings but no blocking failures, **When** the report is shown, **Then** the step status is “Needs attention”.
6. **Given** verification completes with no warnings and no failures, **When** the report is shown, **Then** the step status is “Ready”.
7. **Given** the UI shows a “View run” link, **When** the user clicks it, **Then** it opens a tenantless operations URL (not a tenant-scoped URL).
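The status mapping in scenarios 4 to 6 and the dedupe rule in scenario 2 can be sketched as follows. This is a minimal Python sketch under stated assumptions. `CheckResult`, `step_status`, and `VerificationRuns` are hypothetical. The in-memory store stands in for stored operation runs. Runs are assumed to be keyed by workspace and Entra Tenant ID, because no tenant-scoped identity is required before activation.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count
from typing import Dict, Iterable, Iterator, Tuple


class CheckResult(Enum):
    PASS = "pass"
    WARN = "warn"
    BLOCKING_FAILURE = "blocking_failure"


def step_status(results: Iterable[CheckResult]) -> str:
    """Map a completed report to the Step 3 status (FR-016)."""
    results = list(results)
    if CheckResult.BLOCKING_FAILURE in results:
        return "Blocked"
    if CheckResult.WARN in results:
        return "Needs attention"
    return "Ready"


@dataclass
class VerificationRuns:
    """In-memory stand-in for stored verification runs (hypothetical)."""

    _ids: Iterator[int] = field(default_factory=lambda: count(1))
    _active: Dict[Tuple[int, str], int] = field(default_factory=dict)

    def start(self, workspace_id: int, entra_tenant_id: str) -> Tuple[int, bool]:
        """Return (run_id, created); an active run is reused, never duplicated."""
        key = (workspace_id, entra_tenant_id)
        if key in self._active:
            return self._active[key], False
        run_id = next(self._ids)
        self._active[key] = run_id
        # A real system would queue the external checks here as background work.
        return run_id, True

    def finish(self, workspace_id: int, entra_tenant_id: str) -> None:
        """Mark the run inactive so that a later start creates a new run."""
        self._active.pop((workspace_id, entra_tenant_id), None)
```

A production version would need the dedupe to hold under concurrent requests, for example through a database uniqueness constraint on active runs. This sketch is single-threaded.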

### Edge Cases

- Visiting legacy entry points returns “not found” behavior (no redirects).
- A non-member of the selected workspace receives deny-as-not-found behavior for the onboarding entry point.
- A workspace member without the required capability can see the page, but action controls are disabled and show a tooltip; server-side action attempts are denied with 403.
- Activation is owner-only: non-owners can see Step 5 (Complete / Activate) but cannot activate; the UI explains “Owner required”, and server-side attempts are denied with 403.
- Bootstrap actions are optional and gated independently per action; non-authorized users cannot start them.
- The wizard must not generate or require tenant-scoped links before activation.
- Manual refresh must not trigger external network calls; it only re-reads the stored status and report.
- Verification report content must never contain secrets/tokens, raw headers, or credential material.
- Completing onboarding while verification is Blocked is prevented unless a workspace owner explicitly overrides the block and records a reason (FR-020, FR-020a).

## Requirements *(mandatory)*

**Constitution alignment (required):** If this feature introduces any Microsoft Graph calls, any write/change behavior,
or any long-running/queued/scheduled work, the spec MUST describe contract registry updates, safety gates
(preview/confirmation/audit), tenant isolation, run observability (`OperationRun` type/identity/visibility), and tests.
If security-relevant DB-only actions intentionally skip `OperationRun`, the spec MUST describe `AuditLog` entries.

**Constitution alignment (RBAC-UX):** If this feature introduces or changes authorization behavior, the spec MUST:
- state which authorization plane(s) are involved (tenant `/admin/t/{tenant}` vs platform `/system`),
- ensure any cross-plane access is deny-as-not-found (404),
- explicitly define 404 vs 403 semantics:
  - non-member / not entitled to tenant scope → 404 (deny-as-not-found)
  - member but missing capability → 403
- describe how authorization is enforced server-side (Gates/Policies) for every mutation/operation-start/credential change,
- reference the canonical capability registry (no raw capability strings; no role-string checks in feature code),
- ensure global search is tenant-scoped and non-member-safe (no hints; inaccessible results treated as 404 semantics),
- ensure destructive-like actions require confirmation (`->requiresConfirmation()`),
- include at least one positive and one negative authorization test, and note any RBAC regression tests added/updated.

**Authorization plane(s) involved (filled for this feature):**
- **Tenant plane (Entra users)** only. This feature adds tenantless, workspace-scoped routes under `/admin/*` (`/admin/onboarding`, `/admin/operations/{run}`) that must still enforce tenant-plane membership and capability rules.
- **Platform plane (`/system`) is out of scope**. No cross-plane navigation is introduced; deny-as-not-found (404) semantics remain the default for non-members / not entitled.
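The 404/403 split used throughout this spec reduces to a small decision function, plus a membership-only check for the tenantless run page. This is a minimal Python sketch. `Capability` and its members are hypothetical stand-ins for entries in the canonical capability registry. A real implementation would live in Gates/Policies rather than in free functions. `view_run_status` takes the set of workspaces the user belongs to rather than the selected workspace, which reflects that `/admin/operations/{run}` needs no pre-selected context and never switches workspaces.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet, FrozenSet, Optional


class Capability(Enum):
    # Hypothetical registry entries; real names come from the canonical registry.
    TENANT_IDENTIFY = "onboarding.tenant.identify"
    CONNECTION_MANAGE = "onboarding.connection.manage"
    VERIFICATION_START = "onboarding.verification.start"
    TENANT_ACTIVATE = "onboarding.tenant.activate"


@dataclass(frozen=True)
class Membership:
    workspace_id: int
    capabilities: FrozenSet[Capability]


def action_status(membership: Optional[Membership], capability: Capability) -> int:
    """HTTP status for a server-side action attempt in the selected workspace."""
    if membership is None:
        return 404  # non-member: deny-as-not-found, no hints
    if capability not in membership.capabilities:
        return 403  # member, but missing the capability
    return 200


def view_run_status(member_of: AbstractSet[int], run_workspace_id: int) -> int:
    """Access check for /admin/operations/{run}: membership only (FR-006c, FR-017a)."""
    return 200 if run_workspace_id in member_of else 404
```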

**Constitution alignment (OPS-EX-AUTH-001):** OIDC/SAML login handshakes may perform synchronous outbound HTTP (e.g., token exchange)
on `/auth/*` endpoints without an `OperationRun`. This MUST NOT be used for Monitoring/Operations pages.

**Constitution alignment (BADGE-001):** If this feature changes status-like badges (status/outcome/severity/risk/availability/boolean),
the spec MUST describe how badge semantics stay centralized (no ad-hoc mappings) and which tests cover any new/changed values.

### Functional Requirements

- **FR-001 (Single onboarding entry point)**: The system MUST provide a single onboarding entry point at `/admin/onboarding` that is the source of truth for onboarding.
- **FR-002 (Workspace required)**: If no workspace is selected, the onboarding entry point MUST redirect the user to a workspace chooser.
- **FR-003 (Workspace landing behavior)**: With a selected workspace, the system MUST:
  - open the wizard directly when the workspace has zero active tenants, and
  - keep the wizard reachable via an “Add managed tenant” call-to-action when the workspace has one or more active tenants.
- **FR-004 (Remove legacy entry points)**: The following legacy entry points MUST NOT exist and MUST return “not found” behavior (no redirects):
  - `/admin/new`
  - any legacy tenant-scoped create entry point
  - `/admin/managed-tenants/onboarding` (legacy)
- **FR-005 (Membership boundary)**: A non-member of the selected workspace MUST always receive deny-as-not-found behavior for onboarding and for any workspace-visible operations.
- **FR-006 (Capability boundary)**: A workspace member without the required capability MUST be able to view the page, but action controls MUST be disabled with an explanatory tooltip; server-side action attempts MUST be denied with 403.
- **FR-006a (Least-privilege capability model)**: The wizard MUST gate each step and each action by canonical capabilities (no ad-hoc role string checks).
- **FR-006b (Wizard capability breakdown)**: The system MUST support, at minimum, distinct capability gates for:
  - identifying / creating / resuming onboarding for a managed tenant,
  - viewing/selecting a provider connection,
  - creating/editing a provider connection,
  - starting verification,
  - running each optional bootstrap action (inventory sync, policy sync, backup bootstrap) independently,
  - activating a tenant.
- **FR-006c (Viewer visibility)**: Viewing verification reports and operation-run results MUST be permitted to all members of the workspace, including members who cannot start runs.
- **FR-006d (Discoverability default)**: In V1, capability-gated controls SHOULD remain visible but disabled with an explanation (rather than being hidden), to support enterprise operator workflows.
- **FR-007 (Workspace↔tenant match hard rule)**: For any tenant-scoped route, if the tenant does not belong to the currently selected workspace, the system MUST return deny-as-not-found behavior.
- **FR-008 (Tenantless wizard until activation)**: The wizard MUST NOT require tenant-scoped pages, routes, or links before the final “Complete / Activate” step.
- **FR-009 (Identify managed tenant inputs)**: Step 1 MUST capture, at minimum:
  - tenant name,
  - environment,
  - Entra Tenant ID,
  - optional primary domain,
  - optional notes.
- **FR-010 (Idempotent identification)**: Step 1 MUST be idempotent for the same tenant identifier within the same workspace and MUST resume an active onboarding session when applicable.
- **FR-011 (Uniqueness of Entra Tenant ID)**: The system MUST enforce Entra Tenant ID uniqueness globally, and each Entra Tenant ID MUST be bound to exactly one workspace in V1.
- **FR-012 (Tenant status model)**: Managed Tenants MUST support a V1 lifecycle that includes the statuses `draft`, `onboarding`, `active`, and `archived`.
- **FR-013 (Provider connection choice)**: Step 2 MUST let the user either use an existing connection or create a new connection.
- **FR-014 (Secret safety)**: Any secret material entered during connection creation MUST be masked, stored securely, and MUST never be displayed again. Onboarding session state MUST NOT store secret material.
- **FR-015 (Verification run start)**: Step 3 MUST allow starting a verification run and MUST dedupe requests while an active verification run exists.
- **FR-016 (Verification viewer behavior)**: Step 3 MUST display a stored checklist report with:
  - an “in progress” banner while a run is active,
  - a manual “Refresh” control,
  - status mapping: blocking failures → Blocked; warnings-only → Needs attention; otherwise → Ready,
  - “Next steps” as links only (no server-side actions in V1).
- **FR-017 (Tenantless operations page)**: The wizard’s “View run” link MUST point to `/admin/operations/{run}` and MUST never use a tenant-scoped operations URL.
- **FR-017a (Tenantless access semantics)**: Access to `/admin/operations/{run}` MUST be granted only if the user is a member of the run’s workspace; otherwise the system MUST respond with deny-as-not-found behavior. The page MUST NOT require a pre-selected workspace context and MUST NOT auto-switch workspaces.
- **FR-018 (Workspace-visible operations)**: Operation runs started by the wizard MUST be safely viewable in a workspace context without tenant-scoped routing and MUST honor the same deny-as-not-found membership boundary.
- **FR-019 (Optional bootstrap step)**: Step 4 MAY offer optional bootstrap actions (e.g., inventory sync, policy sync, baseline creation) with per-action capability gating; each selected action MUST start its own operation run and be viewable tenantlessly.
- **FR-020 (Complete / Activate gate)**: The wizard MUST only allow activation when a provider connection exists and verification is not Blocked, except when a workspace owner explicitly overrides the block.
- **FR-020a (Override requirements)**: When overriding a blocked verification, the system MUST require a human-entered reason and MUST record an audit event capturing the override decision and reason.
- **FR-020b (Owner-only activation)**: Activation MUST be restricted to workspace owners (non-owner members may not activate, even if they can run earlier steps).
- **FR-021 (Activation outcome)**: On activation, the tenant MUST become visible in the workspace tenant switcher, and the user MUST be redirected either to the tenant home (when choosing “Open now”) or back to the workspace’s managed tenant list.
- **FR-022 (Connection ownership model)**: Provider connections MUST be workspace-owned.
- **FR-022a (Safe default binding)**: By default in V1, a provider connection MUST be bound to exactly one managed tenant.
- **FR-022b (Reuse safety gate)**: Reuse of an existing provider connection for additional managed tenants MUST be disabled by default and MUST only be possible via an explicit opt-in that clearly communicates risk and is policy-gated.
- **FR-023 (Auditability)**: The system MUST record audit events for: tenant identification, connection creation/updates, verification start/completion, bootstrap run start/completion, and activation.
- **FR-024 (DB-only rendering)**: The wizard and the verification viewer MUST render using stored data only; any external checks MUST run as background work.
- **FR-025 (Badge semantics)**: Step-status and verification-result chips MUST use centralized badge semantics (no per-page ad-hoc mappings), and changes MUST be covered by automated tests.
- **FR-026 (Graph contract path)**: Any Microsoft Graph call made by verification/bootstrap runs MUST go through the canonical contract registry path (`GraphClientInterface` + `config/graph_contracts.php`). Feature code MUST NOT hardcode ad-hoc endpoints; missing contracts MUST fail safe and be covered by automated tests.
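FR-020, FR-020a, and FR-020b combine into one server-side activation gate. This is a minimal Python sketch under stated assumptions. The boolean flags stand in for real membership, role, and connection lookups. The exception types and audit action names are hypothetical. The sketch follows FR-020 literally, so only a `Blocked` verification stops an owner. No HTTP status is assigned to gate failures, because the spec does not prescribe one.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


class ActivationDenied(Exception):
    """Authorization failure carrying the status the spec prescribes (404/403)."""

    def __init__(self, status: int, message: str) -> None:
        super().__init__(message)
        self.status = status


class ActivationBlocked(Exception):
    """Gate failure: missing connection, or a Blocked override without a reason."""


@dataclass
class AuditLog:
    events: List[Dict[str, Any]] = field(default_factory=list)

    def record(self, action: str, **details: Any) -> None:
        self.events.append({"action": action, **details})


def activate(
    *,
    is_member: bool,
    is_owner: bool,
    has_connection: bool,
    verification_status: str,
    override_reason: Optional[str],
    audit: AuditLog,
) -> str:
    """Apply the FR-020/FR-020a/FR-020b gates, then return the new tenant status."""
    if not is_member:
        raise ActivationDenied(404, "Not found")
    if not is_owner:
        raise ActivationDenied(403, "Owner required")
    if not has_connection:
        raise ActivationBlocked("A provider connection is required")
    if verification_status == "Blocked":
        reason = (override_reason or "").strip()
        if not reason:
            raise ActivationBlocked("Overriding a Blocked verification requires a reason")
        # FR-020a: the override decision and its human-entered reason are audited.
        audit.record("verification_block_overridden", reason=reason)
    audit.record("tenant_activated")  # FR-023
    return "active"
```

The authorization checks run before any gate check. As a result, a non-member or non-owner learns nothing about the tenant’s verification state.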

### Key Entities *(include if feature involves data)*

- **Workspace**: A portfolio context that a user selects; it controls membership and owns zero or more managed tenants.
- **Managed Tenant**: A record representing a Microsoft tenant managed by the organization; includes identity (Entra Tenant ID), environment, and lifecycle status.
- **Onboarding Session**: A resumable record of onboarding progress and safe, non-secret state.
- **Provider Connection**: A technical connection configuration used to access tenant data; includes secret material that must never be displayed after capture.
- **Operation Run**: A trackable background run started by the wizard (verification and optional bootstrap actions) with a stored report suitable for safe, tenantless viewing.
- **Verification Report**: A stored checklist result with per-check statuses, safe messages, evidence pointers, and “next steps” links.

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001 (Single entry point adoption)**: 100% of managed-tenant onboarding starts from the single onboarding entry point; legacy URLs return “not found” behavior.
- **SC-002 (Time to first verification)**: A workspace admin can reach “verification started” within 3 minutes of opening onboarding (excluding external consent/approval wait time).
- **SC-003 (No pre-activation tenant-scoped routing)**: Before activation, the wizard never generates tenant-scoped URLs; this is validated by regression tests.
- **SC-004 (Authorization correctness)**: Non-members consistently receive deny-as-not-found behavior; members lacking capability receive 403 on action attempts; authorized users complete onboarding.
- **SC-005 (Idempotency)**: For repeated Step 1 submissions with the same Entra Tenant ID in the same workspace, no duplicates are created and the user resumes the existing onboarding session.
- **SC-006 (Secret safety)**: No secret material appears in UI, reports, notifications, logs, or audit events; validated by automated tests.
- **SC-007 (Operational clarity)**: When verification is blocked, at least 90% of users can identify the reason category and next step from the report without opening a support ticket (measured via internal feedback or support tagging).