TenantAtlas/specs/081-provider-connection-cutover/spec.md
ahmido 4db8030f2a Spec 081: Provider connection cutover (#98)
Implements Spec 081 provider-connection cutover.

Highlights:
- Adds provider connection resolution + gating for operations/verification.
- Adds provider credential observer wiring.
- Updates Filament tenant verify flow to block with next-steps when provider connection isn’t ready.
- Adds spec docs under specs/081-provider-connection-cutover/ and extensive Spec081 test coverage.

Tests:
- vendor/bin/sail artisan test --compact tests/Feature/Filament/TenantSetupTest.php
- Focused suites for ProviderConnections/Verification ran during implementation (see local logs).

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@MacBookPro.fritz.box>
Reviewed-on: #98
2026-02-08 11:28:51 +00:00

# Feature Specification: Provider Connection Full Cutover
**Feature Branch**: `081-provider-connection-cutover`
**Created**: 2026-02-07
**Status**: Draft (implementation-ready)
**Input**: Spec 081 — Provider Connection Full Cutover (single source of truth, enterprise suite)
## Clarifications
### Session 2026-02-07
- Q: Provider scope for “default ProviderConnection” enforcement? → A: All providers (generic rule), but backfill only creates Microsoft defaults.
- Q: When a provider-backed operation is started but the connection/credential is missing, should we create an OperationRun? → A: Yes — create an OperationRun with a `blocked` outcome/state and store `reason_code` + link-only next steps.
- Q: Backfill behavior when Microsoft connections exist but none is default? → A: If exactly one exists, set it default; if multiple exist, do not auto-select (leave blocked + remediation).
- Q: Legacy tenant credential fields (`tenants.app_*`) after cutover? → A: Forbidden in runtime; allowed only in explicit backfill tooling for one-time copy.
- Q: Legacy tenant credential columns lifecycle? → A: Keep columns for now (deprecated/unused), defer dropping to a follow-up spec.
## User Scenarios & Testing *(mandatory)*
### User Story 1 - Deterministic provider operations (Priority: P1)
As an operator, I can run provider-backed operations (inventory, sync, backup, restore, verification) and the system always uses the same, workspace-managed provider connection for the selected managed tenant.
If the tenant is not configured, the system blocks the action with a clear reason and a guided path to remediation.
**Why this priority**: This removes “verify green, restore red” drift and makes the suite reliable and auditable.
**Independent Test**: Start a provider-backed operation for a managed tenant (a) with a default provider connection and (b) without one; verify the first runs using the default connection and the second is blocked with a stable reason and next-step links.
**Acceptance Scenarios**:
1. **Given** a managed tenant with exactly one default provider connection for a provider, **When** an operator starts a provider-backed operation, **Then** the operation uses that default connection and records the connection identity for traceability.
2. **Given** a managed tenant with no default provider connection for a provider, **When** an operator starts a provider-backed operation, **Then** the operation is blocked deterministically with reason code `provider_connection_missing` and a remediation link.
3. **Given** a managed tenant with more than one “default” provider connection (invalid configuration), **When** an operator starts a provider-backed operation, **Then** the operation is blocked/failed deterministically with reason code `provider_connection_invalid` (optional extension detail such as `ext.multiple_defaults_detected`) and does not proceed.
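
The three scenarios above amount to one deterministic resolution rule. The following is a minimal sketch of that rule, not the project's implementation: the `ProviderConnection` and `Resolution` shapes and the function name are hypothetical, and only the reason codes and the `ext.multiple_defaults_detected` detail come from this spec.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProviderConnection:
    # Hypothetical shape; the spec does not define field names.
    id: int
    tenant_id: int
    provider: str
    is_default: bool = False


@dataclass(frozen=True)
class Resolution:
    # Either a connection, or a reason code explaining why none was chosen.
    connection: Optional[ProviderConnection]
    reason_code: Optional[str] = None
    extension: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.connection is not None


def resolve_default_connection(
    connections: Sequence[ProviderConnection], tenant_id: int, provider: str
) -> Resolution:
    """Pick the single default connection for (tenant, provider), or say why not."""
    defaults = [
        c for c in connections
        if c.tenant_id == tenant_id and c.provider == provider and c.is_default
    ]
    if not defaults:
        return Resolution(None, "provider_connection_missing")
    if len(defaults) > 1:
        # Invalid configuration: never guess which default to use.
        return Resolution(
            None, "provider_connection_invalid", "ext.multiple_defaults_detected"
        )
    return Resolution(defaults[0])
```

Returning a result object rather than raising keeps the blocked path as ordinary control flow, which makes it easier to record the reason code on the operation run.
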
---
### User Story 2 - Safe credential management with audit (Priority: P2)
As an admin, I can manage provider connections and rotate credentials with explicit confirmation and complete auditability, without secrets ever being shown or stored in logs, reports, or audit payloads.
**Why this priority**: Credential handling is security-critical; enterprise operations require least privilege, safe UI flows, and reliable audit trails.
**Independent Test**: Update a provider credential and confirm (a) confirmation is required, (b) an audit event is created, and (c) secret values are never persisted outside the encrypted credential store.
**Acceptance Scenarios**:
1. **Given** an admin with the required capability, **When** they update provider credentials, **Then** the action requires explicit confirmation and produces an audit event with redacted metadata.
2. **Given** a non-member attempting to access provider connection management for a tenant, **When** they load the page, **Then** they receive deny-as-not-found behavior (404 semantics) with no tenant hints.
3. **Given** a member without the credential-management capability, **When** they attempt to update credentials, **Then** the system denies the mutation with a forbidden response (403 semantics).
---
### User Story 3 - Troubleshoot failures using stable reason codes (Priority: P3)
As an operator, I can understand why a provider-backed operation is blocked or failed through stable, machine-readable reason codes and consistent “next steps” links.
**Why this priority**: Stable reason codes enable predictable UX, support workflows, and long-term suite consistency.
**Independent Test**: Trigger a blocked/failing operation and verify reason codes and next steps appear consistently and contain no secrets.
**Acceptance Scenarios**:
1. **Given** a missing credential, **When** an operation runs, **Then** the outcome is blocked with reason code `provider_credential_missing` and a next-step link to update credentials.
2. **Given** an authentication failure at the provider, **When** an operation runs, **Then** the outcome is failed with reason code `provider_auth_failed` and a documentation link for troubleshooting.
### Edge Cases
- Default provider connection is missing for a tenant/provider pair.
- More than one default provider connection exists for the same tenant/provider pair.
- A provider connection exists but is disabled/unusable.
- Provider credential is missing.
- Provider credential is present but rejected by the provider.
- Admin consent is missing / cannot be detected.
- Required permissions are missing.
- Provider returns forbidden/insufficient privileges.
- Provider target tenant does not match the managed tenant target.
- Network is unreachable / timeouts occur.
- Rate limiting/throttling occurs.
- A user without membership tries to view tenant/provider connection details (deny-as-not-found).
## Requirements *(mandatory)*
**Constitution alignment (required):** This feature affects provider calls, long-running operations, and credential management. The solution must preserve run observability, tenant isolation, safety/confirmations for sensitive actions, and auditable credential handling.
**Constitution alignment (RBAC-UX):** Authorization behavior must be explicit:
- Non-member / not entitled to tenant scope → deny-as-not-found (404 semantics)
- Member but missing capability → forbidden (403 semantics)
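
The ordering of these two checks matters: membership is evaluated first so that non-members never learn whether a resource exists. A minimal sketch of that ordering, assuming a hypothetical `authorize` helper and a hypothetical capability name (`provider_credentials.manage`), neither of which is defined by this spec:

```python
from enum import Enum
from typing import AbstractSet


class AccessDecision(Enum):
    ALLOW = 200
    FORBIDDEN = 403
    NOT_FOUND = 404


def authorize(
    is_member: bool, capabilities: AbstractSet[str], required: str
) -> AccessDecision:
    """Apply deny-as-not-found before the capability check."""
    if not is_member:
        # Non-members get 404 semantics, with no hint that the tenant exists.
        return AccessDecision.NOT_FOUND
    if required not in capabilities:
        # Members without the capability get 403 semantics.
        return AccessDecision.FORBIDDEN
    return AccessDecision.ALLOW
```
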
### Functional Requirements
- **FR-081-001 (Default required)**: For every managed tenant and provider, the system MUST have exactly one default provider connection OR block provider-backed flows with a clear “missing connection” reason and remediation link.
- **FR-081-002 (Single source of truth)**: All provider-facing runtime flows MUST use provider connections + credentials as the only authoritative credential source.
- **FR-081-003 (No tenant credential runtime use)**: Tenant-stored application credential fields MUST NOT be used at runtime for provider calls.
- **FR-081-003a (Legacy reads are tooling-only)**: Reads of legacy tenant credential fields are permitted only inside explicit backfill tooling (migration/command) for a one-time copy into provider credentials. Runtime flows MUST NOT read legacy tenant credential fields under any circumstances.
- **FR-081-004 (No tenant credential write path)**: The system MUST NOT provide any UI or service flow that writes provider secrets into tenant fields.
- **FR-081-005 (Single provider call entry point)**: All provider calls MUST go through a single, centralized provider gateway/factory layer that accepts a provider connection as the primary identifier.
- **FR-081-006 (Operation traceability)**: Every provider-backed operation MUST record, at minimum, provider identity, provider connection identity, managed tenant identity, and the provider tenant target scope so operators can trace runs.
- **FR-081-007 (Deterministic failure semantics)**: When a provider connection/credential is missing or invalid, operations MUST be blocked or failed deterministically with stable reason codes.
- **FR-081-007a (Blocked operations are observable)**: When an operator attempts to start a provider-backed operation but it cannot proceed due to configuration/credential reasons, the system MUST still create an operation run record in a `blocked` state and store a safe `reason_code` plus link-only next steps.
- **FR-081-008 (No secret leakage)**: Secrets MUST NOT appear in audit metadata, operation context, verification reports, application logs, or exception messages.
- **FR-081-009 (DB-only viewing)**: “View” pages MUST render only stored data and MUST NOT perform provider calls during rendering.
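
FR-081-006, FR-081-007, and FR-081-007a together imply that a run record is written even when the operation cannot start. One way to sketch this, under stated assumptions: the `OperationRun` fields, the `queued` state name, and the link paths below are illustrative and not taken from the codebase; only the `blocked` state and the reason codes come from this spec.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OperationRun:
    # Hypothetical fields; FR-081-006 lists what must be traceable, not how it is stored.
    operation: str
    tenant_id: int
    provider: str
    state: str
    connection_id: Optional[int] = None
    reason_code: Optional[str] = None
    next_steps: List[str] = field(default_factory=list)


def start_operation(
    runs: List[OperationRun],
    operation: str,
    tenant_id: int,
    provider: str,
    connection_id: Optional[int],
    has_credential: bool,
) -> OperationRun:
    """Always record a run; block it when configuration or credentials are missing."""
    run = OperationRun(
        operation, tenant_id, provider, state="queued", connection_id=connection_id
    )
    if connection_id is None:
        run.state = "blocked"
        run.reason_code = "provider_connection_missing"
        run.next_steps = ["/provider-connections"]  # hypothetical link target
    elif not has_credential:
        run.state = "blocked"
        run.reason_code = "provider_credential_missing"
        # Hypothetical link target; next steps stay link-only.
        run.next_steps = [f"/provider-connections/{connection_id}/credentials"]
    runs.append(run)  # blocked runs are persisted too, so they stay observable
    return run
```
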
### Security & Authorization Requirements
- **SR-081-001 (Least privilege)**: The system MUST separate permissions for viewing vs managing provider connections/credentials and enforce them server-side.
- **SR-081-002 (Deny-as-not-found)**: Non-members MUST experience deny-as-not-found boundaries for tenant/provider-connection scoped resources.
- **SR-081-003 (Confirmed credential mutations)**: Credential changes MUST require explicit confirmation and generate auditable events with redacted payloads.
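
Redacted audit payloads (SR-081-003, FR-081-008) can be approximated by masking sensitive keys before anything is persisted. The sketch below is illustrative only: the key list and the `[redacted]` marker are assumptions, and key-based masking does not catch secrets embedded in free-text values such as exception messages.

```python
from collections.abc import Mapping
from typing import Any

# Assumed key names; a real list would follow the provider credential schema.
SENSITIVE_KEYS = frozenset(
    {"client_secret", "secret", "password", "token", "access_token"}
)
REDACTED = "[redacted]"


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive mapping keys masked, recursively."""
    if isinstance(value, Mapping):
        return {
            key: REDACTED if str(key).lower() in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(item) for item in value]
    return value
```

For example, `redact({"client_id": "abc", "client_secret": "s3cr3t"})` keeps `client_id` and replaces the value of `client_secret` with the marker.
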
### Data & Migration Requirements
- **FR-081-010 (Backfill defaults, idempotent)**: The system MUST provide a one-time backfill that ensures every managed tenant has a default provider connection for the Microsoft provider.
- If a default provider connection already exists, backfill MUST leave it unchanged.
- If no default exists and exactly one Microsoft provider connection exists, backfill MUST set it as the default.
- If no default exists and multiple Microsoft provider connections exist, backfill MUST NOT auto-select a default and MUST leave the tenant in a blocked/remediation-required state.
- If no Microsoft provider connection exists, backfill MUST create one and set it default.
- If legacy tenant credentials exist and backfill creates a new Microsoft provider connection, it MUST copy those legacy credentials into the provider credential store for that new connection.
- Running the backfill multiple times MUST NOT create duplicates.
- **FR-081-011 (Uniqueness invariant)**: The system MUST enforce the invariant “exactly one default provider connection per (managed tenant, provider)” for all providers.
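
The FR-081-010 decision rules can be read as a small per-tenant function. This is a sketch for a single managed tenant's Microsoft connections, with a hypothetical `Connection` shape and hypothetical outcome labels; how an already corrupt multi-default state is handled is not specified by FR-081-010, so the sketch simply leaves existing defaults untouched.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Connection:
    # Hypothetical shape for one tenant's Microsoft provider connection.
    id: int
    is_default: bool = False
    credential: Optional[Dict[str, str]] = None


def backfill_microsoft_default(
    connections: List[Connection], legacy_credentials: Optional[Dict[str, str]]
) -> str:
    """Apply the backfill rules in place and return an outcome label."""
    if any(c.is_default for c in connections):
        return "unchanged"  # an existing default is never modified
    if len(connections) == 1:
        connections[0].is_default = True  # exactly one candidate: promote it
        return "promoted"
    if len(connections) > 1:
        return "remediation_required"  # never auto-select among several
    # No connection exists: create one and copy legacy credentials, if any.
    credential = dict(legacy_credentials) if legacy_credentials else None
    connections.append(Connection(id=1, is_default=True, credential=credential))
    return "created"
```

Because the first branch short-circuits whenever a default exists, a second run over the same tenant returns `unchanged` and creates nothing, which is the idempotency the requirement asks for.
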
### UX Requirements (minimal changes)
- **UX-081-001 (Blocked state guidance)**: When blocked due to missing default provider connection, the UI MUST clearly state the blocked reason and provide a primary remediation link to manage provider connections.
- **UX-081-002 (Single management surface)**: Tenants MUST NOT have a second credential edit surface; provider connection management is the only supported place to manage provider credentials.
- **UX-081-003 (Link-only next steps)**: Verification “next steps” MUST be navigation-only (links), not server-side “fix-it” actions.
### Scope Boundaries
- **NG-081-001**: This spec does not introduce new credential types (e.g., certificates) or redesign token caching.
- **NG-081-002**: This spec does not change the canonical, tenantless operation run URL structure.
- **NG-081-003**: This spec does not drop legacy tenant credential columns; removal is deferred to a follow-up spec once cutover is proven stable.
### Key Entities *(include if feature involves data)*
- **Managed Tenant**: A workspace-owned tenant target used for provider operations.
- **Provider Connection**: A managed integration asset bound to a managed tenant and provider.
- **Provider Credential**: An encrypted credential payload owned by a provider connection.
- **Default Provider Connection**: The single connection designated as default for a managed tenant + provider.
- **Operation Run**: A canonical record representing a provider-backed operation's identity, state, and outcome.
- **Audit Event**: An immutable record of credential changes and other sensitive actions.
- **Verification Report**: Stored results of readiness checks with stable reason codes and link-only next steps.
## Success Criteria *(mandatory)*
### Measurable Outcomes
- **SC-081-001 (Eliminate drift)**: Provider-backed operations for a managed tenant never rely on tenant-stored credential fields at runtime; the system consistently uses the default provider connection.
- **SC-081-002 (Blocked determinism)**: 100% of attempts to start provider-backed operations without a default provider connection are blocked with a stable reason code and a remediation link.
- **SC-081-003 (Audit coverage)**: 100% of credential mutations produce auditable events with no secret material included.
- **SC-081-004 (No secret leakage)**: Secrets appear in 0 verification reports, 0 audit payloads, and 0 operator-visible error messages.
- **SC-081-005 (Backfill completeness)**: After backfill, every managed tenant either has exactly one default provider connection for the Microsoft provider or is left in an explicit remediation-required state (`provider_connection_missing` / `provider_connection_invalid`) per FR-081-010 decision rules.
## Appendix A — Reason Code Taxonomy (v1 baseline)
**Purpose:** Stable, machine-readable classification for provider/credential/auth/permission failures.
| Reason code | Category | Typical status | Meaning |
|---|---|---:|---|
| `provider_connection_missing` | configuration | `block` | No default provider connection configured for this managed tenant/provider. |
| `provider_connection_invalid` | configuration | `fail` | Provider connection exists but is inconsistent/disabled/cannot be used (including multi-default corruption). |
| `provider_credential_missing` | credentials | `block` | Connection exists, but no provider credential (secret) is present. |
| `provider_credential_invalid` | credentials | `fail` | Credential exists but is unusable (bad secret, wrong app, expired, etc.). |
| `provider_consent_missing` | consent | `block` | Admin consent not granted (or not detected). |
| `provider_auth_failed` | auth | `fail` | Authentication/token exchange failed. |
| `provider_permission_missing` | permissions | `block` | Required application permissions are not granted. |
| `provider_permission_denied` | permissions | `fail` | Provider denied access for an attempted call. |
| `provider_permission_refresh_failed` | permissions | `warn` | Permission refresh did not run or failed; observed permissions may be stale. |
| `tenant_target_mismatch` | integrity | `block` | Connection/credential is bound to a different tenant than the managed tenant target. |
| `network_unreachable` | transport | `fail` | Network/DNS/timeout prevents reaching provider endpoints. |
| `rate_limited` | transport | `warn` | Provider throttling / rate limiting encountered. |
| `unknown_error` | fallback | `fail` | Unclassified failure. |
### Extension Namespace (`ext.*`)
Extension codes MAY be added as secondary details without breaking consumers (e.g., provider-specific or error-code subtyping). Viewers MUST degrade gracefully for unknown codes.
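
A consumer of this taxonomy might look codes up in a table and fall back for anything it does not recognize. The sketch below mirrors the v1 table above; falling back to `unknown_error` for unrecognized primary codes, and passing `ext.*` details through without interpreting them, is one possible reading of "degrade gracefully" and is an assumption, as is the `classify` function itself.

```python
from typing import Dict, Optional, Tuple

# (category, typical status) per reason code, copied from the v1 baseline table.
BASELINE: Dict[str, Tuple[str, str]] = {
    "provider_connection_missing": ("configuration", "block"),
    "provider_connection_invalid": ("configuration", "fail"),
    "provider_credential_missing": ("credentials", "block"),
    "provider_credential_invalid": ("credentials", "fail"),
    "provider_consent_missing": ("consent", "block"),
    "provider_auth_failed": ("auth", "fail"),
    "provider_permission_missing": ("permissions", "block"),
    "provider_permission_denied": ("permissions", "fail"),
    "provider_permission_refresh_failed": ("permissions", "warn"),
    "tenant_target_mismatch": ("integrity", "block"),
    "network_unreachable": ("transport", "fail"),
    "rate_limited": ("transport", "warn"),
    "unknown_error": ("fallback", "fail"),
}


def classify(reason_code: str, extension: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Classify a reason code; unknown codes and details never raise."""
    category, status = BASELINE.get(reason_code, BASELINE["unknown_error"])
    # Keep only details in the ext.* namespace; they never change the classification.
    detail = extension if extension and extension.startswith("ext.") else None
    return {
        "reason_code": reason_code,
        "category": category,
        "status": status,
        "detail": detail,
    }
```
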
## Appendix B — Next Steps Registry (link-only)
**Purpose:** Make remediation links consistent across onboarding, verification, and error screens.
**Rule (v1):** Next steps are navigation-only (links). They do not trigger server-side “fix” actions.
### Default next steps (examples)
- `provider_connection_missing`: Link to manage provider connections and set a default.
- `provider_credential_missing`: Link to update credentials.
- `provider_permission_missing`: Link to required permissions guidance.
- `provider_auth_failed`: Link to connection review and troubleshooting documentation.
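
A registry like this can be a plain lookup from reason code to links, which keeps the link-only rule easy to enforce because an entry holds nothing but a label and a URL. The labels and paths below are placeholders invented for illustration; the real routes are not given in this spec.

```python
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    label: str
    url: str  # navigation target only; there is no action or handler field


# Placeholder labels and paths, keyed by the reason codes listed above.
NEXT_STEPS: Dict[str, List[Link]] = {
    "provider_connection_missing": [
        Link("Manage provider connections", "/provider-connections"),
    ],
    "provider_credential_missing": [
        Link("Update credentials", "/provider-connections/credentials"),
    ],
    "provider_permission_missing": [
        Link("Required permissions", "/docs/required-permissions"),
    ],
    "provider_auth_failed": [
        Link("Review connection", "/provider-connections"),
        Link("Troubleshooting", "/docs/troubleshooting"),
    ],
}


def next_steps_for(reason_code: str) -> List[Link]:
    """Return the links for a reason code, or an empty list if none are registered."""
    return list(NEXT_STEPS.get(reason_code, []))
```
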