# Research: Provider Connection Full Cutover

**Feature**: [specs/081-provider-connection-cutover/spec.md](spec.md)  
**Date**: 2026-02-07

## Goal

Resolve repo-specific unknowns for the full credential cutover, and document decisions with rationale and alternatives.

## Findings (Repo Reality)

### Existing ProviderConnection / ProviderCredential primitives

- `ProviderConnection` exists as a workspace-owned, tenant-scoped integration asset.
- Default invariant already exists at DB level via partial unique index:
  - `provider_connections_default_unique` on `(tenant_id, provider)` where `is_default = true`.
- `ProviderCredential` exists and stores encrypted payload in `payload` (`encrypted:array`) and is hidden from serialization.
- `ProviderGateway::graphOptions(ProviderConnection $connection)` builds Graph options using `CredentialManager`.
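The default invariant above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` purely for demonstration; the column types and the `add_connection` helper are assumptions, and only the index name, its columns, and its `is_default` condition come from the findings.

```python
import sqlite3

# In-memory database; the schema is a simplified assumption, not the repo's migration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider_connections (
    id INTEGER PRIMARY KEY,
    tenant_id INTEGER NOT NULL,
    provider TEXT NOT NULL,
    is_default INTEGER NOT NULL DEFAULT 0
);
-- Partial unique index: uniqueness applies only to rows flagged as default.
CREATE UNIQUE INDEX provider_connections_default_unique
    ON provider_connections (tenant_id, provider)
    WHERE is_default = 1;
""")


def add_connection(tenant_id, provider, is_default):
    """Insert a row; return True on success, False if the index rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO provider_connections (tenant_id, provider, is_default)"
                " VALUES (?, ?, ?)",
                (tenant_id, provider, int(is_default)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Note that an index of this shape only prevents a second default per `(tenant_id, provider)`; it does not by itself guarantee that a default exists.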

### Existing runtime provider call patterns

- Some jobs are already ProviderConnection-first:
  - Provider connection health check uses `ProviderGateway::graphOptions($connection)`.
  - Provider operation start gate (`ProviderOperationStartGate`) uses `provider_connection_id` in operation run context and dedupe.

- Legacy tenant credential reads still exist in high-impact services and UI:
  - Services: inventory sync, policy sync, policy snapshots/backups, restore, RBAC onboarding, scope tag resolver.
  - UI: tenant registration and the tenant resource form expose `app_client_id` / `app_client_secret`.

### Operations / observability primitives

- `OperationRun` has:
  - `status`: queued|running|completed
  - `outcome`: pending|succeeded|partially_succeeded|failed (+ reserved cancelled)
  - `context` JSON field used for identity and target scope.
- Provider operation start gate already writes `context.provider`, `context.provider_connection_id`, and `context.target_scope.entra_tenant_id`.

## Decisions

### D1 — Single Source of Truth: ProviderConnection + ProviderCredential

**Decision**: All runtime provider calls use `ProviderConnection` + `ProviderCredential` via `ProviderGateway`.

**Rationale**: Eliminates drift between verification and restore, and makes the suite deterministic and auditable.

**Alternatives considered**:
- Continue dual-source (tenant fields + provider connections): rejected due to drift and security risk.
- Allow runtime fallback to tenant fields: rejected; violates “single read path” and creates non-determinism.
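A minimal sketch of what a single read path without fallback might look like, written in Python for illustration only. The class names, the `resolve_credentials` function, and both reason codes are hypothetical; the reason_code taxonomy in particular is still undecided.

```python
from dataclasses import dataclass
from typing import Optional


class ProviderConfigurationError(Exception):
    """Raised when no usable connection or credential exists (hypothetical name)."""

    def __init__(self, reason_code):
        super().__init__(reason_code)
        self.reason_code = reason_code


@dataclass
class Tenant:
    id: int
    # Legacy fields: present on the model, intentionally never read below.
    app_client_id: Optional[str] = None
    app_client_secret: Optional[str] = None


@dataclass
class Connection:
    id: int
    tenant_id: int
    provider: str
    is_default: bool
    credential: Optional[dict]


def resolve_credentials(tenant, provider, connections):
    """Return (connection_id, credential) from the default connection only."""
    defaults = [
        c for c in connections
        if c.tenant_id == tenant.id and c.provider == provider and c.is_default
    ]
    if not defaults:
        # No fallback to tenant.app_* here: a missing connection is an error.
        raise ProviderConfigurationError("provider_connection_missing")
    connection = defaults[0]
    if connection.credential is None:
        raise ProviderConfigurationError("provider_credential_missing")
    return connection.id, connection.credential
```

The point of the sketch is the absence of an `else` branch: a tenant that still carries legacy values but has no default connection fails loudly instead of silently using them.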

### D2 — Default enforcement applies to all providers; backfill creates Microsoft defaults only

**Decision**: The invariant “exactly one default per (tenant, provider)” is generic for all providers, but the one-time backfill only creates/repairs defaults for provider `microsoft`.

**Rationale**: Keeps the suite future-proof while delivering Microsoft-only cutover now.

**Alternatives considered**:
- Microsoft-only invariant: rejected; forces future migrations and special cases.

### D3 — Blocked starts still create an OperationRun

**Decision**: Starting a provider-backed operation without usable configuration still creates an `OperationRun` record to preserve observability.

**Rationale**: Operators need a canonical record for “what was attempted” and why it is blocked.

**Alternatives considered**:
- UI-only blocked banner without a run: rejected; loses auditability/observability.

### D4 — Represent “blocked” runs as a distinct OperationRun outcome

**Decision**: Introduce a `blocked` outcome on operation runs. The status lifecycle stays unchanged: a blocked run ends with status `completed`.

**Rationale**: The repo currently has no “blocked” status/outcome for runs; representing it explicitly prevents conflating blocked with failed.

**Alternatives considered**:
- Encode blocked as `outcome=failed` + reason_code: rejected; UI semantics become inconsistent and ambiguous.
- Add a new status value (`blocked`): rejected; affects active-run dedupe and status badge expectations more broadly.
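D3 and D4 together could be sketched as follows, in Python for illustration only. The status and outcome values and the three context keys come from the findings; the `start_operation` and `is_active` helpers, the `usable` flag, and storing `reason_code` inside `context` are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Status(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


class Outcome(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially_succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"  # reserved
    BLOCKED = "blocked"      # new outcome proposed in D4


@dataclass
class OperationRun:
    status: Status
    outcome: Outcome
    context: dict


def start_operation(provider, connection_id, entra_tenant_id, usable, reason_code=None):
    """Always create a run; mark it blocked when configuration is unusable."""
    context = {
        "provider": provider,
        "provider_connection_id": connection_id,
        "target_scope": {"entra_tenant_id": entra_tenant_id},
    }
    if not usable:
        context["reason_code"] = reason_code  # assumed location for the reason
        return OperationRun(Status.COMPLETED, Outcome.BLOCKED, context)
    return OperationRun(Status.QUEUED, Outcome.PENDING, context)


def is_active(run):
    """Active-run dedupe looks at status only, so blocked runs never count."""
    return run.status in (Status.QUEUED, Status.RUNNING)
```

Because a blocked run lands directly in `completed`, any dedupe logic keyed on `queued`/`running` is unaffected, which is the main reason a new status value was rejected.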

### D5 — Backfill selection rule for existing connections without a default

**Decision**: If exactly one Microsoft provider connection exists, set it default. If multiple exist, do not auto-select (requires admin remediation).

**Rationale**: Avoids accidental selection of the wrong app registration.

**Alternatives considered**:
- Always pick the oldest: rejected; unsafe in enterprise environments.
- Always create a new connection: rejected; increases clutter and may violate tenant/provider/entra uniqueness.
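The selection rule can be expressed as a small pure function. This is a sketch in Python for illustration, assuming each connection is a dict with `id`, `provider`, and `is_default`; the function name and the action labels it returns are hypothetical.

```python
def select_backfill_default(connections):
    """Decide the backfill action for one tenant's connections.

    Returns an (action, connection_id) tuple; the action labels are illustrative.
    """
    microsoft = [c for c in connections if c["provider"] == "microsoft"]
    if any(c["is_default"] for c in microsoft):
        return ("already_default", None)
    if len(microsoft) == 1:
        # Exactly one candidate: safe to promote it.
        return ("set_default", microsoft[0]["id"])
    if len(microsoft) > 1:
        # Ambiguous: never guess which app registration is the right one.
        return ("needs_admin_remediation", None)
    # No existing connection: outside the scope of this rule.
    return ("no_existing_connection", None)
```

For example, a tenant with two non-default Microsoft connections yields `("needs_admin_remediation", None)` rather than a promoted connection.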

### D6 — Legacy tenant credential reads allowed only in explicit backfill tooling

**Decision**: Legacy tenant fields (`tenants.app_*`) are forbidden in runtime code and permitted only in the backfill command/migration.

**Rationale**: Tightens the security posture and makes cutover verifiable via guard tests.

**Alternatives considered**:
- Runtime fallback: rejected.
- No backfill reads: rejected; forces manual secret re-entry for all tenants.
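One way a guard test might verify this rule is a source scan with an allowlist. The sketch below is Python for illustration only: it checks just the two field names mentioned in the findings, and the `find_legacy_reads` function and any paths passed to it are hypothetical.

```python
import re

# Matches the two legacy field names named in the findings.
LEGACY_PATTERN = re.compile(r"\bapp_client_(id|secret)\b")


def find_legacy_reads(files, allowed_prefixes):
    """Return (path, line_number) for each legacy reference outside the allowlist.

    files: mapping of relative path -> file content.
    allowed_prefixes: path prefixes where legacy reads are permitted (backfill tooling).
    """
    violations = []
    for path, content in sorted(files.items()):
        if path.startswith(tuple(allowed_prefixes)):
            continue  # explicit backfill tooling may read legacy fields
        for line_no, line in enumerate(content.splitlines(), start=1):
            if LEGACY_PATTERN.search(line):
                violations.append((path, line_no))
    return violations
```

A guard test would then assert that the returned list is empty for the runtime source tree, so any reintroduced legacy read fails the build with a file and line number.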

## Open Points (to be handled in implementation)

- Centralize “next steps” as link keys (the repo currently embeds Filament URLs directly in verification checks).
- Determine the final reason_code taxonomy mapping for common exceptions (credential missing, auth failure, tenant mismatch).