# Research: Provider Connection Full Cutover

**Feature**: [specs/081-provider-connection-cutover/spec.md](spec.md)

**Date**: 2026-02-07

## Goal

Resolve repo-specific unknowns for the full credential cutover, and document decisions with rationale and alternatives.

## Findings (Repo Reality)

### Existing ProviderConnection / ProviderCredential primitives

- `ProviderConnection` exists as a workspace-owned, tenant-scoped integration asset.
- Default invariant already exists at DB level via partial unique index:
  - `provider_connections_default_unique` on `(tenant_id, provider)` where `is_default = true`.
- `ProviderCredential` exists and stores encrypted payload in `payload` (`encrypted:array`) and is hidden from serialization.
- `ProviderGateway::graphOptions(ProviderConnection $connection)` builds Graph options using `CredentialManager`.

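The partial unique index described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and an in-memory table purely for illustration. Only the index name and the `tenant_id`, `provider`, and `is_default` columns come from the findings; the `id` column, the SQLite dialect, and the `add_connection` helper are assumptions and do not reflect the repo's actual schema or migrations.

```python
import sqlite3

# In-memory stand-in for the real table. Only tenant_id, provider and
# is_default are taken from the findings; everything else is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE provider_connections (
        id INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        provider TEXT NOT NULL,
        is_default INTEGER NOT NULL DEFAULT 0
    )
    """
)
# Partial unique index: uniqueness applies only to rows flagged as default.
conn.execute(
    """
    CREATE UNIQUE INDEX provider_connections_default_unique
    ON provider_connections (tenant_id, provider)
    WHERE is_default = 1
    """
)


def add_connection(tenant_id: int, provider: str, is_default: bool) -> bool:
    """Insert a row; return False when the index rejects it (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO provider_connections (tenant_id, provider, is_default) "
                "VALUES (?, ?, ?)",
                (tenant_id, provider, int(is_default)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_default = add_connection(1, "microsoft", True)          # accepted
extra_non_default = add_connection(1, "microsoft", False)     # accepted: outside the index
second_default = add_connection(1, "microsoft", True)         # rejected: same (tenant, provider)
other_tenant_default = add_connection(2, "microsoft", True)   # accepted: different tenant
```

An index of this kind rejects a second default for the same `(tenant_id, provider)` pair, but it cannot require that a default exists at all; a tenant with zero defaults still satisfies it.
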
### Existing runtime provider call patterns

- Some jobs are already ProviderConnection-first:
  - Provider connection health check uses `ProviderGateway::graphOptions($connection)`.
  - Provider operation start gate (`ProviderOperationStartGate`) uses `provider_connection_id` in operation run context and dedupe.

- Legacy tenant credential reads still exist in high-impact services and UI:
  - Services: inventory sync, policy sync, policy snapshots/backups, restore, RBAC onboarding, scope tag resolver.
  - UI: tenant registration and the tenant resource form expose `app_client_id` / `app_client_secret`.

### Operations / observability primitives

- `OperationRun` has:
  - `status`: `queued` | `running` | `completed`
  - `outcome`: `pending` | `succeeded` | `partially_succeeded` | `failed` (plus a reserved `cancelled`)
  - `context`: JSON field used for identity and target scope.
- Provider operation start gate already writes `context.provider`, `context.provider_connection_id`, and `context.target_scope.entra_tenant_id`.

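The shape of an operation run described above can be sketched as a small data model. This is an illustrative Python sketch, not the repo's model: the enum values and the three context keys come from the findings, while the class names, the `provider_context` helper, the default values (`queued` / `pending`), and the sample IDs are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


class RunOutcome(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially_succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"  # listed as reserved in the findings


@dataclass
class OperationRun:
    # Defaults are assumed for illustration only.
    status: RunStatus = RunStatus.QUEUED
    outcome: RunOutcome = RunOutcome.PENDING
    context: dict[str, Any] = field(default_factory=dict)


def provider_context(provider: str, provider_connection_id: int, entra_tenant_id: str) -> dict[str, Any]:
    """Build the context keys the start gate is reported to write (hypothetical helper)."""
    return {
        "provider": provider,
        "provider_connection_id": provider_connection_id,
        "target_scope": {"entra_tenant_id": entra_tenant_id},
    }


# Placeholder values; 42 and the all-zero GUID are made up.
run = OperationRun(
    context=provider_context("microsoft", 42, "00000000-0000-0000-0000-000000000000")
)
```

Keeping `status` (lifecycle) and `outcome` (result) as separate fields is what lets a run be finished in lifecycle terms while still carrying a distinct result value.
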
## Decisions

### D1 — Single Source of Truth: ProviderConnection + ProviderCredential

**Decision**: All runtime provider calls use `ProviderConnection` + `ProviderCredential` via `ProviderGateway`.

**Rationale**: Eliminates drift between verification and restore, and makes the suite deterministic and auditable.

**Alternatives considered**:
- Continue dual-source (tenant fields + provider connections): rejected due to drift and security risk.
- Allow runtime fallback to tenant fields: rejected; violates “single read path” and creates non-determinism.

### D2 — Default enforcement applies to all providers; backfill creates Microsoft defaults only

**Decision**: The invariant “exactly one default per (tenant, provider)” is generic for all providers, but the one-time backfill only creates/repairs defaults for provider `microsoft`.

**Rationale**: Keeps the suite future-proof while delivering Microsoft-only cutover now.

**Alternatives considered**:
- Microsoft-only invariant: rejected; forces future migrations and special cases.

### D3 — Blocked starts still create an OperationRun

**Decision**: Starting a provider-backed operation without usable configuration still creates an `OperationRun` record to preserve observability.

**Rationale**: Operators need a canonical record for “what was attempted” and why it is blocked.

**Alternatives considered**:
- UI-only blocked banner without a run: rejected; loses auditability/observability.

### D4 — Represent “blocked” runs as a distinct OperationRun outcome

**Decision**: Introduce a `blocked` outcome on operation runs. The status lifecycle stays unchanged: a blocked run is recorded with status `completed`.

**Rationale**: The repo currently has no “blocked” status/outcome for runs; representing it explicitly prevents conflating blocked with failed.

**Alternatives considered**:
- Encode blocked as `outcome=failed` + `reason_code`: rejected; UI semantics become inconsistent and ambiguous.
- Add a new status value (`blocked`): rejected; affects active-run dedupe and status badge expectations more broadly.

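Taken together, D3 and D4 imply that a blocked start still returns a run, with `status = completed` and `outcome = blocked`. The Python sketch below illustrates that flow under stated assumptions: the function name, its parameters, the two reason codes, and storing `reason_code` inside `context` are all hypothetical (the reason-code taxonomy is still an open point), and nothing here mirrors the actual `ProviderOperationStartGate` code.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class OperationRun:
    status: str
    outcome: str
    context: dict[str, Any] = field(default_factory=dict)


def start_provider_operation(
    provider: str,
    connection_id: Optional[int],
    has_credential: bool,
) -> OperationRun:
    """Always return a run record, even when the start is blocked (sketch)."""
    context: dict[str, Any] = {
        "provider": provider,
        "provider_connection_id": connection_id,
    }

    # Reason codes below are placeholders, not the repo's taxonomy.
    reason: Optional[str] = None
    if connection_id is None:
        reason = "provider_connection_missing"
    elif not has_credential:
        reason = "provider_credential_missing"

    if reason is not None:
        # Blocked: lifecycle is finished, outcome is distinct from "failed".
        context["reason_code"] = reason
        return OperationRun(status="completed", outcome="blocked", context=context)

    # Usable configuration: the run enters the normal lifecycle.
    return OperationRun(status="queued", outcome="pending", context=context)


blocked_run = start_provider_operation("microsoft", None, has_credential=False)
normal_run = start_provider_operation("microsoft", 42, has_credential=True)
```

Because the blocked run never takes a `queued` or `running` status in this sketch, it would not look like an active run to any dedupe logic keyed on status, which is consistent with the reason given for rejecting a new status value.
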
### D5 — Backfill selection rule for existing connections without a default

**Decision**: If exactly one Microsoft provider connection exists, set it default. If multiple exist, do not auto-select (requires admin remediation).

**Rationale**: Avoids accidental selection of the wrong app registration.

**Alternatives considered**:
- Always pick the oldest: rejected; unsafe in enterprise environments.
- Always create a new connection: rejected; increases clutter and may violate tenant/provider/entra uniqueness.

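The selection rule in D5 is small enough to state as a pure function. The Python sketch below is illustrative only: the function name and the use of integer IDs are assumptions, and the input is assumed to be the Microsoft connections of one tenant that currently has no default. D5 does not cover the zero-connection case; the sketch returns `None` there and leaves it to other backfill logic.

```python
from typing import Optional, Sequence


def select_backfill_default(connection_ids: Sequence[int]) -> Optional[int]:
    """Return the connection ID to mark as default, or None when no
    automatic choice is safe (hypothetical helper)."""
    if len(connection_ids) == 1:
        # Exactly one candidate: unambiguous, so it becomes the default.
        return connection_ids[0]
    # Zero or several candidates: never guess; an admin has to decide.
    return None


single = select_backfill_default([7])
ambiguous = select_backfill_default([7, 9])
empty = select_backfill_default([])
```

Returning `None` instead of picking, for example, the oldest connection keeps the backfill from silently selecting the wrong app registration.
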
### D6 — Legacy tenant credential reads allowed only in explicit backfill tooling

**Decision**: Legacy tenant fields (`tenants.app_*`) are forbidden in runtime and permitted only in backfill command/migration.

**Rationale**: Tightens the security posture and makes cutover verifiable via guard tests.

**Alternatives considered**:
- Runtime fallback: rejected.
- No backfill reads: rejected; forces manual secret re-entry for all tenants.

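One way a guard test for D6 could work is a plain text scan that flags legacy field names anywhere outside an allowlist. The Python sketch below shows the idea under stated assumptions: the allowlisted path prefixes, the sample file paths, and the sample file contents are all hypothetical, and only the field names `app_client_id` and `app_client_secret` come from the findings. A real guard would read files from disk and would probably be written in the repo's own test framework.

```python
import re
from typing import Mapping

# Field names taken from the findings; the pattern itself is an assumption.
LEGACY_FIELD = re.compile(r"\bapp_client_(id|secret)\b")

# Hypothetical locations where backfill tooling may still read legacy fields.
ALLOWED_PREFIXES = ("app/Console/Commands/Backfill", "database/migrations/")


def find_legacy_reads(sources: Mapping[str, str]) -> list[str]:
    """Return 'path:line' entries for legacy field usage outside the allowlist."""
    violations: list[str] = []
    for path, text in sorted(sources.items()):
        if path.startswith(ALLOWED_PREFIXES):
            continue  # explicit backfill tooling is exempt
        for line_no, line in enumerate(text.splitlines(), start=1):
            if LEGACY_FIELD.search(line):
                violations.append(f"{path}:{line_no}")
    return violations


# Made-up file contents, keyed by made-up paths.
sample = {
    "app/Services/InventorySync.php": "<?php\n$secret = $tenant->app_client_secret;\n",
    "app/Console/Commands/BackfillProviderConnections.php": "<?php\n$id = $tenant->app_client_id;\n",
    "app/Services/Clean.php": "<?php\n$options = $gateway->graphOptions($connection);\n",
}
violations = find_legacy_reads(sample)
```

A scan like this is deliberately coarse: it can flag comments or strings that merely mention a field name, so the allowlist needs to stay small and reviewed.
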
## Open Points (to be handled in implementation)

- Centralize “next steps” as link keys (the repo currently embeds Filament URLs directly in verification checks).
- Determine the final `reason_code` taxonomy mapping for common exceptions (credential missing, auth failure, tenant mismatch).