ahmido 4db8030f2a Spec 081: Provider connection cutover (#98)
Implements Spec 081 provider-connection cutover.

Highlights:
- Adds provider connection resolution + gating for operations/verification.
- Adds provider credential observer wiring.
- Updates Filament tenant verify flow to block with next-steps when provider connection isn’t ready.
- Adds spec docs under specs/081-provider-connection-cutover/ and extensive Spec081 test coverage.

Tests:
- vendor/bin/sail artisan test --compact tests/Feature/Filament/TenantSetupTest.php
- Focused suites for ProviderConnections/Verification ran during implementation (see local logs).

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@MacBookPro.fritz.box>
Reviewed-on: #98
2026-02-08 11:28:51 +00:00


# Research: Provider Connection Full Cutover
**Feature**: [specs/081-provider-connection-cutover/spec.md](spec.md)

**Date**: 2026-02-07
## Goal
Resolve repo-specific unknowns for the full credential cutover, and document decisions with rationale and alternatives.
## Findings (Repo Reality)
### Existing ProviderConnection / ProviderCredential primitives
- `ProviderConnection` exists as a workspace-owned, tenant-scoped integration asset.
- The default invariant already exists at the DB level via a partial unique index:
  - `provider_connections_default_unique` on `(tenant_id, provider)` where `is_default = true`.
- `ProviderCredential` exists, stores its encrypted payload in the `payload` attribute (cast as `encrypted:array`), and is hidden from serialization.
- `ProviderGateway::graphOptions(ProviderConnection $connection)` builds Graph options using `CredentialManager`.
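
The partial unique index can be shown in isolation. The following Python sketch uses the standard-library `sqlite3` module as a stand-in for the application database. It does not reproduce the repo's migration. The table is an assumed minimal schema containing only the columns the index needs. A second default for the same `(tenant_id, provider)` pair is rejected. Non-default rows and defaults for other providers are accepted, which is also why the invariant in D2 can stay generic across providers.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with only the columns the invariant needs (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE provider_connections ("
        " id INTEGER PRIMARY KEY,"
        " tenant_id INTEGER NOT NULL,"
        " provider TEXT NOT NULL,"
        " is_default INTEGER NOT NULL DEFAULT 0)"
    )
    # Partial unique index: uniqueness is enforced only for rows where is_default is true.
    db.execute(
        "CREATE UNIQUE INDEX provider_connections_default_unique"
        " ON provider_connections (tenant_id, provider) WHERE is_default = 1"
    )
    return db


def add_connection(db: sqlite3.Connection, tenant_id: int, provider: str, is_default: bool) -> bool:
    """Insert a row; return False if the partial unique index rejects it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO provider_connections (tenant_id, provider, is_default) VALUES (?, ?, ?)",
                (tenant_id, provider, int(is_default)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(add_connection(db, 1, "microsoft", True))   # True: first default
    print(add_connection(db, 1, "microsoft", False))  # True: non-default rows are unconstrained
    print(add_connection(db, 1, "microsoft", True))   # False: second default for the same pair
    print(add_connection(db, 1, "other", True))       # True: a different provider has its own default
```
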
### Existing runtime provider call patterns
- Some jobs are already ProviderConnection-first:
  - The provider connection health check uses `ProviderGateway::graphOptions($connection)`.
  - The provider operation start gate (`ProviderOperationStartGate`) uses `provider_connection_id` in the operation run context and for dedupe.
- Legacy tenant credential reads still exist in high-impact services and UI:
  - Services: inventory sync, policy sync, policy snapshots/backups, restore, RBAC onboarding, scope tag resolver.
  - UI: tenant registration and the tenant resource form expose `app_client_id` / `app_client_secret`.
### Operations / observability primitives
- `OperationRun` has:
  - `status`: `queued` | `running` | `completed`
  - `outcome`: `pending` | `succeeded` | `partially_succeeded` | `failed` (plus a reserved `cancelled`)
  - a `context` JSON field used for identity and target scope.
- The provider operation start gate already writes `context.provider`, `context.provider_connection_id`, and `context.target_scope.entra_tenant_id`.
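
The context shape written by the start gate, and a connection-scoped active-run check, can be sketched in Python as follows. This is not the repo's Laravel code. The dataclass, the operation type string used in usage, and the exact dedupe key (same run type plus same `provider_connection_id` while `queued` or `running`) are hypothetical. The source confirms only that `provider_connection_id` participates in dedupe.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional

# Status values as listed in the findings; dedupe only considers non-terminal runs.
ACTIVE_STATUSES = frozenset({"queued", "running"})


@dataclass
class OperationRun:
    type: str
    status: str = "queued"
    outcome: str = "pending"
    context: Dict[str, Any] = field(default_factory=dict)


def start_gate_context(provider: str, provider_connection_id: int, entra_tenant_id: str) -> Dict[str, Any]:
    """Build the context keys the start gate is described as writing."""
    return {
        "provider": provider,
        "provider_connection_id": provider_connection_id,
        "target_scope": {"entra_tenant_id": entra_tenant_id},
    }


def find_active_duplicate(
    runs: Iterable[OperationRun], run_type: str, provider_connection_id: int
) -> Optional[OperationRun]:
    """Hypothetical dedupe: return an active run of the same type on the same connection."""
    for run in runs:
        if (
            run.type == run_type
            and run.status in ACTIVE_STATUSES
            and run.context.get("provider_connection_id") == provider_connection_id
        ):
            return run
    return None
```

Keying dedupe on the connection rather than the tenant means that two connections for the same tenant do not block each other. Whether that matches the gate's actual behavior should be checked against `ProviderOperationStartGate`.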
## Decisions
### D1 — Single Source of Truth: ProviderConnection + ProviderCredential
**Decision**: All runtime provider calls use `ProviderConnection` + `ProviderCredential` via `ProviderGateway`.

**Rationale**: Eliminates drift between verification and restore, and makes the suite deterministic and auditable.

**Alternatives considered**:

- Continue dual-source (tenant fields + provider connections): rejected due to drift and security risk.
- Allow runtime fallback to tenant fields: rejected; violates the “single read path” rule and creates non-determinism.
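
The "single read path" can be made structural: the options builder accepts only a connection, so there is no code path to tenant fields. Here is a Python sketch loosely modeled on `ProviderGateway::graphOptions`. The payload keys (`client_id`, `client_secret`), the returned option names, and the `CredentialMissing` exception are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class CredentialMissing(Exception):
    """Raised when a connection has no usable credential; there is deliberately no fallback."""


@dataclass
class ProviderCredential:
    payload: Dict[str, str]  # decrypted payload; key names are assumed


@dataclass
class ProviderConnection:
    id: int
    entra_tenant_id: str
    credential: Optional[ProviderCredential]


def graph_options(connection: ProviderConnection) -> Dict[str, str]:
    """Build Graph client options from the connection's credential only."""
    if connection.credential is None:
        # Fail loudly instead of reading legacy tenant fields.
        raise CredentialMissing(f"provider connection {connection.id} has no credential")
    payload = connection.credential.payload
    return {
        "tenant": connection.entra_tenant_id,
        "client_id": payload["client_id"],
        "client_secret": payload["client_secret"],
    }
```

Raising an exception, rather than returning empty options, gives callers something concrete to map to a blocked run and a reason code.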
### D2 — Default enforcement applies to all providers; backfill creates Microsoft defaults only
**Decision**: The invariant “exactly one default per (tenant, provider)” applies generically to all providers, but the one-time backfill only creates or repairs defaults for provider `microsoft`.

**Rationale**: Keeps the suite future-proof while delivering the Microsoft-only cutover now.

**Alternatives considered**:

- Microsoft-only invariant: rejected; forces future migrations and special cases.
### D3 — Blocked starts still create an OperationRun
**Decision**: Starting a provider-backed operation without a usable provider configuration still creates an `OperationRun` record, preserving observability.

**Rationale**: Operators need a canonical record of what was attempted and why it was blocked.

**Alternatives considered**:

- UI-only blocked banner without a run: rejected; loses auditability/observability.
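
A start path that always records the attempt could look like the following Python sketch. It is not the repo's start gate. The function signature, the `dispatch` callback, and the reason-code strings are hypothetical, since the reason-code taxonomy is still an open point. Storing `reason_code` in `context` is also an assumption. The blocked run uses `status=completed` with the `blocked` outcome chosen in D4.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class OperationRun:
    type: str
    status: str
    outcome: str
    context: dict = field(default_factory=dict)


def start_operation(
    run_type: str,
    connection_id: Optional[int],
    has_usable_credential: bool,
    runs: List[OperationRun],
    dispatch: Callable[[OperationRun], None],
) -> OperationRun:
    """Create a run for every start attempt; only dispatch work when the connection is usable."""
    if connection_id is None or not has_usable_credential:
        # Hypothetical reason codes; the final taxonomy is not decided yet.
        reason = "provider_connection_missing" if connection_id is None else "provider_credential_missing"
        run = OperationRun(
            type=run_type,
            status="completed",  # terminal immediately; nothing is queued
            outcome="blocked",   # distinct outcome, not "failed"
            context={"provider_connection_id": connection_id, "reason_code": reason},
        )
        runs.append(run)
        return run
    run = OperationRun(
        type=run_type,
        status="queued",
        outcome="pending",
        context={"provider_connection_id": connection_id},
    )
    runs.append(run)
    dispatch(run)  # only usable connections reach the queue
    return run
```
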
### D4 — Represent “blocked” runs as a distinct OperationRun outcome
**Decision**: Introduce a `blocked` outcome for operation runs. The status lifecycle is unchanged: a blocked run ends with `status=completed` and `outcome=blocked`.

**Rationale**: The repo currently has no “blocked” status or outcome for runs; representing it explicitly prevents conflating blocked runs with failed ones.

**Alternatives considered**:

- Encode blocked as `outcome=failed` + `reason_code`: rejected; UI semantics become inconsistent and ambiguous.
- Add a new status value (`blocked`): rejected; it would affect active-run dedupe and status badge expectations more broadly.
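
Placing "blocked" on the outcome axis rather than the status axis can be illustrated with a Python sketch. Active-run checks key off status only, so a new outcome leaves them untouched. The badge labels and the rule of showing status while active and outcome once completed are hypothetical illustrations, not the repo's Filament badge configuration.

```python
from enum import Enum


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"  # unchanged lifecycle: blocked runs still end here


class RunOutcome(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially_succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"  # reserved
    BLOCKED = "blocked"      # new outcome


def is_active(status: RunStatus) -> bool:
    """Active-run dedupe depends on status only, so adding an outcome does not change it."""
    return status in (RunStatus.QUEUED, RunStatus.RUNNING)


# Hypothetical labels: blocked gets its own badge instead of reusing "Failed".
OUTCOME_BADGES = {
    RunOutcome.PENDING: "Pending",
    RunOutcome.SUCCEEDED: "Succeeded",
    RunOutcome.PARTIALLY_SUCCEEDED: "Partially succeeded",
    RunOutcome.FAILED: "Failed",
    RunOutcome.CANCELLED: "Cancelled",
    RunOutcome.BLOCKED: "Blocked",
}


def badge(status: RunStatus, outcome: RunOutcome) -> str:
    """Show the lifecycle while a run is active, and its outcome once it has completed."""
    if is_active(status):
        return status.value.capitalize()
    return OUTCOME_BADGES[outcome]
```
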
### D5 — Backfill selection rule for existing connections without a default
**Decision**: If exactly one Microsoft provider connection exists for a tenant, set it as the default. If multiple exist, do not auto-select one; an admin must remediate.

**Rationale**: Avoids accidentally selecting the wrong app registration.

**Alternatives considered**:

- Always pick the oldest: rejected; unsafe in enterprise environments.
- Always create a new connection: rejected; increases clutter and may violate tenant/provider/entra uniqueness.
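
The selection rule is small enough to express as a pure planning function, which makes it straightforward to unit-test separately from the backfill command. Below is a Python sketch that also applies D2's Microsoft-only backfill scope. The `BackfillAction` names and the `Connection` shape are hypothetical. The handling of tenants with no Microsoft connection is left open here, because this decision does not cover that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class BackfillAction(str, Enum):
    ALREADY_HAS_DEFAULT = "already_has_default"
    SET_DEFAULT = "set_default"
    NEEDS_REMEDIATION = "needs_remediation"
    NO_CONNECTION = "no_connection"  # out of scope for this rule


@dataclass
class Connection:
    id: int
    provider: str
    is_default: bool


def plan_default_backfill(connections: List[Connection]) -> Tuple[BackfillAction, Optional[int]]:
    """Decide the backfill action for one tenant's connections; only Microsoft is backfilled."""
    microsoft = [c for c in connections if c.provider == "microsoft"]
    if any(c.is_default for c in microsoft):
        return BackfillAction.ALREADY_HAS_DEFAULT, None
    if len(microsoft) == 1:
        return BackfillAction.SET_DEFAULT, microsoft[0].id
    if len(microsoft) > 1:
        # Never guess (for example "oldest wins"); an admin must pick the right app registration.
        return BackfillAction.NEEDS_REMEDIATION, None
    return BackfillAction.NO_CONNECTION, None
```
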
### D6 — Legacy tenant credential reads allowed only in explicit backfill tooling
**Decision**: Legacy tenant credential fields (`tenants.app_*`) are forbidden at runtime and may be read only by the backfill command/migration.

**Rationale**: Tightens the security posture and makes the cutover verifiable via guard tests.

**Alternatives considered**:

- Runtime fallback: rejected (see D1).
- No backfill reads: rejected; forces manual secret re-entry for all tenants.
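
A guard test of this kind can be as simple as a text scan with an allow-list. Here is a Python sketch of the idea. The repo's own guard tests, which would presumably be written in PHP, are not shown here. The regex covers only the two field names listed in the findings. The file suffix and the allow-listed path in the usage note are hypothetical. A plain text scan is coarse: it also flags comments and cannot detect dynamic attribute access.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# Legacy tenant credential fields named in the findings; runtime code must not reference them.
LEGACY_FIELD = re.compile(r"\bapp_client_(?:id|secret)\b")


def iter_source_files(root: Path, suffix: str = ".php") -> Iterator[Tuple[str, str]]:
    """Yield (relative POSIX path, file text) for every source file under root."""
    for path in sorted(root.rglob(f"*{suffix}")):
        yield path.relative_to(root).as_posix(), path.read_text(encoding="utf-8", errors="ignore")


def find_legacy_reads(files: Iterable[Tuple[str, str]], allowed: Iterable[str]) -> List[str]:
    """Return paths that mention a legacy field and are not on the backfill allow-list."""
    allowed_set = set(allowed)
    return [rel for rel, text in files if rel not in allowed_set and LEGACY_FIELD.search(text)]
```

Typical use would be `find_legacy_reads(iter_source_files(Path("app")), allowed=["Console/Commands/BackfillProviderConnections.php"])`, where the allow-listed path is a placeholder. The test passes when the returned list is empty.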
## Open Points (to be handled in implementation)
- Centralize “next steps” as link keys (verification checks currently embed Filament URLs directly).
- Finalize the `reason_code` taxonomy and its mapping from common exceptions (credential missing, auth failure, tenant mismatch).
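
Both open points reduce to registries that live in one place. Here is a Python sketch of the shape they might take. Every exception class, reason code, link key, and URL path below is a hypothetical placeholder, since neither the taxonomy nor the key set has been decided. Checks would emit a link key, and only the registry would know the concrete Filament URL.

```python
from typing import Dict, Optional


class CredentialMissingError(Exception): ...
class AuthFailureError(Exception): ...
class TenantMismatchError(Exception): ...


# Hypothetical taxonomy for the three cases listed above.
REASON_CODES: Dict[type, str] = {
    CredentialMissingError: "provider_credential_missing",
    AuthFailureError: "provider_auth_failed",
    TenantMismatchError: "provider_tenant_mismatch",
}

# Hypothetical link-key registry; the paths are placeholders, not real routes.
NEXT_STEP_LINKS: Dict[str, str] = {
    "provider_connection.edit": "/admin/provider-connections/{id}/edit",
    "provider_connection.create": "/admin/provider-connections/create",
}


def reason_code_for(exc: BaseException) -> str:
    """Map an exception to a reason code, falling back to a generic code."""
    for exc_type, code in REASON_CODES.items():
        if isinstance(exc, exc_type):
            return code
    return "unknown_error"


def resolve_next_step(key: str, **params: object) -> Optional[str]:
    """Resolve a link key to a URL; unknown keys resolve to None."""
    template = NEXT_STEP_LINKS.get(key)
    return template.format(**params) if template is not None else None
```
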