ahmido 4db8030f2a Spec 081: Provider connection cutover (#98)
Implements Spec 081 provider-connection cutover.

Highlights:
- Adds provider connection resolution + gating for operations/verification.
- Adds provider credential observer wiring.
- Updates Filament tenant verify flow to block with next-steps when provider connection isn’t ready.
- Adds spec docs under specs/081-provider-connection-cutover/ and extensive Spec081 test coverage.

Tests:
- vendor/bin/sail artisan test --compact tests/Feature/Filament/TenantSetupTest.php
- Focused suites for ProviderConnections/Verification ran during implementation (see local logs).

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@MacBookPro.fritz.box>
Reviewed-on: #98
2026-02-08 11:28:51 +00:00


Research: Provider Connection Full Cutover

Feature: specs/081-provider-connection-cutover/spec.md
Date: 2026-02-07

Goal

Resolve repo-specific unknowns for the full credential cutover, and document decisions with rationale and alternatives.

Findings (Repo Reality)

Existing ProviderConnection / ProviderCredential primitives

  • ProviderConnection exists as a workspace-owned, tenant-scoped integration asset.
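  • The default invariant is already enforced at the DB level by a partial unique index:
    • provider_connections_default_unique on (tenant_id, provider) where is_default = true.
    • This guarantees at most one default per (tenant_id, provider); it does not guarantee that a default exists.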
  • ProviderCredential exists; it stores its encrypted payload in the payload column (encrypted:array cast), and the payload is hidden from serialization.
  • ProviderGateway::graphOptions(ProviderConnection $connection) builds Graph options using CredentialManager.
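A partial unique index enforces "at most one default", not "exactly one". The minimal Python sketch below shows the difference using the standard-library sqlite3 module, because SQLite supports partial indexes. The production database engine and the trimmed column set are assumptions; this is not the repo's actual schema.

```python
import sqlite3

# SQLite stands in for the production database; only the columns the
# invariant needs are modeled (assumption: the real table has more).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE provider_connections ("
    " id INTEGER PRIMARY KEY,"
    " tenant_id INTEGER NOT NULL,"
    " provider TEXT NOT NULL,"
    " is_default INTEGER NOT NULL DEFAULT 0)"
)
# Partial unique index: uniqueness is only enforced on rows where is_default = 1.
db.execute(
    "CREATE UNIQUE INDEX provider_connections_default_unique"
    " ON provider_connections (tenant_id, provider) WHERE is_default = 1"
)


def add_connection(tenant_id: int, provider: str, is_default: bool) -> bool:
    """Insert a connection; return False if the partial unique index rejects it."""
    try:
        with db:  # commits on success, rolls back and re-raises on error
            db.execute(
                "INSERT INTO provider_connections (tenant_id, provider, is_default)"
                " VALUES (?, ?, ?)",
                (tenant_id, provider, int(is_default)),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_connection(1, "microsoft", True))   # True: first default
print(add_connection(1, "microsoft", False))  # True: extra non-defaults are allowed
print(add_connection(1, "microsoft", True))   # False: second default rejected
print(add_connection(2, "microsoft", False))  # True: tenant 2 now has no default at all
```

The last call succeeds even though tenant 2 ends up with no default. That gap is why "exactly one" still needs the backfill and remediation rules in D2 and D5.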

Existing runtime provider call patterns

  • Some jobs are already ProviderConnection-first:

    • Provider connection health check uses ProviderGateway::graphOptions($connection).
    • The provider operation start gate (ProviderOperationStartGate) records provider_connection_id in the operation run context and uses it for dedupe.
  • Legacy tenant credential reads still exist in high-impact services and UI:

    • Services: inventory sync, policy sync, policy snapshots/backups, restore, RBAC onboarding, scope tag resolver.
    • UI: tenant registration and the tenant resource form expose app_client_id / app_client_secret.

Operations / observability primitives

  • OperationRun has:
    • status: queued|running|completed
    • outcome: pending|succeeded|partially_succeeded|failed (+ reserved cancelled)
    • context: a JSON field used for run identity and target scope.
  • Provider operation start gate already writes context.provider, context.provider_connection_id, and context.target_scope.entra_tenant_id.
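For reference, the sketch below restates the OperationRun fields listed above as a Python data shape, written in Python for brevity. It also includes the context keys the start gate writes. The nesting of target_scope.entra_tenant_id is inferred from the dotted key name, and the operation name is hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


class RunOutcome(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially_succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"  # reserved


@dataclass
class OperationRun:
    operation: str
    status: RunStatus = RunStatus.QUEUED
    outcome: RunOutcome = RunOutcome.PENDING
    context: dict[str, Any] = field(default_factory=dict)


def provider_context(provider: str, connection_id: int, entra_tenant_id: str) -> dict[str, Any]:
    """Context written by the start gate (nested shape assumed from the dotted key names)."""
    return {
        "provider": provider,
        "provider_connection_id": connection_id,
        "target_scope": {"entra_tenant_id": entra_tenant_id},
    }


run = OperationRun(
    operation="inventory_sync",  # hypothetical operation name
    context=provider_context("microsoft", 42, "00000000-0000-0000-0000-000000000000"),
)
print(run.status.value, run.outcome.value, run.context["provider_connection_id"])
```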

Decisions

D1 — Single Source of Truth: ProviderConnection + ProviderCredential

Decision: All runtime provider calls use ProviderConnection + ProviderCredential via ProviderGateway.

Rationale: Eliminates drift between verification and restore, and makes the suite deterministic and auditable.

Alternatives considered:

  • Continue dual-source (tenant fields + provider connections): rejected due to drift and security risk.
  • Allow runtime fallback to tenant fields: rejected; violates “single read path” and creates non-determinism.
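The minimal Python sketch below shows what a single read path looks like. It resolves the default connection for (tenant, provider) and fails loudly otherwise, with no second source to fall back on. The Connection shape, the has_credential flag, and ProviderConnectionNotReady are hypothetical stand-ins for the ProviderConnection/ProviderCredential models and for whatever ProviderGateway actually raises.

```python
from dataclasses import dataclass


class ProviderConnectionNotReady(Exception):
    """Hypothetical error: (tenant, provider) has no usable default connection."""


@dataclass(frozen=True)
class Connection:
    id: int
    tenant_id: int
    provider: str
    is_default: bool
    has_credential: bool  # stands in for "a ProviderCredential row exists"


def resolve_connection(connections: list[Connection], tenant_id: int, provider: str) -> Connection:
    """Return the default connection for (tenant, provider) or raise.

    There is intentionally no fallback to legacy tenant credential fields:
    a missing default is an error, never a reason to try another source.
    """
    defaults = [
        c for c in connections
        if c.tenant_id == tenant_id and c.provider == provider and c.is_default
    ]
    if len(defaults) != 1:
        raise ProviderConnectionNotReady(
            f"expected one default {provider} connection for tenant {tenant_id}, found {len(defaults)}"
        )
    connection = defaults[0]
    if not connection.has_credential:
        raise ProviderConnectionNotReady(f"connection {connection.id} has no credential")
    return connection


connections = [
    Connection(id=1, tenant_id=10, provider="microsoft", is_default=True, has_credential=True),
    Connection(id=2, tenant_id=11, provider="microsoft", is_default=False, has_credential=True),
]
print(resolve_connection(connections, 10, "microsoft").id)  # 1
try:
    resolve_connection(connections, 11, "microsoft")
except ProviderConnectionNotReady as exc:
    print("blocked:", exc)
```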

D2 — Default enforcement applies to all providers; backfill creates Microsoft defaults only

Decision: The invariant “exactly one default per (tenant, provider)” applies generically to all providers, but the one-time backfill only creates or repairs defaults for the microsoft provider.

Rationale: Keeps the suite future-proof while delivering Microsoft-only cutover now.

Alternatives considered:

  • Microsoft-only invariant: rejected; forces future migrations and special cases.
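The Python sketch below illustrates the split between a generic invariant and a narrow backfill scope. The invariant check runs over every provider, while automatic repair is limited to microsoft. The tuple row format and the "other" provider name are assumptions made for illustration.

```python
from collections import defaultdict

# Each row: (tenant_id, provider, is_default). Plain tuples keep the sketch small.
Row = tuple[int, str, bool]

BACKFILL_PROVIDERS = {"microsoft"}  # backfill scope for this cutover


def invariant_violations(rows: list[Row]) -> dict[tuple[int, str], int]:
    """Return (tenant_id, provider) groups whose default count is not exactly one.

    Provider-agnostic: every provider present in the rows is checked. Only groups
    with at least one connection appear, since groups come from the rows.
    """
    counts: dict[tuple[int, str], int] = defaultdict(int)
    for tenant_id, provider, is_default in rows:
        counts[(tenant_id, provider)] += int(is_default)
    return {key: n for key, n in counts.items() if n != 1}


def backfill_targets(rows: list[Row]) -> list[tuple[int, str]]:
    """Only violations for backfill providers are eligible for automatic repair."""
    return sorted(key for key in invariant_violations(rows) if key[1] in BACKFILL_PROVIDERS)


rows = [
    (1, "microsoft", True),
    (2, "microsoft", False),
    (3, "other", False),  # hypothetical future provider
]
print(invariant_violations(rows))  # {(2, 'microsoft'): 0, (3, 'other'): 0}
print(backfill_targets(rows))      # [(2, 'microsoft')]
```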

D3 — Blocked starts still create an OperationRun

Decision: Starting a provider-backed operation without usable configuration still creates an OperationRun record to preserve observability.

Rationale: Operators need a canonical record for “what was attempted” and why it is blocked.

Alternatives considered:

  • UI-only blocked banner without a run: rejected; loses auditability/observability.
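The Python sketch below shows the start path under this decision: every attempt leaves a run record, and a blocked attempt is recorded as already finished. The outcome value "blocked" follows D4. The reason_code value and the context.tenant_id key are hypothetical, because the reason_code taxonomy is still an open point. Job dispatch is omitted.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Run:
    operation: str
    status: str
    outcome: str
    context: dict[str, Any] = field(default_factory=dict)


RUNS: list[Run] = []  # in-memory stand-in for the operation_runs table


def start_operation(operation: str, tenant_id: int, connection_id: Optional[int]) -> Run:
    """Record every start attempt, including ones that cannot proceed."""
    if connection_id is None:
        # Blocked: nothing is dispatched, but the attempt and its reason are kept.
        run = Run(operation, status="completed", outcome="blocked",
                  context={"tenant_id": tenant_id, "reason_code": "provider_connection_missing"})
    else:
        # Usable configuration: queue as normal (job dispatch omitted in this sketch).
        run = Run(operation, status="queued", outcome="pending",
                  context={"tenant_id": tenant_id, "provider_connection_id": connection_id})
    RUNS.append(run)
    return run


start_operation("restore", tenant_id=7, connection_id=None)
start_operation("restore", tenant_id=8, connection_id=3)
print([(r.status, r.outcome) for r in RUNS])
# [('completed', 'blocked'), ('queued', 'pending')]
```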

D4 — Represent “blocked” runs as a distinct OperationRun outcome

Decision: Introduce a blocked outcome for operation runs. The status lifecycle is unchanged: a blocked run ends with status completed.

Rationale: The repo currently has no “blocked” status/outcome for runs; representing it explicitly prevents conflating blocked with failed.

Alternatives considered:

  • Encode blocked as outcome=failed + reason_code: rejected; UI semantics become inconsistent and ambiguous.
  • Add a new status value (blocked): rejected; affects active-run dedupe and status badge expectations more broadly.
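The Python sketch below illustrates why an outcome is less invasive than a new status. It assumes that active-run dedupe inspects only the status (queued or running). Under that assumption, a blocked attempt never holds the dedupe slot, and adding an outcome value requires no change to the dedupe logic.

```python
from enum import Enum


class Status(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    COMPLETED = "completed"


class Outcome(str, Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    PARTIALLY_SUCCEEDED = "partially_succeeded"
    FAILED = "failed"
    BLOCKED = "blocked"      # new outcome introduced by D4
    CANCELLED = "cancelled"  # reserved


# Assumed definition of "active" for dedupe: anything not yet completed.
ACTIVE_STATUSES = {Status.QUEUED, Status.RUNNING}


def can_start(existing_runs: list[tuple[Status, Outcome]]) -> bool:
    """Allow a new start only if no run for the same dedupe key is active.

    Only the status is inspected, so adding an outcome value leaves this untouched.
    """
    return not any(status in ACTIVE_STATUSES for status, _ in existing_runs)


print(can_start([(Status.COMPLETED, Outcome.BLOCKED)]))  # True: retry allowed after remediation
print(can_start([(Status.RUNNING, Outcome.PENDING)]))    # False: a run is already in flight
print(Outcome.BLOCKED == Outcome.FAILED)                 # False: blocked never reads as failed
```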

D5 — Backfill selection rule for existing connections without a default

Decision: If a tenant has exactly one Microsoft provider connection, set it as the default. If it has several, do not auto-select one; an admin must remediate.

Rationale: Avoids accidental selection of the wrong app registration.

Alternatives considered:

  • Always pick the oldest: rejected; unsafe in enterprise environments.
  • Always create a new connection: rejected; increases clutter and may violate tenant/provider/entra uniqueness.
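The selection rule can be sketched in Python as a pure decision function over one tenant's Microsoft connections. The action names are hypothetical. The zero-connection branch is outside D5 and is only assumed to be handled separately, for example by creating a connection from legacy fields as D6 permits.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class BackfillAction(str, Enum):
    KEEP_EXISTING_DEFAULT = "keep_existing_default"
    SET_DEFAULT = "set_default"
    NEEDS_REMEDIATION = "needs_remediation"
    NO_CONNECTION = "no_connection"


@dataclass(frozen=True)
class MicrosoftConnection:
    id: int
    is_default: bool


def plan_default(connections: list[MicrosoftConnection]) -> tuple[BackfillAction, Optional[int]]:
    """Backfill decision for one tenant's Microsoft connections."""
    defaults = [c for c in connections if c.is_default]
    if defaults:
        return BackfillAction.KEEP_EXISTING_DEFAULT, defaults[0].id
    if len(connections) == 1:
        return BackfillAction.SET_DEFAULT, connections[0].id
    if len(connections) > 1:
        # Never guess (e.g. "oldest wins"): an admin must pick the right app registration.
        return BackfillAction.NEEDS_REMEDIATION, None
    # Zero connections: outside this rule (assumed to be handled separately).
    return BackfillAction.NO_CONNECTION, None


print(plan_default([MicrosoftConnection(1, False)]))
print(plan_default([MicrosoftConnection(1, False), MicrosoftConnection(2, False)]))
```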

D6 — Legacy tenant credential reads allowed only in explicit backfill tooling

Decision: Legacy tenant fields (tenants.app_*) are forbidden at runtime and permitted only in the backfill command/migration.

Rationale: Tightens the security posture and makes cutover verifiable via guard tests.

Alternatives considered:

  • Runtime fallback: rejected.
  • No backfill reads: rejected; forces manual secret re-entry for all tenants.
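A guard test for this rule can be as simple as a text scan with an allow-list. The sketch below is in Python over an in-memory file map. The actual guard tests would live in the repo's PHP test suite, and all file paths here are hypothetical. The pattern covers only the two fields named in this document, so a real guard would need to cover every tenants.app_* column. Text matching is also a heuristic that dynamic attribute access could evade.

```python
import re

# Only the two legacy columns named in this document; extend to all tenants.app_* fields.
LEGACY_FIELD = re.compile(r"\bapp_client_(?:id|secret)\b")


def legacy_reads(files: dict[str, str], allowed_paths: set[str]) -> list[str]:
    """Return paths that mention legacy tenant credential fields outside the allow-list."""
    return sorted(
        path
        for path, source in files.items()
        if path not in allowed_paths and LEGACY_FIELD.search(source)
    )


# Hypothetical file paths and contents, for illustration only.
files = {
    "app/Console/Commands/BackfillProviderConnections.php": "$secret = $tenant->app_client_secret;",
    "app/Services/InventorySync.php": "$options = $gateway->graphOptions($connection);",
    "app/Services/Restore.php": "$clientId = $tenant->app_client_id;",
}
allowed = {"app/Console/Commands/BackfillProviderConnections.php"}

print(legacy_reads(files, allowed))  # ['app/Services/Restore.php']
```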

Open Points (to be handled in implementation)

  • Centralize “next steps” as link keys (the repo currently embeds Filament URLs directly in verification checks).
  • Determine the final reason_code taxonomy mapping for common exceptions (credential missing, auth failure, tenant mismatch).
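Both open points can be prototyped as a single lookup table that maps exception types to a reason_code and a next-step link key. The UI layer would resolve the link key to a Filament URL, so verification checks would not embed URLs. The Python sketch below is illustrative only: every exception class, reason_code, and link key is a placeholder, because the taxonomy has not been decided.

```python
from typing import Optional


# Hypothetical exception types; the real services raise their own classes.
class CredentialMissing(Exception): ...
class AuthFailure(Exception): ...
class TenantMismatch(Exception): ...


# (exception type, reason_code, next-step link key). All values are placeholders.
TAXONOMY: list[tuple[type[BaseException], str, str]] = [
    (CredentialMissing, "provider_credential_missing", "provider_connection.edit"),
    (AuthFailure, "provider_auth_failed", "provider_connection.verify"),
    (TenantMismatch, "provider_tenant_mismatch", "provider_connection.edit"),
]


def classify(exc: BaseException) -> tuple[str, Optional[str]]:
    """Map an exception to (reason_code, next_step_link_key); unknown errors get no link."""
    for exc_type, reason_code, link_key in TAXONOMY:
        if isinstance(exc, exc_type):
            return reason_code, link_key
    return "unknown_error", None


print(classify(AuthFailure("invalid client")))  # ('provider_auth_failed', 'provider_connection.verify')
print(classify(ValueError("boom")))             # ('unknown_error', None)
```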