TenantAtlas/specs/188-provider-connection-state-cleanup/research.md
2026-04-10 12:31:26 +02:00

8.7 KiB

Phase 0 Research: Canonical Provider Connection State Cleanup

Decision: Persist lifecycle as a single is_enabled boolean and present it as the Lifecycle dimension

Rationale: The only provider-state behavior still carried uniquely by legacy status is the explicit enabled or disabled gate. Consent and verification already have canonical enums and should not absorb administrative disablement. A boolean is_enabled is the narrowest persisted truth that preserves the current disabled behavior without inventing a broader lifecycle framework or another enum class.

Alternatives considered:

  • Add a new lifecycle_status enum class: rejected because current-release behavior is binary and a boolean is narrower.
  • Keep deriving disablement from legacy status: rejected because it preserves the dual-truth model the spec is removing.
  • Encode disablement inside verification_status: rejected because administrative pause is not a verification outcome.

Rationale: The current schema already stores consent_status and verification_status, and the runtime already uses them in onboarding, consent, and health-check flows. The cleanup should remove the competing legacy layer, not introduce a new readiness or health family. Consent continues to answer permission state, verification continues to answer what the latest check proves, and lifecycle remains separate.

Alternatives considered:

  • Introduce a new combined provider readiness status: rejected because it would collapse three distinct operator questions back into one ambiguous state.
  • Preserve legacy status as a derived compatibility field: rejected because the spec calls for a hard cut with no compatibility tail.

Decision: Remove legacy projection outputs and trim ProviderConnectionStateProjector to canonical verification-outcome logic only

Rationale: ProviderConnectionStateProjector::project() and projectForConnection() exist to emit status and health_status, preserve disabled, and keep legacy fields alive across multiple writers. That responsibility disappears after cutover. The only reusable logic worth retaining is the current projectVerificationOutcome() reasoning that translates a health-check result plus current consent state into canonical consent and verification outcomes and error metadata. Retaining that narrow derivation avoids duplicating reason-code handling while still eliminating legacy field projection.

Alternatives considered:

  • Keep the projector and repoint it to emit a new canonical DTO: rejected because direct canonical persistence is simpler and a new DTO layer is unnecessary.
  • Inline all verification-outcome logic into each writer: rejected because the consent and verification reason-code rules already exist in one place and are reused.

Decision: Replace legacy HealthResult transport fields with canonical verification output

Rationale: HealthResult currently transports legacy status and healthStatus, and MicrosoftProviderHealthCheck manufactures those values purely so downstream code can keep writing the removed columns. The contract should instead expose canonical verification intent and diagnostics only: verificationStatus, reasonCode, message, and meta. That keeps health-check output aligned with the final storage model and removes a legacy vocabulary bridge from the contract layer.

Alternatives considered:

  • Keep status and healthStatus in HealthResult until the final migration: rejected because that prolongs legacy semantics inside the runtime contract layer.
  • Replace HealthResult with a new service or result hierarchy: rejected because the existing contract can be narrowed without new abstraction.

Decision: Treat reader cutover as the first breaking-change phase and schema removal as the last

Rationale: The current runtime still reads legacy fields in ProviderConnectionResolver, ScanEntraAdminRolesJob, ProviderConnectionResource action visibility, tenant provider summaries, system directory queries, and shared Blade views. Dropping columns before those reads move would break provider resolution, action gating, and page rendering immediately. The correct order is lifecycle foundation plus the one-time disabled-state backfill first, canonical reader cutover across shared helpers, operator surfaces, and runtime gates second, canonical writer and transport cleanup third, badge and helper cleanup fourth, residual test-scaffolding cleanup fifth, and schema removal last.

Alternatives considered:

  • Remove columns immediately and fix failures opportunistically: rejected because the affected reads span resolvers, background jobs, Livewire actions, and read-only system pages.
  • Dual-read or dual-write for a compatibility period: rejected because the feature explicitly forbids a compatibility tail.

Decision: Recompute system-directory health summaries from canonical verification and permission truth

Rationale: The system directory currently counts provider issues from health_status and renders provider rows from status and health_status. Those summaries must move to canonical verification plus existing permission signals so /system remains semantically aligned with /admin. Tenant-level list rollups count only the default Microsoft provider connection for each tenant; non-default connections may still appear on the read-only detail page but do not affect list rollup badges. The existing SystemHealth badge domain can stay; only its input contract changes.

Alternatives considered:

  • Leave /system on legacy diagnostics while /admin becomes canonical: rejected because the same connection would tell two different stories depending on the plane.
  • Introduce a new system-only provider-health projection: rejected because that would add another layer of derived truth.

Decision: Reuse existing centralized badge infrastructure and retire legacy provider badge domains

Rationale: The repo already has BadgeDomain::BooleanEnabled, ProviderConsentStatus, and ProviderVerificationStatus. That is enough to render lifecycle, consent, and verification without inventing a provider-lifecycle badge domain. The cleanup should delete BadgeDomain::ProviderConnectionStatus, BadgeDomain::ProviderConnectionHealth, and their mapper classes, while reusing BooleanEnabled for lifecycle and the existing provider consent and verification mappings for the other two dimensions.

Alternatives considered:

  • Add a new provider lifecycle badge domain: rejected because BooleanEnabled already matches the required enabled or disabled semantics.
  • Keep legacy badge domains for diagnostics only: rejected because their continued existence makes reintroduction of legacy truth too easy.

Rationale: TenantResource::providerConnectionState(), the shared provider-connection-state.blade.php entry, and system directory detail views still carry status and health_status keys. Those helpers are the most likely place for accidental legacy fallback because many surfaces share them. Their contract should change directly to lifecycle, consent, verification, and diagnostics such as last_health_check_at and last_error_reason_code.

Alternatives considered:

  • Leave legacy keys in helper arrays and stop rendering them: rejected because hidden compatibility keys would keep inviting silent legacy reads.
  • Create a new presenter for canonical provider state: rejected because the existing helper contract can be narrowed without adding a new presentation layer.

Decision: Update factories and Pest helpers before final schema removal and add contradiction and residual-guard coverage

Rationale: ProviderConnectionFactory and the shared Pest helper still force status = connected and health_status = ok when constructing default provider fixtures. Those helpers will recreate legacy coupling unless they change before the drop migration lands. The regression suite also needs explicit contradiction scenarios, such as disabled plus consent granted and disabled plus verification healthy, so lifecycle separation is proven after the cutover. A final residual guard should fail if provider-state runtime paths still reference removed columns.

Alternatives considered:

  • Let individual tests update as they fail: rejected because the helper layer would keep reintroducing removed fields across unrelated suites.
  • Rely only on migration failure to find remaining references: rejected because many references live in PHP helpers, Blade views, and query builders that can survive until runtime.