TenantAtlas/specs/137-platform-provider-identity/research.md
ahmido bab01f07a9 feat: standardize platform provider identity (#166)
## Summary
- standardize Microsoft provider connections around explicit platform vs dedicated identity modes
- centralize admin-consent URL and runtime identity resolution so platform flows no longer fall back to tenant-local credentials
- add migration classification, richer consent and verification state handling, dedicated override management, and focused regression coverage

## Validation
- focused repo test coverage was added across provider identity, onboarding, audit, policy, guard, and migration flows
- latest explicit passing run in the workspace: `vendor/bin/sail artisan test --compact tests/Feature/AdminConsentCallbackTest.php tests/Feature/Audit/ProviderConnectionConsentAuditTest.php`

## Notes
- branch includes the full Spec 137 artifact set under `specs/137-platform-provider-identity/`
- target base branch: `dev`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #166
2026-03-13 16:29:08 +00:00

79 lines
7.9 KiB
Markdown

# Research: Platform Provider Identity Standardization
## Decision 1: Use one centrally managed platform identity source
- **Decision**: Add a `PlatformProviderIdentityResolver` that resolves the Microsoft platform identity from centralized platform configuration, initially backed by `config('graph.client_id')`, `config('graph.client_secret')`, `config('graph.tenant_id')`, and canonical redirect metadata.
- **Rationale**: The repo already treats Graph credentials as central application configuration in [config/graph.php](config/graph.php). A dedicated resolver isolates secret storage details and gives both consent generation and runtime one canonical source.
- **Alternatives considered**:
- Keep resolving platform identity from tenant or provider credential rows: rejected because it recreates the hybrid model.
- Copy platform credentials into every `provider_connections` row: rejected because it breaks secret governance and rotation.
## Decision 2: Keep `connection_type` binary and model review-required separately
- **Decision**: Add `connection_type` with only `platform` and `dedicated`, and represent migration review state separately using migration metadata and audit events instead of a third runtime mode.
- **Rationale**: The spec requires dedicated to be an explicit exception, not a fallback, and requires `connection_type` to remain non-null. A third persisted runtime type would leak migration state into the long-term operating model.
- **Alternatives considered**:
- Add `review_required` as a third `connection_type`: rejected because it complicates runtime resolution and weakens the binary platform-versus-dedicated rule.
- Auto-classify every ambiguous legacy record to one of the two modes: rejected because it hides hybrid risk.
## Decision 3: Use string columns plus PHP-backed enums or constants, not a native DB enum
- **Decision**: Implement `connection_type`, `consent_status`, `verification_status`, `source`, and `credential_kind` as indexed string columns backed by application enums or canonical constants.
- **Rationale**: Existing provider connection status fields are strings, and this repo already favors string-backed states over database enum types. This keeps migrations simpler across PostgreSQL and SQLite test environments.
- **Alternatives considered**:
- Native PostgreSQL enum columns: rejected due to migration friction and poorer local test portability.
## Decision 4: Centralize consent URL generation behind a factory
- **Decision**: Add an `AdminConsentUrlFactory` that receives a provider connection, target tenant, and callback route metadata and returns the full canonical admin-consent URL.
- **Rationale**: Current consent generation is fragmented across [app/Http/Controllers/TenantOnboardingController.php](app/Http/Controllers/TenantOnboardingController.php), [app/Filament/Resources/TenantResource.php](app/Filament/Resources/TenantResource.php), and [app/Support/Links/RequiredPermissionsLinks.php](app/Support/Links/RequiredPermissionsLinks.php). A single factory removes client-ID drift.
- **Alternatives considered**:
- Keep `TenantResource::adminConsentUrl()` as the root implementation: rejected because it still reads legacy tenant fields first.
- Generate consent URLs directly in each controller or page: rejected because it recreates divergence.
## Decision 5: Split consent and verification into separate state machines
- **Decision**: Add a new `consent_status` field and a separate `verification_status` field while keeping legacy `status` and `health_status` only as backward-compatible summaries projected from the richer state.
- **Rationale**: Current code often treats `status=connected` as if consent and operational health are the same thing. The spec requires those to be independently visible.
- **Alternatives considered**:
- Reuse `status` alone for consent and readiness: rejected because it is the existing defect.
- Replace `status` immediately everywhere: rejected because too many current flows and tests rely on it.
## Decision 6: Runtime resolution branches strictly on `connection_type`
- **Decision**: Add a `ProviderIdentityResolver` that chooses one of two paths only:
- `platform` uses `PlatformProviderIdentityResolver`
- `dedicated` uses `ProviderCredential`
- **Rationale**: Current [app/Services/Providers/CredentialManager.php](app/Services/Providers/CredentialManager.php), [app/Services/Providers/ProviderGateway.php](app/Services/Providers/ProviderGateway.php), and [app/Services/Providers/ProviderConnectionResolver.php](app/Services/Providers/ProviderConnectionResolver.php) all assume dedicated credentials exist for every usable connection.
- **Alternatives considered**:
- Teach `CredentialManager` to silently fall back among multiple sources: rejected because the spec explicitly forbids silent fallback.
## Decision 7: Standard onboarding creates a platform connection first, then consent, then verification
- **Decision**: The standard wizard flow creates or updates a `platform` provider connection before consent, shows the platform app ID read-only, and never asks for client ID or client secret unless the operator intentionally enters dedicated override mode.
- **Rationale**: This aligns the user journey with the desired SaaS model and with the existing verification flow, which already starts from a selected provider connection.
- **Alternatives considered**:
- Keep “new connection” in the wizard as a credential-entry step: rejected because it preserves the legacy onboarding burden.
- Delay provider connection creation until after callback: rejected because the consent flow then has no explicit connection record to classify or audit.
## Decision 8: Migration classification is explicit and auditable
- **Decision**: Add a `ProviderConnectionClassifier` that classifies each existing Microsoft connection as platform, dedicated, or review-required using signals from tenant legacy fields, provider credential payload, and current consent/runtime behavior. Review-required outcomes emit audit events and keep the record out of silent automatic conversion.
- **Rationale**: Existing code and historical specs already show legacy drift, especially around tenant app fields and provider credential payloads. Safe standardization requires explicit visibility.
- **Alternatives considered**:
- Bulk-convert every current default connection to platform: rejected because some customers intentionally use dedicated app registrations.
- Bulk-convert every connection with a credential row to dedicated: rejected because some credential rows are just historical remnants.
## Decision 9: Reuse existing operation observability for verification
- **Decision**: Keep provider verification and health checks on the existing `provider.connection.check` `OperationRun` path, but update the blockers and report payloads to reflect consent-versus-verification separation and platform-versus-dedicated identity selection.
- **Rationale**: Existing jobs and verification reporting are already wired into the operations model. The required change is identity resolution and reason-code clarity, not a new operations subsystem.
- **Alternatives considered**:
- Move verification inline into the callback or detail page: rejected because remote Graph work belongs in the existing operation model.
## Decision 10: Guard against legacy fallback with dedicated tests and static scans
- **Decision**: Add regression guards for platform runtime and consent paths so CI fails if code starts reading `tenants.app_client_id`, `tenants.app_client_secret`, or `ProviderCredential.payload` as silent fallbacks for platform connections.
- **Rationale**: The repo already uses targeted architectural guards such as [tests/Feature/Guards/NoLegacyTenantGraphOptionsTest.php](tests/Feature/Guards/NoLegacyTenantGraphOptionsTest.php). This feature needs equivalent enforcement for the new identity standard.
- **Alternatives considered**:
- Rely on feature tests alone: rejected because a future fallback could appear in an uncaught call site.