TenantAtlas/specs/216-provider-dispatch-gate/research.md

# Research: Provider-Backed Action Preflight and Dispatch Gate Unification

## Decision 1: Extend the existing gate and registry; do not create a second provider-start framework

- Decision: Migrate every covered operator-triggered provider-backed start onto the existing `ProviderOperationStartGate` by expanding `ProviderOperationRegistry` and updating the current action hosts to call the same queue-admission path.
- Rationale: The repo already has the right hardening seam. `ProviderOperationStartGate` resolves connection readiness, blocks missing provider prerequisites before queue admission, dedupes same-operation starts, blocks conflicting operations on the same protected scope, and returns a shared result object. The problem is adoption breadth, not missing infrastructure.
- Alternatives considered:
  - Introduce a new `ProviderStartCoordinator` or second orchestration pipeline. Rejected because it would duplicate the gate, widen the change into architecture work, and violate FR-216-014.
  - Keep local `ensureRun*/dispatch` flows and copy the same preflight into each action. Rejected because that preserves semantic drift and repeats the same blocker logic across tenant, provider-connection, restore, directory, and onboarding surfaces.
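
The check sequence described in the rationale can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the repository's implementation: `StartGate`, `StartResult`, the status strings other than `started`, and the order of the checks are all assumptions made for the example. It also leaves out the blocked-run record covered in Decision 3.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StartResult:
    # Hypothetical shared result object; status is one of
    # "started", "deduped", "scope_busy", "blocked".
    status: str
    run_id: Optional[int] = None
    reason: Optional[str] = None


@dataclass
class Run:
    id: int
    operation: str
    scope: str          # protected scope, e.g. one provider connection
    active: bool = True


class StartGate:
    """Hypothetical stand-in for a single queue-admission path."""

    def __init__(self) -> None:
        self.runs: List[Run] = []
        self.queue: List[int] = []   # ids of runs admitted to the queue

    def start(self, operation: str, scope: str, connection_ready: bool) -> StartResult:
        # 1. Block before queue admission when a prerequisite is missing.
        if not connection_ready:
            return StartResult("blocked", reason="connection_not_ready")

        # 2. Dedupe same-operation starts; 3. refuse conflicting operations
        #    on the same protected scope.
        for run in self.runs:
            if run.active and run.scope == scope:
                if run.operation == operation:
                    return StartResult("deduped", run_id=run.id)
                return StartResult("scope_busy", run_id=run.id)

        # 4. Only now is work admitted to the queue.
        run = Run(id=len(self.runs) + 1, operation=operation, scope=scope)
        self.runs.append(run)
        self.queue.append(run.id)
        return StartResult("started", run_id=run.id)
```

Because every action host calls the same `start` method, the four outcomes cannot drift between surfaces, which is the point of extending the existing gate rather than copying preflight logic into each action.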

## Decision 2: Accepted work must pin `provider_connection_id` at dispatch time

- Decision: Every migrated accepted start resolves its provider connection before queue admission and persists the chosen `provider_connection_id` into `OperationRun.context` and job arguments.
- Rationale: Existing non-gate starts often resolve the default connection at job execution time. If the default connection changes between click time and job execution, the job silently runs against the new default (default drift). Dispatch-time pinning is already the correct pattern on the gated starts and is required by FR-216-006.
- Alternatives considered:
  - Continue resolving the default connection inside queued jobs. Rejected because the same operator action can execute against a different connection than the one implied at click time.
  - Store only a display label or provider name in context. Rejected because monitoring, dedupe, and run-detail explanation need the stable connection identity, not only presentation data.
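
To illustrate why pinning matters, here is a small sketch in Python. It is an illustration under stated assumptions: `Tenant`, `admit`, `execute`, and the dictionary-shaped job are hypothetical; only the `provider_connection_id` key and the idea of storing it in the run context and job arguments come from the decision above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Tenant:
    default_connection_id: int   # hypothetical: the tenant's current default


@dataclass
class OperationRun:
    context: Dict[str, int] = field(default_factory=dict)


def admit(tenant: Tenant, queue: List[dict]) -> OperationRun:
    # Resolve the connection once, at click time, and pin it in both the
    # run context and the job arguments.
    connection_id = tenant.default_connection_id
    run = OperationRun(context={"provider_connection_id": connection_id})
    queue.append({"run": run, "provider_connection_id": connection_id})
    return run


def execute(job: dict) -> int:
    # The job uses the pinned id and never re-resolves the tenant default.
    return job["provider_connection_id"]


tenant = Tenant(default_connection_id=1)
queue: List[dict] = []
run = admit(tenant, queue)

tenant.default_connection_id = 2          # default changes while the job waits
executed_against = execute(queue.pop(0))  # still the connection chosen at click time
```

Had `execute` read `tenant.default_connection_id` instead, the job would have run against connection 2, which is the drift the decision rules out.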

## Decision 3: Keep blocked starts as canonical prevented-from-starting truth, but never admit background work

- Decision: A blocked preflight continues to produce canonical blocked-start truth where the current gate already does so, but blocked starts never enqueue jobs and must remain distinguishable from accepted runs that later fail during execution.
- Rationale: FR-216-004 and FR-216-015 require the product to stop turning preventable prerequisite problems into ordinary execution failures. Preserving canonical blocked truth keeps auditability and resolution links intact while still ensuring no remote work was accepted.
- Alternatives considered:
  - Show only an ephemeral toast and create no run truth at all. Rejected because operators and support would lose the canonical blocked audit trail and linked next-step metadata.
  - Queue the work and let the job fail fast. Rejected because that is the failure mode this spec is correcting.
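
The distinction between a blocked start and an accepted run that later fails can be sketched like this. The record shape, the status strings, and the `reason_code` field are assumptions for illustration; the source only states that blocked truth is recorded where the current gate already does so, and that blocked starts never enqueue jobs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    operation: str
    status: str                        # hypothetical: "blocked", "queued", "failed"
    reason_code: Optional[str] = None  # hypothetical link to next-step metadata


def start(operation: str,
          missing_prerequisite: Optional[str],
          runs: List[RunRecord],
          queue: List[RunRecord]) -> RunRecord:
    if missing_prerequisite is not None:
        # Blocked: keep the audit record, but admit nothing to the queue.
        record = RunRecord(operation, "blocked", reason_code=missing_prerequisite)
        runs.append(record)
        return record

    # Accepted: record the run and admit it to the queue.
    record = RunRecord(operation, "queued")
    runs.append(record)
    queue.append(record)
    return record


def fail_during_execution(record: RunRecord, reason_code: str) -> None:
    # Only an accepted run can end up here; a blocked record never does.
    record.status = "failed"
    record.reason_code = reason_code
```

In this sketch the two outcomes stay distinguishable in the run history: a `blocked` record was never in the queue, while a `failed` record was accepted first.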

## Decision 4: Standardize operator feedback through one thin presentation helper layered over the existing Ops UX stack

- Decision: Add one narrow start-result presentation helper that consumes `ProviderOperationStartResult` and composes the existing `OperationUxPresenter`, `ReasonPresenter`, `ProviderNextStepsRegistry`, and `OperationRunLinks` building blocks.
- Rationale: The current gated surfaces already prove that the shared start result is viable, but they still duplicate local `if/else` notification code across tenant, widget, provider-connection, and onboarding surfaces. A thin presenter absorbs that duplication without introducing a new UI semantics framework.
- Alternatives considered:
  - Leave each surface with its own `Notification::make()` branching. Rejected because it fails FR-216-008 and FR-216-009 and guarantees future copy drift.
  - Invent a larger badge/explanation framework for provider starts. Rejected because the repo constitution explicitly discourages turning UI semantics into their own mandatory architecture.
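
As a rough sketch of what such a thin helper does, the following Python function maps one start result to one notification description. Everything here is hypothetical: the `StartResult` and `Notice` shapes, the status strings other than `started`, and the notification copy are invented for the example and do not reflect the existing `OperationUxPresenter`, `ReasonPresenter`, `ProviderNextStepsRegistry`, or `OperationRunLinks` APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartResult:
    status: str                      # "started", "deduped", "scope_busy", "blocked"
    run_url: Optional[str] = None    # link to the run detail, if a run exists
    reason: Optional[str] = None     # operator-readable blocker explanation
    next_step: Optional[str] = None  # link to the suggested resolution


@dataclass
class Notice:
    title: str
    level: str
    body: Optional[str] = None
    link: Optional[str] = None


def present(result: StartResult) -> Notice:
    """Single place where a start result becomes operator feedback."""
    if result.status == "started":
        return Notice("Operation accepted", "success", link=result.run_url)
    if result.status == "deduped":
        return Notice("Operation already running", "info", link=result.run_url)
    if result.status == "scope_busy":
        return Notice("Another operation is running for this scope", "warning",
                      link=result.run_url)
    if result.status == "blocked":
        return Notice("Operation blocked", "danger",
                      body=result.reason, link=result.next_step)
    raise ValueError(f"Unknown start status: {result.status}")
```

Each surface would then call `present(result)` instead of carrying its own branching, so copy changes happen in one place.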

## Decision 5: The first slice is bounded by the spec routes, not by every possible provider-backed job in the codebase

- Decision: The first implementation slice covers every current operator-triggered provider-backed start reachable from tenant-scoped surfaces, provider-connection surfaces, and onboarding: tenant verification, provider-connection check/inventory/compliance actions, restore execute, directory groups sync, role definitions sync, onboarding verification, and onboarding bootstrap.
- Rationale: FR-216-012 defines the route-bounded first slice. Read-only exploration also surfaced workspace-level baseline/evidence/review generators and other background operations, but those lie outside the spec's primary routes and would expand this hardening feature into adjacent workflow areas.
- Alternatives considered:
  - Expand the first slice to every provider-backed operation in the entire repo. Rejected because it would overshoot the spec and slow delivery.
  - Limit the slice to restore only. Rejected because the same operator pain already exists across onboarding, directory sync, and provider-connection action hosts.

## Decision 6: Keep current write-time operation type strings in this feature; do not combine hardening with operation-type normalization

- Decision: Migrated starts keep their current write-time operation type strings, even where `OperationCatalog` exposes newer aliases or canonical dotted names.
- Rationale: The operator problem here is queue admission and start-result consistency, not operation-type taxonomy cleanup. Renaming start types would widen the blast radius into monitoring, audit, test fixtures, and historical read-model expectations.
- Alternatives considered:
  - Rename legacy operation types such as `entra_group_sync` or `directory_role_definitions.sync` during the gate migration. Rejected because that is a separate normalization concern and not required to deliver block-before-queue semantics.
  - Add a new translation layer inside the gate just for this feature. Rejected because that adds semantic machinery without solving the core operator problem.

## Decision 7: Onboarding bootstrap must normalize to sequential protected-scope admission

- Decision: The onboarding wizard can no longer admit multiple provider-backed operations concurrently for the same provider connection. The existing wizard flow remains, but queue admission becomes sequential: one accepted provider-backed run per protected scope, with remaining selected bootstrap work retained as follow-up state.
- Rationale: The current bootstrap implementation explicitly starts more than one provider-backed run for the same connection if no run is active at the beginning of the transaction. That conflicts with FR-216-007 and SC-216-003, which require click-time conflict protection and at most one accepted provider-backed operation per protected scope.
- Alternatives considered:
  - Keep the existing batch-start bypass for onboarding only. Rejected because it would leave a permanent exception to the canonical start contract inside the first slice.
  - Introduce a new umbrella bootstrap orchestration entity. Rejected because it would create a second start framework and unnecessary new semantics.
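
Sequential admission can be sketched as a function that admits at most one selected operation and returns the rest as follow-up state. This is a simplified illustration: `admit_bootstrap`, its arguments, and the operation names in the usage lines are hypothetical, and how the wizard persists the follow-up state is not specified here.

```python
from typing import List, Optional, Tuple


def admit_bootstrap(selected: List[str],
                    scope_busy: bool) -> Tuple[Optional[str], List[str]]:
    """Admit at most one operation for the protected scope.

    Returns (admitted_operation, remaining_follow_up). When the scope is
    already busy, nothing is admitted and every selection is retained.
    """
    if scope_busy or not selected:
        return None, list(selected)
    return selected[0], list(selected[1:])


# Hypothetical bootstrap selection for one provider connection.
admitted, follow_up = admit_bootstrap(["op_a", "op_b", "op_c"], scope_busy=False)

# After the first run finishes, the next admission picks up the follow-up state.
admitted_next, follow_up_next = admit_bootstrap(follow_up, scope_busy=False)
```

The batch-start behaviour the rationale describes would correspond to admitting every entry in `selected` at once, which is what the one-accepted-run-per-scope rule forbids.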

## Decision 8: The operator contract uses accepted/deduped/scope-busy/blocked vocabulary, even if the internal result object keeps `started` for compatibility

- Decision: The shared operator-facing contract standardizes on `accepted`, `deduped`, `scope busy`, and `blocked`, while the internal `ProviderOperationStartResult` can retain its current `started` variant in this slice if that avoids unnecessary churn.
- Rationale: FR-216-002 is about operator-visible start outcomes. The narrowest implementation is to keep internal compatibility where it helps while converging every visible surface and logical contract on the same operator vocabulary.
- Alternatives considered:
  - Rename every internal `started` code path immediately. Rejected because it widens refactor scope without increasing product certainty.
  - Keep operator-facing copy inconsistent with the spec language. Rejected because that would fail the core purpose of the feature.
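
One way to keep the internal `started` variant while converging visible vocabulary is a single translation table at the presentation boundary. The sketch below is hypothetical: only `started` is named in the source as an internal value, so the other internal keys are assumptions.

```python
# Internal status (left) to operator-facing term (right).
# Only "started" is confirmed as an internal value; the other keys are assumed.
OPERATOR_VOCABULARY = {
    "started": "accepted",
    "deduped": "deduped",
    "scope_busy": "scope busy",
    "blocked": "blocked",
}


def operator_term(internal_status: str) -> str:
    # Fail loudly on unknown statuses so new variants cannot leak
    # untranslated strings to operators.
    try:
        return OPERATOR_VOCABULARY[internal_status]
    except KeyError:
        raise ValueError(f"No operator term for status: {internal_status}") from None
```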

## Decision 9: Testing should stay feature-first with one supporting unit seam

- Decision: Reuse and extend the existing `ProviderOperationStartGateTest` unit suite, then prove the feature through focused feature coverage on real Filament action hosts and canonical run-detail reason alignment.
- Rationale: The business truth is server-side authorization, preflight, dedupe, scope-busy handling, queue admission, and cross-surface reason consistency. Those are best proven with targeted feature tests against current action hosts, not with browser-heavy coverage or a new dedicated presenter harness.
- Alternatives considered:
  - Rely mainly on browser tests. Rejected because the critical behavior is server-owned and already easier to prove through existing resource/page test families.
  - Create a large presenter-only test harness. Rejected because it would shift effort from the real action hosts to indirection created only for the test suite.