# Feature Specification: Provider Foundation v1 (Microsoft-first, Security-first)

**Feature Branch**: `061-provider-foundation`  
**Created**: 2026-01-23  
**Status**: Draft  
**Input**: Build a provider integration foundation (starting with Microsoft) that centralizes provider communication, enables safe tenant-scoped connections, runs operations in the background with a tracked run record, prevents overlapping runs per provider tenant by default, and prevents credential leakage.

## Clarifications

### Session 2026-01-23

- Q: When an admin starts a provider operation for a target scope that already has an active run, what should the system do by default? → A: Dedupe: reuse/return the active run (no new run created).
- Q: For Microsoft in v1, what should the admin enter (and we store) as the canonical target scope identifier for a connection? → A: Entra tenant ID (GUID); domains may be stored only as a display label, not as the canonical identifier.
- Q: In v1, should a tenant be able to have multiple Microsoft provider connections, or exactly one? → A: Multiple connections are allowed, but one “default” connection is required; it is used for operations unless another connection is explicitly selected.
- Q: In v1, which tenant roles should be allowed to (a) manage provider connections/credentials and (b) start provider operations (health check, inventory, compliance)? → A: Owner/Manager manage connections and credentials; Owner/Manager/Operator can start operations; Readonly is view-only.
- Q: If a provider run is already active for a target scope, and a user tries to start a different provider operation type, what should happen by default? → A: Block (“scope busy”) and link to the active run.
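
The role decisions above amount to a deny-by-default permission table. The following is a minimal Python sketch of that table, not the implementation; the `Role` and `Action` enums, the `PERMISSIONS` mapping, and `is_allowed` are hypothetical names.

```python
from enum import Enum


class Role(Enum):
    OWNER = "owner"
    MANAGER = "manager"
    OPERATOR = "operator"
    READONLY = "readonly"


class Action(Enum):
    VIEW = "view"
    # Create/update/disable connections and attach or rotate credentials.
    MANAGE_CONNECTIONS = "manage_connections"
    # Health check, inventory collection, compliance snapshot.
    START_OPERATIONS = "start_operations"


# Hypothetical permission table mirroring the clarified role decisions.
PERMISSIONS = {
    Role.OWNER: frozenset(Action),
    Role.MANAGER: frozenset(Action),
    Role.OPERATOR: frozenset({Action.VIEW, Action.START_OPERATIONS}),
    Role.READONLY: frozenset({Action.VIEW}),
}


def is_allowed(role: Role, action: Action) -> bool:
    """Deny by default: a role missing from the table gets no permissions."""
    return action in PERMISSIONS.get(role, frozenset())
```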

## User Scenarios & Testing *(mandatory)*

### User Story 1 - Set up a provider connection safely (Priority: P1)

An Owner or Manager can create and manage a provider connection for a tenant, attach credentials, and see the connection’s current state without ever exposing secrets.

**Why this priority**: All provider-backed capabilities depend on a secure, tenant-scoped connection and a safe way to manage credentials.

**Independent Test**: An Owner/Manager can create a connection, attach credentials, and later view/manage the connection without any secret value being displayed or leaked in the UI.

**Acceptance Scenarios**:

1. **Given** a tenant with no provider connections, **When** an Owner/Manager creates a new Microsoft provider connection with a display name and a provider tenant identifier (Entra tenant ID), **Then** the connection is saved and is unique within the tenant by provider type + provider tenant identifier.
2. **Given** an existing provider connection with credentials attached, **When** an Owner/Manager views the connection details, **Then** secret values are never displayed and only safe metadata is shown.
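
A minimal sketch of how the canonical identifier and the uniqueness key could be derived, assuming the identifier is validated and normalized before the connection is saved. `canonical_tenant_id`, `ConnectionKey`, and `connection_key` are hypothetical; only `uuid.UUID` from the Python standard library is real.

```python
import uuid
from dataclasses import dataclass


def canonical_tenant_id(raw: str) -> str:
    """Normalize an Entra tenant ID to the lowercase, hyphenated GUID form.

    uuid.UUID raises ValueError for anything that is not a GUID, so a domain
    such as "contoso.onmicrosoft.com" can never become the canonical scope.
    """
    return str(uuid.UUID(raw.strip()))


@dataclass(frozen=True)
class ConnectionKey:
    """Uniqueness key: provider type + provider tenant identifier, per tenant."""

    tenant_id: int
    provider: str
    target_scope: str


def connection_key(tenant_id: int, provider: str, raw_scope: str) -> ConnectionKey:
    # Normalizing first means upper- and lowercase spellings map to one key.
    return ConnectionKey(tenant_id, provider, canonical_tenant_id(raw_scope))
```

Storing only the normalized form is what makes the uniqueness check reliable; a database unique index over the same three fields would enforce it at write time.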

---

### User Story 2 - Verify connection health without blocking the UI (Priority: P2)

An Owner, Manager, or Operator can trigger a connection health check and see a tracked run with a clear outcome, including a safe, categorized error when the check fails.

**Why this priority**: Admins need a reliable way to confirm connectivity/permissions and to troubleshoot failures without guesswork.

**Independent Test**: Triggering “Check connection” creates a new run, completes asynchronously, updates the connection’s health state, and records a stable failure category when applicable.

**Acceptance Scenarios**:

1. **Given** a configured provider connection, **When** an Owner/Manager/Operator runs a health check, **Then** the system creates a visible operation run and updates the connection’s health status when the run completes.
2. **Given** invalid or revoked credentials, **When** an Owner/Manager/Operator runs a health check, **Then** the run ends in a failure state with a stable reason code and a short, sanitized message (no secrets or raw payloads).
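
The failure behavior above implies a mapping from raw provider errors to a small set of stable reason codes with pre-written messages. The sketch assumes the gateway hands this step only the HTTP status and the OAuth `error` value; the reason-code strings and the status/error-to-category mapping are illustrative assumptions that would need to be checked against the responses Microsoft actually returns.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReasonCode(str, Enum):
    # Illustrative codes only; the spec does not fix the actual list.
    INVALID_CREDENTIALS = "auth.invalid_credentials"
    CONSENT_REQUIRED = "auth.consent_required"
    PERMISSION_DENIED = "auth.permission_denied"
    THROTTLED = "provider.throttled"
    UNAVAILABLE = "provider.unavailable"
    UNKNOWN = "unknown"


# Canned messages: nothing from the provider response body is ever echoed.
SAFE_MESSAGES = {
    ReasonCode.INVALID_CREDENTIALS: "The provider rejected the stored credentials.",
    ReasonCode.CONSENT_REQUIRED: "Admin consent is missing or was revoked.",
    ReasonCode.PERMISSION_DENIED: "The connection lacks a required permission.",
    ReasonCode.THROTTLED: "The provider is throttling requests.",
    ReasonCode.UNAVAILABLE: "The provider is temporarily unavailable.",
    ReasonCode.UNKNOWN: "The provider call failed for an unrecognized reason.",
}


@dataclass(frozen=True)
class RunFailure:
    reason_code: ReasonCode
    message: str


def categorize(http_status: int, oauth_error: Optional[str] = None) -> RunFailure:
    """Map a failure to a stable code using only the status and OAuth error code."""
    if oauth_error == "consent_required":
        code = ReasonCode.CONSENT_REQUIRED
    elif oauth_error == "invalid_client" or http_status == 401:
        code = ReasonCode.INVALID_CREDENTIALS
    elif http_status == 403:
        code = ReasonCode.PERMISSION_DENIED
    elif http_status == 429:
        code = ReasonCode.THROTTLED
    elif http_status in (502, 503, 504):
        code = ReasonCode.UNAVAILABLE
    else:
        code = ReasonCode.UNKNOWN
    return RunFailure(code, SAFE_MESSAGES[code])
```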

---

### User Story 3 - Run provider operations with safety and observability (Priority: P3)

An Owner, Manager, or Operator can run provider-backed operations (such as inventory collection and compliance snapshots) that are tenant-scoped, safe by default, limited to one active run per provider tenant (by default), and fully tracked through Operations monitoring.

**Why this priority**: Provider operations can be long-running and failure-prone; they must be safe by default and easy to audit and troubleshoot.

**Independent Test**: Starting a provider operation results in a single observable run, respects the per-scope concurrency limit, and surfaces summary counts and categorized failures.

**Acceptance Scenarios**:

1. **Given** a valid provider connection, **When** an Owner/Manager/Operator initiates an inventory collection run, **Then** the run is queued/executed asynchronously and appears in Operations monitoring with provider + scope context.
2. **Given** an active run for a target scope, **When** an Owner/Manager/Operator starts the same operation type for that scope again, **Then** the system returns the active run (no new run is created) and tells the user that the existing run was reused.
3. **Given** an active run for a target scope, **When** an Owner/Manager/Operator starts a different provider operation type for that scope, **Then** the system blocks the request (“scope busy”) and links to the active run.
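
Scenarios 2 and 3 describe a three-way decision whenever a run is started: create, reuse, or block. A minimal in-memory sketch, assuming runs are keyed by provider + target scope; `RunLedger`, `StartOutcome`, and the other names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class StartOutcome(Enum):
    CREATED = "created"        # no active run: a new run was queued
    REUSED = "reused"          # same operation already active: return it
    SCOPE_BUSY = "scope_busy"  # different operation active: block and link


@dataclass
class Run:
    run_id: int
    provider: str
    target_scope: str
    operation: str


@dataclass(frozen=True)
class StartResult:
    outcome: StartOutcome
    run: Run  # the new run, or the active run to link to


class RunLedger:
    """In-memory stand-in for the run store (one active run per scope)."""

    def __init__(self) -> None:
        self._active: dict[tuple[str, str], Run] = {}
        self._next_id = 1

    def start(self, provider: str, target_scope: str, operation: str) -> StartResult:
        key = (provider, target_scope)
        active = self._active.get(key)
        if active is not None:
            if active.operation == operation:
                return StartResult(StartOutcome.REUSED, active)
            return StartResult(StartOutcome.SCOPE_BUSY, active)
        run = Run(self._next_id, provider, target_scope, operation)
        self._next_id += 1
        self._active[key] = run
        return StartResult(StartOutcome.CREATED, run)

    def finish(self, run: Run) -> None:
        key = (run.provider, run.target_scope)
        # Only release the scope if this run is the one holding it.
        if self._active.get(key) is run:
            del self._active[key]
```

In a real system the lookup and insert would need to be atomic (for example, a unique constraint on active runs per scope, or a row lock); otherwise two concurrent requests could both see no active run and both create one.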

### Edge Cases

- The provider tenant identifier entered does not match the tenant of the attached credentials: the system prevents the unsafe configuration and guides the admin to correct it.
- Provider access is revoked or consent changes: health checks and operations fail with a clear, stable reason code and a safe message.
- Provider throttling/transient outages: operations behave predictably, remain tracked as a single run, and report a clear outcome without creating duplicate runs or repeated failure notifications.
- Provider service downtime: runs fail safely and do not cause repeated background failures.
- An admin views Provider Connections or Operations pages: pages render using stored data only and never trigger provider calls during render/poll.
- A user attempts to start a different operation while a run is active for the same scope: the system blocks with “scope busy” and links to the active run.
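
For the throttling and outage cases, one way to keep everything inside a single run is to retry within that run, with a bounded number of attempts, and to fail the run once attempts are exhausted. The sketch assumes the gateway raises a hypothetical `ProviderHttpError` carrying the HTTP status and an already-parsed `Retry-After` value in seconds; the retryable status set and the delay limits are assumptions.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Assumed set of statuses worth retrying: throttling and transient gateway errors.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


class ProviderHttpError(Exception):
    """Hypothetical gateway error carrying the status and a parsed Retry-After."""

    def __init__(self, status: int, retry_after: Optional[float] = None) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status
        self.retry_after = retry_after


def call_with_retries(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry throttled/transient failures inside one run, then give up.

    Non-retryable errors, and the last failed attempt, propagate to the caller,
    which records one failed run instead of scheduling new ones.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ProviderHttpError as err:
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            if err.retry_after is not None:
                # Honor the provider's hint, capped so one run cannot stall forever.
                delay = min(err.retry_after, max_delay)
            else:
                # Exponential backoff with full jitter.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** (attempt - 1)))
            sleep(delay)
    raise AssertionError("unreachable")
```

Because the final failure propagates instead of re-queuing the job, a provider outage produces one failed run with a stable reason code rather than a stream of new background failures.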

## Requirements *(mandatory)*

**Constitution alignment:** This feature introduces tenant-scoped external-provider operations. It must enforce tenant isolation, safe run observability, and sanitized failure handling, and it must include automated tests for these guarantees.

### Scope

**In scope (v1)**

- Provider connections for Microsoft as the first supported provider.
- Multiple Microsoft connections per tenant are supported; one connection is marked as the default for operations unless the user explicitly selects another connection.
- Secure credential attachment/rotation without exposing secret values.
- A single controlled outbound-provider communication path (gateway) used by all provider-backed operations.
- Observable asynchronous operations (operation runs) for health checks and provider-backed data collection.
- Central concurrency limiting per provider tenant identifier (default: 1 concurrent run per scope).
- A minimal set of provider capabilities: inventory collection and compliance snapshot (counts).

**Out of scope (v1)**

- User-delegated sign-in flows.
- Certificate-based credentials and external secret managers.
- Cross-tenant “global MSP” dashboards.
- Provider-backed remediation/script execution and evidence-collection suites.

### Functional Requirements

- **FR-001**: System MUST store provider connections as tenant-scoped records, including provider type, a canonical provider tenant identifier (the target scope; for Microsoft in v1, the Entra tenant ID as a GUID), display name, and connection state (e.g., connected / needs consent / error / disabled).
- **FR-002**: System MUST enforce uniqueness of provider connections within a tenant by provider type + provider tenant identifier.
- **FR-003**: System MUST store provider credentials separately from provider connection identity and MUST support credential rotation without changing the connection identity.
- **FR-004**: System MUST never display stored secret values after they are submitted, and MUST NOT expose secrets in UI, notifications, operation runs, or logs.
- **FR-005**: System MUST categorize provider failures using stable reason codes and short, sanitized messages suitable for audit and support triage.
- **FR-006**: System MUST route all outbound provider communication through a single controlled gateway that applies consistent authentication, attaches tracking identifiers for support, and handles throttling and transient failures safely.
- **FR-007**: System MUST ensure that viewing admin pages (including Provider Connections and Operations monitoring) never triggers outbound provider calls during page render/poll.
- **FR-008**: System MUST execute provider operations asynchronously when they involve provider communication or may exceed normal UI response times.
- **FR-009**: System MUST create or reuse a canonical operation run for each provider operation and MUST record provider, target scope, module/capability, timestamps, and outcome for Monitoring → Operations.
- **FR-010**: System MUST enforce a central per-scope concurrency limit for provider operations (default: 1 concurrent run per provider tenant identifier). If a run is already active for a scope: (a) re-starting the same operation type MUST return the active run (no new run created), and (b) starting a different operation type MUST be blocked with “scope busy” and a link to the active run.
- **FR-011**: System MUST provide a “Check connection” operation that updates connection health state based on the result and records the outcome as an operation run.
- **FR-012**: System MUST define provider capability interfaces that allow adding new provider-backed modules (inventory, compliance, directory, scripts) without scattering provider-specific logic across unrelated features.
- **FR-013**: System MUST ship at least one Microsoft provider implementation that supports (a) inventory collection and (b) compliance snapshot runs, producing stored results and summary counts.
- **FR-014**: Security-relevant configuration changes (creating/updating connections and credentials, disabling connections) MUST be recorded in an audit trail with actor + tenant + timestamp.
- **FR-015**: System MUST maintain a centralized, reviewed registry of allowed provider operations, so new provider calls cannot be introduced ad-hoc outside the approved integration path.
- **FR-016**: System MUST allow multiple Microsoft provider connections per tenant (each distinguished by its Entra tenant ID) and, whenever at least one exists, MUST require exactly one default Microsoft connection, which is used when a provider operation is started without an explicit connection selection.
- **FR-017**: System MUST enforce tenant-role based access: Owner/Manager can create/update/disable provider connections and manage credentials; Owner/Manager/Operator can start provider operations; Readonly is view-only and cannot start operations.
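
FR-015's reviewed registry can be as simple as an immutable table that the gateway consults before any outbound call, so adding a provider call means changing this table through code review. A minimal sketch; the operation keys, `OperationSpec`, and `require_registered` are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class OperationSpec:
    key: str
    capability: str  # module the operation belongs to


class UnregisteredOperationError(Exception):
    """Raised when code asks the gateway for an operation nobody reviewed."""


# Read-only view: entries change only by editing this reviewed table.
REGISTRY = MappingProxyType({
    spec.key: spec
    for spec in (
        OperationSpec("microsoft.health_check", "health"),
        OperationSpec("microsoft.inventory.collect", "inventory"),
        OperationSpec("microsoft.compliance.snapshot", "compliance"),
    )
})


def require_registered(key: str) -> OperationSpec:
    """Called by the gateway before any outbound request."""
    spec = REGISTRY.get(key)
    if spec is None:
        raise UnregisteredOperationError(key)
    return spec
```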

### Acceptance Criteria

- Connection setup is tenant-scoped, unique per provider + scope, uses Entra tenant ID (GUID) as the canonical scope identifier, and does not expose secrets.
- Health checks and provider operations run in the background and are tracked as operation runs with clear outcomes.
- Provider operations do not overlap for the same target scope by default.
- When a run is already in progress for a target scope, starting the same operation returns the active run (no new run created).
- When a run is already in progress for a target scope, starting a different operation is blocked with “scope busy” and links to the active run.
- When multiple Microsoft connections exist for a tenant, one is designated as the default; starting an operation without selecting a connection uses the default.
- Only Owner/Manager can manage connections and credentials; Owner/Manager/Operator can start operations; Readonly can only view.
- Monitoring/Connections pages are “read-only at render time” and do not call providers while loading or polling.
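
One way to make “read-only at render time” testable is to mark the render/poll path and have the gateway refuse to send while that mark is set. The sketch uses Python's standard `contextvars` module; `render_context` and `gateway_send` are hypothetical stand-ins for the real render pipeline and gateway.

```python
import contextvars
from contextlib import contextmanager
from typing import Iterator

# True while a page render or poll is in progress in the current context.
_rendering = contextvars.ContextVar("rendering", default=False)


class OutboundCallDuringRender(RuntimeError):
    """Raised if provider communication is attempted while rendering."""


@contextmanager
def render_context() -> Iterator[None]:
    token = _rendering.set(True)
    try:
        yield
    finally:
        # Restore the previous value even if rendering raised.
        _rendering.reset(token)


def gateway_send(operation_key: str) -> str:
    """Hypothetical gateway entry point; a real one would call the provider."""
    if _rendering.get():
        raise OutboundCallDuringRender(operation_key)
    return f"sent {operation_key}"
```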

### Key Entities *(include if feature involves data)*

- **Provider Connection**: A tenant-scoped representation of a relationship to an external provider, including target scope identifier, display name, state, and health indicators.
- **Provider Credential**: A securely stored credential set linked to exactly one provider connection, used only for provider communication.
- **Target Scope**: A stable identifier that defines which external tenant/environment a provider operation targets (for Microsoft in v1, the Entra tenant ID as a GUID); domains may be stored only as display labels.
- **Default Provider Connection**: The tenant’s selected default connection for a provider, used when starting provider operations without an explicit connection selection.
- **Operation Run**: A canonical record of an initiated provider operation, including initiator, scope, timestamps, outcome, summary counts, and categorized failures.
- **Provider Capability (Module)**: A named area of functionality (e.g., inventory, compliance) that can be implemented per provider and executed as an operation run.
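
The entities above could map onto records like these. A sketch only: the field names are assumptions, a client-secret credential is assumed because certificate-based credentials are out of scope for v1, and a real system would encrypt the secret at rest rather than hold it in a plain field.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ProviderConnection:
    id: int
    tenant_id: int
    provider: str                       # "microsoft" in v1
    target_scope: str                   # canonical Entra tenant ID (GUID)
    display_name: str
    state: str                          # connected / needs_consent / error / disabled
    is_default: bool = False
    domain_label: Optional[str] = None  # display only, never the scope


@dataclass
class ProviderCredential:
    id: int
    connection_id: int                  # belongs to exactly one connection
    client_id: str
    # repr=False keeps the secret out of repr()/str(), e.g. when the object is logged.
    client_secret: str = field(repr=False)
    rotated_at: Optional[datetime] = None


@dataclass
class OperationRun:
    id: int
    tenant_id: int
    initiator_user_id: int
    provider: str
    target_scope: str
    capability: str                     # e.g. "health", "inventory", "compliance"
    status: str                         # e.g. queued / running / succeeded / failed
    started_at: datetime
    finished_at: Optional[datetime] = None
    summary_counts: dict[str, int] = field(default_factory=dict)
    reason_code: Optional[str] = None   # stable code on failure
    safe_message: Optional[str] = None  # sanitized, never raw provider output
```

Keeping `ProviderCredential` in its own record is what lets a rotation replace `client_secret` and `rotated_at` without touching the connection's identity.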

## Success Criteria *(mandatory)*

### Measurable Outcomes

- **SC-001**: A tenant admin can create a provider connection and attach credentials in under 5 minutes without support intervention.
- **SC-002**: 100% of provider-backed operations initiated by users create an operation run visible in Operations monitoring within 5 seconds of initiation.
- **SC-003**: For a given target scope, the system prevents overlapping provider operations by default (max 1 concurrent run per scope), with clear user feedback.
- **SC-004**: Automated tests demonstrate zero secret leakage in UI surfaces and operation run messages (no tokens, secrets, or PII are displayed on, or persisted to, these surfaces).
- **SC-005**: 95% of connection health checks complete and update connection health status within 2 minutes under normal provider conditions.
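
SC-004's leak check can be automated by capturing every surface a test touches and searching it for the secret values the test planted. A minimal helper with hypothetical names; it only catches verbatim copies, not encoded or partially masked secrets.

```python
from typing import Iterable, Mapping


def find_secret_leaks(surfaces: Mapping[str, str], secrets: Iterable[str]) -> list[str]:
    """Return the names of captured surfaces that contain any planted secret.

    `surfaces` maps a label (e.g. a page, notification, run message, or log
    capture) to the text a test collected from it.
    """
    needles = [s for s in secrets if s]  # an empty string would match everything
    return [name for name, text in surfaces.items() if any(n in text for n in needles)]
```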

## Assumptions

- v1 targets a single provider (Microsoft) but must be designed to add additional providers and modules later without reworking existing features.
- Provider operations are initiated by authorized administrators and must respect tenant scoping and audit requirements.
- Monitoring and “read-only” admin pages prioritize predictable load and safety over real-time provider querying.

## Dependencies

- A tenant model and authorization boundaries exist so provider connections and runs can be scoped correctly.
- Operations monitoring and an audit trail mechanism exist (or are introduced alongside this feature) to record runs and security-relevant configuration changes.