TenantAtlas/specs/108-provider-access-hardening/spec.md
ahmido 0dc79520a4 feat: provider access hardening (RBAC write gate) (#132)
Implements provider access hardening for Intune write operations:

- RBAC-based write gate with configurable staleness thresholds
- Gate enforced at restore start and in jobs (execute + assignments)
- UI affordances: disabled rerun action, tenant RBAC status card, refresh RBAC action
- Audit logging for blocked writes
- Ops UX label: `rbac.health_check` now displays as “RBAC health check”
- Adds/updates Pest tests and SpecKit artifacts for feature 108

Notes:
- Filament v5 / Livewire v4 compliant.
- Destructive actions require confirmation.
- Assets: no new global assets.

Tested:
- `vendor/bin/sail artisan test --compact` (suite previously green) + focused OpsUx tests for OperationCatalog labels.
- `vendor/bin/sail bin pint --dirty`.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #132
2026-02-23 00:49:37 +00:00

Feature Specification: Provider Access Hardening v1 — Write-Path RBAC Gate (Intune)

Feature Branch: 108-provider-access-hardening
Created: 2026-02-22
Status: Draft
Input: Server-side write gate for Intune operations requiring RBAC hardening to be configured and healthy before any Graph mutations can execute.

Spec Scope Fields (mandatory)

  • Scope: tenant
  • Primary Routes: Tenant View page (RBAC card), Restore Run start actions, any Intune write-trigger actions
  • Data Ownership: tenant-owned (tenants.rbac_status, tenants.rbac_last_checked_at, operation_runs)
  • RBAC: workspace membership required + tenant-context access; write operations additionally gated by Intune RBAC hardening status (DB-persisted)

User Scenarios & Testing (mandatory)

User Story 1 — Write Operations Blocked When RBAC Not Configured (Priority: P1)

An operator attempts to restore an Intune policy (or restore assignments) on a tenant where Intune RBAC hardening has not been configured. The system blocks the operation at the server level before any Graph write occurs. The operator receives a clear explanation and a call-to-action directing them to configure Intune RBAC.

Why this priority: This is the core safety gate. Without it, write operations can execute with full app-only permissions and no blast-radius control — the primary compliance and trust risk this feature addresses.

Independent Test: Can be fully tested by attempting a restore start on a tenant with rbac_status = null and verifying the operation is blocked with the correct reason code.

Acceptance Scenarios:

  1. Given a tenant with rbac_status not set (null), When an operator triggers a restore action, Then the system blocks the operation, does not enqueue any job, and returns a reason code intune_rbac.not_configured with a human-readable message.
  2. Given a tenant with rbac_status = not_configured, When an operator triggers a restore assignments action, Then the system blocks the operation with reason code intune_rbac.not_configured and provides a CTA to "Setup Intune RBAC".
  3. Given a tenant with rbac_status = ok and a fresh rbac_last_checked_at, When an operator triggers a restore action, Then the operation proceeds normally without any gate interference.

User Story 2 — Write Operations Blocked When RBAC Unhealthy or Stale (Priority: P1)

An operator attempts a write operation on a tenant where RBAC hardening was previously configured but is now in a degraded, failed, or stale state. The system blocks the operation and explains why, offering relevant recovery actions.

Why this priority: As critical as US1: a configured-but-broken RBAC state is arguably more dangerous because operators may assume it is safe.

Independent Test: Can be tested by setting rbac_status = degraded or rbac_last_checked_at to a date older than the freshness threshold, then attempting a write operation.

Acceptance Scenarios:

  1. Given a tenant with rbac_status = degraded, When a write operation is attempted, Then the system blocks it with reason code intune_rbac.unhealthy and a CTA to "Run health check".
  2. Given a tenant with rbac_status = failed, When a write operation is attempted, Then the system blocks it with reason code intune_rbac.unhealthy.
  3. Given a tenant with rbac_status = ok but rbac_last_checked_at older than the configured freshness threshold, When a write operation is attempted, Then the system blocks it with reason code intune_rbac.stale and a CTA to "Run health check".
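
The decision logic behind the scenarios in User Stories 1 and 2 can be sketched as a pure function over persisted state. This is a minimal illustration in Python, not the project's implementation: the function name, signature, and the fail-closed handling of unknown statuses and of a missing rbac_last_checked_at are assumptions; the status values and reason codes come from the scenarios above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def evaluate_write_gate(
    rbac_status: Optional[str],
    last_checked_at: Optional[datetime],
    now: datetime,
    freshness_hours: int = 24,
) -> Optional[str]:
    """Return None when the write may proceed, else a stable reason code."""
    if rbac_status is None or rbac_status == "not_configured":
        return "intune_rbac.not_configured"
    if rbac_status in ("degraded", "failed"):
        return "intune_rbac.unhealthy"
    if rbac_status != "ok":
        # Assumption: any status the spec does not name fails closed.
        return "intune_rbac.unhealthy"
    # Assumption: "ok" without a recorded check time is treated as stale.
    if last_checked_at is None:
        return "intune_rbac.stale"
    # "Older than" the threshold is read as strictly greater.
    if now - last_checked_at > timedelta(hours=freshness_hours):
        return "intune_rbac.stale"
    return None


now = datetime(2026, 2, 22, 12, 0, tzinfo=timezone.utc)
fresh = evaluate_write_gate("ok", now - timedelta(hours=1), now)
stale = evaluate_write_gate("ok", now - timedelta(hours=25), now)
```

Because the function only reads values passed in, it performs no Graph calls, and a changed threshold takes effect on the next evaluation.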

User Story 3 — Defense in Depth: Job-Level Gate (Priority: P1)

Even if a write operation is somehow enqueued (race condition, direct dispatch, future code path), the job itself must re-check the gate before executing any Graph write call. If blocked, the job marks its OperationRun as failed with a stable reason code and does not attempt any Graph mutation.

Why this priority: Defense in depth is non-negotiable for enterprise SaaS. The job-level gate is the last line of defense before actual Graph writes.

Independent Test: Can be tested by directly instantiating a restore job with a tenant in blocked state and verifying the OperationRun is marked failed without any Graph calls.

Acceptance Scenarios:

  1. Given a tenant with rbac_status = not_configured, When ExecuteRestoreRunJob runs, Then the job marks the OperationRun as failed with reason code intune_rbac.not_configured and performs zero Graph write calls.
  2. Given a tenant with rbac_status = ok but stale rbac_last_checked_at, When RestoreAssignmentsJob runs, Then the job marks the OperationRun as failed with intune_rbac.stale.
  3. Given a tenant with rbac_status = ok and fresh health check, When a restore job runs, Then the gate passes and the job proceeds to execute Graph writes.
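
A rough sketch of the job-level re-check follows, in Python for illustration only. The OperationRun fields mirror the names used in this spec (reason_code, reason_message); the FakeGraphClient, the run_restore_job function, the status strings, and the message text are hypothetical stand-ins for the real job and Graph client.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class OperationRun:
    status: str = "running"
    reason_code: Optional[str] = None
    reason_message: Optional[str] = None


@dataclass
class FakeGraphClient:
    """Records write calls so a test can assert that none happened."""
    writes: List[str] = field(default_factory=list)

    def write(self, payload: str) -> None:
        self.writes.append(payload)


def run_restore_job(
    run: OperationRun,
    graph: FakeGraphClient,
    gate_check: Callable[[], Optional[str]],
) -> None:
    # Re-check the gate inside the job, before any Graph mutation.
    reason = gate_check()
    if reason is not None:
        run.status = "failed"
        run.reason_code = reason
        run.reason_message = "Intune write blocked by the RBAC hardening gate."
        return
    graph.write("restore-policy")
    run.status = "succeeded"


blocked_run, blocked_graph = OperationRun(), FakeGraphClient()
run_restore_job(blocked_run, blocked_graph, lambda: "intune_rbac.stale")
```

The early return is what guarantees zero write calls: the Graph client is never touched on the blocked path.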

User Story 4 — UI: Disabled Actions with Reason and CTA (Priority: P2)

When the Intune write gate would block an operation, the UI should proactively disable write-trigger actions (e.g., "Execute restore", "Restore assignments") and show the operator why the action is unavailable, along with a relevant CTA.

Why this priority: Good UX prevents confusion and reduces support burden. However, server-side enforcement (US1 through US3) is the security boundary; the UI is only an affordance.

Independent Test: Can be tested by rendering a restore action on a tenant with blocked RBAC status and verifying the action is disabled with the correct tooltip/helper text.

Acceptance Scenarios:

  1. Given a tenant with rbac_status = null, When the operator views restore actions, Then write-trigger actions are visible but disabled, with a helper explaining "Intune RBAC not configured" and a link to the Tenant View page RBAC section.
  2. Given a tenant with rbac_status = degraded, When the operator views write actions, Then actions are disabled with a helper explaining the degraded state and a CTA to run a health check.
  3. Given a tenant with rbac_status = ok and fresh health, When the operator views write actions, Then actions are enabled normally.
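
One way to derive the disabled state from the gate result is a small lookup from reason code to helper text and CTA. The sketch below is in Python for illustration; the ActionState type and action_state function are hypothetical, and the helper wording for the unhealthy and stale cases is placeholder text, since the spec only fixes "Intune RBAC not configured" and the two CTA labels.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ActionState(NamedTuple):
    disabled: bool
    helper: Optional[str]
    cta: Optional[str]


# reason code -> (helper text, CTA label)
_HELPERS: Dict[str, Tuple[str, str]] = {
    "intune_rbac.not_configured": ("Intune RBAC not configured", "Setup Intune RBAC"),
    # Placeholder wording below; the spec does not prescribe these strings.
    "intune_rbac.unhealthy": ("Intune RBAC is degraded or failed", "Run health check"),
    "intune_rbac.stale": ("Intune RBAC status is stale", "Run health check"),
}


def action_state(reason_code: Optional[str]) -> ActionState:
    """Map a gate result to the state of a write-trigger action."""
    if reason_code is None:
        return ActionState(disabled=False, helper=None, cta=None)
    helper, cta = _HELPERS[reason_code]
    return ActionState(disabled=True, helper=helper, cta=cta)
```

The action stays visible in every case; only its disabled flag and helper change, which matches the "visible but disabled" wording in scenario 1.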

User Story 5 — Tenant RBAC Status Card (Progressive Disclosure) (Priority: P2)

On the tenant view page, the RBAC hardening status is displayed as a compact card with a badge, short explanation, and contextual actions — replacing the current approach of showing many individual RBAC fields.

Why this priority: Improves operator understanding of RBAC posture at a glance. Supports the write gate UX by making status visible before operators attempt writes.

Independent Test: Can be tested by viewing a tenant page with various rbac_status values and verifying the card renders the correct badge, text, and actions.

Acceptance Scenarios:

  1. Given a tenant with rbac_status = ok, When the operator views the tenant page, Then a card displays "Intune Access Hardening" with a "Healthy" badge and a "Run health check" action.
  2. Given a tenant with rbac_status = null, When the operator views the tenant page, Then a card displays a "Not Configured" badge and a "Setup Intune RBAC" action.
  3. Given a tenant with rbac_status = degraded, When the operator views the tenant page, Then a card displays a "Degraded" badge and both "Run health check" and "View details" actions.
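
The three card states above can be summarized as a lookup table. This Python sketch is illustrative only: RbacCard and card_for are hypothetical names, and statuses the scenarios do not cover (for example failed) are deliberately left unmapped rather than guessed.

```python
from typing import Dict, List, NamedTuple, Optional


class RbacCard(NamedTuple):
    badge: str
    actions: List[str]


# Only the states named in the acceptance scenarios are listed.
CARDS: Dict[Optional[str], RbacCard] = {
    "ok": RbacCard("Healthy", ["Run health check"]),
    None: RbacCard("Not Configured", ["Setup Intune RBAC"]),
    "degraded": RbacCard("Degraded", ["Run health check", "View details"]),
}


def card_for(rbac_status: Optional[str]) -> Optional[RbacCard]:
    """Return the card for a status, or None when the spec does not define one."""
    return CARDS.get(rbac_status)
```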

User Story 6 — Auditable Blocked Write Attempts (Priority: P3)

When a write operation is blocked by the gate, the event is recorded for audit and compliance purposes. At the job level this is captured via the OperationRun failure. At the UI level, an optional AuditLog entry records the blocked attempt.

Why this priority: Important for compliance and post-incident review but not a functional blocker for the gate itself.

Independent Test: Can be tested by triggering a blocked write and verifying the OperationRun or AuditLog contains the expected reason code and metadata.

Acceptance Scenarios:

  1. Given a blocked write attempt in a job, When the gate blocks execution, Then the OperationRun is marked failed with reason_code, reason_message, and no sensitive data.
  2. Given a blocked write attempt at the UI start surface, When the gate prevents operation start, Then an AuditLog entry is created with the action intune_rbac.write_blocked, the tenant ID, and the operation type.
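
As a sketch of what a UI-level audit entry might carry, the Python snippet below builds a record from an allowlist of metadata keys so that nothing sensitive leaks into the log. The action string comes from scenario 2; the function name, the entry layout, and the SAFE_KEYS allowlist are assumptions, not the existing AuditLog schema.

```python
from typing import Any, Dict

# Hypothetical allowlist: only these context keys are copied into the entry.
SAFE_KEYS = ("operation_type", "reason_code")


def build_blocked_write_audit(tenant_id: int, context: Dict[str, Any]) -> Dict[str, Any]:
    """Build an audit record for a write blocked at the UI start surface."""
    return {
        "action": "intune_rbac.write_blocked",
        "tenant_id": tenant_id,
        "metadata": {key: context[key] for key in SAFE_KEYS if key in context},
    }


entry = build_blocked_write_audit(
    42,
    {
        "operation_type": "restore",
        "reason_code": "intune_rbac.stale",
        "access_token": "should-never-be-logged",
    },
)
```

An allowlist is used instead of a denylist so that new context fields stay out of the audit trail until someone explicitly adds them.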

Edge Cases

  • What happens when rbac_status is transitioning (health check running concurrently with a write attempt)? The gate evaluates persisted status only; a running health check OperationRun has no special effect on gate evaluation. If the last persisted status was ok and fresh, writes proceed. If stale or degraded, writes remain blocked. The operator must wait for the health check to complete and then retry. No "in-progress" sentinel or lock is applied to the gate.
  • What happens when the freshness threshold configuration changes? The gate uses the threshold value at evaluation time. Lowering the threshold may immediately block previously-allowed operations if rbac_last_checked_at is now considered stale.
  • What happens when a tenant has rbac_status = ok but the underlying RBAC artifacts were removed externally in Entra/Intune? The gate will allow the write (status is persisted). The next health check will detect the problem and update rbac_status to degraded or failed. This is by design — no live Graph calls in the gate path.
  • How does the gate interact with ProviderOperationStartGate? The write hardening gate runs as an additional check within or alongside the existing start gate. If the provider connection is unresolved, that blocking reason takes precedence. If the connection is resolved but RBAC is unhealthy, the write hardening gate blocks.
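
The precedence described in the last edge case can be sketched as an ordered check. This is a Python illustration under stated assumptions: start_gate_reason is a hypothetical function, and provider_connection.unresolved is a made-up reason code standing in for whatever ProviderOperationStartGate actually reports.

```python
from typing import Optional


def start_gate_reason(connection_resolved: bool, rbac_reason: Optional[str]) -> Optional[str]:
    """Return the first blocking reason, or None when the operation may start."""
    if not connection_resolved:
        # An unresolved provider connection takes precedence over RBAC state.
        return "provider_connection.unresolved"  # hypothetical reason code
    # Connection is resolved: the write hardening gate decides.
    return rbac_reason
```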

Clarifications

Session 2026-02-22

  • Q: How should the gate behave while a health check OperationRun is in-progress? → A: Gate evaluates persisted status only; running health check has no effect on gate evaluation (consistent with FR-004).
  • Q: Where does the "Setup Intune RBAC" CTA link to? → A: Links to the existing Tenant View page RBAC section (no new wizard page created in this feature scope).
  • Q: Does an AuditLog model exist or must it be created? → A: AuditLog model already exists (related to Tenant via hasMany). UI-level blocked write entries will use it directly — no new table needed.
  • Q: When the write gate is disabled via config, should writes proceed and should it log? → A: Writes proceed and the gate logs a warning per evaluation that the gate is bypassed.

Requirements (mandatory)

Constitution alignment (required): This feature does not introduce new Microsoft Graph calls or new OperationRun types. It adds a server-side gate that blocks existing write operations. Blocked writes at the job level are recorded in the existing OperationRun (failed status + reason code). Blocked writes at the UI level record an entry in the existing AuditLog model (already related to Tenant). No new contract registry entries are required; this feature gates operations that already have registered contracts.

Constitution alignment (RBAC-UX):

  • Authorization plane: tenant-context /admin/t/{tenant}/...
  • The write gate is not an RBAC capability check — it is a tenant-health prerequisite check that applies regardless of the operator's role.
  • Existing RBAC capability checks (workspace membership, manage permission) remain enforced before the write gate is evaluated.
  • 404 vs 403 semantics: The write gate returns a 422-family response (operation precondition not met), not 403 or 404, since the operator is authorized but the tenant's RBAC posture is insufficient.
  • No new capability strings are introduced.
  • Destructive actions (restore) already require confirmation; the gate adds a pre-check before the confirmation flow even applies.

Constitution alignment (BADGE-001): The tenant RBAC status card uses the existing TenantRbacStatus badge domain from BadgeDomain. New badge values (stale) must be registered in the centralized badge semantics. Tests will cover all badge states.

Constitution alignment (Filament Action Surfaces): This feature modifies existing Filament action surfaces (restore start actions on RestoreRunResource and tenant view page). No new Resources or Pages are created. The UI Action Matrix below covers the changes.

Constitution alignment (UX-001): The tenant RBAC card is placed within the existing tenant View page infolist, inside a Section. No naked inputs. Badge semantics use BADGE-001.

Functional Requirements

  • FR-001: System MUST evaluate Intune RBAC hardening status before allowing any Intune write operation to start or execute.
  • FR-002: System MUST block write operations when the tenant's RBAC status is null, not_configured, degraded, or failed.
  • FR-003: System MUST block write operations when rbac_last_checked_at is older than a configurable freshness threshold (default: 24 hours).
  • FR-004: The write gate MUST use only persisted database state (no synchronous Graph calls during evaluation).
  • FR-005: When a write operation is blocked at the job level, the system MUST mark the associated OperationRun as failed with a stable reason code (intune_rbac.not_configured, intune_rbac.unhealthy, or intune_rbac.stale) and a sanitized message.
  • FR-006: When a write operation is blocked at the UI start surface, the system MUST prevent job enqueue and display the reason with a CTA to the operator.
  • FR-007: Blocked write operations MUST NOT perform any Microsoft Graph mutations — zero write calls.
  • FR-008: The write gate MUST be enforced at both the start surface (UI/command) and the job execution layer (defense in depth).
  • FR-009: The tenant view page MUST display a compact RBAC hardening status card with badge, explanation, and contextual actions (link to Tenant View RBAC section, run health check).
  • FR-010: Write-trigger Filament actions MUST be disabled with a reason tooltip when the write gate would block the operation.
  • FR-011: The gate design MUST be provider-agnostic in its interface, even though v1 only implements the Intune check. Future providers can plug in without redesign.
  • FR-012: A "Refresh Intune RBAC status" action on the tenant page MUST start an OperationRun that runs the health check asynchronously.
  • FR-013: The gate MUST be toggleable via configuration (tenantpilot.hardening.intune_write_gate.enabled, default: true) for rollback safety. When disabled, the gate MUST allow writes to proceed and MUST log a warning per evaluation that the gate was bypassed.
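
The rollback toggle in FR-013 could look roughly like the wrapper below, shown in Python for illustration. The logger name, the gated function, and the warning text are assumptions; only the behavior (allow the write and warn on every bypassed evaluation) is taken from the requirement.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("intune_write_gate")  # hypothetical logger name


def gated(enabled: bool, evaluate: Callable[[], Optional[str]]) -> Optional[str]:
    """Apply the config toggle around a gate evaluation.

    Returns None when the write may proceed, else the reason code.
    """
    if not enabled:
        # One warning per evaluation, so bypassed writes remain traceable.
        logger.warning("Intune write gate bypassed: disabled via configuration")
        return None
    return evaluate()
```

In the application, the enabled flag would be read from tenantpilot.hardening.intune_write_gate.enabled at evaluation time.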

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance | Row Actions | Bulk Actions | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| TenantResource ViewTenant | Tenant View page | - | RBAC status card with badge | - | - | - | "Refresh RBAC status" (OperationRun), "Setup RBAC" (link to Tenant View RBAC section) | - | Yes (blocked writes) | Card replaces raw field list for RBAC section |
| RestoreRunResource | Restore Run actions | - | - | "Execute" action: disabled when gate blocks | - | - | "Execute": disabled + tooltip when blocked | - | Yes (OperationRun failure) | Gate check added before existing confirmation |

Key Entities

  • IntuneRbacWriteGate: Central service responsible for evaluating whether an Intune write operation is allowed. Reads tenant RBAC status fields, returns allowed or throws a domain exception with a stable reason code.
  • ProviderAccessHardeningRequired: Domain exception carrying tenant ID, operation identifier, reason code, and a safe human-readable message. Used by both start surfaces and job-level enforcement.
  • Tenant (existing): Existing entity with rbac_status, rbac_status_reason, rbac_last_checked_at fields used as the gate's data source.
  • OperationRun (existing): Existing entity that captures job outcomes. Blocked write operations store the reason code in the run's failure metadata.
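
To show how these entities might fit together, here is a minimal Python sketch. The class names IntuneRbacWriteGate and ProviderAccessHardeningRequired come from the list above; the WriteGate protocol, the ensure_allowed method, the simplified Tenant type, and the message text are assumptions, and the staleness check is omitted to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


class ProviderAccessHardeningRequired(Exception):
    """Domain exception carrying a stable reason code and a safe message."""

    def __init__(self, tenant_id: int, operation: str, reason_code: str, message: str):
        super().__init__(message)
        self.tenant_id = tenant_id
        self.operation = operation
        self.reason_code = reason_code


@dataclass
class Tenant:
    id: int
    rbac_status: Optional[str]


class WriteGate(Protocol):
    """Provider-agnostic interface (hypothetical) that future gates could implement."""

    def ensure_allowed(self, tenant: Tenant, operation: str) -> None: ...


class IntuneRbacWriteGate:
    def ensure_allowed(self, tenant: Tenant, operation: str) -> None:
        if tenant.rbac_status == "ok":
            return  # staleness check omitted in this sketch
        if tenant.rbac_status in (None, "not_configured"):
            code = "intune_rbac.not_configured"
        else:
            code = "intune_rbac.unhealthy"
        raise ProviderAccessHardeningRequired(
            tenant.id, operation, code, "Intune RBAC hardening is required for this operation."
        )
```

Start surfaces and jobs can both catch the same exception type and read reason_code from it, which keeps the two enforcement points consistent.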

Assumptions

  • The existing rbac_status, rbac_status_reason, and rbac_last_checked_at fields on the Tenant model are sufficient for gate evaluation — no schema migrations are required.
  • The existing periodic health check job (or ProviderConnectionHealthCheckJob) already updates RBAC status fields, or will be extended to do so as part of this feature.
  • The freshness threshold defaults to 24 hours and is configured via config('tenantpilot.hardening.intune_write_gate.freshness_threshold_hours').
  • ProviderOperationStartGate is the existing entry point for starting provider-backed operations and can be extended to invoke the write hardening gate for write-classified operations.
  • The gate applies to all Intune write operations: restore, restore assignments, and any future write operation types.

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: 100% of Intune write operations are blocked when the tenant's RBAC hardening status is not "ok" or is stale — verified by automated tests covering all three reason codes.
  • SC-002: Zero Graph write calls occur when the gate blocks an operation — verified by mocking the Graph client and asserting zero invocations in gate-blocked scenarios.
  • SC-003: Operators see a clear reason and CTA within the UI when a write action is blocked — verified by Livewire component tests asserting disabled state and helper text.
  • SC-004: OperationRun failures from gate blocks contain stable, parseable reason codes — verified by asserting the reason_code field in job-level tests.
  • SC-005: The gate adds no synchronous Graph calls to any UI render or action request — verified by architectural tests asserting no HTTP calls during gate evaluation.
  • SC-006: Full test suite remains green with the gate enabled (default) and with the gate disabled (config toggle) — regression safety.