TenantAtlas/specs/112-list-expand-parity/research.md
ahmido 32c3a64147 feat(112): LIST $expand parity + Entra principal names (#136)
Implements LIST `$expand` parity with GET by forwarding caller-provided, contract-allowlisted expands.

Key changes:
- Entra Admin Roles scan now requests `expand=principal` for role assignments so `principal.displayName` can render.
- `$expand` normalization/sanitization: top-level comma split (commas inside balanced parentheses preserved), trim, dedupe, allowlist exact match, caps (max 10 tokens, max 200 chars/token).
- Diagnostics when expands are removed/truncated (non-prod warning, production low-noise).

Tests:
- Adds/extends unit coverage for Graph contract sanitization, list request shaping, and the EntraAdminRolesReportService.

Spec artifacts included under `specs/112-list-expand-parity/`.

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #136
2026-02-25 23:54:20 +00:00

87 lines
4.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Research — Graph Contracts: LIST `$expand` Parity Fix
## Context
This feature fixes a capability mismatch between LIST vs GET Graph calls:
- `MicrosoftGraphClient::getPolicy()` already supports `$expand` (via query sanitization).
- `MicrosoftGraphClient::listPolicies()` currently does **not** forward `$expand` at all.
Downstream impact: `EntraAdminRolesReportService` lists `entraRoleAssignments` and then reads `principal.displayName`. The contract allowlist permits `principal`, but it is never requested on LIST today, so the report frequently falls back to `"Unknown"`.
## Decisions
### 1) `$expand` input normalization
**Decision**: Accept `expand` as either:
- comma-separated string (e.g. `"principal,roleDefinition($select=id,displayName)"`)
- array of strings (e.g. `["principal", "roleDefinition($select=id,displayName)"]`)
Normalization steps (applied consistently inside `GraphContractRegistry::sanitizeQuery()`):
- Split on **top-level commas only** for string input (commas inside balanced parentheses are not separators).
- Trim whitespace.
- Drop empty tokens.
- Dedupe while preserving first-seen order.
**Rationale**: This matches the spec clarifications and keeps all LIST/GET query-shape logic in the contract registry (single source of truth).
Note: For subquery/select-style expansions, prefer passing an array form (one token per expansion) to avoid ambiguity.
**Alternatives considered**:
- Normalize in `MicrosoftGraphClient` only: rejected because sanitization must remain centralized.
### 2) Capability allowlist enforcement
**Decision**: Keep exact-match allowlist behavior via `array_intersect()`.
**Rationale**: Prevents partial matches and blocks accidental subquery injection. Existing registry behavior is already exact-match.
### 3) Performance/safety caps for `$expand`
**Decision**: Apply a hard cap during normalization:
- `maxItems = 10` allowed expand tokens (keep first 10 allowed after allowlist)
- `maxTokenLen = 200` per token (tokens exceeding are treated as invalid and dropped)
**Rationale**:
- Contract allowlists are small in this repo (most policy types allow 02 expands).
- A small cap prevents pathological caller inputs from bloating URLs and logs.
- `200` aligns with the existing bounded-string pattern used in `InventoryMetaSanitizer::boundedStringList($values, 25, 200)`.
**Alternatives considered**:
- `maxItems = 25` to match `InventoryMetaSanitizer`: rejected as unnecessary for Graph query params.
- Only cap total string length: rejected because token count is easier to reason about and test.
### 4) Diagnostics when sanitize removes/truncates `$expand`
**Decision**: Emit a structured log record when:
- disallowed expand tokens are dropped
- tokens are dropped due to maxTokenLen
- allowed tokens are truncated due to maxItems
Logging behavior:
- Non-production: `Log::warning('Graph query sanitized', [...])`
- Production: `Log::debug('Graph query sanitized', [...])`
Context fields to include:
- `policy_type`
- `query_key` = `'$expand'`
- `requested` (normalized tokens)
- `allowed` (filtered tokens)
- `removed` (dropped tokens)
- `reason` (e.g. `disallowed`, `too_long`, `max_items`)
**Rationale**:
- The spec requires that misuse is detectable in non-production.
- Production should be low-noise.
- This matches existing patterns where services log structured `type`/`reason` context (e.g. inventory dependency extraction warnings).
**Alternatives considered**:
- Rely on `GraphResponse->warnings` only: rejected because `GraphLogger` logs warnings at info level, which can be noisy in production and doesnt guarantee visibility in non-prod.
- Introduce a new event/telemetry system: rejected as out of scope for this fix.
## Testing approach (selected)
- Unit test `GraphContractRegistry::sanitizeQuery()` for `$expand` normalization:
- comma-separated string is split
- disallowed tokens dropped
- max-items behavior keeps first N allowed
- emits `Log::warning` in non-prod when removal occurs
- Unit test `MicrosoftGraphClient::listPolicies()` request shape using `Http::fake()` + `Http::assertSent()`:
- when `expand` is provided and allowlisted, outbound URL includes `%24expand=...`
- Unit/feature test for `EntraAdminRolesReportService`:
- ensures role assignment list call requests `expand=principal`
- ensures payload uses `principal.displayName` when present