Feature Specification: Inventory Core (Sync + Catalog)
Feature Branch: feat/040-inventory-core
Created: 2026-01-07
Status: Draft
Overview
TenantPilot needs a reliable, tenant-scoped inventory catalog that represents what the system last observed in Microsoft Intune. This inventory is used as the primary substrate for analysis, reporting, monitoring, and UI visibility.
Key intent: Inventory is a “last observed” catalog (TenantPilot’s truth), not an absolute truth about Intune completeness.
Non-goal: A sync MUST NOT create snapshots or backups automatically.
User Scenarios & Testing (mandatory)
User Story 1 — Run Inventory Sync for a Tenant (Priority: P1)
A tenant admin (or scheduled automation) runs an inventory sync for a tenant to populate/update the inventory catalog.
Why this priority: Everything else depends on having a stable, queryable inventory catalog.
Independent Test: Run a sync for a tenant and verify inventory items are upserted, tenant-scoped, and last-observed fields update without producing snapshots/backups.
Acceptance Scenarios:
- Given a tenant and a configured selection of policy types/categories, When a sync completes, Then inventory items are upserted for each observed object with correct tenant_id, policy_type, external_id, and last_seen_at.
- Given an existing inventory item, When the same object is observed again, Then the existing record is updated (not duplicated) and last_seen_at and last_seen_run_id are updated.
- Given a sync selection that excludes some policy types/categories, When the sync completes, Then only objects within that selection are observed/updated.
- Given a successful sync, When the sync finishes, Then no policy snapshots/backups are created as a side effect.
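The upsert behavior in these scenarios can be sketched as follows. This is a minimal in-memory illustration, not TenantPilot's implementation: the InventoryItem shape, the upsert_observed helper, and the use of (tenant_id, policy_type, external_id) as the identity key are assumptions inferred from the fields named in the scenarios.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class InventoryItem:
    # Hypothetical record shape; field names follow the scenarios above.
    tenant_id: str
    policy_type: str
    external_id: str
    display_name: str
    last_seen_at: datetime
    last_seen_run_id: int


def upsert_observed(catalog, tenant_id, policy_type, external_id,
                    display_name, run_id, now=None):
    """Insert the item if its identity key is new, otherwise update it in place."""
    now = now or datetime.now(timezone.utc)
    # Assumed identity key; a real catalog would back this with a unique constraint.
    key = (tenant_id, policy_type, external_id)
    item = catalog.get(key)
    if item is None:
        item = InventoryItem(tenant_id, policy_type, external_id,
                             display_name, now, run_id)
        catalog[key] = item
    else:
        # Same object observed again: refresh metadata and last-observed fields.
        item.display_name = display_name
        item.last_seen_at = now
        item.last_seen_run_id = run_id
    return item
```

Because the tenant is part of the key, two tenants that happen to share an external_id never collide, and nothing in this path writes snapshot or backup data.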
User Story 2 — Observe Completeness/Confidence of a Sync (Priority: P1)
A tenant admin views whether missing items are likely “not seen” due to partial/failed sync vs confidently missing in a clean run.
Why this priority: Prevents misleading conclusions (e.g., “deleted”) when Graph errors or permissions issues occur.
Independent Test: Mark a run as partial/failed and verify missing items are presented as low confidence (derived at query/UI time) and do not imply deletion.
Acceptance Scenarios:
- Given a latestRun for a tenant+selection that has status != success or had_errors = true, When inventory is queried for missing items relative to that run, Then missing is presented as low confidence (and no stronger claim is made).
- Given a latestRun for a tenant+selection that is status = success and had_errors = false, When an item was not observed in that run, Then the UI can show it as “not seen in latest run” (higher confidence) without implying deletion.
User Story 3 — Monitor Sync Runs (Priority: P2)
A tenant admin (and platform admin) can see sync run history and quickly diagnose failures using stable error codes and counts.
Why this priority: Makes automation observable and supportable at MSP scale.
Independent Test: Create sync runs with different statuses and verify run records include counts and stable error codes.
Acceptance Scenarios:
- Given multiple sync runs, When a user views run history, Then each run shows status, started/finished timestamps, and counts (observed/updated/errors).
- Given a throttling event, When a sync run records it, Then the run captures a stable error code (e.g., “graph_throttled”) and does not fail silently.
Edge Cases
- Sync is triggered twice for the same tenant+selection while the first is still running.
- Sync completes with partial results due to transient Graph errors.
- A tenant’s permissions change between runs causing objects to be invisible.
- Selection payload is equivalent but arrays are ordered differently.
Requirements (mandatory)
Functional Requirements
- FR-001: System MUST maintain an Inventory Catalog that represents TenantPilot’s last observed state of Intune objects.
- FR-002: System MUST upsert inventory items by a stable identity key that prevents duplicates.
- FR-003: System MUST record Sync Runs with status, timestamps, counts, and stable error codes.
- FR-004: System MUST ensure tenant isolation for all inventory and run queries.
- FR-005: System MUST support deterministic selection scoping via selection_hash for sync runs.
- FR-006: System MUST NOT create snapshots/backups during inventory sync (sync is not backup).
- FR-007: System MUST derive “missing” as a computed state relative to the latest completed run for the same tenant+selection.
- FR-008: System MUST enforce meta_jsonb key whitelisting by dropping unknown keys without failing the sync.
- FR-009: System MUST implement safe automation behavior: locking, idempotency, and observable failures.
Non-Functional Requirements
- NFR-001 (Concurrency limits): Sync automation MUST enforce two limits: a global concurrency limit (across tenants) and a per-tenant concurrency limit.
- NFR-002 (Throttling resilience): Sync MUST handle throttling/transient failures (e.g., 429/503) using backoff + jitter.
- NFR-003 (Deterministic behavior): Selection hashing and capability derivation MUST be deterministic and testable.
- NFR-004 (Data minimization): Inventory MUST store metadata and whitelisted meta only; payload-heavy content belongs to snapshots/backups.
- NFR-005 (Safe logging): Logs MUST NOT contain secrets/tokens; monitoring MUST rely on run records + error codes.
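As an illustration of NFR-002, the sketch below shows one common way to combine exponential backoff with jitter (the "full jitter" variant). The spec does not prescribe a particular scheme, so the names TransientGraphError, backoff_delay, and call_with_retry, as well as the base, cap, and attempt defaults, are assumptions chosen for the example.

```python
import random
import time


class TransientGraphError(Exception):
    """Hypothetical marker for retryable responses such as 429 or 503."""


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Full-jitter delay: a random fraction of min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retry(fn, max_attempts=5, sleep=time.sleep):
    """Call fn, retrying on TransientGraphError with jittered backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientGraphError:
            if attempt == max_attempts - 1:
                # Out of attempts: surface the failure so the run can record it.
                raise
            sleep(backoff_delay(attempt))
```

The sleep and rng parameters are injectable so the retry logic can be tested deterministically, which fits NFR-003. A real client would typically also honor a Retry-After header when the response carries one.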
Key Entities (include if feature involves data)
- Inventory Item: A tenant-scoped record representing a single Intune object as last observed (type, external identity, display name/metadata, last observed fields, whitelisted meta).
- Sync Run: A tenant-scoped record representing an inventory sync execution for a specific selection (selection_hash, status, timestamps, counts, stable error codes).
- Selection Payload: The normalized representation of the run scope used to compute selection_hash.
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: For a given tenant, inventory sync can be executed repeatedly without creating duplicate inventory items.
- SC-002: A sync run always produces a run record with status, timestamps, and counts.
- SC-003: Missing is computed relative to latest completed run for the same tenant+selection; runs with different selection hashes do not affect each other.
- SC-004: Unknown meta keys never break sync and are not persisted.
- SC-005: Operators can distinguish “not seen” from “deleted” (deleted is reserved and not produced in this feature).
Spec Appendix: Deterministic Selection + Missing Semantics (copy/paste-ready)
Definition: “completed” and “latestRun”
- Definition: completed means status ∈ {success, partial, failed, skipped} and finished_at != null (or the equivalent field used by the run model).
- Definition: latestRun is the latest completed Sync Run for (tenant_id, selection_hash).
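The two definitions can be sketched as a pair of small functions. This is an illustration under assumptions: the SyncRun shape is hypothetical, and "latest" is interpreted here as the greatest finished_at, which the spec does not state explicitly.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

COMPLETED_STATUSES = {"success", "partial", "failed", "skipped"}


@dataclass
class SyncRun:
    # Hypothetical run model; only the fields needed for the definitions.
    id: int
    tenant_id: str
    selection_hash: str
    status: str
    finished_at: Optional[datetime]
    had_errors: bool = False


def is_completed(run):
    """A run is completed when it has a terminal status and a finish time."""
    return run.status in COMPLETED_STATUSES and run.finished_at is not None


def latest_run(runs, tenant_id, selection_hash):
    """Return the most recently finished completed run for the pair, or None."""
    candidates = [
        r for r in runs
        if r.tenant_id == tenant_id
        and r.selection_hash == selection_hash
        and is_completed(r)
    ]
    # Assumption: "latest" is ordered by finished_at.
    return max(candidates, key=lambda r: r.finished_at, default=None)
```

Note that a skipped or failed run counts as completed, so it can become latestRun and lower the confidence of missing for that selection.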
Selection Hash
- selection_payload includes only fields that influence run scope: policy_types[], categories[], include_foundations (bool), include_dependencies (bool)
- canonical_json(payload) is a canonical JSON serialization with:
  - sorted object keys
  - sorted arrays for policy_types and categories
  - no whitespace / pretty formatting
- selection_hash = sha256(canonical_json(selection_payload))
- AC: Identical selection payload ⇒ identical selection_hash (independent of array ordering).
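A minimal sketch of the hashing rules above, using only the Python standard library. Several details are assumptions the spec leaves open: missing fields default to empty arrays or false, fields outside the four listed are ignored, the JSON is encoded as UTF-8 before hashing, and the hash is rendered as a lowercase hex digest. Duplicate array entries are not removed.

```python
import hashlib
import json


def canonical_json(payload):
    """Serialize the scope-relevant fields with sorted keys, sorted arrays, no whitespace."""
    normalized = {
        "policy_types": sorted(payload.get("policy_types", [])),
        "categories": sorted(payload.get("categories", [])),
        "include_foundations": bool(payload.get("include_foundations", False)),
        "include_dependencies": bool(payload.get("include_dependencies", False)),
    }
    # sort_keys orders object keys; compact separators remove all whitespace.
    return json.dumps(normalized, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def selection_hash(payload):
    """SHA-256 over the canonical JSON, as a hex string (encoding is assumed)."""
    return hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()
```

Two payloads that differ only in key order or in the order of policy_types and categories serialize to the same string and therefore hash identically, which is the property the AC asks for.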
Missing is derived (not persisted)
- Definition: Missing is a derived state computed at query/UI time relative to latestRun(tenant_id, selection_hash).
- AC: Runs with different selection_hash do not affect missing computation for other selections.
- If latestRun.status != success or latestRun.had_errors = true, items not observed in that run are presented as missing (low confidence).
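The derivation can be sketched as a pure function evaluated at query time. The LatestRun shape, the missing_state name, and the returned labels are assumptions for illustration. How "observed in that run" is determined (for example, by comparing last_seen_run_id) is left to the caller, and returning no state when there is no latestRun is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatestRun:
    # Hypothetical view of the latest completed run for (tenant_id, selection_hash).
    status: str
    had_errors: bool


def missing_state(observed_in_latest_run: bool,
                  latest_run: Optional[LatestRun]) -> Optional[str]:
    """Derive the missing label for one item; nothing is persisted."""
    if latest_run is None or observed_in_latest_run:
        # No completed run to compare against, or the item was seen: not missing.
        return None
    if latest_run.status != "success" or latest_run.had_errors:
        # The run was not clean, so absence is only weak evidence.
        return "missing (low confidence)"
    return "not seen in latest run"
```

The function never returns a deleted state, in line with the rule that deleted is reserved for a later lifecycle feature.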
Deleted is reserved
- deleted is reserved and MUST NOT be produced by this feature.
- Only a later lifecycle feature may set deleted with strict verification rules.
Meta Whitelist (Fail-safe)
- meta_jsonb has a documented whitelist of allowed keys.
- AC: Unknown meta_jsonb keys are dropped (not persisted) and MUST NOT cause sync to fail.
Initial meta_jsonb whitelist (v1)
Allowed keys (all optional; if not applicable for a type, omit):
- odata_type: string (copied from Graph @odata.type)
- etag: string|null (Graph etag if available; never treated as a secret)
- scope_tag_ids: array (IDs only; no display names required)
- assignment_target_count: int|null (count only; no target details)
- warnings: array (bounded, human-readable, no secrets)
AC: Any other key is dropped silently (not persisted) and MUST NOT fail sync.
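The fail-safe whitelist can be sketched as a simple key filter. The ALLOWED_META_KEYS constant and filter_meta helper are hypothetical names. The sketch filters keys only and does not validate value types or bound the warnings array, which a real implementation would likely need to do.

```python
# v1 whitelist as listed above; the constant name is an assumption.
ALLOWED_META_KEYS = frozenset({
    "odata_type",
    "etag",
    "scope_tag_ids",
    "assignment_target_count",
    "warnings",
})


def filter_meta(raw):
    """Return a new dict with only whitelisted keys; unknown keys are dropped silently."""
    return {key: value for key, value in raw.items() if key in ALLOWED_META_KEYS}
```

Because the filter never raises on unknown input keys, an unexpected Graph property cannot cause the sync to fail, and it never reaches persistence.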
Observed Run
- inventory_items.last_seen_run_id and inventory_items.last_seen_at are updated when an item is observed.
- last_seen_run_id implies the selection via sync_runs.selection_hash; no per-item selection hash is required for core.
Run Error Codes (taxonomy)
Sync runs record:
- status: one of success|partial|failed|skipped
- had_errors: bool (true if any non-ideal condition occurred)
- error_codes[]: array of stable machine-readable codes (no secrets)
Minimal taxonomy (3–8 codes):
- lock_contended (a run could not start because the per-tenant+selection lock is held)
- concurrency_limit_global (global concurrency limit reached; run skipped)
- concurrency_limit_tenant (per-tenant concurrency limit reached; run skipped)
- graph_throttled (429 encountered; run partial/failed depending on recovery)
- graph_transient (503/timeout/other transient errors)
- graph_forbidden (403/insufficient permission)
- unexpected_exception (unexpected failure; message must be safe/redacted)
Rule: Run records MUST store codes (and safe, bounded context) rather than raw exception dumps or tokens.
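One way to keep the codes stable is to define them once and derive them from the HTTP status rather than from exception text. The sketch below is an illustration: RunErrorCode and classify_graph_status are hypothetical names, and only the three status codes the taxonomy mentions (429, 503, 403) are mapped. Timeouts and other transient conditions would need their own handling.

```python
from enum import Enum
from typing import Optional


class RunErrorCode(str, Enum):
    """The minimal taxonomy above as machine-readable values."""
    LOCK_CONTENDED = "lock_contended"
    CONCURRENCY_LIMIT_GLOBAL = "concurrency_limit_global"
    CONCURRENCY_LIMIT_TENANT = "concurrency_limit_tenant"
    GRAPH_THROTTLED = "graph_throttled"
    GRAPH_TRANSIENT = "graph_transient"
    GRAPH_FORBIDDEN = "graph_forbidden"
    UNEXPECTED_EXCEPTION = "unexpected_exception"


def classify_graph_status(http_status: int) -> Optional[RunErrorCode]:
    """Map a Graph HTTP status to a stable code; None when no code applies."""
    if http_status == 429:
        return RunErrorCode.GRAPH_THROTTLED
    if http_status == 503:
        return RunErrorCode.GRAPH_TRANSIENT
    if http_status == 403:
        return RunErrorCode.GRAPH_FORBIDDEN
    return None
```

Storing the enum value (for example, "graph_throttled") in error_codes[] keeps run records free of raw exception messages and tokens.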
Concurrency Limits (source, defaults, behavior)
Source: Config (recommended keys):
- tenantpilot.inventory_sync.concurrency.global_max
- tenantpilot.inventory_sync.concurrency.per_tenant_max
Defaults (if not configured):
- global_max = 2
- per_tenant_max = 1
Behavior when limits are hit:
- The system MUST create a Sync Run record with:
  - status = skipped
  - had_errors = true (so missing stays low-confidence for that selection)
  - error_codes[] includes concurrency_limit_global or concurrency_limit_tenant
  - started_at / finished_at set (observable)
- No inventory items are mutated in a skipped run.
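The admission check and the resulting skipped run record can be sketched as follows. The admit_or_skip helper, its parameters, and the dict-shaped record are assumptions for illustration. Checking the global limit before the per-tenant limit is likewise a choice the spec does not make.

```python
from datetime import datetime, timezone

# Defaults from the section above.
DEFAULT_GLOBAL_MAX = 2
DEFAULT_PER_TENANT_MAX = 1


def admit_or_skip(running_global, running_for_tenant,
                  global_max=DEFAULT_GLOBAL_MAX,
                  per_tenant_max=DEFAULT_PER_TENANT_MAX,
                  now=None):
    """Return None if the run may start, otherwise a skipped-run record."""
    if running_global >= global_max:
        code = "concurrency_limit_global"
    elif running_for_tenant >= per_tenant_max:
        code = "concurrency_limit_tenant"
    else:
        return None
    now = now or datetime.now(timezone.utc)
    # The skipped run is still recorded so the limit hit is observable.
    return {
        "status": "skipped",
        "had_errors": True,
        "error_codes": [code],
        "started_at": now,
        "finished_at": now,
    }
```

The caller persists the returned record and exits without touching inventory items, so a skipped run leaves last_seen_at and last_seen_run_id unchanged.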
Testing Guidance (non-implementation)
These are test cases expressed in behavior terms (not code).
Test Cases — Sync and Upsert
- TC-001: Sync creates or updates inventory items and sets last_seen_at.
- TC-002: Re-running sync for the same tenant+selection updates existing records and does not create duplicates.
- TC-003: Inventory queries scoped to Tenant A never return Tenant B’s items.
- TC-004: Inventory sync does not create or modify snapshot/backup records (e.g., no new rows in policy_versions, backup_sets, backup_items, backup_schedules, backup_schedule_runs).
Test Cases — Selection Hash Determinism
- TC-010: Same selection payload with arrays in different order yields the same selection_hash.
- TC-011: Different selection payload yields a different selection_hash.
Test Cases — Missing Semantics
- TC-020: Missing is derived relative to latest completed run for the same tenant+selection.
- TC-021: A run for selection Y does not affect missing computation for selection X.
- TC-022: If latestRun is partial/failed or had_errors, missing is shown as low confidence.
Test Cases — Meta Whitelist
- TC-030: Unknown meta keys are not persisted and do not fail sync.
Test Cases — Automation Safety
- TC-040: Concurrent sync triggers for the same tenant+selection do not result in overlapping runs (lock behavior).
- TC-041: A throttling event results in a visible, stable error code and a non-silent failure signal.