From cae0c584337f185e458d171d7517b7fa61ee745d Mon Sep 17 00:00:00 2001 From: Ahmed Darrazi Date: Wed, 7 Jan 2026 15:20:45 +0100 Subject: [PATCH] spec: finalize 040 plan/tasks/checklist --- .../checklists/requirements.md | 45 ++++++ specs/040-inventory-core/plan.md | 134 +++++++++++++++--- specs/040-inventory-core/spec.md | 54 +++++++ specs/040-inventory-core/tasks.md | 40 +++--- 4 files changed, 236 insertions(+), 37 deletions(-) create mode 100644 specs/040-inventory-core/checklists/requirements.md diff --git a/specs/040-inventory-core/checklists/requirements.md b/specs/040-inventory-core/checklists/requirements.md new file mode 100644 index 0000000..6560703 --- /dev/null +++ b/specs/040-inventory-core/checklists/requirements.md @@ -0,0 +1,45 @@ +# Requirements Checklist — Inventory Core (040) + +## Scope + +- [x] This checklist applies only to Spec 040 (Inventory Core). + +## Constitution Gates + +- [x] Inventory-first: inventory stores “last observed” only (no snapshot/backup side effects) +- [x] Read/write separation: no Intune write paths introduced +- [x] Single contract path to Graph: Graph access only via GraphClientInterface + contracts (if used) +- [x] Tenant isolation: all reads/writes tenant-scoped +- [x] Automation is idempotent & observable: run records + locks + stable error codes +- [x] Data minimization & safe logging: no secrets/tokens; DB run records are the source of truth + +## Functional Requirements Coverage + +- [x] FR-001 Inventory catalog exists (inventory_items) +- [x] FR-002 Stable upsert identity prevents duplicates +- [x] FR-003 Sync runs recorded (inventory_sync_runs) +- [x] FR-004 Tenant isolation enforced +- [x] FR-005 Deterministic selection_hash implemented and tested +- [x] FR-006 Sync does NOT create snapshots/backups (explicitly tested) +- [x] FR-007 Missing is derived (not persisted) relative to latestRun(tenant_id, selection_hash) +- [x] FR-008 meta_jsonb whitelist enforced; unknown keys dropped; never fails sync +- [x] FR-009 Locks/idempotency/observable failures implemented + +## Non-Functional Requirements Coverage + +- [x] NFR-001 Global + per-tenant concurrency limits (source + defaults + behavior defined) +- [x] NFR-002 Throttling resilience (429/503 backoff+jitter) with stable error codes +- [x] NFR-003 Deterministic behavior is testable (Pest) +- [x] NFR-004 Data minimization enforced (no raw payload storage) +- [x] NFR-005 Safe logging: no secrets/tokens; rely on run records not log parsing + +## Tests (Pest) + +- [x] Upsert prevents duplicates + updates last_seen fields +- [x] selection_hash determinism (order invariant) +- [x] Missing semantics per latest completed run and selection isolation +- [x] Low-confidence missing when latest run partial/failed/had_errors +- [x] meta whitelist drops unknown keys +- [x] Lock prevents overlapping runs for same tenant+selection +- [x] No snapshot/backup tables are written during sync +- [x] Error reporting uses stable error codes; no secrets/tokens persisted in run error details diff --git a/specs/040-inventory-core/plan.md b/specs/040-inventory-core/plan.md index 01099f4..5e59e6f 100644 --- a/specs/040-inventory-core/plan.md +++ b/specs/040-inventory-core/plan.md @@ -1,32 +1,126 @@ # Implementation Plan: Inventory Core (040) -**Branch**: `feat/040-inventory-core` | **Date**: 2026-01-07 | **Spec**: `specs/040-inventory-core/spec.md` +**Branch**: `spec/040-inventory-core` | **Date**: 2026-01-07 | **Spec**: `specs/040-inventory-core/spec.md` +**Scope (this step)**: Produce a clean, implementable `plan.md` + consistent `tasks.md` for Spec 040 only. ## Summary -Implement a tenant-scoped inventory catalog (“last observed”) and an observable sync run system with deterministic selection scoping. Ensure no snapshots/backups are created by sync. +Implement tenant-scoped Inventory + Sync Run tracking as the foundational substrate for later Inventory UI and higher-order features. + +Key outcomes: + +- Inventory is “last observed” (not backup), stored as metadata + whitelisted `meta_jsonb`. +- Sync runs are observable, selection-scoped via deterministic `selection_hash`. +- “Missing” is derived relative to latest completed run for the same `(tenant_id, selection_hash)`. +- Automation is safe: locks, idempotency, throttling handling, global+per-tenant concurrency limits. + +## Technical Context + +- **Language/Version**: PHP 8.4 +- **Framework**: Laravel 12 +- **Admin UI**: Filament v4 + Livewire v3 +- **Storage**: PostgreSQL (JSONB available) +- **Queue/Locks**: Laravel queue + cache/Redis locks (as configured) +- **Testing**: Pest v4 (`php artisan test`) +- **Target Platform**: Sail-first local dev; container deploy (Dokploy) ## Constitution Check -- Inventory-first, snapshots-second (sync never creates snapshots) -- Read/write separation (sync is read-only; any future writes require preview/confirmation/audit/tests) -- Single contract path to Graph (Graph access only through existing abstractions/contracts) -- Deterministic capabilities (capabilities resolver output testable) -- Tenant isolation (non-negotiable) -- Automation observable + idempotent (locks, run records, stable error codes, 429/503 handling) -- Data minimization + safe logging +- Inventory-first: inventory stores last observed state only (no snapshot/backup side effects). +- Read/write separation: this feature introduces no Intune write paths. +- Single contract path to Graph: Graph reads (if needed) go via Graph abstraction and contracts. +- Tenant isolation: all reads/writes tenant-scoped; no cross-tenant shortcuts. +- Automation: locked + idempotent + observable; handle 429/503 with backoff+jitter. +- Data minimization: no payload-heavy storage; safe logs. -## Deliverables (Phase-friendly) +No constitution violations expected. -- Data model for inventory items and sync runs -- Sync engine orchestration and locking strategy -- Deterministic selection hashing -- Capabilities resolver output snapshot tests -- Minimal Filament/CLI surface to trigger and observe sync runs (if required by tasks) +## Project Structure (Impacted Areas) -## Out of Scope +```text +specs/040-inventory-core/ +├── spec.md +├── plan.md +└── tasks.md -- Dependency graph hydration (spec 042) -- Cross-tenant promotion (spec 043) -- Drift reporting (spec 044) -- Lifecycle “deleted” semantics (feature 900) +app/ +├── Models/ +├── Jobs/ +├── Services/ +└── Support/ + +database/migrations/ +tests/Feature/ +tests/Unit/ +``` + +## Implementation Approach + +### Phase A — Data Model + Migrations + +1. Add `inventory_items` table + - Identity: unique constraint to prevent duplicates, recommended: + - `(tenant_id, policy_type, external_id)` + - Fields: `display_name`, `platform`/`category` (if applicable), `meta_jsonb`, `last_seen_at`, `last_seen_run_id`. + - Indexing: indexes supporting tenant/type listing; consider partials as needed. + +2. Add `inventory_sync_runs` table + - Identity: `tenant_id`, `selection_hash` + - Status fields: `status`, `started_at`, `finished_at`, `had_errors` + - Counters: `items_observed_count`, `items_upserted_count`, `errors_count` + - Error reporting: stable error code(s) list or summary field. + +### Phase B — Selection Hash (Deterministic) + +Implement canonicalization exactly as Spec Appendix: + +- Only include scope-affecting keys in `selection_payload`. +- Sort object keys; sort `policy_types[]` and `categories[]` arrays. +- Compute `selection_hash = sha256(canonical_json(selection_payload))`. + +### Phase C — Sync Run Lifecycle + Upsert + +- Create a service that: + - acquires a lock for `(tenant_id, selection_hash)` + - creates a run record + - enumerates selected policy types + - upserts inventory items by identity key + - updates `last_seen_at` and `last_seen_run_id` per observed item + - finalizes run status + counters + - never creates/modifies snapshot/backup records (`policy_versions`, `backup_*`) + +### Phase D — Derived “Missing” Semantics + +- Implement “missing” as a computed state relative to `latestRun(tenant_id, selection_hash)`. +- Do not persist “missing” or “deleted”. +- Mark missing as low-confidence when `latestRun.status != success` or `latestRun.had_errors = true`. + +### Phase E — Meta Whitelist + +- Define a whitelist of allowed `meta_jsonb` keys. +- Enforce by dropping unknown keys (never fail sync). + +### Phase F — Concurrency Limits + +- Enforce global concurrency (across tenants) and per-tenant concurrency. +- The implementation may be via queue worker limits, semaphore/lock strategy, or both; the behavior must be testable. +- When limits are hit, create an observable run record with `status=skipped`, `had_errors=true`, and stable error code(s). + +## Test Plan (Pest) + +Minimum required coverage aligned to Spec test cases: + +- Upsert identity prevents duplicates; `last_seen_*` updates. +- `selection_hash` determinism (array ordering invariant). +- Missing derived per latest completed run for same `(tenant_id, selection_hash)`. +- Low-confidence missing when latest run is partial/failed or had_errors. +- Meta whitelist drops unknown keys. +- Lock prevents overlapping runs per tenant+selection. +- No snapshot/backup rows are created/modified by inventory sync. +- Error reporting uses stable `error_codes` and stores no secrets/tokens. + +## Out of Scope (Explicit) + +- Any UI (covered by Spec 041) +- Any snapshot/backup creation +- Any restore/promotion/remediation write paths diff --git a/specs/040-inventory-core/spec.md b/specs/040-inventory-core/spec.md index b3de53a..301b7de 100644 --- a/specs/040-inventory-core/spec.md +++ b/specs/040-inventory-core/spec.md @@ -140,11 +140,64 @@ ### Meta Whitelist (Fail-safe) - `meta_jsonb` has a documented whitelist of allowed keys. - **AC:** Unknown `meta_jsonb` keys are dropped (not persisted) and MUST NOT cause sync to fail. +#### Initial `meta_jsonb` whitelist (v1) + +Allowed keys (all optional; if not applicable for a type, omit): + +- `odata_type`: string (copied from Graph `@odata.type`) +- `etag`: string|null (Graph etag if available; never treated as a secret) +- `scope_tag_ids`: array (IDs only; no display names required) +- `assignment_target_count`: int|null (count only; no target details) +- `warnings`: array (bounded, human-readable, no secrets) + +**AC:** Any other key is dropped silently (not persisted) and MUST NOT fail sync. + ### Observed Run - `inventory_items.last_seen_run_id` and `inventory_items.last_seen_at` are updated when an item is observed. - `last_seen_run_id` implies the selection via `sync_runs.selection_hash`; no per-item selection hash is required for core. +### Run Error Codes (taxonomy) + +Sync runs record: + +- `status`: one of `success|partial|failed|skipped` +- `had_errors`: bool (true if any non-ideal condition occurred) +- `error_codes[]`: array of stable machine-readable codes (no secrets) + +Minimal taxonomy (3–8 codes): + +- `lock_contended` (a run could not start because the per-tenant+selection lock is held) +- `concurrency_limit_global` (global concurrency limit reached; run skipped) +- `concurrency_limit_tenant` (per-tenant concurrency limit reached; run skipped) +- `graph_throttled` (429 encountered; run partial/failed depending on recovery) +- `graph_transient` (503/timeout/other transient errors) +- `graph_forbidden` (403/insufficient permission) +- `unexpected_exception` (unexpected failure; message must be safe/redacted) + +**Rule:** Run records MUST store codes (and safe, bounded context) rather than raw exception dumps or tokens. + +### Concurrency Limits (source, defaults, behavior) + +**Source:** Config (recommended keys): + +- `tenantpilot.inventory_sync.concurrency.global_max` +- `tenantpilot.inventory_sync.concurrency.per_tenant_max` + +**Defaults (if not configured):** + +- global_max = 2 +- per_tenant_max = 1 + +**Behavior when limits are hit:** + +- The system MUST create a Sync Run record with: + - `status = skipped` + - `had_errors = true` (so missing stays low-confidence for that selection) + - `error_codes[]` includes `concurrency_limit_global` or `concurrency_limit_tenant` + - `started_at`/`finished_at` set (observable) +- No inventory items are mutated in a skipped run. + ## Testing Guidance (non-implementation) These are test cases expressed in behavior terms (not code). @@ -154,6 +207,7 @@ ### Test Cases — Sync and Upsert - **TC-001**: Sync creates or updates inventory items and sets `last_seen_at`. - **TC-002**: Re-running sync for the same tenant+selection updates existing records and does not create duplicates. - **TC-003**: Inventory queries scoped to Tenant A never return Tenant B’s items. +- **TC-004**: Inventory sync does not create or modify snapshot/backup records (e.g., no new rows in `policy_versions`, `backup_sets`, `backup_items`, `backup_schedules`, `backup_schedule_runs`). ### Test Cases — Selection Hash Determinism diff --git a/specs/040-inventory-core/tasks.md b/specs/040-inventory-core/tasks.md index 20b4ed3..217c800 100644 --- a/specs/040-inventory-core/tasks.md +++ b/specs/040-inventory-core/tasks.md @@ -4,29 +4,35 @@ # Tasks: Inventory Core (040) ## P1 — MVP (US1/US2) -- [ ] T001 [US1] Define Inventory Item data model (tenant-scoped identity + last_seen fields) -- [ ] T002 [US1] Define Sync Run data model (tenant_id, selection_hash, status, timestamps, counts, stable error codes) -- [ ] T003 [US1] Implement deterministic selection hashing (canonical json + sha256) -- [ ] T004 [US1] Implement inventory upsert semantics (no duplicates) -- [ ] T005 [US1] Enforce tenant isolation in all inventory/run queries -- [ ] T006 [US2] Implement derived “missing” computation relative to latest completed run (tenant_id + selection_hash) -- [ ] T007 [US2] Ensure low-confidence missing when latestRun is partial/failed or had_errors -- [ ] T008 [US2] Implement meta_jsonb whitelist enforcement (drop unknown keys, never fail sync) +- [ ] T001 [US1] Add migrations: `inventory_items` (unique: tenant_id+policy_type+external_id; indexes; last_seen fields) +- [ ] T002 [US1] Add migrations: `inventory_sync_runs` (tenant_id, selection_hash, status, started/finished, counts, stable error codes) +- [ ] T003 [US1] Implement deterministic `selection_hash` (canonical JSON: sorted keys + sorted arrays; sha256) +- [ ] T004 [US1] Implement inventory upsert semantics (idempotent, no duplicates) +- [ ] T005 [US1] Enforce tenant isolation for all inventory + run read/write paths +- [ ] T006 [US2] Implement derived “missing” query semantics vs latest completed run for same (tenant_id, selection_hash) +- [ ] T007 [US2] Missing confidence rule: partial/failed or had_errors => low confidence +- [ ] T008 [US2] Enforce `meta_jsonb` whitelist (drop unknown keys; never fail sync) +- [ ] T009 [US1] Guardrail: inventory sync must not create snapshots/backups (no writes to `policy_versions`/`backup_*`) ## P2 — Observability & Safety (US3 + NFR) -- [ ] T009 [US3] Ensure run records include stable error codes and counts -- [ ] T010 [NFR] Add idempotency + locks to prevent overlapping runs per tenant+selection -- [ ] T011 [NFR] Add global + per-tenant concurrency limiting strategy -- [ ] T012 [NFR] Implement throttling handling strategy (backoff + jitter for transient Graph failures) +- [ ] T010 [US3] Run lifecycle: ensure run records include counts + stable error codes (visible and actionable) +- [ ] T011 [NFR] Locking/idempotency: prevent overlapping runs per (tenant_id, selection_hash) +- [ ] T012 [NFR] Concurrency: enforce global + per-tenant limits (queue/semaphore strategy) +- [ ] T013 [NFR] Throttling resilience: backoff + jitter for transient Graph failures (429/503) +- [ ] T014 [NFR] Safe logging & safe persistence: store only stable `error_codes` + bounded safe context in run records (no secrets/tokens; no log parsing required) ## Tests (Required for runtime behavior) -- [ ] T020 [US1] Tests: upsert does not create duplicates; last_seen updated -- [ ] T021 [US2] Tests: missing derived per latestRun(selection_hash); selection isolation -- [ ] T022 [US2] Tests: partial/failed run => low confidence missing -- [ ] T023 [US2] Tests: meta whitelist drops unknown keys without failing -- [ ] T024 [NFR] Tests: selection_hash determinism (array ordering) +- [ ] T020 [US1] Pest: upsert prevents duplicates; `last_seen_at` and `last_seen_run_id` update +- [ ] T021 [US2] Pest: missing derived per latest completed run for same (tenant_id, selection_hash) +- [ ] T022 [US2] Pest: selection isolation (run for selection Y does not affect selection X) +- [ ] T023 [US2] Pest: partial/failed/had_errors => missing is low confidence +- [ ] T024 [US2] Pest: meta whitelist drops unknown keys (no exception; no persistence) +- [ ] T025 [NFR] Pest: selection_hash determinism (array ordering invariant) +- [ ] T026 [NFR] Pest: lock prevents overlapping runs for same (tenant_id, selection_hash) +- [ ] T027 [NFR] Pest: run error persistence contains no secrets/tokens (assert error context is bounded; no “Bearer ” / access token patterns; prefer error_codes) +- [ ] T028 [US1] Pest: inventory sync creates no rows in `policy_versions` and `backup_*` tables (assert counts unchanged) ## Notes