TenantAtlas/specs/044-drift-mvp/spec.md
2026-01-13 23:28:02 +01:00

# Feature Specification: Drift MVP
**Feature Branch**: `feat/044-drift-mvp`
**Created**: 2026-01-07
**Status**: Draft
## Purpose
Detect and report drift between expected and observed states using inventory and run metadata.
This MVP focuses on reporting and triage, not automatic remediation.
## Clarifications
### Session 2026-01-12
- Q: How should Drift pick the baseline run for a given tenant + scope? → A: Baseline = previous successful inventory run for the same scope; compare against the latest successful run.
- Q: Should Drift findings be persisted or computed on demand? → A: Persist findings in DB per comparison (baseline_run_id + current_run_id), including a deterministic fingerprint for stable identity + triage.
- Q: How should the fingerprint (stable ID) for a drift finding be defined? → A: `sha256(tenant_id + scope_key + subject_type + subject_external_id + change_type + baseline_hash + current_hash)` (normalized; excludes volatile fields).
- Q: Which inventory entities/types are in scope for Drift MVP? → A: Policies + Assignments.
- Q: When should drift findings be generated? → A: On-demand when opening Drift: if findings for (baseline, current, scope) don't exist yet, dispatch an async job to generate them.
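
The fingerprint answer above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the `VOLATILE_FIELDS` set, and the unit-separator joining are assumptions made for illustration (the spec only says the inputs are concatenated and normalized).

```python
import hashlib
import json

# Hypothetical set of volatile keys; the spec only names timestamps as an example.
VOLATILE_FIELDS = {"createdDateTime", "lastModifiedDateTime"}


def normalized_hash(payload: dict) -> str:
    """Hash comparison data after dropping volatile top-level fields."""
    stable = {k: v for k, v in payload.items() if k not in VOLATILE_FIELDS}
    # Sorted keys and fixed separators make the JSON encoding deterministic.
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def drift_fingerprint(tenant_id, scope_key, subject_type, subject_external_id,
                      change_type, baseline_hash, current_hash) -> str:
    """Combine the seven inputs named in the spec into one sha256 hex digest."""
    parts = [str(tenant_id), scope_key, subject_type, subject_external_id,
             change_type, baseline_hash or "", current_hash or ""]
    # Assumption: a separator avoids ambiguous concatenations ("ab"+"c" vs "a"+"bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

For an `added` or `removed` finding one side has no hash; the sketch treats a missing hash as an empty string, which is also an assumption.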
### Session 2026-01-13
- Q: What should Drift do if there are fewer than two successful inventory runs for the same `scope_key`? → A: Show a blocked/empty state (“Need at least 2 successful runs for this scope to calculate drift”) and do not dispatch drift generation.
- Q: Should acknowledgement carry forward across comparisons? → A: No; acknowledgement is per comparison (`baseline_run_id` + `current_run_id` + `scope_key`). The same drift may re-appear as `new` in later comparisons.
- Q: Which `change_type` values are supported in Drift MVP? → A: `added`, `removed`, `modified` (assignment target/intent changes are covered under `modified`).
- Q: What is the default UI behavior for `new` vs `acknowledged` findings? → A: Default UI shows only `new`; `acknowledged` is accessible via an explicit filter.
- Q: What should the UI do if drift generation fails for a comparison? → A: Show an explicit error state (safe message + reference/run ids) and do not show findings for that comparison until a successful generation exists.
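
Taken together, the clarifications describe four UI states: blocked, generating, failed, and ready. The sketch below shows one way to resolve them. `ViewState`, `Comparison`, the `generation_status` strings, and the `dispatch` callback are hypothetical names, and the choice not to re-dispatch automatically after a failure is an assumption the spec does not settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class ViewState(Enum):
    BLOCKED = "blocked"        # fewer than two successful runs for the scope
    GENERATING = "generating"  # job dispatched or still running
    FAILED = "failed"          # show a safe error message plus run ids
    READY = "ready"            # findings exist for this comparison


@dataclass
class Comparison:
    baseline_run_id: int
    current_run_id: int
    scope_key: str


def resolve_view_state(comparison: Optional[Comparison],
                       generation_status: Optional[str],
                       dispatch: Callable[[Comparison], None]) -> ViewState:
    """Decide what the Drift page shows; dispatch only when nothing exists yet."""
    if comparison is None:
        # No baseline/current pair could be selected: never dispatch.
        return ViewState.BLOCKED
    if generation_status is None:
        # No findings and no job recorded for this comparison yet.
        dispatch(comparison)
        return ViewState.GENERATING
    if generation_status == "failed":
        return ViewState.FAILED
    if generation_status == "succeeded":
        return ViewState.READY
    return ViewState.GENERATING
```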
## Pinned Decisions (MVP defaults)
- Drift is implemented as a generator that writes persisted Finding rows (not only an in-memory/on-demand diff).
- Baseline selection: baseline = previous successful inventory run for the same `scope_key`; comparison = latest successful inventory run for the same `scope_key`.
- Scope is first-class via `scope_key` and must be deterministic to support future pinned baselines and compare workflows.
- Fingerprints are deterministic and stable for triage/audit workflows.
- Drift MVP only uses `finding_type=drift` and `status` in {`new`, `acknowledged`}.
- Default severity: `medium` (until a rule engine exists).
- UI must not perform render-time Graph calls. Graph access (if any) is limited to background sync/jobs.
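
The baseline selection rule can be sketched as follows. The `InventoryRun` shape, the `"success"` status value, and ordering by `finished_at` are assumptions for illustration; the spec only fixes "previous successful" versus "latest successful" for the same `scope_key`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class InventoryRun:
    id: int
    scope_key: str
    status: str            # hypothetical values: "success", "failed", ...
    finished_at: datetime


def pick_comparison(runs: Sequence[InventoryRun],
                    scope_key: str) -> Optional[Tuple[InventoryRun, InventoryRun]]:
    """Return (baseline, current) for the scope, or None if drift is blocked."""
    successful = sorted(
        (r for r in runs if r.scope_key == scope_key and r.status == "success"),
        key=lambda r: r.finished_at,
    )
    if len(successful) < 2:
        # Fewer than two successful runs: the caller shows the blocked/empty state.
        return None
    return successful[-2], successful[-1]
```

Returning `None` rather than raising keeps the blocked state an ordinary outcome that the caller can render without dispatching a job.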
## Key Entities / Generic Findings (Future-proof)
### Finding (generic)
We want Drift MVP to remain MVP-sized, while making it easy to add future generators (Security Suite Audits, Cross-tenant Compare) without inventing a new model.
Rationale:
- Drift = delta engine over runs.
- Audit = rule engine over inventory.
- Both write Findings with the same semantics: deterministic fingerprint + triage + minimized evidence.

Fields:
- `finding_type` (enum): `drift` (MVP), later `audit`, `compare`
- `tenant_id`
- `scope_key` (string): deterministic scope identifier (see Scope Definition / FR1)
- `baseline_run_id` (nullable; may be null for findings without a baseline run, e.g. audit/compare)
- `current_run_id` (nullable; may be null for findings without a current run, e.g. audit)
- `fingerprint` (string): deterministic; unique per tenant+scope+subject+change
- `subject_type` (string): e.g. policy type (or other inventory entity type)
- `subject_external_id` (string): Graph external id
- `severity` (enum): `low` / `medium` / `high` (MVP default: `medium`)
- `status` (enum): `new` / `acknowledged` (later: `snoozed` / `assigned` / `commented`)
- `acknowledged_at` (nullable)
- `acknowledged_by_user_id` (nullable)
- `evidence_jsonb` (jsonb): sanitized, small, secrets-free (no raw payload dumps)
- Optional/nullable for later (prepared; out of MVP): `rule_id`, `control_id`, `expected_value`, `source`
MVP implementation scope: only `finding_type=drift`, statuses `new/acknowledged`, and no rule engine.
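
As a reading aid, the field list can be written as a Python dataclass. This is a sketch of the shape only, not a schema: the integer id types, the enum class names, and the defaults other than `medium` severity and `new` status are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional


class FindingType(str, Enum):
    DRIFT = "drift"      # the only type used in the MVP
    AUDIT = "audit"      # later
    COMPARE = "compare"  # later


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Status(str, Enum):
    NEW = "new"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class Finding:
    tenant_id: int
    scope_key: str
    fingerprint: str
    subject_type: str
    subject_external_id: str
    finding_type: FindingType = FindingType.DRIFT
    baseline_run_id: Optional[int] = None
    current_run_id: Optional[int] = None
    severity: Severity = Severity.MEDIUM   # MVP default until a rule engine exists
    status: Status = Status.NEW
    acknowledged_at: Optional[datetime] = None
    acknowledged_by_user_id: Optional[int] = None
    evidence_jsonb: Dict[str, Any] = field(default_factory=dict)
    # Prepared for later generators; unused in the MVP.
    rule_id: Optional[str] = None
    control_id: Optional[str] = None
    expected_value: Optional[Any] = None
    source: Optional[str] = None
```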
## User Scenarios & Testing
### Scenario 1: View drift summary
- Given inventory sync has completed successfully at least twice for the same `scope_key`
- When the admin opens Drift
- Then they see a summary of changes since the last baseline
- If there are fewer than two successful runs for the same `scope_key`, Drift shows a blocked/empty state and does not start drift generation.
### Scenario 2: Drill into a drift finding
- Given a drift finding exists
- When the admin opens the finding
- Then they see what changed, when, and which run observed it
### Scenario 3: Acknowledge/triage
- Given a drift finding exists
- When the admin marks it acknowledged
- Then it is hidden from “new” lists but remains auditable
- Acknowledgement is per comparison; later comparisons may still surface the same drift as `new`.
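
The per-comparison acknowledgement semantics can be sketched with an in-memory store. The dictionary layout and the function names are hypothetical; the point is that the comparison is part of a finding's identity, so acknowledging it in one comparison leaves the same fingerprint `new` in the next.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

# (baseline_run_id, current_run_id, scope_key, fingerprint)
FindingKey = Tuple[int, int, str, str]


def acknowledge(findings: Dict[FindingKey, dict], key: FindingKey,
                user_id: int, now: Optional[datetime] = None) -> dict:
    """Mark one finding acknowledged; other comparisons are left untouched."""
    finding = findings[key]
    if finding["status"] == "new":
        finding["status"] = "acknowledged"
        finding["acknowledged_by_user_id"] = user_id
        finding["acknowledged_at"] = now or datetime.now(timezone.utc)
    return finding


def visible_by_default(findings: Dict[FindingKey, dict]) -> List[dict]:
    """Default lists show only status=new; nothing is deleted."""
    return [f for f in findings.values() if f["status"] == "new"]
```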
## Functional Requirements
- FR1: Baseline + scope
  - Define `scope_key` as the deterministic Inventory selection identifier.
    - MVP definition: `scope_key = InventorySyncRun.selection_hash`.
    - Rationale: selection hashing already normalizes equivalent selections; reusing it keeps drift scope stable and consistent across the product.
  - Baseline run (MVP) = previous successful inventory run for the same `scope_key`.
  - Comparison run (MVP) = latest successful inventory run for the same `scope_key`.
- FR2: Finding generation (Drift MVP)
  - Findings are persisted per (`baseline_run_id`, `current_run_id`, `scope_key`).
  - Findings cover adds, removals, and changes for supported entities (Policies + Assignments).
  - MVP `change_type` values: `added`, `removed`, `modified`.
  - Findings are deterministic: same baseline/current + `scope_key` ⇒ same set of fingerprints.
  - If fewer than two successful inventory runs exist for a given `scope_key`, Drift does not generate findings and must surface a clear blocked/empty state in the UI.
- FR2a: Fingerprint definition (MVP)
  - Fingerprint = `sha256(tenant_id + scope_key + subject_type + subject_external_id + change_type + baseline_hash + current_hash)`.
  - `baseline_hash` / `current_hash` are hashes over normalized, sanitized comparison data (exclude volatile fields like timestamps).
  - Goal: stable identity for triage + audit compatibility.
- FR2b: Drift MVP scope includes Policies and their Assignments.
  - Assignment drift includes target changes (e.g., groupId) and intent changes.
- FR3: Provide Drift UI with summary and details.
  - Default lists and the Drift landing summary show only `status=new` by default.
  - The UI must provide a filter to include `acknowledged` findings.
  - If drift generation fails for a comparison, the UI must surface an explicit error state (no secrets), including reference identifiers (e.g., run ids), and must not fall back to stale/previous results.
- FR4: Triage (MVP)
  - Admin can acknowledge a finding; record `acknowledged_by_user_id` + `acknowledged_at`.
  - Acknowledgement does not carry forward across comparisons in the MVP.
  - Findings are never deleted in the MVP.
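
The FR2 generation rule (adds, removals, and changes between two runs) can be sketched as a diff over per-subject hashes. The `Snapshot` shape, a mapping from `(subject_type, subject_external_id)` to a normalized hash, is an assumption for illustration; how the product stores per-run inventory is not defined in this spec.

```python
from typing import Dict, List, Tuple

# (subject_type, subject_external_id) -> normalized content hash
Snapshot = Dict[Tuple[str, str], str]


def diff_snapshots(baseline: Snapshot, current: Snapshot) -> List[dict]:
    """Classify each subject as added, removed, or modified; skip unchanged ones."""
    changes = []
    # Sorting the union of keys makes the output order deterministic.
    for key in sorted(baseline.keys() | current.keys()):
        before, after = baseline.get(key), current.get(key)
        if before is None:
            change_type = "added"
        elif after is None:
            change_type = "removed"
        elif before != after:
            change_type = "modified"
        else:
            continue
        changes.append({
            "subject_type": key[0],
            "subject_external_id": key[1],
            "change_type": change_type,
            "baseline_hash": before,
            "current_hash": after,
        })
    return changes
```

Because the result depends only on the two snapshots, rerunning it for the same baseline and current run yields the same changes, and therefore the same fingerprints.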
## Non-Functional Requirements
- NFR1: Drift generation must be deterministic for the same baseline run, current run, and `scope_key`.
- NFR2: Drift must remain tenant-scoped and safe to display.
- NFR3: Evidence minimization
  - `evidence_jsonb` must be sanitized (no tokens/secrets) and kept small.
  - MVP drift evidence should include only:
    - `change_type`
    - `changed_fields` / metadata summary (counts, field list)
    - run refs (`baseline_run_id` / `current_run_id`, timestamps)
  - No raw payload dumps.
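
A minimal sketch of evidence that satisfies NFR3, assuming the normalized comparison data is available as flat dictionaries. The function name and the `changed_field_count` key are hypothetical. Run timestamps, which the spec also allows, are left out for brevity and could sit alongside the run ids.

```python
from typing import Optional


def build_evidence(change_type: str,
                   baseline: Optional[dict],
                   current: Optional[dict],
                   baseline_run_id: int,
                   current_run_id: int) -> dict:
    """Record which fields changed, never their values."""
    before, after = baseline or {}, current or {}
    changed_fields = sorted(
        k for k in before.keys() | after.keys() if before.get(k) != after.get(k)
    )
    return {
        "change_type": change_type,
        "changed_fields": changed_fields,            # field names only
        "changed_field_count": len(changed_fields),
        "baseline_run_id": baseline_run_id,
        "current_run_id": current_run_id,
    }
```

Storing field names without values keeps the evidence small and avoids copying payload content, including anything sensitive, into the finding.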
## Dependencies / Name Resolution
- Drift/Audit UI should resolve labels via Inventory + Foundations (047) + Groups Cache (051) where applicable.
- No render-time Graph calls (Graph only in background sync/jobs, never in UI render).
## Success Criteria
- SC1: Admins can identify drift across supported types (Policies + Assignments) in under 3 minutes.
- SC2: Drift results are consistent across repeated generation for the same baseline and current run.
## Out of Scope
- Automatic revert/promotion.
- A rule engine (planned for Audit later); the data model is already prepared for it via `rule_id` / `control_id` / `expected_value`.
## Future Work (non-MVP)
- Security Suite Audits: add rule-based generators that write Findings (no new Finding model).
- Cross-tenant Compare: may write Findings (`finding_type=compare`) or emit a compatible format that can be stored as Findings.
## Related Specs
- Program: `specs/039-inventory-program/spec.md`
- Core: `specs/040-inventory-core/spec.md`
- Compare: `specs/043-cross-tenant-compare-and-promotion/spec.md`