TenantAtlas/specs/044-drift-mvp/spec.md
2026-01-13 23:28:02 +01:00

Feature Specification: Drift MVP

Feature Branch: feat/044-drift-mvp
Created: 2026-01-07
Status: Draft

Purpose

Detect and report drift between expected and observed states using inventory and run metadata.

This MVP focuses on reporting and triage, not automatic remediation.

Clarifications

Session 2026-01-12

  • Q: How should Drift pick the baseline run for a given tenant + scope? → A: Baseline = previous successful inventory run for the same scope; compare against the latest successful run.
  • Q: Should Drift findings be persisted or computed on demand? → A: Persist findings in DB per comparison (baseline_run_id + current_run_id), including a deterministic fingerprint for stable identity + triage.
  • Q: How is the fingerprint (stable ID) for a drift finding defined? → A: sha256(tenant_id + scope_key + subject_type + subject_external_id + change_type + baseline_hash + current_hash) (normalized; excludes volatile fields).
  • Q: Which inventory entities/types are in scope for Drift MVP? → A: Policies + Assignments.
  • Q: When should drift findings be generated? → A: On demand when opening Drift: if findings for (baseline, current, scope) don't exist yet, dispatch an async job to generate them.

Session 2026-01-13

  • Q: What should Drift do if there are fewer than two successful inventory runs for the same scope_key? → A: Show a blocked/empty state (“Need at least 2 successful runs for this scope to calculate drift”) and do not dispatch drift generation.
  • Q: Should acknowledgement carry forward across comparisons? → A: No; acknowledgement is per comparison (baseline_run_id + current_run_id + scope_key). The same drift may re-appear as new in later comparisons.
  • Q: Which change_type values are supported in Drift MVP? → A: added, removed, modified (assignment target/intent changes are covered under modified).
  • Q: What is the default UI behavior for new vs acknowledged findings? → A: Default UI shows only new; acknowledged is accessible via an explicit filter.
  • Q: What should the UI do if drift generation fails for a comparison? → A: Show an explicit error state (safe message + reference/run ids) and do not show findings for that comparison until a successful generation exists.
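The three change_type values clarified above can be illustrated with a small set comparison. This is a minimal sketch, not the actual generator: it assumes each run can be reduced to a mapping from subject_external_id to a hash of the subject's normalized content, and the function name classify_changes is hypothetical.

```python
from typing import Dict, List, Tuple


def classify_changes(
    baseline: Dict[str, str],
    current: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return (subject_external_id, change_type) pairs, sorted by id.

    Both arguments map an external id to a hash of the subject's
    normalized content (an assumed shape, not defined by this spec).
    """
    changes: List[Tuple[str, str]] = []
    # Sorting the union of ids keeps the output order deterministic.
    for subject_id in sorted(baseline.keys() | current.keys()):
        if subject_id not in baseline:
            changes.append((subject_id, "added"))
        elif subject_id not in current:
            changes.append((subject_id, "removed"))
        elif baseline[subject_id] != current[subject_id]:
            # Assignment target/intent changes also land here, because they
            # change the subject's content hash.
            changes.append((subject_id, "modified"))
    return changes
```

Subjects whose hashes are identical in both runs produce no entry, so an unchanged tenant yields an empty list.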

Pinned Decisions (MVP defaults)

  • Drift is implemented as a generator that writes persisted Finding rows (not only an in-memory/on-demand diff).
  • Baseline selection: baseline = previous successful inventory run for the same scope_key; comparison = latest successful inventory run for the same scope_key.
  • Scope is first-class via scope_key and must be deterministic to support future pinned baselines and compare workflows.
  • Fingerprints are deterministic and stable for triage/audit workflows.
  • Drift MVP only uses finding_type=drift and status in {new, acknowledged}.
  • Default severity: medium (until a rule engine exists).
  • UI must not perform render-time Graph calls. Graph access (if any) is limited to background sync/jobs.
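The baseline selection rule pinned above could look roughly like the following. It is a sketch under stated assumptions: the InventoryRun shape, the "success" status value, and ordering by finished_at are all illustrative and not confirmed by this spec.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class InventoryRun:
    # Hypothetical minimal view of an inventory sync run.
    id: int
    scope_key: str
    status: str  # assumed values: "success", "failed", ...
    finished_at: datetime


def select_comparison(
    runs: Iterable[InventoryRun],
    scope_key: str,
) -> Optional[Tuple[InventoryRun, InventoryRun]]:
    """Return (baseline, current) for a scope, or None if drift is blocked."""
    successful = sorted(
        (r for r in runs if r.scope_key == scope_key and r.status == "success"),
        key=lambda r: r.finished_at,
    )
    if len(successful) < 2:
        # Fewer than two successful runs: no comparison is possible.
        return None
    # Baseline = previous successful run, current = latest successful run.
    return successful[-2], successful[-1]
```

Returning None rather than raising keeps the "fewer than two runs" case an ordinary, expected state that the UI can render as blocked/empty.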

Key Entities / Generic Findings (Future-proof)

Finding (generic)

We want Drift MVP to remain MVP-sized, while making it easy to add future generators (Security Suite Audits, Cross-tenant Compare) without inventing a new model.

Rationale:

  • Drift = delta engine over runs.

  • Audit = rule engine over inventory.

  • Both write Findings with the same semantics: deterministic fingerprint + triage + minimized evidence.

Fields:

  • finding_type (enum): drift (MVP); later audit, compare

  • tenant_id

  • scope_key (string): deterministic scope identifier (see Scope Definition / FR1)

  • baseline_run_id (nullable; e.g. unset for audit/compare findings)

  • current_run_id (nullable; e.g. unset for audit findings)

  • fingerprint (string): deterministic; unique per tenant+scope+subject+change

  • subject_type (string): e.g. policy type (or other inventory entity type)

  • subject_external_id (string): Graph external id

  • severity (enum): low / medium / high (MVP default: medium)

  • status (enum): new / acknowledged (later: snoozed / assigned / commented)

  • acknowledged_at (nullable)

  • acknowledged_by_user_id (nullable)

  • evidence_jsonb (jsonb): sanitized, small, secrets-free (no raw payload dumps)

  • Optional/nullable for later (prepared; out of MVP): rule_id, control_id, expected_value, source

MVP implementation scope: only finding_type=drift, statuses new/acknowledged, and no rule engine.
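For orientation, the field list above could map onto a record type along these lines. This is only a sketch: the Python types (for example, integer run and user ids) are assumptions, and the real persistence model is not defined here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional


class FindingType(str, Enum):
    DRIFT = "drift"      # the only type used in the MVP
    AUDIT = "audit"      # later
    COMPARE = "compare"  # later


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Status(str, Enum):
    NEW = "new"
    ACKNOWLEDGED = "acknowledged"


@dataclass
class Finding:
    finding_type: FindingType
    tenant_id: str
    scope_key: str
    fingerprint: str
    subject_type: str
    subject_external_id: str
    baseline_run_id: Optional[int] = None  # id type is an assumption
    current_run_id: Optional[int] = None   # id type is an assumption
    severity: Severity = Severity.MEDIUM   # MVP default
    status: Status = Status.NEW
    acknowledged_at: Optional[datetime] = None
    acknowledged_by_user_id: Optional[int] = None
    evidence_jsonb: Dict[str, Any] = field(default_factory=dict)
    # Prepared for later generators; unused in the MVP.
    rule_id: Optional[str] = None
    control_id: Optional[str] = None
    expected_value: Optional[str] = None
    source: Optional[str] = None
```

Keeping the later-only fields optional means an audit or compare generator can reuse the same record without a schema change.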

User Scenarios & Testing

Scenario 1: View drift summary

  • Given inventory sync has completed successfully at least twice for the same scope_key

  • When the admin opens Drift

  • Then they see a summary of changes between the baseline run and the latest successful run

  • If there are fewer than two successful runs for the same scope_key, Drift shows a blocked/empty state and does not start drift generation.
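The states the Drift page can be in when it is opened (blocked, generating, failed, ready) can be sketched as a small pure function. The state names and the generation_status values used here ("failed", "succeeded", or None when nothing has been generated yet) are hypothetical and chosen only for illustration.

```python
from enum import Enum
from typing import Optional


class DriftViewState(Enum):
    BLOCKED = "blocked"        # fewer than two successful runs
    GENERATING = "generating"  # async job dispatched or still running
    FAILED = "failed"          # explicit error state, no stale results
    READY = "ready"            # findings exist for this comparison


def resolve_view_state(
    successful_run_count: int,
    generation_status: Optional[str],
) -> DriftViewState:
    """Decide what the Drift page shows for one scope_key."""
    if successful_run_count < 2:
        # Never dispatch generation without a baseline and a current run.
        return DriftViewState.BLOCKED
    if generation_status is None:
        # No findings yet for (baseline, current, scope): the caller
        # dispatches the async generation job and shows a pending state.
        return DriftViewState.GENERATING
    if generation_status == "failed":
        return DriftViewState.FAILED
    if generation_status == "succeeded":
        return DriftViewState.READY
    # Any other status (e.g. queued/running) is treated as in progress.
    return DriftViewState.GENERATING
```

Checking the run count first guarantees that a stale generation record can never unblock a scope that lacks two successful runs.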

Scenario 2: Drill into a drift finding

  • Given a drift finding exists
  • When the admin opens the finding
  • Then they see what changed, when, and which run observed it

Scenario 3: Acknowledge/triage

  • Given a drift finding exists

  • When the admin marks it acknowledged

  • Then it is hidden from “new” lists but remains auditable

  • Acknowledgement is per comparison; later comparisons may still surface the same drift as new.
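Acknowledgement and the default "new only" list could be sketched as follows. FindingRow is a hypothetical stand-in for a persisted finding, and both function names are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class FindingRow:
    # Hypothetical, reduced view of a persisted finding.
    fingerprint: str
    status: str = "new"
    acknowledged_at: Optional[datetime] = None
    acknowledged_by_user_id: Optional[int] = None


def acknowledge(
    finding: FindingRow,
    user_id: int,
    now: Optional[datetime] = None,
) -> None:
    """Mark a finding acknowledged; the row is kept, never deleted."""
    finding.status = "acknowledged"
    finding.acknowledged_by_user_id = user_id
    finding.acknowledged_at = now or datetime.now(timezone.utc)


def visible_findings(
    findings: Iterable[FindingRow],
    include_acknowledged: bool = False,
) -> List[FindingRow]:
    """Default view shows only new findings; the filter widens it."""
    return [
        f for f in findings
        if include_acknowledged or f.status == "new"
    ]
```

Because acknowledgement is stored on the row for one comparison, a later comparison that produces the same fingerprint starts again with status new.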

Functional Requirements

  • FR1: Baseline + scope

    • Define scope_key as the deterministic Inventory selection identifier.
      • MVP definition: scope_key = InventorySyncRun.selection_hash.
      • Rationale: selection hashing already normalizes equivalent selections; reusing it keeps drift scope stable and consistent across the product.
    • Baseline run (MVP) = previous successful inventory run for the same scope_key.
    • Comparison run (MVP) = latest successful inventory run for the same scope_key.
  • FR2: Finding generation (Drift MVP)

    • Findings are persisted per (baseline_run_id, current_run_id, scope_key).
    • Findings cover adds, removals, and changes for supported entities (Policies + Assignments).
    • MVP change_type values: added, removed, modified.
    • Findings are deterministic: same baseline/current + scope_key ⇒ same set of fingerprints.
    • If fewer than two successful inventory runs exist for a given scope_key, Drift does not generate findings and must surface a clear blocked/empty state in the UI.
  • FR2a: Fingerprint definition (MVP)

    • Fingerprint = sha256(tenant_id + scope_key + subject_type + subject_external_id + change_type + baseline_hash + current_hash).
    • baseline_hash / current_hash are hashes over normalized, sanitized comparison data (exclude volatile fields like timestamps).
    • Goal: stable identity for triage + audit compatibility.
  • FR2b: Drift MVP scope includes Policies and their Assignments.

    • Assignment drift includes target changes (e.g., groupId) and intent changes.
  • FR3: Provide Drift UI with summary and details.

    • Lists and the Drift landing summary show only status=new by default.
    • The UI must provide a filter to include acknowledged findings.
    • If drift generation fails for a comparison, the UI must surface an explicit error state (no secrets), including reference identifiers (e.g., run ids), and must not fall back to stale/previous results.
  • FR4: Triage (MVP)

    • Admin can acknowledge a finding; record acknowledged_by_user_id + acknowledged_at.
    • Acknowledgement does not carry forward across comparisons in the MVP.
    • Findings are never deleted in the MVP.
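The fingerprint from FR2a could be computed roughly as shown below. The spec writes the inputs joined with "+"; this sketch assumes a length-prefixed encoding instead of plain concatenation so that adjacent fields cannot run together (for example "ab" + "c" versus "a" + "bc"). The whitespace trimming and the use of an empty string for a missing hash (added or removed subjects) are likewise assumptions, not part of the definition.

```python
import hashlib


def drift_fingerprint(
    tenant_id: str,
    scope_key: str,
    subject_type: str,
    subject_external_id: str,
    change_type: str,
    baseline_hash: str,
    current_hash: str,
) -> str:
    """Return a deterministic sha256 hex digest for one drift finding."""
    parts = [
        tenant_id,
        scope_key,
        subject_type,
        subject_external_id,
        change_type,
        baseline_hash,  # assumed "" when the subject was added
        current_hash,   # assumed "" when the subject was removed
    ]
    # Assumed normalization: coerce to str and trim surrounding whitespace.
    normalized = [str(p).strip() for p in parts]
    # Length-prefix every part so field boundaries stay unambiguous.
    canonical = "".join(f"{len(p)}:{p}" for p in normalized)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Volatile values such as timestamps never enter the digest directly; they are expected to be excluded earlier, when baseline_hash and current_hash are computed.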

Non-Functional Requirements

  • NFR1: Drift generation must be deterministic for the same baseline run, current run, and scope_key.
  • NFR2: Drift must remain tenant-scoped and safe to display.
  • NFR3: Evidence minimization
    • evidence_jsonb must be sanitized (no tokens/secrets) and kept small.
    • MVP drift evidence should include only:
      • change_type
      • changed_fields / metadata summary (counts, field list)
      • run refs (baseline_run_id/current_run_id, timestamps)
    • No raw payload dumps.
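A minimized evidence payload in the spirit of NFR3 might be assembled as follows. The key names inside the dictionary and the build_evidence helper are illustrative assumptions; the point is that only field names, counts, and run references are stored, never field values.

```python
from typing import Any, Dict, Iterable

ALLOWED_CHANGE_TYPES = {"added", "removed", "modified"}


def build_evidence(
    change_type: str,
    changed_fields: Iterable[str],
    baseline_run_id: int,
    current_run_id: int,
    baseline_finished_at: str,
    current_finished_at: str,
) -> Dict[str, Any]:
    """Build a small, secrets-free evidence dict for one finding."""
    if change_type not in ALLOWED_CHANGE_TYPES:
        raise ValueError(f"unsupported change_type: {change_type!r}")
    # Store field names only (deduplicated and sorted), never the values.
    fields = sorted(set(changed_fields))
    return {
        "change_type": change_type,
        "changed_fields": fields,
        "changed_field_count": len(fields),
        "runs": {
            "baseline_run_id": baseline_run_id,
            "current_run_id": current_run_id,
            "baseline_finished_at": baseline_finished_at,
            "current_finished_at": current_finished_at,
        },
    }
```

Building the payload from an explicit set of keys, rather than filtering a raw payload, makes it harder for secrets to slip in by accident.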

Dependencies / Name Resolution

  • Drift/Audit UI should resolve labels via Inventory + Foundations (047) + Groups Cache (051) where applicable.
  • No render-time Graph calls (Graph only in background sync/jobs, never in UI render).

Success Criteria

  • SC1: Admins can identify drift across supported types (Policies + Assignments) in under 3 minutes.
  • SC2: Drift results are consistent across repeated generation for the same baseline/current comparison and scope_key.

Out of Scope

  • Automatic revert/promotion.
  • A rule engine (Audit comes later); the data model is already prepared for it via the nullable rule_id / control_id / expected_value fields.

Future Work (non-MVP)

  • Security Suite Audits: add rule-based generators that write Findings (no new Finding model).
  • Cross-tenant Compare: may write Findings (finding_type=compare) or emit a compatible format that can be stored as Findings.

Related Specs

  • Program: specs/039-inventory-program/spec.md
  • Core: specs/040-inventory-core/spec.md
  • Compare: specs/043-cross-tenant-compare-and-promotion/spec.md