Research — Secret Redaction Hardening & Snapshot Data Integrity (Spec 120)
This document records the design choices for the reduced Spec 120 scope after removing the pre-go-live legacy-data remediation workflow.
Decisions
1) Central classification authority
- Decision: Introduce one shared secret-classification service that evaluates protected fields by exact field name plus canonical path, and reuse it across snapshot protection, audit sanitization, verification sanitization, and ops failure sanitization.
- Rationale: The codebase had multiple substring-based sanitizers, which redact any key that merely contains a sensitive word. Spec 120 requires one authority so that safe configuration fields such as passwordMinimumLength remain visible while true secrets stay protected.
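A minimal sketch of the difference, written in Python for illustration only. It assumes a classifier that protects a field when its exact name is on a name list or its full JSON Pointer is on a path list; PROTECTED_FIELD_NAMES, PROTECTED_POINTERS, and both function names are hypothetical, not the actual Spec 120 ruleset.

```python
# Hypothetical ruleset: exact, case-sensitive field names that are always secret.
PROTECTED_FIELD_NAMES = frozenset({"password", "clientSecret", "preSharedKey"})

# Hypothetical path rule: a generic name such as "key" is secret only at this pointer.
PROTECTED_POINTERS = frozenset({"/wifi/security/key"})


def is_protected(field_name: str, pointer: str) -> bool:
    """Exact-name or exact-path match; no substring matching at all."""
    return field_name in PROTECTED_FIELD_NAMES or pointer in PROTECTED_POINTERS


def is_protected_substring(field_name: str) -> bool:
    """The broad style being replaced: any key that contains a sensitive word."""
    lowered = field_name.lower()
    return any(word in lowered for word in ("password", "secret", "key"))


# passwordMinimumLength is hidden only by the substring rule.
print(is_protected_substring("passwordMinimumLength"))  # True (wrongly hidden)
print(is_protected("passwordMinimumLength", "/passwordMinimumLength"))  # False
```

Under these assumptions the name "key" is protected at /wifi/security/key but stays visible at any other path, which neither a name-only nor a substring rule can express.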
2) Canonical protected-path format
- Decision: Represent protected locations as source-bucketed RFC 6901 JSON Pointers, stored under the secret_fingerprints buckets snapshot, assignments, and scope_tags.
- Rationale: JSON Pointer is deterministic, array-safe, and avoids ambiguity between object keys and numeric list indexes.
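The encoding rules are small enough to show directly. This Python sketch assumes payloads are already-decoded JSON (dicts, lists, scalars); escape_token and leaf_pointers are hypothetical helper names. Per RFC 6901, "~" is written as "~0", "/" as "~1", and list elements are addressed by zero-based index.

```python
from typing import Any, Iterator, Tuple


def escape_token(token: str) -> str:
    """Escape one reference token per RFC 6901."""
    # "~" must be replaced first; otherwise the "~" introduced by "~1" would be escaped again.
    return token.replace("~", "~0").replace("/", "~1")


def leaf_pointers(value: Any, pointer: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (json_pointer, scalar) for every scalar leaf of a decoded JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from leaf_pointers(child, f"{pointer}/{escape_token(str(key))}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            # Array elements are addressed by their zero-based decimal index.
            yield from leaf_pointers(child, f"{pointer}/{index}")
    else:
        yield pointer, value


print(dict(leaf_pointers({"a/b": 1, "m~n": [True]})))
# {'/a~1b': 1, '/m~0n/0': True}
```

In the bucketed layout a pointer is only meaningful together with its bucket, so the same pointer string under snapshot and under assignments refers to two different locations.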
3) Single ownership of persisted snapshot protection
- Decision: Make VersionService::captureVersion() the sole write-time owner of protected snapshot generation.
- Rationale: VersionService is the final PolicyVersion persistence boundary. Removing duplicate masking from PolicyCaptureOrchestrator eliminates double redaction and ensures that dedupe and version-creation decisions use the same protected result.
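One way duplicate masking can go wrong is sketched below in Python, under the assumption that every masking pass also fingerprints what it receives. If an upstream pass has already replaced the secret with [REDACTED], the second pass fingerprints the placeholder, so two different secrets yield the same fingerprint. protect(), DEMO_KEY, and the flat payload shape are hypothetical simplifications.

```python
import hashlib
import hmac

PLACEHOLDER = "[REDACTED]"
DEMO_KEY = b"demo-key"  # hypothetical key, for illustration only


def protect(payload: dict[str, str], secret_fields: set[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Mask secret fields and return (masked_payload, fingerprints)."""
    masked: dict[str, str] = {}
    fingerprints: dict[str, str] = {}
    for name, value in payload.items():
        if name in secret_fields:
            # Fingerprint whatever value this pass receives, then mask it.
            fingerprints[name] = hmac.new(DEMO_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
            masked[name] = PLACEHOLDER
        else:
            masked[name] = value
    return masked, fingerprints


secrets = {"password"}

# Single owner: the raw secret reaches the fingerprinting step.
once_old = protect({"password": "old"}, secrets)[1]
once_new = protect({"password": "new"}, secrets)[1]
print(once_old != once_new)  # True: the secret-only change is visible

# Double masking: the second pass only ever sees the placeholder.
twice_old = protect(protect({"password": "old"}, secrets)[0], secrets)[1]
twice_new = protect(protect({"password": "new"}, secrets)[0], secrets)[1]
print(twice_old == twice_new)  # True: the change signal is lost
```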
4) Protected snapshot persistence contract
- Decision: Persist protected values as [REDACTED], store the ruleset marker in policy_versions.redaction_version, and store path-keyed HMAC digests in policy_versions.secret_fingerprints.
- Rationale: The placeholder preserves JSON shape for downstream consumers, while dedicated columns keep the change signal and contract version out of generic metadata.
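A Python sketch of the write-side contract, covering the snapshot bucket only. It assumes the classifier and the fingerprint function are supplied by the caller; ProtectedVersion, protect_snapshot, and the value of REDACTION_VERSION are hypothetical. It illustrates that every protected leaf keeps its key and position, so consumers walking the JSON see the same shape with a placeholder string in place of the secret.

```python
from dataclasses import dataclass
from typing import Any, Callable

PLACEHOLDER = "[REDACTED]"
REDACTION_VERSION = 1  # hypothetical ruleset marker value


@dataclass
class ProtectedVersion:
    """Hypothetical in-memory view of the policy_versions columns involved."""
    snapshot: Any
    redaction_version: int
    secret_fingerprints: dict[str, dict[str, str]]


def protect_snapshot(
    snapshot: Any,
    is_secret: Callable[[str], bool],
    fingerprint: Callable[[str, Any], str],
) -> ProtectedVersion:
    """Replace secret leaves with the placeholder, keeping the JSON shape intact."""
    digests: dict[str, str] = {}

    def walk(value: Any, pointer: str) -> Any:
        if isinstance(value, dict):
            # RFC 6901 escaping for object keys: "~" first, then "/".
            return {
                k: walk(v, pointer + "/" + k.replace("~", "~0").replace("/", "~1"))
                for k, v in value.items()
            }
        if isinstance(value, list):
            return [walk(v, f"{pointer}/{i}") for i, v in enumerate(value)]
        if is_secret(pointer):
            digests[pointer] = fingerprint(pointer, value)
            return PLACEHOLDER
        return value

    masked = walk(snapshot, "")
    return ProtectedVersion(masked, REDACTION_VERSION, {"snapshot": digests})


row = protect_snapshot(
    {"wifi": {"ssid": "corp", "preSharedKey": "hunter2"}},
    is_secret=lambda p: p == "/wifi/preSharedKey",
    fingerprint=lambda p, v: "digest:" + p,  # stand-in; a real digest would be an HMAC
)
print(row.snapshot)  # {'wifi': {'ssid': 'corp', 'preSharedKey': '[REDACTED]'}}
```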
5) Fingerprint derivation strategy
- Decision: Use HMAC-SHA256 with a signing key derived from the app key and the stable workspace_id, then compute the HMAC over the tuple (source_bucket, json_pointer, normalized_secret_value).
- Rationale: Deriving the key per workspace satisfies the workspace-isolation requirement: fingerprints stay deterministic inside one workspace but cannot be correlated across workspaces.
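A Python sketch of the derivation. The key-derivation label, the JSON-based normalization, the integer workspace id, and the tuple encoding are assumptions made for illustration; the decision above fixes only HMAC-SHA256, the two key inputs, and the three tuple fields.

```python
import hashlib
import hmac
import json
from typing import Any


def derive_workspace_key(app_key: bytes, workspace_id: int) -> bytes:
    """Hypothetical derivation: an HMAC of a labelled workspace id under the app key."""
    label = f"secret-fingerprint:v1:workspace:{workspace_id}".encode("utf-8")
    return hmac.new(app_key, label, hashlib.sha256).digest()


def normalize(value: Any) -> str:
    """Hypothetical normalization: canonical JSON, so 1, "1", and true stay distinct."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def fingerprint(workspace_key: bytes, bucket: str, pointer: str, value: Any) -> str:
    """HMAC-SHA256 over the (source_bucket, json_pointer, normalized_secret_value) tuple."""
    # Encoding the tuple as a JSON array keeps field boundaries unambiguous;
    # naive concatenation would let ("ab", "c") collide with ("a", "bc").
    message = json.dumps([bucket, pointer, normalize(value)], separators=(",", ":"), ensure_ascii=False)
    return hmac.new(workspace_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


app_key = b"example-app-key"  # stand-in for the application key
ws1 = derive_workspace_key(app_key, 1)
ws2 = derive_workspace_key(app_key, 2)

same_ws = fingerprint(ws1, "snapshot", "/wifi/preSharedKey", "hunter2")
print(same_ws == fingerprint(ws1, "snapshot", "/wifi/preSharedKey", "hunter2"))  # True
print(same_ws == fingerprint(ws2, "snapshot", "/wifi/preSharedKey", "hunter2"))  # False
```

Including the bucket and pointer in the message means an identical secret stored at two locations produces two unrelated digests, so the fingerprint map does not reveal that values are shared.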
6) Fingerprinting scope and version identity
- Decision: Apply the protected contract consistently to all persisted protected payload buckets: snapshot, assignments, and scope_tags. Version identity must incorporate both the visible protected payload and the fingerprint map so that secret-only changes create a new PolicyVersion.
- Rationale: If dedupe ignores secret_fingerprints, secret-only changes collapse into one version and FR-120-007 fails.
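A Python sketch of a dedupe key that meets this rule. It assumes identity is a SHA-256 over canonical JSON of all three protected buckets plus the fingerprint map; canonical_json, version_identity, and the digest strings are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonical_json(value: Any) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def version_identity(protected_payload: dict[str, Any], secret_fingerprints: dict[str, dict[str, str]]) -> str:
    """Hypothetical dedupe key covering the visible payload and the fingerprint map."""
    material = {"payload": protected_payload, "fingerprints": secret_fingerprints}
    return hashlib.sha256(canonical_json(material)).hexdigest()


payload = {"snapshot": {"wifi": {"preSharedKey": "[REDACTED]"}}, "assignments": [], "scope_tags": []}
before = {"snapshot": {"/wifi/preSharedKey": "digest-a"}}
after = {"snapshot": {"/wifi/preSharedKey": "digest-b"}}  # only the secret changed

print(version_identity(payload, before) != version_identity(payload, after))  # True: new PolicyVersion
```

The visible payload is identical in both captures, so any key computed from the payload alone would be equal for both; that is exactly the collapse the rationale warns about.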
7) Output readability and integrity messaging
- Decision: Protected-value messaging remains text-first on existing viewers and export surfaces. The product explains that protected values were intentionally hidden, but it does not ship a dedicated historical-data remediation workflow.
- Rationale: Production starts with fresh compliant data, so the feature only needs to explain current protected behavior, not historical repair.
8) Regression strategy
- Decision: Replace substring-match regression expectations with a corpus-based test matrix covering safe fields, true secrets, secret-only version changes, audit/verification readability, and notification/export behavior.
- Rationale: The existing suite asserted the old, broken behavior. Phase 1 needs tests that lock in exact-name and path-based classification and block new broad substring redactors.
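A minimal Python sketch of the corpus idea, limited to the classification slice of the matrix (secret-only version changes, audit and verification readability, and notification and export behavior would need their own cases). The corpus entries, Case, and both classifiers are hypothetical. Running the same corpus against a substring classifier shows how the matrix blocks a new broad redactor: it fails on safe fields that merely contain a sensitive word.

```python
from typing import Callable, List, NamedTuple


class Case(NamedTuple):
    field_name: str
    pointer: str
    expect_protected: bool


# Hypothetical corpus: safe fields that contain sensitive words, plus true secrets.
CORPUS = [
    Case("passwordMinimumLength", "/passwordMinimumLength", False),
    Case("passwordRequired", "/passwordRequired", False),
    Case("keySize", "/certificate/keySize", False),
    Case("password", "/password", True),
    Case("clientSecret", "/auth/clientSecret", True),
]


def failures(classify: Callable[[str, str], bool]) -> List[Case]:
    """Return every corpus case the classifier gets wrong."""
    return [c for c in CORPUS if classify(c.field_name, c.pointer) != c.expect_protected]


def exact_classifier(field_name: str, pointer: str) -> bool:
    # pointer is accepted for signature parity; this toy rule uses exact names only.
    return field_name in {"password", "clientSecret"}


def substring_classifier(field_name: str, pointer: str) -> bool:
    # The broad style the matrix should reject.
    return any(w in field_name.lower() for w in ("password", "secret", "key"))


print(failures(exact_classifier))  # []
print([c.field_name for c in failures(substring_classifier)])
# ['passwordMinimumLength', 'passwordRequired', 'keySize']
```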
Repo Facts Used
- PolicySnapshotRedactor previously used broad regex patterns and was invoked in both PolicyCaptureOrchestrator and VersionService.
- AuditContextSanitizer, VerificationReportSanitizer, and RunFailureSanitizer all contained substring-based protection logic.
- policy_versions already stores immutable snapshot evidence consumed by drift, compare, and restore flows.
- Pre-go-live data is disposable for this product rollout, so no supported legacy-data remediation workflow is required in this feature.