TenantAtlas/specs/154-finding-risk-acceptance/research.md

# Research: Finding Risk Acceptance Lifecycle

## Decision 1: Use a dedicated tenant-owned exception aggregate instead of overloading `Finding.closed_reason`

**Decision**: Introduce a dedicated `FindingException` aggregate as the tenant-owned governance record for accepted risk, rather than continuing to encode risk acceptance purely as `Finding.status = risk_accepted` plus `closed_reason`.

**Rationale**: The existing finding model already exposes `risk_accepted` as a terminal status, but the handover and roadmap explicitly identify the absence of a formal exception entity as the product gap. A dedicated aggregate lets the system track request, approval, rejection, renewal, revocation, expiry, accountable owner, and linked evidence without distorting the meaning of generic finding workflow fields.

**Alternatives considered**:
- Reuse `Finding.closed_reason` and audit metadata only: rejected because it cannot represent a durable approval lifecycle or one current valid exception per finding.
- Create a generic cross-domain waiver engine immediately: rejected because the spec is intentionally bounded to finding-specific exceptions in the first rollout.
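As a rough illustration of the "one current valid exception per finding" rule, the sketch below models the aggregate in memory. The status names, field names, and the `ExceptionRegistry` class are hypothetical stand-ins, not the repo's actual schema. A real implementation would more likely enforce the rule with a database constraint.

```python
from dataclasses import dataclass
from enum import Enum


class ExceptionStatus(Enum):
    # Assumed status set; the spec may name these differently.
    PENDING = "pending"
    ACTIVE = "active"
    REJECTED = "rejected"
    REVOKED = "revoked"
    EXPIRED = "expired"


# Assumption: pending and active exceptions both count as "current".
CURRENT_STATUSES = {ExceptionStatus.PENDING, ExceptionStatus.ACTIVE}


@dataclass
class FindingException:
    tenant_id: int
    finding_id: int
    owner_id: int  # accountable owner
    status: ExceptionStatus = ExceptionStatus.PENDING


class ExceptionRegistry:
    """In-memory stand-in for the exception table and its uniqueness rule."""

    def __init__(self) -> None:
        self._rows: list[FindingException] = []

    def request(self, tenant_id: int, finding_id: int, owner_id: int) -> FindingException:
        # Refuse a second current exception for the same tenant-owned finding.
        for row in self._rows:
            same_finding = (row.tenant_id, row.finding_id) == (tenant_id, finding_id)
            if same_finding and row.status in CURRENT_STATUSES:
                raise ValueError("finding already has a current exception")
        row = FindingException(tenant_id, finding_id, owner_id)
        self._rows.append(row)
        return row
```

Scoping the check by `(tenant_id, finding_id)` keeps the record tenant-owned: the same finding ID in another tenant is treated as a different finding.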

## Decision 2: Preserve history via append-only decision records under one exception root

**Decision**: Model exception history as one root `FindingException` record with append-only `FindingExceptionDecision` child records for request, approval, rejection, renewal, and revocation decisions.

**Rationale**: The product needs both a stable current-state record for efficient tenant and canonical queries and a durable history that survives renewals and revocations. A root aggregate with child decisions avoids rewriting old decisions, keeps list queries fast, and aligns with the repo's audit-first lifecycle design.

**Alternatives considered**:
- Create a new top-level exception row for every renewal: rejected because current-state lookup and canonical queue filtering become noisier and require additional dedupe logic.
- Store all history only in `AuditLog`: rejected because lifecycle state and validity queries would depend on replaying historical events instead of reading domain state.
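The root-plus-decisions shape can be sketched as follows. This is a minimal in-memory model under stated assumptions: the decision type strings, the `STATUS_AFTER` mapping, and the `record` method are illustrative names, and persistence is omitted entirely.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a recorded decision is never rewritten
class FindingExceptionDecision:
    decision_type: str
    actor_id: int
    decided_at: datetime
    expires_at: Optional[datetime] = None


# Assumed mapping from decision type to the root's resulting status.
STATUS_AFTER = {
    "requested": "pending",
    "approved": "active",
    "rejected": "rejected",
    "renewed": "active",
    "revoked": "revoked",
}


@dataclass
class FindingException:
    finding_id: int
    status: str = "pending"
    expires_at: Optional[datetime] = None
    _decisions: list = field(default_factory=list, repr=False)

    @property
    def decisions(self) -> tuple:
        # Expose history as an immutable view; callers cannot edit it.
        return tuple(self._decisions)

    def record(self, decision: FindingExceptionDecision) -> None:
        if decision.decision_type not in STATUS_AFTER:
            raise ValueError(f"unknown decision type: {decision.decision_type}")
        self._decisions.append(decision)  # append-only history
        self.status = STATUS_AFTER[decision.decision_type]  # current state on the root
        if decision.expires_at is not None:
            self.expires_at = decision.expires_at


# Usage: a request, an approval, and a renewal all live under one root.
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
exc = FindingException(finding_id=10)
exc.record(FindingExceptionDecision("requested", actor_id=7, decided_at=t0))
exc.record(FindingExceptionDecision("approved", 8, t0, expires_at=t0 + timedelta(days=90)))
exc.record(FindingExceptionDecision("renewed", 8, t0 + timedelta(days=80),
                                    expires_at=t0 + timedelta(days=180)))
```

List queries read `status` and `expires_at` from the root alone, while the decision tuple keeps the full trail, including the original 90-day approval that the renewal superseded.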

## Decision 3: Block self-approval by default in v1

**Decision**: Normal workflow blocks the requester from approving their own exception request. No self-approval override is included in the first slice.

**Rationale**: The spec requires approval-separation rules, and a default no-self-approval rule is the clearest governance baseline. It reduces ambiguity, simplifies policy design, and matches the product's least-privilege posture.

**Alternatives considered**:
- Allow self-approval for owners or managers: rejected because it weakens the governance signal and creates policy ambiguity in the first rollout.
- Introduce a special override capability immediately: rejected because it expands RBAC and exception policy complexity before the core workflow is proven.
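The separation rule itself reduces to a single guard. The sketch below assumes the request stores the requester's user ID; `SelfApprovalError` and `approve` are hypothetical names, and capability checks that would normally run alongside this guard are left out.

```python
class SelfApprovalError(Exception):
    """Raised when the requester tries to approve their own request."""


def approve(requested_by: int, approver_id: int) -> str:
    # v1 rule: no override path, so the check is unconditional.
    if approver_id == requested_by:
        raise SelfApprovalError("requester cannot approve their own exception request")
    return "approved"
```

Because v1 has no override capability, the guard takes no role or permission input. Adding an override later would mean widening this function's signature, which makes the policy change visible in review.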

## Decision 4: Keep normal exception decisions outside `OperationRun`

**Decision**: Request, approval, rejection, renewal, and revocation remain synchronous DB-backed mutations without a dedicated `OperationRun` in v1.

**Rationale**: These actions are local governance decisions, expected to complete quickly, and do not perform remote work. The constitution allows DB-only security-relevant actions to skip `OperationRun` as long as they remain auditable. Using `OperationRun` here would add operational surface area without adding observability value.

**Alternatives considered**:
- Use `OperationRun` for every exception decision: rejected because it runs against the repo's preference for keeping fast DB-only mutations out of long-running operation infrastructure.
- Add a scheduled reminder/expiry job in the first slice: rejected because the first release can satisfy reminder semantics through explicit expiring-state UI and canonical queue visibility.
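One way to surface an expiring state without a scheduled job is to derive it at read time from the stored expiry timestamp. The sketch below assumes a 14-day warning window and the state labels shown; both are illustrative choices, not values taken from the spec.

```python
from datetime import datetime, timedelta, timezone

# Assumed warning window; the spec does not fix a value.
EXPIRING_WINDOW = timedelta(days=14)


def display_state(expires_at: datetime, now: datetime,
                  window: timedelta = EXPIRING_WINDOW) -> str:
    """Derive the UI state from the expiry timestamp at read time."""
    if now >= expires_at:
        return "expired"
    if expires_at - now <= window:
        return "expiring"
    return "active"
```

Since the state is computed on every read, nothing has to run in the background for an exception to move from active to expiring to expired in list views and queues.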

## Decision 5: Link supporting evidence through structured references, not copied payloads

**Decision**: Exception records store structured evidence references such as `source_type`, `source_id`, `source_fingerprint`, and a small summary snapshot, following the evidence snapshot item pattern instead of embedding raw evidence payloads.

**Rationale**: The repo already uses fingerprinted and summarized evidence references in `EvidenceSnapshotItem` and review-pack generation. Reusing that pattern keeps exception history intelligible even when live artifacts change, while preserving data minimization.

**Alternatives considered**:
- Store raw evidence JSON directly on the exception: rejected because it increases payload size and risks leaking data better handled by the evidence domain.
- Store only foreign keys to live evidence records: rejected because history becomes opaque if referenced artifacts are later expired or superseded.
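A structured evidence reference might look like the sketch below. The field names follow the decision text, but the SHA-256 fingerprint over canonical JSON and the `reference_for` helper are assumptions. The repo's `EvidenceSnapshotItem` may compute fingerprints differently.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceReference:
    source_type: str
    source_id: str
    source_fingerprint: str
    summary: dict  # small snapshot, not the raw payload


def fingerprint(payload: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so that key order
    # does not change the hash. SHA-256 is an assumed choice.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reference_for(source_type: str, source_id: str, payload: dict,
                  summary_keys: list) -> EvidenceReference:
    # Copy only the whitelisted keys into the summary (data minimization).
    summary = {key: payload[key] for key in summary_keys if key in payload}
    return EvidenceReference(source_type, source_id, fingerprint(payload), summary)
```

If the live artifact later changes or expires, the stored fingerprint no longer matches and the summary still explains what the approver saw at decision time.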

## Decision 6: Derive risk governance validity from exception state, not from finding status alone

**Decision**: A finding counts as currently valid accepted risk only when it is linked to an active, unexpired, unrevoked exception. The finding's `risk_accepted` status alone is insufficient.

**Rationale**: This closes the core audit gap identified in the handover and allows evidence and reporting consumers to distinguish governed accepted risk from stale or unsupported states. The existing `FindingWorkflowService` remains the single mutation path for the finding status, but validity becomes a cross-record rule.

**Alternatives considered**:
- Treat `Finding.status = risk_accepted` as sufficient forever: rejected because it preserves the current governance gap.
- Automatically revert finding status when an exception expires: rejected because it mutates the finding lifecycle as a side effect and obscures historical operator intent.
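The cross-record rule can be expressed as a read-only predicate. In this sketch the `ExceptionState` fields, the `"active"` status string, and the function name are hypothetical. It also assumes that every exception carries an expiry timestamp and that `risk_accepted` on the finding remains a necessary precondition.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ExceptionState:
    status: str
    expires_at: datetime
    revoked_at: Optional[datetime] = None


def is_valid_accepted_risk(finding_status: str,
                           exception: Optional[ExceptionState],
                           now: datetime) -> bool:
    """True only for risk_accepted findings backed by a live exception."""
    if finding_status != "risk_accepted":
        return False
    if exception is None:
        return False  # status alone is insufficient
    if exception.status != "active":
        return False
    if exception.revoked_at is not None:
        return False
    return exception.expires_at > now
```

The predicate never writes to the finding, so an expired exception makes the finding report as not validly accepted while its `risk_accepted` status, and the operator intent it records, stays untouched.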