TenantAtlas/specs/154-finding-risk-acceptance/research.md
ahmido b1e1e06861 feat: implement finding risk acceptance lifecycle (#184)
## Summary
- add a first-class finding exception domain with request, approval, rejection, renewal, and revocation lifecycle support
- add tenant-scoped exception register, finding governance surfaces, and a canonical workspace approval queue in Filament
- add audit, badge, evidence, and review-pack integrations plus focused Pest coverage for workflow, authorization, and governance validity

## Validation
- vendor/bin/sail bin pint --dirty --format agent
- CI=1 vendor/bin/sail artisan test --compact
- manual integrated-browser smoke test for the request-exception happy path, tenant register visibility, and canonical queue visibility

## Notes
- Filament implementation remains on v5 with Livewire v4-compatible surfaces
- canonical queue lives in the admin panel; provider registration stays in bootstrap/providers.php
- finding exceptions stay out of global search in this rollout

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #184
2026-03-20 01:07:55 +00:00


# Research: Finding Risk Acceptance Lifecycle
## Decision 1: Use a dedicated tenant-owned exception aggregate instead of overloading `Finding.closed_reason`
**Decision**: Introduce a dedicated `FindingException` aggregate as the tenant-owned governance record for accepted risk, rather than continuing to encode risk acceptance purely as `Finding.status = risk_accepted` plus `closed_reason`.
**Rationale**: The existing finding model already exposes `risk_accepted` as a terminal status, but the handover and roadmap explicitly identify the absence of a formal exception entity as the product gap. A dedicated aggregate lets the system track request, approval, rejection, renewal, revocation, expiry, accountable owner, and linked evidence without distorting the meaning of generic finding workflow fields.
**Alternatives considered**:
- Reuse `Finding.closed_reason` and audit metadata only: rejected because it can neither represent a durable approval lifecycle nor guarantee a single current valid exception per finding.
- Create a generic cross-domain waiver engine immediately: rejected because the spec is intentionally bounded to finding-specific exceptions in the first rollout.
## Decision 2: Preserve history via append-only decision records under one exception root
**Decision**: Model exception history as one root `FindingException` record with append-only `FindingExceptionDecision` child records for request, approval, rejection, renewal, and revocation decisions.
**Rationale**: The product needs both a stable current-state record for efficient tenant and canonical queries and a durable history that survives renewals and revocations. A root aggregate with child decisions avoids rewriting old decisions, keeps list queries fast, and aligns with the repo's audit-first lifecycle design.
**Alternatives considered**:
- Create a new top-level exception row for every renewal: rejected because current-state lookup and canonical queue filtering become noisier and require additional deduplication logic.
- Store all history only in `AuditLog`: rejected because lifecycle state and validity queries would depend on replaying historical events instead of reading domain state.
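
The root-plus-children shape from Decision 2 can be sketched in a language-neutral way. The following is a minimal Python illustration, not the repo's implementation: the field names (`actor_id`, `decided_at`, `reason`), the status strings, and the `STATUS_AFTER` mapping are assumptions made for the example. It shows one root record whose current status is refreshed whenever a decision is appended, while earlier decisions are never rewritten.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class DecisionType(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    RENEWED = "renewed"
    REVOKED = "revoked"


# Hypothetical mapping from the latest decision to the root's current status.
STATUS_AFTER = {
    DecisionType.REQUESTED: "pending",
    DecisionType.APPROVED: "active",
    DecisionType.REJECTED: "rejected",
    DecisionType.RENEWED: "active",
    DecisionType.REVOKED: "revoked",
}


@dataclass(frozen=True)
class FindingExceptionDecision:
    """Immutable child record; never updated after it is written."""
    decision_type: DecisionType
    actor_id: int          # assumed field name
    decided_at: datetime   # assumed field name
    reason: str = ""       # assumed field name


@dataclass
class FindingException:
    """Root record holding current state plus the append-only history."""
    finding_id: int
    status: str = "pending"
    _decisions: list = field(default_factory=list)

    @property
    def decisions(self) -> tuple:
        # Expose history as a tuple so callers cannot rewrite old decisions.
        return tuple(self._decisions)

    def record(self, decision: FindingExceptionDecision) -> None:
        # Append the decision and refresh the denormalized current status.
        self._decisions.append(decision)
        self.status = STATUS_AFTER[decision.decision_type]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    exc = FindingException(finding_id=42)
    exc.record(FindingExceptionDecision(DecisionType.REQUESTED, 7, now))
    exc.record(FindingExceptionDecision(DecisionType.APPROVED, 9, now))
    print(exc.status, len(exc.decisions))
```

List and queue queries read only `status` on the root, while the full history stays available through `decisions`; this is the trade-off the rationale describes.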
## Decision 3: Block self-approval by default in v1
**Decision**: Normal workflow blocks the requester from approving their own exception request. No self-approval override is included in the first slice.
**Rationale**: The spec requires approval-separation rules, and a default no-self-approval rule is the clearest governance baseline. It reduces ambiguity, simplifies policy design, and matches the product's least-privilege posture.
**Alternatives considered**:
- Allow self-approval for owners or managers: rejected because it weakens the governance signal and creates policy ambiguity in the first rollout.
- Introduce a special override capability immediately: rejected because it expands RBAC and exception policy complexity before the core workflow is proven.
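
As a rough illustration of the separation rule in Decision 3, the check reduces to comparing the requester with the approver in addition to the usual capability check. The function name, parameters, and the boolean capability flag below are hypothetical; the actual policy wiring in the repo is not described here.

```python
def can_approve(requester_id: int, approver_id: int, approver_has_capability: bool) -> bool:
    """Return True only if the approver is authorized and is not the requester."""
    if not approver_has_capability:
        # Assumed: approval always requires an explicit capability.
        return False
    # v1 rule: the requester may never approve their own request, with no override.
    return requester_id != approver_id


if __name__ == "__main__":
    print(can_approve(requester_id=7, approver_id=9, approver_has_capability=True))
    print(can_approve(requester_id=7, approver_id=7, approver_has_capability=True))
```

Because there is no override path, the rule stays a pure function of the two identities and one capability, which keeps it easy to test and to audit.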
## Decision 4: Keep normal exception decisions outside `OperationRun`
**Decision**: Request, approval, rejection, renewal, and revocation remain synchronous DB-backed mutations without a dedicated `OperationRun` in v1.
**Rationale**: These actions are local governance decisions, expected to complete quickly, and do not perform remote work. The constitution allows DB-only security-relevant actions to skip `OperationRun` as long as they remain auditable. Using `OperationRun` here would add operational surface area without adding observability value.
**Alternatives considered**:
- Use `OperationRun` for every exception decision: rejected because it violates the repo's preference to avoid long-running infrastructure for fast DB-only mutations.
- Add a scheduled reminder/expiry job in the first slice: rejected because the first release can satisfy reminder semantics through explicit expiring-state UI and canonical queue visibility.
## Decision 5: Link supporting evidence through structured references, not copied payloads
**Decision**: Exception records store structured evidence references such as `source_type`, `source_id`, `source_fingerprint`, and a small summary snapshot, following the evidence snapshot item pattern instead of embedding raw evidence payloads.
**Rationale**: The repo already uses fingerprinted and summarized evidence references in `EvidenceSnapshotItem` and review-pack generation. Reusing that pattern keeps exception history intelligible even when live artifacts change, while preserving data minimization.
**Alternatives considered**:
- Store raw evidence JSON directly on the exception: rejected because it increases payload size and risks leaking data better handled by the evidence domain.
- Store only foreign keys to live evidence records: rejected because history becomes opaque if referenced artifacts are later expired or superseded.
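
To make the reference shape in Decision 5 concrete, here is a small Python sketch. Only `source_type`, `source_id`, and `source_fingerprint` come from the decision text; the `summary` field name, the SHA-256 over canonical JSON, and the `build_reference` helper are assumptions chosen for illustration and may differ from how `EvidenceSnapshotItem` computes fingerprints.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceReference:
    source_type: str
    source_id: str
    source_fingerprint: str
    summary: dict  # small snapshot only, never the raw payload


def fingerprint(payload: dict) -> str:
    # Assumed scheme: SHA-256 over key-sorted, compact JSON so that
    # key order does not change the fingerprint.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_reference(source_type: str, source_id: str,
                    payload: dict, summary_keys: list) -> EvidenceReference:
    # Copy only whitelisted keys into the summary (data minimization).
    summary = {key: payload[key] for key in summary_keys if key in payload}
    return EvidenceReference(source_type, source_id, fingerprint(payload), summary)


if __name__ == "__main__":
    live_payload = {"name": "Baseline policy", "assignments": 12, "raw": {"big": "blob"}}
    ref = build_reference("policy_snapshot", "123", live_payload, ["name", "assignments"])
    print(ref.summary)
    print(ref.source_fingerprint)
```

If the live artifact later changes or expires, the stored fingerprint and summary still explain what the approver saw, without the exception record carrying the raw payload.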
## Decision 6: Risk governance validity is derived from exception state, not from finding status alone
**Decision**: A finding counts as currently valid accepted risk only when it is linked to an active, unexpired, unrevoked exception. The finding's `risk_accepted` status alone is insufficient.
**Rationale**: This closes the core audit gap identified in the handover and allows evidence and reporting consumers to distinguish governed accepted risk from stale or unsupported states. The existing `FindingWorkflowService` remains the single mutation path for the finding status, but validity becomes a cross-record rule.
**Alternatives considered**:
- Treat `Finding.status = risk_accepted` as sufficient forever: rejected because it preserves the current governance gap.
- Automatically revert finding status when an exception expires: rejected because it mutates the finding lifecycle as a side effect and obscures historical operator intent.
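
The cross-record rule in Decision 6 can be expressed as a single predicate. The sketch below is illustrative Python, not the repo's code: the `ExceptionState` type, its field names, and the status strings are assumptions, as is the choice to treat a missing expiry date as invalid.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ExceptionState:
    status: str                           # assumed values: "pending", "active", "rejected", "revoked"
    expires_at: Optional[datetime]        # assumed field name
    revoked_at: Optional[datetime] = None  # assumed field name


def is_valid_accepted_risk(finding_status: str,
                           exception: Optional[ExceptionState],
                           now: datetime) -> bool:
    """A finding is governed accepted risk only with an active, unexpired, unrevoked exception."""
    if finding_status != "risk_accepted" or exception is None:
        # The finding status alone is never sufficient.
        return False
    if exception.status != "active" or exception.revoked_at is not None:
        return False
    # Assumption: every exception carries an expiry date; none means not valid.
    return exception.expires_at is not None and exception.expires_at > now


if __name__ == "__main__":
    now = datetime(2026, 3, 20, tzinfo=timezone.utc)
    active = ExceptionState("active", datetime(2026, 6, 1, tzinfo=timezone.utc))
    expired = ExceptionState("active", datetime(2026, 1, 1, tzinfo=timezone.utc))
    print(is_valid_accepted_risk("risk_accepted", active, now))
    print(is_valid_accepted_risk("risk_accepted", expired, now))
    print(is_valid_accepted_risk("risk_accepted", None, now))
```

Note that an expired exception leaves the finding status untouched; only the derived validity changes, which matches the rejection of automatic status reversion.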