Implements spec 111 (Findings workflow + SLA) and fixes Workspace findings SLA settings UX/validation.

Key changes:
- Findings workflow service + SLA policy and alerting.
- Workspace settings: allow partial SLA overrides without auto-filling unset severities in the UI; effective values still resolve via defaults.
- New migrations, jobs, command, UI/resource updates, and comprehensive test coverage.

Tests:
- `vendor/bin/sail artisan test --compact` (1779 passed, 8 skipped).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #135
Feature Specification: Findings Workflow V2 + SLA
Feature Branch: 111-findings-workflow-sla
Created: 2026-02-24
Status: Draft
Depends On: specs/104-provider-permission-posture/spec.md, specs/105-entra-admin-roles-evidence-findings/spec.md, specs/109-review-pack-export/spec.md
Input: Standardize the Findings lifecycle (workflow, ownership, recurrence, SLA due dates, and alerting) so findings management is enterprise-usable and not “noise”.
Clarifications
Session 2026-02-24
- Q: What should happen when the same finding is detected again, but its current status is terminal? → A: Auto-reopen only from `resolved`; `closed` and `risk_accepted` remain terminal (still update seen tracking fields).
- Q: When backfilling legacy open findings, how should the initial due date be set? → A: Compute from the backfill operation time (backfill time + SLA days).
- Q: When SLA due alerts fire, what should a single alert event represent? → A: At most one event per tenant per alert-evaluation window, emitted only when newly-overdue open findings exist; the event summarizes current overdue counts.
- Q: Which statuses should count as “Open” for the default Findings list and for SLA due evaluation? → A: Open = `new`, `triaged`, `in_progress`, `reopened`.
- Q: From which statuses should a user be able to manually “Reopen” a finding (into `reopened` status)? → A: Allow manual reopen from `resolved`, `closed`, and `risk_accepted`.
- Q: Where is the SLA policy configured, and what scope does it apply to? → A: Workspace-scoped setting (`findings.sla_days`) in Workspace Settings; applies to all tenants in the workspace.
- Q: How is the “alert-evaluation window” defined for SLA due gating? → A: Use the Alerts evaluation window start time (previous completed `alerts.evaluate` OperationRun `completed_at`; fallback to initial lookback). “Newly overdue” means `due_at` in `(window_start, now]` for open findings.
- Q: What must an `sla_due` event contain? → A: One event per tenant per evaluation window; `metadata` includes `overdue_total` and `overdue_by_severity` (`critical`/`high`/`medium`/`low`) for currently overdue open findings; fingerprint is stable per tenant+window.
- Q: If severity changes while a finding remains open, should `due_at` be recalculated? → A: No. `due_at` is set on create and reset only on reopen/backfill.
- Q: If a user resolves a finding while a detection run is processing, how is consistency maintained? → A: Detection updates may still advance seen counters, but automatic reopen MUST occur only when the observation time is after `resolved_at`.
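The window rule in the clarifications above can be sketched as two small helpers. This is a Python illustration, not the project's Laravel code. The function names are hypothetical. The 24-hour fallback lookback is an assumption, because the spec only says "fallback to initial lookback". The open-status check is left to the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed fallback length; the spec only says "fallback to initial lookback".
INITIAL_LOOKBACK = timedelta(hours=24)


def resolve_window_start(
    previous_completed_at: Optional[datetime],
    now: datetime,
    initial_lookback: timedelta = INITIAL_LOOKBACK,
) -> datetime:
    """Start of the alert-evaluation window.

    Uses completed_at of the previous completed alerts.evaluate run when one
    exists; on the very first evaluation, falls back to now - initial_lookback.
    """
    if previous_completed_at is not None:
        return previous_completed_at
    return now - initial_lookback


def is_newly_overdue(due_at: Optional[datetime], window_start: datetime, now: datetime) -> bool:
    """True when due_at lies in the half-open interval (window_start, now]."""
    return due_at is not None and window_start < due_at <= now


now = datetime(2026, 2, 24, 12, 0, tzinfo=timezone.utc)
start = resolve_window_start(None, now)  # first run: now - 24h
print(is_newly_overdue(start, start, now))  # False: the lower bound is excluded
print(is_newly_overdue(now, start, now))  # True: the upper bound is included
```

With a half-open interval, a finding whose `due_at` equals the previous window's end is counted in exactly one window, never twice.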
Spec Scope Fields (mandatory)
- Scope: tenant (Findings management) + workspace (SLA policy + Alert rules configuration)
- Primary Routes:
- Tenant-context: Findings list + view (`/admin/t/{tenant}/...`)
- Workspace-context Monitoring: Alert rules list + edit (`/admin/...`)
- Workspace-context Settings: Workspace Settings (Findings SLA policy) (`/admin/...`)
- Data Ownership:
- Tenant-owned: Findings and their lifecycle metadata
- Workspace-owned: SLA policy settings (`findings.sla_days`)
- Workspace-owned: Alert rules configuration (event types)
- RBAC:
- Findings view + workflow actions are tenant-context capability-gated
- Workspace Settings + Alert rules remain workspace capability/policy-gated (existing behavior)
Canonical-view fields not applicable — this spec updates tenant-context Findings and workspace-scoped Alert Rules.
User Scenarios & Testing (mandatory)
User Story 1 - See Open Findings (Priority: P1)
As a tenant operator, I can open the Findings page and immediately see the current open findings across all finding types, so I don’t miss non-drift issues and can focus on what needs attention now.
Why this priority: If open findings are hidden by default filters or type assumptions, findings become unreliable as an operational surface.
Independent Test: Seed a tenant with findings across multiple types and statuses, then verify the default list shows open workflow statuses across all types without adjusting filters.
Acceptance Scenarios:
- Given a tenant has findings of types drift, permission posture, and Entra admin roles, When I open the Findings list, Then I can see open findings from all types without changing any filters.
- Given a tenant has a mix of open and terminal findings, When I open the Findings list, Then the default list shows only open workflow statuses.
- Given a tenant has overdue findings, When I use the “Overdue” quick filter, Then only findings past their due date are shown.
- Given a tenant has open findings, When I view the list, Then I can see each finding’s status, severity, due date, and assignee (when set).
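Below is a minimal Python sketch of the default list and the Overdue quick filter. It assumes findings are plain records with the hypothetical fields shown, and the finding-type strings are illustrative. "Overdue" is taken to mean an open finding with `due_at <= now`, which matches the `(window_start, now]` interval used for SLA alerts. Excluding terminal findings from this filter is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Open workflow statuses used by the default list and SLA evaluation.
OPEN_STATUSES = frozenset({"new", "triaged", "in_progress", "reopened"})


@dataclass
class Finding:
    id: int
    finding_type: str  # illustrative values, e.g. "drift", "permission_posture"
    status: str
    severity: str
    due_at: Optional[datetime] = None


def default_list(findings: list[Finding]) -> list[Finding]:
    """Default view: every open finding, with no finding-type filter."""
    return [f for f in findings if f.status in OPEN_STATUSES]


def overdue(findings: list[Finding], now: datetime) -> list[Finding]:
    """Overdue quick filter: open findings whose due date has passed."""
    return [f for f in default_list(findings) if f.due_at is not None and f.due_at <= now]


now = datetime(2026, 2, 24, tzinfo=timezone.utc)
rows = [
    Finding(1, "drift", "new", "high", datetime(2026, 2, 20, tzinfo=timezone.utc)),
    Finding(2, "permission_posture", "triaged", "low"),
    Finding(3, "entra_admin_roles", "resolved", "critical", datetime(2026, 2, 1, tzinfo=timezone.utc)),
]
print([f.id for f in default_list(rows)])  # [1, 2]
print([f.id for f in overdue(rows, now)])  # [1]
```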
User Story 2 - Triage, Assign, And Resolve (Priority: P1)
As a tenant manager, I can triage findings, assign ownership, and move findings through a consistent workflow (including reasons and auditability), so the team can reliably manage remediation.
Why this priority: Without a consistent workflow and ownership, findings degrade into noisy, un-actioned rows with unclear accountability.
Independent Test: Create an open finding, execute each allowed status transition, and verify transitions are enforced server-side, recorded with timestamps/actors, and audited.
Acceptance Scenarios:
- Given a finding in `new` (or `reopened`) status, When I triage it, Then the status becomes `triaged` and the triage timestamp is recorded.
- Given a finding in `triaged` status, When I start progress, Then the status becomes `in_progress` and the progress timestamp is recorded.
- Given a finding in an open status, When I assign an assignee (and optional owner), Then those fields are saved and displayed on the finding.
- Given a finding in an open status, When I resolve it with a resolution reason, Then it becomes `resolved` and the resolution reason is persisted.
- Given a finding in any status, When I close it with a close reason, Then it becomes `closed` and the close reason is persisted.
- Given a finding in any status, When I mark it as risk accepted with a reason, Then it becomes `risk_accepted` and the reason is persisted.
- Given a user without the relevant capability, When they attempt any workflow mutation, Then the server denies it (403 for members lacking capability; 404 for non-members / not entitled).
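The server-side transition check behind these scenarios can be expressed as a lookup table built from FR-002. This Python sketch is an illustration only. `transition` and `TransitionError` are hypothetical names. `*` is read literally as "any status". Capability checks and the reopen confirmation step are assumed to happen elsewhere.

```python
from typing import Optional

ALL_STATUSES = frozenset(
    {"new", "triaged", "in_progress", "reopened", "resolved", "closed", "risk_accepted"}
)

# Target status -> statuses it may be entered from, per FR-002.
ALLOWED_FROM = {
    "triaged": frozenset({"new", "reopened"}),
    "in_progress": frozenset({"triaged"}),
    "resolved": frozenset({"new", "reopened", "triaged", "in_progress"}),
    "reopened": frozenset({"resolved", "closed", "risk_accepted"}),
    "closed": ALL_STATUSES,  # "*" -> closed
    "risk_accepted": ALL_STATUSES,  # "*" -> risk_accepted
}

# Targets that need a non-empty reason.
REASON_REQUIRED = frozenset({"resolved", "closed", "risk_accepted"})


class TransitionError(ValueError):
    """Raised when a requested transition is not allowed."""


def transition(current: str, target: str, reason: Optional[str] = None) -> str:
    """Validate a user-requested transition and return the new status."""
    if current == "acknowledged":
        current = "triaged"  # legacy status is treated as triaged (FR-014)
    if current not in ALLOWED_FROM.get(target, frozenset()):
        raise TransitionError(f"{current} -> {target} is not allowed")
    if target in REASON_REQUIRED and not (reason and reason.strip()):
        raise TransitionError(f"a reason is required for {target}")
    return target


print(transition("new", "triaged"))  # triaged
```

Keying the table by target status keeps each FR-002 line as a single entry, so the table is easy to audit against the spec.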
User Story 3 - SLA Due Visibility And Alerts (Priority: P1)
As a workspace operator, I can configure alerting for findings that are past their due date (SLA due), so overdue findings reliably escalate beyond the Findings page.
Why this priority: An SLA without alerting becomes “best effort” and is easy to ignore in busy operations.
Independent Test: Create newly-overdue open findings for a tenant, run alert evaluation, and verify a single tenant-level SLA due event is produced and can match an enabled alert rule.
Acceptance Scenarios:
- Given a tenant has one or more newly-overdue open findings since the previous evaluation window, When alert evaluation runs, Then exactly one SLA due event is produced for that tenant and can trigger an enabled alert rule.
- Given a tenant has no overdue open findings (including when only terminal findings have past due dates), When alert evaluation runs, Then no SLA due event is produced for that tenant.
- Given I edit an alert rule, When I choose the event type, Then “SLA due” is available as a selectable event type.
- Given a tenant has overdue open findings but no newly-overdue open findings since the previous evaluation window, When alert evaluation runs, Then no additional SLA due event is produced for that tenant.
- Given an SLA due event is produced, When I inspect the event payload, Then it includes overdue counts total and by severity.
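The following Python sketch shows how one tenant's SLA due event could be produced for one evaluation window. It is an illustration under stated assumptions. The function name, the event dict layout beyond the spec'd `metadata` keys, and the fingerprint format (a SHA-256 of tenant id plus window start) are all hypothetical. Overdue means `due_at <= now`.

```python
import hashlib
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

OPEN_STATUSES = frozenset({"new", "triaged", "in_progress", "reopened"})
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass
class Finding:
    status: str
    severity: str
    due_at: Optional[datetime]


def build_sla_due_event(tenant_id: int, findings: list[Finding],
                        window_start: datetime, now: datetime) -> Optional[dict]:
    """Return at most one sla_due event for this tenant and window, or None."""
    # Terminal statuses never count; findings without a due date cannot be overdue.
    open_findings = [f for f in findings if f.status in OPEN_STATUSES and f.due_at is not None]
    # Gate: emit only if something became overdue inside (window_start, now].
    if not any(window_start < f.due_at <= now for f in open_findings):
        return None
    # Summary: all currently overdue open findings, not only the newly overdue ones.
    overdue = [f for f in open_findings if f.due_at <= now]
    by_severity = Counter(f.severity for f in overdue)
    # Same tenant + same window -> same key, so a retried evaluation is idempotent.
    key = f"sla_due:{tenant_id}:{window_start.isoformat()}"
    return {
        "event_type": "sla_due",
        "tenant_id": tenant_id,
        "fingerprint_key": hashlib.sha256(key.encode()).hexdigest(),
        "metadata": {
            "overdue_total": len(overdue),
            "overdue_by_severity": {s: by_severity.get(s, 0) for s in SEVERITIES},
        },
    }


ws = datetime(2026, 2, 24, 11, 0, tzinfo=timezone.utc)
now = datetime(2026, 2, 24, 12, 0, tzinfo=timezone.utc)
event = build_sla_due_event(42, [Finding("new", "high", ws.replace(minute=30))], ws, now)
print(event["metadata"])
# {'overdue_total': 1, 'overdue_by_severity': {'critical': 0, 'high': 1, 'medium': 0, 'low': 0}}
```

The gate and the summary are deliberately separate. The gate looks only at newly overdue findings, so a tenant that is persistently overdue emits no event. Once any finding becomes newly overdue, the event reports the full overdue picture.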
User Story 4 - Recurrence Reopens (Priority: P2)
As a tenant operator, when a previously resolved finding reappears in later detection runs, it reopens the original finding (instead of creating a new duplicate), so recurrence is visible and manageable.
Why this priority: Recurrence is operationally important, and duplicate rows create confusion and reporting noise.
Independent Test: Simulate a finding being resolved and then being detected again, verifying it transitions to reopened, counters update, and due date resets.
Acceptance Scenarios:
- Given a finding was `resolved`, When it is detected again, Then the same finding transitions to `reopened` and records a reopened timestamp.
- Given a finding is detected in successive runs, When it appears again, Then the last-seen timestamp updates and the seen counter increases.
- Given a drift finding is no longer detected in the latest run, When stale detection is evaluated, Then the drift finding is auto-resolved with reason “no longer detected”.
- Given a finding is `closed` or `risk_accepted`, When it is detected again, Then it remains terminal and only its seen tracking fields update.
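Here is a Python sketch of applying one detection-run observation to an existing finding. It covers the recurrence rules, the concurrency guard on `resolved_at`, and the FR-005 due-date reset. The field names follow the spec's wording, but the record shape and `record_observation` are hypothetical. Basing the reset `due_at` on the observation time is an assumption.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_SLA_DAYS = {"critical": 3, "high": 7, "medium": 14, "low": 30}


@dataclass
class Finding:
    status: str
    severity: str
    first_seen_at: datetime
    last_seen_at: datetime
    seen_count: int = 1
    resolved_at: Optional[datetime] = None
    reopened_at: Optional[datetime] = None
    due_at: Optional[datetime] = None


def record_observation(
    f: Finding, observed_at: datetime, sla_days: Mapping[str, int] = DEFAULT_SLA_DAYS
) -> Finding:
    """Apply one detection-run observation to an existing finding."""
    # Seen tracking advances for every status; max() keeps last_seen_at from
    # moving backwards if runs are processed out of order.
    f.last_seen_at = max(f.last_seen_at, observed_at)
    f.seen_count += 1
    # Only resolved findings auto-reopen, and only for observations made after
    # the resolution, so a resolve that races a detection run is not undone.
    if f.status == "resolved" and f.resolved_at is not None and observed_at > f.resolved_at:
        f.status = "reopened"
        f.reopened_at = observed_at
        f.due_at = observed_at + timedelta(days=sla_days[f.severity])  # FR-005 reset
    return f


t0 = datetime(2026, 2, 1, tzinfo=timezone.utc)
f = Finding("resolved", "high", t0, t0, resolved_at=datetime(2026, 2, 10, tzinfo=timezone.utc))
record_observation(f, datetime(2026, 2, 12, tzinfo=timezone.utc))
print(f.status, f.seen_count)  # reopened 2
```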
User Story 5 - Bulk Manage Findings (Priority: P3)
As a tenant manager, I can triage/assign/resolve/close findings in bulk, so I can manage high volumes efficiently while preserving auditability and safety.
Why this priority: Bulk workflow reduces operational load, but can ship after the single-record workflow is correct.
Independent Test: Select multiple findings and run each bulk action, verifying that all selected findings update consistently and each change is audited.
Acceptance Scenarios:
- Given I select multiple open findings, When I bulk triage them, Then all selected findings become `triaged`.
- Given I select multiple open findings, When I bulk assign an assignee, Then all selected findings are assigned.
- Given I select multiple open findings, When I bulk resolve them with a reason, Then all selected findings become `resolved` and record the reason.
- Given I select multiple open findings, When I bulk close them with a reason, Then all selected findings become `closed` and record the close reason.
- Given I select multiple open findings, When I bulk risk accept them with a reason, Then all selected findings become `risk_accepted` and record the reason.
- Given more than 100 open findings match my current filters, When I run “Triage all matching”, Then the action requires typed confirmation, updates all matching findings safely, and audits each change.
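The “Triage all matching” safeguards can be sketched in Python as well. The confirmation gate runs before any row changes, and each updated finding gets its own audit entry. Several details are hypothetical: the confirmation phrase `"TRIAGE"`, the audit action name, and the result shape. Skipping findings that FR-002 does not allow to enter `triaged`, such as `in_progress`, is also an assumption.

```python
from dataclasses import dataclass, field

CONFIRM_THRESHOLD = 100  # more than this many matches needs typed confirmation
TRIAGEABLE = frozenset({"new", "reopened"})  # FR-002 sources for -> triaged


class ConfirmationRequired(Exception):
    """Raised when a large 'Triage all matching' run lacks typed confirmation."""


@dataclass
class Finding:
    id: int
    status: str


@dataclass
class BulkResult:
    updated: list = field(default_factory=list)
    skipped: list = field(default_factory=list)
    audit: list = field(default_factory=list)


def triage_all_matching(matching, actor_id, typed_confirmation=None, expected_phrase="TRIAGE"):
    # Check confirmation before touching any row.
    if len(matching) > CONFIRM_THRESHOLD and typed_confirmation != expected_phrase:
        raise ConfirmationRequired(f"type {expected_phrase!r} to triage {len(matching)} findings")
    result = BulkResult()
    for f in matching:
        if f.status not in TRIAGEABLE:
            result.skipped.append(f.id)  # e.g. already in_progress
            continue
        before = f.status
        f.status = "triaged"
        result.updated.append(f.id)
        # One audit entry per finding, mirroring the single-record action.
        result.audit.append({
            "actor_id": actor_id,
            "action": "finding.triaged",
            "target_id": f.id,
            "before": {"status": before},
            "after": {"status": "triaged"},
        })
    return result
```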
User Story 6 - Backfill Existing Findings (Priority: P2)
As a tenant operator, I can run a one-time backfill/consolidation operation to upgrade existing findings into the v2 workflow model, so older data is usable (due dates, counters, recurrence) without manual cleanup.
Why this priority: Without backfill, existing tenants keep legacy/incomplete findings and the new workflow appears inconsistent or broken.
Independent Test: Seed legacy findings (missing lifecycle fields, acknowledged status, drift duplicates), run the backfill operation, and verify fields are populated, statuses are mapped, and duplicates are consolidated.
Acceptance Scenarios:
- Given legacy open findings exist without due dates or lifecycle timestamps, When I run the backfill operation, Then open findings receive due dates set to the backfill operation time plus the SLA days for their severity, and lifecycle metadata is populated.
- Given legacy findings in `acknowledged` status exist, When I run the backfill operation, Then they appear as `triaged` in the v2 workflow surface.
- Given duplicate drift findings exist for the same recurring issue, When I run the backfill operation, Then duplicates are consolidated so only one canonical open finding remains.
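A Python sketch of the backfill steps follows. It maps the legacy status, fills seen tracking, consolidates duplicates, and sets due dates from backfill time. `recurrence_key` is a hypothetical stand-in for the stable recurrence identity. Keeping the earliest-created row as canonical and marking duplicates `closed` are assumptions. The spec only requires a terminal status with a consistent reason such as `consolidated_duplicate`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_SLA_DAYS = {"critical": 3, "high": 7, "medium": 14, "low": 30}
OPEN_STATUSES = frozenset({"new", "triaged", "in_progress", "reopened"})


@dataclass
class Finding:
    id: int
    recurrence_key: str  # hypothetical stable identity of the recurring issue
    status: str
    severity: str
    created_at: datetime
    due_at: Optional[datetime] = None
    first_seen_at: Optional[datetime] = None
    last_seen_at: Optional[datetime] = None
    seen_count: Optional[int] = None
    close_reason: Optional[str] = None


def backfill(findings: list[Finding], backfill_time: datetime, sla_days=DEFAULT_SLA_DAYS) -> None:
    # 1. Map legacy statuses and fill missing seen-tracking fields.
    for f in findings:
        if f.status == "acknowledged":
            f.status = "triaged"
        f.first_seen_at = f.first_seen_at or f.created_at
        f.last_seen_at = f.last_seen_at or f.first_seen_at
        f.seen_count = f.seen_count or 1

    # 2. Consolidate open rows sharing a recurrence key; the earliest-created
    #    row stays canonical (assumption), the rest become terminal.
    groups: dict[str, list[Finding]] = {}
    for f in findings:
        if f.status in OPEN_STATUSES:
            groups.setdefault(f.recurrence_key, []).append(f)
    for group in groups.values():
        group.sort(key=lambda g: (g.created_at, g.id))
        for dup in group[1:]:
            dup.status = "closed"
            dup.close_reason = "consolidated_duplicate"

    # 3. Due dates for remaining open rows start at backfill time, not at
    #    created_at, so rollout does not cause an immediate overdue surge.
    for f in findings:
        if f.status in OPEN_STATUSES and f.due_at is None:
            f.due_at = backfill_time + timedelta(days=sla_days[f.severity])


rows = [
    Finding(1, "drift:a", "acknowledged", "high", datetime(2026, 1, 1, tzinfo=timezone.utc)),
    Finding(2, "drift:a", "new", "high", datetime(2026, 1, 5, tzinfo=timezone.utc)),
]
backfill(rows, datetime(2026, 2, 24, tzinfo=timezone.utc))
print([(f.id, f.status) for f in rows])  # [(1, 'triaged'), (2, 'closed')]
```

Consolidation runs before due dates are assigned, so rows closed as duplicates never receive a due date.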
Edge Cases
- Legacy findings exist without lifecycle timestamps or due dates (backfill required).
- A previously assigned/owned user is no longer a tenant member (retain historical assignment, but prevent selecting non-members for new assignments).
- A finding’s severity changes while it remains open (`due_at` is not recalculated; see FR-016 and Assumptions).
- An SLA due alert rule exists from earlier versions (should begin working once the producer exists; no data loss).
- Concurrent actions: a user resolves a finding while a detection run marks it seen again (system remains consistent and auditable).
Requirements (mandatory)
Governance And Safety Requirements
- This feature introduces no new external API calls.
- All user-initiated workflow mutations (triage/assign/resolve/close/risk accept/reopen) MUST be audited with actor, tenant, action, target, before/after, and timestamp.
- Audit before/after MUST be limited to workflow/assignment metadata (e.g., `status`, `severity`, `due_at`, `assignee_id`, `owner_id`, `triaged_at`, `in_progress_at`, `resolved_at`, `closed_at`, `resolution_reason`, `close_reason`, `risk_accepted_reason`) and MUST NOT include raw evidence payloads or secrets/tokens.
- The lifecycle backfill/consolidation operation MUST be observable as an operation with:
- clear start feedback (accepted/queued),
- progress visibility while running, and
- a single terminal outcome notification for the initiator.
- Authorization MUST be enforced server-side for every mutation, with deny-as-not-found semantics for users outside the tenant scope:
- non-members or users not entitled to the tenant scope → 404
- members missing capability → 403
- Destructive-like actions (resolve/close/risk accept) MUST require explicit confirmation.
- Findings status badge semantics MUST remain centralized and cover every allowed status.
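The authorization and audit rules above can be sketched in Python as two pure functions. `deny_status` returns 404 for users outside the tenant scope and 403 for members missing a capability. `audit_snapshot` whitelists the workflow fields. The function names and the dict-based record are hypothetical. The capability alias reflects FR-014.

```python
from typing import Optional

# Workflow/assignment metadata that may appear in audit before/after.
AUDITED_FIELDS = frozenset({
    "status", "severity", "due_at", "assignee_id", "owner_id", "triaged_at",
    "in_progress_at", "resolved_at", "closed_at", "resolution_reason",
    "close_reason", "risk_accepted_reason",
})

# FR-014: the legacy capability is a deprecated alias for triage.
CAPABILITY_ALIASES = {"TENANT_FINDINGS_ACKNOWLEDGE": "TENANT_FINDINGS_TRIAGE"}


def deny_status(is_entitled: bool, capabilities: set[str], required: str) -> Optional[int]:
    """Return the HTTP status to deny with, or None if the mutation may proceed."""
    if not is_entitled:
        return 404  # non-member / not entitled: do not reveal that the finding exists
    effective = {CAPABILITY_ALIASES.get(c, c) for c in capabilities}
    if required not in effective:
        return 403  # member, but missing the capability
    return None


def audit_snapshot(record: dict) -> dict:
    """Keep only whitelisted workflow fields; evidence and secrets are dropped."""
    return {k: v for k, v in record.items() if k in AUDITED_FIELDS}
```

The membership check runs before the capability check, so a non-member can never learn anything from the difference between a 403 and a 404.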
Functional Requirements
- FR-001: System MUST support a Findings lifecycle with statuses: `new`, `triaged`, `in_progress`, `reopened`, `resolved`, `closed`, `risk_accepted`.
- FR-002: System MUST enforce allowed status transitions server-side:
- `new|reopened` → `triaged`
- `triaged` → `in_progress`
- `new|reopened|triaged|in_progress` → `resolved` (resolution reason required)
- `resolved|closed|risk_accepted` → `reopened` (manual allowed; requires confirmation; automatic only when detected again from `resolved`)
- `*` → `closed` (close reason required)
- `*` → `risk_accepted` (reason required)
- FR-003: Each finding MUST track lifecycle metadata: owner, assignee, first-seen time, last-seen time, seen count, and (when open) an SLA due date.
- FR-004: The system MUST assign an SLA due date to open findings using a configurable severity-based policy with defaults:
- critical: 3 days
- high: 7 days
- medium: 14 days
- low: 30 days
- FR-005: When a finding reopens (automatic or manual), the system MUST reset the SLA due date based on the current severity-based SLA policy.
- FR-006: SLA due alerting MUST exist:
- “SLA due” MUST be available as an alert rule event type (`sla_due`).
- The SLA due producer MUST use the same alert-evaluation window start time (`window_start`) used by Alerts evaluation (previous completed `alerts.evaluate` OperationRun `completed_at`; fallback to initial lookback).
- “Newly overdue” means: an open finding with `due_at` in `(window_start, now]`.
- The system MUST emit exactly one SLA due event per tenant per alert-evaluation window when that tenant has one or more newly-overdue open findings since `window_start`.
- Each SLA due event MUST summarize current overdue open findings for the tenant and include:
- `overdue_total` (count)
- `overdue_by_severity` (`critical`, `high`, `medium`, `low`)
- A tenant with persistently overdue open findings MUST NOT emit repeated SLA due events on every evaluation run unless additional findings become newly overdue.
- Terminal statuses (`resolved`, `closed`, `risk_accepted`) MUST NOT contribute to the overdue counts.
- Open workflow statuses are `new`, `triaged`, `in_progress`, `reopened`.
- The event’s `fingerprint_key` MUST be stable per tenant + alert-evaluation window for idempotency.
- FR-007: The system MUST track recurrence:
- When a previously `resolved` finding is detected again, it MUST transition to `reopened` (not create a duplicate open finding for the same recurring issue).
- When a `closed` or `risk_accepted` finding is detected again, it MUST NOT change status automatically; it only updates seen tracking fields.
- Each detection run where the finding is observed MUST update last-seen time and increment seen count.
- Concurrency safety: automatic reopen MUST occur only when the observation time is after the finding’s `resolved_at`.
- FR-008: Drift findings MUST avoid “new row per re-drift” noise by using a stable recurrence identity so recurring drift reopens the canonical finding.
- FR-009: Drift findings MUST auto-resolve when they are no longer detected in the latest run, with a consistent resolved reason (e.g., “no longer detected”).
- FR-010: Findings list defaults MUST be safe and visible:
- Default list shows open statuses (`new`, `triaged`, `in_progress`, `reopened`) across all finding types (no drift-only default).
- Quick filters exist for: Open, Overdue, High severity, My assigned.
- FR-011: Findings UI MUST provide safe workflow actions:
- Single-record actions: triage, start progress, assign (assignee and optional owner), resolve (reason required), close (reason required), risk accept (reason required), reopen (where allowed).
- Bulk actions: bulk triage, bulk assign, bulk resolve, bulk close, bulk risk accept.
- FR-012: The system MUST introduce tenant-context capabilities for Findings management:
- `TENANT_FINDINGS_VIEW`
- `TENANT_FINDINGS_TRIAGE`
- `TENANT_FINDINGS_ASSIGN`
- `TENANT_FINDINGS_RESOLVE`
- `TENANT_FINDINGS_CLOSE`
- `TENANT_FINDINGS_RISK_ACCEPT`
- FR-013: Assignment/ownership selection MUST be limited to users who are currently tenant members, while preserving historical assignment/ownership values for already-assigned findings.
- FR-014: Legacy compatibility MUST be maintained:
- Existing `acknowledged` status MUST be treated as `triaged` in the v2 workflow surface.
- Existing `TENANT_FINDINGS_ACKNOWLEDGE` capability MUST act as a deprecated alias for v2 triage permission.
- FR-015: A backfill/consolidation operation MUST exist to migrate existing findings to the v2 lifecycle model, including:
- mapping `acknowledged` → `triaged`
- populating lifecycle timestamps and seen counters for existing data
- setting due dates for legacy open findings based on the backfill operation time (backfill time + SLA days)
- consolidating duplicates where recurrence identity indicates the same recurring finding (canonical record retained; duplicates marked terminal with a consistent reason, e.g. `consolidated_duplicate`)
- FR-016: Severity changes while a finding remains open MUST NOT retroactively change `due_at`. `due_at` is assigned on create and reset only on reopen/backfill.
- FR-017: Review pack generation MUST treat “open findings” using the v2 open-status set (not drift-only defaults) to keep existing exports/review packs consistent.
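Below is a Python sketch of resolving the workspace SLA policy (FR-004) and computing `due_at` on create (FR-016). It follows the partial-override behavior described in the change summary: unset severities fall back to the defaults instead of being auto-filled. Storing `findings.sla_days` as a severity-to-days mapping is an assumption, and the function names are hypothetical.

```python
from collections.abc import Mapping
from datetime import datetime, timedelta, timezone
from typing import Optional

# FR-004 defaults, in days.
DEFAULT_SLA_DAYS = {"critical": 3, "high": 7, "medium": 14, "low": 30}


def effective_sla_days(overrides: Optional[Mapping[str, Optional[int]]] = None) -> dict:
    """Merge a (possibly partial) workspace override onto the defaults.

    Severities that are absent or None in the override keep the default, so the
    stored setting does not have to list every severity.
    """
    effective = dict(DEFAULT_SLA_DAYS)
    for severity, days in (overrides or {}).items():
        if severity in effective and days is not None:
            effective[severity] = days
    return effective


def due_at_on_create(severity: str, created_at: datetime,
                     overrides: Optional[Mapping[str, Optional[int]]] = None) -> datetime:
    """Computed once at create (and again on reopen/backfill), never on severity edits."""
    return created_at + timedelta(days=effective_sla_days(overrides)[severity])


created = datetime(2026, 2, 24, tzinfo=timezone.utc)
print(effective_sla_days({"high": 5, "low": None}))
# {'critical': 3, 'high': 5, 'medium': 14, 'low': 30}
print(due_at_on_create("critical", created).date())  # 2026-02-27
```

Effective values are resolved when they are needed rather than copied into the stored setting. A later change to a default therefore reaches every workspace that did not override that severity.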
UI Action Matrix (mandatory when Filament is changed)
Action Surface Contract: Satisfied for Findings and Alert Rules (explicit exemptions noted).
| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Findings Resource | Admin UI: Findings | Optional: “Triage all matching” (capability-gated) | View action | View, More | Bulk triage, bulk assign, bulk resolve, bulk close, bulk risk accept (under More) | None | Triage, Start progress, Assign, Resolve, Close, Risk accept, Reopen (where allowed) | N/A | Yes | Empty-state exemption: findings are system-generated; no create CTA |
| Alert Rules Resource | Monitoring UI: Alert rules | Create (capability/policy-gated) | Clickable row | Edit, More | None (exempt) | Create alert rule | N/A (edit surface) | Save/Cancel | Yes | “SLA due” event type is available once the producer exists |
Key Entities (include if feature involves data)
- Finding: Represents a detected issue for a tenant, including type, severity, lifecycle status, recurrence behavior, and lifecycle metadata (ownership, due date, seen tracking).
- SLA policy: Severity-based due-date expectations applied to open findings, with configurable defaults.
- Alert rule: Workspace-defined routing rules that can trigger delivery when an SLA due event occurs.
- Audit entry: Immutable record of user-initiated workflow changes for accountability and compliance.
Success Criteria (mandatory)
Measurable Outcomes
- SC-001: 100% of open findings have a computed due date (SLA) at creation and after any reopen event.
- SC-002: Recurring findings reopen instead of creating duplicate open rows for the same recurring issue.
- SC-003: The default Findings list shows open findings across all finding types without requiring users to remove type-specific filters.
- SC-004: SLA due alerting is functional: tenants with newly-overdue open findings since the previous evaluation window can trigger alert rules and produce at most one SLA due event per tenant per evaluation window; terminal findings never contribute to SLA due alerts.
- SC-005: Authorization behavior is correct and non-enumerable: non-members receive 404; members missing capability receive 403.
- SC-006: Admins can triage/assign/resolve/close findings in bulk for at least 100 findings in a single action without needing per-record manual updates.
Assumptions
- `risk_accepted` is a workflow status only in v2 (no expiry model in this feature).
- SLA due dates are set on create and on reopen. Severity changes while a finding remains open do not retroactively change the existing due date unless the finding is reopened.
- Backfill sets due dates for legacy open findings from the backfill operation time (backfill time + SLA days) to avoid an immediate “overdue” surge on rollout.
- Assignment/ownership pickers show only current tenant members, but historical assignments remain visible for audit/history even if membership is later removed.
- Existing alert rules with `event_type = sla_due` are preserved and should become effective once the SLA due producer is implemented (no destructive data migration of workspace-owned alert rules).