Implements Spec 115 (Baseline Operability & Alert Integration).

Key changes
- Baseline compare: safe auto-close of stale baseline findings (gated on successful/complete compares)
- Baseline alerts: `baseline_high_drift` + `baseline_compare_failed` with dedupe/cooldown semantics
- Workspace settings: baseline severity mapping + minimum severity threshold + auto-close toggle
- Baseline Compare UX: shared stats layer + landing/widget consistency

Notes
- Livewire v4 / Filament v5 compatible.
- Destructive-like actions require confirmation (no new destructive actions added here).

Tests
- `vendor/bin/sail artisan test --compact tests/Feature/Baselines/ tests/Feature/Alerts/`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #140
62 lines
4.4 KiB
Markdown
# Research — Baseline Operability & Alert Integration (Spec 115)
This document resolves planning unknowns and records implementation decisions.

## Decisions
### 1) Completeness counters for safe auto-close

- Decision: Treat compare “completeness counters” as `OperationRun.summary_counts.total`, `processed`, and `failed`.
- Rationale: Ops-UX contracts already standardize these keys via `OperationSummaryKeys::all()`; they’re the metrics the UI understands for determinate progress.
- Alternatives considered:
  - Add new keys like `total_count` / `processed_count` / `failed_item_count` → rejected because it would require expanding `OperationSummaryKeys::all()` and updating Ops-UX guard tests without a strong benefit.
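As an illustration of how the three counters could gate auto-close, here is a minimal sketch in plain PHP. The function name `compareIsComplete` and the exact rule (everything processed, nothing failed) are assumptions made for this example; the decision above only fixes which keys are read.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decides whether a compare run is complete enough
 * to allow auto-closing stale findings. The rule itself is an assumption.
 */
function compareIsComplete(array $summaryCounts): bool
{
    $total = $summaryCounts['total'] ?? null;
    $processed = $summaryCounts['processed'] ?? null;
    $failed = $summaryCounts['failed'] ?? null;

    // Missing or non-integer counters mean completeness cannot be proven.
    if (!is_int($total) || !is_int($processed) || !is_int($failed)) {
        return false;
    }

    // Assumed rule: every item was processed and none failed.
    return $processed === $total && $failed === 0;
}
```

Returning `false` when a counter is absent keeps the gate conservative: a run that cannot prove completeness never closes findings.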
### 2) Where auto-close runs

- Decision: Perform auto-close at the end of `CompareBaselineToTenantJob` (after findings upsert), using the run’s computed “seen” fingerprint set.
- Rationale: The job already has the full drift result set for the tenant+profile; it’s the only place that can reliably know what was evaluated.
- Alternatives considered:
  - Separate queued job for auto-close → rejected (extra run coordination and more complex observability for no benefit).
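The “seen” fingerprint set lends itself to a simple set difference: any open finding whose fingerprint was not observed in the current run is a candidate for auto-close. The sketch below uses plain arrays and a hypothetical function name, `staleFingerprints`; it does not reproduce the job's actual code.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the fingerprints of open findings that the
 * current compare did not observe, i.e. the auto-close candidates.
 *
 * @param list<string> $openFingerprints fingerprints of currently open baseline findings
 * @param list<string> $seenFingerprints fingerprints produced by this compare run
 * @return list<string>
 */
function staleFingerprints(array $openFingerprints, array $seenFingerprints): array
{
    // Flip into a lookup map so each membership check is a hash lookup.
    $seen = array_flip($seenFingerprints);

    return array_values(array_filter(
        $openFingerprints,
        static fn (string $fingerprint): bool => !isset($seen[$fingerprint]),
    ));
}
```

Because the difference is only meaningful when the run evaluated everything, a caller would presumably apply it only after the completeness check has passed.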
### 3) Baseline finding lifecycle semantics (new vs reopened vs existing open)

- Decision: Mirror the existing drift lifecycle behavior (as implemented in `DriftFindingGenerator`):
  - New fingerprint → `status = new`.
  - Previously terminal fingerprint (at least `resolved`) observed again → `status = reopened` and set `reopened_at`.
  - Existing open finding → do not overwrite workflow status (avoid resetting `triaged`/`in_progress`).
- Rationale: This preserves operator workflow state and enables “alert only on new/reopened” logic.
- Alternatives considered:
  - Always set `status = new` on every compare (current behavior) → rejected because it can overwrite workflow state.
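The three lifecycle rules can be read as one transition function over the finding's current status. The following is a sketch under assumptions: `nextFindingStatus` is a hypothetical name, and the list of terminal statuses defaults to `resolved` only because that is the one status the text names.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch of the status transition for one observed fingerprint.
 * $currentStatus is null when no finding exists yet for the fingerprint.
 *
 * @param list<string> $terminalStatuses assumed set of terminal statuses
 * @return array{status: string, reopened: bool}
 */
function nextFindingStatus(?string $currentStatus, array $terminalStatuses = ['resolved']): array
{
    // No existing finding for this fingerprint: create it as new.
    if ($currentStatus === null) {
        return ['status' => 'new', 'reopened' => false];
    }

    // Terminal finding observed again: reopen it; the caller would also set reopened_at.
    if (in_array($currentStatus, $terminalStatuses, true)) {
        return ['status' => 'reopened', 'reopened' => true];
    }

    // Still open: leave the workflow status (e.g. triaged, in_progress) untouched.
    return ['status' => $currentStatus, 'reopened' => false];
}
```

The `reopened` flag is included so a caller can decide when to stamp `reopened_at` without comparing status strings a second time.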
### 4) Alert deduplication key for baseline drift

- Decision: Set `fingerprint_key` to a stable string derived from the finding fingerprint (e.g. `finding_fingerprint:{fingerprint}`) for baseline drift events.
- Rationale: Alert delivery dedupe uses `fingerprint_key` (or `idempotency_key`) via `AlertFingerprintService`.
- Alternatives considered:
  - Use `finding:{id}` → rejected because it ties dedupe to a DB surrogate rather than the domain fingerprint.
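A small sketch of the key format mentioned in the decision. The function name `baselineDriftFingerprintKey` is hypothetical; only the `finding_fingerprint:{fingerprint}` shape comes from the text.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds the dedupe key for a baseline drift event
 * from the finding's domain fingerprint rather than its database id.
 */
function baselineDriftFingerprintKey(string $findingFingerprint): string
{
    if ($findingFingerprint === '') {
        // An empty fingerprint would collapse unrelated findings into one key.
        throw new InvalidArgumentException('Finding fingerprint must not be empty.');
    }

    return 'finding_fingerprint:' . $findingFingerprint;
}
```

Two finding rows that share a fingerprint (for example after a re-import) would map to the same key, which is the property the `finding:{id}` alternative lacks.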
### 5) Baseline-specific event types

- Decision: Add two new alert event types and produce them in `EvaluateAlertsJob`:
  - `baseline_high_drift`: for baseline compare findings (`source = baseline.compare`) that are `new`/`reopened` in the evaluation window and meet severity threshold.
  - `baseline_compare_failed`: for `OperationRun.type = baseline_compare` with `outcome in {failed, partially_succeeded}` in the evaluation window.
- Rationale: The spec requires strict separation from generic drift alerts and precise triggering rules.
- Alternatives considered:
  - Reuse `high_drift` / `compare_failed` → rejected because it would mix baseline and non-baseline meaning.
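The two triggering rules can be expressed as predicates over a finding and a run. This is a sketch with several assumptions: records are plain arrays rather than models, the function names are hypothetical, the severity ranking is invented for the example, and the evaluation-window check is left out.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical predicate for `baseline_high_drift` candidates.
 * The severity ordering below is an assumption, not taken from the spec.
 */
function isBaselineHighDrift(array $finding, string $minSeverity): bool
{
    // Assumed ordering of severities, lowest to highest.
    $rank = ['low' => 1, 'medium' => 2, 'high' => 3, 'critical' => 4];

    return ($finding['source'] ?? null) === 'baseline.compare'
        && in_array($finding['status'] ?? null, ['new', 'reopened'], true)
        // An unknown threshold ranks above everything, so nothing matches.
        && ($rank[$finding['severity'] ?? ''] ?? 0) >= ($rank[$minSeverity] ?? PHP_INT_MAX);
}

/**
 * Hypothetical predicate for `baseline_compare_failed` candidates.
 */
function isBaselineCompareFailed(array $run): bool
{
    return ($run['type'] ?? null) === 'baseline_compare'
        && in_array($run['outcome'] ?? null, ['failed', 'partially_succeeded'], true);
}
```

Keeping the two predicates separate mirrors the decision to keep the event types distinct from the generic `high_drift` / `compare_failed` events.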
### 6) Cooldown behavior for baseline_compare_failed

- Decision: Reuse the existing per-rule cooldown + quiet-hours suppression implemented in `AlertDispatchService` (no baseline-specific cooldown setting).
- Rationale: Matches spec clarification and existing patterns.
### 7) Workspace settings implementation approach

- Decision: Implement baseline settings using the existing `SettingsRegistry`/`SettingsResolver`/`SettingsWriter` system with new keys under a new `baseline` domain:
  - `baseline.severity_mapping` (json map with restricted keys)
  - `baseline.alert_min_severity` (string)
  - `baseline.auto_close_enabled` (bool)
- Rationale: This matches existing settings infrastructure and ensures consistent “effective value” semantics.
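To make the “effective value” idea concrete, here is a minimal sketch of a resolver where a stored workspace value wins and a default applies otherwise. It does not use the real `SettingsResolver` API; the function name `effectiveBaselineSetting` is hypothetical, and the default values are placeholders chosen for the example, not values from the spec.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical resolver: a stored workspace value wins, otherwise the default applies.
 * The defaults are placeholders for illustration only.
 */
function effectiveBaselineSetting(string $key, array $workspaceOverrides): mixed
{
    $defaults = [
        'baseline.severity_mapping' => [],
        'baseline.alert_min_severity' => 'high',
        'baseline.auto_close_enabled' => true,
    ];

    // Only the three keys named in the decision are accepted.
    if (!array_key_exists($key, $defaults)) {
        throw new InvalidArgumentException("Unknown baseline setting: {$key}");
    }

    // array_key_exists (not isset) so an explicit false or null override is honored.
    return array_key_exists($key, $workspaceOverrides)
        ? $workspaceOverrides[$key]
        : $defaults[$key];
}
```

Using `array_key_exists` matters for `baseline.auto_close_enabled`: a workspace that explicitly stores `false` must not fall back to the default.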
### 8) Information architecture (IA) and planes

- Decision: Keep baseline profile CRUD as workspace-owned (non-tenant scoped) and baseline compare monitoring as tenant-context only.
- Rationale: Matches SCOPE-001 and spec FR-018.
## Notes / Repo Facts Used

- Ops-UX allowed summary keys are defined in `App\Support\OpsUx\OperationSummaryKeys`.
- Drift lifecycle patterns exist in `App\Services\Drift\DriftFindingGenerator` (reopen + resolve stale).
- Alert dispatch dedupe/cooldown/quiet-hours are centralized in `App\Services\Alerts\AlertDispatchService` and `AlertFingerprintService`.
- Workspace settings are handled by `App\Support\Settings\SettingsRegistry` + `SettingsResolver` + `SettingsWriter`.