TenantAtlas/specs/115-baseline-operability-alerts/research.md

# Research — Baseline Operability & Alert Integration (Spec 115)

This document resolves planning unknowns and records implementation decisions.

## Decisions

### 1) Completeness counters for safe auto-close
- Decision: Treat the compare run’s “completeness counters” as the `total`, `processed`, and `failed` keys of `OperationRun.summary_counts`.
- Rationale: Ops-UX contracts already standardize these keys via `OperationSummaryKeys::all()`; they’re the metrics the UI understands for determinate progress.
- Alternatives considered:
  - Add new keys like `total_count` / `processed_count` / `failed_item_count` → rejected because it would require expanding `OperationSummaryKeys::all()` and updating Ops-UX guard tests without a strong benefit.
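A minimal sketch of how these three counters could gate auto-close, written in Python for brevity rather than the repo's PHP. The function name `is_compare_complete` and the exact rule (every item processed, none failed, missing keys treated as incomplete) are assumptions for illustration; the source only fixes which keys are used.

```python
from typing import Mapping, Optional


def is_compare_complete(summary_counts: Mapping[str, Optional[int]]) -> bool:
    """Return True when the run's counters suggest a full, clean evaluation.

    Assumed rule (not confirmed by the spec): all three keys must be present,
    every item must have been processed, and nothing may have failed.
    """
    total = summary_counts.get("total")
    processed = summary_counts.get("processed")
    failed = summary_counts.get("failed")

    # A missing counter means completeness cannot be proven, so stay conservative.
    if total is None or processed is None or failed is None:
        return False

    return processed == total and failed == 0
```

Whether a run with `total = 0` should count as complete is left open here; under the rule above it would, which may or may not be the desired behavior.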

### 2) Where auto-close runs
- Decision: Perform auto-close at the end of `CompareBaselineToTenantJob` (after findings upsert), using the run’s computed “seen” fingerprint set.
- Rationale: The job already has the full drift result set for the tenant+profile; it’s the only place that can reliably know what was evaluated.
- Alternatives considered:
  - Separate queued job for auto-close → rejected (extra run coordination and more complex observability for no benefit).
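The auto-close step can be sketched roughly as follows, in Python rather than the job's PHP. The `Finding` shape, the `OPEN_STATUSES` set, the `auto_close_stale` helper, and the choice of `resolved` as the closing status are all hypothetical; only the idea of closing open findings whose fingerprints are absent from the run's “seen” set comes from the decision above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Assumed set of statuses that count as "open"; the real list lives in the repo.
OPEN_STATUSES = {"new", "reopened", "triaged", "in_progress"}


@dataclass
class Finding:
    fingerprint: str
    status: str


def auto_close_stale(
    findings: Iterable[Finding],
    seen_fingerprints: Set[str],
    run_complete: bool,
    auto_close_enabled: bool = True,
) -> List[str]:
    """Resolve open findings that the current compare run did not observe.

    Returns the fingerprints that were closed. Nothing is closed when the run
    was incomplete or the feature is disabled, because an incomplete run cannot
    prove that a finding has disappeared.
    """
    if not (run_complete and auto_close_enabled):
        return []

    closed = []
    for finding in findings:
        if finding.status in OPEN_STATUSES and finding.fingerprint not in seen_fingerprints:
            finding.status = "resolved"  # assumed closing status
            closed.append(finding.fingerprint)
    return closed
```

Running this inside the compare job, right after the upsert, means `seen_fingerprints` is already in memory and no second pass over the drift results is needed.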

### 3) Baseline finding lifecycle semantics (new vs reopened vs existing open)
- Decision: Mirror the existing drift lifecycle behavior (as implemented in `DriftFindingGenerator`):
  - New fingerprint → `status = new`.
  - Fingerprint whose finding previously reached a terminal status (at least `resolved`) and is observed again → `status = reopened` and set `reopened_at`.
  - Existing open finding → do not overwrite workflow status (avoid resetting `triaged`/`in_progress`).
- Rationale: This preserves operator workflow state and enables “alert only on new/reopened” logic.
- Alternatives considered:
  - Always set `status = new` on every compare (current behavior) → rejected because it can overwrite workflow state.
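The three lifecycle cases can be illustrated with a small Python sketch. It is not the `DriftFindingGenerator` code; the `Finding` dataclass, the `upsert_finding` helper, and the `TERMINAL_STATUSES` set (limited here to `resolved`, the only terminal status the source names) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# The source says "at least resolved"; other terminal statuses may exist.
TERMINAL_STATUSES = {"resolved"}


@dataclass
class Finding:
    fingerprint: str
    status: str
    reopened_at: Optional[datetime] = None


def upsert_finding(existing: Optional[Finding], fingerprint: str, now: datetime) -> Finding:
    """Apply the new / reopened / existing-open rules for one observed fingerprint."""
    if existing is None:
        # Case 1: fingerprint never seen before.
        return Finding(fingerprint=fingerprint, status="new")

    if existing.status in TERMINAL_STATUSES:
        # Case 2: previously closed drift has come back.
        existing.status = "reopened"
        existing.reopened_at = now
        return existing

    # Case 3: still open; leave workflow status (e.g. triaged) untouched.
    return existing
```

Because only the first two cases change `status`, an alert evaluator can key off `new` and `reopened` without re-alerting on findings an operator is already working on.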

### 4) Alert deduplication key for baseline drift
- Decision: Set `fingerprint_key` to a stable string derived from the finding fingerprint (e.g. `finding_fingerprint:{fingerprint}`) for baseline drift events.
- Rationale: Alert delivery dedupe uses `fingerprint_key` (or `idempotency_key`) via `AlertFingerprintService`.
- Alternatives considered:
  - Use `finding:{id}` → rejected because it ties dedupe to a DB surrogate rather than the domain fingerprint.
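To make the difference between the two candidate keys concrete, here is a short Python sketch. Both helper names are hypothetical, and the key format simply follows the `finding_fingerprint:{fingerprint}` example above; how `AlertFingerprintService` consumes the key is not shown.

```python
def baseline_drift_fingerprint_key(fingerprint: str) -> str:
    """Chosen approach: key derived from the domain fingerprint."""
    return f"finding_fingerprint:{fingerprint}"


def rejected_id_key(finding_id: int) -> str:
    """Rejected alternative: key derived from the database row id."""
    return f"finding:{finding_id}"


# The same drift stored under two different row ids (e.g. after a re-import)
# keeps one dedupe key with the chosen approach but gets two with the rejected one.
same_drift = "3f9a1c"
chosen_keys = {baseline_drift_fingerprint_key(same_drift) for _ in (101, 202)}
rejected_keys = {rejected_id_key(row_id) for row_id in (101, 202)}
```

Here `chosen_keys` contains a single entry while `rejected_keys` contains two, which is the duplication the decision is meant to avoid.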

### 5) Baseline-specific event types
- Decision: Add two new alert event types and produce them in `EvaluateAlertsJob`:
  - `baseline_high_drift`: for baseline compare findings (`source = baseline.compare`) that are `new`/`reopened` in the evaluation window and meet severity threshold.
  - `baseline_compare_failed`: for `OperationRun.type = baseline_compare` with `outcome in {failed, partially_succeeded}` in the evaluation window.
- Rationale: The spec requires strict separation from generic drift alerts and precise triggering rules.
- Alternatives considered:
  - Reuse `high_drift` / `compare_failed` → rejected because it would mix baseline and non-baseline meaning.
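The triggering rules for the two event types might be expressed as predicates like the ones below. This is a Python sketch, not the `EvaluateAlertsJob` code: the dict field names `status_changed_at` and `completed_at`, the `SEVERITY_RANK` ordering, and both function names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed severity ordering; the real scale may differ.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def baseline_high_drift_applies(finding: dict, window_start: datetime, min_severity: str) -> bool:
    """True when a finding should produce a `baseline_high_drift` event."""
    return (
        finding["source"] == "baseline.compare"
        and finding["status"] in {"new", "reopened"}
        and finding["status_changed_at"] >= window_start  # hypothetical field
        and SEVERITY_RANK[finding["severity"]] >= SEVERITY_RANK[min_severity]
    )


def baseline_compare_failed_applies(run: dict, window_start: datetime) -> bool:
    """True when an operation run should produce a `baseline_compare_failed` event."""
    return (
        run["type"] == "baseline_compare"
        and run["outcome"] in {"failed", "partially_succeeded"}
        and run["completed_at"] >= window_start  # hypothetical field
    )
```

Keeping the `source` and `type` checks inside the predicates is what separates these events from the generic `high_drift` / `compare_failed` ones.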

### 6) Cooldown behavior for baseline_compare_failed
- Decision: Reuse the existing per-rule cooldown + quiet-hours suppression implemented in `AlertDispatchService` (no baseline-specific cooldown setting).
- Rationale: Matches spec clarification and existing patterns.

### 7) Workspace settings implementation approach
- Decision: Implement baseline settings using the existing `SettingsRegistry`/`SettingsResolver`/`SettingsWriter` system with new keys under a new `baseline` domain:
  - `baseline.severity_mapping` (json map with restricted keys)
  - `baseline.alert_min_severity` (string)
  - `baseline.auto_close_enabled` (bool)
- Rationale: This matches existing settings infrastructure and ensures consistent “effective value” semantics.
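The “effective value” idea can be sketched as a default-plus-override lookup. This Python snippet does not reflect the actual `SettingsResolver` API; the `effective_value` helper and every default value in `DEFAULTS` are placeholders invented for the example, and only the three setting keys come from the decision above.

```python
from typing import Any, Mapping

# Placeholder defaults; the real defaults are defined by the spec, not here.
DEFAULTS = {
    "baseline.severity_mapping": {},
    "baseline.alert_min_severity": "high",
    "baseline.auto_close_enabled": True,
}


def effective_value(key: str, workspace_overrides: Mapping[str, Any]) -> Any:
    """Return the workspace override if one is stored, else the registered default."""
    if key not in DEFAULTS:
        # Unknown keys are rejected instead of silently returning None.
        raise KeyError(f"unregistered setting: {key}")
    return workspace_overrides.get(key, DEFAULTS[key])
```

For example, a workspace that stores only `baseline.auto_close_enabled = False` would still see the default for `baseline.alert_min_severity`.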

### 8) Information architecture (IA) and planes
- Decision: Keep baseline profile CRUD as workspace-owned (non-tenant scoped) and baseline compare monitoring as tenant-context only.
- Rationale: Matches SCOPE-001 and spec FR-018.

## Notes / Repo Facts Used
- Ops-UX allowed summary keys are defined in `App\Support\OpsUx\OperationSummaryKeys`.
- Drift lifecycle patterns exist in `App\Services\Drift\DriftFindingGenerator` (reopen + resolve stale).
- Alert dispatch dedupe/cooldown/quiet-hours are centralized in `App\Services\Alerts\AlertDispatchService` and `AlertFingerprintService`.
- Workspace settings are handled by `App\Support\Settings\SettingsRegistry` + `SettingsResolver` + `SettingsWriter`.