Research — Baseline Operability & Alert Integration (Spec 115)
This document resolves planning unknowns and records implementation decisions.
Decisions
1) Completeness counters for safe auto-close
- Decision: Treat compare “completeness counters” as OperationRun.summary_counts.total, processed, and failed.
- Rationale: Ops-UX contracts already standardize these keys via OperationSummaryKeys::all(); they’re the metrics the UI understands for determinate progress.
- Alternatives considered:
  - Add new keys like total_count / processed_count / failed_item_count → rejected because it would require expanding OperationSummaryKeys::all() and updating Ops-UX guard tests without a strong benefit.
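
A minimal sketch of how the three counters could gate auto-close. It is written in Python purely for illustration (the project itself is PHP), and the function name is hypothetical. The rule that a run counts as complete only when processed equals total and failed is zero, and that a zero total is not treated as complete, is an assumption, not something this document specifies.

```python
from typing import Mapping


def is_compare_complete(summary_counts: Mapping[str, int]) -> bool:
    """Return True only when every item was processed and none failed.

    Hypothetical helper: the exact completeness rule is an assumption.
    """
    total = summary_counts.get("total", 0)
    processed = summary_counts.get("processed", 0)
    failed = summary_counts.get("failed", 0)
    # Assumption: a missing or zero total is not provably complete,
    # so auto-close stays off in that case.
    return total > 0 and processed == total and failed == 0
```

Under these assumptions, a run with `{"total": 3, "processed": 3, "failed": 0}` would allow auto-close, while any failed item or a partially processed run would not.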
2) Where auto-close runs
- Decision: Perform auto-close at the end of CompareBaselineToTenantJob (after findings upsert), using the run’s computed “seen” fingerprint set.
- Rationale: The job already has the full drift result set for the tenant+profile; it’s the only place that can reliably know what was evaluated.
- Alternatives considered:
  - Separate queued job for auto-close → rejected (extra run coordination and more complex observability for no benefit).
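
The role of the “seen” fingerprint set can be sketched as a set difference: any currently open finding whose fingerprint was not observed in this run becomes a candidate for auto-close. This is an illustrative Python sketch with a hypothetical function name. How open findings are loaded and how they are resolved are left out because the document does not describe them.

```python
from typing import Iterable, Set


def fingerprints_to_auto_close(
    open_fingerprints: Iterable[str], seen: Set[str]
) -> Set[str]:
    """Return fingerprints of open findings that this run did not observe.

    Hypothetical helper: `open_fingerprints` stands for the open baseline
    findings of the tenant+profile, `seen` for the run's computed set.
    """
    return {fp for fp in open_fingerprints if fp not in seen}
```

Because the result depends on `seen` being the full set for the run, it only makes sense to compute it once the findings upsert has finished, which is why the decision places it at the end of the job.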
3) Baseline finding lifecycle semantics (new vs reopened vs existing open)
- Decision: Mirror the existing drift lifecycle behavior (as implemented in DriftFindingGenerator):
  - New fingerprint → status = new.
  - Previously terminal fingerprint (at least resolved) observed again → status = reopened and set reopened_at.
  - Existing open finding → do not overwrite workflow status (avoid resetting triaged / in_progress).
- Rationale: This preserves operator workflow state and enables “alert only on new/reopened” logic.
- Alternatives considered:
  - Always set status = new on every compare (current behavior) → rejected because it can overwrite workflow state.
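
The three lifecycle rules above can be sketched as a small transition function. This is an illustrative Python sketch, not the DriftFindingGenerator code. The function name is hypothetical, and the terminal set contains only resolved because that is the one terminal status the document names.

```python
from typing import Optional, Tuple

# Only "resolved" is named in the decision ("at least resolved");
# any further terminal statuses would be an assumption.
TERMINAL_STATUSES = {"resolved"}


def next_status(current: Optional[str]) -> Tuple[str, bool]:
    """Return (status, set_reopened_at) for a fingerprint seen in this run.

    `current` is None when no finding exists yet for the fingerprint.
    """
    if current is None:
        return ("new", False)        # new fingerprint
    if current in TERMINAL_STATUSES:
        return ("reopened", True)    # terminal finding observed again
    return (current, False)          # open finding: keep workflow status
```

With this shape, a finding that an operator has moved to triaged or in_progress passes through unchanged, which is the property the rationale relies on.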
4) Alert deduplication key for baseline drift
- Decision: Set fingerprint_key to a stable string derived from the finding fingerprint (e.g. finding_fingerprint:{fingerprint}) for baseline drift events.
- Rationale: Alert delivery dedupe uses fingerprint_key (or idempotency_key) via AlertFingerprintService.
- Alternatives considered:
  - Use finding:{id} → rejected because it ties dedupe to a DB surrogate rather than the domain fingerprint.
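
As a small illustration of the key format given in the decision, the following Python sketch builds the dedupe key from the fingerprint alone. The function name is hypothetical. Only the finding_fingerprint:{fingerprint} pattern comes from this document.

```python
def baseline_fingerprint_key(fingerprint: str) -> str:
    """Build the dedupe key from the domain fingerprint, not the row id."""
    return f"finding_fingerprint:{fingerprint}"
```

Because the row id is not part of the key, two findings with the same fingerprint produce the same key even if the underlying row is recreated.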
5) Baseline-specific event types
- Decision: Add two new alert event types and produce them in EvaluateAlertsJob:
  - baseline_high_drift: for baseline compare findings (source = baseline.compare) that are new / reopened in the evaluation window and meet severity threshold.
  - baseline_compare_failed: for OperationRun.type = baseline_compare with outcome in {failed, partially_succeeded} in the evaluation window.
- Rationale: The spec requires strict separation from generic drift alerts and precise triggering rules.
- Alternatives considered:
  - Reuse high_drift / compare_failed → rejected because it would mix baseline and non-baseline meaning.
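
The two triggering rules can be sketched as predicates. This is an illustrative Python sketch with hypothetical function names, not the EvaluateAlertsJob implementation. The severity ordering is an assumption, and filtering by evaluation window is assumed to happen before these functions are called.

```python
from typing import Optional

# Assumption: severities are ordered from least to most severe like this.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def finding_event_type(
    source: str, status: str, severity: str, min_severity: str
) -> Optional[str]:
    """Return 'baseline_high_drift' when a finding qualifies, else None."""
    if source != "baseline.compare":
        return None
    if status not in {"new", "reopened"}:
        return None
    if SEVERITY_ORDER.index(severity) < SEVERITY_ORDER.index(min_severity):
        return None
    return "baseline_high_drift"


def run_event_type(run_type: str, outcome: str) -> Optional[str]:
    """Return 'baseline_compare_failed' when a run qualifies, else None."""
    if run_type == "baseline_compare" and outcome in {
        "failed",
        "partially_succeeded",
    }:
        return "baseline_compare_failed"
    return None
```

Keeping the source and run-type checks first mirrors the separation the rationale asks for: nothing outside baseline compare can produce either event.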
6) Cooldown behavior for baseline_compare_failed
- Decision: Reuse the existing per-rule cooldown + quiet-hours suppression implemented in AlertDispatchService (no baseline-specific cooldown setting).
- Rationale: Matches spec clarification and existing patterns.
7) Workspace settings implementation approach
- Decision: Implement baseline settings using the existing SettingsRegistry / SettingsResolver / SettingsWriter system with new keys under a new baseline domain:
  - baseline.severity_mapping (json map with restricted keys)
  - baseline.alert_min_severity (string)
  - baseline.auto_close_enabled (bool)
- Rationale: This matches existing settings infrastructure and ensures consistent “effective value” semantics.
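
The “effective value” idea can be sketched as a lookup that prefers a workspace override and otherwise falls back to a registered default. This is an illustrative Python sketch, not the SettingsResolver API. The function name is hypothetical, and the default values shown are placeholders invented for the example, since this document does not state the real defaults.

```python
from typing import Any, Dict, Mapping

# Placeholder defaults for illustration only; the real defaults are not
# given in this document.
BASELINE_DEFAULTS: Dict[str, Any] = {
    "baseline.severity_mapping": {},
    "baseline.alert_min_severity": "high",
    "baseline.auto_close_enabled": True,
}


def effective_setting(key: str, workspace_overrides: Mapping[str, Any]) -> Any:
    """Return the workspace override if present, else the default."""
    if key not in BASELINE_DEFAULTS:
        raise KeyError(f"unknown baseline setting: {key}")
    return workspace_overrides.get(key, BASELINE_DEFAULTS[key])
```

Rejecting unknown keys up front reflects the registry-based approach in the decision, where only registered keys can be resolved or written.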
8) Information architecture (IA) and planes
- Decision: Keep baseline profile CRUD as workspace-owned (non-tenant scoped) and baseline compare monitoring as tenant-context only.
- Rationale: Matches SCOPE-001 and spec FR-018.
Notes / Repo Facts Used
- Ops-UX allowed summary keys are defined in App\Support\OpsUx\OperationSummaryKeys.
- Drift lifecycle patterns exist in App\Services\Drift\DriftFindingGenerator (reopen + resolve stale).
- Alert dispatch dedupe/cooldown/quiet-hours are centralized in App\Services\Alerts\AlertDispatchService and AlertFingerprintService.
- Workspace settings are handled by App\Support\Settings\SettingsRegistry + SettingsResolver + SettingsWriter.