TenantAtlas/specs/099-alerts-v1-teams-email/plan.md
ahmido 3ed275cef3 feat(alerts): Monitoring cluster + v1 resources (spec 099) (#121)
Implements spec `099-alerts-v1-teams-email`.

- Monitoring navigation: Alerts as a cluster under Monitoring; default landing is Alert deliveries.
- Tenant panel: Alerts points to `/admin/alerts` and the cluster navigation is hidden in tenant panel.
- Guard compliance: removes direct `Gate::` usage from Alert resources so `NoAdHocFilamentAuthPatternsTest` passes.

Verification:
- Full suite: `1348 passed, 7 skipped` (EXIT=0).

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #121
2026-02-18 15:20:43 +00:00


# Implementation Plan: 099 — Alerts v1 (Teams + Email)
**Branch**: `099-alerts-v1-teams-email` | **Date**: 2026-02-16 | **Spec**: `/specs/099-alerts-v1-teams-email/spec.md`

**Input**: Feature specification from `/specs/099-alerts-v1-teams-email/spec.md`
## Summary
Implement workspace-scoped alerting with:
- **Destinations (Targets)**: Microsoft Teams incoming webhook and Email recipients.
- **Rules**: route by event type, minimum severity, and tenant scope.
- **Noise controls**: deterministic fingerprint dedupe, per-rule cooldown suppression, and quiet-hours deferral.
- **Delivery history**: read-only, includes `suppressed` entries.

Delivery is queue-driven, with retries using bounded exponential backoff. All alert pages remain DB-only at render time and never expose destination secrets.
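
The routing criteria in the summary (event type, minimum severity, tenant scope) can be expressed as a pure predicate. The sketch below is plain PHP, not the project's code. The array shapes mirror the `AlertRule` columns from the data model, and the severity scale below High is an assumption.

```php
<?php

// Hypothetical severity ordering: the plan names High/Critical, but the
// lower levels here are assumptions.
const SEVERITY_RANK = ['low' => 1, 'medium' => 2, 'high' => 3, 'critical' => 4];

/**
 * Returns true when a rule should route an event.
 * $rule mirrors the AlertRule columns; $event is a hypothetical array
 * with event_type, severity and tenant_id keys.
 */
function ruleMatches(array $rule, array $event): bool
{
    if (!$rule['is_enabled'] || $rule['event_type'] !== $event['event_type']) {
        return false;
    }

    // Minimum severity: the event must be at or above the rule's threshold.
    // An unknown threshold never matches.
    $eventRank = SEVERITY_RANK[$event['severity']] ?? 0;
    $minRank = SEVERITY_RANK[$rule['minimum_severity']] ?? PHP_INT_MAX;
    if ($eventRank < $minRank) {
        return false;
    }

    // Tenant scope: 'all' matches every tenant; 'allowlist' only listed IDs.
    return $rule['tenant_scope_mode'] === 'all'
        || in_array($event['tenant_id'], $rule['tenant_allowlist'] ?? [], true);
}
```

Keeping matching free of I/O means it can be unit-tested in Pest without a database.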
## Technical Context
**Language/Version**: PHP 8.4 (Laravel 12)

**Primary Dependencies**: Filament v5 (Livewire v4.0+), Laravel Queue (database default)

**Storage**: PostgreSQL (Sail)

**Testing**: Pest v4 via `vendor/bin/sail artisan test --compact`

**Target Platform**: Laravel web app (Filament Admin)

**Project Type**: Web application

**Performance Goals**: Eligible alerts delivered within ~2 minutes outside quiet hours (SC-002)

**Constraints**:
- DB-only rendering for Targets/Rules/Deliveries pages (FR-015)
- No destination secrets in logs/audit payloads (FR-011)
- Retries use exponential backoff + bounded max attempts (FR-017)

**Scale/Scope**: Workspace-owned configuration + tenant-owned delivery history (90-day retention) (FR-016)
## Constitution Check
*GATE: Must pass before Phase 0 research. Re-check after Phase 1 design.*
- **Livewire/Filament**: Filament v5 implies Livewire v4.0+ (compliant).
- **Provider registration**: No new panel provider required; existing registration remains in `bootstrap/providers.php`.
- **RBAC semantics**: Enforce non-member → 404 (deny-as-not-found) and member missing capability → 403.
- **Capability registry**: Add `ALERTS_VIEW` and `ALERTS_MANAGE` to canonical registry; role maps reference only registry constants.
- **Destructive actions**: Deletes and other destructive actions use `->requiresConfirmation()` and execute via `->action(...)`.
- **Run observability**: Scheduled/queued scanning + deliveries create/reuse `OperationRun` for Monitoring → Operations visibility.
- **Safe logging**: Audit logging uses `WorkspaceAuditLogger` (sanitizes context) and never records webhook URLs / recipient lists.
- **Global search**: No new global search surfaces are required for v1; if enabled later, resources must have Edit/View pages and remain workspace-safe.

Result: **PASS**, assuming the above constraints are implemented and covered by tests.
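
A minimal sketch of the capability-registry rule, assuming string-valued constants. The real registry is `App\Support\Auth\Capabilities`. The constant values, the role names, and the `allows()` helper are hypothetical. The point is that the role map references only registry constants, never raw strings.

```php
<?php

// Stand-in for App\Support\Auth\Capabilities; the string values are assumptions.
final class Capabilities
{
    public const ALERTS_VIEW = 'alerts.view';
    public const ALERTS_MANAGE = 'alerts.manage';
}

// Hypothetical role names; every entry is a registry constant.
final class WorkspaceRoleCapabilityMap
{
    private const MAP = [
        'owner' => [Capabilities::ALERTS_VIEW, Capabilities::ALERTS_MANAGE],
        'manager' => [Capabilities::ALERTS_VIEW, Capabilities::ALERTS_MANAGE],
        'readonly' => [Capabilities::ALERTS_VIEW],
    ];

    // Unknown roles get no capabilities.
    public static function allows(string $role, string $capability): bool
    {
        return in_array($capability, self::MAP[$role] ?? [], true);
    }
}
```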
## Project Structure
### Documentation (this feature)
```text
specs/099-alerts-v1-teams-email/
├── plan.md
├── research.md
├── data-model.md
├── quickstart.md
├── contracts/
└── tasks.md # created later by /speckit.tasks
```
### Source Code (repository root)
```text
app/
├── Filament/
│   ├── Pages/
│   └── Resources/
├── Jobs/
├── Models/
├── Policies/
├── Services/
│   ├── Audit/
│   ├── Auth/
│   └── Settings/
└── Support/
    ├── Auth/
    └── Rbac/
database/
└── migrations/
tests/
├── Feature/
└── Unit/
```
**Structure Decision**: Use standard Laravel + Filament discovery conventions. Add Eloquent models + migrations for workspace-owned alert configuration + tenant-owned alert deliveries, queue jobs for evaluation + delivery, and Filament Resources/Pages under the existing Admin panel.
## Phase 0 — Outline & Research (output: research.md)
Unknowns / decisions to lock:
- Teams delivery should use Laravel HTTP client (`Http::post()`) with timeouts and safe error capture.
- Email delivery should use Laravel mail/notifications and be queued.
- Quiet-hours timezone fallback: rule timezone if set; else workspace timezone; if no workspace timezone exists yet, fallback to `config('app.timezone')`.
- Secrets storage: use encrypted casts (`encrypted` / `encrypted:array`) for webhook URLs and recipient lists.
- Retries/backoff: use job `tries` and `backoff()` for exponential backoff with a max attempt cap.
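
A plain-PHP sketch of the retry decision. Laravel queued jobs can declare a public `$tries` property and a `backoff()` method that returns an array of delays in seconds. The class name, attempt cap, base delay and ceiling below are assumptions, not values from the spec.

```php
<?php

// Shaped like a Laravel job but framework-free; the name and numbers are hypothetical.
final class SendAlertDeliverySketch
{
    /** Maximum number of attempts (first try plus retries). */
    public int $tries = 5;

    /** Base delay and ceiling for the exponential schedule, in seconds. */
    private const BASE_SECONDS = 30;
    private const MAX_DELAY_SECONDS = 600;

    /**
     * One delay per retry, doubling each time and capped at the ceiling:
     * with 5 tries this yields 30, 60, 120, 240. Laravel reuses the last
     * element if more retries remain than array entries.
     *
     * @return list<int>
     */
    public function backoff(): array
    {
        $delays = [];
        for ($retry = 0; $retry < $this->tries - 1; $retry++) {
            $delays[] = min(self::BASE_SECONDS * (2 ** $retry), self::MAX_DELAY_SECONDS);
        }
        return $delays;
    }
}
```

When the last attempt fails, a `failed()` method on the job is a natural place to mark the `AlertDelivery` row `failed`.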
## Phase 1 — Design & Contracts (outputs: data-model.md, contracts/*, quickstart.md)
### Data model
Workspace-owned entities:
- `AlertDestination`:
  - `workspace_id`
  - `name`
  - `type` (`teams_webhook` | `email`)
  - `is_enabled`
  - `config` (encrypted array; contains the webhook URL or recipient list)
- `AlertRule`:
  - `workspace_id`
  - `name`
  - `is_enabled`
  - `event_type` (`high_drift` | `compare_failed` | `sla_due`)
  - `minimum_severity`
  - `tenant_scope_mode` (`all` | `allowlist`)
  - `tenant_allowlist` (array of tenant IDs)
  - `cooldown_seconds`
  - `quiet_hours_enabled`, `quiet_hours_start`, `quiet_hours_end`, `quiet_hours_timezone`
- `AlertRuleDestination` (pivot): `workspace_id`, `alert_rule_id`, `alert_destination_id`

Tenant-owned entities:
- `AlertDelivery` (history):
  - `workspace_id`
  - `tenant_id`
  - `alert_rule_id`, `alert_destination_id`
  - `fingerprint_hash`
  - `status` (`queued` | `deferred` | `sent` | `failed` | `suppressed` | `canceled`)
  - `send_after` (for quiet-hours deferral)
  - `attempt_count`, `last_error_code`, `last_error_message` (sanitized)
  - timestamps

Retention: prune deliveries older than 90 days (default).
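
The quiet-hours fields and `send_after` combine as follows in this framework-free sketch. It assumes `quiet_hours_start`/`quiet_hours_end` are stored as `HH:MM` wall-clock strings, and that a window whose start is later than its end spans midnight. The timezone fallback follows the Phase 0 decision; `config('app.timezone')` is passed in as a plain string here.

```php
<?php

/**
 * Quiet-hours timezone: rule timezone, else workspace timezone,
 * else the app default (config('app.timezone') in Laravel).
 */
function resolveQuietHoursTimezone(?string $ruleTz, ?string $workspaceTz, string $appTz): DateTimeZone
{
    return new DateTimeZone($ruleTz ?: ($workspaceTz ?: $appTz));
}

/**
 * Returns the UTC instant to store as send_after, or null when $now is
 * outside the quiet window. A window with start > end spans midnight.
 */
function quietHoursSendAfter(DateTimeImmutable $now, string $start, string $end, DateTimeZone $tz): ?DateTimeImmutable
{
    $local = $now->setTimezone($tz);
    $minutes = (int) $local->format('H') * 60 + (int) $local->format('i');
    [$sh, $sm] = array_map('intval', explode(':', $start));
    [$eh, $em] = array_map('intval', explode(':', $end));
    $startMin = $sh * 60 + $sm;
    $endMin = $eh * 60 + $em;

    $inWindow = $startMin <= $endMin
        ? ($minutes >= $startMin && $minutes < $endMin)
        : ($minutes >= $startMin || $minutes < $endMin); // overnight window

    if (!$inWindow) {
        return null;
    }

    // Next local occurrence of the window end: tomorrow if it already passed today.
    $sendAfter = $local->setTime($eh, $em);
    if ($sendAfter <= $local) {
        $sendAfter = $sendAfter->modify('+1 day');
    }
    return $sendAfter->setTimezone(new DateTimeZone('UTC'));
}
```

Storing `send_after` in UTC keeps the deliver step a simple "`send_after` <= now" comparison, independent of each rule's timezone.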
### Contracts
Create explicit schema/contracts for:
- Alert rule/destination create/edit payloads (validation expectations)
- Delivery record shape (what UI displays)
- Domain event shapes used for fingerprinting (no secrets)
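
For the fingerprint contract, one deterministic approach is to hash a canonical JSON encoding of the event's non-secret fields. This sketch makes three assumptions: associative keys are sorted recursively, list order is preserved, and SHA-256 is used for `fingerprint_hash`. Destination config never enters the input.

```php
<?php

/** Recursively sort associative keys so equal events encode identically. */
function canonicalize(mixed $value): mixed
{
    if (!is_array($value)) {
        return $value;
    }
    if (array_is_list($value)) {
        // Lists are ordered data: keep their order, canonicalize elements.
        return array_map('canonicalize', $value);
    }
    ksort($value, SORT_STRING);
    return array_map('canonicalize', $value); // array_map keeps string keys
}

/** Deterministic 64-char hex fingerprint for a non-secret event array. */
function alertFingerprint(array $event): string
{
    $json = json_encode(
        canonicalize($event),
        JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_THROW_ON_ERROR
    );
    return hash('sha256', $json);
}
```

A unique index that includes `fingerprint_hash` lets the database, rather than application timing, reject duplicate deliveries.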
### Filament surfaces
- **Targets**: CRUD destinations. Confirm on delete. Never display secrets once saved.
- **Rules**: CRUD rules, enable/disable. Confirm destructive actions.
- **Deliveries**: read-only viewer.

RBAC enforcement:
- Page access: `ALERTS_VIEW`.
- Mutations: `ALERTS_MANAGE`.
- Non-member: deny-as-not-found (404) consistently.
- Deliveries are tenant-owned and MUST only be listed/viewable for tenants the actor is entitled to; non-entitled tenants are filtered and treated as not found (404 semantics).
- If a tenant-context is active in the current session, the Deliveries view SHOULD default-filter to that tenant.
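
The order of these checks matters: membership is evaluated before capability, so non-members always see 404. A plain-PHP sketch follows; the function names and array shapes are hypothetical. In a Laravel policy, the same outcomes could be returned with `Response::denyAsNotFound()` and `Response::deny()`.

```php
<?php

/** 404 for non-members, 403 for members missing the capability, else 200. */
function alertsAccessStatus(bool $isMember, array $capabilities, string $required): int
{
    if (!$isMember) {
        return 404; // deny-as-not-found: do not reveal the workspace exists
    }
    return in_array($required, $capabilities, true) ? 200 : 403;
}

/**
 * Deliveries are tenant-owned: keep only rows for entitled tenants,
 * optionally narrowed to the active tenant context.
 */
function visibleDeliveries(array $deliveries, array $entitledTenantIds, ?int $activeTenantId = null): array
{
    $allowed = array_flip($entitledTenantIds);
    $rows = array_filter($deliveries, fn (array $d): bool => isset($allowed[$d['tenant_id']]));
    if ($activeTenantId !== null) {
        $rows = array_filter($rows, fn (array $d): bool => $d['tenant_id'] === $activeTenantId);
    }
    return array_values($rows);
}

/** Viewing one delivery of a non-entitled tenant is treated as not found. */
function deliveryViewStatus(array $delivery, array $entitledTenantIds): int
{
    return in_array($delivery['tenant_id'], $entitledTenantIds, true) ? 200 : 404;
}
```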
### Background processing (jobs + OperationRuns)
- `alerts.evaluate` run: scans for new triggering events and creates `AlertDelivery` rows (including `suppressed`).
- `alerts.deliver` run: sends due deliveries (respecting `send_after`).
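
During `alerts.evaluate`, each matched rule/destination pair needs an initial delivery status. This sketch makes two assumptions. First, cooldown is measured against the most recent non-suppressed delivery with the same rule, destination and fingerprint. Second, quiet hours have already been resolved to a `send_after` value. How those inputs are queried is left open.

```php
<?php

/**
 * Initial status for a new AlertDelivery row.
 *
 * @return array{status: string, send_after: ?DateTimeImmutable}
 */
function initialDeliveryState(
    DateTimeImmutable $now,
    ?DateTimeImmutable $lastSentAt,
    int $cooldownSeconds,
    ?DateTimeImmutable $quietSendAfter,
): array {
    // Cooldown/dedupe: the same fingerprint inside the cooldown window is
    // recorded as suppressed, so it still appears in delivery history.
    if ($lastSentAt !== null && $now->getTimestamp() - $lastSentAt->getTimestamp() < $cooldownSeconds) {
        return ['status' => 'suppressed', 'send_after' => null];
    }
    // Quiet hours: defer until the window ends.
    if ($quietSendAfter !== null) {
        return ['status' => 'deferred', 'send_after' => $quietSendAfter];
    }
    return ['status' => 'queued', 'send_after' => null];
}
```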

Trigger sources (repo-grounded):
- **High Drift**: derived from persisted drift findings (`Finding` records) with severity High/Critical whose status is `new` (unacknowledged). A finding is alerted when it first appears (a new `Finding` row is created); the same existing finding is not re-alerted on every evaluation cycle.
- **Compare Failed**: derived from failed drift-generation operations (`OperationRun` where `type = drift_generate_findings` and `outcome = failed`).
- **SLA Due**: v1 implements this trigger as a safe no-op unless/until the underlying data model provides a due-date signal.
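
Here is a sketch of how the trigger sources above could become domain events. Rows are plain arrays standing in for `Finding` and `OperationRun` models. Several details are assumptions: the lowercase severity and status values, the ID cursor that picks up only newly created rows (it presumes monotonically increasing IDs), and the severity assigned to Compare Failed events.

```php
<?php

/** High Drift: new (unacknowledged) High/Critical findings created after the cursor. */
function highDriftEvents(array $findings, int $afterFindingId): array
{
    $events = [];
    foreach ($findings as $f) {
        if ($f['id'] > $afterFindingId
            && $f['status'] === 'new'
            && in_array($f['severity'], ['high', 'critical'], true)) {
            $events[] = [
                'event_type' => 'high_drift',
                'tenant_id' => $f['tenant_id'],
                'severity' => $f['severity'],
                'source' => ['kind' => 'finding', 'id' => $f['id']],
            ];
        }
    }
    return $events;
}

/** Compare Failed: failed drift_generate_findings runs created after the cursor. */
function compareFailedEvents(array $runs, int $afterRunId): array
{
    $events = [];
    foreach ($runs as $r) {
        if ($r['id'] > $afterRunId
            && $r['type'] === 'drift_generate_findings'
            && $r['outcome'] === 'failed') {
            $events[] = [
                'event_type' => 'compare_failed',
                'tenant_id' => $r['tenant_id'],
                'severity' => 'high', // assumed; the plan assigns no severity
                'source' => ['kind' => 'operation_run', 'id' => $r['id']],
            ];
        }
    }
    return $events;
}

/** SLA Due: safe no-op in v1 until a due-date signal exists. */
function slaDueEvents(): array
{
    return [];
}
```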

Scheduling convention:
- A scheduled console command (`tenantpilot:alerts:dispatch`) runs every minute (registered in `routes/console.php`) and dispatches the evaluate + deliver work idempotently.

Idempotency:
- Deterministic fingerprint; unique constraints where appropriate.
- Delivery send job transitions statuses atomically; if already terminal (`sent`/`failed`/`canceled`), it no-ops.
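
The no-op rule can be split into two small functions: one checked before sending and one applied after an attempt. This is a sketch. The statuses come from the data model; treating `suppressed` rows as unsendable is an assumption. To make the check atomic in PostgreSQL, the claim would be a single conditional `UPDATE` whose `WHERE` clause repeats the status condition, and the job would proceed only if exactly one row was affected.

```php
<?php

// Only these rows may still be sent. Terminal rows (sent/failed/canceled)
// and suppressed history rows make the job a no-op.
const SENDABLE_STATUSES = ['queued', 'deferred'];

function shouldAttemptSend(array $delivery, DateTimeImmutable $now): bool
{
    if (!in_array($delivery['status'], SENDABLE_STATUSES, true)) {
        return false;
    }
    // Respect quiet-hours deferral.
    return $delivery['send_after'] === null || $delivery['send_after'] <= $now;
}

/**
 * Status after one attempt. A failed attempt stays 'queued' for the next
 * retry until the attempt cap is reached, then becomes 'failed'.
 *
 * @return array{status: string, attempt_count: int}
 */
function afterSendAttempt(array $delivery, bool $succeeded, int $maxAttempts): array
{
    $attempts = $delivery['attempt_count'] + 1;
    if ($succeeded) {
        return ['status' => 'sent', 'attempt_count' => $attempts];
    }
    return ['status' => $attempts >= $maxAttempts ? 'failed' : 'queued', 'attempt_count' => $attempts];
}
```

Because the guard runs before any HTTP or mail call, a duplicate job dispatched by the every-minute scheduler finds the row already terminal and exits.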
### Audit logging
All destination/rule mutations log via `WorkspaceAuditLogger` with redacted metadata:
- Record IDs, names, types, enabled flags, rule criteria.
- Never include webhook URLs or recipient lists.
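
A sketch of the redaction step follows. It assumes secrets live under the `config` key (plus hypothetical `webhook_url` / `recipients` keys) and drops them rather than masking them. It also shows one way to sanitize `last_error_message`, since transport errors may echo the webhook URL. The patterns and the 500-byte limit are assumptions. This is not a description of `WorkspaceAuditLogger`'s own sanitization.

```php
<?php

// Hypothetical secret-bearing keys, based on the AlertDestination data model.
const SECRET_KEYS = ['config', 'webhook_url', 'recipients'];

/** Recursively drop secret-bearing keys from audit metadata. */
function redactAuditMetadata(array $metadata): array
{
    $clean = [];
    foreach ($metadata as $key => $value) {
        if (in_array($key, SECRET_KEYS, true)) {
            continue; // drop entirely rather than mask
        }
        $clean[$key] = is_array($value) ? redactAuditMetadata($value) : $value;
    }
    return $clean;
}

/** Replace URLs and email addresses, then truncate (byte-based) for storage. */
function sanitizeErrorMessage(string $message, int $maxLength = 500): string
{
    $message = preg_replace('~https?://\S+~i', '[url]', $message);
    $message = preg_replace('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', '[email]', $message);
    return substr($message, 0, $maxLength);
}
```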
## Phase 2 — Task Planning (outline; tasks.md comes next)
1) Capabilities & policies
   - Add `ALERTS_VIEW` / `ALERTS_MANAGE` to `App\Support\Auth\Capabilities`.
   - Update `WorkspaceRoleCapabilityMap`.
   - Add policies for the new models and enforce 404/403 semantics.
2) Migrations + models
   - Create migrations + Eloquent models for destinations/rules/pivot/deliveries.
   - Add encrypted casts and safe `$hidden` where appropriate.
3) Services
   - Fingerprint builder
   - Quiet hours evaluator
   - Dispatcher to create deliveries and enqueue send jobs
4) Jobs
   - Evaluate triggers job
   - Send delivery job with exponential backoff + max attempts
5) Filament UI
   - Implement Targets/Rules/Deliveries pages with action surfaces and confirmation.
6) Tests (Pest)
   - RBAC: 404 for non-members; 403 for members missing capability.
   - Cooldown/dedupe: persists `suppressed` delivery history.
   - Retry policy: transitions to `failed` after bounded attempts.
## Complexity Tracking
No constitution violations are required for this feature.