spec: clarify audit logs, auth, retries (032)

Ahmed Darrazi 2026-01-05 00:50:23 +01:00
parent fa8e15f4c2
commit b05a60e392
4 changed files with 28 additions and 8 deletions

@ -7,5 +7,7 @@ # Requirements Checklist (032)
- [ ] Run stores status + summary + error_code/error_message.
- [ ] UI shows schedule list + run history + link to backup set.
- [ ] Run now + Retry are permission-gated and write DB notifications.
- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
- [ ] Retry/backoff policy implemented (no retry for 401/403).
- [ ] Retention keeps last N and soft-deletes older backup sets.
- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.

@ -20,6 +20,7 @@ ## Scheduling Mechanism
4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
- Concurrency:
- Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
- If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).
## Run Execution
- `RunBackupScheduleJob`:
@ -36,12 +37,21 @@ ## Run Execution
- failed if nothing backed up / hard error
7) update schedule last_run_* and compute/persist next_run_at
8) dispatch retention job
9) audit logs:
- log run start + completion (status, counts, error_code; no secrets)
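One way to honor "no secrets" in step 9 is to build audit entries from an allowlist rather than filtering out known-bad keys. The sketch below is Python for illustration only; the function name, the action string, and the entry layout are hypothetical, and only `status`, `counts`, and `error_code` come from the step above.

```python
import json
from datetime import datetime, timezone

# Only these context fields are ever copied into an audit entry.
ALLOWED_FIELDS = ("status", "counts", "error_code")


def build_audit_entry(tenant_id: int, action: str, context: dict) -> str:
    """Serialize a tenant-scoped audit entry, dropping any non-allowlisted field."""
    payload = {k: context[k] for k in ALLOWED_FIELDS if k in context}
    entry = {
        "tenant_id": tenant_id,
        "action": action,  # e.g. a hypothetical "backup_schedule.run_completed"
        "at": datetime.now(timezone.utc).isoformat(),
        "context": payload,
    }
    return json.dumps(entry, sort_keys=True)
```

With an allowlist, a token that accidentally ends up in the run context is dropped by default instead of having to be anticipated.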
## Retry / Backoff
- Configure job retry behavior based on error classification:
- Throttling/transient (e.g. 429/503): backoff + retry
- Auth/permission (401/403): no retry
- Unknown: limited retries
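The classification above can be sketched as a small decision function. This is an illustrative Python sketch under stated assumptions: the retry limits, the 30-second base delay, and the exponential backoff shape are placeholders, since the spec only says "backoff + retry" and "limited retries".

```python
from dataclasses import dataclass
from typing import Optional

MAX_TRANSIENT_RETRIES = 5  # assumed limit
MAX_UNKNOWN_RETRIES = 2    # assumed limit
BASE_DELAY_SECONDS = 30    # assumed base delay


@dataclass(frozen=True)
class RetryDecision:
    retry: bool
    delay_seconds: int


def decide_retry(status_code: Optional[int], attempt: int) -> RetryDecision:
    """Decide what to do after attempt number `attempt` (1-based) has failed."""
    # Auth/permission errors will not fix themselves: never retry.
    if status_code in (401, 403):
        return RetryDecision(False, 0)
    # Throttling/transient errors: exponential backoff, up to a limit.
    if status_code in (429, 503):
        if attempt > MAX_TRANSIENT_RETRIES:
            return RetryDecision(False, 0)
        return RetryDecision(True, BASE_DELAY_SECONDS * 2 ** (attempt - 1))
    # Anything else counts as unknown: a few fixed-delay retries, then give up.
    if attempt > MAX_UNKNOWN_RETRIES:
        return RetryDecision(False, 0)
    return RetryDecision(True, BASE_DELAY_SECONDS)
```

When `retry` is false the run would be marked failed with the mapped `error_code`.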
## Retention
- `ApplyBackupScheduleRetentionJob(schedule_id)`:
- identify runs ordered newest→oldest
- keep last N runs that created a backup_set_id
- for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
- audit log: number of deleted BackupSets
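The retention steps above can be sketched over plain dictionaries. This Python sketch is illustrative only: ordering by `scheduled_for`, the `deleted_at` marker, and the `items` key are assumptions about the data shape, not taken from the spec.

```python
from datetime import datetime, timezone


def apply_retention(runs: list, backup_sets: dict, keep_last: int) -> int:
    """Soft-delete backup sets of all but the newest `keep_last` runs.

    Returns the number of backup sets soft-deleted, i.e. the figure that
    would go into the audit log.
    """
    # Only runs that actually produced a backup set count toward N.
    with_sets = [r for r in runs if r.get("backup_set_id") is not None]
    with_sets.sort(key=lambda r: r["scheduled_for"], reverse=True)  # newest first

    now = datetime.now(timezone.utc)
    deleted = 0
    for run in with_sets[keep_last:]:
        backup_set = backup_sets[run["backup_set_id"]]
        if backup_set.get("deleted_at") is not None:
            continue  # already soft-deleted; keeps the job idempotent
        backup_set["deleted_at"] = now
        for item in backup_set.get("items", []):
            item["deleted_at"] = now  # cascade the soft delete to items
        deleted += 1
    return deleted
```

Skipping already-deleted sets means a repeated retention job reports zero instead of double-counting.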
## Filament UX
- Tenant-scoped resources:
@ -49,6 +59,7 @@ ## Filament UX
- Runs UI via RelationManager under schedule (or a dedicated resource if needed)
- Actions: enable/disable, run now, retry
- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
- MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.
## Ops / Deployment Notes
- Requires queue worker.

@ -41,7 +41,7 @@ ### Functional Requirements
- **FR-005**: “Run now” immediately creates a run (scheduled_for=now) and dispatches the job.
- **FR-006**: “Retry” creates a new run for the same schedule.
- **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft-deletes BackupSets).
-- **FR-008**: Concurrency: only one run per schedule may execute at a time.
+- **FR-008**: Concurrency: only one run per schedule may execute at a time. If a run is already in progress, a new run is not started in parallel and is instead marked `skipped` (with an error code).
### UX Requirements (Filament)
- **UX-001**: The schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
@ -50,19 +50,22 @@ ### UX Requirements (Filament)
### Security / Authorization
- **SEC-001**: Tenant isolation: a user sees/manages only the schedules of the current tenant.
-- **SEC-002**: Permissions (RBAC):
-- `backup_schedules.view`
-- `backup_schedules.manage`
-- `backup_schedules.run_now`
-- `backup_schedules.runs.view`
-- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
+- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
+- `readonly`: view schedules + view runs
+- `operator`: additionally “Run now” / “Retry”
+- `manager` / `owner`: additionally manage schedules (CRUD)
+- **SEC-003**: Dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. number of deleted BackupSets).
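The cumulative TenantRole mapping in SEC-002 (MVP) can be written down as a lookup table. The Python sketch below is only an illustration: the role names come from the spec, while the ability names and the `can` helper are hypothetical.

```python
# Each role includes everything the previous one may do.
_READONLY = frozenset({"view_schedules", "view_runs"})
_OPERATOR = _READONLY | {"run_now", "retry"}
_MANAGER = _OPERATOR | {"manage_schedules"}

ROLE_ABILITIES = {
    "readonly": _READONLY,
    "operator": _OPERATOR,
    "manager": _MANAGER,
    "owner": _MANAGER,  # manager and owner share the same schedule abilities
}


def can(role: str, ability: str) -> bool:
    """Return True if the tenant role grants the ability; unknown roles get nothing."""
    return ability in ROLE_ABILITIES.get(role, frozenset())
```

Denying unknown roles by default keeps a misconfigured or newly added role from silently gaining access.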
### Reliability / Non-Functional Requirements
- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
-- **NFR-003**: Retries: throttling → backoff; 401/403 → no blind retry.
+- **NFR-003**: Retries: throttling (e.g. 429/503) → backoff; 401/403 → no retry; unknown → limited retries, then failed.
- **NFR-004**: Missed runs policy (MVP): **No catch-up**: if the system was offline, missed slots are not made up; only the next slot runs.
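The unique slot constraint from NFR-001 can be demonstrated with an in-memory SQLite database. This Python sketch is an assumption-laden illustration: the table name `backup_schedule_runs` and the helper names are hypothetical, and only the column pair `backup_schedule_id` + `scheduled_for` comes from the requirement.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with the unique slot constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            UNIQUE (backup_schedule_id, scheduled_for)
        )
        """
    )
    return conn


def create_run_once(conn: sqlite3.Connection, schedule_id: int, scheduled_for: str) -> bool:
    """Insert the run for a slot; return False if that slot already has a run."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO backup_schedule_runs (backup_schedule_id, scheduled_for) "
                "VALUES (?, ?)",
                (schedule_id, scheduled_for),
            )
        return True
    except sqlite3.IntegrityError:
        # A concurrent or repeated dispatcher already claimed this slot.
        return False
```

Because the database enforces uniqueness, two dispatchers racing on the same minute cannot both create a run, even if the cache lock is unavailable.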
### Scheduling Semantics
- `scheduled_for` is **minute-based** (slot) and stored in UTC. The due calculation is performed in the schedule's timezone.
- DST (MVP): If the local time is invalid, the slot is skipped (run `skipped`). If the local time is ambiguous, the first occurrence is used.
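The DST rules above can be sketched with Python's standard `zoneinfo` module. This is an illustration, not the project's due-calculation: the function name `resolve_slot` is hypothetical, and returning `None` stands in for "mark the run `skipped`".

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_slot(local_naive: datetime, tz_name: str) -> Optional[datetime]:
    """Map a local wall-clock time to its UTC minute slot.

    Returns None when the wall time does not exist (spring-forward gap).
    For an ambiguous wall time (fall-back overlap), fold=0 selects the
    first occurrence.
    """
    tz = ZoneInfo(tz_name)
    # Truncate to the minute and attach the zone, preferring the first occurrence.
    local = local_naive.replace(second=0, microsecond=0, tzinfo=tz, fold=0)
    as_utc = local.astimezone(timezone.utc)
    # A nonexistent wall time does not survive a round trip through UTC.
    round_trip = as_utc.astimezone(tz)
    if round_trip.replace(tzinfo=None) != local.replace(tzinfo=None):
        return None
    return as_utc
```

For example, `resolve_slot(datetime(2026, 3, 29, 2, 30), "Europe/Berlin")` falls into the spring-forward gap, so the slot would be skipped rather than shifted.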
## Data Model
### backup_schedules

@ -20,11 +20,14 @@ ## Phase 4: Jobs
- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
- [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).
## Phase 5: Filament UI
- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
- [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.
## Phase 6: Tests
- [ ] T014 Unit: due-calculation + next_run_at.
@ -32,6 +35,7 @@ ## Phase 6: Tests
- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
- [ ] T018 Retention: keeps last N and deletes older backup sets.
- [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.
## Phase 7: Verification
- [ ] T019 Run targeted tests (Pest).