diff --git a/specs/032-backup-scheduling-mvp/checklists/requirements.md b/specs/032-backup-scheduling-mvp/checklists/requirements.md
index f3fe8af..db99631 100644
--- a/specs/032-backup-scheduling-mvp/checklists/requirements.md
+++ b/specs/032-backup-scheduling-mvp/checklists/requirements.md
@@ -7,5 +7,7 @@ # Requirements Checklist (032)
 - [ ] Run stores status + summary + error_code/error_message.
 - [ ] UI shows schedule list + run history + link to backup set.
 - [ ] Run now + Retry are permission-gated and write DB notifications.
+- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
+- [ ] Retry/backoff policy implemented (no retry for 401/403).
 - [ ] Retention keeps last N and soft-deletes older backup sets.
 - [ ] Tests cover due-calculation, idempotency, job success/failure, retention.
diff --git a/specs/032-backup-scheduling-mvp/plan.md b/specs/032-backup-scheduling-mvp/plan.md
index dfc65ae..468d288 100644
--- a/specs/032-backup-scheduling-mvp/plan.md
+++ b/specs/032-backup-scheduling-mvp/plan.md
@@ -20,6 +20,7 @@ ## Scheduling Mechanism
   4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
 - Concurrency:
   - Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
+  - If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).
 
 ## Run Execution
 - `RunBackupScheduleJob`:
@@ -36,12 +37,21 @@ ## Run Execution
      - failed if nothing backed up / hard error
   7) update schedule last_run_* and compute/persist next_run_at
   8) dispatch retention job
+  9) audit logs:
+     - log run start + completion (status, counts, error_code; no secrets)
+
+## Retry / Backoff
+- Configure job retry behavior based on error classification:
+  - Throttling/transient (e.g. 429/503): backoff + retry
+  - Auth/permission (401/403): no retry
+  - Unknown: limited retries, then mark the run failed
 
 ## Retention
 - `ApplyBackupScheduleRetentionJob(schedule_id)`:
   - identify runs ordered newest→oldest
   - keep last N runs that created a backup_set_id
   - for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
+  - audit log: number of deleted BackupSets
 
 ## Filament UX
 - Tenant-scoped resources:
@@ -49,6 +59,7 @@ ## Filament UX
   - Runs UI via RelationManager under schedule (or a dedicated resource if needed)
 - Actions: enable/disable, run now, retry
 - Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
+  - MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.
 
 ## Ops / Deployment Notes
 - Requires queue worker.
diff --git a/specs/032-backup-scheduling-mvp/spec.md b/specs/032-backup-scheduling-mvp/spec.md
index f0f93bc..37f8e8c 100644
--- a/specs/032-backup-scheduling-mvp/spec.md
+++ b/specs/032-backup-scheduling-mvp/spec.md
@@ -41,7 +41,7 @@ ### Functional Requirements
 - **FR-005**: “Run now” immediately creates a run (scheduled_for=now) and dispatches the job.
 - **FR-006**: “Retry” creates a new run for the same schedule.
 - **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (older BackupSets are soft-deleted).
-- **FR-008**: Concurrency: only one run per schedule may execute at a time.
+- **FR-008**: Concurrency: only one run per schedule may execute at a time. If a run is already in progress, a new run is not started in parallel; it is marked `skipped` instead (with an error code).
 
 ### UX Requirements (Filament)
 - **UX-001**: The schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
@@ -50,19 +50,22 @@ ### UX Requirements (Filament)
 
 ### Security / Authorization
 - **SEC-001**: Tenant isolation: users only see/manage schedules of the current tenant.
-- **SEC-002**: Permissions (RBAC):
-  - `backup_schedules.view`
-  - `backup_schedules.manage`
-  - `backup_schedules.run_now`
-  - `backup_schedules.runs.view`
-- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
+- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
+  - `readonly`: view schedules + view runs
+  - `operator`: additionally “Run now” / “Retry”
+  - `manager` / `owner`: additionally manage schedules (CRUD)
+- **SEC-003**: The dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. the number of deleted BackupSets).
 
 ### Reliability / Non-Functional Requirements
 - **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
 - **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
-- **NFR-003**: Retries: Throttling → backoff; 401/403 → no blind retry.
+- **NFR-003**: Retries: Throttling (e.g. 429/503) → backoff; 401/403 → no retry; Unknown → limited retries, then failed.
 - **NFR-004**: Missed runs policy (MVP): **No catch-up**. If the system was offline, missed slots are not made up; only the next slot runs.
 
+### Scheduling Semantics
+- `scheduled_for` is **minute-based** (a slot) and stored in UTC. Due calculation happens in the schedule's timezone.
+- DST (MVP): If the local time does not exist (spring-forward gap), the slot is skipped (run `skipped`). If the local time is ambiguous (fall-back overlap), the first occurrence is used.
+
 ## Data Model
 
 ### backup_schedules
diff --git a/specs/032-backup-scheduling-mvp/tasks.md b/specs/032-backup-scheduling-mvp/tasks.md
index 69f733e..61cef70 100644
--- a/specs/032-backup-scheduling-mvp/tasks.md
+++ b/specs/032-backup-scheduling-mvp/tasks.md
@@ -20,11 +20,14 @@ ## Phase 4: Jobs
 - [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
 - [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
 - [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+- [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
+- [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).
 
 ## Phase 5: Filament UI
 - [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
 - [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
 - [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
+- [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.
 
 ## Phase 6: Tests
 - [ ] T014 Unit: due-calculation + next_run_at.
@@ -32,6 +35,7 @@ ## Phase 6: Tests
 - [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
 - [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
 - [ ] T018 Retention: keeps last N and deletes older backup sets.
+- [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.
 
 ## Phase 7: Verification
 - [ ] T019 Run targeted tests (Pest).
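The plan pairs a DB unique slot constraint (idempotency, NFR-001) with a per-schedule cache lock (FR-008: no parallel runs; a blocked run is marked `skipped`). The sketch below models both in memory to illustrate the control flow. It is written in Python rather than the project's PHP purely for illustration. `RunGate`, its methods, and the `RUN_ALREADY_IN_PROGRESS` error code are hypothetical. In production the slot check would be the database constraint and the lock a shared cache lock, not a process-local `threading.Lock`.

```python
import threading
from collections import defaultdict
from datetime import datetime
from typing import Callable, Optional, Tuple


class RunGate:
    """Hypothetical in-memory stand-in for the unique slot constraint and per-schedule lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the two containers below
        self._slots: set = set()  # plays the role of UNIQUE (backup_schedule_id, scheduled_for)
        self._locks: defaultdict = defaultdict(threading.Lock)  # plays the role of lock:backup_schedule:{id}

    def claim_slot(self, schedule_id: int, scheduled_for: datetime) -> bool:
        """Return True only for the first claim of a slot (idempotent dispatch)."""
        key = (schedule_id, scheduled_for)
        with self._guard:
            if key in self._slots:
                return False
            self._slots.add(key)
            return True

    def execute(self, schedule_id: int, work: Callable[[], None]) -> Tuple[str, Optional[str]]:
        """Run `work` under the schedule's lock, or report the run as skipped."""
        with self._guard:
            lock = self._locks[schedule_id]
        # Non-blocking acquire: a held lock means another run is in progress,
        # so this run is skipped instead of waiting or running in parallel.
        if not lock.acquire(blocking=False):
            return ("skipped", "RUN_ALREADY_IN_PROGRESS")
        try:
            work()
            return ("success", None)
        finally:
            lock.release()
```

Skipping rather than blocking keeps queue workers free and leaves a visible `skipped` entry in Run history, which matches the FR-008 wording.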
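NFR-003 and T022 describe retry behavior per error class. A minimal sketch of that decision, assuming the failing call surfaces an HTTP status. The status-to-`error_code` mapping (401 → TOKEN_EXPIRED, 403 → PERMISSION_MISSING, 429/503 → GRAPH_THROTTLE, everything else → UNKNOWN) is an assumption. The attempt limits and delays are also hypothetical, since the spec only says "limited retries".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical limits for illustration; the spec does not fix these numbers.
MAX_THROTTLE_ATTEMPTS = 5
MAX_UNKNOWN_ATTEMPTS = 2
BASE_DELAY_SECONDS = 10
MAX_DELAY_SECONDS = 300


@dataclass(frozen=True)
class RetryDecision:
    error_code: str
    retry: bool
    delay_seconds: int = 0


def classify_status(status: int) -> str:
    """Map an HTTP status to an error_code (assumed mapping)."""
    if status == 401:
        return "TOKEN_EXPIRED"
    if status == 403:
        return "PERMISSION_MISSING"
    if status in (429, 503):
        return "GRAPH_THROTTLE"
    return "UNKNOWN"


def decide_retry(status: int, attempt: int, retry_after: Optional[int] = None) -> RetryDecision:
    """Decide whether the failed attempt `attempt` (1-based) should be retried."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    code = classify_status(status)
    if code in ("TOKEN_EXPIRED", "PERMISSION_MISSING"):
        # Auth/permission errors: a retry cannot succeed, fail immediately.
        return RetryDecision(code, retry=False)
    limit = MAX_THROTTLE_ATTEMPTS if code == "GRAPH_THROTTLE" else MAX_UNKNOWN_ATTEMPTS
    if attempt >= limit:
        # Retry budget exhausted: the run ends as failed.
        return RetryDecision(code, retry=False)
    # Capped exponential backoff: 10s, 20s, 40s, ...
    delay = min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
    if retry_after is not None:
        # Never retry sooner than a server-provided Retry-After.
        delay = max(delay, retry_after)
    return RetryDecision(code, retry=True, delay_seconds=delay)
```

In a Laravel queue job, a decision like this would typically choose between releasing the job back onto the queue with a delay and failing it outright.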
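The retention rule in the plan (order runs newest→oldest, keep the newest N that produced a `backup_set_id`, soft-delete the BackupSets of older ones) reduces to a filter, a sort, and a slice. In this sketch `Run` and `backup_sets_to_soft_delete` are hypothetical names. Rejecting `keep_last < 1` is an added safety assumption, not something the spec states. The actual soft delete (and cascade to items) would happen in `ApplyBackupScheduleRetentionJob`. The length of the returned list is the count that SEC-003 asks to audit-log.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    id: int
    scheduled_for: datetime
    backup_set_id: Optional[int]


def backup_sets_to_soft_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup_set_ids that fall outside the newest `keep_last` runs with a backup set."""
    if keep_last < 1:
        # Assumed guard: N = 0 would delete every backup set of the schedule.
        raise ValueError("keep_last must be >= 1")
    # Only runs that actually produced a backup set count toward N.
    with_sets = [r for r in runs if r.backup_set_id is not None]
    # Newest first; the run id breaks ties between runs with the same slot time.
    with_sets.sort(key=lambda r: (r.scheduled_for, r.id), reverse=True)
    return [r.backup_set_id for r in with_sets[keep_last:]]
```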
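SEC-002 (MVP) describes a cumulative role ladder: each TenantRole inherits the abilities of the one below it. One way to express that, assuming the roles form a strict hierarchy in which `manager` and `owner` are treated alike for schedules. The ability strings below are hypothetical and are not the project's policy method names.

```python
from enum import IntEnum


class TenantRole(IntEnum):
    READONLY = 1
    OPERATOR = 2
    MANAGER = 3
    OWNER = 4


# Minimum role required per ability (hypothetical ability names).
MIN_ROLE = {
    "view_schedules": TenantRole.READONLY,
    "view_runs": TenantRole.READONLY,
    "run_now": TenantRole.OPERATOR,
    "retry": TenantRole.OPERATOR,
    "manage_schedules": TenantRole.MANAGER,
}


def can(role: TenantRole, ability: str) -> bool:
    """Allow an ability if the role meets its minimum; unknown abilities are denied."""
    required = MIN_ROLE.get(ability)
    return required is not None and role >= required
```

Denying unknown abilities by default keeps a typo in an ability name from silently granting access.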
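The Scheduling Semantics rules map directly onto the `fold` attribute from PEP 495. These rules are: minute slots stored in UTC, due calculation in the schedule's timezone, skipping nonexistent local times, and using the first occurrence for ambiguous ones. Below is a minimal Python sketch using the standard `zoneinfo` module. `resolve_slot_utc` is a hypothetical helper, and the host must have the IANA time zone database available.

```python
from datetime import date, datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_slot_utc(day: date, local_time: time, tz_name: str) -> Optional[datetime]:
    """Map a schedule's local wall-clock slot to a minute-precision UTC datetime.

    Returns None when the local time does not exist (DST spring-forward gap);
    the caller would record that slot as a `skipped` run. For an ambiguous
    local time (DST fall-back overlap), fold=0 selects the first occurrence.
    """
    tz = ZoneInfo(tz_name)
    # Slots are minute-based, so drop seconds and microseconds.
    wall = datetime.combine(day, local_time.replace(second=0, microsecond=0))
    local = wall.replace(tzinfo=tz, fold=0)
    as_utc = local.astimezone(timezone.utc)
    # A nonexistent wall time does not survive a round trip through UTC.
    if as_utc.astimezone(tz).replace(tzinfo=None) != wall:
        return None
    return as_utc
```

For Europe/Berlin, 2024-03-31 02:30 falls in the spring-forward gap and resolves to `None`. On 2024-10-27, 02:30 occurs twice and resolves to 00:30 UTC, which is the earlier, summer-time occurrence.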