2026-01-07 01:12:13 +00:00
4 changed files with 28 additions and 8 deletions
--- a/specs/032-backup-scheduling-mvp/checklists/requirements.md
+++ b/specs/032-backup-scheduling-mvp/checklists/requirements.md
@@ -7,5 +7,7 @@ # Requirements Checklist (032)
 - [ ] Run stores status + summary + error_code/error_message.
 - [ ] UI shows schedule list + run history + link to backup set.
 - [ ] Run now + Retry are permission-gated and write DB notifications.
+- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
+- [ ] Retry/backoff policy implemented (no retry for 401/403).
 - [ ] Retention keeps last N and soft-deletes older backup sets.
 - [ ] Tests cover due-calculation, idempotency, job success/failure, retention.
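The retention item in the checklist above (keep last N, soft-delete older backup sets) can be sketched roughly as follows. This is a minimal, language-neutral Python sketch, not the Laravel job itself. The `Run` record and the `backup_sets_to_soft_delete` helper are hypothetical names; only `scheduled_for` and `backup_set_id` come from the spec, and ordering runs by `scheduled_for` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Run:
    """Hypothetical stand-in for a schedule run row."""
    id: int
    scheduled_for: datetime
    backup_set_id: Optional[int]  # None if the run produced no backup set


def backup_sets_to_soft_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup_set_ids that fall outside the newest `keep_last` sets."""
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    # Newest first (assumption: ordering by scheduled_for).
    ordered = sorted(runs, key=lambda r: r.scheduled_for, reverse=True)
    # Only runs that actually created a backup set count toward N.
    with_sets = [r for r in ordered if r.backup_set_id is not None]
    return [r.backup_set_id for r in with_sets[keep_last:]]


runs = [
    Run(1, datetime(2026, 1, 1), 10),
    Run(2, datetime(2026, 1, 2), None),  # failed/skipped run, no backup set
    Run(3, datetime(2026, 1, 3), 30),
    Run(4, datetime(2026, 1, 4), 40),
    Run(5, datetime(2026, 1, 5), 50),
]
to_delete = backup_sets_to_soft_delete(runs, keep_last=2)
```

The length of the returned list is the count the audit log entry for retention would record.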
--- a/specs/032-backup-scheduling-mvp/plan.md
+++ b/specs/032-backup-scheduling-mvp/plan.md
@@ -20,6 +20,7 @@ ## Scheduling Mechanism
  4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
 - Concurrency:
  - Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
+  - If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).

 ## Run Execution
 - `RunBackupScheduleJob`:
@@ -36,12 +37,21 @@ ## Run Execution
     - failed if nothing backed up / hard error
  7) update schedule last_run_* and compute/persist next_run_at
  8) dispatch retention job
+  9) audit logs:
+    - log run start + completion (status, counts, error_code; no secrets)
+
+## Retry / Backoff
+- Configure job retry behavior based on error classification:
+  - Throttling/transient (e.g. 429/503): backoff + retry
+  - Auth/permission (401/403): no retry
+  - Unknown: limited retries

 ## Retention
 - `ApplyBackupScheduleRetentionJob(schedule_id)`:
  - identify runs ordered newest→oldest
  - keep last N runs that created a backup_set_id
  - for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
+  - audit log: number of deleted BackupSets

 ## Filament UX
 - Tenant-scoped resources:
@@ -49,6 +59,7 @@ ## Filament UX
  - Runs UI via RelationManager under schedule (or a dedicated resource if needed)
 - Actions: enable/disable, run now, retry
 - Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
+  - MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.

 ## Ops / Deployment Notes
 - Requires queue worker.
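The "Retry / Backoff" section added to plan.md classifies errors into three groups. A minimal Python sketch of that decision table is shown below, assuming an exponential backoff; it is not the Laravel job configuration. All names (`RetryDecision`, `decide_retry`) and all numeric limits are hypothetical, since the plan does not fix attempt counts or delays.

```python
from dataclasses import dataclass
from typing import Optional

TRANSIENT_STATUSES = {429, 503}   # throttling/transient, per the plan
AUTH_STATUSES = {401, 403}        # auth/permission, per the plan

# Hypothetical limits; the plan only says "backoff" and "limited retries".
MAX_TRANSIENT_ATTEMPTS = 5
MAX_UNKNOWN_ATTEMPTS = 2
BASE_DELAY_SECONDS = 60
MAX_DELAY_SECONDS = 300


@dataclass(frozen=True)
class RetryDecision:
    retry: bool
    delay_seconds: int


def decide_retry(status: Optional[int], attempt: int) -> RetryDecision:
    """Decide whether to retry after `attempt` attempts (1-based) have failed."""
    if status in AUTH_STATUSES:
        # Retrying cannot fix a missing permission or expired token.
        return RetryDecision(False, 0)
    if status in TRANSIENT_STATUSES:
        if attempt >= MAX_TRANSIENT_ATTEMPTS:
            return RetryDecision(False, 0)
        # Exponential backoff, capped at MAX_DELAY_SECONDS.
        delay = min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
        return RetryDecision(True, delay)
    # Unknown error: a small fixed number of retries, then give up.
    if attempt >= MAX_UNKNOWN_ATTEMPTS:
        return RetryDecision(False, 0)
    return RetryDecision(True, BASE_DELAY_SECONDS)
```

Keeping the classification in one pure function makes the "no retry on 401/403" rule straightforward to cover in the tests planned under T024.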
--- a/specs/032-backup-scheduling-mvp/spec.md
+++ b/specs/032-backup-scheduling-mvp/spec.md
@@ -41,7 +41,7 @@ ### Functional Requirements
 - **FR-005**: “Run now” immediately creates a run (scheduled_for=now) and dispatches the job.
 - **FR-006**: “Retry” creates a new run for the same schedule.
 - **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft delete BackupSets).
- **FR-008**: Concurrency: Only one run per schedule may execute at a time.
+- **FR-008**: Concurrency: Only one run per schedule may execute at a time. If a run is already in progress, a new run is not started in parallel and is instead marked as `skipped` (with an error code).

 ### UX Requirements (Filament)
 - **UX-001**: Schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
@@ -50,19 +50,22 @@ ### UX Requirements (Filament)

 ### Security / Authorization
 - **SEC-001**: Tenant isolation: a user sees and manages only the schedules of the current tenant.
- **SEC-002**: Permissions (RBAC):
-  - `backup_schedules.view`
-  - `backup_schedules.manage`
-  - `backup_schedules.run_now`
-  - `backup_schedules.runs.view`
- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
+- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
+  - `readonly`: view schedules + view runs
+  - `operator`: additionally “Run now” / “Retry”
+  - `manager` / `owner`: additionally manage schedules (CRUD)
+- **SEC-003**: Dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. number of deleted BackupSets).

 ### Reliability / Non-Functional Requirements
 - **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
 - **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- **NFR-003**: Retries: Throttling → backoff; 401/403 → no blind retry.
+- **NFR-003**: Retries: Throttling (e.g. 429/503) → backoff; 401/403 → no retry; Unknown → limited retries, then failed.
 - **NFR-004**: Missed runs policy (MVP): **No catch-up**: if the system was offline, missed runs are not made up; only the next slot runs.

+### Scheduling Semantics
+- `scheduled_for` is **minute-based** (slot) and stored in UTC. Due calculation is performed in the schedule's timezone.
+- DST (MVP): For an invalid local time, the slot is skipped (run `skipped`). For an ambiguous local time, the first occurrence is used.
+
 ## Data Model

 ### backup_schedules
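The DST rule added under "Scheduling Semantics" in spec.md (skip invalid local times, use the first occurrence of ambiguous ones) can be illustrated with a small Python sketch using the standard `zoneinfo` module. This is an illustration of the rule only, not the project's implementation; `resolve_slot` is a hypothetical helper, and it assumes the IANA timezone database is available on the host.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_slot(local_naive: datetime, tz_name: str) -> Optional[datetime]:
    """Map a local wall-clock time to a minute-based UTC slot.

    Returns None when the local time does not exist (spring-forward gap),
    which corresponds to marking the run as `skipped`.
    """
    tz = ZoneInfo(tz_name)
    # Slots are minute-based: drop seconds and microseconds.
    wall = local_naive.replace(second=0, microsecond=0)
    # fold=0 selects the first occurrence of an ambiguous wall time.
    first = wall.replace(tzinfo=tz, fold=0)
    utc = first.astimezone(timezone.utc)
    # A nonexistent wall time does not survive a round trip through UTC.
    if utc.astimezone(tz).replace(tzinfo=None) != wall:
        return None
    return utc


# Europe/Berlin, 2026: clocks jump 02:00 -> 03:00 on 29 March
# and fall back 03:00 -> 02:00 on 25 October.
gap = resolve_slot(datetime(2026, 3, 29, 2, 30), "Europe/Berlin")
ambiguous = resolve_slot(datetime(2026, 10, 25, 2, 30), "Europe/Berlin")
normal = resolve_slot(datetime(2026, 1, 7, 9, 0, 45), "Europe/Berlin")
```

Here `gap` is `None`, `ambiguous` resolves to 00:30 UTC (the first, CEST occurrence), and `normal` resolves to 08:00 UTC.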
--- a/specs/032-backup-scheduling-mvp/tasks.md
+++ b/specs/032-backup-scheduling-mvp/tasks.md
@@ -20,11 +20,14 @@ ## Phase 4: Jobs
 - [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
 - [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
 - [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+ - [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
+ - [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).

 ## Phase 5: Filament UI
 - [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
 - [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
 - [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
+ - [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.

 ## Phase 6: Tests
 - [ ] T014 Unit: due-calculation + next_run_at.
@@ -32,6 +35,7 @@ ## Phase 6: Tests
 - [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
 - [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
 - [ ] T018 Retention: keeps last N and deletes older backup sets.
+ - [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.

 ## Phase 7: Verification
 - [ ] T019 Run targeted tests (Pest).
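For T023, the TenantRole mapping from SEC-002 (MVP) could be expressed as a cumulative role-to-ability table. The Python sketch below is a language-neutral illustration, not the Filament policy code. It reuses the permission strings from the removed RBAC list as ability names; treating “Retry” as covered by `backup_schedules.run_now` is an assumption, and the `can` helper is hypothetical.

```python
# Abilities reuse the permission strings from the earlier RBAC draft.
VIEW = "backup_schedules.view"
RUNS_VIEW = "backup_schedules.runs.view"
RUN_NOW = "backup_schedules.run_now"   # assumed to gate "Retry" as well
MANAGE = "backup_schedules.manage"

READONLY = frozenset({VIEW, RUNS_VIEW})
OPERATOR = READONLY | {RUN_NOW}
MANAGER = OPERATOR | {MANAGE}

ROLE_ABILITIES = {
    "readonly": READONLY,
    "operator": OPERATOR,
    "manager": MANAGER,
    "owner": MANAGER,  # spec grants manager and owner the same schedule rights
}


def can(role: str, ability: str) -> bool:
    """Return True if the tenant role grants the ability; unknown roles get nothing."""
    return ability in ROLE_ABILITIES.get(role, frozenset())
```

Building each role from the previous one keeps the "additionally" wording of the spec visible in the table and denies unknown roles by default.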