feat/032-backup-scheduling-mvp #36

Merged
ahmido merged 14 commits from feat/032-backup-scheduling-mvp into dev 2026-01-07 01:12:13 +00:00
4 changed files with 28 additions and 8 deletions
Showing only changes of commit b05a60e392


@ -7,5 +7,7 @@ # Requirements Checklist (032)
- [ ] Run stores status + summary + error_code/error_message.
- [ ] UI shows schedule list + run history + link to backup set.
- [ ] Run now + Retry are permission-gated and write DB notifications.
- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
- [ ] Retry/backoff policy implemented (no retry for 401/403).
- [ ] Retention keeps last N and soft-deletes older backup sets.
- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.


@ -20,6 +20,7 @@ ## Scheduling Mechanism
4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
- Concurrency:
- Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
- If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).
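The lock-or-skip rule above can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's Laravel code: the in-memory lock registry stands in for the cache lock, and the `Run` shape, the `success` status, and the `LOCK_HELD` error code are assumptions made for the example. Only the key format `lock:backup_schedule:{id}` and the `skipped` status come from the plan.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-in for the cache lock store.
_locks: Dict[str, threading.Lock] = {}
_registry_guard = threading.Lock()


@dataclass
class Run:
    schedule_id: int
    status: str = "pending"
    error_code: Optional[str] = None


def lock_key(schedule_id: int) -> str:
    # Key format taken from the plan: lock:backup_schedule:{id}
    return f"lock:backup_schedule:{schedule_id}"


def execute_run(run: Run, work: Callable[[], None]) -> Run:
    """Run `work` under the per-schedule lock, or mark the run as skipped."""
    key = lock_key(run.schedule_id)
    with _registry_guard:
        lock = _locks.setdefault(key, threading.Lock())

    # Non-blocking acquire: never wait for a run that is already in progress.
    if not lock.acquire(blocking=False):
        run.status = "skipped"
        run.error_code = "LOCK_HELD"  # hypothetical error code name
        return run

    try:
        work()
        run.status = "success"  # assumed status name
    finally:
        lock.release()
    return run
```

The non-blocking acquire is the important design choice here: a second run for the same schedule is recorded as `skipped` right away instead of queuing behind the first one.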
## Run Execution
- `RunBackupScheduleJob`:
@ -36,12 +37,21 @@ ## Run Execution
- failed if nothing backed up / hard error
7) update schedule last_run_* and compute/persist next_run_at
8) dispatch retention job
9) audit logs:
- log run start + completion (status, counts, error_code; no secrets)
## Retry / Backoff
- Configure job retry behavior based on error classification:
- Throttling/transient (e.g. 429/503): backoff + retry
- Auth/permission (401/403): no retry
- Unknown: limited retries
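One possible reading of this classification, as a small Python sketch. The retry limits, the base delay, and the cap are hypothetical values chosen for illustration; the plan only states which error classes retry and which do not.

```python
from typing import Optional

# All four constants are hypothetical; the plan does not specify numbers.
BASE_DELAY_SECONDS = 60
MAX_DELAY_SECONDS = 3600
MAX_TRANSIENT_RETRIES = 8
MAX_UNKNOWN_RETRIES = 2


def retry_delay(http_status: Optional[int], attempt: int) -> Optional[int]:
    """Return seconds to wait before the next attempt, or None to stop.

    `attempt` is the 1-based number of the attempt that just failed.
    """
    if http_status in (401, 403):
        # Auth/permission errors: retrying cannot fix missing credentials.
        return None
    if http_status in (429, 503):
        # Throttling/transient errors: exponential backoff with a cap.
        if attempt > MAX_TRANSIENT_RETRIES:
            return None
        return min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
    # Anything else is treated as unknown: a few fixed-delay retries.
    if attempt > MAX_UNKNOWN_RETRIES:
        return None
    return BASE_DELAY_SECONDS
```

Returning `None` signals "give up and mark the run failed", which keeps the decision in one place regardless of how the queue layer schedules the retry.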
## Retention
- `ApplyBackupScheduleRetentionJob(schedule_id)`:
- identify runs ordered newest→oldest
- keep last N runs that created a backup_set_id
- for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
- audit log: number of deleted BackupSets
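The selection step of the retention job could look like the sketch below. It assumes a simplified `Run` record with only `scheduled_for` and `backup_set_id`, and a hypothetical helper name; the actual soft-delete and audit logging are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    scheduled_for: datetime
    backup_set_id: Optional[int]


def backup_sets_to_soft_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup set IDs that fall outside the newest `keep_last` runs."""
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    # Only runs that actually created a backup set count toward N.
    with_sets = [r for r in runs if r.backup_set_id is not None]
    newest_first = sorted(with_sets, key=lambda r: r.scheduled_for, reverse=True)
    return [r.backup_set_id for r in newest_first[keep_last:]]
```

The length of the returned list is also the number the audit log entry would report.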
## Filament UX
- Tenant-scoped resources:
@ -49,6 +59,7 @@ ## Filament UX
- Runs UI via RelationManager under schedule (or a dedicated resource if needed)
- Actions: enable/disable, run now, retry
- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
- MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.
## Ops / Deployment Notes
- Requires queue worker.


@ -41,7 +41,7 @@ ### Functional Requirements
- **FR-005**: “Run now” immediately creates a run (scheduled_for=now) and dispatches the job.
- **FR-006**: “Retry” creates a new run for the same schedule.
- **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft delete BackupSets).
- **FR-008**: Concurrency: only one run may execute at a time per schedule.
- **FR-008**: Concurrency: only one run may execute at a time per schedule. If a run is already in progress, a new run is not started in parallel and is instead marked as `skipped` (with an error code).
### UX Requirements (Filament)
- **UX-001**: Schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
@ -50,19 +50,22 @@ ### UX Requirements (Filament)
### Security / Authorization
- **SEC-001**: Tenant isolation: a user sees/manages only schedules of the current tenant.
- **SEC-002**: Permissions (RBAC):
- `backup_schedules.view`
- `backup_schedules.manage`
- `backup_schedules.run_now`
- `backup_schedules.runs.view`
- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
- `readonly`: view schedules + view runs
- `operator`: additionally “Run now” / “Retry”
- `manager` / `owner`: additionally manage schedules (CRUD)
- **SEC-003**: Dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. number of deleted BackupSets).
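The TenantRole tiers in SEC-002 (MVP) amount to a cumulative ability table. The sketch below illustrates that idea only; the ability names (`view_schedules`, `run_now`, and so on) and the `can` helper are hypothetical, and the real check would live in the application's policy layer.

```python
from typing import Dict, FrozenSet

# Each tier includes everything the tier below it may do.
_VIEW: FrozenSet[str] = frozenset({"view_schedules", "view_runs"})
_OPERATE: FrozenSet[str] = _VIEW | {"run_now", "retry"}
_MANAGE: FrozenSet[str] = _OPERATE | {"manage_schedules"}

# Role names come from the spec; ability names are assumptions.
ROLE_ABILITIES: Dict[str, FrozenSet[str]] = {
    "readonly": _VIEW,
    "operator": _OPERATE,
    "manager": _MANAGE,
    "owner": _MANAGE,
}


def can(role: str, ability: str) -> bool:
    """Return True if the role grants the ability; unknown roles get nothing."""
    return ability in ROLE_ABILITIES.get(role, frozenset())
```

Denying unknown roles by default keeps a misconfigured or newly added role from silently gaining access.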
### Reliability / Non-Functional Requirements
- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- **NFR-003**: Retries: throttling → backoff; 401/403 → no blind retry.
- **NFR-003**: Retries: throttling (e.g. 429/503) → backoff; 401/403 → no retry; unknown → limited retries, then failed.
- **NFR-004**: Missed runs policy (MVP): **No catch-up**: if the system was offline, missed runs are not made up; only the next slot runs.
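The unique slot constraint from NFR-001 can be illustrated with SQLite from the Python standard library. The table name `backup_schedule_runs` and the helper names are assumptions for this sketch; only the column pair `backup_schedule_id` + `scheduled_for` comes from the spec.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; the UNIQUE pair is the slot constraint from NFR-001.
    conn.execute(
        """
        CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            UNIQUE (backup_schedule_id, scheduled_for)
        )
        """
    )


def claim_slot(conn: sqlite3.Connection, schedule_id: int, scheduled_for: str) -> bool:
    """Insert a run for the slot; return False if the slot was already claimed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO backup_schedule_runs (backup_schedule_id, scheduled_for)"
                " VALUES (?, ?)",
                (schedule_id, scheduled_for),
            )
        return True
    except sqlite3.IntegrityError:
        # Another dispatcher already created the run for this slot.
        return False
```

Letting the database reject the duplicate means two dispatchers racing for the same minute cannot both create a run, even if the cache lock is unavailable.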
### Scheduling Semantics
- `scheduled_for` is **minute-based** (slot) and stored in UTC. Due calculation is performed in the schedule's timezone.
- DST (MVP): for an invalid local time the slot is skipped (run `skipped`). For an ambiguous local time the first occurrence is used.
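The DST rules above can be sketched with Python's `zoneinfo`. The function name `slot_utc` is hypothetical; the approach (pick the first occurrence with `fold=0`, detect a nonexistent wall time by a UTC round trip) is one way to implement the stated semantics, not necessarily the one used in this project.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def slot_utc(local_wall_time: datetime, tz_name: str) -> Optional[datetime]:
    """Map a local wall-clock time to its UTC minute slot.

    Returns None when the wall time does not exist (spring-forward gap),
    which the caller would record as a skipped run.
    """
    tz = ZoneInfo(tz_name)
    # Truncate to the minute; slots are minute-based.
    wall = local_wall_time.replace(second=0, microsecond=0, tzinfo=None)
    # fold=0 selects the first occurrence of an ambiguous time.
    as_utc = wall.replace(tzinfo=tz, fold=0).astimezone(timezone.utc)
    # A nonexistent wall time does not survive the round trip through UTC.
    if as_utc.astimezone(tz).replace(tzinfo=None) != wall:
        return None
    return as_utc
```

For example, in `Europe/Berlin` the wall time 02:30 on 2026-03-29 falls into the spring-forward gap and yields `None`, while 02:30 on 2026-10-25 occurs twice and resolves to the earlier instant.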
## Data Model
### backup_schedules


@ -20,11 +20,14 @@ ## Phase 4: Jobs
- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
- [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).
## Phase 5: Filament UI
- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
- [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.
## Phase 6: Tests
- [ ] T014 Unit: due-calculation + next_run_at.
@ -32,6 +35,7 @@ ## Phase 6: Tests
- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
- [ ] T018 Retention: keeps last N and deletes older backup sets.
- [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.
## Phase 7: Verification
- [ ] T019 Run targeted tests (Pest).