feat/032-backup-scheduling-mvp (#33)

Goal: MVP specification for “Automated backups on a schedule (per tenant)” as the basis for implementation (spec-first).

Scope (MVP):
- Tenant-scoped backup_schedules + backup_schedule_runs
- Dispatcher creates idempotent runs (unique slot); a queue job executes each run
- “Run now” / “Retry”, run history, retention (keep last N)
- No catch-up for missed slots

Important clarifications (derived from the Constitution):
- Every operation is tenant-scoped and writes audit logs (dispatcher/run/retention; no secrets/tokens)
- Graph calls go through the existing abstraction (no hardcoding)
- Retry/backoff: throttling → backoff; 401/403 → no retry
- Authorization (MVP): TenantRole matrix (readonly/operator/manager/owner) instead of a new permission registry

Not in the MVP:
- No restore scheduling
- No cross-tenant bulk scheduling / templates
- No catch-up of missed runs

Review focus:
- Semantics “1 Run = 1 BackupSet”
- Concurrency/lock behavior (run already in progress → skipped)
- DST/timezone rules + slot minute precision

Artifacts:
- spec.md
- plan.md
- tasks.md
- requirements.md

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #33
2026-01-04 23:54:56 +00:00
commit beffbfca4c
parent 2ca989c00f
4 changed files with 239 additions and 0 deletions
--- a/specs/032-backup-scheduling-mvp/checklists/requirements.md
+++ b/specs/032-backup-scheduling-mvp/checklists/requirements.md
@@ -0,0 +1,13 @@
+# Requirements Checklist (032)
+
+- [ ] Tenant-scoped tables use `tenant_id` consistently.
+- [ ] 1 Run = 1 BackupSet (no rolling reuse in MVP).
+- [ ] Dispatcher is idempotent (unique `backup_schedule_id` + `scheduled_for`).
+- [ ] Concurrency lock prevents parallel runs per schedule.
+- [ ] Run stores status + summary + error_code/error_message.
+- [ ] UI shows schedule list + run history + link to backup set.
+- [ ] Run now + Retry are permission-gated and write DB notifications.
+- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
+- [ ] Retry/backoff policy implemented (no retry for 401/403).
+- [ ] Retention keeps last N and soft-deletes older backup sets.
+- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.
--- a/specs/032-backup-scheduling-mvp/plan.md
+++ b/specs/032-backup-scheduling-mvp/plan.md
@@ -0,0 +1,67 @@
+# Plan: Backup Scheduling MVP (032)
+
+**Date**: 2026-01-05
+**Input**: spec.md
+
+## Architecture / Reuse
+- Reuse existing services:
+  - `PolicySyncService::syncPoliciesWithReport()` for selected policy types
+  - `BackupService::createBackupSet()` to create immutable snapshots + items (include_foundations supported)
+- Store selection as `policy_types` (config keys), not free-form categories.
+- Use tenant scoping (`tenant_id`) consistent with existing tables (`backup_sets`, `backup_items`).
+
+## Scheduling Mechanism
+- Add Artisan command: `tenantpilot:schedules:dispatch`.
+- Scheduler integration (Laravel 12): schedule the command every minute via `routes/console.php` + ops configuration (Dokploy cron `schedule:run` or long-running `schedule:work`).
+- Dispatcher algorithm:
+  1) load enabled schedules
+  2) compute whether due for the current minute in schedule timezone
+  3) create run with `scheduled_for` slot (minute precision) using DB unique constraint
+  4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
+- Concurrency:
+  - Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
+  - If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).
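A minimal sketch of the idempotent slot creation described in the dispatcher steps above, written in Python with an in-memory SQLite table for self-containment. The plan targets Laravel, so this only illustrates the unique-constraint technique; the helper names (`open_db`, `slot_for`, `create_run_once`) are hypothetical, and the cache lock is not modeled here.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory table with the unique slot constraint (assumed schema subset)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            status TEXT NOT NULL,
            UNIQUE (backup_schedule_id, scheduled_for)
        )
        """
    )
    return conn


def slot_for(now: datetime) -> str:
    """Truncate a timezone-aware timestamp to its UTC minute slot."""
    utc = now.astimezone(timezone.utc)
    return utc.replace(second=0, microsecond=0).isoformat()


def create_run_once(conn: sqlite3.Connection, schedule_id: int, now: datetime) -> Optional[int]:
    """Insert a run for the current slot; return its id, or None if the slot is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO backup_schedule_runs "
                "(backup_schedule_id, scheduled_for, status) VALUES (?, ?, 'running')",
                (schedule_id, slot_for(now)),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # A second dispatcher tick in the same minute hits the unique constraint.
        return None
```

Relying on the constraint rather than a prior `SELECT` keeps the check race-free: two dispatcher processes that fire in the same minute cannot both insert the slot.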
+
+## Run Execution
+- `RunBackupScheduleJob`:
+  1) load schedule + tenant
+  2) preflight: tenant active; Graph/auth errors mapped to error_code
+  3) sync policies for selected types (collect report)
+  4) select policy IDs from local DB for those types (exclude ignored)
+  5) create backup set:
+     - name: `{schedule_name} - {Y-m-d H:i}`
+     - includeFoundations: schedule flag
+  6) set run status:
+     - success if backup_set.status == completed
+     - partial if backup_set.status == partial OR sync had failures but backup succeeded
+     - failed if nothing backed up / hard error
+  7) update schedule last_run_* and compute/persist next_run_at
+  8) dispatch retention job
+  9) audit logs:
+    - log run start + completion (status, counts, error_code; no secrets)
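The status rules in step 6 can be read as a small decision function. The following Python sketch is one possible interpretation, not the project's implementation; the parameter names (`backup_set_status`, `sync_failures`, `items_backed_up`) are assumptions introduced for illustration.

```python
from typing import Optional


def run_status(backup_set_status: Optional[str], sync_failures: int, items_backed_up: int) -> str:
    """Map the backup outcome to a run status (success / partial / failed)."""
    # Hard error (no backup set at all) or nothing backed up counts as failed.
    if backup_set_status is None or items_backed_up == 0:
        return "failed"
    if backup_set_status == "partial":
        return "partial"
    if backup_set_status == "completed":
        # The backup succeeded, but sync failures downgrade the run to partial.
        return "partial" if sync_failures > 0 else "success"
    # Any other backup set status is treated as failed in this sketch.
    return "failed"
```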
+
+## Retry / Backoff
+- Configure job retry behavior based on error classification:
+  - Throttling/transient (e.g. 429/503): backoff + retry
+  - Auth/permission (401/403): no retry
+  - Unknown: limited retries
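As an illustration of the classification above, a retry decision could look like the following Python sketch. The concrete limits and delays (`MAX_TRANSIENT_RETRIES`, `MAX_UNKNOWN_RETRIES`, `BASE_DELAY_SECONDS`, `MAX_DELAY_SECONDS`) are assumptions, since the plan does not specify them, and `retry_delay` is a hypothetical helper name.

```python
from typing import Optional

TRANSIENT_CODES = {429, 503}   # throttling / transient, per the plan
NO_RETRY_CODES = {401, 403}    # auth / permission, per the plan
MAX_TRANSIENT_RETRIES = 5      # assumption: not specified in the plan
MAX_UNKNOWN_RETRIES = 2        # assumption: "limited retries"
BASE_DELAY_SECONDS = 60        # assumption
MAX_DELAY_SECONDS = 900        # assumption: cap for exponential backoff


def retry_delay(status_code: int, attempt: int) -> Optional[int]:
    """Return seconds to wait before the next try, or None to give up.

    `attempt` is the 1-based number of attempts already made.
    """
    if status_code in NO_RETRY_CODES:
        return None
    if status_code in TRANSIENT_CODES:
        if attempt > MAX_TRANSIENT_RETRIES:
            return None
        # Exponential backoff: 60s, 120s, 240s, ... capped at MAX_DELAY_SECONDS.
        return min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
    # Unknown errors: a fixed delay and a small retry budget.
    if attempt > MAX_UNKNOWN_RETRIES:
        return None
    return BASE_DELAY_SECONDS
```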
+
+## Retention
+- `ApplyBackupScheduleRetentionJob(schedule_id)`:
+  - identify runs ordered newest→oldest
+  - keep last N runs that created a backup_set_id
+  - for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
+  - audit log: number of deleted BackupSets
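The selection step of the retention job can be sketched as a pure function, here in Python. The `Run` dataclass and `backup_sets_to_delete` are hypothetical names; the actual soft-delete and audit logging are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    scheduled_for: datetime
    backup_set_id: Optional[int]  # None when the run produced no backup set


def backup_sets_to_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup set ids beyond the newest `keep_last` runs that created one."""
    with_sets = sorted(
        (r for r in runs if r.backup_set_id is not None),
        key=lambda r: r.scheduled_for,
        reverse=True,  # newest first
    )
    return [r.backup_set_id for r in with_sets[max(keep_last, 0):]]
```

Runs without a `backup_set_id` (for example skipped or failed runs) do not consume a retention slot in this sketch, which matches "keep last N runs that created a backup_set_id".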
+
+## Filament UX
+- Tenant-scoped resources:
+  - `BackupScheduleResource`
+  - Runs UI via RelationManager under schedule (or a dedicated resource if needed)
+- Actions: enable/disable, run now, retry
+- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
+  - MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.
+
+## Ops / Deployment Notes
+- Requires queue worker.
+- Requires scheduler running.
+- Missed runs policy (MVP): no catch-up.
--- a/specs/032-backup-scheduling-mvp/spec.md
+++ b/specs/032-backup-scheduling-mvp/spec.md
@@ -0,0 +1,117 @@
+# Feature Specification: Backup Scheduling MVP (032)
+
+**Feature**: Automated backups on a schedule (per tenant)
+**Created**: 2026-01-05
+**Status**: Ready for implementation (MVP)
+**Risk**: Medium (Backup-only, no restore scheduling)
+**Dependencies**: Tenant Portfolio + Tenant Context Switch ✅
+
+## Context
+TenantPilot supports manual backups. Customers/MSPs need regular, reliable backups per tenant (e.g. nightly), including traceable runs, error codes, and retention.
+
+## Goals
+- 1..n backup schedules can be created per tenant.
+- Schedules run automatically via queue/worker.
+- Every execution is stored as an auditable run (status, counts, errors).
+- Retention deletes old backups according to policy.
+- Filament UI: manage schedules, view run history, “Run now”, “Retry”.
+
+## Non-Goals (MVP)
+- No calendar UI required (can be added later).
+- No cross-tenant bulk scheduling (MSP templates later).
+- No “drift-triggered scheduling” (follows after the Drift MVP).
+- No restore via scheduling (backup only).
+
+## Definitions
+- **Schedule**: Recurring plan (daily/weekly, timezone).
+- **Run**: A concrete execution of a schedule (scheduled_for + status).
+- **BackupSet**: Result container of a run.
+
+**MVP semantics**: **1 Run = 1 new BackupSet** (no rolling reuse in the MVP).
+
+## Requirements
+
+### Functional Requirements
+- **FR-001**: Schedules are tenant-scoped via `tenant_id` (FK to `tenants.id`).
+- **FR-002**: The dispatcher detects “due” schedules and creates exactly one run per time slot (idempotent).
+- **FR-003**: A run uses existing services:
+  - Sync policies (selected policy types only)
+  - Create a BackupSet from local policy IDs (foundations optional)
+- **FR-004**: A run writes `backup_schedule_runs` with status + summary + error codes.
+- **FR-005**: “Run now” creates a run immediately (scheduled_for=now) and dispatches the job.
+- **FR-006**: “Retry” creates a new run for the same schedule.
+- **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft-deletes BackupSets).
+- **FR-008**: Concurrency: only one run per schedule may execute at a time. If a run is already in progress, a new run is not started in parallel and is instead marked `skipped` (with an error code).
+
+### UX Requirements (Filament)
+- **UX-001**: The schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
+- **UX-002**: The run history per schedule shows scheduled_for, status, duration, counts, error_code/message, and a link to the BackupSet.
+- **UX-003**: “Run now” and “Retry” are available only with the appropriate permissions.
+
+### Security / Authorization
+- **SEC-001**: Tenant isolation: a user sees and manages only the schedules of the current tenant.
+- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
+  - `readonly`: view schedules + view runs
+  - `operator`: additionally “Run now” / “Retry”
+  - `manager` / `owner`: additionally manage schedules (CRUD)
+- **SEC-003**: Dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. number of deleted BackupSets).
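The TenantRole matrix in SEC-002 can be expressed as a simple lookup, sketched here in Python. The ability names (`view`, `run`, `manage`) and the `can` helper are hypothetical labels for illustration; only the four role names come from the spec.

```python
# Abilities granted per TenantRole; each role extends the one before it.
ROLE_ABILITIES = {
    "readonly": {"view"},
    "operator": {"view", "run"},            # "run" covers Run now / Retry
    "manager": {"view", "run", "manage"},   # "manage" covers schedule CRUD
    "owner": {"view", "run", "manage"},
}


def can(role: str, ability: str) -> bool:
    """Return True if the role grants the ability; unknown roles get nothing."""
    return ability in ROLE_ABILITIES.get(role, set())
```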
+
+### Reliability / Non-Functional Requirements
+- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
+- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+- **NFR-003**: Retries: throttling (e.g. 429/503) → backoff; 401/403 → no retry; unknown → limited retries, then failed.
+- **NFR-004**: Missed runs policy (MVP): **No catch-up**. If the system was offline, missed slots are not made up; only the next slot runs.
+
+### Scheduling Semantics
+- `scheduled_for` is **minute-based** (slot) and stored in UTC. The due calculation is performed in the schedule's timezone.
+- DST (MVP): If the local time is invalid, the slot is skipped (run `skipped`). If the local time is ambiguous, the first occurrence is used.
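To make the DST rules concrete, the following Python sketch resolves a local wall-clock time to a UTC minute slot using the standard `zoneinfo` module. It assumes time zone data is available on the system; `slot_utc` is a hypothetical helper, and the Europe/Berlin dates in the usage note are only examples.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def slot_utc(local_wall: datetime, tz_name: str) -> Optional[datetime]:
    """Resolve a naive local wall time to its UTC minute slot.

    Returns None when the wall time does not exist (DST gap), which the
    spec treats as a skipped slot. For ambiguous times (DST overlap),
    fold=0 selects the first occurrence.
    """
    tz = ZoneInfo(tz_name)
    wall = local_wall.replace(second=0, microsecond=0, tzinfo=None)
    candidate = wall.replace(tzinfo=tz, fold=0)
    as_utc = candidate.astimezone(timezone.utc)
    # A nonexistent wall time does not survive the round trip through UTC.
    round_trip = as_utc.astimezone(tz).replace(tzinfo=None)
    if round_trip != wall:
        return None
    return as_utc
```

For example, in Europe/Berlin the wall time 02:30 on 2026-03-29 falls into the spring-forward gap and yields `None`, while 02:30 on 2026-10-25 occurs twice and resolves to the first (summer time) occurrence.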
+
+## Data Model
+
+### backup_schedules
+- `id` bigint
+- `tenant_id` FK tenants.id
+- `name` string
+- `is_enabled` bool default true
+- `timezone` string default 'UTC'
+- `frequency` string enum: daily|weekly
+- `time_of_day` time
+- `days_of_week` json nullable (array<int>, weekly only; 1=Mon..7=Sun)
+- `policy_types` jsonb (array<string>)
+- `include_foundations` bool default true
+- `retention_keep_last` int default 30
+- `last_run_at` datetime nullable
+- `last_run_status` string nullable
+- `next_run_at` datetime nullable
+- timestamps
+
+Indexes:
+- (tenant_id, is_enabled)
+- (next_run_at) optional
+
+### backup_schedule_runs
+- `id` bigint
+- `backup_schedule_id` FK
+- `tenant_id` FK (denormalized)
+- `scheduled_for` datetime
+- `started_at` datetime nullable
+- `finished_at` datetime nullable
+- `status` string enum: running|success|partial|failed|canceled|skipped
+- `summary` jsonb (policies_total, policies_backed_up, errors_count, type_breakdown, warnings)
+- `error_code` string nullable
+- `error_message` text nullable
+- `backup_set_id` FK nullable
+- timestamps
+
+Indexes:
+- (backup_schedule_id, scheduled_for)
+- (tenant_id, created_at)
+- **Unique**: (backup_schedule_id, scheduled_for)
+
+## Acceptance Criteria
+- A user can create a schedule per tenant (daily/weekly, time, timezone, policy types, retention).
+- The dispatcher creates runs at the scheduled time (queue worker required).
+- The UI shows Last Run + Next Run + run history.
+- Run now starts immediately.
+- Error cases (token/permission/throttle) are marked as failed/partial with an error_code.
+- Retention keeps only the last N BackupSets per schedule.
--- a/specs/032-backup-scheduling-mvp/tasks.md
+++ b/specs/032-backup-scheduling-mvp/tasks.md
@@ -0,0 +1,42 @@
+# Tasks: Backup Scheduling MVP (032)
+
+**Date**: 2026-01-05
+**Input**: spec.md, plan.md
+
+## Phase 1: Spec & Setup
+- [ ] T001 Create specs/032-backup-scheduling-mvp (spec/plan/tasks + checklist).
+
+## Phase 2: Data Model
+- [ ] T002 Add migrations: backup_schedules + backup_schedule_runs (tenant-scoped, indexes, unique slot).
+- [ ] T003 Add models + relationships (Tenant->schedules, Schedule->runs, Run->backupSet).
+
+## Phase 3: Scheduling + Dispatch
+- [ ] T004 Add command `tenantpilot:schedules:dispatch`.
+- [ ] T005 Register scheduler to run every minute.
+- [ ] T006 Implement due-calculation (timezone, daily/weekly) + next_run_at computation.
+- [ ] T007 Implement idempotent run creation (unique slot) + cache lock.
+
+## Phase 4: Jobs
+- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
+- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
+- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+ - [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
+ - [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).
+
+## Phase 5: Filament UI
+- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
+- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
+- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
+ - [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.
+
+## Phase 6: Tests
+- [ ] T014 Unit: due-calculation + next_run_at.
+- [ ] T015 Feature: dispatcher idempotency (unique slot); lock behavior.
+- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
+- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
+- [ ] T018 Retention: keeps last N and deletes older backup sets.
+ - [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.
+
+## Phase 7: Verification
+- [ ] T019 Run targeted tests (Pest).
+- [ ] T020 Run Pint (`./vendor/bin/pint --dirty`).