feat/032-backup-scheduling-mvp (#33)

Goal: MVP specification for “Automated scheduled backups (per tenant)” as the basis for implementation (spec-first).
Scope (MVP):
Tenant-scoped backup_schedules + backup_schedule_runs
Dispatcher creates idempotent runs (unique slot) + queue job executes the run
“Run now” / “Retry”, run history, retention (keep last N)
No catch-up for missed slots
Important clarifications (derived from the Constitution):
Every operation is tenant-scoped and writes audit logs (dispatcher/run/retention; no secrets/tokens)
Graph calls go through the existing abstraction (no hardcoding)
Retry/backoff: throttling → backoff; 401/403 → no retry
Authorization (MVP):
TenantRole matrix (readonly/operator/manager/owner) instead of a new permission registry
Not in the MVP:
No restore scheduling
No cross-tenant bulk scheduling / templates
No catch-up of missed runs
Review focus:
Semantics “1 Run = 1 BackupSet”
Concurrency/lock behavior (run already in progress → skipped)
DST/timezone rules + minute precision of slots
Artifacts:
spec.md
plan.md
tasks.md
requirements.md

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #33
ahmido 2026-01-04 23:54:56 +00:00
parent 2ca989c00f
commit beffbfca4c
4 changed files with 239 additions and 0 deletions

@ -0,0 +1,13 @@
# Requirements Checklist (032)
- [ ] Tenant-scoped tables use `tenant_id` consistently.
- [ ] 1 Run = 1 BackupSet (no rolling reuse in MVP).
- [ ] Dispatcher is idempotent (unique `backup_schedule_id` + `scheduled_for`).
- [ ] Concurrency lock prevents parallel runs per schedule.
- [ ] Run stores status + summary + error_code/error_message.
- [ ] UI shows schedule list + run history + link to backup set.
- [ ] Run now + Retry are permission-gated and write DB notifications.
- [ ] Audit logs are written for dispatcher, runs, and retention (tenant-scoped; no secrets).
- [ ] Retry/backoff policy implemented (no retry for 401/403).
- [ ] Retention keeps last N and soft-deletes older backup sets.
- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.

@ -0,0 +1,67 @@
# Plan: Backup Scheduling MVP (032)
**Date**: 2026-01-05
**Input**: spec.md
## Architecture / Reuse
- Reuse existing services:
- `PolicySyncService::syncPoliciesWithReport()` for selected policy types
- `BackupService::createBackupSet()` to create immutable snapshots + items (include_foundations supported)
- Store selection as `policy_types` (config keys), not free-form categories.
- Use tenant scoping (`tenant_id`) consistent with existing tables (`backup_sets`, `backup_items`).
## Scheduling Mechanism
- Add Artisan command: `tenantpilot:schedules:dispatch`.
- Scheduler integration (Laravel 12): schedule the command every minute via `routes/console.php` + ops configuration (Dokploy cron `schedule:run` or long-running `schedule:work`).
- Dispatcher algorithm:
1) load enabled schedules
2) compute whether due for the current minute in schedule timezone
3) create run with `scheduled_for` slot (minute precision) using DB unique constraint
4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
- Concurrency:
- Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
- If lock is held: mark run as `skipped` with a clear error_code (no parallel execution).
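
The idempotent slot creation described above can be sketched in a language-neutral way. The following Python/SQLite snippet is only an illustration, not the Laravel implementation: the helper names `open_demo_db`, `floor_to_minute`, and `create_run_once` are hypothetical, and the table is reduced to the columns needed to show the unique constraint.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_demo_db() -> sqlite3.Connection:
    """In-memory table with the unique slot constraint (reduced column set)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE backup_schedule_runs (
               id INTEGER PRIMARY KEY,
               backup_schedule_id INTEGER NOT NULL,
               scheduled_for TEXT NOT NULL,
               status TEXT NOT NULL,
               UNIQUE (backup_schedule_id, scheduled_for)
           )"""
    )
    return conn


def floor_to_minute(moment: datetime) -> str:
    """Convert an aware datetime to its UTC minute slot (seconds dropped)."""
    utc = moment.astimezone(timezone.utc).replace(second=0, microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def create_run_once(conn: sqlite3.Connection, schedule_id: int,
                    moment: datetime) -> Optional[int]:
    """Insert a run for the slot; return its id, or None if the slot is taken."""
    slot = floor_to_minute(moment)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO backup_schedule_runs "
                "(backup_schedule_id, scheduled_for, status) "
                "VALUES (?, ?, 'running')",
                (schedule_id, slot),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Another dispatcher tick already created this slot: do not dispatch.
        return None
```

Only the caller that receives a run id would go on to dispatch the job, so two dispatcher ticks within the same minute cannot enqueue the same slot twice.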
## Run Execution
- `RunBackupScheduleJob`:
1) load schedule + tenant
2) preflight: tenant active; Graph/auth errors mapped to error_code
3) sync policies for selected types (collect report)
4) select policy IDs from local DB for those types (exclude ignored)
5) create backup set:
- name: `{schedule_name} - {Y-m-d H:i}`
- includeFoundations: schedule flag
6) set run status:
- success if backup_set.status == completed
- partial if backup_set.status == partial OR sync had failures but backup succeeded
- failed if nothing backed up / hard error
7) update schedule last_run_* and compute/persist next_run_at
8) dispatch retention job
9) audit logs:
- log run start + completion (status, counts, error_code; no secrets)
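
The status rules in step 6 can be written as one small decision function. This is a hedged Python sketch: `resolve_run_status` and its parameters are hypothetical names, and the precedence (a completed backup set with sync failures counts as `partial`) is an interpretation of the rules above, not confirmed behavior.

```python
from typing import Optional


def resolve_run_status(backup_set_status: Optional[str],
                       sync_failures: int,
                       items_backed_up: int) -> str:
    """Map backup-set outcome and sync report to a run status."""
    # Hard error (no backup set at all) or nothing backed up.
    if backup_set_status is None or items_backed_up == 0:
        return "failed"
    # Clean run: backup completed and the sync reported no failures.
    if backup_set_status == "completed" and sync_failures == 0:
        return "success"
    # Backup partially done, or completed while the sync had failures.
    if backup_set_status in ("completed", "partial"):
        return "partial"
    # Any other backup-set status is treated as a failure (assumption).
    return "failed"
```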
## Retry / Backoff
- Configure job retry behavior based on error classification:
- Throttling/transient (e.g. 429/503): backoff + retry
- Auth/permission (401/403): no retry
- Unknown: limited retries
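
A minimal sketch of this classification, in Python for illustration only. The retry limits, the base delay, the delay cap, and the function names are assumptions; the plan fixes only the three classes and their retry behavior.

```python
from typing import Optional

MAX_TRANSIENT_RETRIES = 5   # assumption: not specified in the plan
MAX_UNKNOWN_RETRIES = 2     # assumption: "limited retries"
BASE_DELAY_SECONDS = 30     # assumption
MAX_DELAY_SECONDS = 900     # assumption: cap for exponential backoff


def classify(http_status: Optional[int]) -> str:
    """Sort an HTTP status into auth, transient, or unknown."""
    if http_status in (401, 403):
        return "auth"
    if http_status in (429, 503):
        return "transient"
    return "unknown"


def retry_delay(http_status: Optional[int], attempt: int) -> Optional[int]:
    """Seconds to wait before the next attempt, or None to stop retrying.

    `attempt` is the 1-based number of attempts already made.
    """
    kind = classify(http_status)
    if kind == "auth":
        return None  # 401/403: retrying cannot help
    limit = MAX_TRANSIENT_RETRIES if kind == "transient" else MAX_UNKNOWN_RETRIES
    if attempt > limit:
        return None
    # Exponential backoff: 30s, 60s, 120s, ... capped at MAX_DELAY_SECONDS.
    return min(BASE_DELAY_SECONDS * 2 ** (attempt - 1), MAX_DELAY_SECONDS)
```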
## Retention
- `ApplyBackupScheduleRetentionJob(schedule_id)`:
- identify runs ordered newest→oldest
- keep last N runs that created a backup_set_id
- for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
- audit log: number of deleted BackupSets
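
The selection step of the retention job could look roughly like this. It is a Python sketch under stated assumptions: the `Run` dataclass and `backup_sets_to_soft_delete` are hypothetical, and "newest" is taken to mean the latest `scheduled_for`, which the plan does not spell out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    id: int
    scheduled_for: str            # ISO-8601 UTC string, sorts chronologically
    backup_set_id: Optional[int]  # None for skipped/failed runs


def backup_sets_to_soft_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup set ids beyond the newest `keep_last` runs with a set."""
    if keep_last < 0:
        raise ValueError("keep_last must not be negative")
    # Runs without a backup set do not count toward the retention window.
    with_sets = [r for r in runs if r.backup_set_id is not None]
    with_sets.sort(key=lambda r: r.scheduled_for, reverse=True)
    return [r.backup_set_id for r in with_sets[keep_last:]]
```

The length of the returned list is the number that would be written to the audit log.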
## Filament UX
- Tenant-scoped resources:
- `BackupScheduleResource`
- Runs UI via RelationManager under schedule (or a dedicated resource if needed)
- Actions: enable/disable, run now, retry
- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
- MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.
## Ops / Deployment Notes
- Requires queue worker.
- Requires scheduler running.
- Missed runs policy (MVP): no catch-up.

@ -0,0 +1,117 @@
# Feature Specification: Backup Scheduling MVP (032)
**Feature**: Automated scheduled backups (per tenant)
**Created**: 2026-01-05
**Status**: Ready for implementation (MVP)
**Risk**: Medium (Backup-only, no restore scheduling)
**Dependencies**: Tenant Portfolio + Tenant Context Switch ✅
## Context
TenantPilot supports manual backups. Customers/MSPs need regular, reliable backups per tenant (e.g. nightly), including traceable runs, error codes, and retention.
## Goals
- 1..n backup schedules can be created per tenant.
- Schedules run automatically via queue/worker.
- Every execution is stored as an auditable run (status, counts, errors).
- Retention deletes old backups according to policy.
- Filament UI: manage schedules, view run history, “Run now”, “Retry”.
## Non-Goals (MVP)
- No mandatory calendar UI (can be added later).
- No cross-tenant bulk scheduling (MSP templates later).
- No “drift-triggered scheduling” (comes after the Drift MVP).
- No restore via scheduling (backup only).
## Definitions
- **Schedule**: Recurring plan (daily/weekly, timezone).
- **Run**: Concrete execution of a schedule (scheduled_for + status).
- **BackupSet**: Result container of a run.
**MVP semantics**: **1 Run = 1 new BackupSet** (no rolling reuse in the MVP).
## Requirements
### Functional Requirements
- **FR-001**: Schedules are tenant-scoped via `tenant_id` (FK to `tenants.id`).
- **FR-002**: The dispatcher detects “due” schedules and creates exactly one run per time slot (idempotent).
- **FR-003**: A run uses existing services:
- Sync policies (selected policy types only)
- Create a BackupSet from local policy IDs (optionally including foundations)
- **FR-004**: A run writes `backup_schedule_runs` with status + summary + error codes.
- **FR-005**: “Run now” creates a run immediately (scheduled_for=now) and dispatches the job.
- **FR-006**: “Retry” creates a new run for the same schedule.
- **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft delete BackupSets).
- **FR-008**: Concurrency: only one run per schedule may execute at a time. If a run is already in progress, a new run is not started in parallel and is instead marked `skipped` (with an error code).
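
The behavior required by FR-008 can be sketched as follows. This is a Python illustration with an in-process lock registry standing in for a shared cache lock; `ScheduleLocks`, `start_run`, and the error code `ALREADY_RUNNING` are hypothetical names, not part of the spec.

```python
import threading


class ScheduleLocks:
    """In-process stand-in for a per-schedule cache lock."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._held: set = set()

    def try_acquire(self, schedule_id: int) -> bool:
        """Take the lock without blocking; False if a run already holds it."""
        with self._guard:
            if schedule_id in self._held:
                return False
            self._held.add(schedule_id)
            return True

    def release(self, schedule_id: int) -> None:
        with self._guard:
            self._held.discard(schedule_id)


def start_run(locks: ScheduleLocks, schedule_id: int, run: dict) -> bool:
    """Mark the run as running, or as skipped if the schedule is busy."""
    if not locks.try_acquire(schedule_id):
        run["status"] = "skipped"
        run["error_code"] = "ALREADY_RUNNING"  # hypothetical error code
        return False
    run["status"] = "running"
    return True
```

The caller is expected to call `release` once the run finishes, whether it succeeded or failed; a real cache lock would also need an expiry so that a crashed worker cannot block the schedule indefinitely.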
### UX Requirements (Filament)
- **UX-001**: Schedule-Liste zeigt Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
- **UX-002**: Run history per schedule shows scheduled_for, status, duration, counts, error_code/message, and a link to the BackupSet.
- **UX-003**: “Run now” and “Retry” are only available with the appropriate permissions.
### Security / Authorization
- **SEC-001**: Tenant isolation: a user sees/manages only the schedules of the current tenant.
- **SEC-002 (MVP)**: Authorization is based on TenantRole (as in Tenant Portfolio):
- `readonly`: view schedules + view runs
- `operator`: additionally “Run now” / “Retry”
- `manager` / `owner`: additionally manage schedules (CRUD)
- **SEC-003**: Dispatcher, run execution, and retention write tenant-scoped audit logs (no secrets/tokens), including run start/run end and the retention result (e.g. number of deleted BackupSets).
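
The role matrix from SEC-002 is small enough to express as a lookup table. The Python sketch below is illustrative only; the ability names `view`, `run`, and `manage` and the helper `can` are hypothetical labels for the three permission levels listed above.

```python
# Abilities per TenantRole; each role includes those of the role above it.
ROLE_ABILITIES = {
    "readonly": {"view"},
    "operator": {"view", "run"},            # run = "Run now" / "Retry"
    "manager": {"view", "run", "manage"},   # manage = schedule CRUD
    "owner": {"view", "run", "manage"},
}


def can(role: str, ability: str) -> bool:
    """True if the role grants the ability; unknown roles get nothing."""
    return ability in ROLE_ABILITIES.get(role, set())
```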
### Reliability / Non-Functional Requirements
- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- **NFR-003**: Retries: throttling (e.g. 429/503) → backoff; 401/403 → no retry; unknown → limited retries, then failed.
- **NFR-004**: Missed runs policy (MVP): **No catch-up**: if the system was offline, missed slots are not made up; only the next slot runs.
### Scheduling Semantics
- `scheduled_for` is **minute-based** (slot) and stored in UTC. The due calculation is performed in the schedule's timezone.
- DST (MVP): if the local time is invalid, the slot is skipped (run `skipped`). If the local time is ambiguous, the first occurrence is used.
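
The two DST rules can be checked with a round trip through UTC. The following Python sketch uses the standard `zoneinfo` module; `resolve_slot_utc` is a hypothetical helper, and the approach (detecting a gap by converting to UTC and back) is one possible technique, not necessarily what the implementation will use.

```python
from datetime import date, datetime, time, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def resolve_slot_utc(day: date, time_of_day: time,
                     tz_name: str) -> Optional[datetime]:
    """Return the UTC slot for a local wall-clock time, or None if invalid."""
    tz = ZoneInfo(tz_name)
    naive = datetime.combine(day, time_of_day).replace(second=0, microsecond=0)
    # fold=0 selects the first occurrence when the local time is ambiguous.
    first = naive.replace(tzinfo=tz, fold=0)
    as_utc = first.astimezone(timezone.utc)
    # A time inside a spring-forward gap does not survive the round trip:
    # converting back yields a different wall-clock time.
    if as_utc.astimezone(tz).replace(tzinfo=None) != naive:
        return None  # invalid local time: the slot is skipped
    return as_utc
```

For `Europe/Berlin`, 02:30 on 2026-03-29 falls into the spring-forward gap and yields `None`, while 02:30 on 2026-10-25 occurs twice and resolves to the first occurrence, 00:30 UTC.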
## Data Model
### backup_schedules
- `id` bigint
- `tenant_id` FK tenants.id
- `name` string
- `is_enabled` bool default true
- `timezone` string default 'UTC'
- `frequency` string enum: daily|weekly
- `time_of_day` time
- `days_of_week` json nullable (array<int>, weekly only; 1=Mon..7=Sun)
- `policy_types` jsonb (array<string>)
- `include_foundations` bool default true
- `retention_keep_last` int default 30
- `last_run_at` datetime nullable
- `last_run_status` string nullable
- `next_run_at` datetime nullable
- timestamps
Indexes:
- (tenant_id, is_enabled)
- (next_run_at) optional
### backup_schedule_runs
- `id` bigint
- `backup_schedule_id` FK
- `tenant_id` FK (denormalized)
- `scheduled_for` datetime
- `started_at` datetime nullable
- `finished_at` datetime nullable
- `status` string enum: running|success|partial|failed|canceled|skipped
- `summary` jsonb (policies_total, policies_backed_up, errors_count, type_breakdown, warnings)
- `error_code` string nullable
- `error_message` text nullable
- `backup_set_id` FK nullable
- timestamps
Indexes:
- (backup_schedule_id, scheduled_for)
- (tenant_id, created_at)
- **Unique**: (backup_schedule_id, scheduled_for)
## Acceptance Criteria
- A user can create a schedule per tenant (daily/weekly, time, timezone, policy types, retention).
- The dispatcher creates runs at the scheduled time (queue worker required).
- The UI shows Last Run + Next Run + run history.
- Run now starts immediately.
- Error cases (token/permission/throttle) are marked as failed/partial with an error_code.
- Retention keeps only the last N BackupSets per schedule.

@ -0,0 +1,42 @@
# Tasks: Backup Scheduling MVP (032)
**Date**: 2026-01-05
**Input**: spec.md, plan.md
## Phase 1: Spec & Setup
- [ ] T001 Create specs/032-backup-scheduling-mvp (spec/plan/tasks + checklist).
## Phase 2: Data Model
- [ ] T002 Add migrations: backup_schedules + backup_schedule_runs (tenant-scoped, indexes, unique slot).
- [ ] T003 Add models + relationships (Tenant->schedules, Schedule->runs, Run->backupSet).
## Phase 3: Scheduling + Dispatch
- [ ] T004 Add command `tenantpilot:schedules:dispatch`.
- [ ] T005 Register scheduler to run every minute.
- [ ] T006 Implement due-calculation (timezone, daily/weekly) + next_run_at computation.
- [ ] T007 Implement idempotent run creation (unique slot) + cache lock.
## Phase 4: Jobs
- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- [ ] T021 Add audit logging for dispatcher/run/retention (tenant-scoped; no secrets).
- [ ] T022 Implement retry/backoff strategy for `RunBackupScheduleJob` (no retry on 401/403).
## Phase 5: Filament UI
- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
- [ ] T023 Wire authorization to TenantRole (readonly/operator/manager/owner) for schedule CRUD and run actions.
## Phase 6: Tests
- [ ] T014 Unit: due-calculation + next_run_at.
- [ ] T015 Feature: dispatcher idempotency (unique slot); lock behavior.
- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
- [ ] T018 Retention: keeps last N and deletes older backup sets.
- [ ] T024 Tests: audit logs written (run success + retention delete) and retry policy behavior.
## Phase 7: Verification
- [ ] T019 Run targeted tests (Pest).
- [ ] T020 Run Pint (`./vendor/bin/pint --dirty`).