diff --git a/specs/032-backup-scheduling-mvp/checklists/requirements.md b/specs/032-backup-scheduling-mvp/checklists/requirements.md
new file mode 100644
index 0000000..f3fe8af
--- /dev/null
+++ b/specs/032-backup-scheduling-mvp/checklists/requirements.md
@@ -0,0 +1,11 @@
+# Requirements Checklist (032)
+
+- [ ] Tenant-scoped tables use `tenant_id` consistently.
+- [ ] 1 Run = 1 BackupSet (no rolling reuse in MVP).
+- [ ] Dispatcher is idempotent (unique `backup_schedule_id` + `scheduled_for`).
+- [ ] Concurrency lock prevents parallel runs per schedule.
+- [ ] Run stores status + summary + error_code/error_message.
+- [ ] UI shows schedule list + run history + link to backup set.
+- [ ] Run now + Retry are permission-gated and write DB notifications.
+- [ ] Retention keeps last N and soft-deletes older backup sets.
+- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.
diff --git a/specs/032-backup-scheduling-mvp/plan.md b/specs/032-backup-scheduling-mvp/plan.md
new file mode 100644
index 0000000..dfc65ae
--- /dev/null
+++ b/specs/032-backup-scheduling-mvp/plan.md
@@ -0,0 +1,56 @@
+# Plan: Backup Scheduling MVP (032)
+
+**Date**: 2026-01-05
+**Input**: spec.md
+
+## Architecture / Reuse
+- Reuse existing services:
+  - `PolicySyncService::syncPoliciesWithReport()` for selected policy types
+  - `BackupService::createBackupSet()` to create immutable snapshots + items (include_foundations supported)
+- Store selection as `policy_types` (config keys), not free-form categories.
+- Use tenant scoping (`tenant_id`) consistent with existing tables (`backup_sets`, `backup_items`).
+
+## Scheduling Mechanism
+- Add Artisan command: `tenantpilot:schedules:dispatch`.
+- Scheduler integration (Laravel 12): schedule the command every minute via `routes/console.php` + ops configuration (Dokploy cron `schedule:run` or long-running `schedule:work`).
+- Dispatcher algorithm:
+  1) load enabled schedules
+  2) compute whether due for the current minute in schedule timezone
+  3) create run with `scheduled_for` slot (minute precision) using DB unique constraint
+  4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
+- Concurrency:
+  - Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
+
+## Run Execution
+- `RunBackupScheduleJob`:
+  1) load schedule + tenant
+  2) preflight: tenant active; Graph/auth errors mapped to error_code
+  3) sync policies for selected types (collect report)
+  4) select policy IDs from local DB for those types (exclude ignored)
+  5) create backup set:
+     - name: `{schedule_name} - {Y-m-d H:i}`
+     - includeFoundations: schedule flag
+  6) set run status:
+     - success if backup_set.status == completed
+     - partial if backup_set.status == partial OR sync had failures but backup succeeded
+     - failed if nothing backed up / hard error
+  7) update schedule last_run_* and compute/persist next_run_at
+  8) dispatch retention job
+
+## Retention
+- `ApplyBackupScheduleRetentionJob(schedule_id)`:
+  - identify runs ordered newest→oldest
+  - keep last N runs that created a backup_set_id
+  - for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
+
+## Filament UX
+- Tenant-scoped resources:
+  - `BackupScheduleResource`
+  - Runs UI via RelationManager under schedule (or a dedicated resource if needed)
+- Actions: enable/disable, run now, retry
+- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
+
+## Ops / Deployment Notes
+- Requires queue worker.
+- Requires scheduler running.
+- Missed runs policy (MVP): no catch-up.
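The plan's dispatcher due check (step 2) and the `next_run_at` computation (Run Execution step 7) can be sketched as follows. This is a language-neutral illustration in Python, not the Laravel implementation. The `Schedule` dataclass and the function names are hypothetical. The sketch also assumes that the schedule's `timezone` string has already been resolved to a `tzinfo`, for example with `zoneinfo.ZoneInfo`.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import List, Optional


@dataclass
class Schedule:
    # Hypothetical in-memory view of a backup_schedules row.
    frequency: str                            # "daily" | "weekly"
    time_of_day: time                         # local wall-clock time
    tz: tzinfo                                # resolved from the `timezone` column
    days_of_week: Optional[List[int]] = None  # ISO weekdays 1=Mon..7=Sun (weekly only)


def slot_for(now_utc: datetime) -> datetime:
    """Truncate to minute precision; the result is stored as scheduled_for."""
    return now_utc.replace(second=0, microsecond=0)


def is_due(s: Schedule, now_utc: datetime) -> bool:
    """True if the current minute matches the schedule's local time (and weekday)."""
    local = slot_for(now_utc).astimezone(s.tz)
    if (local.hour, local.minute) != (s.time_of_day.hour, s.time_of_day.minute):
        return False
    if s.frequency == "weekly":
        return local.isoweekday() in (s.days_of_week or [])
    return s.frequency == "daily"


def next_run_at(s: Schedule, after_utc: datetime) -> datetime:
    """Next slot strictly after `after_utc`, returned in UTC."""
    if s.frequency not in ("daily", "weekly"):
        raise ValueError(f"unsupported frequency: {s.frequency}")
    local_after = after_utc.astimezone(s.tz)
    # Today plus the following 7 days always contains the next weekly slot.
    for offset in range(8):
        day = local_after.date() + timedelta(days=offset)
        candidate = datetime.combine(day, s.time_of_day, tzinfo=s.tz)
        if candidate <= local_after:
            continue
        if s.frequency == "weekly" and candidate.isoweekday() not in (s.days_of_week or []):
            continue
        return candidate.astimezone(timezone.utc)
    raise ValueError("weekly schedule has no valid days_of_week")
```

`next_run_at` only looks for the first slot strictly after the given time. A dispatcher that was offline therefore resumes at the next slot instead of replaying missed ones, which matches the MVP's no-catch-up policy. DST edge cases, where local times are skipped or repeated, receive no special handling in this sketch.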
diff --git a/specs/032-backup-scheduling-mvp/spec.md b/specs/032-backup-scheduling-mvp/spec.md
new file mode 100644
index 0000000..f0f93bc
--- /dev/null
+++ b/specs/032-backup-scheduling-mvp/spec.md
@@ -0,0 +1,114 @@
+# Feature Specification: Backup Scheduling MVP (032)
+
+**Feature**: Automated, scheduled backups (per tenant)
+**Created**: 2026-01-05
+**Status**: Ready for implementation (MVP)
+**Risk**: Medium (Backup-only, no restore scheduling)
+**Dependencies**: Tenant Portfolio + Tenant Context Switch ✅
+
+## Context
+TenantPilot supports manual backups. Customers/MSPs need regular, reliable backups per tenant (e.g. nightly), including traceable runs, error codes, and retention.
+
+## Goals
+- Each tenant can have 1..n backup schedules.
+- Schedules run automatically via the queue/worker.
+- Every execution is stored as an auditable Run (status, counts, errors).
+- Retention deletes old backups according to the schedule's retention policy.
+- Filament UI: manage schedules, view run history, “Run now”, “Retry”.
+
+## Non-Goals (MVP)
+- No mandatory calendar UI (can be added later).
+- No cross-tenant bulk scheduling (MSP templates later).
+- No “drift-triggered scheduling” (follows the Drift MVP).
+- No scheduled restores (backup only).
+
+## Definitions
+- **Schedule**: Recurring plan (daily/weekly, timezone).
+- **Run**: A concrete execution of a schedule (scheduled_for + status).
+- **BackupSet**: The result container of a Run.
+
+**MVP semantics**: **1 Run = 1 new BackupSet** (no rolling reuse in the MVP).
+
+## Requirements
+
+### Functional Requirements
+- **FR-001**: Schedules are tenant-scoped via `tenant_id` (FK to `tenants.id`).
+- **FR-002**: The dispatcher detects “due” schedules and creates exactly one Run per time slot (idempotent).
+- **FR-003**: A Run uses existing services:
+  - Sync policies (selected policy types only)
+  - Create a BackupSet from local policy IDs (optionally including foundations)
+- **FR-004**: A Run writes to `backup_schedule_runs` with status + summary + error codes.
+- **FR-005**: “Run now” immediately creates a Run (scheduled_for=now) and dispatches the job.
+- **FR-006**: “Retry” creates a new Run for the same schedule.
+- **FR-007**: Retention keeps only the last N Runs/BackupSets per schedule (older BackupSets are soft-deleted).
+- **FR-008**: Concurrency: at most one Run per schedule may execute at a time.
+
+### UX Requirements (Filament)
+- **UX-001**: The schedule list shows Enabled, Frequency, Time+Timezone, Policy Types summary, Retention, Last Run, Next Run.
+- **UX-002**: The run history per schedule shows scheduled_for, status, duration, counts, error_code/message, and a link to the BackupSet.
+- **UX-003**: “Run now” and “Retry” are only available to users with the corresponding permissions.
+
+### Security / Authorization
+- **SEC-001**: Tenant isolation: a user only sees/manages schedules of the current tenant.
+- **SEC-002**: Permissions (RBAC):
+  - `backup_schedules.view`
+  - `backup_schedules.manage`
+  - `backup_schedules.run_now`
+  - `backup_schedules.runs.view`
+- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
+
+### Reliability / Non-Functional Requirements
+- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
+- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+- **NFR-003**: Retries: throttling → backoff; 401/403 → no blind retry.
+- **NFR-004**: Missed runs policy (MVP): **No catch-up**. If the system was offline, missed slots are not made up; only the next slot runs.
+
+## Data Model
+
+### backup_schedules
+- `id` bigint
+- `tenant_id` FK tenants.id
+- `name` string
+- `is_enabled` bool default true
+- `timezone` string default 'UTC'
+- `frequency` string enum: daily|weekly
+- `time_of_day` time
+- `days_of_week` json nullable (array, weekly only; 1=Mon..7=Sun)
+- `policy_types` jsonb (array)
+- `include_foundations` bool default true
+- `retention_keep_last` int default 30
+- `last_run_at` datetime nullable
+- `last_run_status` string nullable
+- `next_run_at` datetime nullable
+- timestamps
+
+Indexes:
+- (tenant_id, is_enabled)
+- (next_run_at) optional
+
+### backup_schedule_runs
+- `id` bigint
+- `backup_schedule_id` FK
+- `tenant_id` FK (denormalized)
+- `scheduled_for` datetime
+- `started_at` datetime nullable
+- `finished_at` datetime nullable
+- `status` string enum: running|success|partial|failed|canceled|skipped
+- `summary` jsonb (policies_total, policies_backed_up, errors_count, type_breakdown, warnings)
+- `error_code` string nullable
+- `error_message` text nullable
+- `backup_set_id` FK nullable
+- timestamps
+
+Indexes:
+- (backup_schedule_id, scheduled_for) (covered by the unique index below; no separate index needed)
+- (tenant_id, created_at)
+- **Unique**: (backup_schedule_id, scheduled_for)
+
+## Acceptance Criteria
+- A user can create a schedule per tenant (daily/weekly, time, timezone, policy types, retention).
+- The dispatcher creates Runs at the scheduled time (a running queue worker is required).
+- The UI shows Last Run + Next Run + run history.
+- “Run now” starts a Run immediately.
+- Failure cases (token/permission/throttle) are marked as failed/partial with an error_code.
+- Retention keeps only the last N BackupSets per schedule.
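The unique constraint on `(backup_schedule_id, scheduled_for)` is what makes repeated dispatcher invocations harmless. The first invocation to insert the row for a slot wins, and any later attempt for the same slot fails the unique check. The sketch below demonstrates this with Python's built-in `sqlite3` standing in for the application database. The trimmed-down table, the initial `running` status, and the `create_run_for_slot` helper are illustrative assumptions and are not part of the spec. On PostgreSQL, `INSERT ... ON CONFLICT DO NOTHING` provides the same behavior.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional

# Trimmed-down backup_schedule_runs table; only the columns needed for the demo.
SCHEMA = """
CREATE TABLE backup_schedule_runs (
    id                 INTEGER PRIMARY KEY,
    backup_schedule_id INTEGER NOT NULL,
    tenant_id          INTEGER NOT NULL,
    scheduled_for      TEXT    NOT NULL,
    status             TEXT    NOT NULL,
    UNIQUE (backup_schedule_id, scheduled_for)
)
"""


def create_run_for_slot(
    conn: sqlite3.Connection, schedule_id: int, tenant_id: int, now_utc: datetime
) -> Optional[int]:
    """Claim the slot for `now_utc`; return the new run id, or None if already claimed."""
    # Minute precision, so every dispatcher call within the same minute maps to one slot.
    slot = now_utc.astimezone(timezone.utc).replace(second=0, microsecond=0).isoformat()
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            cur = conn.execute(
                "INSERT INTO backup_schedule_runs"
                " (backup_schedule_id, tenant_id, scheduled_for, status)"
                " VALUES (?, ?, ?, 'running')",
                (schedule_id, tenant_id, slot),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Unique (backup_schedule_id, scheduled_for) violated: slot already has a run.
        return None
```

The dispatcher should dispatch `RunBackupScheduleJob` only when a run id is returned. A `None` result means another invocation has already claimed the slot. The constraint deduplicates slots, but it cannot stop two Runs with different slots from executing at the same time, such as a “Run now” that overlaps a scheduled Run. For that reason, FR-008 still requires the per-schedule cache lock.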
diff --git a/specs/032-backup-scheduling-mvp/tasks.md b/specs/032-backup-scheduling-mvp/tasks.md
new file mode 100644
index 0000000..69f733e
--- /dev/null
+++ b/specs/032-backup-scheduling-mvp/tasks.md
@@ -0,0 +1,38 @@
+# Tasks: Backup Scheduling MVP (032)
+
+**Date**: 2026-01-05
+**Input**: spec.md, plan.md
+
+## Phase 1: Spec & Setup
+- [ ] T001 Create specs/032-backup-scheduling-mvp (spec/plan/tasks + checklist).
+
+## Phase 2: Data Model
+- [ ] T002 Add migrations: backup_schedules + backup_schedule_runs (tenant-scoped, indexes, unique slot).
+- [ ] T003 Add models + relationships (Tenant->schedules, Schedule->runs, Run->backupSet).
+
+## Phase 3: Scheduling + Dispatch
+- [ ] T004 Add command `tenantpilot:schedules:dispatch`.
+- [ ] T005 Register the command with the scheduler to run every minute.
+- [ ] T006 Implement due-calculation (timezone, daily/weekly) + next_run_at computation.
+- [ ] T007 Implement idempotent run creation (unique slot) + cache lock.
+
+## Phase 4: Jobs
+- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
+- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
+- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
+
+## Phase 5: Filament UI
+- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
+- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
+- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
+
+## Phase 6: Tests
+- [ ] T014 Unit: due-calculation + next_run_at.
+- [ ] T015 Feature: dispatcher idempotency (unique slot); lock behavior.
+- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
+- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
+- [ ] T018 Retention: keeps last N and soft-deletes older backup sets.
+
+## Phase 7: Verification
+- [ ] T019 Run targeted tests (Pest).
+- [ ] T020 Run Pint (`./vendor/bin/pint --dirty`).
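The selection step of `ApplyBackupScheduleRetentionJob` (T009) can be written as a pure function. T018 can then test that function directly. Given a schedule's runs, the function keeps the newest N runs that produced a BackupSet and returns the remaining `backup_set_id`s for soft deletion. The `Run` dataclass and the function name are hypothetical. Ordering by `scheduled_for` with `id` as a tie-breaker is an assumption, because plan.md only specifies "newest→oldest".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    # Hypothetical projection of a backup_schedule_runs row.
    id: int
    scheduled_for: datetime
    backup_set_id: Optional[int]


def backup_sets_to_soft_delete(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup_set_ids beyond the newest `keep_last` runs that produced one."""
    if keep_last < 0:
        raise ValueError("keep_last must be >= 0")
    # Runs without a BackupSet (e.g. failed runs) do not count toward N.
    with_sets = [r for r in runs if r.backup_set_id is not None]
    # Newest first; id breaks ties so the result is deterministic.
    with_sets.sort(key=lambda r: (r.scheduled_for, r.id), reverse=True)
    return [r.backup_set_id for r in with_sets[keep_last:]]
```

Because this selection does not access the database, it can be unit-tested on its own. The job would then soft-delete the returned BackupSets and their items.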