feat/032-backup-scheduling-mvp #36

Merged
ahmido merged 14 commits from feat/032-backup-scheduling-mvp into dev 2026-01-07 01:12:13 +00:00
4 changed files with 219 additions and 0 deletions
Showing only changes of commit fa8e15f4c2


@ -0,0 +1,11 @@
# Requirements Checklist (032)
- [ ] Tenant-scoped tables use `tenant_id` consistently.
- [ ] 1 Run = 1 BackupSet (no rolling reuse in MVP).
- [ ] Dispatcher is idempotent (unique `backup_schedule_id` + `scheduled_for`).
- [ ] Concurrency lock prevents parallel runs per schedule.
- [ ] Run stores status + summary + error_code/error_message.
- [ ] UI shows schedule list + run history + link to backup set.
- [ ] Run now + Retry are permission-gated and write DB notifications.
- [ ] Retention keeps last N and soft-deletes older backup sets.
- [ ] Tests cover due-calculation, idempotency, job success/failure, retention.


@ -0,0 +1,56 @@
# Plan: Backup Scheduling MVP (032)
**Date**: 2026-01-05
**Input**: spec.md
## Architecture / Reuse
- Reuse existing services:
  - `PolicySyncService::syncPoliciesWithReport()` for selected policy types
  - `BackupService::createBackupSet()` to create immutable snapshots + items (include_foundations supported)
- Store selection as `policy_types` (config keys), not free-form categories.
- Use tenant scoping (`tenant_id`) consistent with existing tables (`backup_sets`, `backup_items`).
## Scheduling Mechanism
- Add Artisan command: `tenantpilot:schedules:dispatch`.
- Scheduler integration (Laravel 12): schedule the command every minute via `routes/console.php` + ops configuration (Dokploy cron `schedule:run` or long-running `schedule:work`).
- Dispatcher algorithm:
  1) load enabled schedules
  2) compute whether due for the current minute in schedule timezone
  3) create run with `scheduled_for` slot (minute precision) using DB unique constraint
  4) dispatch `RunBackupScheduleJob(schedule_id, run_id)`
- Concurrency:
  - Cache lock per schedule (`lock:backup_schedule:{id}`) plus DB unique slot constraint for idempotency.
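
The slot-based idempotency described above can be illustrated with a minimal, language-neutral sketch in Python (the project itself is Laravel/PHP). It uses an in-memory SQLite table as a stand-in for `backup_schedule_runs`. The helper names `open_db`, `slot_for`, and `claim_slot` are hypothetical and not part of the codebase, and the cache lock is left out.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the backup_schedule_runs table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            status TEXT NOT NULL,
            UNIQUE (backup_schedule_id, scheduled_for)
        )
        """
    )
    return db


def slot_for(now: datetime) -> str:
    """Normalise to UTC and truncate to minute precision."""
    return now.astimezone(timezone.utc).replace(second=0, microsecond=0).isoformat()


def claim_slot(db: sqlite3.Connection, schedule_id: int, now: datetime) -> Optional[int]:
    """Insert a run for the current slot; return its id, or None if the slot is taken."""
    cur = db.execute(
        "INSERT OR IGNORE INTO backup_schedule_runs "
        "(backup_schedule_id, scheduled_for, status) VALUES (?, ?, 'running')",
        (schedule_id, slot_for(now)),
    )
    db.commit()
    # rowcount is 0 when the unique constraint caused the insert to be ignored.
    return cur.lastrowid if cur.rowcount == 1 else None
```

If two dispatcher ticks land in the same minute, both compute the same slot, so only the first call returns a run id and goes on to dispatch the job.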
## Run Execution
- `RunBackupScheduleJob`:
  1) load schedule + tenant
  2) preflight: tenant active; Graph/auth errors mapped to error_code
  3) sync policies for selected types (collect report)
  4) select policy IDs from local DB for those types (exclude ignored)
  5) create backup set:
     - name: `{schedule_name} - {Y-m-d H:i}`
     - includeFoundations: schedule flag
  6) set run status:
     - success if backup_set.status == completed
     - partial if backup_set.status == partial OR sync had failures but backup succeeded
     - failed if nothing backed up / hard error
  7) update schedule last_run_* and compute/persist next_run_at
  8) dispatch retention job
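
The status rules in step 6 can be sketched as a small pure function. This is a language-neutral Python illustration, not the job's actual code. The function name and its parameters (`backup_set_status`, `sync_failures`, `items_backed_up`) are hypothetical. It assumes that a completed backup set combined with sync failures counts as `partial`, which is one reading of the rules above.

```python
from typing import Optional


def derive_run_status(
    backup_set_status: Optional[str],
    sync_failures: int,
    items_backed_up: int,
) -> str:
    """Map the backup outcome to a run status (success | partial | failed)."""
    # Hard error (no backup set at all) or nothing backed up.
    if backup_set_status is None or items_backed_up == 0:
        return "failed"
    if backup_set_status == "partial":
        return "partial"
    if backup_set_status == "completed":
        # Assumption: sync failures downgrade an otherwise complete backup.
        return "partial" if sync_failures > 0 else "success"
    # Any other backup set status is treated as a failure in this sketch.
    return "failed"
```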
## Retention
- `ApplyBackupScheduleRetentionJob(schedule_id)`:
  - identify runs ordered newest→oldest
  - keep last N runs that created a backup_set_id
  - for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
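
The retention selection above can be sketched as follows, in language-neutral Python rather than the project's PHP. The `Run` dataclass and `backup_sets_to_prune` are hypothetical names. The sketch assumes runs are ordered by `scheduled_for` and only returns the BackupSet IDs to soft-delete, leaving the deletion itself to the caller.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    id: int
    scheduled_for: datetime
    backup_set_id: Optional[int]


def backup_sets_to_prune(runs: List[Run], keep_last: int) -> List[int]:
    """Return backup_set_ids of runs older than the newest `keep_last` runs."""
    # Only runs that actually produced a backup set count towards N.
    with_sets = sorted(
        (r for r in runs if r.backup_set_id is not None),
        key=lambda r: r.scheduled_for,
        reverse=True,  # newest first
    )
    return [r.backup_set_id for r in with_sets[max(keep_last, 0):]]
```

Runs without a `backup_set_id` (for example failed runs) are skipped, so they do not push older successful backups out of the retention window.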
## Filament UX
- Tenant-scoped resources:
  - `BackupScheduleResource`
  - Runs UI via RelationManager under schedule (or a dedicated resource if needed)
- Actions: enable/disable, run now, retry
- Notifications: persist via `->sendToDatabase($user)` for the DB info panel.
## Ops / Deployment Notes
- Requires queue worker.
- Requires scheduler running.
- Missed runs policy (MVP): no catch-up.


@ -0,0 +1,114 @@
# Feature Specification: Backup Scheduling MVP (032)
**Feature**: Automated scheduled backups (per tenant)
**Created**: 2026-01-05
**Status**: Ready for implementation (MVP)
**Risk**: Medium (Backup-only, no restore scheduling)
**Dependencies**: Tenant Portfolio + Tenant Context Switch ✅
## Context
TenantPilot supports manual backups. Customers/MSPs need regular, reliable backups per tenant (e.g. nightly), including traceable runs, error codes, and retention.
## Goals
- One or more backup schedules can be created per tenant.
- Schedules run automatically via queue/worker.
- Every execution is stored as an auditable run (status, counts, errors).
- Retention deletes old backups according to policy.
- Filament UI: manage schedules, view run history, “Run now”, “Retry”.
## Non-Goals (MVP)
- No mandatory calendar UI (can be added later).
- No cross-tenant bulk scheduling (MSP templates later).
- No “drift-triggered scheduling” (follows after the Drift MVP).
- No restore via scheduling (backup only).
## Definitions
- **Schedule**: Recurring plan (daily/weekly, timezone).
- **Run**: Concrete execution of a schedule (scheduled_for + status).
- **BackupSet**: Result container of a run.
**MVP semantics**: **1 Run = 1 new BackupSet** (no rolling reuse in the MVP).
## Requirements
### Functional Requirements
- **FR-001**: Schedules are tenant-scoped via `tenant_id` (FK to `tenants.id`).
- **FR-002**: The dispatcher detects due schedules and creates exactly one run per time slot (idempotent).
- **FR-003**: A run uses existing services:
  - Sync policies (selected policy types only)
  - Create a BackupSet from local policy IDs (foundations optionally included)
- **FR-004**: A run writes to `backup_schedule_runs` with status, summary, and error codes.
- **FR-005**: “Run now” immediately creates a run (scheduled_for=now) and dispatches the job.
- **FR-006**: “Retry” creates a new run for the same schedule.
- **FR-007**: Retention keeps only the last N runs/BackupSets per schedule (soft-deletes BackupSets).
- **FR-008**: Concurrency: only one run per schedule may execute at a time.
### UX Requirements (Filament)
- **UX-001**: The schedule list shows Enabled, Frequency, Time+Timezone, Policy Types Summary, Retention, Last Run, Next Run.
- **UX-002**: The run history per schedule shows scheduled_for, status, duration, counts, error_code/message, and a link to the BackupSet.
- **UX-003**: “Run now” and “Retry” are available only with the appropriate permissions.
### Security / Authorization
- **SEC-001**: Tenant isolation: a user sees and manages only the schedules of the current tenant.
- **SEC-002**: Permissions (RBAC):
  - `backup_schedules.view`
  - `backup_schedules.manage`
  - `backup_schedules.run_now`
  - `backup_schedules.runs.view`
- **SEC-003**: Runs write tenant-scoped audit logs (no secrets/tokens).
### Reliability / Non-Functional Requirements
- **NFR-001**: Idempotency via a unique slot constraint (`backup_schedule_id` + `scheduled_for`).
- **NFR-002**: Clear error codes (e.g. TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
- **NFR-003**: Retries: throttling → backoff; 401/403 → no blind retry.
- **NFR-004**: Missed runs policy (MVP): **No catch-up**. If the system is offline, missed slots are not made up; only the next slot runs.
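
NFR-002 and NFR-003 can be sketched together as an error mapping plus a retry decision. This is a language-neutral Python illustration under several assumptions. The HTTP-status-to-code mapping (401, 403, 429) is assumed and not confirmed by the spec. The function names, the exponential backoff with `base` and `cap`, and the handling of a `retry_after` hint are hypothetical. UNKNOWN errors are not retried here, which the spec leaves open.

```python
from typing import Optional

# Assumed mapping from HTTP status to the spec's error codes.
ERROR_CODES = {
    401: "TOKEN_EXPIRED",
    403: "PERMISSION_MISSING",
    429: "GRAPH_THROTTLE",
}


def map_error_code(http_status: int) -> str:
    """Translate an HTTP status into an error_code, defaulting to UNKNOWN."""
    return ERROR_CODES.get(http_status, "UNKNOWN")


def retry_delay_seconds(
    error_code: str,
    attempt: int,
    retry_after: Optional[int] = None,
    base: int = 30,
    cap: int = 900,
) -> Optional[int]:
    """Return the delay before the next attempt, or None if no retry should happen."""
    # Auth/permission errors (and, in this sketch, UNKNOWN) are never retried blindly.
    if error_code != "GRAPH_THROTTLE":
        return None
    # Prefer a server-provided hint when one is available.
    if retry_after is not None:
        return min(retry_after, cap)
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(base * 2 ** (attempt - 1), cap)
```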
## Data Model
### backup_schedules
- `id` bigint
- `tenant_id` FK tenants.id
- `name` string
- `is_enabled` bool default true
- `timezone` string default 'UTC'
- `frequency` string enum: daily|weekly
- `time_of_day` time
- `days_of_week` json nullable (array<int>, weekly only; 1=Mon..7=Sun)
- `policy_types` jsonb (array<string>)
- `include_foundations` bool default true
- `retention_keep_last` int default 30
- `last_run_at` datetime nullable
- `last_run_status` string nullable
- `next_run_at` datetime nullable
- timestamps
Indexes:
- (tenant_id, is_enabled)
- (next_run_at) optional
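
As an illustration of how `timezone`, `frequency`, `time_of_day`, and `days_of_week` might drive the due-calculation and `next_run_at`, here is a minimal language-neutral sketch in Python. The function names are hypothetical. The timezone is passed as a `tzinfo` object (in practice it would be resolved from the stored timezone name), and DST gaps or overlaps are not handled.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo
from typing import Optional, Sequence


def _check(frequency: str) -> None:
    if frequency not in ("daily", "weekly"):
        raise ValueError(f"unsupported frequency: {frequency}")


def is_due(now_utc: datetime, tz: tzinfo, frequency: str, time_of_day: time,
           days_of_week: Optional[Sequence[int]] = None) -> bool:
    """True if the schedule's slot is the current minute in the schedule timezone."""
    _check(frequency)
    local = now_utc.astimezone(tz)
    if (local.hour, local.minute) != (time_of_day.hour, time_of_day.minute):
        return False
    if frequency == "daily":
        return True
    # isoweekday(): 1=Mon..7=Sun, matching the days_of_week convention.
    return local.isoweekday() in (days_of_week or ())


def next_run_at(now_utc: datetime, tz: tzinfo, frequency: str, time_of_day: time,
                days_of_week: Optional[Sequence[int]] = None) -> datetime:
    """Return the next slot strictly after now, expressed in UTC."""
    _check(frequency)
    local = now_utc.astimezone(tz)
    slot_time = time_of_day.replace(second=0, microsecond=0)
    for offset in range(8):  # today plus the next 7 days covers any weekly pattern
        day = (local + timedelta(days=offset)).date()
        candidate = datetime.combine(day, slot_time, tzinfo=tz)
        if candidate <= local:
            continue
        if frequency == "weekly" and candidate.isoweekday() not in (days_of_week or ()):
            continue
        return candidate.astimezone(timezone.utc)
    raise ValueError("weekly schedule has no valid days_of_week")
```

Because `is_due` only compares against the current minute, a slot missed while the scheduler was offline is simply never matched, which is consistent with a no-catch-up policy.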
### backup_schedule_runs
- `id` bigint
- `backup_schedule_id` FK
- `tenant_id` FK (denormalized)
- `scheduled_for` datetime
- `started_at` datetime nullable
- `finished_at` datetime nullable
- `status` string enum: running|success|partial|failed|canceled|skipped
- `summary` jsonb (policies_total, policies_backed_up, errors_count, type_breakdown, warnings)
- `error_code` string nullable
- `error_message` text nullable
- `backup_set_id` FK nullable
- timestamps
Indexes:
- (backup_schedule_id, scheduled_for)
- (tenant_id, created_at)
- **Unique**: (backup_schedule_id, scheduled_for)
## Acceptance Criteria
- A user can create a schedule per tenant (daily/weekly, time, timezone, policy types, retention).
- The dispatcher creates runs at the scheduled time (requires a running queue worker).
- The UI shows Last Run, Next Run, and the run history.
- Run now starts immediately.
- Error cases (token/permission/throttle) are marked as failed/partial with an error_code.
- Retention keeps only the last N BackupSets per schedule.


@ -0,0 +1,38 @@
# Tasks: Backup Scheduling MVP (032)
**Date**: 2026-01-05
**Input**: spec.md, plan.md
## Phase 1: Spec & Setup
- [ ] T001 Create specs/032-backup-scheduling-mvp (spec/plan/tasks + checklist).
## Phase 2: Data Model
- [ ] T002 Add migrations: backup_schedules + backup_schedule_runs (tenant-scoped, indexes, unique slot).
- [ ] T003 Add models + relationships (Tenant->schedules, Schedule->runs, Run->backupSet).
## Phase 3: Scheduling + Dispatch
- [ ] T004 Add command `tenantpilot:schedules:dispatch`.
- [ ] T005 Register scheduler to run every minute.
- [ ] T006 Implement due-calculation (timezone, daily/weekly) + next_run_at computation.
- [ ] T007 Implement idempotent run creation (unique slot) + cache lock.
## Phase 4: Jobs
- [ ] T008 Implement `RunBackupScheduleJob` (sync -> select policy IDs -> create backup set -> update run + schedule).
- [ ] T009 Implement `ApplyBackupScheduleRetentionJob` (keep last N, soft-delete backup sets).
- [ ] T010 Add error mapping to `error_code` (TOKEN_EXPIRED, PERMISSION_MISSING, GRAPH_THROTTLE, UNKNOWN).
## Phase 5: Filament UI
- [ ] T011 Add `BackupScheduleResource` (tenant-scoped): CRUD + enable/disable.
- [ ] T012 Add Runs UI (relation manager or resource) with details + link to BackupSet.
- [ ] T013 Add actions: Run now + Retry (permission-gated); notifications persisted to DB.
## Phase 6: Tests
- [ ] T014 Unit: due-calculation + next_run_at.
- [ ] T015 Feature: dispatcher idempotency (unique slot); lock behavior.
- [ ] T016 Job-level: successful run creates backup set, updates run/schedule (Graph mocked).
- [ ] T017 Job-level: token/permission/throttle errors map to error_code and status.
- [ ] T018 Retention: keeps last N and soft-deletes older backup sets.
## Phase 7: Verification
- [ ] T019 Run targeted tests (Pest).
- [ ] T020 Run Pint (`./vendor/bin/pint --dirty`).