TenantAtlas/specs/032-backup-scheduling-mvp/plan.md
2026-01-05 00:50:23 +01:00

Plan: Backup Scheduling MVP (032)

Date: 2026-01-05
Input: spec.md

Architecture / Reuse

  • Reuse existing services:
    • PolicySyncService::syncPoliciesWithReport() for selected policy types
    • BackupService::createBackupSet() to create immutable snapshots + items (include_foundations supported)
  • Store selection as policy_types (config keys), not free-form categories.
  • Use tenant scoping (tenant_id) consistent with existing tables (backup_sets, backup_items).

Scheduling Mechanism

  • Add Artisan command: tenantpilot:schedules:dispatch.
  • Scheduler integration (Laravel 12): register the command to run every minute in routes/console.php. Ops must keep the scheduler itself running, either through a Dokploy cron entry that invokes schedule:run or through a long-running schedule:work process.
  • Dispatcher algorithm:
    1. load enabled schedules
    2. compute whether due for the current minute in schedule timezone
    3. create run with scheduled_for slot (minute precision) using DB unique constraint
    4. dispatch RunBackupScheduleJob(schedule_id, run_id)
  • Concurrency:
    • Cache lock per schedule (lock:backup_schedule:{id}) plus DB unique slot constraint for idempotency.
    • If lock is held: mark run as skipped with a clear error_code (no parallel execution).
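
The dispatcher steps and the two concurrency guards above can be modeled in a few lines. This is a framework-free sketch in Python, not the Laravel implementation: the `Schedule` fields `time_of_day` and `tz`, the in-memory `claimed_slots` set (standing in for the DB unique constraint), the `held_locks` set (standing in for the cache lock), and the placeholder status `"dispatched"` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta, timezone, tzinfo


@dataclass
class Schedule:
    id: int
    enabled: bool
    time_of_day: time  # hypothetical field: local wall-clock time to run at
    tz: tzinfo         # hypothetical field: the schedule's timezone


def minute_slot(now_utc: datetime) -> datetime:
    """Truncate to minute precision; this models the scheduled_for value."""
    return now_utc.replace(second=0, microsecond=0)


def is_due(schedule: Schedule, now_utc: datetime) -> bool:
    """Check the current minute as seen in the schedule's own timezone."""
    local = now_utc.astimezone(schedule.tz)
    return (local.hour, local.minute) == (
        schedule.time_of_day.hour,
        schedule.time_of_day.minute,
    )


def dispatch(schedules, now_utc, claimed_slots, held_locks):
    """Return the runs created in this tick as (schedule_id, slot, status)."""
    runs = []
    slot = minute_slot(now_utc)
    for s in schedules:
        if not s.enabled or not is_due(s, now_utc):
            continue
        key = (s.id, slot)
        if key in claimed_slots:
            # Stand-in for the DB unique constraint: this slot already has a run.
            continue
        claimed_slots.add(key)
        if s.id in held_locks:
            # A previous run is still executing: record the run, queue nothing.
            runs.append((s.id, slot, "skipped"))
        else:
            runs.append((s.id, slot, "dispatched"))
    return runs


# A fixed UTC+1 offset keeps the example independent of tz database files.
utc_plus_1 = timezone(timedelta(hours=1))
nightly = Schedule(id=1, enabled=True, time_of_day=time(2, 0), tz=utc_plus_1)
tick = datetime(2026, 1, 5, 1, 0, 30, tzinfo=timezone.utc)  # 02:00:30 local
claimed = set()
first_tick = dispatch([nightly], tick, claimed, set())
second_tick = dispatch([nightly], tick, claimed, set())
```

Running the dispatcher twice in the same minute yields one run, because the second call finds the slot already claimed. Since only the current minute is evaluated, this model also never catches up on slots that were missed.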

Run Execution

  • RunBackupScheduleJob:
    1. load schedule + tenant
    2. preflight: tenant active; Graph/auth errors mapped to error_code
    3. sync policies for selected types (collect report)
    4. select policy IDs from local DB for those types (exclude ignored)
    5. create backup set:
      • name: {schedule_name} - {Y-m-d H:i}
      • includeFoundations: schedule flag
    6. set run status:
      • success if backup_set.status == completed and the sync reported no failures
      • partial if backup_set.status == partial OR sync had failures but backup succeeded
      • failed if nothing backed up / hard error
    7. update schedule last_run_* and compute/persist next_run_at
    8. dispatch retention job
    9. audit logs:
      • log run start + completion (status, counts, error_code; no secrets)
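
The naming rule in step 5 and the status rules in step 6 are small enough to write down as pure functions. The sketch below is Python rather than the project's PHP. The parameter names and the use of `None` for a hard error are hypothetical, and the precedence (sync failures downgrade a completed backup to partial) is one reading of the rules above, not a confirmed decision.

```python
from datetime import datetime


def backup_set_name(schedule_name: str, started_at: datetime) -> str:
    # PHP's "Y-m-d H:i" date format corresponds to "%Y-%m-%d %H:%M" here.
    return f"{schedule_name} - {started_at.strftime('%Y-%m-%d %H:%M')}"


def derive_run_status(backup_status, items_backed_up: int, sync_failures: int) -> str:
    """Map the backup set outcome and the sync report to a run status.

    backup_status is None when a hard error prevented creating a backup set
    (an assumption made for this sketch).
    """
    if backup_status is None or items_backed_up == 0:
        return "failed"
    if backup_status == "partial" or sync_failures > 0:
        return "partial"
    if backup_status == "completed":
        return "success"
    # Any backup set status not covered by the plan is treated as a failure.
    return "failed"


example_name = backup_set_name("Nightly", datetime(2026, 1, 5, 2, 0))
```

Keeping the status decision free of I/O makes the edge cases (empty backup, sync failures with a completed backup) easy to cover with unit tests.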

Retry / Backoff

  • Configure job retry behavior based on error classification:
    • Throttling/transient (e.g. 429/503): backoff + retry
    • Auth/permission (401/403): no retry
    • Unknown: limited retries
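
One way to express this classification is a function that returns the delay before the next attempt, or nothing when the job should give up. The sketch is Python for illustration only. The attempt limits, the 60-second base delay, and the exponential growth are assumptions, since the plan does not fix concrete numbers.

```python
# Assumed limits on total attempts per error class (not specified in the plan).
MAX_ATTEMPTS = {"transient": 5, "auth": 1, "unknown": 2}


def classify_error(http_status):
    """Group an HTTP status (or None when there is no status) into an error class."""
    if http_status in (429, 503):
        return "transient"
    if http_status in (401, 403):
        return "auth"
    return "unknown"


def retry_delay_seconds(http_status, attempt: int, base: int = 60):
    """Return the delay before the next attempt, or None if no retry is allowed.

    attempt is the number of attempts already made (1 after the first failure).
    """
    kind = classify_error(http_status)
    if attempt >= MAX_ATTEMPTS[kind]:
        return None
    if kind == "transient":
        # Exponential backoff: 60s, 120s, 240s, ... with the default base.
        return base * 2 ** (attempt - 1)
    # Unknown errors get a fixed delay and only a small number of retries.
    return base
```

Auth and permission errors return `None` on the first failure, so a revoked credential fails fast instead of consuming retries.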

Retention

  • ApplyBackupScheduleRetentionJob(schedule_id):
    • identify runs ordered newest→oldest
    • keep last N runs that created a backup_set_id
    • for older ones: soft-delete referenced BackupSets (and cascade soft-delete items)
    • audit log: number of deleted BackupSets
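
The selection step of the retention job can be sketched as follows. This is a Python model, not the Laravel job: the `Run` shape and the function name are hypothetical, and ordering by `scheduled_for` is an assumption about what "newest" means. Runs without a `backup_set_id` (skipped or failed runs) do not count toward the N that are kept.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Run:
    id: int
    scheduled_for: datetime
    backup_set_id: Optional[int]  # None when the run produced no backup set


def backup_sets_to_prune(runs: List[Run], keep: int) -> List[int]:
    """Return the backup set IDs that fall outside the newest `keep` runs."""
    keep = max(keep, 0)  # guard: a negative value would slice from the end
    newest_first = sorted(runs, key=lambda r: r.scheduled_for, reverse=True)
    with_backup = [r for r in newest_first if r.backup_set_id is not None]
    return [r.backup_set_id for r in with_backup[keep:]]


# Five daily runs; run 3 produced no backup set.
history = [
    Run(id=i, scheduled_for=datetime(2026, 1, i, 2, 0),
        backup_set_id=None if i == 3 else i * 10)
    for i in range(1, 6)
]
prune_ids = backup_sets_to_prune(history, keep=2)
```

The returned IDs are the BackupSets to soft-delete, and the length of the list is the count to record in the audit log.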

Filament UX

  • Tenant-scoped resources:
    • BackupScheduleResource
    • Runs UI via RelationManager under schedule (or a dedicated resource if needed)
  • Actions: enable/disable, run now, retry
  • Notifications: persist via ->sendToDatabase($user) so they appear in the database notifications panel.
    • MVP notification scope: only interactive actions notify the acting user; scheduled runs rely on Run history.

Ops / Deployment Notes

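  • Requires a running queue worker to process RunBackupScheduleJob and ApplyBackupScheduleRetentionJob.
  • Requires the scheduler to be running (schedule:run via cron, or schedule:work).
  • Missed runs policy (MVP): no catch-up. Slots that pass while the scheduler is down are not executed retroactively.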