TenantAtlas/specs/032-backup-scheduling-mvp/research.md
ahmido 4d3fcd28a9 feat/032-backup-scheduling-mvp (#34)
What
Implements tenant-scoped backup scheduling end-to-end: schedules CRUD, minute-based dispatch, queued execution, run history, manual “Run now/Retry”, retention (keep last N), and auditability.

Key changes

Filament UI: Backup Schedules resource with tenant scoping + SEC-002 role gating.
Scheduler + queue: tenantpilot:schedules:dispatch command wired in scheduler (runs every minute), creates idempotent BackupScheduleRun records and dispatches jobs.
Execution: RunBackupScheduleJob syncs policies, creates immutable backup sets, updates run status, writes audit logs, applies retry/backoff mapping, and triggers retention.
Run history: Relation manager + “View” modal rendering run details.
UX polish: row actions grouped; bulk actions grouped (run now / retry / delete). Bulk dispatch writes DB notifications (shows in notifications panel).
Validation: policy type hard-validation on save; unknown policy types handled safely at runtime (skipped/partial).
Tests: comprehensive Pest coverage for CRUD/scoping/validation, idempotency, job outcomes, error mapping, retention, view modal, run-now/retry notifications, bulk delete (incl. operator forbidden).
Files / Areas

Filament: BackupScheduleResource.php and app/Filament/Resources/BackupScheduleResource/*
Scheduling/Jobs: app/Console/Commands/TenantpilotDispatchBackupSchedules.php, app/Jobs/RunBackupScheduleJob.php, app/Jobs/ApplyBackupScheduleRetentionJob.php, console.php
Models/Migrations: app/Models/BackupSchedule.php, app/Models/BackupScheduleRun.php, database/migrations/backup_schedules, backup_schedule_runs
Notifications: BackupScheduleRunDispatchedNotification.php
Specs: specs/032-backup-scheduling-mvp/* (tasks/checklist/quickstart updates)
How to test (Sail)

Run tests: ./vendor/bin/sail artisan test tests/Feature/BackupScheduling
Run formatter: ./vendor/bin/sail php ./vendor/bin/pint --dirty
Apply migrations: ./vendor/bin/sail artisan migrate
Manual dispatch: ./vendor/bin/sail artisan tenantpilot:schedules:dispatch
Notes

Uses DB notifications for queued UI actions to ensure they appear in the notifications panel even under queue fakes in tests.
Checklist gate for 032 is PASS; tasks updated accordingly.

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #34
2026-01-05 04:22:13 +00:00

Research: Backup Scheduling MVP (032)

Date: 2026-01-05

This document resolves the open technical decisions and clarifies the implementation approach for Feature 032.

Decisions

1) Reuse existing sync + backup services

  • Decision: Use App\Services\Intune\PolicySyncService::syncPoliciesWithReport(Tenant $tenant, ?array $supportedTypes = null): array and App\Services\Intune\BackupService::createBackupSet(...).
  • Rationale: Both services are already tenant-aware and use GraphClientInterface behind the scenes (via PolicySyncService). BackupService also already writes a backup.created audit log entry.
  • Alternatives considered:
    • Implement new Graph calls directly in the scheduler job → rejected (violates Graph abstraction gate; duplicates logic).

2) Policy type source of truth + validation

  • Decision:
    • Persist backup_schedules.policy_types as array<string> of type keys present in config('tenantpilot.supported_policy_types').
    • Hard validation at save-time: unknown keys are rejected.
    • Runtime defensive check (legacy/DB): unknown keys are skipped.
      • If ≥1 valid type remains → run becomes partial and error_code=UNKNOWN_POLICY_TYPE.
      • If 0 valid types remain → run becomes skipped and error_code=UNKNOWN_POLICY_TYPE (no BackupSet created).
  • Rationale: Prevent silent misconfiguration and enforce fail-safe behavior at entry points, while still handling legacy data safely.
  • Alternatives considered:
    • Save unknown keys and ignore silently → rejected (silent misconfiguration).
    • Fail the run for any unknown type → rejected (too brittle for legacy).
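
The two-stage handling above (reject at save-time, degrade at runtime) can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the project's PHP code: `SUPPORTED_POLICY_TYPES` is a hypothetical stand-in for config('tenantpilot.supported_policy_types'), and the function and class names are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for config('tenantpilot.supported_policy_types');
# the real keys live in the application config.
SUPPORTED_POLICY_TYPES = {"deviceConfiguration", "deviceCompliancePolicy"}


@dataclass(frozen=True)
class TypeResolution:
    valid_types: List[str]
    unknown_types: List[str]
    status: Optional[str]      # None means this check forces no status
    error_code: Optional[str]


def validate_on_save(policy_types: List[str]) -> None:
    """Save-time hard validation: reject any unknown key."""
    unknown = [t for t in policy_types if t not in SUPPORTED_POLICY_TYPES]
    if unknown:
        raise ValueError("Unknown policy types: " + ", ".join(unknown))


def resolve_at_runtime(policy_types: List[str]) -> TypeResolution:
    """Runtime defensive check for legacy rows: skip unknown keys."""
    valid = [t for t in policy_types if t in SUPPORTED_POLICY_TYPES]
    unknown = [t for t in policy_types if t not in SUPPORTED_POLICY_TYPES]
    if not unknown:
        return TypeResolution(valid, [], None, None)
    # >=1 valid type left -> partial; none left -> skipped (no BackupSet).
    status = "partial" if valid else "skipped"
    return TypeResolution(valid, unknown, status, "UNKNOWN_POLICY_TYPE")
```

A caller would sync and back up only `valid_types`, and would create no BackupSet when the status is `skipped`.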

3) Graph calls and contracts

  • Decision: Do not hardcode Graph endpoints. All Graph access happens via GraphClientInterface (through PolicySyncService and BackupService).
  • Rationale: Matches constitution requirements and existing code paths.
  • Alternatives considered:
    • Calling deviceManagement/{type} directly → rejected (explicitly forbidden by constitution; also unsafe for unknown types).

4) Scheduling mechanism

  • Decision: Add an Artisan command tenantpilot:schedules:dispatch and register it with Laravel scheduler to run every minute.
  • Rationale: Fits the Laravel 12 application structure (no Console Kernel) and supports both Dokploy operation models (a cron entry running schedule:run, or a long-running schedule:work process).
  • Alternatives considered:
    • Long-running daemon polling DB directly → rejected (less idiomatic; harder ops).

5) Due calculation + time semantics

  • Decision:
    • scheduled_for is minute-slot based and stored in UTC.
    • Due calculation uses the schedule timezone.
    • DST (MVP): a nonexistent local time (spring-forward gap) → skip; an ambiguous local time (fall-back overlap) → first occurrence.
  • Rationale: Predictable and testable; avoids “surprise catch-up”.
  • Alternatives considered:
    • Catch-up missed slots → rejected by spec (MVP explicitly “no catch-up”).
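
The time semantics above can be illustrated with a small sketch. It is written in Python for brevity and is not the project's implementation; the function name `scheduled_for_utc` is hypothetical, and the example assumes the system time zone database is available. It maps a local wall-clock time to its UTC minute slot, returns nothing for a nonexistent local time, and picks the first occurrence of an ambiguous one.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def scheduled_for_utc(local_wall_time: datetime, tz_name: str) -> Optional[datetime]:
    """Map a naive local wall-clock time to its UTC minute slot.

    Returns None when the local time does not exist (DST gap), which the
    caller treats as "skip". For an ambiguous time (DST overlap), the
    first occurrence is used.
    """
    tz = ZoneInfo(tz_name)
    # Minute-slot based: drop seconds and microseconds.
    naive = local_wall_time.replace(second=0, microsecond=0, tzinfo=None)
    # fold=0 selects the first occurrence of an ambiguous wall-clock time.
    local = naive.replace(tzinfo=tz, fold=0)
    as_utc = local.astimezone(timezone.utc)
    # A nonexistent wall-clock time does not survive a round trip via UTC.
    if as_utc.astimezone(tz).replace(tzinfo=None) != naive:
        return None
    return as_utc
```

For Europe/Berlin in 2026, 02:30 on 29 March falls in the spring-forward gap and is skipped, while 02:30 on 25 October occurs twice and resolves to the earlier instant (00:30 UTC).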

6) Idempotency + concurrency

  • Decision:
    • DB unique constraint: (backup_schedule_id, scheduled_for).
    • Cache lock per schedule (lock:backup_schedule:{id}) to prevent parallel execution.
    • If the lock is already held, do not run in parallel: mark the run skipped with a clear error_code.
  • Rationale: Prevents double runs and provides deterministic behavior.
  • Alternatives considered:
    • Only cache lock (no DB constraint) → rejected (less robust under crashes/restarts).
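
How the two guards combine can be sketched as below. This is an illustrative Python model using an in-memory SQLite table and a process-local lock, not the project's Laravel code. The `running` and `success` status values and the `LOCK_HELD` error code are assumptions, since the document only asks for "a clear error_code". The unique constraint makes run creation idempotent per slot; the lock prevents two runs of the same schedule from executing at once.

```python
import sqlite3
import threading
from typing import Callable, Optional

LOCK_HELD_ERROR = "LOCK_HELD"  # hypothetical error_code, not from the spec

_locks = {}                        # schedule id -> threading.Lock
_registry_guard = threading.Lock()


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(
        """
        CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            status TEXT NOT NULL,
            error_code TEXT,
            UNIQUE (backup_schedule_id, scheduled_for)
        )
        """
    )
    return db


def create_run_once(db: sqlite3.Connection, schedule_id: int, scheduled_for: str) -> Optional[int]:
    """Insert the run for this slot; return its id, or None if it exists."""
    cur = db.execute(
        "INSERT OR IGNORE INTO backup_schedule_runs "
        "(backup_schedule_id, scheduled_for, status) VALUES (?, ?, 'running')",
        (schedule_id, scheduled_for),
    )
    return cur.lastrowid if cur.rowcount == 1 else None


def schedule_lock(schedule_id: int) -> threading.Lock:
    """Stand-in for the cache lock lock:backup_schedule:{id}."""
    with _registry_guard:
        return _locks.setdefault(schedule_id, threading.Lock())


def execute_run(db: sqlite3.Connection, run_id: int, schedule_id: int,
                do_backup: Callable[[], None]) -> str:
    lock = schedule_lock(schedule_id)
    if not lock.acquire(blocking=False):
        # Another run of this schedule is in progress: skip, never queue up.
        db.execute(
            "UPDATE backup_schedule_runs SET status = 'skipped', error_code = ? WHERE id = ?",
            (LOCK_HELD_ERROR, run_id),
        )
        return "skipped"
    try:
        do_backup()
        db.execute("UPDATE backup_schedule_runs SET status = 'success' WHERE id = ?", (run_id,))
        return "success"
    finally:
        lock.release()
```

The constraint covers the case the lock cannot: if the dispatcher crashes or restarts between ticks, a second insert for the same slot is still ignored by the database.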

7) Retry/backoff policy

  • Decision:
    • Transient/throttling failures (e.g. 429/503) → retries with backoff.
    • Auth/permission failures (401/403) → no retry.
    • Unknown failures → limited retries, then fail.
  • Rationale: Avoid noisy retry loops for non-recoverable errors.
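
The retry mapping above can be expressed as a small decision function. The sketch below is in Python and is illustrative only: the attempt limits and backoff delays are assumed values, since the document does not specify them, and `next_retry_delay` is a hypothetical name.

```python
from typing import Optional

TRANSIENT_STATUSES = {429, 503}   # throttling / temporary unavailability
AUTH_STATUSES = {401, 403}        # auth / permission failures

# Assumed values for illustration; the spec does not fix these numbers.
MAX_TRANSIENT_ATTEMPTS = 5
MAX_UNKNOWN_ATTEMPTS = 2
BACKOFF_SECONDS = [60, 300, 900, 1800]


def next_retry_delay(status_code: Optional[int], attempt: int) -> Optional[int]:
    """Return the seconds to wait before the next attempt, or None to stop.

    `attempt` is the 1-based number of attempts already made.
    `status_code` is None when the failure carried no HTTP status.
    """
    if status_code in AUTH_STATUSES:
        return None  # non-recoverable: retrying only adds noise
    if status_code in TRANSIENT_STATUSES:
        max_attempts = MAX_TRANSIENT_ATTEMPTS
    else:
        max_attempts = MAX_UNKNOWN_ATTEMPTS  # unknown failure: limited retries
    if attempt >= max_attempts:
        return None
    # Clamp to the last configured delay once the table is exhausted.
    return BACKOFF_SECONDS[min(attempt, len(BACKOFF_SECONDS)) - 1]
```

With these assumed numbers, a throttled run is retried up to four times with growing delays, an unknown failure is retried once, and a 401 or 403 fails immediately.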

8) Audit logging

  • Decision: Use App\Services\Intune\AuditLogger for:
    • dispatch cycle (optional aggregated)
    • run start + completion
    • retention applied (count deletions)
  • Rationale: The constitution requires an audit log entry for every operation; the existing BackupService already writes backup.created.

9) Notifications

  • Decision: Only interactive actions (Run now / Retry) notify the acting user (database notifications). Scheduled runs rely on Run history.
  • Rationale: Scheduled runs have no acting user, so notifying on them would leave “who gets notified” undefined unless new ownership fields were added.

Open Items

None blocking Phase 1 design.