Research: Backup Scheduling MVP (032)

Date: 2026-01-05

This document resolves technical decisions and clarifies the implementation approach for Feature 032.

Decisions

1) Reuse existing sync + backup services

  • Decision: Use App\Services\Intune\PolicySyncService::syncPoliciesWithReport(Tenant $tenant, ?array $supportedTypes = null): array and App\Services\Intune\BackupService::createBackupSet(...).
  • Rationale: Both services are already tenant-aware and use GraphClientInterface behind the scenes (via PolicySyncService); BackupService also already writes a backup.created audit log entry.
  • Alternatives considered:
    • Implement new Graph calls directly in the scheduler job → rejected (violates Graph abstraction gate; duplicates logic).

2) Policy type source of truth + validation

  • Decision:
    • Persist backup_schedules.policy_types as array<string> of type keys present in config('tenantpilot.supported_policy_types').
    • Hard validation at save-time: unknown keys are rejected.
    • Runtime defensive check (legacy/DB): unknown keys are skipped.
      • If ≥1 valid type remains → run becomes partial and error_code=UNKNOWN_POLICY_TYPE.
      • If 0 valid types remain → run becomes skipped and error_code=UNKNOWN_POLICY_TYPE (no BackupSet created).
  • Rationale: Prevent silent misconfiguration and enforce fail-safe behavior at entry points, while still handling legacy data safely.
  • Alternatives considered:
    • Save unknown keys and ignore silently → rejected (silent misconfiguration).
    • Fail the run for any unknown type → rejected (too brittle for legacy).
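
The two validation layers above can be sketched as follows. This is a language-neutral illustration in Python, not the project's PHP code: SUPPORTED_POLICY_TYPES stands in for config('tenantpilot.supported_policy_types'), the type keys in it are made up for the example, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Stand-in for config('tenantpilot.supported_policy_types'); keys are illustrative only.
SUPPORTED_POLICY_TYPES = {"deviceConfiguration", "deviceCompliancePolicy"}


@dataclass(frozen=True)
class TypeResolution:
    valid_types: List[str]
    unknown_types: List[str]
    status: Optional[str]      # None means this check does not force a status
    error_code: Optional[str]


def validate_on_save(policy_types: List[str]) -> List[str]:
    """Save-time hard validation: any unknown key rejects the whole input."""
    unknown = [t for t in policy_types if t not in SUPPORTED_POLICY_TYPES]
    if unknown:
        raise ValueError(f"Unknown policy types: {unknown}")
    return list(policy_types)


def resolve_at_runtime(policy_types: List[str]) -> TypeResolution:
    """Runtime defensive check: skip unknown keys and flag the run accordingly."""
    valid = [t for t in policy_types if t in SUPPORTED_POLICY_TYPES]
    unknown = [t for t in policy_types if t not in SUPPORTED_POLICY_TYPES]
    if not unknown:
        return TypeResolution(valid, [], None, None)
    # At least one valid type left -> partial; none left -> skipped (no BackupSet).
    status = "partial" if valid else "skipped"
    return TypeResolution(valid, unknown, status, "UNKNOWN_POLICY_TYPE")
```

Keeping the runtime check separate from the save-time check means legacy rows can still produce a run record with a clear error_code, while new input never reaches the database with unknown keys.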

3) Graph calls and contracts

  • Decision: Do not hardcode Graph endpoints. All Graph access happens via GraphClientInterface (through PolicySyncService and BackupService).
  • Rationale: Matches constitution requirements and existing code paths.
  • Alternatives considered:
    • Calling deviceManagement/{type} directly → rejected (explicitly forbidden by constitution; also unsafe for unknown types).

4) Scheduling mechanism

  • Decision: Add an Artisan command tenantpilot:schedules:dispatch and register it with Laravel scheduler to run every minute.
  • Rationale: Fits the Laravel 12 structure (no Kernel) and supports both Dokploy operation models (schedule:run via cron, or schedule:work).
  • Alternatives considered:
    • Long-running daemon polling DB directly → rejected (less idiomatic; harder ops).

5) Due calculation + time semantics

  • Decision:
    • scheduled_for is minute-slot based and stored in UTC.
    • Due calculation uses the schedule timezone.
    • DST (MVP): invalid local time → skip; ambiguous local time → first occurrence.
  • Rationale: Predictable and testable; avoids “surprise catch-up”.
  • Alternatives considered:
    • Catch-up missed slots → rejected by spec (MVP explicitly “no catch-up”).
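
The DST rules above can be illustrated with a small sketch. It is written in Python with the standard zoneinfo module (which needs a time zone database on the host) rather than in the project's PHP, and the function name scheduled_for_utc is hypothetical. It maps a local wall-clock time to its UTC minute slot, returns None for a local time that does not exist, and picks the first occurrence of an ambiguous one.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def scheduled_for_utc(year: int, month: int, day: int,
                      hour: int, minute: int, tz_name: str) -> Optional[datetime]:
    """Return the UTC minute slot for a local time, or None if it falls in a DST gap."""
    tz = ZoneInfo(tz_name)
    # fold=0 selects the first occurrence when the local time is ambiguous.
    local = datetime(year, month, day, hour, minute, tzinfo=tz, fold=0)
    utc = local.astimezone(timezone.utc)
    # A nonexistent local time does not survive a round trip through UTC:
    # converting back yields a different wall-clock time.
    if utc.astimezone(tz).replace(tzinfo=None) != local.replace(tzinfo=None):
        return None  # invalid local time -> skip this slot
    return utc
```

For Europe/Berlin in 2026, 02:30 on 29 March falls in the spring-forward gap and yields None, while 02:30 on 25 October occurs twice and resolves to the first occurrence (00:30 UTC).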

6) Idempotency + concurrency

  • Decision:
    • DB unique constraint: (backup_schedule_id, scheduled_for).
    • Cache lock per schedule (lock:backup_schedule:{id}) to prevent parallel execution.
    • If lock held, do not run in parallel: mark run skipped with a clear error_code.
  • Rationale: Prevents double runs and provides deterministic behavior.
  • Alternatives considered:
    • Only cache lock (no DB constraint) → rejected (less robust under crashes/restarts).

7) Retry/backoff policy

  • Decision:
    • Transient/throttling failures (e.g. 429/503) → retries with backoff.
    • Auth/permission failures (401/403) → no retry.
    • Unknown failures → limited retries, then fail.
  • Rationale: Avoid noisy retry loops for non-recoverable errors.
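
The classification above could look roughly like the following sketch (Python, for illustration only). The attempt limits, the base delay, and the exponential backoff shape are assumptions; the decision itself only fixes which failure classes retry.

```python
from typing import Optional, Tuple


def retry_decision(status_code: Optional[int], attempt: int,
                   max_transient: int = 5, max_unknown: int = 2,
                   base_delay: int = 30) -> Tuple[bool, int]:
    """Return (should_retry, delay_seconds) after the given 1-based attempt failed.

    The limits and base delay are illustrative defaults, not values from the spec.
    """
    if status_code in (401, 403):
        return (False, 0)          # auth/permission failure: never retry
    if status_code in (429, 503):
        limit = max_transient      # transient/throttling: retry with backoff
    else:
        limit = max_unknown        # unknown failure: limited retries, then fail
    if attempt >= limit:
        return (False, 0)
    # Exponential backoff: 30s, 60s, 120s, ... with the default base delay.
    return (True, base_delay * 2 ** (attempt - 1))
```

With these defaults, a 429 on the first attempt is retried after 30 seconds, while a 403 fails immediately and an unclassified error gets a single retry.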

8) Audit logging

  • Decision: Use App\Services\Intune\AuditLogger for:
    • dispatch cycle (optional aggregated)
    • run start + completion
    • retention applied (with the number of deletions)
  • Rationale: Constitution requires audit log for every operation; existing BackupService already writes backup.created.

9) Notifications

  • Decision: Only interactive actions (Run now / Retry) notify the acting user (database notifications). Scheduled runs rely on Run history.
  • Rationale: Scheduled runs have no acting user, so this avoids leaving “who gets notified” undefined without adding new ownership fields.

Open Items

None blocking Phase 1 design.