Research: Backup Scheduling MVP (032)

Date: 2026-01-05

This document resolves technical decisions and clarifies implementation approach for Feature 032.

Decisions

Decision: Use App\Services\Intune\PolicySyncService::syncPoliciesWithReport(Tenant $tenant, ?array $supportedTypes = null): array and App\Services\Intune\BackupService::createBackupSet(...).
Rationale: Both services are already tenant-aware and reach Graph through GraphClientInterface (via PolicySyncService); BackupService also already writes a backup.created audit log entry.
Alternatives considered:
- Implement new Graph calls directly in the scheduler job → rejected (violates Graph abstraction gate; duplicates logic).

Decision:
- Persist backup_schedules.policy_types as array<string> of type keys present in config('tenantpilot.supported_policy_types').
- Hard validation at save-time: unknown keys are rejected.
- Defensive runtime check (for legacy or database-edited data): unknown keys are skipped.
  - If ≥1 valid type remains → the run status becomes partial with error_code=UNKNOWN_POLICY_TYPE.
  - If 0 valid types remain → the run status becomes skipped with error_code=UNKNOWN_POLICY_TYPE (no BackupSet is created).
Rationale: Prevent silent misconfiguration and enforce fail-safe behavior at entry points, while still handling legacy data safely.
Alternatives considered:
- Save unknown keys and ignore silently → rejected (silent misconfiguration).
- Fail the run for any unknown type → rejected (too brittle for legacy).
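The runtime check above can be sketched in plain PHP. This is a minimal illustration, not the project's implementation: the function name resolvePolicyTypes, the 'proceed' status, and the assumption that the supported types are available as a flat list of key strings are all hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * Split configured type keys into supported and unknown ones, then derive
 * the run outcome described above. All names here are illustrative.
 *
 * @param list<string> $configured keys stored in backup_schedules.policy_types
 * @param list<string> $supported  supported type keys (assumed to be a flat list)
 * @return array{status: string, error_code: ?string, types: list<string>, unknown: list<string>}
 */
function resolvePolicyTypes(array $configured, array $supported): array
{
    $valid = array_values(array_intersect($configured, $supported));
    $unknown = array_values(array_diff($configured, $supported));

    if ($unknown === []) {
        // Nothing unknown: the run proceeds normally ('proceed' is a placeholder status).
        return ['status' => 'proceed', 'error_code' => null, 'types' => $valid, 'unknown' => []];
    }

    return [
        // No valid type left: skip the run and create no BackupSet.
        // Otherwise back up the valid types and record the run as partial.
        'status' => $valid === [] ? 'skipped' : 'partial',
        'error_code' => 'UNKNOWN_POLICY_TYPE',
        'types' => $valid,
        'unknown' => $unknown,
    ];
}
```

Returning the unknown keys alongside the outcome keeps them available for the run record or audit entry, so the misconfiguration stays visible rather than silent.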

Decision: Do not hardcode Graph endpoints. All Graph access happens via GraphClientInterface (through PolicySyncService and BackupService).
Rationale: Matches constitution requirements and existing code paths.
Alternatives considered:
- Calling deviceManagement/{type} directly → rejected (explicitly forbidden by constitution; also unsafe for unknown types).

Decision: Add an Artisan command tenantpilot:schedules:dispatch and register it with Laravel scheduler to run every minute.
Rationale: Fits the Laravel 12 structure (no console Kernel) and supports both Dokploy operating models: a cron entry that invokes schedule:run, or a long-running schedule:work process.
Alternatives considered:
- Long-running daemon polling the DB directly → rejected (less idiomatic; harder to operate).
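For illustration, registering the command could look like the fragment below. It assumes the routes/console.php layout used by Kernel-less Laravel versions; the withoutOverlapping() call is an addition not stated in the decision above and is shown only as an option.

```php
<?php

// routes/console.php (assumed location for schedule definitions)
use Illuminate\Support\Facades\Schedule;

Schedule::command('tenantpilot:schedules:dispatch')
    ->everyMinute()
    // Assumption: avoid stacking dispatch cycles if one takes longer than a minute.
    ->withoutOverlapping();
```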

Decision:
- scheduled_for is minute-slot based and stored in UTC.
- Due calculation uses the schedule timezone.
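- DST (MVP): an invalid local time (one that does not exist because clocks jump forward) → the slot is skipped; an ambiguous local time (one that occurs twice because clocks fall back) → the first occurrence is used.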
Rationale: Predictable and testable; avoids “surprise catch-up”.
Alternatives considered:
- Catch-up missed slots → rejected by spec (MVP explicitly “no catch-up”).
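The DST rules above can be sketched without relying on how PHP's date parser treats gaps and overlaps. The sketch tries the UTC offsets in force on either side of the wall-clock time and keeps only candidates that convert back to the same local minute. The function name resolveSlotToUtc and the 'Y-m-d H:i' input format are assumptions for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Resolve a local wall-clock slot ("Y-m-d H:i") to a UTC instant.
 * Returns null when the local time does not exist (DST gap) and the
 * earlier instant when it occurs twice (DST overlap).
 */
function resolveSlotToUtc(string $localMinute, string $timezone): ?DateTimeImmutable
{
    $tz = new DateTimeZone($timezone);
    $utc = new DateTimeZone('UTC');

    // Read the wall-clock digits as if they were UTC; offsets are applied below.
    $wall = DateTimeImmutable::createFromFormat('!Y-m-d H:i', $localMinute, $utc);
    if ($wall === false) {
        throw new InvalidArgumentException("Expected 'Y-m-d H:i', got: {$localMinute}");
    }

    // Offsets one day before and one day after bracket a single transition.
    $offsets = array_unique([
        $tz->getOffset($wall->modify('-1 day')),
        $tz->getOffset($wall->modify('+1 day')),
    ]);

    $candidates = [];
    foreach ($offsets as $offset) {
        $candidate = $wall->modify(sprintf('%+d seconds', -$offset));
        // Keep the candidate only if it maps back to the requested wall time.
        if ($candidate->setTimezone($tz)->format('Y-m-d H:i') === $localMinute) {
            $candidates[] = $candidate;
        }
    }

    if ($candidates === []) {
        return null; // invalid local time: skip the slot
    }

    // Ambiguous local time: the earliest instant is the first occurrence.
    usort($candidates, fn (DateTimeImmutable $a, DateTimeImmutable $b): int => $a <=> $b);

    return $candidates[0];
}
```

With Europe/Berlin as the schedule timezone, '2026-03-29 02:30' falls in the spring gap and yields null, while '2026-10-25 02:30' occurs twice and resolves to 00:30 UTC, the first occurrence.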

Decision:
- DB unique constraint: (backup_schedule_id, scheduled_for).
- Cache lock per schedule (lock:backup_schedule:{id}) to prevent parallel execution.
- If the lock is already held, do not execute in parallel: mark the run as skipped with a clear error_code.
Rationale: Prevents double runs and provides deterministic behavior.
Alternatives considered:
- Only cache lock (no DB constraint) → rejected (less robust under crashes/restarts).
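How the two guards interact can be shown with an in-memory stand-in. This is only a model of the decision flow: the class RunGuard and the return values 'duplicate', 'skipped', and 'running' are hypothetical, and a real implementation would rely on the database unique index and the cache lock rather than PHP arrays.

```php
<?php

declare(strict_types=1);

/**
 * In-memory stand-in for the two guards: a set of claimed
 * (schedule id, slot) pairs plays the role of the unique index,
 * and a set of held lock keys plays the role of the cache lock.
 */
final class RunGuard
{
    /** @var array<string, true> */
    private array $claimedSlots = [];

    /** @var array<string, true> */
    private array $heldLocks = [];

    /** Returns 'duplicate', 'skipped', or 'running'. */
    public function begin(int $scheduleId, string $scheduledForUtc): string
    {
        $slotKey = "{$scheduleId}|{$scheduledForUtc}";
        if (isset($this->claimedSlots[$slotKey])) {
            // A real insert would fail on the unique constraint here.
            return 'duplicate';
        }
        $this->claimedSlots[$slotKey] = true;

        $lockKey = "lock:backup_schedule:{$scheduleId}";
        if (isset($this->heldLocks[$lockKey])) {
            // Another run of this schedule is in progress: record it, do not execute.
            return 'skipped';
        }
        $this->heldLocks[$lockKey] = true;

        return 'running';
    }

    /** Release the per-schedule lock once the run has ended. */
    public function finish(int $scheduleId): void
    {
        unset($this->heldLocks["lock:backup_schedule:{$scheduleId}"]);
    }
}
```

Claiming the slot before checking the lock means a skipped run still leaves a record for its slot, which matches the decision to mark the run as skipped rather than drop it.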

Decision:
- Transient/throttling failures (e.g. 429/503) → retries with backoff.
- Auth/permission failures (401/403) → no retry.
- Unknown failures → limited retries, then fail.
Rationale: Avoid noisy retry loops for non-recoverable errors.
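The retry classification can be expressed as a small pure function. The sketch below is illustrative: the function name retryDecision, the attempt limits (5 for transient failures, 2 for unknown ones), and the backoff values are assumptions, since the decision above does not fix concrete numbers.

```php
<?php

declare(strict_types=1);

/**
 * Decide whether a failed attempt should be retried and after how long.
 * Attempt numbers start at 1. All limits and delays are illustrative.
 *
 * @return array{retry: bool, delay_seconds: int}
 */
function retryDecision(?int $httpStatus, int $attempt): array
{
    $noRetry = ['retry' => false, 'delay_seconds' => 0];

    if ($httpStatus === 401 || $httpStatus === 403) {
        return $noRetry; // auth/permission failures do not recover on retry
    }

    $transient = $httpStatus === 429 || $httpStatus === 503;
    $maxAttempts = $transient ? 5 : 2; // assumed limits

    if ($attempt >= $maxAttempts) {
        return $noRetry; // retry budget exhausted: fail the run
    }

    // Exponential backoff: 60s, 120s, 240s, ... capped at 15 minutes.
    $delay = (int) min(60 * (2 ** ($attempt - 1)), 900);

    return ['retry' => true, 'delay_seconds' => $delay];
}
```

Passing null as the status covers failures without an HTTP response; these fall into the unknown category and get the smaller retry budget.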

Decision: Use App\Services\Intune\AuditLogger for:
- the dispatch cycle (optional, aggregated into a single entry)
- run start and completion
- retention applied (including the number of deletions)
Rationale: Constitution requires audit log for every operation; existing BackupService already writes backup.created.

Decision: Only interactive actions (Run now / Retry) notify the acting user (database notifications). Scheduled runs rely on Run history.
Rationale: Scheduled runs have no acting user, so the recipient would be undefined unless new ownership fields were added.

Open questions: none that block Phase 1 design.