TenantAtlas/specs/032-backup-scheduling-mvp/research.md
2026-01-05 01:11:59 +01:00

# Research: Backup Scheduling MVP (032)
**Date**: 2026-01-05
This document records the technical decisions for Feature 032 and clarifies the implementation approach.
## Decisions
### 1) Reuse existing sync + backup services
- **Decision**: Use `App\Services\Intune\PolicySyncService::syncPoliciesWithReport(Tenant $tenant, ?array $supportedTypes = null): array` and `App\Services\Intune\BackupService::createBackupSet(...)`.
- **Rationale**: These are already tenant-aware, use `GraphClientInterface` behind the scenes (via `PolicySyncService`), and `BackupService` already writes a `backup.created` audit log entry.
- **Alternatives considered**:
  - Implement new Graph calls directly in the scheduler job → rejected (violates Graph abstraction gate; duplicates logic).
### 2) Policy type source of truth + validation
- **Decision**:
  - Persist `backup_schedules.policy_types` as `array<string>` of **type keys** present in `config('tenantpilot.supported_policy_types')`.
  - **Hard validation at save-time**: unknown keys are rejected.
  - **Runtime defensive check** (legacy/DB): unknown keys are skipped.
    - If ≥1 valid type remains → run becomes `partial` and `error_code=UNKNOWN_POLICY_TYPE`.
    - If 0 valid types remain → run becomes `skipped` and `error_code=UNKNOWN_POLICY_TYPE` (no `BackupSet` created).
- **Rationale**: Prevent silent misconfiguration and enforce fail-safe behavior at entry points, while still handling legacy data safely.
- **Alternatives considered**:
  - Save unknown keys and ignore silently → rejected (silent misconfiguration).
  - Fail the run for any unknown type → rejected (too brittle for legacy).
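
The runtime defensive check above can be sketched in plain PHP. This is only an illustration: `resolvePolicyTypes()` is a hypothetical helper, and it assumes the supported types are available as a flat list of keys, which may not match the actual shape of `config('tenantpilot.supported_policy_types')`.

```php
<?php
declare(strict_types=1);

/**
 * Split stored policy type keys into known and unknown ones and derive
 * the run outcome for the "unknown keys are skipped" rule.
 *
 * Hypothetical helper; assumes $supportedTypes is a flat list of keys.
 *
 * @param list<string> $storedTypes    keys persisted on the schedule
 * @param list<string> $supportedTypes keys taken from configuration
 * @return array{valid: list<string>, unknown: list<string>, status: ?string, error_code: ?string}
 */
function resolvePolicyTypes(array $storedTypes, array $supportedTypes): array
{
    $valid = array_values(array_intersect($storedTypes, $supportedTypes));
    $unknown = array_values(array_diff($storedTypes, $supportedTypes));

    if ($unknown === []) {
        // Nothing to flag: status stays null and the run proceeds normally.
        return ['valid' => $valid, 'unknown' => [], 'status' => null, 'error_code' => null];
    }

    return [
        'valid' => $valid,
        'unknown' => $unknown,
        // No valid type left: skip the run, so no BackupSet is created.
        'status' => $valid === [] ? 'skipped' : 'partial',
        'error_code' => 'UNKNOWN_POLICY_TYPE',
    ];
}
```

A caller would back up only `valid` and record `status` and `error_code` on the run whenever `status` is not `null`.
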
### 3) Graph calls and contracts
- **Decision**: Do not hardcode Graph endpoints. All Graph access happens via `GraphClientInterface` (through `PolicySyncService` and `BackupService`).
- **Rationale**: Matches constitution requirements and existing code paths.
- **Alternatives considered**:
  - Calling `deviceManagement/{type}` directly → rejected (explicitly forbidden by constitution; also unsafe for unknown types).
### 4) Scheduling mechanism
- **Decision**: Add an Artisan command `tenantpilot:schedules:dispatch` and register it with Laravel scheduler to run every minute.
- **Rationale**: Fits Laravel 12 structure (no Kernel), supports Dokploy operation models (`schedule:run` cron or `schedule:work`).
- **Alternatives considered**:
  - Long-running daemon polling DB directly → rejected (less idiomatic; harder ops).
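
Registering the command could look like the fragment below. It is a sketch that assumes the registration lives in `routes/console.php`, the usual place in a Kernel-less Laravel application; the file location is not confirmed by this document.

```php
<?php
// routes/console.php (assumed location)

use Illuminate\Support\Facades\Schedule;

// Evaluate due backup schedules once per minute.
Schedule::command('tenantpilot:schedules:dispatch')->everyMinute();
```

Either operation model then drives it: a cron entry that calls `php artisan schedule:run` every minute, or a long-lived `php artisan schedule:work` process.
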
### 5) Due calculation + time semantics
- **Decision**:
  - `scheduled_for` is minute-slot based and stored in UTC.
  - Due calculation uses the schedule timezone.
  - DST (MVP): invalid local time → skip; ambiguous local time → first occurrence.
- **Rationale**: Predictable and testable; avoids “surprise catch-up”.
- **Alternatives considered**:
  - Catch-up missed slots → rejected by spec (MVP explicitly “no catch-up”).
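
The DST rules above can be made concrete with a small plain-PHP sketch. `resolveSlotUtc()` is a hypothetical helper, not an existing function in the codebase. Instead of relying on how PHP parses ambiguous wall-clock times, it tests both candidate UTC offsets explicitly, assuming at most one offset transition within 24 hours of the slot.

```php
<?php
declare(strict_types=1);

/**
 * Map a local wall-clock minute slot to its UTC instant.
 * Returns null for an invalid local time (DST gap) and the earlier
 * instant for an ambiguous local time (DST overlap).
 *
 * Hypothetical helper for illustration only.
 */
function resolveSlotUtc(string $localDate, string $localTime, string $timezone): ?DateTimeImmutable
{
    $tz = new DateTimeZone($timezone);
    $utc = new DateTimeZone('UTC');

    // Read the wall-clock value as if it were UTC to get a "naive" timestamp.
    $naive = DateTimeImmutable::createFromFormat('!Y-m-d H:i', "$localDate $localTime", $utc);
    if ($naive === false) {
        throw new InvalidArgumentException('Expected date as Y-m-d and time as H:i.');
    }
    $naiveTs = $naive->getTimestamp();

    $candidates = [];
    foreach ([-86400, 86400] as $delta) {
        // Offset in force roughly one day before / after the slot.
        $offset = $tz->getOffset(new DateTimeImmutable('@' . ($naiveTs + $delta)));
        $instant = $naiveTs - $offset;
        // Keep the candidate only if that offset really applies at the instant.
        if ($tz->getOffset(new DateTimeImmutable('@' . $instant)) === $offset) {
            $candidates[$instant] = true;
        }
    }

    if ($candidates === []) {
        return null; // invalid local time → skip
    }

    // Ambiguous local time yields two instants; take the first occurrence.
    return (new DateTimeImmutable('@' . min(array_keys($candidates))))->setTimezone($utc);
}
```

For `Europe/Berlin`, `02:30` on 2026-03-29 falls in the spring gap and yields `null`, while `02:30` on 2026-10-25 occurs twice and resolves to `00:30` UTC, the first occurrence.
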
### 6) Idempotency + concurrency
- **Decision**:
  - DB unique constraint: `(backup_schedule_id, scheduled_for)`.
  - Cache lock per schedule (`lock:backup_schedule:{id}`) to prevent parallel execution.
  - If lock held, do not run in parallel: mark run `skipped` with a clear `error_code`.
- **Rationale**: Prevents double runs and provides deterministic behavior.
- **Alternatives considered**:
  - Only cache lock (no DB constraint) → rejected (less robust under crashes/restarts).
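
To illustrate why the unique constraint makes dispatch idempotent, here is a minimal sketch using PDO with an in-memory SQLite database as a stand-in for the application database (it needs the `pdo_sqlite` extension). The table name `backup_schedule_runs` and the helper names are hypothetical; only the unique key mirrors the decision above. The cache lock is not shown.

```php
<?php
declare(strict_types=1);

// Hypothetical run table; the UNIQUE key is the part that matters.
function createRunsTable(PDO $pdo): void
{
    $pdo->exec(
        'CREATE TABLE backup_schedule_runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            backup_schedule_id INTEGER NOT NULL,
            scheduled_for TEXT NOT NULL,
            UNIQUE (backup_schedule_id, scheduled_for)
        )'
    );
}

// Returns true if this caller created the run for the slot, false if the
// slot was already claimed (for example by a concurrent dispatcher).
function claimSlot(PDO $pdo, int $scheduleId, string $scheduledForUtc): bool
{
    $stmt = $pdo->prepare(
        'INSERT INTO backup_schedule_runs (backup_schedule_id, scheduled_for) VALUES (?, ?)'
    );

    try {
        $stmt->execute([$scheduleId, $scheduledForUtc]);
        return true;
    } catch (PDOException $e) {
        // SQLSTATE 23000 signals an integrity constraint violation.
        if ((string) $e->getCode() === '23000') {
            return false;
        }
        throw $e;
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createRunsTable($pdo);

var_dump(claimSlot($pdo, 1, '2026-01-05 01:00:00')); // bool(true)
var_dump(claimSlot($pdo, 1, '2026-01-05 01:00:00')); // bool(false)
```

Because the database rejects the duplicate row, a dispatcher that crashes and restarts within the same minute cannot create a second run for the same slot, even if the cache lock was lost.
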
### 7) Retry/backoff policy
- **Decision**:
  - Transient/throttling failures (e.g. 429/503) → retries with backoff.
  - Auth/permission failures (401/403) → no retry.
  - Unknown failures → limited retries, then fail.
- **Rationale**: Avoid noisy retry loops for non-recoverable errors.
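
The classification above might be expressed as a pure function. `retryDecision()` is a hypothetical helper; the attempt limits and the exponential backoff (60 seconds doubling, capped at 15 minutes) are illustrative assumptions, not values taken from the spec.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a failed attempt should be retried and after what delay.
 *
 * Hypothetical helper; the limits and backoff values are assumptions.
 *
 * @param int|null $httpStatus status of the failed call, null if unknown
 * @param int      $attempt    1-based number of the attempt that just failed
 * @return array{retry: bool, delay_seconds: int}
 */
function retryDecision(
    ?int $httpStatus,
    int $attempt,
    int $maxTransientAttempts = 5,
    int $maxUnknownAttempts = 2
): array {
    // Auth/permission failures are not recoverable by retrying.
    if (in_array($httpStatus, [401, 403], true)) {
        return ['retry' => false, 'delay_seconds' => 0];
    }

    // Throttling/transient failures get more attempts than unknown ones.
    $limit = in_array($httpStatus, [429, 503], true)
        ? $maxTransientAttempts
        : $maxUnknownAttempts;

    if ($attempt >= $limit) {
        return ['retry' => false, 'delay_seconds' => 0];
    }

    // Exponential backoff: 60s, 120s, 240s, ... capped at 900s.
    $delay = min(60 * (2 ** ($attempt - 1)), 900);

    return ['retry' => true, 'delay_seconds' => $delay];
}
```

When a throttled response carries a `Retry-After` header, honoring that value instead of the computed delay would be a reasonable refinement.
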
### 8) Audit logging
- **Decision**: Use `App\Services\Intune\AuditLogger` for:
  - dispatch cycle (optional aggregated)
  - run start + completion
  - retention applied (count deletions)
- **Rationale**: Constitution requires audit log for every operation; existing `BackupService` already writes `backup.created`.
### 9) Notifications
- **Decision**: Only interactive actions (Run now / Retry) notify the acting user (database notifications). Scheduled runs rely on Run history.
- **Rationale**: Avoid undefined “who gets notified” without adding new ownership fields.
## Open Items
None blocking Phase 1 design.