TenantAtlas/specs/032-backup-scheduling-mvp/research.md
ahmido 4d3fcd28a9 feat/032-backup-scheduling-mvp (#34)
What
Implements tenant-scoped backup scheduling end-to-end: schedules CRUD, minute-based dispatch, queued execution, run history, manual “Run now/Retry”, retention (keep last N), and auditability.

Key changes

Filament UI: Backup Schedules resource with tenant scoping + SEC-002 role gating.
Scheduler + queue: tenantpilot:schedules:dispatch command wired in scheduler (runs every minute), creates idempotent BackupScheduleRun records and dispatches jobs.
Execution: RunBackupScheduleJob syncs policies, creates immutable backup sets, updates run status, writes audit logs, applies retry/backoff mapping, and triggers retention.
Run history: Relation manager + “View” modal rendering run details.
UX polish: row actions grouped; bulk actions grouped (run now / retry / delete). Bulk dispatch writes DB notifications (shows in notifications panel).
Validation: policy type hard-validation on save; unknown policy types handled safely at runtime (skipped/partial).
Tests: comprehensive Pest coverage for CRUD/scoping/validation, idempotency, job outcomes, error mapping, retention, view modal, run-now/retry notifications, bulk delete (incl. operator forbidden).
Files / Areas

Filament: BackupScheduleResource.php and app/Filament/Resources/BackupScheduleResource/*
Scheduling/Jobs: app/Console/Commands/TenantpilotDispatchBackupSchedules.php, app/Jobs/RunBackupScheduleJob.php, app/Jobs/ApplyBackupScheduleRetentionJob.php, console.php
Models/Migrations: app/Models/BackupSchedule.php, app/Models/BackupScheduleRun.php, database/migrations/backup_schedules, backup_schedule_runs
Notifications: BackupScheduleRunDispatchedNotification.php
Specs: specs/032-backup-scheduling-mvp/* (tasks/checklist/quickstart updates)
How to test (Sail)

Run tests: ./vendor/bin/sail artisan test tests/Feature/BackupScheduling
Run formatter: ./vendor/bin/sail php ./vendor/bin/pint --dirty
Apply migrations: ./vendor/bin/sail artisan migrate
Manual dispatch: ./vendor/bin/sail artisan tenantpilot:schedules:dispatch
Notes

Uses DB notifications for queued UI actions to ensure they appear in the notifications panel even under queue fakes in tests.
Checklist gate for 032 is PASS; tasks updated accordingly.

Co-authored-by: Ahmed Darrazi <ahmeddarrazi@adsmac.local>
Reviewed-on: #34
2026-01-05 04:22:13 +00:00


# Research: Backup Scheduling MVP (032)
**Date**: 2026-01-05
This document records the technical decisions for Feature 032 and clarifies the implementation approach.
## Decisions
### 1) Reuse existing sync + backup services
- **Decision**: Use `App\Services\Intune\PolicySyncService::syncPoliciesWithReport(Tenant $tenant, ?array $supportedTypes = null): array` and `App\Services\Intune\BackupService::createBackupSet(...)`.
- **Rationale**: Both services are already tenant-aware and use `GraphClientInterface` behind the scenes (via `PolicySyncService`). `BackupService` also already writes a `backup.created` audit log entry.
- **Alternatives considered**:
  - Implement new Graph calls directly in the scheduler job → rejected (violates Graph abstraction gate; duplicates logic).
### 2) Policy type source of truth + validation
- **Decision**:
  - Persist `backup_schedules.policy_types` as `array<string>` of **type keys** present in `config('tenantpilot.supported_policy_types')`.
  - **Hard validation at save-time**: unknown keys are rejected.
  - **Runtime defensive check** (legacy/DB): unknown keys are skipped.
    - If ≥1 valid type remains → run becomes `partial` and `error_code=UNKNOWN_POLICY_TYPE`.
    - If 0 valid types remain → run becomes `skipped` and `error_code=UNKNOWN_POLICY_TYPE` (no `BackupSet` created).
- **Rationale**: Prevent silent misconfiguration and enforce fail-safe behavior at entry points, while still handling legacy data safely.
- **Alternatives considered**:
  - Save unknown keys and ignore silently → rejected (silent misconfiguration).
  - Fail the run for any unknown type → rejected (too brittle for legacy).
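
The two-stage check above could look roughly like the following. This is a minimal sketch in plain PHP, not the project's actual code: the function names `assertKnownPolicyTypes` and `resolveRunnablePolicyTypes`, the returned array shape, and the example type keys are hypothetical. It also assumes the supported keys have already been extracted into a flat list of strings; the real shape of `config('tenantpilot.supported_policy_types')` may differ.

```php
<?php
declare(strict_types=1);

/**
 * Save-time hard validation: reject any key that is not supported.
 * (Hypothetical helper; name and exception type are assumptions.)
 *
 * @param list<string> $requested
 * @param list<string> $supported
 */
function assertKnownPolicyTypes(array $requested, array $supported): void
{
    $unknown = array_values(array_diff($requested, $supported));
    if ($unknown !== []) {
        throw new InvalidArgumentException(
            'Unknown policy types: ' . implode(', ', $unknown)
        );
    }
}

/**
 * Runtime defensive check for legacy rows: skip unknown keys and report
 * how the run outcome must be downgraded.
 *
 * @param list<string> $requested
 * @param list<string> $supported
 * @return array{types: list<string>, status: ?string, error_code: ?string}
 */
function resolveRunnablePolicyTypes(array $requested, array $supported): array
{
    $valid = array_values(array_intersect($requested, $supported));
    $unknown = array_values(array_diff($requested, $supported));

    if ($unknown === []) {
        // Nothing to downgrade; the run proceeds normally.
        return ['types' => $valid, 'status' => null, 'error_code' => null];
    }

    return [
        'types' => $valid,
        // No valid type left: skip the run entirely (no BackupSet is created).
        'status' => $valid === [] ? 'skipped' : 'partial',
        'error_code' => 'UNKNOWN_POLICY_TYPE',
    ];
}
```

In this sketch the first function would back the form validation, while the job would call the second one before syncing and only back up the returned `types`.
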
### 3) Graph calls and contracts
- **Decision**: Do not hardcode Graph endpoints. All Graph access happens via `GraphClientInterface` (through `PolicySyncService` and `BackupService`).
- **Rationale**: Matches constitution requirements and existing code paths.
- **Alternatives considered**:
  - Calling `deviceManagement/{type}` directly → rejected (explicitly forbidden by constitution; also unsafe for unknown types).
### 4) Scheduling mechanism
- **Decision**: Add an Artisan command `tenantpilot:schedules:dispatch` and register it with Laravel scheduler to run every minute.
- **Rationale**: Fits the Laravel 12 structure (no Console Kernel) and supports both Dokploy operating models (`schedule:run` via cron, or `schedule:work`).
- **Alternatives considered**:
  - Long-running daemon polling DB directly → rejected (less idiomatic; harder ops).
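
In Laravel applications without a Console Kernel, scheduled commands are typically registered in `routes/console.php` through the `Schedule` facade. The fragment below is a minimal sketch of the registration described above. It assumes no further scheduling options are chained; the project's actual entry may differ.

```php
<?php
// routes/console.php

use Illuminate\Support\Facades\Schedule;

// Run the dispatcher once per minute; the command itself decides
// which backup schedules are due for the current minute slot.
Schedule::command('tenantpilot:schedules:dispatch')->everyMinute();
```

Either operating model then drives this entry: a cron job that invokes `php artisan schedule:run` every minute, or a long-lived `php artisan schedule:work` process.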
### 5) Due calculation + time semantics
- **Decision**:
  - `scheduled_for` is minute-slot based and stored in UTC.
  - Due calculation uses the schedule's timezone.
  - DST (MVP): a local time that does not exist (spring-forward gap) is skipped; a local time that occurs twice (fall-back overlap) runs at its first occurrence.
- **Rationale**: Predictable and testable; avoids “surprise catch-up”.
- **Alternatives considered**:
  - Catch up on missed slots → rejected by spec (MVP explicitly “no catch-up”).
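
One way to implement these DST rules without relying on how the date library resolves edge cases is to enumerate the candidate UTC instants for a wall-clock time and keep only those that round-trip. The sketch below uses plain PHP date classes; the function name `resolveScheduledForUtc` and its parameters are hypothetical and not taken from the project.

```php
<?php
declare(strict_types=1);

/**
 * Convert a local wall-clock minute slot to its UTC instant.
 * Returns null when the local time does not exist (DST gap) and the
 * earlier instant when the local time occurs twice (DST overlap).
 */
function resolveScheduledForUtc(string $localDate, string $localTime, string $timezone): ?DateTimeImmutable
{
    $tz = new DateTimeZone($timezone);
    $wanted = $localDate . ' ' . $localTime;

    // Read the wall-clock value as if it were UTC ("!" zeroes the seconds).
    $naive = DateTimeImmutable::createFromFormat('!Y-m-d H:i', $wanted, new DateTimeZone('UTC'));
    if ($naive === false) {
        throw new InvalidArgumentException("Unparseable local time: {$wanted}");
    }

    $candidates = [];
    // Probe the zone offset one day before and after to cover both sides
    // of a possible DST transition.
    foreach ([-86400, 86400] as $probe) {
        $offset = $tz->getOffset($naive->setTimestamp($naive->getTimestamp() + $probe));
        $candidate = $naive->setTimestamp($naive->getTimestamp() - $offset);

        // Keep the candidate only if it really shows the wanted wall-clock time.
        if ($candidate->setTimezone($tz)->format('Y-m-d H:i') === $wanted) {
            $candidates[$candidate->getTimestamp()] = $candidate;
        }
    }

    if ($candidates === []) {
        return null; // invalid local time: skip this slot
    }

    ksort($candidates);

    return reset($candidates); // ambiguous local time: first occurrence
}
```

A dispatcher built on this sketch would treat `null` as "no slot today" and otherwise store the returned value as `scheduled_for`.
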
### 6) Idempotency + concurrency
- **Decision**:
  - DB unique constraint: `(backup_schedule_id, scheduled_for)`.
  - Cache lock per schedule (`lock:backup_schedule:{id}`) to prevent parallel execution.
  - If lock held, do not run in parallel: mark run `skipped` with a clear `error_code`.
- **Rationale**: Prevents double runs and provides deterministic behavior.
- **Alternatives considered**:
  - Only cache lock (no DB constraint) → rejected (less robust under crashes/restarts).
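
The database half of this decision can be illustrated as follows: whoever inserts the `(backup_schedule_id, scheduled_for)` row first owns the slot, and a duplicate insert is treated as "already dispatched". This is a sketch under stated assumptions: it uses in-memory SQLite through PDO (requires the `pdo_sqlite` extension) instead of the project's migrations and models, the table is reduced to the two relevant columns, and `claimRun` is a hypothetical name. The per-schedule cache lock is not shown.

```php
<?php
declare(strict_types=1);

// In-memory SQLite stands in for the real database (assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec(
    'CREATE TABLE backup_schedule_runs (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        backup_schedule_id INTEGER NOT NULL,
        scheduled_for TEXT NOT NULL,
        UNIQUE (backup_schedule_id, scheduled_for)
    )'
);

/**
 * Try to claim a minute slot for a schedule.
 * Returns true if this call created the run row, false if it already existed.
 */
function claimRun(PDO $pdo, int $scheduleId, string $scheduledForUtc): bool
{
    $stmt = $pdo->prepare(
        'INSERT INTO backup_schedule_runs (backup_schedule_id, scheduled_for) VALUES (?, ?)'
    );

    try {
        $stmt->execute([$scheduleId, $scheduledForUtc]);

        return true;
    } catch (PDOException $e) {
        // SQLSTATE class 23 = integrity constraint violation (here: the unique index).
        if (str_starts_with((string) $e->getCode(), '23')) {
            return false;
        }

        throw $e;
    }
}
```

Because the constraint lives in the database, a dispatcher that crashes and restarts within the same minute cannot create a second run for the same slot.
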
### 7) Retry/backoff policy
- **Decision**:
  - Transient/throttling failures (e.g. 429/503) → retries with backoff.
  - Auth/permission failures (401/403) → no retry.
  - Unknown failures → limited retries, then fail.
- **Rationale**: Avoid noisy retry loops for non-recoverable errors.
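
The mapping above can be expressed as a small pure function that returns the delay before the next attempt, or `null` when the run should fail without another retry. This is a sketch only: the attempt limits, the 60-second base delay, the exponential growth, and the name `retryDelaySeconds` are assumptions chosen for illustration, since the source does not specify them.

```php
<?php
declare(strict_types=1);

const TRANSIENT_STATUSES = [429, 503];      // throttling / temporary outage
const NON_RETRYABLE_STATUSES = [401, 403];  // auth / permission failures
const MAX_ATTEMPTS_TRANSIENT = 5;           // assumed limit
const MAX_ATTEMPTS_UNKNOWN = 2;             // assumed limit ("limited retries")
const BASE_DELAY_SECONDS = 60;              // assumed base delay

/**
 * Decide whether a failed attempt should be retried.
 *
 * @param int|null $httpStatus Status of the failed call, or null if unknown.
 * @param int      $attempt    1-based number of the attempt that just failed.
 * @return int|null Seconds to wait before retrying, or null for "do not retry".
 */
function retryDelaySeconds(?int $httpStatus, int $attempt): ?int
{
    if (in_array($httpStatus, NON_RETRYABLE_STATUSES, true)) {
        return null; // retrying cannot fix missing credentials or permissions
    }

    $maxAttempts = in_array($httpStatus, TRANSIENT_STATUSES, true)
        ? MAX_ATTEMPTS_TRANSIENT
        : MAX_ATTEMPTS_UNKNOWN;

    if ($attempt >= $maxAttempts) {
        return null; // retry budget exhausted: the run fails
    }

    // Exponential backoff: 60s, 120s, 240s, ...
    return BASE_DELAY_SECONDS * (2 ** ($attempt - 1));
}
```

Keeping the decision in one function makes the error mapping easy to cover with unit tests, independent of the queue driver.
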
### 8) Audit logging
- **Decision**: Use `App\Services\Intune\AuditLogger` for:
  - dispatch cycle (optional aggregated)
  - run start + completion
  - retention applied (count deletions)
- **Rationale**: The constitution requires an audit log entry for every operation; the existing `BackupService` already writes `backup.created`.
### 9) Notifications
- **Decision**: Only interactive actions (Run now / Retry) notify the acting user (database notifications). Scheduled runs rely on Run history.
- **Rationale**: Avoids leaving “who gets notified” undefined for scheduled runs, without adding new ownership fields.
## Open Items
None blocking Phase 1 design.