TenantAtlas/.specify/plan.md
2025-12-14 20:23:18 +01:00

# Implementation Plan: TenantPilot v1
**Branch**: `tenantpilot-v1`
**Date**: 2025-12-12
**Spec Source**: `.specify/spec.md` (scope/restore matrix unchanged)
## Summary
TenantPilot v1 already delivers tenant-scoped Intune inventory, immutable backups, version history with diffs, defensive restore flows, tenant setup, permissions/health, settings normalization/display, and Highlander enforcement. Remaining priority work is the delegated Intune RBAC onboarding wizard (US7) and afterwards the Graph Contract Registry & Drift Guard (US8). All Graph calls stay behind the abstraction with audit logging; snapshots remain JSONB with safety gates (preview-only for high-risk types).
## Status Snapshot (tasks.md is source of truth)
- **Done**: US1 inventory, US2 backups, US3 versions/diffs, US4 restore preview/exec, scope config, soft-deletes/housekeeping, Highlander single current tenant, tenant setup & verify (US6), permissions/health overview (US6), table ActionGroup UX, settings normalization/display (US1b), Dokploy/Sail runbooks.
- **Next up**: **US7** Intune RBAC onboarding wizard (delegated, synchronous Filament flow).
- **Upcoming**: **US8** Graph Contract Registry & Drift Guard (contract registry, type-family handling, verification command, fallback strategies).
## Technical Baseline
- Laravel 12, Filament 4, PHP 8.4; Sail-first with PostgreSQL.
- JSONB for policy/backup/version payloads; FK/time indexes, GIN where needed.
- Graph abstraction with standardized error mapping/retries; no secrets in logs.
- Audit trail across backup/restore/version/tenant/permission/wizard steps; tenant isolation enforced.
- Restore matrix and supported types remain config-driven single sources of truth.
- Safety: preview/dry-run, confirmation gates, warnings for high-risk types; no implicit tenants (Highlander).
## Completed Workstreams (no new action needed)
- **US1 Inventory (Phase 3)**: Filament policy listing with type/category/platform filters; tenant-scoped.
- **US2 Backups (Phase 4)**: Backup sets/items in JSONB, immutable snapshots, audit logging, relation manager UX for attaching policies, soft-delete rules with restore-run guard.
- **US3 Versions/Diffs (Phase 5)**: Version capture, timelines, human+JSON diffs, soft-deletes with audit.
- **US4 Restore (Phase 6)**: Preview, selective execution, conflict warnings, per-type restore level (enabled vs preview-only), PowerShell decode/encode respected, audit of outcomes.
- **US6 Tenant Setup & Highlander (Phases 8 & 12)**: Tenant CRUD/verify, `INTUNE_TENANT_ID` override, `is_current` unique enforcement, “Make current” action, block deactivated tenants.
- **US6 Permissions/Health (Phase 9)**: Required permissions list, compare/check service, Verify action updates status and audit, permissions panel in Tenant detail.
- **US1b Settings Display (Phase 13)**: PolicyNormalizer + SnapshotValidator, warnings for malformed snapshots, normalized settings and pretty JSON on policy/version detail, list badges, README section.
- **Housekeeping/UX (Phases 10-12)**: Soft/force deletes for tenants/backups/versions/restore runs with guards; table actions in ActionGroup per UX guideline.
- **Ops (Phase 7)**: Sail runbook and Dokploy staging→prod guidance captured.
## Execution Plan: US7 Intune RBAC Onboarding Wizard (Phase 14)
- Objectives: deliver delegated, tenant-scoped wizard that safely converges the Intune RBAC state for the configured service principal; fully audited, idempotent, least-privilege by default.
- Scope alignment: FR-023 to FR-030, constitution (Safety-First, Auditability, Tenant-Aware, Graph Abstraction). No secret/token persistence; delegated tokens stay request-local and are not stored in DB/cache.
- Design decisions:
  - Service: `RbacOnboardingService` orchestrates steps using `GraphClientInterface`; reuse `RbacHealthService` for verification; all calls through abstraction with error mapping.
  - Data: use existing tenant RBAC columns (`rbac_group_id`, `rbac_group_name`, `rbac_role_assignment_id`, `rbac_role_key`, `rbac_scope_mode`, `rbac_scope_id`, status fields). No new entities; ensure casts + guards.
  - Audit: log start, delegated login outcome, group ensure, membership ensure, role assignment ensure/update, verify results. No payload logging; only IDs/status codes.
- Wizard flow (Filament, Tenant detail ActionGroup):
  1) Preconditions/config step with review screen: show tenant/app info, required permissions, least-privilege warning; inputs for role (default Policy/Profile Manager; Intune Administrator shows warning), scope (global default; optional group picker), group mode (create default `TenantPilot-Intune-RBAC` vs pick existing security-enabled group). Summarize planned changes before proceeding.
  2) Delegated auth step: initiate login; on failure stop with actionable message + audit; do not store token beyond request.
  3) Execute (synchronous): resolve service principal by `app_client_id`; on missing SP stop with consent-required hint + audit reason `sp_not_found`; ensure/create security group (validate `securityEnabled=true`); ensure SP membership (idempotent “already exists” OK); ensure/create/patch Intune role assignment for chosen role/scope; persist discovered IDs on tenant for idempotency.
  4) Post-verify: force fresh token acquisition; run canary reads (deviceConfigurations, deviceCompliancePolicies, conditionalAccess if enabled); update RBAC/permission health; surface warnings if scope-limited; audit verify result.
  5) Summary: show IDs (group, role assignment), role/scope used, verify status, CTA to retry policy sync.
- UX rules: action only for active tenants with `app_client_id`; keep in ActionGroup with Admin consent/Verify; show badge/hint if RBAC missing; warnings on selecting Intune Administrator role; block execution if tenant inactive or missing consent/SP.
- Safety/idempotency: handle “already exists” as success; no self-heal jobs; retry-safe writes; no queue usage to avoid token expiry; timeouts surfaced clearly; no delegated token persistence.
- Tests: happy path, rerun idempotent, SP missing, insufficient privileges, non-security-enabled group failure, scope-limited warning, delegated auth failure path; Filament wizard visibility + summary rendering; health prompts to run wizard when RBAC missing.
- Documentation: add wizard behavior, least-privilege defaults, audit expectations, “no token storage”, and how to rerun safely; note CTA to retry policy sync.
- Operational note: After admin consent or RBAC changes, force a fresh token acquisition (e.g., clear the app token cache) before retrying sync/backup/restore; Verify should run with a non-stale token. Optional check/report-only jobs (no grant) remain out of scope for this phase.
- Testing plan (Pest):
  - Service unit tests: happy path, rerun idempotent, SP missing, insufficient privileges, scope-limited warning, group exists/not security-enabled failure.
  - Filament feature: wizard visibility gating, delegated failure path, successful run shows summary and updates health, warnings rendered.
  - Health integration: Verify reflects RBAC status and prompts to run wizard when missing.
- Deployment/ops: no new env vars; ensure migrations for tenant RBAC columns are applied; run targeted tests `php artisan test tests/Unit/RbacOnboardingServiceTest.php tests/Feature/Filament/TenantRbacWizardTest.php`; Pint on touched files.
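
The "already exists is success" rule for the Execute step can be sketched in plain PHP. This is a minimal illustration, not the `RbacOnboardingService` implementation: `GraphConflictException`, `ensureResource`, and the in-memory closures are hypothetical names, and a real version would go through `GraphClientInterface` and write audit entries.

```php
<?php

declare(strict_types=1);

/** Hypothetical exception a Graph abstraction might throw for a mapped "already exists" conflict. */
final class GraphConflictException extends RuntimeException
{
}

/**
 * Ensure a resource exists: look it up first, create it only if missing,
 * and treat a conflict raised during creation as success after a re-read.
 *
 * @param callable(): ?string $lookup Returns the existing resource ID, or null.
 * @param callable(): string  $create Creates the resource and returns its ID.
 * @return array{id: string, outcome: string}
 */
function ensureResource(callable $lookup, callable $create): array
{
    $existing = $lookup();
    if ($existing !== null) {
        return ['id' => $existing, 'outcome' => 'already_exists'];
    }

    try {
        return ['id' => $create(), 'outcome' => 'created'];
    } catch (GraphConflictException $e) {
        // The lookup was stale or another run won the race: re-read instead of failing.
        $existing = $lookup();
        if ($existing === null) {
            throw $e;
        }

        return ['id' => $existing, 'outcome' => 'already_exists'];
    }
}

// Usage with an in-memory stand-in for Graph (no real API calls).
$store = [];
$lookup = function () use (&$store): ?string {
    return $store['group'] ?? null;
};
$create = function () use (&$store): string {
    $store['group'] = 'group-123';

    return $store['group'];
};

$first = ensureResource($lookup, $create);  // creates the group
$second = ensureResource($lookup, $create); // rerun finds it and changes nothing
```

Returning an explicit outcome alongside the ID gives the audit log and the wizard summary something to record without logging payloads.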
## Upcoming: US8 Graph Contract Registry & Drift Guard (Phase 15)
- Objectives: centralize Graph contract assumptions per supported type/endpoint and provide drift detection + safe fallbacks so preview/restore remain stable on Graph shape/capability changes.
- Scope alignment: FR-031 to FR-034 (spec), constitution (Safety-First, Auditability, Graph Abstraction, Tenant-Aware).
- Approach:
  - Artifact: `config/graph_contracts.php` (or similar) with per-type contract data:
    - resource paths (collection + single item)
    - allowed `$select` / allowed `$expand`
    - **type families / allowed `@odata.type` values**
    - create/update methods, id field
    - hydration strategy (member expansion vs follow-up fetch vs unavailable)
  - Service: registry + checker; integrate with Graph client to enforce allowed capabilities and downgrade on capability errors (retry without expands/selects), recording warnings/audit entries.
  - Type families: treat derived `@odata.type` values **within a declared family** as compatible (no `odata_mismatch`) for routing preview/restore.
  - Verification: `php artisan graph:contract:check` (staging/CI) to probe endpoints and surface actionable diffs when Graph changes; opt-in/guarded for prod.
  - Docs: explain registry format and update process when Graph changes.
- Testing outline: unit for registry lookups/type-family matching/fallback selection; integration/Pest to simulate capability errors and ensure downgrade path + correct routing for derived types.
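
One possible shape for a contract entry and the downgrade path, as a plain-PHP sketch. The array keys, `GraphCapabilityException`, `constrainQuery`, `fetchWithDowngrade`, and the fake `$send` transport are all assumptions for illustration; the actual registry format is still to be defined under US8.

```php
<?php

declare(strict_types=1);

// Illustrative contract entry, roughly what config/graph_contracts.php might return.
// Key names and the listed values are assumptions, not the final registry format.
$graphContracts = [
    'deviceConfiguration' => [
        'resource' => 'deviceManagement/deviceConfigurations',
        'allowed_select' => ['id', 'displayName', 'description', 'lastModifiedDateTime'],
        'allowed_expand' => ['assignments'],
        'type_family' => [
            '#microsoft.graph.deviceConfiguration',
            '#microsoft.graph.windows10GeneralConfiguration',
        ],
        'hydration' => 'follow_up_fetch',
    ],
];

/** Hypothetical error raised when Graph rejects a $select/$expand. */
final class GraphCapabilityException extends RuntimeException
{
}

/** Drop any $select/$expand values the contract does not allow. */
function constrainQuery(array $contract, array $query): array
{
    foreach (['$select' => 'allowed_select', '$expand' => 'allowed_expand'] as $param => $key) {
        if (isset($query[$param])) {
            $query[$param] = array_values(array_intersect($query[$param], $contract[$key]));
            if ($query[$param] === []) {
                unset($query[$param]);
            }
        }
    }

    return $query;
}

/** Send the constrained query; on a capability error, retry once without $select/$expand. */
function fetchWithDowngrade(callable $send, array $contract, array $query): array
{
    $query = constrainQuery($contract, $query);

    try {
        return ['data' => $send($contract['resource'], $query), 'warnings' => []];
    } catch (GraphCapabilityException $e) {
        unset($query['$select'], $query['$expand']);

        return [
            'data' => $send($contract['resource'], $query),
            'warnings' => ['graph_capability_downgrade: ' . $e->getMessage()],
        ];
    }
}

// Fake transport that rejects $expand, to exercise the downgrade path.
$send = function (string $path, array $query): array {
    if (isset($query['$expand'])) {
        throw new GraphCapabilityException('$expand not supported');
    }

    return ['path' => $path, 'query' => $query];
};

$result = fetchWithDowngrade($send, $graphContracts['deviceConfiguration'], [
    '$select' => ['id', 'displayName', 'notARealField'],
    '$expand' => ['assignments'],
]);
```

In this sketch the downgrade is a single retry and the warning is returned to the caller, so the real service can decide whether to audit it, show it in preview, or both.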
## Testing & Quality Gates
- Continue using targeted Pest runs per change set; add/extend tests for US7 wizard now, and for US8 contracts when implemented.
- Run Pint on touched files before finalizing.
- Maintain tenant isolation, audit logging, and restore safety gates; validate snapshot shape and type-family compatibility prior to restore execution.
### Restore Safety Gate
- Restore execution MUST be blocked if a snapshot's `@odata.type` is **outside** the declared **type family** for the target policy type (prevent cross-type/platform restores).
- Restore preview MAY still render details + warnings for out-of-family snapshots, but MUST NOT offer an apply action.
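
The two rules above reduce to a small decision function. A minimal sketch, assuming the type family is a flat list of `@odata.type` strings; `evaluateRestoreGate` and the returned keys are hypothetical names, and a missing `@odata.type` is treated as out of family here.

```php
<?php

declare(strict_types=1);

/**
 * Decide what the restore UI may offer for a snapshot.
 * Preview is always allowed; apply only when the snapshot type is in the family.
 *
 * @param list<string> $typeFamily   Allowed @odata.type values for the target policy type.
 * @param ?string      $snapshotType The snapshot's @odata.type, or null if absent.
 * @return array{can_preview: bool, can_apply: bool, warning: ?string}
 */
function evaluateRestoreGate(array $typeFamily, ?string $snapshotType): array
{
    $inFamily = $snapshotType !== null && in_array($snapshotType, $typeFamily, true);

    return [
        'can_preview' => true,
        'can_apply' => $inFamily,
        'warning' => $inFamily ? null : 'odata_mismatch',
    ];
}

$family = [
    '#microsoft.graph.deviceConfiguration',
    '#microsoft.graph.windows10GeneralConfiguration',
];

// In-family derived type: apply may be offered.
$ok = evaluateRestoreGate($family, '#microsoft.graph.windows10GeneralConfiguration');

// Out-of-family type: preview with a warning, no apply action.
$blocked = evaluateRestoreGate($family, '#microsoft.graph.deviceCompliancePolicy');
```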
## Coordination
- Update `.specify/tasks.md` to reflect progress on US7 wizard and future US8 contract tasks; no new entities or scope changes introduced here.
- Stage validation required before production for any migration or restore-impacting change.
- Keep Graph integration behind abstraction; no secrets in logs; follow existing UX patterns (ActionGroup, warnings for risky ops).