# Phase 0 - Research (Spec 114: System Console Control Tower)

## Goal

Deliver a platform-operator “/system” control plane that is **strictly separated** from “/admin”, is **metadata-only by default**, and provides fast routing into canonical `OperationRun` detail.

## Existing primitives (reuse)

### System panel + plane separation

- `app/Providers/Filament/SystemPanelProvider.php`
  - Panel: `id=system`, `path=system`, `authGuard('platform')`
  - Uses `UseSystemSessionCookie` to isolate sessions from `/admin`
  - Uses middleware `ensure-correct-guard:platform` and capability gate `ensure-platform-capability:`
- `app/Http/Middleware/UseSystemSessionCookie.php`
  - Implements Spec 114 clarification: separate session cookie name for `/system`

### Authorization semantics (404 vs 403)

- Existing tests already enforce the clarified behavior:
  - Non-platform (wrong guard) → 404 (deny-as-not-found)
  - Platform user missing capability → 403

### Operation runs (Monitoring source of truth)

- `app/Models/OperationRun.php` + migrations under `database/migrations/*operation_runs*`
  - `workspace_id` is required; `tenant_id` is nullable (supports tenantless runs)
  - `failure_summary`, `summary_counts`, `context` are JSON arrays and already used in UI
- `app/Services/OperationRunService.php`
  - Canonical lifecycle transitions, summary-count normalization, failure sanitization
  - Has stale queued run helper (`isStaleQueuedRun()` + `failStaleQueuedRun()`)
- Canonical System run links:
  - `app/Support/System/SystemOperationRunLinks.php` (index + view)

### Sanitization / data minimization

- Failures: `app/Support/OpsUx/RunFailureSanitizer.php` (reason normalization + message redaction)
- Audit metadata: `app/Support/Audit/AuditContextSanitizer.php` (redacts token/secret/password-like keys + bearer/JWT strings)

### Access logs signal source

- `app/Models/AuditLog.php`
- System login auditing:
  - `app/Filament/System/Pages/Auth/Login.php` writes `AuditLog` events with action `platform.auth.login`
- Break-glass auditing:
  - `app/Services/Auth/BreakGlassSession.php` writes `platform.break_glass.enter|exit|expired`

## Key gaps to implement (Spec 114)

### Navigation/IA

- Add System pages:
  - `/system/directory/workspaces` (+ detail)
  - `/system/directory/tenants` (+ detail)
  - `/system/ops/runs` (global) + canonical detail already exists but is currently *runbook-type scoped*
  - `/system/ops/failures` (prefilter)
  - `/system/ops/stuck` (prefilter)
  - `/system/security/access-logs`

### RBAC (platform capabilities)

- `app/Support/Auth/PlatformCapabilities.php` currently contains only Ops/runbooks/break-glass/core panel access.
- Spec 114 introduces additional capabilities (e.g. `platform.console.view`, `platform.directory.view`, `platform.operations.manage`).

Decision:

- Extend `PlatformCapabilities` registry with Spec 114 capabilities and update system pages to gate via the registry constants (no raw strings).

### Stuck definition

- There is a helper for “stale queued” in `OperationRunService`, but no “running too long” classification.

Decision:

- Introduce configurable stuck thresholds for `queued` and `running` (minutes) under a single config namespace (e.g. `config/tenantpilot.php`), and implement stuck classification in a dedicated helper/service used by the System pages.

### Control Tower aggregation

- Spec 114 requires KPIs + top offenders in a selectable time window.


Decision:

- Use DB-only aggregation on `operation_runs` for the selected time window:
  - KPIs: counts by outcome/status, and “failed/stuck” counts
  - Top offenders: group by tenant/workspace for failed runs
  - Default time window: 24h; supported: 1h/24h/7d

## Non-functional decisions (resolving “NEEDS CLARIFICATION”)

### Technical context (resolved)

- Language/runtime: PHP 8.4 (Laravel 12)
- Admin framework: Filament v5 + Livewire v4
- Storage: PostgreSQL (Sail locally)
- Testing: Pest v4
- Target: web app (server-rendered Livewire/Filament)

### Performance goals (assumptions, but explicit)

- System list pages are DB-only at render time; no external calls.
- Target: p95 < 1.0s for index pages at typical production volumes, using:
  - time-window defaults (24h)
  - pagination
  - indexes for `operation_runs(status,outcome,created_at,type,workspace_id,tenant_id)` and `audit_logs(action,recorded_at,actor_id)`

### Data minimization

- Default run detail surfaces only sanitized `failure_summary` + normalized `summary_counts`.
- `context` rendering remains sanitized/limited (avoid raw payload dumps by default).

## Alternatives considered

- New “SystemOperationRun” table: rejected; existing `OperationRun` is already the canonical monitoring artifact.
- Building Access Logs from web server logs: rejected; `AuditLog` already exists, is sanitized, and includes platform-auth events.
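
## Illustrative sketch: stuck classification

To make the “Stuck definition” decision concrete, the following is a minimal, framework-free sketch of how a run might be classified against per-status thresholds. It is not the project's implementation: the function name `classifyStuckRun`, the constant `STUCK_THRESHOLD_MINUTES`, the threshold values, and the `started_at` timestamp are all assumptions made for illustration. In the real code the thresholds would presumably be read from the chosen config namespace.

```php
<?php

declare(strict_types=1);

// Hypothetical thresholds in minutes. The key names and values are
// assumptions; the research note only says they are configurable.
const STUCK_THRESHOLD_MINUTES = [
    'queued' => 15,
    'running' => 60,
];

/**
 * Classifies a run as stuck when it has stayed in `queued` or `running`
 * for at least the configured number of minutes.
 *
 * Returns 'stuck_queued', 'stuck_running', or null when the run is not stuck
 * (including every status that has no threshold).
 *
 * @param array<string, int> $thresholds minutes per status
 */
function classifyStuckRun(
    string $status,
    DateTimeImmutable $createdAt,
    ?DateTimeImmutable $startedAt,
    DateTimeImmutable $now,
    array $thresholds = STUCK_THRESHOLD_MINUTES,
): ?string {
    if (!isset($thresholds[$status])) {
        return null;
    }

    // Queued runs are aged from creation. Running runs are aged from the
    // (assumed) start timestamp, falling back to creation if it is missing.
    $since = $status === 'running' ? ($startedAt ?? $createdAt) : $createdAt;

    $ageMinutes = intdiv($now->getTimestamp() - $since->getTimestamp(), 60);

    return $ageMinutes >= $thresholds[$status] ? 'stuck_' . $status : null;
}
```

Passing `$now` explicitly keeps the classification deterministic in tests, and the same thresholds could drive the `/system/ops/stuck` prefilter as a DB-level `WHERE` clause so list pages stay DB-only.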