# TenantPilot Performance Guidelines
Status: 2026-05-15

Applies to: Laravel 12, Filament 5, Livewire 4, PostgreSQL 16, Microsoft Graph.
## Performance Target
TenantPilot should keep interactive admin requests short and move remote, large, retryable, or long-running work into queued operations with visible `OperationRun` state.
## Current Performance Risks
| Risk | Evidence | Priority | Mitigation |
|---|---|---:|---|
| Queryable payloads still in `json` | policy versions, backup items, restore runs, audit logs | P1 | Convert to JSONB where queried; add targeted GIN/expression indexes. |
| Large Filament pages/resources | 1,000-5,700 LOC classes | P1 | Extract tables/actions and review N+1 risks per surface. |
| Database queue for all work | `.env.example` and queue config | P2 | Move high-volume Graph/restore work to a Redis queue when load grows. |
| Dashboard/widget query cost | multiple KPI/list widgets | P2 | Cache or precompute expensive aggregate metrics. |
| Graph throttling | Microsoft Graph 429/503 behavior | P1 | Honor `Retry-After`, use exponential backoff with jitter, avoid polling. |
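
The Graph throttling mitigation in the last row can be sketched as a delay policy. This is a minimal, language-neutral illustration in Python, not TenantPilot code: the function name `retry_delay` and the `base`/`cap` defaults are assumptions, and it assumes `Retry-After` arrives as a number of seconds rather than an HTTP date.

```python
import random


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before the next retry (attempt is 0-based).

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if retry_after is not None:
        # A server-provided Retry-After takes precedence over our own schedule.
        return max(0.0, float(retry_after))
    # Exponential growth, bounded by `cap`.
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": pick uniformly below the ceiling so that many workers
    # throttled at the same moment do not all retry at the same moment.
    return rng() * ceiling
```

Injecting `rng` keeps the policy deterministic in tests while production code keeps the random default.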
## Synchronous vs Asynchronous
Keep synchronous:
- Rendering Filament pages.
- Validating form/action input.
- Creating operation intent records.
- Small DB-only state transitions.
- Showing preview summaries from already persisted data.

Move asynchronous:
- Microsoft Graph reads/writes.
- Backup set item capture.
- Restore execution.
- Bulk export/import.
- Compliance/evidence snapshots.
- Long report generation.
- Notification delivery retries.
- Any workflow likely to take longer than roughly 2-5 seconds.
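
The split above can be sketched as two code paths: the request records intent and returns, and a worker does the slow part. This is a minimal in-memory illustration in Python, not the real `OperationRun` model; the status names, the `start_operation`/`work_one` helpers, and the in-process queue are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from itertools import count


@dataclass
class OperationRun:
    """Stand-in for a persisted run record (fields are hypothetical)."""
    id: int
    kind: str
    status: str = "queued"


_ids = count(1)
queue = deque()   # stand-in for a real queue backend
runs = {}         # stand-in for the database table


def start_operation(kind):
    """Request path: record intent, enqueue, return immediately."""
    run = OperationRun(id=next(_ids), kind=kind)
    runs[run.id] = run
    queue.append(run.id)
    return run


def work_one(handler):
    """Worker path: execute the slow part and keep the run state visible."""
    run = runs[queue.popleft()]
    run.status = "running"
    try:
        handler(run)
        run.status = "succeeded"
    except Exception:
        run.status = "failed"
    return run
```

The request path never calls `handler`, so its duration does not depend on Graph latency or payload size.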
## Filament Table Rules
- Always define a default sort.
- Eager-load relationships used by visible columns.
- Use `withCount()`/aggregate subqueries instead of per-row counts.
- Hide technical columns by default.
- Persist table state (filters, search, sort) in the session only on investigative resources.
- Avoid computed columns that perform per-row service calls.
- Avoid Graph calls during table render.
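
To illustrate the per-row count rule, the sketch below fetches a page of rows with their counts in a single query. It uses Python with an in-memory SQLite database purely as a stand-in; the table and column names are hypothetical, and the SQL only approximates the correlated subquery that `withCount()` produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE policies (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE policy_versions (id INTEGER PRIMARY KEY, policy_id INTEGER);
    INSERT INTO policies (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO policy_versions (policy_id) VALUES (1), (1), (2);
""")


def rows_with_counts(conn):
    """One round trip for the whole page; the database computes each count."""
    return conn.execute("""
        SELECT p.id,
               p.name,
               (SELECT COUNT(*)
                  FROM policy_versions v
                 WHERE v.policy_id = p.id) AS versions_count
          FROM policies p
         ORDER BY p.name
    """).fetchall()
```

The anti-pattern this replaces is selecting the policies first and then issuing one `COUNT(*)` query per rendered row.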
## Database Rules
- Prefer `jsonb` for raw Graph snapshots, backup payloads, restore previews/results, evidence summaries, and audit metadata that must be queried.
- Add GIN indexes only when a query path exists; prefer expression indexes for common JSON paths.
- Add composite indexes for workspace/tenant/time/status list filters.
- Add partial unique indexes for active run/idempotency constraints.
- Keep migrations incremental and reversible where practical.
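
The partial unique index rule can be demonstrated with a small sketch. It runs against in-memory SQLite from Python only so that it is self-contained; PostgreSQL accepts the same `CREATE UNIQUE INDEX ... WHERE` form. The table, column, index, and status names are assumptions, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE operation_runs (
        id        INTEGER PRIMARY KEY,
        tenant_id INTEGER NOT NULL,
        kind      TEXT    NOT NULL,
        status    TEXT    NOT NULL
    );
    -- Uniqueness applies only to rows that are still active, so finished
    -- runs of the same kind can accumulate without conflict.
    CREATE UNIQUE INDEX operation_runs_one_active
        ON operation_runs (tenant_id, kind)
        WHERE status IN ('queued', 'running');
""")


def try_start(conn, tenant_id, kind):
    """Insert a queued run; return False if an active one already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO operation_runs (tenant_id, kind, status) "
                "VALUES (?, ?, 'queued')",
                (tenant_id, kind),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Letting the database reject the duplicate avoids a check-then-insert race between two concurrent requests.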
## Queue Strategy
MVP:
- Database queue is acceptable for local and low-volume staging.
- Jobs must be idempotent and observable.
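- Worker `timeout` must be several seconds lower than the queue connection's `retry_after`; otherwise a job that is still running can be released and picked up a second time.

Scale-up: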
- Move production queues to Redis.
- Split queues: `high`, `default`, `graph`, `restore`, `reports`, `notifications`.
- Run separate worker counts per queue.
- Use process supervision in Dokploy/container runtime.
- Restart/reload workers on every deploy.
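
Once queues are split, the timeout rule has to hold for each queue separately. A minimal sketch of a deploy-time check in Python follows; the `timing_problems` helper, the 5-second margin, and the configuration shape are all assumptions for illustration.

```python
def timing_problems(queues, margin=5):
    """Return the names of queues whose worker timeout is not safely
    below retry_after (hypothetical check, values in seconds)."""
    problems = []
    for name, cfg in sorted(queues.items()):
        # The timeout plus a safety margin must fit inside retry_after,
        # or a long-running job may be handed to a second worker.
        if cfg["timeout"] + margin > cfg["retry_after"]:
            problems.append(name)
    return problems
```

For example, a `restore` queue configured with `timeout` 600 and `retry_after` 600 would be reported, while a `graph` queue with 120 and 150 would pass.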
## Caching Strategy
- Cache stable config-derived capability maps.
- Cache dashboard aggregates only when invalidation is clear.
- Do not cache tenant authorization decisions across membership changes unless invalidation is proven.
- Avoid caching raw Graph secrets or token payloads.
- Use Redis for locks and cache in production when queue/scheduler scale increases.
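
"Cache only when invalidation is clear" can be read as: every cached aggregate has a bounded lifetime and an explicit invalidation hook that the write path calls. The Python sketch below shows that shape with an in-process dictionary; the `AggregateCache` class and the key layout are hypothetical, and a production setup would sit on a shared store instead.

```python
import time


class AggregateCache:
    """Tiny TTL cache with explicit invalidation (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, compute):
        """Return the cached value, recomputing if missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Called by the write path when the underlying data changes."""
        self._entries.pop(key, None)
```

Scoping keys by workspace, for example `("workspace", 7, "failed_runs")`, keeps one tenant's invalidation from evicting or leaking another's aggregate.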
## Monitoring Metrics
- HTTP p50/p95/p99 response time by route/panel.
- Livewire request duration and error rate.
- DB query count and slow queries by page/action.
- Queue depth, job latency, failures, retries, max runtime.
- Scheduler last-success timestamp per scheduled command.
- Graph 429/503 count, retry-after seconds, retry exhaustion.
- OperationRun created/running/failed/partial counts.
- Audit log write failures.
- Backup/restore duration and item failure rate.
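
For the latency metrics, the percentile calculation itself is small. The sketch below uses the nearest-rank definition in Python; it is one of several common percentile definitions, and monitoring backends may interpolate differently, so treat it as an illustration rather than the method any particular tool uses.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("percentile of an empty sample set is undefined")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that p=0 returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]
```

Reporting p95 and p99 next to p50 matters because a small number of slow Livewire or Graph-backed requests can hide behind a healthy median.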
## Load Test Recommendations
- List 10k policies and 100k policy versions per workspace.
- Render backup and restore tables with 50k backup items.
- Simulate concurrent backup schedule runs for multiple tenants.
- Simulate Graph 429/503 responses and verify retry/backoff budgets.
- Exercise dashboard widgets with realistic operation/finding history.
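
The throttling scenario can be exercised without calling Graph by feeding scripted responses into the retry loop and asserting on attempts and total wait. The Python sketch below is a test-harness illustration only: `call_with_retries`, the `(status, retry_after)` tuple shape, and the defaults are assumptions, and jitter is left out so the wait budget is deterministic.

```python
def call_with_retries(send, max_attempts=4, base=1.0, cap=30.0, sleep=lambda s: None):
    """Call send() until it stops returning 429/503 or attempts run out.

    send() returns (status, retry_after_seconds_or_None).
    Returns (final_status, attempts_used, total_seconds_waited).
    """
    waited = 0.0
    status = None
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status not in (429, 503):
            return status, attempt + 1, waited
        if attempt == max_attempts - 1:
            break  # budget exhausted; do not sleep after the last attempt
        # Prefer the server hint; otherwise fall back to capped doubling.
        delay = retry_after if retry_after is not None else min(cap, base * 2 ** attempt)
        sleep(delay)
        waited += delay
    return status, max_attempts, waited
```

A scripted sequence such as 429 with `Retry-After: 2`, then 503, then 200 lets a test assert both that the call eventually succeeds and that the total wait stays inside the intended budget.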