TenantPilot Performance Guidelines

Status: 2026-05-15

Applies to: Laravel 12, Filament 5, Livewire 4, PostgreSQL 16, Microsoft Graph.

Performance Target

TenantPilot should keep interactive admin requests short and move remote, large, retryable, or long-running work into queued operations with visible OperationRun state.

Current Performance Risks

| Risk | Evidence | Priority | Mitigation |
| --- | --- | --- | --- |
| Queryable payloads still in json | policy versions, backup items, restore runs, audit logs | P1 | Convert to JSONB where queried; add targeted GIN/expression indexes. |
| Large Filament pages/resources | 1,000-5,700 LOC classes | P1 | Extract tables/actions and review N+1 risks per surface. |
| Database queue for all work | .env.example and queue config | P2 | Move high-volume Graph/restore work to Redis queue when load grows. |
| Dashboard/widget query cost | multiple KPI/list widgets | P2 | Cache or precompute expensive aggregate metrics. |
| Graph throttling | Microsoft Graph 429/503 behavior | P1 | Honor Retry-After, use exponential backoff with jitter, avoid polling. |
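
The Graph throttling mitigation (honor Retry-After, exponential backoff with jitter) can be sketched as a small delay calculation. This is a language-neutral illustration written in Python, not TenantPilot code: the function name `retry_delay_seconds`, the 1 second base, and the 60 second cap are assumptions, and it expects the Retry-After header to have been parsed into seconds already.

```python
import random


def retry_delay_seconds(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Return the wait in seconds before retry number `attempt` (0-based).

    `base`, `cap`, and the function name are hypothetical, chosen for illustration.
    `retry_after` is the server's Retry-After value in seconds, if one was sent.
    `rng` returns a float in [0, 1); it is injectable so the result can be tested.
    """
    # Exponential ceiling: base * 2^attempt, never above the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # "Full jitter": pick a random point below the ceiling so that many
    # throttled jobs do not all retry at the same instant.
    jittered = rng() * ceiling
    # A server-provided Retry-After is treated as a lower bound on the wait.
    if retry_after is not None:
        return max(float(retry_after), jittered)
    return jittered
```

Treating Retry-After as a floor rather than adding jitter on top of it is one possible design choice; either way, the wait should never be shorter than what the server asked for.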

Synchronous vs Asynchronous

Keep synchronous:

  • Rendering Filament pages.
  • Validating form/action input.
  • Creating operation intent records.
  • Small DB-only state transitions.
  • Showing preview summaries from already persisted data.

Move to asynchronous (queued) operations:

  • Microsoft Graph reads/writes.
  • Backup set item capture.
  • Restore execution.
  • Bulk export/import.
  • Compliance/evidence snapshots.
  • Long report generation.
  • Notification delivery retries.
  • Any workflow likely to exceed 2-5 seconds.

Filament Table Rules

  • Always define a default sort.
  • Eager-load relationships used by visible columns.
  • Use withCount()/aggregate subqueries instead of per-row counts.
  • Hide technical columns by default.
  • Use session persistence only on investigative resources.
  • Avoid computed columns that perform per-row service calls.
  • Avoid Graph calls during table render.

Database Rules

  • Prefer jsonb for raw Graph snapshots, backup payloads, restore previews/results, evidence summaries, and audit metadata that must be queried.
  • Add GIN indexes only when a query path exists; prefer expression indexes for common JSON paths.
  • Add composite indexes for workspace/tenant/time/status list filters.
  • Add partial unique indexes for active run/idempotency constraints.
  • Keep migrations incremental and reversible where practical.

Queue Strategy

MVP:

  • Database queue is acceptable for local and low-volume staging.
  • Jobs must be idempotent and observable.
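  • Worker timeout must be lower than the queue connection's retry_after; otherwise a job that is still running can be released and processed a second time.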
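
The timeout rule above lends itself to an automated check, for example in a deploy or CI step. The following is a minimal sketch in Python rather than project code: the function name `check_worker_timeout` and the 5 second safety margin are assumptions, and the two values would have to be read from the real worker and queue configuration.

```python
def check_worker_timeout(timeout_seconds, retry_after_seconds, margin_seconds=5):
    """Return a list of problems; an empty list means the pair looks safe.

    The name and the default margin are hypothetical, chosen for illustration.
    """
    problems = []
    if timeout_seconds >= retry_after_seconds:
        # The queue would hand the job to a second worker while the
        # first one may still be running it.
        problems.append("worker timeout must be lower than retry_after")
    elif retry_after_seconds - timeout_seconds < margin_seconds:
        # Technically valid, but leaves little room for shutdown and cleanup.
        problems.append("worker timeout is too close to retry_after")
    return problems
```

For example, `check_worker_timeout(60, 90)` returns an empty list, while `check_worker_timeout(120, 90)` reports the violation.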

Scale-up:

  • Move production queues to Redis.
  • Split queues: high, default, graph, restore, reports, notifications.
  • Run separate worker counts per queue.
  • Use process supervision in Dokploy/container runtime.
  • Restart/reload workers on every deploy.

Caching Strategy

  • Cache stable config-derived capability maps.
  • Cache dashboard aggregates only when invalidation is clear.
  • Do not cache tenant authorization decisions across membership changes unless invalidation is proven.
  • Avoid caching raw Graph secrets or token payloads.
  • Use Redis for locks and cache in production when queue/scheduler scale increases.

Monitoring Metrics

  • HTTP p50/p95/p99 response time by route/panel.
  • Livewire request duration and error rate.
  • DB query count and slow queries by page/action.
  • Queue depth, job latency, failures, retries, max runtime.
  • Scheduler last-success timestamp per scheduled command.
  • Graph 429/503 count, retry-after seconds, retry exhaustion.
  • OperationRun created/running/failed/partial counts.
  • Audit log write failures.
  • Backup/restore duration and item failure rate.
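
For the p50/p95/p99 response time metric, the sketch below shows one common definition, the nearest-rank percentile, computed over raw duration samples. It is an illustration in Python under stated assumptions: the names `percentile` and `latency_summary` are hypothetical, and a real monitoring backend may use a different interpolation method or histogram buckets.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so very small p still returns the minimum.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarize request durations (milliseconds) as p50, p95, and p99."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

Grouping the samples by route or panel before calling `latency_summary` gives the per-route breakdown listed above.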

Load Test Recommendations

  • List 10k policies and 100k policy versions per workspace.
  • Render backup and restore tables with 50k backup items.
  • Simulate concurrent backup schedule runs for multiple tenants.
  • Simulate Graph 429/503 responses and verify retry/backoff budgets.
  • Exercise dashboard widgets with realistic operation/finding history.