TenantAtlas/docs/audits/2026-03-15-audit-spec-candidates.md

# Audit-Derived Spec Candidates
**Date:** 2026-03-15
**Source audit:** [docs/audits/tenantpilot-architecture-audit-constitution.md](docs/audits/tenantpilot-architecture-audit-constitution.md) plus first-pass repo scan driven by [.github/prompts/tenantpilot.audit.prompt.md](.github/prompts/tenantpilot.audit.prompt.md)
## Goal
Translate the first architecture and safety audit findings into a small number of high-value follow-up specs.
These are intentionally **problem-cluster specs**, not bug tickets. Each candidate groups multiple symptoms under one architectural diagnosis so the repo does not fragment into dozens of local fixes.
## Recommended Order
1. Spec 144: queued execution reauthorization and scope continuity
2. Spec 145: tenant-owned query canon and wrong-tenant regression guards
3. Spec 146: findings workflow enforcement and audit backstop
4. Spec 147: Livewire context locking and trusted-state reduction
## Candidate 144
### Proposed slug
`144-queued-execution-reauthorization-scope-continuity`
### Architectural diagnosis
Queued work currently relies too heavily on the authorization and tenant state that existed at dispatch time. That is acceptable for UX initiation, but not as the final trust boundary for execution.
### Primary findings covered
- Job authorization revalidation is incomplete.
- Jobs resolve tenant records by ID without a canonical execution-time scope recheck.
- Execution truth is tracked by `OperationRun`, but execution legitimacy is not revalidated as a first-class concern.
### Why this is not already covered
- [specs/110-ops-ux-enforcement/spec.md](specs/110-ops-ux-enforcement/spec.md) standardizes `OperationRun` lifecycle and notifications, not actor reauthorization or tenant lifecycle rechecks during execution.
- [specs/143-tenant-lifecycle-operability-context-semantics/spec.md](specs/143-tenant-lifecycle-operability-context-semantics/spec.md) hardens run viewing semantics, not job execution semantics.
- [specs/049-backup-restore-job-orchestration/spec.md](specs/049-backup-restore-job-orchestration/spec.md) predates the newer constitutional model and is too narrow.
### Scope
- queued jobs that mutate tenant-owned data or trigger provider work
- execution middleware and shared job execution helpers
- actor snapshot versus current authorization semantics
- tenant archived, deleted, or disabled execution handling
- auditable execution denial or cancellation outcomes
### Must-answer questions
- What is the canonical execution identity: original actor, current actor capabilities, or system-owned delegated authority?
- What should happen if the tenant is archived, detached, or otherwise no longer executable when the job starts?
- Which failures are terminal authorization failures versus retryable precondition failures?
- How should denial-at-execution be represented in `OperationRun` and `AuditLog`?
### Expected requirements shape
- Define a canonical execution revalidation contract for queued jobs.
- Require tenant existence and tenant operability checks before side effects.
- Require capability revalidation for human-initiated jobs where authority is expected to remain actor-bound.
- Define when a job may continue under system authority and when it must fail closed.
- Add regression coverage for role downgrade, tenant archival, tenant deletion, and stale actor context.
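
As a rough illustration of the revalidation contract these requirements point at, the sketch below re-checks tenant existence, tenant operability, and actor capability when the job starts rather than when it was dispatched. It is a language-neutral sketch in Python, not the repo's PHP code. `ExecutionContext`, `revalidate`, the `Denial` reason codes, and the capability string are hypothetical names chosen for the example, and the decision to let system-owned jobs skip the capability check is an assumption that the spec would still have to settle.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Denial(Enum):
    # Hypothetical reason codes; the spec would define the real ones.
    TENANT_MISSING = "tenant_missing"
    TENANT_NOT_OPERABLE = "tenant_not_operable"
    ACTOR_DEAUTHORIZED = "actor_deauthorized"


@dataclass(frozen=True)
class Tenant:
    id: int
    operable: bool  # False once archived, disabled, or detached


@dataclass(frozen=True)
class ExecutionContext:
    tenant: Optional[Tenant]         # looked up again when the job starts
    required_capability: str
    current_capabilities: frozenset  # the actor's capabilities now, not at dispatch
    system_authority: bool = False   # job runs under delegated system authority


def revalidate(ctx: ExecutionContext) -> Optional[Denial]:
    """Return None if the job may proceed, else the reason it must fail closed."""
    if ctx.tenant is None:
        return Denial.TENANT_MISSING
    if not ctx.tenant.operable:
        return Denial.TENANT_NOT_OPERABLE
    # Actor-bound jobs re-check capabilities; system-owned jobs skip this step.
    if not ctx.system_authority and ctx.required_capability not in ctx.current_capabilities:
        return Denial.ACTOR_DEAUTHORIZED
    return None
```

A job would call `revalidate` before any side effect and, on a non-`None` result, record the reason code on its run and audit entry instead of mutating tenant-owned data.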
### Suggested success criteria
- No in-scope job mutates tenant-owned state after tenant archival or actor deauthorization.
- Execution-denied jobs produce a deterministic `OperationRun` outcome and auditable reason codes.
- Focused regression coverage exists for each representative operation family.
### Delivery recommendation
Dedicated spec required.
## Candidate 145
### Proposed slug
`145-tenant-owned-query-canon-and-wrong-tenant-guards`
### Architectural diagnosis
Tenant isolation is mostly implemented, but the query layer remains too ad hoc. The repo relies on many repeated `where('tenant_id', ...)` patterns across resources, widgets, and pages. That creates drift risk and weakens systematic negative testing.
### Primary findings covered
- Policy queries and similar surfaces use ad hoc tenant scoping instead of canonical model-level resolution.
- Wrong-tenant regression coverage is uneven across resources, actions, detail pages, and bulk operations.
- The system has constitutional isolation rules, but not yet a sufficiently reusable query canon plus guardrail suite.
### Why this is not already covered
- [specs/135-canonical-tenant-context-resolution/spec.md](specs/135-canonical-tenant-context-resolution/spec.md) and [specs/136-admin-canonical-tenant/spec.md](specs/136-admin-canonical-tenant/spec.md) focus on resolving the correct active tenant context on admin and monitoring surfaces.
- They do not define a tenant-owned model query contract or a repeatable wrong-tenant regression matrix for resources and actions.
### Scope
- tenant-owned Eloquent models and shared scoping helpers
- Filament resource queries, widgets, relation managers, and sensitive actions
- route-model lookup hardening where tenant-owned records are resolved
- regression tests for wrong-tenant index, detail, row action, and bulk action paths
### Must-answer questions
- Is the canonical pattern a shared trait, explicit local scopes, or a resolver-backed query helper?
- Which model families are officially tenant-owned and therefore in scope for mandatory canonical query helpers?
- Which action classes count as mandatory wrong-tenant regression surfaces?
- Where should 404 versus 403 semantics be asserted in tests for members versus non-members?
### Expected requirements shape
- Define a canonical query entry pattern for tenant-owned models.
- Ban free-form tenant scoping on defined sensitive model families except for documented edge cases.
- Require focused wrong-tenant regression coverage on key resources and sensitive actions.
- Add a lightweight static or grep-style guard where it is safe and low-noise.
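
One possible shape for a canonical query entry point is sketched below: the tenant is bound once, and every read goes through that binding instead of a per-page filter. This is a language-neutral sketch in Python over an in-memory list, not Eloquent code. `TenantScopedQuery`, `Policy`, and `RecordNotFound` are hypothetical names, and treating a foreign-tenant record as "not found" is only one candidate answer to the 404 versus 403 question above, not a decision.

```python
from dataclasses import dataclass
from typing import Iterable, List


class RecordNotFound(LookupError):
    """Raised for missing and foreign-tenant records alike (an assumption of this sketch)."""


@dataclass(frozen=True)
class Policy:
    id: int
    tenant_id: int
    name: str


class TenantScopedQuery:
    """Single entry point: every read is bound to one tenant at construction time."""

    def __init__(self, rows: Iterable[Policy], tenant_id: int) -> None:
        # Filter once, so no later call can forget the tenant constraint.
        self._rows = [row for row in rows if row.tenant_id == tenant_id]

    def all(self) -> List[Policy]:
        return list(self._rows)

    def find_or_fail(self, record_id: int) -> Policy:
        for row in self._rows:
            if row.id == record_id:
                return row
        raise RecordNotFound(record_id)
```

A wrong-tenant regression test then reduces to asserting that `find_or_fail` raises for a record ID that exists but belongs to another tenant.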
### Suggested success criteria
- All tier-1 tenant-owned resources use canonical query helpers instead of ad hoc per-page filtering.
- Focused wrong-tenant regression coverage exists for table rows, detail pages, and bulk actions on tier-1 surfaces.
- No newly introduced sensitive action can execute against a record from a foreign tenant context.
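
The grep-style guard mentioned under the expected requirements could be as small as the following sketch, which flags free-form `where('tenant_id', ...)` calls outside an allowlist of documented edge cases. It is written in Python for illustration and works on file contents passed in as strings. The function name, the allowlist mechanism, and the exact regular expression are assumptions, and a real guard would need tuning to stay low-noise.

```python
import re
from typing import Dict, List, Tuple

# Matches where('tenant_id' or where("tenant_id", whether called via -> or ::.
AD_HOC_SCOPE = re.compile(r"""\bwhere\(\s*['"]tenant_id['"]""")


def find_ad_hoc_scoping(
    sources: Dict[str, str],
    allowlist: frozenset = frozenset(),
) -> List[Tuple[str, int]]:
    """Return (path, line_number) for each free-form tenant_id filter outside the allowlist."""
    hits = []
    for path, text in sorted(sources.items()):
        if path in allowlist:
            continue  # documented edge case, deliberately exempt
        for number, line in enumerate(text.splitlines(), start=1):
            if AD_HOC_SCOPE.search(line):
                hits.append((path, number))
    return hits
```

Run in CI over the tier-1 directories, a non-empty result would fail the build and point at the exact line to migrate.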
### Delivery recommendation
Dedicated spec required.
## Candidate 146
### Proposed slug
`146-findings-workflow-enforcement-and-audit-backstop`
### Architectural diagnosis
The findings workflow spec is strong at the product level, but the enforcement model is still too soft. Transition validity and auditability depend too much on service-path discipline instead of being impossible to bypass.
### Primary findings covered
- Critical status transitions lack centralized enforcement.
- Audit trail for finding status transitions depends on `FindingWorkflowService` invocation, not a model-level or domain-level backstop.
- Direct status mutation remains possible in principle even though the intended workflow path is specified.
### Why this is not already covered
- [specs/111-findings-workflow-sla/spec.md](specs/111-findings-workflow-sla/spec.md) defines allowed transitions and expected audit behavior, but does not yet settle the enforcement mechanism strongly enough.
- The audit finding is not about missing product semantics; it is about missing architectural hardening of those semantics.
### Scope
- `Finding` transition enforcement model
- allowed versus forbidden mutations outside workflow services
- audit backstop for all meaningful finding lifecycle changes
- recurrence, reopen, auto-resolve, and direct update edge cases
- tests for invalid transition attempts and missing-audit bypass attempts
### Must-answer questions
- Is the source of truth a formal state machine, model guard, custom cast, or service-only write API with hard enforcement?
- Which model events are safe to use as an audit backstop, and which are too implicit to carry domain truth?
- How should auto-resolve and recurrence semantics interact with the same transition gate?
- Do other lifecycle-heavy models need the same pattern once Findings is hardened?
### Expected requirements shape
- Formalize transition enforcement, not just transition documentation.
- Require all meaningful status mutations to pass through a canonical transition gateway.
- Add an audit backstop so status-changing writes cannot silently escape history.
- Add negative tests for forbidden transitions, bypass attempts, and recurrence edge cases.
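
To make the idea of a canonical transition gateway with an audit backstop concrete, here is a minimal sketch in Python, not the repo's PHP code. The status names and the `ALLOWED` table are hypothetical placeholders, since the real transitions are defined in Spec 111, and `Finding`, `transition`, and `InvalidTransition` are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical statuses and transitions; the real table belongs to Spec 111.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "new": frozenset({"triaged", "resolved"}),
    "triaged": frozenset({"in_progress", "resolved"}),
    "in_progress": frozenset({"resolved"}),
    "resolved": frozenset({"reopened"}),
    "reopened": frozenset({"in_progress", "resolved"}),
}


class InvalidTransition(ValueError):
    pass


@dataclass
class Finding:
    id: int
    _status: str = "new"
    audit: List[Tuple[str, str, str]] = field(default_factory=list)

    @property
    def status(self) -> str:
        # Read-only view: there is no public setter that could skip the gate.
        return self._status

    def transition(self, to: str, actor: str) -> None:
        """The only write path: validate, record the audit entry, then change status."""
        if to not in ALLOWED.get(self._status, frozenset()):
            raise InvalidTransition(f"{self._status} -> {to}")
        self.audit.append((self._status, to, actor))
        self._status = to
```

Because the audit entry is written inside the same method that changes the status, a caller cannot get one without the other. Python only protects `_status` by convention, so a real implementation would need the stronger enforcement the must-answer questions ask about.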
### Suggested success criteria
- No invalid finding status transition is possible through service, model, action, or direct update path in covered flows.
- Every meaningful finding lifecycle mutation is auditable.
- Regression tests fail when a bypass path is introduced.
### Delivery recommendation
Dedicated spec required.
## Candidate 147
### Proposed slug
`147-livewire-context-locking-and-trusted-state-reduction`
### Architectural diagnosis
Some complex Livewire and Filament flows still expose too much context identity in public component state. This is not necessarily a current exploit, but it leaves the repo dependent on convention instead of a hardened state model.
### Primary findings covered
- Serializable tenant and workspace context exists in public component state without a strong, explicit locking pattern.
- Workflow continuity in complex wizard flows still depends partly on client-visible identifiers.
- The audit constitution says public state is untrusted, but the repo lacks one reusable hardening standard for these flows.
### Why this is not already covered
- [specs/138-managed-tenant-onboarding-draft-identity/spec.md](specs/138-managed-tenant-onboarding-draft-identity/spec.md) improves draft identity and resume semantics.
- [specs/140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp/spec.md](specs/140-onboarding-lifecycle-operation-checkpoints-concurrency-mvp/spec.md) improves lifecycle truth.
- Neither spec defines a repo-wide Livewire state-safety pattern for locked IDs, session-derived context, and server-side revalidation across complex components.
### Scope
- onboarding wizard
- restore and other multi-step resources with public IDs
- public Livewire properties carrying tenant, workspace, provider, or foreign record references
- `#[Locked]` and equivalent trust-reduction patterns
- forged-state and mutated-ID regression tests
### Must-answer questions
- Which IDs may remain public but locked, and which should disappear entirely from public component state?
- What is the canonical source of truth for active tenant and workspace in component actions: session, route record, shell context, or persisted draft?
- How should forged-state detection fail: 404, 403, validation error, or forced context reset?
- Which component families are tier-1 and must comply first?
### Expected requirements shape
- Define a trusted-state reduction standard for Livewire and Filament components.
- Require explicit locking or server-side derivation for ownership-relevant identifiers.
- Ban direct trust in mutable client-provided context IDs.
- Add forged-state regression coverage for tier-1 wizard and resource flows.
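
The sketch below illustrates the trust-reduction idea in a language-neutral way: ownership-relevant identifiers may be visible to the client, but any client payload that tries to change them is rejected as a whole. This is roughly the guarantee that Livewire's `#[Locked]` attribute is meant to provide. The Python class, its field names, and the exception type are hypothetical and do not mirror any component in the repo.

```python
from typing import Any, Dict, FrozenSet


class LockedPropertyTampered(Exception):
    """Raised when a client update touches a locked property."""


class WizardState:
    # Ownership-relevant identifiers the client may see but must never change.
    LOCKED: FrozenSet[str] = frozenset({"tenant_id", "workspace_id"})
    # Properties the client is allowed to update.
    MUTABLE: FrozenSet[str] = frozenset({"step"})

    def __init__(self, tenant_id: int, workspace_id: int, step: int = 1) -> None:
        self.tenant_id = tenant_id
        self.workspace_id = workspace_id
        self.step = step

    def apply_client_update(self, payload: Dict[str, Any]) -> None:
        """Validate the whole payload first, then apply it; nothing is applied on failure."""
        tampered = self.LOCKED.intersection(payload)
        if tampered:
            raise LockedPropertyTampered(", ".join(sorted(tampered)))
        unknown = set(payload) - self.MUTABLE
        if unknown:
            raise KeyError(", ".join(sorted(unknown)))
        for name, value in payload.items():
            setattr(self, name, value)
```

A forged-state regression test would submit a payload that mixes a legitimate field with a mutated `tenant_id` and assert that neither change is applied.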
### Suggested success criteria
- Tier-1 components no longer rely on mutable public tenant or workspace identifiers for authorization-sensitive decisions.
- Forged-state tests fail closed in all covered surfaces.
- A reusable pattern exists for future Livewire components.
### Delivery recommendation
Dedicated spec required.
## Deferred Candidate
### OperationRun result referential integrity
This remains important, but I would not pull it into the first wave ahead of the four candidates above.
Reason:
- It is real architecture debt.
- It touches auditability and forensic traceability.
- But it is less immediately exploitable than execution reauthorization, query canon drift, workflow bypass, or mutable Livewire context.
Recommended treatment:
- Track as a likely follow-up to [specs/134-audit-log-foundation/spec.md](specs/134-audit-log-foundation/spec.md) and the existing operations lineage work.
- Reassess after Candidate 144 and Candidate 146, because those may clarify the right run-to-artifact integrity model.
## Recommendation
If only one spec is started now, start with **Spec 144**.
Why:
- It is the cleanest gap between current constitutional claims and actual backend execution trust.
- It cuts across restore, sync, and other high-value operations.
- It prevents the class of bug where the UI was right at dispatch time but wrong at execution time.
If two specs are started in parallel, pair **144** with **146**:
- 144 hardens execution trust.
- 146 hardens lifecycle truth and auditability.
That combination improves the backend trust model faster than a purely UI- or test-first follow-up.