TenantAtlas/specs/194-governance-friction-hardening/research.md
ahmido acc8947384 feat: harden governance action semantics (#229)
## Summary
- add the Spec 194 governance action catalog, friction classes, reason policies, and regression guards
- align exception, review, evidence, finding, tenant, provider connection, and system run actions to the shared semantics model
- add focused feature, RBAC, audit, unit, and browser coverage, including the tenant detail triage header consistency update

## Verification
- ran the focused Spec 194 verification pack from the quickstart and task plan
- ran targeted tenant triage coverage after the detail-header update
- ran `cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent`

## Filament Notes
- Filament v5 / Livewire v4 compliance preserved
- provider registration remains in `apps/platform/bootstrap/providers.php`
- globally searchable resources were not changed
- destructive actions remain confirmation-gated and server-authorized
- no new Filament assets were introduced; the existing `cd apps/platform && php artisan filament:assets` deploy step stays unchanged

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #229
2026-04-12 21:21:44 +00:00


# Research: Governance Friction Hardening and Operator Vocabulary
## Decision: Introduce one narrow governance-action catalog instead of a new governance workflow framework
### Rationale
Spec 194 needs one project-wide, testable source for friction class, reason policy, danger expectation, and canonical vocabulary across actions that already exist on multiple surfaces. The repo already has several concrete governance families: exception decisions, review lifecycle, evidence lifecycle, run triage, finding lifecycle, and tenant lifecycle. That is enough real variance to justify one small derived catalog, but not a new runtime workflow engine.
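
As a rough illustration of what "one small derived catalog" could look like, the sketch below models an action entry with the four attributes named above. It is written in Python purely for illustration. All type names, keys, and class assignments are hypothetical; this document names only the `F0` and `F1` friction classes, so `F2` and `F3` are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class FrictionClass(Enum):
    # Only F0 and F1 appear in this document; F2 and F3 are assumed here.
    F0 = 0
    F1 = 1
    F2 = 2
    F3 = 3


class ReasonPolicy(Enum):
    NONE = "none"
    OPTIONAL = "optional"
    REQUIRED = "required"


@dataclass(frozen=True)
class GovernanceAction:
    family: str
    key: str
    label: str                  # canonical operator-facing wording
    friction: FrictionClass
    reason_policy: ReasonPolicy
    destructive: bool           # danger expectation


# Hypothetical entries; the real assignments are decided by Spec 194.
CATALOG = (
    GovernanceAction("evidence", "refresh_evidence", "Refresh evidence",
                     FrictionClass.F1, ReasonPolicy.NONE, False),
    GovernanceAction("evidence", "expire_snapshot", "Expire snapshot",
                     FrictionClass.F3, ReasonPolicy.REQUIRED, True),
)


def find_action(key: str) -> GovernanceAction:
    """Look up a catalog entry by key; raises KeyError if it is not registered."""
    for action in CATALOG:
        if action.key == key:
            return action
    raise KeyError(key)
```

Because the entries are frozen values with no execution logic, a catalog of this shape stays a description of existing actions rather than a second way to run them.
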
### Alternatives considered
- Keep all semantics page-local and document them only in the spec: rejected because local copy and modal logic would drift again and CI could not enforce the rules.
- Build a full governance action framework with custom builders, registries, and resolvers: rejected because the repo only needs shared semantics, not a second execution engine.
## Decision: Keep existing mutation services and audit loggers as owners of state change
### Rationale
The current services already own the actual lifecycle mutation and most audit logging:
- `FindingExceptionService` for approve, reject, renew, revoke
- `TenantReviewLifecycleService` for publish and archive
- `EvidenceSnapshotService` for refresh and expire
- `OperationRunTriageService` for retry, cancel, and mark investigated
- `FindingWorkflowService` for close and reopen
- `TenantResource` lifecycle helpers plus `WorkspaceAuditLogger` for archive and restore
The narrowest correct implementation is to align UI semantics and extend service inputs or audit metadata only where Spec 194 requires stronger reason propagation.
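
A minimal sketch of "extend service inputs or audit metadata" follows, in Python for illustration only. The class and method names (`InMemoryAuditLog`, `ReviewLifecycle.archive`) and the `review.archived` event name are hypothetical stand-ins, not the signatures of the services listed above. The point is that the existing service keeps ownership of the mutation and simply accepts an optional reason that it forwards to the audit entry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InMemoryAuditLog:
    """Hypothetical stand-in for an audit logger; collects entries in a list."""
    entries: list = field(default_factory=list)

    def record(self, action: str, subject_id: int, metadata: dict) -> None:
        self.entries.append(
            {"action": action, "subject_id": subject_id, "metadata": metadata}
        )


class ReviewLifecycle:
    """Hypothetical service that owns both the state change and the audit write."""

    def __init__(self, audit: InMemoryAuditLog) -> None:
        self.audit = audit
        self.archived: set = set()

    def archive(self, review_id: int, reason: Optional[str] = None) -> None:
        self.archived.add(review_id)
        metadata = {}
        # Only record a reason when the operator actually supplied one.
        if reason is not None and reason.strip():
            metadata["reason"] = reason.strip()
        self.audit.record("review.archived", review_id, metadata)
```

Callers that do not pass a reason keep working unchanged, which is what keeps this change narrower than introducing a new service layer.
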
### Alternatives considered
- Move lifecycle mutations into a new shared governance service layer: rejected because it would duplicate working domain services and add coordination overhead without solving a new business problem.
- Keep reason capture only in UI and not in service-level inputs: rejected because Spec 194 requires reasons to remain audit-visible and not be purely presentational.
## Decision: Treat reason capture as a family contract, not a local modal choice
### Rationale
Current repo behavior is inconsistent:
- The exception family already captures a reason for all four actions (approve, reject, renew, revoke).
- Review `Publish` and `Archive` capture no reason.
- Evidence `Refresh` and `Expire` capture no reason.
- System run triage captures a reason only for `Mark investigated`, not for `Cancel`.
- Finding `Close` captures a reason, but `Reopen` does not.
- Tenant `Archive` and `Restore` capture no reason.
Spec 194 therefore must define reason policy by family and then drive the UI forms and service inputs from that rule.
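
One way to picture a single rule driving both the form and the service input is sketched below, in Python for illustration. The policy table, its values, and both function names are hypothetical; the real policies are whatever Spec 194 settles on.

```python
from typing import Optional

# Hypothetical policy table keyed by (family, action); values are assumptions.
REASON_POLICY = {
    ("finding", "close"): "required",
    ("finding", "reopen"): "required",
    ("evidence", "refresh"): "none",
}


def form_field(family: str, action: str) -> Optional[dict]:
    """Describe the reason field a modal should render, or None for no field."""
    policy = REASON_POLICY[(family, action)]
    if policy == "none":
        return None
    return {"name": "reason", "required": policy == "required"}


def validate_reason(family: str, action: str, reason: Optional[str]) -> Optional[str]:
    """Apply the same policy on the service side; returns the cleaned reason."""
    policy = REASON_POLICY[(family, action)]
    cleaned = (reason or "").strip()
    if policy == "none":
        return None
    if policy == "required" and not cleaned:
        raise ValueError(f"A reason is required for {family}.{action}")
    return cleaned or None
```

Since the UI and the service read the same table, a page cannot drop the reason field without the service-side check failing.
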
### Alternatives considered
- Leave reason capture to each page owner: rejected because it produced the current inconsistency.
- Force a reason on every action: rejected because it would over-harden F0 and F1 actions and reduce operator velocity without safety benefit.
## Decision: Distinguish technical refresh from formal governance lifecycle
### Rationale
The repo already shows that similarly placed actions do not have equivalent business meaning:
- `Refresh evidence` is operational regeneration of data.
- `Expire snapshot` formally invalidates a governance artifact.
- `Refresh review` is operational recomputation.
- `Publish review` is a formal release step.
- `Retry` is follow-up work.
- `Cancel` is a stronger intervention.
Spec 194 should therefore classify by business impact, not by whether the action appears in a header or uses the same Filament primitive.
### Alternatives considered
- Classify by surface location: rejected because the same family appears on queue, detail, workspace, and system pages.
- Classify by current button color: rejected because current color usage is part of the inconsistency.
## Decision: Use canonical operator vocabulary per family and prohibit casual synonyms
### Rationale
The same domain effect should not oscillate between verbs. The current repo already has stable families that can be hardened:
- `Approve / Reject`
- `Renew exception / Revoke exception`
- `Publish review / Archive review / Create next review`
- `Refresh evidence / Expire snapshot`
- `Close / Reopen`
- `Retry / Cancel / Mark investigated`
- `Archive / Restore`
Spec 194 should preserve those families and use them consistently in action labels, modal headings, notifications, and audit wording.
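
A vocabulary guard for these families can be as small as a set-membership check. The sketch below, in Python for illustration, uses the labels listed above as the canonical set; the function name and the sample input are hypothetical.

```python
# Canonical labels, taken from the families listed above.
CANONICAL_LABELS = frozenset({
    "Approve", "Reject",
    "Renew exception", "Revoke exception",
    "Publish review", "Archive review", "Create next review",
    "Refresh evidence", "Expire snapshot",
    "Close", "Reopen",
    "Retry", "Cancel", "Mark investigated",
    "Archive", "Restore",
})


def noncanonical_labels(labels):
    """Return, in sorted order, every label that is not in the canonical set."""
    return sorted(label for label in labels if label not in CANONICAL_LABELS)
```

Running such a check over the labels collected from a surface flags a synonym like "Release review" while letting "Publish review" through.
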
### Alternatives considered
- Allow page-specific synonyms where copy “reads better”: rejected because operator ambiguity is precisely the problem this spec is solving.
- Rename everything to one generic lifecycle lexicon: rejected because different domains still need domain-specific objects and verbs.
## Decision: Keep the new semantics derived and guardable, not persisted
### Rationale
The new friction classes and reason policies are product rules, not new domain records. They do not need their own table or long-lived artifact. A derived catalog plus tests is enough to make the rules explicit, reviewable, and regression-safe.
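
A guard over a derived catalog might look like the following sketch, in Python for illustration. The entry shape, the keys, and the specific invariants are assumptions. The confirmation invariant mirrors the stated expectation that destructive actions remain confirmation-gated.

```python
ALLOWED_REASON_POLICIES = {"none", "optional", "required"}

# Hypothetical in-code catalog; nothing here is stored in a database.
CATALOG = [
    {"key": "refresh_evidence", "destructive": False,
     "requires_confirmation": False, "reason_policy": "none"},
    {"key": "expire_snapshot", "destructive": True,
     "requires_confirmation": True, "reason_policy": "required"},
]


def catalog_violations(catalog):
    """Return human-readable violations; an empty list means the guard passes."""
    violations = []
    seen = set()
    for entry in catalog:
        key = entry["key"]
        if key in seen:
            violations.append(f"{key}: duplicate key")
        seen.add(key)
        if entry["reason_policy"] not in ALLOWED_REASON_POLICIES:
            violations.append(f"{key}: unknown reason policy")
        if entry["destructive"] and not entry["requires_confirmation"]:
            violations.append(f"{key}: destructive action without confirmation")
    return violations
```

A test suite would assert that this function returns an empty list, so a rule change shows up as a reviewable diff in code instead of a silent edit to stored configuration.
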
### Alternatives considered
- Persist the matrix in the database or a user-editable admin screen: rejected because the semantics are part of product behavior, not tenant-owned configuration.
- Leave the matrix only in documentation: rejected because the repo needs an enforceable regression gate.
## Decision: Reuse the existing test layering already proven in this repo
### Rationale
The repo already has the right three layers for Spec 194:
- Guard tests for contract-level invariants
- Focused feature or RBAC tests around concrete surfaces and services
- Browser smoke tests for cross-surface operator flows
This gives durable coverage without overbuilding.
### Alternatives considered
- Browser-test every friction permutation: rejected because service and page tests already cover most of the logic more cheaply.
- Add only a unit test for the catalog: rejected because surface wiring and authorization semantics would remain unverified.
## Decision: Align the highest-risk families first
### Rationale
The strongest current inconsistencies and operator risks are concentrated in:
- Exception decision and lifecycle actions
- Review publication and archival
- Evidence expiry semantics
- System run triage
These should be aligned before lower-risk supporting families such as tenant restore or navigation-adjacent actions.
### Alternatives considered
- Start with the broadest surface rollout: rejected because it would spread effort without first hardening the most consequential actions.
- Start with tenant lifecycle only: rejected because exception, review, evidence, and run triage already carry higher governance importance.