# Research: Governance Friction Hardening and Operator Vocabulary

## Decision: Introduce one narrow governance-action catalog instead of a new governance workflow framework

### Rationale

Spec 194 needs one project-wide, testable source for friction class, reason policy, danger expectation, and canonical vocabulary across actions that already exist on multiple surfaces. The repo already has several concrete governance families: exception decisions, review lifecycle, evidence lifecycle, run triage, finding lifecycle, and tenant lifecycle. That is enough real variance to justify one small derived catalog, but not a new runtime workflow engine.

### Alternatives considered

- Keep all semantics page-local and document them only in the spec: rejected because local copy and modal logic would drift again and CI could not enforce the rules.
- Build a full governance action framework with custom builders, registries, and resolvers: rejected because the repo only needs shared semantics, not a second execution engine.

## Decision: Keep existing mutation services and audit loggers as owners of state change

### Rationale

The current services already own the actual lifecycle mutation and most audit logging:

- `FindingExceptionService` for approve, reject, renew, revoke
- `TenantReviewLifecycleService` for publish and archive
- `EvidenceSnapshotService` for refresh and expire
- `OperationRunTriageService` for retry, cancel, and mark investigated
- `FindingWorkflowService` for close and reopen
- `TenantResource` lifecycle helpers plus `WorkspaceAuditLogger` for archive and restore

The narrowest correct implementation is to align UI semantics and extend service inputs or audit metadata only where Spec 194 requires stronger reason propagation.

### Alternatives considered

- Move lifecycle mutations into a new shared governance service layer: rejected because it would duplicate working domain services and add coordination overhead without solving a new business problem.
- Keep reason capture only in UI and not in service-level inputs: rejected because Spec 194 requires reasons to remain audit-visible and not be purely presentational.

## Decision: Treat reason capture as a family contract, not a local modal choice

### Rationale

Current repo behavior is inconsistent:

- Exception family already captures reasons across all four major actions.
- Review publish or archive capture no reason.
- Evidence refresh or expire capture no reason.
- System run triage captures reason only for `Mark investigated`, not for `Cancel`.
- Finding `Close` captures reason, but `Reopen` does not.
- Tenant archive or restore capture no reason.

Spec 194 therefore must define reason policy by family and then drive the UI forms and service inputs from that rule.

### Alternatives considered

- Leave reason capture to each page owner: rejected because it produced the current inconsistency.
- Force a reason on every action: rejected because it would over-harden F0 and F1 actions and reduce operator velocity without safety benefit.

## Decision: Distinguish technical refresh from formal governance lifecycle

### Rationale

The repo already shows that similarly placed actions do not have equivalent business meaning:

- `Refresh evidence` is operational regeneration of data.
- `Expire snapshot` formally invalidates a governance artifact.
- `Refresh review` is operational recomputation.
- `Publish review` is a formal release step.
- `Retry` is follow-up work.
- `Cancel` is a stronger intervention.

Spec 194 should therefore classify by business impact, not by whether the action appears in a header or uses the same Filament primitive.

### Alternatives considered

- Classify by surface location: rejected because the same family appears on queue, detail, workspace, and system pages.
- Classify by current button color: rejected because current color usage is part of the inconsistency.
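
Classifying by business impact rather than by surface can be sketched as a lookup keyed only by the action label. This is a language-neutral illustration in Python, not the repo's implementation: `BusinessImpact`, `IMPACT_BY_ACTION`, and `classify` are hypothetical names, and the impact values only paraphrase the distinctions listed in the rationale (they are not the spec's friction classes).

```python
from enum import Enum


class BusinessImpact(Enum):
    # Hypothetical labels paraphrasing the rationale; not the spec's friction classes.
    OPERATIONAL = "operational"            # regeneration or recomputation of data
    FORMAL_LIFECYCLE = "formal_lifecycle"  # formal release or invalidation of an artifact
    FOLLOW_UP = "follow_up"                # follow-up work on a run
    INTERVENTION = "intervention"          # stronger intervention on a run


# Keyed by action label only. Surface location and button color are not inputs.
IMPACT_BY_ACTION = {
    "Refresh evidence": BusinessImpact.OPERATIONAL,
    "Expire snapshot": BusinessImpact.FORMAL_LIFECYCLE,
    "Refresh review": BusinessImpact.OPERATIONAL,
    "Publish review": BusinessImpact.FORMAL_LIFECYCLE,
    "Retry": BusinessImpact.FOLLOW_UP,
    "Cancel": BusinessImpact.INTERVENTION,
}


def classify(action: str, surface: str) -> BusinessImpact:
    """Return the impact for an action; `surface` is accepted but deliberately unused."""
    return IMPACT_BY_ACTION[action]
```

Under this sketch, `classify("Expire snapshot", "queue")` and `classify("Expire snapshot", "detail")` return the same value, while `Refresh evidence` and `Expire snapshot` differ even when they share a surface.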
## Decision: Use canonical operator vocabulary per family and prohibit casual synonyms

### Rationale

The same domain effect should not oscillate between verbs. The current repo already has stable families that can be hardened:

- `Approve / Reject`
- `Renew exception / Revoke exception`
- `Publish review / Archive review / Create next review`
- `Refresh evidence / Expire snapshot`
- `Close / Reopen`
- `Retry / Cancel / Mark investigated`
- `Archive / Restore`

Spec 194 should preserve those families and use them consistently in action labels, modal headings, notifications, and audit wording.

### Alternatives considered

- Allow page-specific synonyms where copy “reads better”: rejected because operator ambiguity is precisely the problem this spec is solving.
- Rename everything to one generic lifecycle lexicon: rejected because different domains still need domain-specific objects and verbs.

## Decision: Keep the new semantics derived and guardable, not persisted

### Rationale

The new friction classes and reason policies are product rules, not new domain records. They do not need their own table or long-lived artifact. A derived catalog plus tests is enough to make the rules explicit, reviewable, and regression-safe.

### Alternatives considered

- Persist the matrix in the database or a user-editable admin screen: rejected because the semantics are part of product behavior, not tenant-owned configuration.
- Leave the matrix only in documentation: rejected because the repo needs an enforceable regression gate.

## Decision: Reuse the existing test layering already proven in this repo

### Rationale

The repo already has the right three layers for Spec 194:

- Guard tests for contract-level invariants
- Focused feature or RBAC tests around concrete surfaces and services
- Browser smoke tests for cross-surface operator flows

This gives durable coverage without overbuilding.
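
A guard test for the vocabulary contract could look roughly like the following. It is a language-neutral sketch in Python rather than the repo's test code: `CANONICAL_LABELS` copies the families listed in this document, while `non_canonical_labels`, the surface names, and the synonym `Invalidate snapshot` are hypothetical and exist only for illustration.

```python
# Canonical labels, copied from the families listed in this document.
CANONICAL_LABELS = {
    "Approve", "Reject",
    "Renew exception", "Revoke exception",
    "Publish review", "Archive review", "Create next review",
    "Refresh evidence", "Expire snapshot",
    "Close", "Reopen",
    "Retry", "Cancel", "Mark investigated",
    "Archive", "Restore",
}


def non_canonical_labels(surface_labels: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (surface, label) pairs whose label is outside the canonical vocabulary."""
    return [
        (surface, label)
        for surface, labels in sorted(surface_labels.items())
        for label in labels
        if label not in CANONICAL_LABELS
    ]


# Hypothetical surface inventory; a real guard would derive this from the catalog.
violations = non_canonical_labels({
    "review_detail": ["Publish review", "Archive review"],
    "evidence_detail": ["Refresh evidence", "Invalidate snapshot"],
})
```

A guard test would assert that the returned list is empty. In this example it is not, because the made-up synonym `Invalidate snapshot` stands in for `Expire snapshot`.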
### Alternatives considered

- Browser-test every friction permutation: rejected because service and page tests already cover most of the logic more cheaply.
- Add only a unit test for the catalog: rejected because surface wiring and authorization semantics would remain unverified.

## Decision: Align the highest-risk families first

### Rationale

The strongest current inconsistencies and operator risks are concentrated in:

- Exception decision and lifecycle actions
- Review publication and archival
- Evidence expiry semantics
- System run triage

These should be aligned before lower-risk supporting families such as tenant restore or navigation-adjacent actions.

### Alternatives considered

- Start with the broadest surface rollout: rejected because it would spread effort without first hardening the most consequential actions.
- Start with tenant lifecycle only: rejected because exception, review, evidence, and run triage already carry higher governance importance.