TenantAtlas/specs/194-governance-friction-hardening/research.md
ahmido acc8947384 feat: harden governance action semantics (#229)
## Summary
- add the Spec 194 governance action catalog, friction classes, reason policies, and regression guards
- align exception, review, evidence, finding, tenant, provider connection, and system run actions to the shared semantics model
- add focused feature, RBAC, audit, unit, and browser coverage, including the tenant detail triage header consistency update

## Verification
- ran the focused Spec 194 verification pack from the quickstart and task plan
- ran targeted tenant triage coverage after the detail-header update
- ran `cd apps/platform && ./vendor/bin/sail bin pint --dirty --format agent`

## Filament Notes
- Filament v5 / Livewire v4 compliance preserved
- provider registration remains in `apps/platform/bootstrap/providers.php`
- globally searchable resources were not changed
- destructive actions remain confirmation-gated and server-authorized
- no new Filament assets were introduced; the existing `cd apps/platform && php artisan filament:assets` deploy step stays unchanged

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #229
2026-04-12 21:21:44 +00:00

Research: Governance Friction Hardening and Operator Vocabulary

Decision: Introduce one narrow governance-action catalog instead of a new governance workflow framework

Rationale

Spec 194 needs one project-wide, testable source for friction class, reason policy, danger expectation, and canonical vocabulary across actions that already exist on multiple surfaces. The repo already has several concrete governance families: exception decisions, review lifecycle, evidence lifecycle, run triage, finding lifecycle, and tenant lifecycle. That is enough real variance to justify one small derived catalog, but not a new runtime workflow engine.
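
A minimal sketch of what such a derived catalog could look like. Python is used for illustration only. The class scale `F0` to `F3`, the `ReasonPolicy` values, the catalog keys, and the values assigned to each entry are assumptions; only the action labels come from this document.

```python
from dataclasses import dataclass
from enum import Enum


class FrictionClass(Enum):
    # Hypothetical scale: this document only names F0 and F1 explicitly.
    F0 = 0
    F1 = 1
    F2 = 2
    F3 = 3


class ReasonPolicy(Enum):
    # Hypothetical policy values.
    NONE = "none"
    OPTIONAL = "optional"
    REQUIRED = "required"


@dataclass(frozen=True)
class GovernanceAction:
    family: str
    label: str               # canonical operator vocabulary
    friction: FrictionClass
    reason: ReasonPolicy
    dangerous: bool          # danger expectation for the surface


# Illustrative entries only; the assigned values are not taken from Spec 194.
CATALOG = {
    "exception.approve": GovernanceAction(
        "exception", "Approve", FrictionClass.F2, ReasonPolicy.REQUIRED, False),
    "exception.revoke": GovernanceAction(
        "exception", "Revoke exception", FrictionClass.F3, ReasonPolicy.REQUIRED, True),
    "evidence.refresh": GovernanceAction(
        "evidence", "Refresh evidence", FrictionClass.F1, ReasonPolicy.NONE, False),
    "evidence.expire": GovernanceAction(
        "evidence", "Expire snapshot", FrictionClass.F3, ReasonPolicy.REQUIRED, True),
}


def describe(key: str) -> GovernanceAction:
    """Return the single shared definition for one action."""
    return CATALOG[key]
```

Because the entries are frozen value objects looked up by key, every surface that renders the same action reads the same label, friction class, and reason policy, and no execution logic lives in the catalog itself.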

Alternatives considered

  • Keep all semantics page-local and document them only in the spec: rejected because local copy and modal logic would drift again and CI could not enforce the rules.
  • Build a full governance action framework with custom builders, registries, and resolvers: rejected because the repo only needs shared semantics, not a second execution engine.

Decision: Keep existing mutation services and audit loggers as owners of state change

Rationale

The current services already own the lifecycle mutations and most of the audit logging:

  • FindingExceptionService for approve, reject, renew, revoke
  • TenantReviewLifecycleService for publish and archive
  • EvidenceSnapshotService for refresh and expire
  • OperationRunTriageService for retry, cancel, and mark investigated
  • FindingWorkflowService for close and reopen
  • TenantResource lifecycle helpers plus WorkspaceAuditLogger for archive and restore

The narrowest correct implementation is to align UI semantics and extend service inputs or audit metadata only where Spec 194 requires stronger reason propagation.
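
The idea of extending a service input so that the reason reaches audit metadata can be sketched as follows. Everything here is hypothetical and written in Python for illustration: `AuditLog`, `Review`, `publish_review`, and the metadata keys are stand-ins, not the signatures of `TenantReviewLifecycleService` or `WorkspaceAuditLogger`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuditLog:
    """In-memory stand-in for an audit logger (hypothetical)."""
    entries: list = field(default_factory=list)

    def record(self, action: str, subject_id: int, metadata: dict) -> None:
        self.entries.append(
            {"action": action, "subject_id": subject_id, "metadata": metadata}
        )


@dataclass
class Review:
    id: int
    status: str = "draft"


def publish_review(review: Review, audit: AuditLog,
                   reason: Optional[str] = None) -> Review:
    """The service keeps ownership of the mutation; the reason is one extra input."""
    previous = review.status
    review.status = "published"

    metadata = {"from": previous, "to": review.status}
    # Only attach a reason when the caller actually supplied one.
    if reason is not None and reason.strip():
        metadata["reason"] = reason.strip()

    audit.record("review.published", review.id, metadata)
    return review
```

The mutation and the audit write stay in one place, so a reason collected by any surface becomes audit-visible without a new coordination layer.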

Alternatives considered

  • Move lifecycle mutations into a new shared governance service layer: rejected because it would duplicate working domain services and add coordination overhead without solving a new business problem.
  • Keep reason capture only in UI and not in service-level inputs: rejected because Spec 194 requires reasons to remain audit-visible and not be purely presentational.

Decision: Treat reason capture as a family contract, not a local modal choice

Rationale

Current repo behavior is inconsistent:

  • The exception family already captures a reason for all four major actions (approve, reject, renew, revoke).
  • Review publish and archive capture no reason.
  • Evidence refresh and expire capture no reason.
  • System run triage captures a reason only for Mark investigated, not for Cancel.
  • Finding Close captures a reason, but Reopen does not.
  • Tenant archive and restore capture no reason.

Spec 194 must therefore define the reason policy per family and then drive both the UI forms and the service inputs from that rule.
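
A small sketch of one rule feeding both the form and the service input, in Python for illustration. The family and action keys, the policy values, and the helper names are assumptions; the policies assigned below are examples and not the decisions of Spec 194.

```python
from enum import Enum
from typing import Optional


class ReasonPolicy(Enum):
    NONE = "none"
    OPTIONAL = "optional"
    REQUIRED = "required"


# Hypothetical family contract; values are illustrative only.
REASON_POLICY = {
    ("run", "retry"): ReasonPolicy.NONE,
    ("run", "cancel"): ReasonPolicy.REQUIRED,
    ("run", "mark_investigated"): ReasonPolicy.REQUIRED,
    ("finding", "close"): ReasonPolicy.REQUIRED,
    ("finding", "reopen"): ReasonPolicy.OPTIONAL,
}


def form_needs_reason_field(family: str, action: str) -> bool:
    """UI side: show a reason field unless the policy is NONE."""
    return REASON_POLICY[(family, action)] is not ReasonPolicy.NONE


def validate_reason(family: str, action: str,
                    reason: Optional[str]) -> Optional[str]:
    """Service side: enforce the same policy before the mutation runs."""
    policy = REASON_POLICY[(family, action)]
    cleaned = (reason or "").strip()

    if policy is ReasonPolicy.REQUIRED and not cleaned:
        raise ValueError(f"A reason is required for {family}.{action}")
    if policy is ReasonPolicy.NONE:
        return None  # ignore stray input for actions that take no reason
    return cleaned or None
```

Both helpers read the same table, so a page cannot show a reason field that the service ignores, or omit one that the service requires.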

Alternatives considered

  • Leave reason capture to each page owner: rejected because it produced the current inconsistency.
  • Force a reason on every action: rejected because it would over-harden F0 and F1 actions and reduce operator velocity without safety benefit.

Decision: Distinguish technical refresh from formal governance lifecycle

Rationale

The repo already shows that similarly placed actions do not have equivalent business meaning:

  • Refresh evidence is operational regeneration of data.
  • Expire snapshot formally invalidates a governance artifact.
  • Refresh review is operational recomputation.
  • Publish review is a formal release step.
  • Retry is follow-up work.
  • Cancel is a stronger intervention.

Spec 194 should therefore classify by business impact, not by whether the action appears in a header or uses the same Filament primitive.
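
The classification rule can be sketched with the six actions listed above, in Python for illustration. The `Impact` names paraphrase the descriptions in that list; the choice of which impacts warrant a confirmation step is an assumption.

```python
from enum import Enum


class Impact(Enum):
    OPERATIONAL = "operational"    # regenerates or recomputes data
    FOLLOW_UP = "follow_up"        # follow-up work
    INTERVENTION = "intervention"  # stronger intervention
    FORMAL = "formal"              # formal governance lifecycle step


# Impact per action, following the list in this section.
IMPACT_BY_ACTION = {
    "Refresh evidence": Impact.OPERATIONAL,
    "Expire snapshot": Impact.FORMAL,
    "Refresh review": Impact.OPERATIONAL,
    "Publish review": Impact.FORMAL,
    "Retry": Impact.FOLLOW_UP,
    "Cancel": Impact.INTERVENTION,
}

# Assumption: only these impacts warrant a confirmation step.
CONFIRMED_IMPACTS = {Impact.INTERVENTION, Impact.FORMAL}


def needs_confirmation(action: str) -> bool:
    """Decide from business impact alone; surface and button style are not inputs."""
    return IMPACT_BY_ACTION[action] in CONFIRMED_IMPACTS
```

Since the function takes no surface or styling argument, the same action gets the same answer on a queue, a detail page, or a system page.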

Alternatives considered

  • Classify by surface location: rejected because the same family appears on queue, detail, workspace, and system pages.
  • Classify by current button color: rejected because current color usage is part of the inconsistency.

Decision: Use canonical operator vocabulary per family and prohibit casual synonyms

Rationale

The same domain effect should not oscillate between verbs. The current repo already has stable families that can be hardened:

  • Approve / Reject
  • Renew exception / Revoke exception
  • Publish review / Archive review / Create next review
  • Refresh evidence / Expire snapshot
  • Close / Reopen
  • Retry / Cancel / Mark investigated
  • Archive / Restore

Spec 194 should preserve those families and use them consistently in action labels, modal headings, notifications, and audit wording.
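
A guard for this vocabulary could be as small as the following sketch, in Python for illustration. The labels are the ones listed above; the family keys and the `non_canonical` helper are hypothetical.

```python
# Canonical labels per family, copied from the list in this section.
CANONICAL_LABELS = {
    "exception": {"Approve", "Reject", "Renew exception", "Revoke exception"},
    "review": {"Publish review", "Archive review", "Create next review"},
    "evidence": {"Refresh evidence", "Expire snapshot"},
    "finding": {"Close", "Reopen"},
    "run": {"Retry", "Cancel", "Mark investigated"},
    "tenant": {"Archive", "Restore"},
}


def non_canonical(family: str, labels: list[str]) -> list[str]:
    """Return the labels a surface uses that are not canonical for the family."""
    allowed = CANONICAL_LABELS[family]
    return [label for label in labels if label not in allowed]
```

A guard test would collect the labels a surface actually renders and assert that this function returns an empty list, so a casual synonym such as "Rerun" in place of "Retry" fails CI instead of reaching operators.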

Alternatives considered

  • Allow page-specific synonyms where copy “reads better”: rejected because operator ambiguity is precisely the problem this spec is solving.
  • Rename everything to one generic lifecycle lexicon: rejected because different domains still need domain-specific objects and verbs.

Decision: Keep the new semantics derived and guardable, not persisted

Rationale

The new friction classes and reason policies are product rules, not new domain records. They do not need their own table or long-lived artifact. A derived catalog plus tests is enough to make the rules explicit, reviewable, and regression-safe.
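
One way to make a code-level catalog regression-safe is a guard that compares it with the actions the surfaces declare. The sketch below is in Python for illustration; the field names, keys, and the `catalog_violations` helper are assumptions.

```python
# Fields every catalog entry must define (hypothetical names).
REQUIRED_FIELDS = {"label", "friction", "reason"}


def catalog_violations(catalog: dict, surface_actions: set) -> list[str]:
    """List every way the derived catalog and the surfaces disagree."""
    problems = []

    # Every action wired on a surface needs a catalog entry.
    for key in sorted(surface_actions - set(catalog)):
        problems.append(f"{key}: used on a surface but missing from the catalog")

    # Every catalog entry must be complete.
    for key in sorted(catalog):
        missing = REQUIRED_FIELDS - set(catalog[key])
        if missing:
            problems.append(f"{key}: missing fields {sorted(missing)}")

    return problems


# Illustrative data only.
catalog = {
    "finding.close": {"label": "Close", "friction": "F2", "reason": "required"},
    "finding.reopen": {"label": "Reopen", "friction": "F1"},
}
surface_actions = {"finding.close", "finding.reopen", "tenant.archive"}

for problem in catalog_violations(catalog, surface_actions):
    print(problem)
```

The rules live in code and fail a test when they drift, which gives the enforceability of a persisted matrix without a table, a migration, or an admin screen.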

Alternatives considered

  • Persist the matrix in the database or a user-editable admin screen: rejected because the semantics are part of product behavior, not tenant-owned configuration.
  • Leave the matrix only in documentation: rejected because the repo needs an enforceable regression gate.

Decision: Reuse the existing test layering already proven in this repo

Rationale

The repo already has the right three layers for Spec 194:

  • Guard tests for contract-level invariants
  • Focused feature or RBAC tests around concrete surfaces and services
  • Browser smoke tests for cross-surface operator flows

This gives durable coverage without overbuilding.

Alternatives considered

  • Browser-test every friction permutation: rejected because service and page tests already cover most of the logic more cheaply.
  • Add only a unit test for the catalog: rejected because surface wiring and authorization semantics would remain unverified.

Decision: Align the highest-risk families first

Rationale

The strongest current inconsistencies and operator risks are concentrated in:

  • Exception decision and lifecycle actions
  • Review publication and archival
  • Evidence expiry semantics
  • System run triage

These should be aligned before lower-risk supporting families such as tenant restore or navigation-adjacent actions.

Alternatives considered

  • Start with the broadest surface rollout: rejected because it would spread effort without first hardening the most consequential actions.
  • Start with tenant lifecycle only: rejected because exception, review, evidence, and run triage already carry higher governance importance.