TenantAtlas/specs/163-baseline-subject-resolution/spec.md
ahmido c17255f854 feat: implement baseline subject resolution semantics (#193)
## Summary
- add the structured subject-resolution foundation for baseline compare and baseline capture, including capability guards, subject descriptors, resolution outcomes, and operator action categories
- persist structured evidence-gap subject records and update compare/capture surfaces, landing projections, and cleanup tooling to use the new contract
- add Spec 163 artifacts and focused Pest coverage for classification, determinism, cleanup, and DB-only rendering

## Validation
- `vendor/bin/sail bin pint --dirty --format agent`
- `vendor/bin/sail artisan test --compact tests/Unit/Support/Baselines tests/Feature/Baselines tests/Feature/Filament/OperationRunEnterpriseDetailPageTest.php`

## Notes
- verified locally that a fresh post-restart baseline compare run now writes structured `baseline_compare.evidence_gaps.subjects` records instead of the legacy broad payload shape
- excluded the separate `docs/product/spec-candidates.md` worktree change from this branch commit and PR

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #193
2026-03-25 12:40:45 +00:00

Feature Specification: Baseline Subject Resolution and Evidence Gap Semantics Foundation

Feature Branch: 163-baseline-subject-resolution
Created: 2026-03-24
Status: Draft
Input: User description: "Spec 163 — Baseline Subject Resolution & Evidence Gap Semantics Foundation"

Spec Scope Fields (mandatory)

  • Scope: tenant, canonical-view
  • Primary Routes: /admin/operations/{run}, /admin/t/{tenant}/baseline-compare-landing, and existing baseline compare and baseline capture entry points that surface evidence-gap meaning
  • Data Ownership: Tenant-owned local evidence records, captured baseline comparison results, and operation-run context remain the operational source of truth for resolution outcomes. Workspace-owned baseline support metadata remains the source of support promises and subject-class expectations.
  • RBAC: Existing workspace membership, tenant entitlement, and baseline compare or monitoring view permissions remain authoritative. This feature does not introduce new roles or broaden visibility.

For canonical-view specs, the spec MUST define:

  • Default filter behavior when tenant-context is active: Canonical monitoring surfaces must continue to respect active tenant context in navigation and related links, while direct run-detail access remains explicit to the run's tenant and must not silently widen visibility.
  • Explicit entitlement checks preventing cross-tenant leakage: Canonical run-detail and tenant-scoped compare review surfaces must continue to enforce workspace entitlement first and tenant entitlement second, with deny-as-not-found behavior for non-members and no cross-tenant hinting through resolution outcomes or evidence-gap details.
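
The check order described above (workspace entitlement first, tenant entitlement second, capability last) can be sketched as follows. This is a minimal illustration, not the project's authorization code: the `User` and `Run` shapes, the `authorize_run_view` function, and the `baseline_compare.view` capability name are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class User:
    workspace_ids: FrozenSet[str]   # workspaces the user is a member of
    tenant_ids: FrozenSet[str]      # tenants the user is entitled to
    capabilities: FrozenSet[str]    # granted capabilities (names are hypothetical)


@dataclass(frozen=True)
class Run:
    workspace_id: str
    tenant_id: str


def authorize_run_view(user: User, run: Run,
                       capability: str = "baseline_compare.view") -> int:
    """Return an HTTP-style status for a run-detail request."""
    # 1. Workspace entitlement first: non-members get not-found, never a hint.
    if run.workspace_id not in user.workspace_ids:
        return 404
    # 2. Tenant entitlement second: still deny-as-not-found.
    if run.tenant_id not in user.tenant_ids:
        return 404
    # 3. Only an entitled member can learn that a capability is missing.
    if capability not in user.capabilities:
        return 403
    return 200
```

Returning 404 before any capability check means a non-member cannot tell a missing run from a run in another tenant.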

Operator Surface Contract (mandatory when operator-facing surfaces are changed)

| Surface | Primary Persona | Surface Type | Primary Operator Question | Default-visible Information | Diagnostics-only Information | Status Dimensions Used | Mutation Scope | Primary Actions | Dangerous Actions |
|---|---|---|---|---|---|---|---|---|---|
| Monitoring → baseline compare or capture run detail | Workspace manager or entitled tenant operator | Canonical detail | Is this gap structural, operational, or transient, and what action should I take next? | Resolution meaning, subject class, operator-safe next step, whether retry or sync is relevant, whether the issue is a product limitation or tenant-local data issue | Raw context payloads and low-level diagnostic fragments | execution outcome, evidence completeness, root-cause class, actionability | Simulation only for compare interpretation, TenantPilot only for rendering and classification persistence | View run, inspect gap meaning, navigate to related tenant review surfaces | None |
| Tenant baseline compare landing and related review surfaces | Tenant operator | Tenant-scoped review surface | Can I trust this compare result, and what exactly is missing or mismatched locally? | Structural versus operational meaning, subject class, local evidence expectation, next-step guidance, compare support limitations | Raw stored context and secondary technical diagnostics | compare trust, data completeness, root-cause class, actionability | Simulation only | Compare now, inspect latest run, review evidence-gap meaning | None |

User Scenarios & Testing (mandatory)

User Story 1 - Distinguish structural from missing-local-data gaps (Priority: P1)

An operator reviewing a baseline compare or capture result needs the product to tell them whether the gap is structurally expected for that subject class or whether a policy or inventory record is actually missing locally.

Why this priority: This is the core trust problem. If structural limits and missing-local-data cases are collapsed into one generic reason, operators take the wrong follow-up action and lose confidence in the product.

Independent Test: Run compare and capture flows that include both policy-backed and foundation-backed subjects, then verify that the resulting gaps clearly separate structural resolver limits from missing local records without relying on raw diagnostics.

Acceptance Scenarios:

  1. Given a baseline-supported subject that is inventory-backed but not policy-backed, When a new compare or capture run evaluates it, Then the run records a structural foundation or inventory outcome instead of a generic policy-not-found meaning.
  2. Given a policy-backed subject with no local policy record, When the same flow evaluates it, Then the run records an operational missing-local-data outcome that is distinct from structural subject-class limits.

User Story 2 - Keep support promises truthful at runtime (Priority: P2)

A product owner or operator needs baseline-supported subject types to enter compare or capture only when the runtime can classify and resolve them truthfully, so support configuration does not overpromise capabilities the resolver cannot deliver.

Why this priority: False support promises create predictable false alarms and make baseline support metadata untrustworthy.

Independent Test: Evaluate supported subject types against current resolver capability and verify that each type either enters the run with a valid resolution path and meaningful outcome set or is explicitly limited or excluded before misleading gaps are produced.

Acceptance Scenarios:

  1. Given a subject type marked as baseline-supported, When the runtime has no truthful resolution path for that subject class, Then the type is either explicitly limited, explicitly excluded, or classified through a non-policy path instead of silently producing a generic missing-policy signal.
  2. Given a subject type with a valid resolution path, When the run evaluates it, Then the stored outcome reflects the correct subject class and local evidence model.
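
One way to picture the guard behind these scenarios is a filter that runs before compare or capture starts. The sketch below is an assumption-laden illustration: the `RUNTIME_PATHS` table, the example type names, and `guard_supported_types` are hypothetical and do not come from the repository.

```python
from enum import Enum
from typing import Dict, Iterable, List, Optional, Tuple


class ResolutionPath(Enum):
    POLICY = "policy"
    INVENTORY = "inventory"


# Hypothetical table: which path the runtime can truthfully honor per type.
# None means the type is configured as supported but has no truthful path.
RUNTIME_PATHS: Dict[str, Optional[ResolutionPath]] = {
    "examplePolicyType": ResolutionPath.POLICY,
    "exampleFoundationType": ResolutionPath.INVENTORY,
    "exampleUnresolvableType": None,
}


def guard_supported_types(
    supported_types: Iterable[str],
) -> Tuple[Dict[str, ResolutionPath], List[str]]:
    """Split configured types into admitted (with path) and explicitly excluded."""
    admitted: Dict[str, ResolutionPath] = {}
    excluded: List[str] = []
    for subject_type in supported_types:
        path = RUNTIME_PATHS.get(subject_type)
        if path is None:
            # Excluded up front instead of surfacing later as a missing-policy gap.
            excluded.append(subject_type)
        else:
            admitted[subject_type] = path
    return admitted, excluded
```

Types that are unknown to the table are excluded too, so a configuration typo cannot enter a run silently.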

User Story 3 - Replace dev-era broad reasons with the new contract cleanly (Priority: P3)

A developer or operator needs the repository to move to the new structured gap contract without carrying obsolete development-only run payloads forward just for compatibility.

Why this priority: Staying in a mixed old-and-new state during development would preserve ambiguity in exactly the area this feature is trying to fix.

Independent Test: Remove or regenerate old development runs, create a new run under the updated contract, and verify that the existing surfaces expose subject class, resolution meaning, and action category without fallback to the old broad reason contract.

Acceptance Scenarios:

  1. Given development-only runs that use the old broad reason shape, When the team chooses to clean or regenerate them, Then the product does not require runtime preservation of the obsolete shape to proceed.
  2. Given a new run created after this foundation is implemented, When an operator opens the existing detail surfaces, Then the run exposes subject class, resolution meaning, and action category without requiring a new screen.

Edge Cases

  • A run contains a mix of policy-backed, foundation-backed, inventory-backed, and derived subjects. Each subject must keep its own resolution meaning instead of being normalized into one broad reason bucket.
  • A subject is supported in configuration but currently lacks a truthful runtime resolution path. The system must not silently enter the subject into compare or capture as if the path were valid.
  • A transient upstream or budget-related failure occurs for one subject while another subject in the same run is structurally not policy-backed. The surface must keep transient and structural meaning distinct.
  • Development data may still contain obsolete broad-reason payloads during rollout. The team may remove or regenerate those runs instead of extending the runtime contract to support them indefinitely.
  • Two identical subjects evaluated against the same tenant-local state at different points in the same release must produce the same resolution outcome and operator meaning.
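
The mixed-run cases above imply that each gap keeps its own reason and that any roll-up groups by root-cause class rather than by a single bucket. A minimal sketch, assuming hypothetical reason codes and a hypothetical `ROOT_CAUSE_CLASS` table (budget-related failures are treated as transient here, following the third edge case):

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Hypothetical reason codes mapped to the three root-cause classes in this spec.
ROOT_CAUSE_CLASS: Dict[str, str] = {
    "policy_record_missing": "operational",
    "inventory_record_missing": "operational",
    "foundation_not_policy_backed": "structural",
    "capture_failed": "transient",
    "budget_exhausted": "transient",
}


def summarize_by_root_cause(gaps: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count gaps per root-cause class without rewriting any gap's reason."""
    counts: Counter = Counter()
    for gap in gaps:
        # An unknown reason raises KeyError instead of falling into a broad bucket.
        counts[ROOT_CAUSE_CLASS[gap["reason"]]] += 1
    return dict(counts)
```

The summary stays numeric, while the per-subject reasons remain untouched in the input records.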

Requirements (mandatory)

Constitution alignment (required): This feature does not add new external provider calls or a new long-running operation type. It establishes a stricter semantic contract for how existing baseline compare and capture workflows classify subjects, persist evidence-gap meaning, and describe operator action truth. Existing tenant isolation, preview, and audit expectations remain in force.

Constitution alignment (OPS-UX): Existing compare and capture runs continue to use the established three-surface feedback contract. Run status and outcome remain service-owned. Summary counts remain numeric and lifecycle-safe. This feature extends the semantic detail stored in run context so evidence-gap meaning is deterministic, reproducible, and available on progress and terminal surfaces without redefining run lifecycle ownership.

Constitution alignment (RBAC-UX): This feature changes what existing entitled users can understand on run-detail and tenant review surfaces, not who may access those surfaces. Non-members remain deny-as-not-found. Members who lack the relevant capability receive a forbidden response, and only after entitlement has been established. No cross-tenant visibility or capability broadening is introduced.

Constitution alignment (OPS-EX-AUTH-001): Not applicable beyond reaffirming that monitoring and operations surfaces continue to avoid synchronous auth-handshake behavior.

Constitution alignment (BADGE-001): If any status-like labels are refined for evidence-gap meaning, the semantic mapping must remain centralized and shared across dense and detailed surfaces. This feature must not create ad hoc surface-specific meanings for structural, operational, or transient states.

Constitution alignment (UI-NAMING-001): Operator-facing wording must describe the object class and root cause before advice. Labels should use domain language such as “Policy record missing locally”, “Inventory-backed foundation subject”, or “Retry may help”, and avoid implementation-first phrasing.
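
A centralized mapping that both dense and detailed surfaces read from could look like the sketch below. Three of the strings are the example labels quoted above; the reason codes, the remaining strings, and the `Meaning`, `dense_label`, and `detail_text` names are illustrative assumptions.

```python
from typing import Dict, NamedTuple


class Meaning(NamedTuple):
    root_cause: str  # object class and root cause, shown first
    advice: str      # operator advice, shown after the root cause


# Single source of wording; reason codes are hypothetical.
MEANINGS: Dict[str, Meaning] = {
    "policy_record_missing": Meaning(
        "Policy record missing locally", "Backup or sync may help"),
    "foundation_not_policy_backed": Meaning(
        "Inventory-backed foundation subject", "No local policy record is expected"),
    "capture_failed": Meaning(
        "Capture failed temporarily", "Retry may help"),
}


def dense_label(reason: str) -> str:
    """Short label for dense landing surfaces."""
    return MEANINGS[reason].root_cause


def detail_text(reason: str) -> str:
    """Longer text for detail surfaces: root cause first, advice second."""
    meaning = MEANINGS[reason]
    return f"{meaning.root_cause}. {meaning.advice}."
```

Because both renderers read the same table, a run cannot show one root cause on the landing page and another on the detail page.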

Constitution alignment (OPSURF-001): Existing run detail and tenant review surfaces remain operator-first. Default-visible content must answer whether the issue is structural, operational, or transient before exposing raw diagnostics. Mutation scope messaging for existing compare actions remains unchanged.

Constitution alignment (Filament Action Surfaces): The affected Filament pages remain compliant with the Action Surface Contract. No new destructive actions are introduced. Existing compare or review actions remain read or inspect oriented, and this feature changes interpretation rather than mutation behavior.

Constitution alignment (UX-001 — Layout & Information Architecture): This feature reuses existing layouts and surfaces. The required change is semantic clarity, not a new layout pattern. Existing sections and detail affordances must present the new meaning without introducing naked diagnostics or a parallel screen.

Functional Requirements

  • FR-001: The system MUST determine the subject class for every compare or capture subject before attempting local resolution.
  • FR-002: The system MUST support, at minimum, policy-backed, inventory-backed, foundation-backed, and derived subject classes for new runs.
  • FR-003: The system MUST choose the local resolution strategy from the subject class and supported capability contract, rather than implicitly treating every in-scope subject as policy-backed.
  • FR-004: The system MUST distinguish policy-backed missing-local-record cases from structural foundation or inventory-only cases in new run outputs.
  • FR-005: The system MUST support a precise evidence-gap reason taxonomy for new runs that can separately represent missing policy records, missing inventory records, structural non-policy-backed subjects, resolution mismatches, invalid or duplicate subjects, transient capture failures, ambiguity, and budget or throttling limits.
  • FR-006: The system MUST persist structured gap metadata for new runs that includes subject class, resolution meaning, and operator action category, rather than relying only on a broad reason code and a raw subject key.
  • FR-007: The system MUST provide an explicit resolution outcome for each evaluated subject, including successful resolution path, structural limitation, missing local artifact, or transient failure as applicable.
  • FR-008: The system MUST prevent baseline support metadata from overpromising compare or capture capability when no truthful runtime resolution path exists for that subject class.
  • FR-009: The system MUST classify new gaps so operators can tell whether retry, backup or sync, or product follow-up is the correct next action.
  • FR-010: The system MUST NOT persist the historical broad policy-not-found reason as the sole reason for newly created structural cases that have a more precise semantic classification.
  • FR-011: During development, the system MAY invalidate or discard previously stored run payloads that only contain the broad legacy reason if that simplifies migration to the new structured contract.
  • FR-012: The system MUST preserve already precise reason families, including transient and ambiguity-related cases, without collapsing them into the new structural taxonomy.
  • FR-013: The system MUST keep the semantic meaning aligned across dense landing surfaces and richer detail surfaces so the same run does not communicate different root causes on different pages.
  • FR-014: The system MUST derive resolution meaning on the backend so run context, auditability, and diagnostic replay do not depend on UI-only interpretation.
  • FR-015: The system MUST produce the same resolution outcome and operator-facing meaning for the same subject and tenant-local state whenever the input conditions are unchanged.
  • FR-016: The system MUST allow inventory-backed or foundation-backed supported subjects to remain in scope only when their compare or capture behavior can be described truthfully through the resolution contract.
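
FR-001 through FR-009 describe a flow of classifying first, then picking a resolution strategy from the class, then recording an explicit outcome and action category. The sketch below illustrates that flow under stated assumptions: the enum members, the `resolve` function, and the choice to treat derived subjects as having no local-record path are hypothetical, and transient failures are left out for brevity.

```python
from enum import Enum
from typing import AbstractSet, Tuple


class SubjectClass(Enum):
    POLICY_BACKED = "policy_backed"
    INVENTORY_BACKED = "inventory_backed"
    FOUNDATION_BACKED = "foundation_backed"
    DERIVED = "derived"


class Outcome(Enum):
    RESOLVED_POLICY = "resolved_policy"
    RESOLVED_INVENTORY = "resolved_inventory"
    POLICY_RECORD_MISSING = "policy_record_missing"
    INVENTORY_RECORD_MISSING = "inventory_record_missing"
    STRUCTURAL_NO_LOCAL_PATH = "structural_no_local_path"


class Action(Enum):
    NONE = "none"
    BACKUP_OR_SYNC = "backup_or_sync"
    PRODUCT_FOLLOW_UP = "product_follow_up"


def resolve(subject_class: SubjectClass, key: str,
            policies: AbstractSet[str],
            inventory: AbstractSet[str]) -> Tuple[Outcome, Action]:
    """Pick the strategy from the subject class, never from a policy default."""
    if subject_class is SubjectClass.POLICY_BACKED:
        if key in policies:
            return Outcome.RESOLVED_POLICY, Action.NONE
        return Outcome.POLICY_RECORD_MISSING, Action.BACKUP_OR_SYNC
    if subject_class in (SubjectClass.INVENTORY_BACKED,
                         SubjectClass.FOUNDATION_BACKED):
        # Non-policy subjects are looked up in inventory only, so they can
        # never produce a policy-not-found meaning.
        if key in inventory:
            return Outcome.RESOLVED_INVENTORY, Action.NONE
        return Outcome.INVENTORY_RECORD_MISSING, Action.BACKUP_OR_SYNC
    # Assumption: derived subjects have no local record to look up.
    return Outcome.STRUCTURAL_NO_LOCAL_PATH, Action.PRODUCT_FOLLOW_UP
```

The function is pure: the same class, key, and local state always yield the same pair, which is the property FR-015 asks for.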

Assumptions

  • Foundation-backed subjects remain eligible for compare or capture only when the product can truthfully classify them through an inventory-backed or limited non-policy resolution path. Otherwise they are treated as explicitly unsupported for that operation rather than as generic missing-policy cases.
  • Subject class and resolution outcome are both required because they answer different operator questions: what kind of object is this, and what happened when the system tried to resolve it.
  • The repository is still in active development, so breaking cleanup of previously stored development run payloads is acceptable when it removes obsolete broad-reason semantics instead of preserving them.
  • Newly created runs are expected to use the new structured contract immediately; there is no requirement to keep the old broad reason shape alive for future writes.
  • This foundation spec establishes root-cause truth and runtime support truth. Fidelity richness, renderer density, and deeper wording refinements are handled in follow-on work.

Deferred Scope

  • New renderer families, fidelity badges, or snapshot richness redesign are not included in this feature.
  • This feature does not redefine content diff algorithms, reporting exports, or large historical data backfills.
  • This feature does not require a new operator screen. It upgrades semantic truth on existing surfaces.
  • This feature does not preserve historical development run payloads only for compatibility's sake.
  • This feature does not create dual-read or dual-write architecture for old and new gap semantics unless a concrete development need emerges later.
  • New downstream domain behavior, including new findings, alerts, or follow-on automation, is not part of this feature. When it is built, it must be designed around the new structured contract rather than the old broad reason.

Development Migration Policy

  • Breaking Cleanup Is Acceptable: Existing development-only compare and capture runs MAY be deleted, regenerated, or rendered invalid if that removes obsolete broad-reason semantics and keeps the runtime model cleaner.
  • Single Contract Going Forward: Newly created runs MUST write the new structured resolution and gap contract only.
  • No Parallel Semantic Core: The old broad reason MAY be recognized temporarily in one-off development utilities or cleanup scripts, but it MUST NOT remain a first-class domain contract for ongoing feature work.
  • Regenerate Over Preserve: When tests, fixtures, or local demo data depend on the old shape, the preferred path is to rebuild them against the new contract instead of extending production code to preserve the obsolete structure.
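
A one-off cleanup utility of the kind permitted above might select runs to delete or regenerate by checking for the structured `baseline_compare.evidence_gaps.subjects` records. The sketch below assumes that legacy payloads lack a `subjects` list; the legacy shape itself is not specified here, and the function names and run dictionaries are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def has_structured_gaps(context: Dict[str, Any]) -> bool:
    """True when the run context carries no gaps or only structured subject records."""
    gaps = context.get("baseline_compare", {}).get("evidence_gaps")
    if gaps is None:
        return True  # nothing recorded, so nothing legacy to clean up
    if not isinstance(gaps, dict):
        return False
    subjects = gaps.get("subjects")
    return isinstance(subjects, list) and all(isinstance(s, dict) for s in subjects)


def runs_to_regenerate(runs: Iterable[Dict[str, Any]]) -> List[Any]:
    """Return ids of development runs that should be deleted or regenerated."""
    return [run["id"] for run in runs if not has_structured_gaps(run["context"])]
```

Keeping this check in a utility, not in the runtime read path, matches the rule that the old reason must not remain a first-class contract.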

UI Action Matrix (mandatory when Filament is changed)

| Surface | Location | Header Actions | Inspect Affordance (List/Table) | Row Actions (max 2 visible) | Bulk Actions (grouped) | Empty-State CTA(s) | View Header Actions | Create/Edit Save+Cancel | Audit log? | Notes / Exemptions |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline compare or capture run detail | Existing canonical run-detail surface | Existing navigation and refresh actions remain | Existing detail and diagnostic sections remain the inspect affordance | None added | None | No new CTA. Empty states explain whether the run has no gaps or whether development data must be regenerated under the new contract. | Existing run-detail header actions remain | Not applicable | Existing run-backed audit semantics remain | Read-only semantic upgrade; no new mutation surface |
| Tenant baseline compare landing and related review surfaces | Existing tenant-scoped review surfaces | Existing compare and navigation actions remain | Existing summary and detail sections remain the inspect affordance | None added | None | Existing compare CTA remains; no new dangerous action is introduced | Existing page-level actions remain | Not applicable | Existing run-backed audit semantics remain | Read-only semantic upgrade; same actions, clearer meaning |

Key Entities (include if feature involves data)

  • Subject: A compare or capture target that the product must classify and resolve against tenant-local evidence before it can judge trust or completeness.
  • Subject Class: The business-level class that describes whether a subject is policy-backed, inventory-backed, foundation-backed, or derived.
  • Resolution Outcome: The deterministic result of attempting to resolve a subject locally, including both successful resolution paths and precise failure or limitation meanings.
  • Evidence Gap Detail: The structured record attached to a run that captures which subject was affected, how it was classified, what local evidence expectation applied, and which operator action category follows.
  • Support Capability Contract: The support promise that states whether a subject type may enter compare or capture and through which truthful resolution path.
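
The Evidence Gap Detail entity can be pictured as a small immutable record. The field names and allowed values below are assumptions chosen to mirror the entity descriptions, not the persisted schema.

```python
from dataclasses import asdict, dataclass
from typing import Dict

SUBJECT_CLASSES = frozenset(
    {"policy_backed", "inventory_backed", "foundation_backed", "derived"})
ACTION_CATEGORIES = frozenset({"retry", "backup_or_sync", "product_follow_up"})


@dataclass(frozen=True)
class EvidenceGapDetail:
    subject_key: str               # which subject was affected
    subject_class: str             # how it was classified
    resolution_outcome: str        # what happened during resolution
    expected_local_evidence: str   # which local evidence expectation applied
    operator_action_category: str  # which next action follows

    def __post_init__(self) -> None:
        # Reject values outside the contract instead of storing a vague record.
        if self.subject_class not in SUBJECT_CLASSES:
            raise ValueError(f"unknown subject class: {self.subject_class}")
        if self.operator_action_category not in ACTION_CATEGORIES:
            raise ValueError(
                f"unknown action category: {self.operator_action_category}")

    def to_record(self) -> Dict[str, str]:
        """Plain dict suitable for storing in run context."""
        return asdict(self)
```

Keeping subject class and resolution outcome as separate fields reflects the assumption stated earlier that they answer different operator questions.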

Success Criteria (mandatory)

Measurable Outcomes

  • SC-001: In 100% of sampled review sessions on validation runs that contain both structural and missing-local-data cases, operators can distinguish the two classes from the default-visible surface without opening raw diagnostics.
  • SC-002: For every supported subject type included in a release validation pack, the runtime either produces a truthful resolution path or classifies the type as explicitly limited or unsupported before misleading broad-cause gaps are emitted.
  • SC-003: Local development data, tests, and fixtures can be regenerated against the new structured contract without requiring production code to preserve the obsolete broad-reason payload shape.
  • SC-004: New runs expose enough structured gap metadata that operators can determine whether retry, backup or sync, or product follow-up is the next action in a single page visit.
  • SC-005: Replaying the same subject against the same tenant-local state yields the same stored resolution outcome and operator action category across repeated validation runs.
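
SC-005 can be checked by comparing a canonical fingerprint of the stored gap records across repeated runs. This is a sketch of one possible check, assuming each record is a JSON-serializable dict with a `subject_key` field; `gap_fingerprint` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def gap_fingerprint(gaps: Iterable[Dict[str, Any]]) -> str:
    """Hash gap records so that record order and key order do not matter."""
    ordered = sorted(gaps, key=lambda gap: gap["subject_key"])
    canonical = json.dumps(ordered, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two validation runs over the same tenant-local state should produce equal fingerprints; a difference points at a non-deterministic outcome or action category.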

Definition of Done

  • Newly created compare and capture runs persist the new structured resolution contract and do not rely on the broad legacy reason as their primary semantic output.
  • Development fixtures, local data, and tests that depended on the old broad reason shape are either regenerated or intentionally removed instead of forcing the runtime to preserve obsolete semantics.
  • New domain logic introduced for this feature uses subject class, resolution outcome, and structured gap metadata as the source of truth instead of branching on the legacy broad reason.
  • Structural, operational, and transient cases are distinguishable in backend persistence and in operator-facing interpretation.
  • Baseline-supported subject types do not enter the runtime path with a silent structural resolver mismatch.