TenantAtlas/specs/177-inventory-coverage-truth/research.md
2026-04-05 14:18:37 +02:00


# Phase 0 Research: Inventory Coverage Truth (177)

## Context

Spec 177 corrects a semantic trust problem on the existing inventory surfaces.

The inventory KPI widget currently computes `Coverage %` inside `InventoryKpiHeader` as the share of restorable items. The real per-type sync truth is already persisted in the canonical `OperationRun.context['inventory']['coverage']` payload by `InventorySyncService` and `RunInventorySyncJob`. The current `InventoryCoverage` page is config-driven and capability-first. Inventory-sync run detail still hides most of the per-type result behind the generic run outcome and raw JSON.

The feature must stay derived, tenant-scoped, and operator-first.

## Decisions

### Decision: The coverage basis is the latest completed inventory-sync run with a parseable per-type coverage payload

- **Rationale**: The spec needs `Succeeded`, `Failed`, `Skipped`, and `Unknown` to be visible as tenant coverage truth. The job already writes a normalized `context.inventory.coverage` payload before terminalizing the run. This includes skipped and failed cases that still carry real per-type truth. The narrowest deterministic rule is therefore to select the latest completed `inventory_sync` run for the tenant whose payload can be parsed by `InventoryCoverage::fromContext()`.
- **Alternatives considered**:
  - Latest succeeded or partially succeeded run only: rejected because it would hide skipped or failed per-type truth that the spec explicitly wants operators to see.
  - Latest attempted run regardless of payload: rejected because a run without a parseable coverage payload cannot support per-type coverage truth and would reduce every row to guesswork.
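
A minimal sketch of the selection rule, written in Python for illustration only. `OperationRun` is a hypothetical in-memory stand-in and `parse_coverage` is a hypothetical substitute for `InventoryCoverage::fromContext()`. The payload shape (policy type key mapped to a status string) and the `completed` status value are assumptions, not the real schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class OperationRun:
    # Hypothetical stand-in for the OperationRun model.
    id: int
    tenant_id: int
    type: str                         # e.g. "inventory_sync"
    status: str                       # assumed terminal value: "completed"
    completed_at: Optional[datetime]
    context: dict[str, Any]


def parse_coverage(context: dict[str, Any]) -> Optional[dict[str, str]]:
    """Hypothetical substitute for InventoryCoverage::fromContext().

    Returns None when the payload is missing or malformed. Assumed shape:
    context["inventory"]["coverage"] maps policy type -> status string.
    """
    inventory = context.get("inventory")
    coverage = inventory.get("coverage") if isinstance(inventory, dict) else None
    if not isinstance(coverage, dict) or not coverage:
        return None
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in coverage.items()):
        return None
    return coverage


def select_basis_run(runs: list[OperationRun], tenant_id: int) -> Optional[OperationRun]:
    """Latest completed inventory_sync run for the tenant with a parseable payload.

    The run outcome (succeeded, partial, failed) is not modeled or checked here.
    """
    candidates = [
        run
        for run in runs
        if run.tenant_id == tenant_id
        and run.type == "inventory_sync"
        and run.status == "completed"
        and run.completed_at is not None
        and parse_coverage(run.context) is not None
    ]
    # Break ties on completed_at by id so the choice stays deterministic.
    return max(candidates, key=lambda run: (run.completed_at, run.id), default=None)
```

A newer completed run without a parseable payload is skipped rather than treated as the basis. A failed run that still carries per-type results remains eligible.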

### Decision: Coverage remains fully derived from existing truth sources

- **Rationale**: The feature can answer the tenant coverage question by combining three existing sources:
  - canonical per-type sync truth from `OperationRun.context.inventory.coverage`;
  - observed-item counts from `InventoryItem`;
  - product capability metadata from `InventoryPolicyTypeMeta` plus `CoverageCapabilitiesResolver`.

  This follows the constitution's bias toward deriving before persisting.
- **Alternatives considered**:
  - New coverage table or materialized snapshot: rejected because it would duplicate current-release truth and add lifecycle overhead without new operator value.
  - Writing a summary JSON back to `Tenant`: rejected because the truth already belongs to the latest inventory-sync run and the currently observed items.
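
The join can be pictured as a pure function over the three sources that is recomputed on every read. This Python sketch is illustrative only. `CoverageRow`, `derive_coverage_rows`, the lowercase state strings, and the `supported_types` input are hypothetical. The capability dictionary stands in for whatever `InventoryPolicyTypeMeta` and `CoverageCapabilitiesResolver` actually return.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class CoverageRow:
    policy_type: str
    state: str                      # coverage truth from the basis run
    observed_items: int             # current observed-item count for the type
    capabilities: dict[str, Any]    # support reference only, never coverage truth


def derive_coverage_rows(
    supported_types: list[str],
    basis_coverage: Optional[dict[str, str]],
    item_counts: dict[str, int],
    capabilities: dict[str, dict[str, Any]],
) -> list[CoverageRow]:
    """Join three existing sources on read; nothing here is persisted."""
    coverage = basis_coverage or {}
    return [
        CoverageRow(
            policy_type=policy_type,
            # A type absent from the basis run stays "unknown", even when
            # older observations left items behind.
            state=coverage.get(policy_type, "unknown"),
            observed_items=item_counts.get(policy_type, 0),
            capabilities=capabilities.get(policy_type, {}),
        )
        for policy_type in supported_types
    ]
```

Nothing is written back. A new sync run or a change in observed items therefore shows up the next time the rows are derived.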

### Decision: Introduce one narrow runtime contract and resolver as siblings to `InventoryCoverage`

- **Rationale**: `InventoryCoverage` is already the canonical parser for the stored run payload. Extending it to perform tenant-scoped run lookup, item-count joins, and follow-up classification would blur responsibilities. A sibling runtime contract such as `TenantCoverageTruth` and a resolver such as `TenantCoverageTruthResolver` keep the low-level parser small while giving the UI one stable read model.
- **Alternatives considered**:
  - Add more behavior directly to `InventoryCoverage`: rejected because it would mix raw payload normalization with tenant-level query and presentation concerns.
  - Compute the join independently inside each page or widget: rejected because three surfaces would re-own the same truth and regress independently.
  - Add request-scoped aggregate caching: rejected as unnecessary complexity for this slice.
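
The split can be pictured this way: the parser only normalizes a single payload, while the resolver owns tenant-scoped lookup and the item-count join. Everything here is a hypothetical Python sketch. `InventoryCoverageParser` merely plays the role of `InventoryCoverage`. The field names of `TenantCoverageTruth` are assumptions. Run and item queries are injected as callables instead of real database access.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Iterable, Optional

# Hypothetical record shape: (run id, completed_at, context payload).
RunRecord = tuple[int, datetime, dict[str, Any]]


class InventoryCoverageParser:
    """Plays the role of InventoryCoverage: normalizes one payload, nothing else."""

    @staticmethod
    def from_context(context: dict[str, Any]) -> Optional[dict[str, str]]:
        inventory = context.get("inventory")
        coverage = inventory.get("coverage") if isinstance(inventory, dict) else None
        if isinstance(coverage, dict) and coverage:
            return {str(k): str(v) for k, v in coverage.items()}
        return None


@dataclass(frozen=True)
class TenantCoverageTruth:
    """Hypothetical read model shared by the inventory surfaces."""
    tenant_id: int
    basis_run_id: Optional[int]
    basis_completed_at: Optional[datetime]
    coverage: dict[str, str]
    item_counts: dict[str, int]


class TenantCoverageTruthResolver:
    """Owns tenant-scoped lookup and joins; delegates payload parsing."""

    def __init__(
        self,
        completed_sync_runs_newest_first: Callable[[int], Iterable[RunRecord]],
        item_counts_by_type: Callable[[int], dict[str, int]],
    ) -> None:
        self._runs = completed_sync_runs_newest_first
        self._item_counts = item_counts_by_type

    def resolve(self, tenant_id: int) -> TenantCoverageTruth:
        counts = self._item_counts(tenant_id)
        for run_id, completed_at, context in self._runs(tenant_id):
            coverage = InventoryCoverageParser.from_context(context)
            if coverage is not None:  # first parseable payload is the basis
                return TenantCoverageTruth(tenant_id, run_id, completed_at, coverage, counts)
        return TenantCoverageTruth(tenant_id, None, None, {}, counts)
```

Injecting the queries keeps the resolver testable without a database. It also keeps the parser unaware of tenants and items.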

### Decision: Replace the KPI percentage with count-based coverage facts

- **Rationale**: The spec explicitly says absolute counts are preferred over a percentage unless the percentage is narrowly qualified. Count-based facts such as succeeded types, types needing follow-up, last sync, and items observed answer the operator question directly and avoid false completeness signals.
- **Alternatives considered**:
  - Keep a relabeled percentage such as `Latest sync type coverage`: rejected for the first slice because counts are clearer and avoid another interpretation layer.
  - Keep the current restorable-item share with different wording: rejected because it still answers the wrong question.
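
This sketch shows how the four KPI facts could be computed from the resolved truth. It assumes, without confirmation from the spec, that `failed`, `skipped`, and `unknown` all count as needing follow-up. `CoverageKpis` and `compute_kpis` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumption: every non-succeeded state needs operator follow-up.
FOLLOW_UP_STATES = frozenset({"failed", "skipped", "unknown"})


@dataclass(frozen=True)
class CoverageKpis:
    succeeded_types: int
    types_needing_follow_up: int
    last_sync: Optional[datetime]
    items_observed: int


def compute_kpis(
    states: dict[str, str],
    observed_items: dict[str, int],
    basis_completed_at: Optional[datetime],
) -> CoverageKpis:
    """Count-based facts only; no percentage is derived."""
    return CoverageKpis(
        succeeded_types=sum(1 for s in states.values() if s == "succeeded"),
        types_needing_follow_up=sum(1 for s in states.values() if s in FOLLOW_UP_STATES),
        last_sync=basis_completed_at,
        items_observed=sum(observed_items.values()),
    )
```

No percentage is produced, so there is no denominator for an operator to misread as tenant completeness.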

### Decision: The coverage page becomes one tenant-coverage-first report with capability metadata demoted to secondary treatment

- **Rationale**: The current page already has a searchable and filterable table surface. The narrowest correction is to reuse that surface and rebuild the row model around tenant coverage truth. The rebuilt table leads with the summary, state, and follow-up columns and keeps capability metadata in secondary columns or a labeled reference treatment.
- **Alternatives considered**:
  - Preserve the current capability-first matrix and add a separate banner: rejected because the primary semantic center would remain wrong.
  - Split the page into two separate tables for tenant truth and product support: rejected as broader than needed for the first correction slice.
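
Below is a possible row model for the reworked page, sketched in Python. The column keys, labels, and follow-up wording are all hypothetical, as is the choice to hide reference columns unless requested. The capability keys `restore_mode` and `risk` are assumptions based on the metadata named in this research.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    key: str
    label: str
    primary: bool  # True: tenant coverage truth; False: support reference


# Hypothetical column set: coverage truth leads, capability metadata trails.
COLUMNS = [
    Column("policy_type", "Type", True),
    Column("state", "Coverage", True),
    Column("follow_up", "Follow-up", True),
    Column("observed_items", "Items observed", True),
    Column("restore_mode", "Restore mode (support reference)", False),
    Column("risk", "Risk (support reference)", False),
]

# Hypothetical operator guidance per state.
FOLLOW_UP = {
    "succeeded": "",
    "failed": "Review the failure in the basis sync run",
    "skipped": "Check why the type was skipped",
    "unknown": "Run an inventory sync that includes this type",
}


def build_row(policy_type: str, state: str, observed_items: int,
              capability: dict[str, Any]) -> dict[str, Any]:
    return {
        "policy_type": policy_type,
        "state": state,
        "follow_up": FOLLOW_UP.get(state, ""),
        "observed_items": observed_items,
        "restore_mode": capability.get("restore_mode", "n/a"),
        "risk": capability.get("risk", "n/a"),
    }


def visible_columns(include_reference: bool) -> list[Column]:
    """Primary columns always render; reference columns only on request."""
    return [c for c in COLUMNS if c.primary or include_reference]
```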

### Decision: Inventory-sync run detail gets one human-readable per-type coverage section under the existing enterprise-detail stack

- **Rationale**: `OperationRunResource` already uses `EnterpriseDetailBuilder` with custom view sections. Adding one `inventory_sync`-specific section under the same pattern is the narrowest way to expose per-type results without inventing a new operational page.
- **Alternatives considered**:
  - Continue relying on raw context JSON: rejected because the spec explicitly forbids leaving this truth buried in JSON.
  - Build a standalone inventory-sync detail page: rejected because the canonical run viewer already exists.
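
This sketch shows the data the new section could render for an `inventory_sync` run. It is written in Python and returns a plain dictionary rather than going through `EnterpriseDetailBuilder`, whose API is not modeled here. The section title, empty-state text, and payload shape are assumptions.

```python
from typing import Any, Optional

# Hypothetical display labels for per-type results stored in the run payload.
STATE_LABELS = {"succeeded": "Succeeded", "failed": "Failed", "skipped": "Skipped"}


def inventory_coverage_section(run_type: str, context: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Build a readable per-type section for inventory_sync runs only."""
    if run_type != "inventory_sync":
        return None  # other run types keep their existing detail sections
    inventory = context.get("inventory")
    coverage = inventory.get("coverage") if isinstance(inventory, dict) else None
    section: dict[str, Any] = {"title": "Per-type coverage", "rows": [], "empty_text": None}
    if not isinstance(coverage, dict) or not coverage:
        section["empty_text"] = "This run recorded no per-type coverage result."
        return section
    for policy_type, status in sorted(coverage.items()):
        section["rows"].append({
            "type": policy_type,
            # Unrecognized statuses are shown verbatim rather than guessed.
            "result": STATE_LABELS.get(str(status), str(status)),
        })
    return section
```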

### Decision: Do not introduce a first-class stale or freshness coverage state in Spec 177

- **Rationale**: The spec lists stale semantics as optional secondary behavior. The current trust defect is the wrong meaning of coverage, not missing freshness taxonomy. Showing the basis timestamp is enough for this slice and avoids broadening the state family.
- **Alternatives considered**:
  - Add `Stale` now as a fifth primary coverage state: rejected because it would expand scope into inventory health and freshness semantics better handled by a later follow-up spec.
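
The bounded state family can be expressed as a closed enum, with freshness carried by the basis timestamp instead of a fifth member. `CoverageState` and `describe` are hypothetical Python names, and the display wording is illustrative.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class CoverageState(Enum):
    """Closed set of primary coverage states for this slice."""
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    SKIPPED = "skipped"
    UNKNOWN = "unknown"
    # Deliberately no STALE member: freshness is shown via the basis timestamp.


def describe(state: CoverageState, basis_completed_at: Optional[datetime]) -> str:
    label = state.name.capitalize()
    if basis_completed_at is None:
        return f"{label} (no inventory sync basis yet)"
    return f"{label} (basis sync completed {basis_completed_at:%Y-%m-%d %H:%M})"
```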

### Decision: Run continuity must be RBAC-safe and explanatory when drill-through is unavailable

- **Rationale**: The spec requires that coverage surfaces never emit broken or implicitly inaccessible next actions. The UI must link to the basis run only when the user is entitled to open it. Otherwise it must show clear, non-clickable guidance.
- **Alternatives considered**:
  - Always show the run link and let authorization fail after navigation: rejected because it creates dead-end operator flows and can leak the run's existence.
  - Hide all run references unless the user can open them: rejected because the spec still requires a clear explanation of what the coverage statement is based on.
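
This sketch shows the continuity rule. The basis explanation is always produced, and a URL is attached only when an authorization check passes. `basis_run_reference`, `can_view_run`, `run_url`, and the guidance wording are hypothetical. The real check would go through the application's own authorization layer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class BasisRunReference:
    label: str                 # always shown: what the coverage is based on
    url: Optional[str]         # None means render as non-clickable text
    guidance: Optional[str]    # explanation when no link is offered


def basis_run_reference(
    run_id: Optional[int],
    completed_at_text: Optional[str],
    can_view_run: Callable[[int], bool],
    run_url: Callable[[int], str],
) -> BasisRunReference:
    if run_id is None:
        return BasisRunReference(
            "No completed inventory sync with coverage results yet",
            None,
            "Run an inventory sync to establish coverage.",
        )
    label = f"Based on the inventory sync completed {completed_at_text}"
    if can_view_run(run_id):
        return BasisRunReference(label, run_url(run_id), None)
    # Withhold the URL and run id, but keep the explanation.
    return BasisRunReference(label, None, "You do not have access to open this run.")
```

In the denied branch, the user still learns what the statement is based on but is never sent to a page that will refuse them.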

## Clarifications Resolved

- **Relevant inventory sync**: The basis run is completed and carries a coverage payload; outcome alone is not sufficient.
- **Unknown semantics**: `Unknown` means the chosen basis run has no current tenant coverage result for that supported type, even if items still exist from older observations.
- **Capability separation**: Restore mode, risk, dependency support, and similar metadata remain visible only as secondary support reference, not as coverage truth.
- **Scope limit**: This slice includes no new persistence, no backend rewrite, and no freshness-state expansion.