TenantAtlas/specs/284-provider-neutral-artifact-source-taxonomy/research.md
Ahmed Darrazi bf8d59e034
feat: implement provider-neutral artifact source taxonomy
2026-05-09 01:45:12 +02:00


# Research: Provider-neutral Artifact Source Taxonomy
## Decision 1: Use one shared descriptor over existing artifact truth, not a new artifact table
- **Decision**: represent provider-neutral artifact lineage through one shared descriptor carried by existing finding, evidence, stored-report, inventory, and review-summary seams.
- **Why**: the repo already stores the underlying truth in `Finding`, `EvidenceSnapshotItem`, `StoredReport`, and `InventoryItem`. A new artifact-source table would duplicate that truth and create lifecycle or ownership questions that the current release does not need.
- **Alternatives considered**:
  - new `artifact_sources` table: rejected because it adds persistence and drift risk with no current-release operator value
  - page-local aliasing only: rejected because it would preserve conflicting summaries across findings, evidence, reports, inventory, and review sections
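
A minimal sketch of the idea, assuming a Python-like model layer. `ArtifactSourceDescriptor`, `summary_label`, `describe_finding`, and `describe_stored_report` are hypothetical names, and the field values the adapters choose are illustrative only. The point is that the descriptor is a value computed from existing rows, and every surface formats it through one helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactSourceDescriptor:
    """Hypothetical value object, derived from existing rows and never stored in its own table."""
    source_family: str
    source_kind: str
    source_target_kind: str
    detector_key: Optional[str] = None
    control_key: Optional[str] = None


def summary_label(descriptor: ArtifactSourceDescriptor) -> str:
    # Findings, evidence, reports, inventory, and review surfaces all call this
    # one helper, so the summary wording cannot diverge page by page.
    return f"{descriptor.source_family} / {descriptor.source_kind} -> {descriptor.source_target_kind}"


# Hypothetical adapters: each reads an existing record (a plain dict here)
# and returns the same shared descriptor type.
def describe_finding(finding: dict) -> ArtifactSourceDescriptor:
    return ArtifactSourceDescriptor(
        source_family="finding",
        source_kind="model_summary",
        source_target_kind="managed_environment",
        detector_key=finding.get("detector_key"),
        control_key=finding.get("control_key"),
    )


def describe_stored_report(_report: dict) -> ArtifactSourceDescriptor:
    # This sketch does not consult the report row; a real adapter would read its scope.
    return ArtifactSourceDescriptor(
        source_family="stored_report",
        source_kind="stored_report",
        source_target_kind="managed_environment",
    )
```

Because the descriptor is frozen and computed on demand, it adds no lifecycle or ownership of its own. The existing `Finding` or `StoredReport` row remains the only truth.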
## Decision 2: Pin exact inventories for `source_family`, `source_kind`, and `source_target_kind`
- **Decision**: keep the initial inventories exact and small.
- **Pinned `source_family` set**:
  - `finding`
  - `stored_report`
  - `evidence_snapshot`
  - `inventory`
  - `operation_run`
- **Pinned `source_kind` set**:
  - `model_summary`
  - `stored_report`
  - `operation_rollup`
  - `inventory_projection`
- **Pinned `source_target_kind` set**:
  - `managed_environment`
  - `governed_subject`
  - `provider_connection`
  - `operation_run`
- **Why**: the repo memory and readiness rules require exact inventories whenever a package introduces a bounded semantic family. Keeping all three sets explicit prevents drift during later prep or implementation.
- **Alternatives considered**:
  - open-ended family strings with only prose guidance: rejected because readiness analysis can flag vague inventories as premature
  - predeclaring package-output or multi-provider families now: rejected because those values are future-facing and not required by current repo truth
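
One way to keep the three inventories closed is to model each one as a string enum. The following is a Python sketch with hypothetical class names. The values are copied from the pinned sets above. Any value outside a set, such as a future-facing `package_output`, is rejected when it is parsed.

```python
from enum import Enum


# Hypothetical enums mirroring the pinned inventories exactly.
class SourceFamily(str, Enum):
    FINDING = "finding"
    STORED_REPORT = "stored_report"
    EVIDENCE_SNAPSHOT = "evidence_snapshot"
    INVENTORY = "inventory"
    OPERATION_RUN = "operation_run"


class SourceKind(str, Enum):
    MODEL_SUMMARY = "model_summary"
    STORED_REPORT = "stored_report"
    OPERATION_ROLLUP = "operation_rollup"
    INVENTORY_PROJECTION = "inventory_projection"


class SourceTargetKind(str, Enum):
    MANAGED_ENVIRONMENT = "managed_environment"
    GOVERNED_SUBJECT = "governed_subject"
    PROVIDER_CONNECTION = "provider_connection"
    OPERATION_RUN = "operation_run"


def parse_family(value: str) -> SourceFamily:
    # Lookup by value raises ValueError for anything outside the pinned set,
    # so an unpinned family cannot slip in silently.
    return SourceFamily(value)
```

Adding a new family then requires an explicit enum change, which is a visible and reviewable edit rather than a new string appearing in prose.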
## Decision 3: Standardize `detector_key` and `control_key` placement without creating new registries
- **Decision**: `284` standardizes where `detector_key` and `control_key` live in the shared descriptor and touched view models, but it does not introduce a closed detector catalog or a broader control-catalog expansion.
- **Why**: the repo already has working canonical-control resolution. The real problem is inconsistent placement and summary wording, not the absence of a second registry.
- **Alternatives considered**:
  - detector catalog or detector registry: rejected because it is future-facing and wider than current repo truth
  - control-catalog expansion in the same slice: rejected because `284` is about artifact-source semantics, not broader control governance
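
Standardizing placement without adding a registry might look like the following Python sketch. `SourceKeys` and `place_keys` are hypothetical. The `resolve_control` callable is a stand-in for the repo's existing canonical-control resolution, whose real interface is not shown here. `detector_key` is kept as a free-form string and is not checked against any catalog.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SourceKeys:
    detector_key: Optional[str] = None  # free-form; no closed detector catalog
    control_key: Optional[str] = None   # canonical key, only when one resolves


def place_keys(
    raw_detector: Optional[str],
    raw_control: Optional[str],
    resolve_control: Callable[[str], Optional[str]],
) -> SourceKeys:
    # Trim the detector key; treat blank values as absent.
    detector = raw_detector.strip() if raw_detector else None
    # Delegate to the existing resolver (assumed interface) instead of a new registry.
    control = resolve_control(raw_control) if raw_control else None
    return SourceKeys(detector_key=detector or None, control_key=control)
```

A plain dict's `get` method can stand in for the resolver during tests, for example `place_keys("d", "legacy-1", {"legacy-1": "ctrl.x"}.get)`.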
## Decision 4: Keep provider-native fields as nested detail
- **Decision**: `finding_type`, `report_type`, raw `policy_type`, provider object types, report domains, and Graph-facing detector detail remain provider-owned nested evidence.
- **Why**: the current release is still Microsoft-first in runtime. The goal is to stop using provider-native fields as top-level platform truth, not to erase them.
- **Alternatives considered**:
  - full generic rewrite of provider detail: rejected because it would over-abstract current repo truth
  - leaving provider-native fields as top-level summary nouns: rejected because that preserves the current artifact interpretation drift
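
The following Python sketch shows how provider-native fields could be moved under a nested provider key, so they stay available without acting as top-level summary nouns. `nest_provider_detail` is hypothetical, and the `"microsoft"` key reflects the Microsoft-first runtime as an assumption. The key list covers only three of the native fields named above. A real list would also cover provider object types, report domains, and Graph-facing detector detail.

```python
from typing import Any, Dict

# Hypothetical, partial list of provider-native keys.
PROVIDER_NATIVE_KEYS = ("finding_type", "report_type", "policy_type")


def nest_provider_detail(row: Dict[str, Any], provider: str = "microsoft") -> Dict[str, Any]:
    # Keep neutral fields at the top level of the view model.
    top_level = {k: v for k, v in row.items() if k not in PROVIDER_NATIVE_KEYS}
    # Preserve native fields unchanged, one level down under the provider key.
    native = {k: row[k] for k in PROVIDER_NATIVE_KEYS if k in row}
    top_level["provider_detail"] = {provider: native}
    return top_level
```

No provider data is lost or renamed here. Only its position in the view model changes.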
## Decision 5: Inventory type separation should live beside existing inventory metadata helpers
- **Decision**: keep `canonical_type`, `provider_object_type`, and `provider_display_type` close to `InventoryPolicyTypeMeta` and the inventory read model rather than creating a new cross-product taxonomy engine.
- **Why**: `InventoryPolicyTypeMeta` is already the narrowest place where inventory type meaning is derived and displayed.
- **Alternatives considered**:
  - new global type registry for every artifact family: rejected because it is broader than the current inventory-only problem
  - leaving inventory on raw `policy_type`: rejected because it would keep one of the explicit `284` acceptance gaps alive
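
The three-way type split could sit in a small lookup next to the existing inventory metadata. The following is a Python sketch. `InventoryTypeMeta`, `type_meta_for`, and the mapping entry are hypothetical. The values are illustrative placeholders, not the repo's actual `InventoryPolicyTypeMeta` data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryTypeMeta:
    canonical_type: str         # provider-neutral noun used in summaries
    provider_object_type: str   # provider's own object type, kept for detail views
    provider_display_type: str  # provider-facing label shown as nested detail


# Illustrative entry only; not the repo's actual mapping.
_TYPE_META = {
    "deviceConfiguration": InventoryTypeMeta(
        canonical_type="configuration_profile",
        provider_object_type="microsoft.graph.deviceConfiguration",
        provider_display_type="Device configuration",
    ),
}


def type_meta_for(policy_type: str) -> InventoryTypeMeta:
    # Unknown raw types still resolve, but without a canonical noun,
    # so they never pose as platform truth.
    return _TYPE_META.get(
        policy_type,
        InventoryTypeMeta(
            canonical_type="unknown",
            provider_object_type=policy_type,
            provider_display_type=policy_type,
        ),
    )
```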
## Decision 6: Legacy rows should normalize on read, not through backfill
- **Decision**: preserve the candidate's no-backfill rule and normalize legacy artifacts on read or during future writes only.
- **Why**: although the repo is still pre-production, `284` does not need a backfill program to deliver operator and contributor value. Read-time normalization is enough for the current artifact families.
- **Alternatives considered**:
  - historical backfill migration: rejected because it adds risk and operational work without increasing the core value of the slice
  - leaving legacy rows unreadable until rewritten: rejected because acceptance requires current Microsoft outputs to remain valid as Microsoft provider sources
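
Read-time normalization builds a descriptor view when a row is loaded and never writes back to it. The following is a Python sketch. `normalize_on_read` and its parameters are hypothetical. The hard-coded `"microsoft"` provider key is an assumption based on the current Microsoft-first runtime.

```python
from typing import Any, Dict, Mapping

# Hypothetical, partial list of legacy provider-native keys.
LEGACY_NATIVE_KEYS = ("finding_type", "report_type", "policy_type")


def normalize_on_read(row: Mapping[str, Any], family: str, kind: str) -> Dict[str, Any]:
    """Build a descriptor view of a stored row; the row itself is never modified."""
    native = {k: row[k] for k in LEGACY_NATIVE_KEYS if k in row}
    return {
        # Rows written after 284 already carry neutral nouns;
        # legacy rows fall back to the caller's defaults.
        "source_family": row.get("source_family") or family,
        "source_kind": row.get("source_kind") or kind,
        # Legacy Microsoft outputs stay valid as Microsoft provider detail.
        "provider_detail": {"microsoft": native},
    }
```

Future writes can persist the neutral fields directly. Once a row has them, the fallback branch simply stops being used for it, so no migration step is required.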
## Decision 7: Support or AI alignment stays bounded and package runtime remains deferred
- **Decision**: if `SupportDiagnosticBundleBuilder`, `AiUseCaseCatalog`, or adjacent `source_family` consumers are touched, align them to the pinned source-family nouns only. Keep `package_run_id` optional and nullable; do not create package-execution runtime.
- **Why**: the candidate explicitly says later package execution should be able to build on the descriptor, but `284` must not implement package runtime now.
- **Alternatives considered**:
  - package-output or package-run implementation in the same slice: rejected because it is adjacent future work
  - ignoring existing `source_family` consumers entirely: rejected because they could become a second source of naming drift if they are later touched without the `284` vocabulary
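
For a touched consumer, bounded alignment could be as small as the following Python sketch. `SupportBundleSource` is a hypothetical stand-in, not the real `SupportDiagnosticBundleBuilder` interface. It accepts only the pinned family nouns. It also reserves `package_run_id` as an optional, nullable field that nothing in this slice sets.

```python
from dataclasses import asdict, dataclass
from typing import Optional

PINNED_SOURCE_FAMILIES = frozenset(
    {"finding", "stored_report", "evidence_snapshot", "inventory", "operation_run"}
)


@dataclass(frozen=True)
class SupportBundleSource:
    source_family: str
    # Reserved for later package execution; no package runtime exists to set it.
    package_run_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject family names outside the pinned 284 vocabulary.
        if self.source_family not in PINNED_SOURCE_FAMILIES:
            raise ValueError(f"unpinned source_family: {self.source_family!r}")
```

Serializing the field as an explicit null keeps the shape stable, so later package work can populate it without changing consumers.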
## Implementation prerequisites present in current repo truth
- Spec `281` provider-neutral provider-connection scope is already present in repo runtime.
- Spec `282` workspace-first artifact surfaces are already present in repo runtime.
- Spec `283` provider capability registry is already present in repo runtime.
Because those inherited prerequisites are already present on the current branch, the remaining blocker is narrower: runtime work for `284` stays `prerequisite-blocked` until SCOPE-001 ownership compliance for the touched tenant-owned artifact tables is satisfied or explicitly excepted.
## Explicit non-goals carried into design
- no new artifact table or ledger
- no provider framework
- no detector registry
- no full control-catalog expansion
- no package runtime or package-output surfaces
- no historical backfill
- no workspace-first RBAC redesign
- no copy or localization neutralization