TenantAtlas/specs/284-provider-neutral-artifact-source-taxonomy/research.md
ahmido 75ebade345 feat: implement provider-neutral artifact source taxonomy (#343)
## Summary

Implements Spec 284 for provider-neutral artifact source taxonomy.

- add shared artifact source descriptor, resolver, taxonomy, and provider-detail support
- update findings, evidence snapshots, stored reports, inventory items, and tenant review surfaces to disclose descriptor-first artifact summaries
- add bounded Pest unit, feature, guard, and browser coverage for the taxonomy slice
- include the completed Spec 284 package artifacts under `specs/284-provider-neutral-artifact-source-taxonomy/`

## Notes

- branch: `284-provider-neutral-artifact-source-taxonomy`
- commit: `bf8d59e0`
- this PR was created as part of the requested commit/push/PR flow against `platform-dev`

Co-authored-by: Ahmed Darrazi <ahmed.darrazi@live.de>
Reviewed-on: #343
2026-05-08 23:47:31 +00:00

Research: Provider-neutral Artifact Source Taxonomy

Decision 1: Use one shared descriptor over existing artifact truth, not a new artifact table

  • Decision: represent provider-neutral artifact lineage through one shared descriptor carried by existing finding, evidence, stored-report, inventory, and review-summary seams.
  • Why: the repo already stores the underlying truth in Finding, EvidenceSnapshotItem, StoredReport, and InventoryItem. A new artifact-source table would duplicate that truth and create lifecycle or ownership questions that the current release does not need.
  • Alternatives considered:
    • new artifact_sources table: rejected because it adds persistence and drift risk with no current-release operator value
    • page-local aliasing only: rejected because it would preserve conflicting summaries across findings, evidence, reports, inventory, and review sections
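
As a rough illustration of the "one shared descriptor over existing truth" idea, the sketch below derives a descriptor from an existing finding row instead of persisting a new record. The class name `ArtifactSourceDescriptor`, the helper `describe_finding`, the `provider_detail` container, and the specific family/kind/target values chosen for a finding are assumptions made for this example; only the field names come from this document. It is written in Python for brevity and is not the repo's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class ArtifactSourceDescriptor:
    """Hypothetical shared descriptor; field names follow this document."""

    source_family: str
    source_kind: str
    source_target_kind: str
    detector_key: Optional[str] = None
    control_key: Optional[str] = None
    package_run_id: Optional[str] = None  # optional and nullable
    provider_detail: dict[str, Any] = field(default_factory=dict)


def describe_finding(finding: dict[str, Any]) -> ArtifactSourceDescriptor:
    """Derive a descriptor from an existing finding row; nothing new is stored."""
    return ArtifactSourceDescriptor(
        source_family="finding",
        source_kind="model_summary",            # assumed pairing for findings
        source_target_kind="governed_subject",  # assumed pairing for findings
        detector_key=finding.get("detector_key"),
        control_key=finding.get("control_key"),
        provider_detail={"finding_type": finding.get("finding_type")},
    )
```

Because the descriptor is computed from the row each time, there is no second table whose lifecycle or ownership would need to be managed.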

Decision 2: Pin exact inventories for source_family, source_kind, and source_target_kind

  • Decision: keep the initial inventories exact and small.
  • Pinned source_family set:
    • finding
    • stored_report
    • evidence_snapshot
    • inventory
    • operation_run
  • Pinned source_kind set:
    • model_summary
    • stored_report
    • operation_rollup
    • inventory_projection
  • Pinned source_target_kind set:
    • managed_environment
    • governed_subject
    • provider_connection
    • operation_run
  • Why: the repo memory and readiness rules require exact inventories when a package introduces a bounded semantic family. Keeping the set explicit prevents later prep or implementation drift.
  • Alternatives considered:
    • open-ended family strings with only prose guidance: rejected because readiness analysis can flag vague inventories as premature
    • predeclaring package-output or multi-provider families now: rejected because those values are future-facing and not required by current repo truth
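
One way to keep the inventories exact is to encode them as closed enumerations and reject anything else. The sketch below mirrors the three pinned sets listed above; the class names and the `parse_pinned` helper are hypothetical, and the repo may enforce the sets differently.

```python
from enum import Enum


class SourceFamily(str, Enum):
    FINDING = "finding"
    STORED_REPORT = "stored_report"
    EVIDENCE_SNAPSHOT = "evidence_snapshot"
    INVENTORY = "inventory"
    OPERATION_RUN = "operation_run"


class SourceKind(str, Enum):
    MODEL_SUMMARY = "model_summary"
    STORED_REPORT = "stored_report"
    OPERATION_ROLLUP = "operation_rollup"
    INVENTORY_PROJECTION = "inventory_projection"


class SourceTargetKind(str, Enum):
    MANAGED_ENVIRONMENT = "managed_environment"
    GOVERNED_SUBJECT = "governed_subject"
    PROVIDER_CONNECTION = "provider_connection"
    OPERATION_RUN = "operation_run"


def parse_pinned(enum_cls, value):
    """Return the pinned member for value, or raise with the allowed set."""
    try:
        return enum_cls(value)
    except ValueError:
        allowed = ", ".join(member.value for member in enum_cls)
        raise ValueError(
            f"{value!r} is not a pinned {enum_cls.__name__}; allowed: {allowed}"
        ) from None
```

With this shape, a future-facing value such as `package_output` fails loudly at parse time rather than drifting into stored data.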

Decision 3: Standardize detector_key and control_key placement without creating new registries

  • Decision: 284 standardizes where detector_key and control_key live in the shared descriptor and touched view models, but it does not introduce a closed detector catalog or a broader control-catalog expansion.
  • Why: the repo already has working canonical-control resolution. The real problem is inconsistent placement and summary wording, not the absence of a second registry.
  • Alternatives considered:
    • detector catalog or detector registry: rejected because it is future-facing and wider than current repo truth
    • control-catalog expansion in the same slice: rejected because 284 is about artifact-source semantics, not broader control governance

Decision 4: Keep provider-native fields as nested detail

  • Decision: finding_type, report_type, raw policy_type, provider object types, report domains, and Graph-facing detector detail remain provider-owned nested evidence.
  • Why: the current release is still Microsoft-first in runtime. The goal is to stop using provider-native fields as top-level platform truth, not to erase them.
  • Alternatives considered:
    • full generic rewrite of provider detail: rejected because it would over-abstract current repo truth
    • leaving provider-native fields as top-level summary nouns: rejected because that preserves the current artifact interpretation drift
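
The nesting rule can be sketched as a small transformation that moves provider-native fields out of the top level without discarding them. The `provider_detail` key and the `nest_provider_detail` helper are assumptions for this example; the three field names are taken from the decision above.

```python
from typing import Any

# Provider-native fields named in this decision; the tuple itself is illustrative.
PROVIDER_NATIVE_FIELDS = ("finding_type", "report_type", "policy_type")


def nest_provider_detail(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy whose provider-native fields sit under 'provider_detail'."""
    summary = {
        key: value
        for key, value in record.items()
        if key not in PROVIDER_NATIVE_FIELDS
    }
    # Keep the raw values intact so provider-owned evidence is not erased.
    summary["provider_detail"] = {
        key: record[key] for key in PROVIDER_NATIVE_FIELDS if key in record
    }
    return summary
```

The input record is left unchanged, so callers that still read the raw fields keep working while summaries switch to the descriptor-first shape.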

Decision 5: Inventory type separation should live beside existing inventory metadata helpers

  • Decision: keep canonical_type, provider_object_type, and provider_display_type close to InventoryPolicyTypeMeta and the inventory read model rather than creating a new cross-product taxonomy engine.
  • Why: InventoryPolicyTypeMeta is already the narrowest place where inventory type meaning is derived and displayed.
  • Alternatives considered:
    • new global type registry for every artifact family: rejected because it is broader than the current inventory-only problem
    • leaving inventory on raw policy_type: rejected because it would keep one of the explicit 284 acceptance gaps alive
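
To make the three-way separation concrete, the sketch below derives `canonical_type`, `provider_object_type`, and `provider_display_type` from a raw `policy_type`. The lookup table, its sample entry, the `"unknown"` fallback, and the helper name are invented for illustration; this is not the API of `InventoryPolicyTypeMeta`.

```python
from typing import NamedTuple


class InventoryTypeView(NamedTuple):
    canonical_type: str         # platform-level meaning
    provider_object_type: str   # raw provider value, kept verbatim
    provider_display_type: str  # provider-facing label for operators


# Hypothetical metadata: raw policy_type -> (canonical type, display label).
_TYPE_META = {
    "deviceCompliancePolicy": ("compliance_policy", "Device compliance policy"),
}


def separate_inventory_type(policy_type: str) -> InventoryTypeView:
    """Split a raw policy_type into canonical, provider-object, and display parts."""
    # Assumed fallback: unmapped types stay visible under their raw name.
    canonical, display = _TYPE_META.get(policy_type, ("unknown", policy_type))
    return InventoryTypeView(canonical, policy_type, display)
```

Keeping the lookup beside the inventory metadata means only inventory pays for this mapping; other artifact families are untouched.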

Decision 6: Legacy rows should normalize on read, not through backfill

  • Decision: preserve the candidate's no-backfill rule and normalize legacy artifacts on read or during future writes only.
  • Why: although the repo is still pre-production, 284 does not need a backfill program to deliver operator and contributor value. Read-time normalization is enough for the current artifact families.
  • Alternatives considered:
    • historical backfill migration: rejected because it adds risk and operational work without increasing the core value of the slice
    • leaving legacy rows unreadable until rewritten: rejected because acceptance requires current Microsoft outputs to remain valid as Microsoft provider sources
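
Read-time normalization can be sketched as a pure function that fills in missing descriptor fields on a copy of the row and never writes back. The inference table, the `provider` key, and the `"microsoft"` default are assumptions for this example, chosen to reflect the requirement that current Microsoft outputs stay valid.

```python
from typing import Any

# Hypothetical hints: legacy provider-native column -> pinned source_family.
_LEGACY_FAMILY_HINTS = (
    ("finding_type", "finding"),
    ("report_type", "stored_report"),
    ("policy_type", "inventory"),
)


def normalize_on_read(row: dict[str, Any]) -> dict[str, Any]:
    """Return a normalized copy of row; the stored row is never rewritten."""
    normalized = dict(row)
    if normalized.get("source_family") is None:
        for legacy_field, family in _LEGACY_FAMILY_HINTS:
            if legacy_field in row:
                normalized["source_family"] = family
                break
        else:
            raise ValueError("cannot infer source_family for legacy row")
    # Assumption: rows without a provider marker are Microsoft outputs.
    normalized.setdefault("provider", "microsoft")
    return normalized
```

Rows that already carry a `source_family` pass through unchanged, so the same function is safe to call on both legacy and newly written artifacts.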

Decision 7: Support or AI alignment stays bounded and package runtime remains deferred

  • Decision: if SupportDiagnosticBundleBuilder, AiUseCaseCatalog, or adjacent source_family consumers are touched, align them to the pinned source-family nouns only. Keep package_run_id optional and nullable; do not create package-execution runtime.
  • Why: the candidate explicitly says later package execution should be able to build on the descriptor, but 284 must not implement package runtime now.
  • Alternatives considered:
    • package-output or package-run implementation in the same slice: rejected because it is adjacent future work
    • ignoring existing source_family consumers entirely: rejected because they can become a second naming drift if touched later without the 284 vocabulary
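
A bounded alignment check for touched consumers could look like the sketch below: it reports any `source_family` value outside the pinned set and keeps `package_run_id` optional without implying any package runtime. Both helper names and the entry shape are hypothetical and do not describe `SupportDiagnosticBundleBuilder` or `AiUseCaseCatalog`.

```python
from typing import Iterable, Optional

# The pinned source_family nouns from Decision 2.
PINNED_SOURCE_FAMILIES = frozenset(
    {"finding", "stored_report", "evidence_snapshot", "inventory", "operation_run"}
)


def unpinned_families(consumer_values: Iterable[str]) -> list[str]:
    """List consumer source_family values that are not in the pinned set."""
    return sorted(set(consumer_values) - PINNED_SOURCE_FAMILIES)


def bundle_entry(source_family: str, package_run_id: Optional[str] = None) -> dict:
    """Build a consumer entry; package_run_id stays nullable and is only carried."""
    if source_family not in PINNED_SOURCE_FAMILIES:
        raise ValueError(f"unpinned source_family: {source_family!r}")
    return {"source_family": source_family, "package_run_id": package_run_id}
```

A guard test could call `unpinned_families` on whatever nouns a touched consumer emits and fail when the result is non-empty.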

Implementation prerequisites present in current repo truth

  • Spec 281 provider-neutral provider-connection scope is already present in repo runtime.
  • Spec 282 workspace-first artifact surfaces are already present in repo runtime.
  • Spec 283 provider capability registry is already present in repo runtime.

Because those inherited prerequisites are already present on the current branch, the remaining blocker is narrower: runtime work for 284 stays blocked until SCOPE-001 ownership compliance for the touched tenant-owned artifact tables is either satisfied or explicitly excepted.

Explicit non-goals carried into design

  • no new artifact table or ledger
  • no provider framework
  • no detector registry
  • no full control-catalog expansion
  • no package runtime or package-output surfaces
  • no historical backfill
  • no workspace-first RBAC redesign
  • no copy or localization neutralization