TenantAtlas/specs/284-provider-neutral-artifact-source-taxonomy/data-model.md
Ahmed Darrazi bf8d59e034
feat: implement provider-neutral artifact source taxonomy
2026-05-09 01:45:12 +02:00

Data Model: Provider-neutral Artifact Source Taxonomy

Existing persisted truth reused

Finding

Existing persisted finding fields already provide the raw inputs for a provider-neutral descriptor:

  • workspace_id
  • managed_environment_id
  • finding_type
  • optional source
  • title
  • status
  • severity
  • evidence_jsonb

finding_type and source remain persisted provider or artifact detail. 284 adds a shared descriptor over them rather than replacing them as raw evidence.

EvidenceSnapshotItem

Existing evidence snapshot item fields already provide the current evidence-source seam:

  • workspace_id
  • managed_environment_id
  • dimension_key
  • state
  • required
  • source_kind
  • source_record_type
  • source_record_id
  • source_fingerprint
  • measured_at
  • freshness_at
  • summary_payload
  • sort_order

284 extends this seam by adding or deriving a provider-neutral descriptor so source_record_type stops acting as the only top-level source identity.

StoredReport

Existing stored-report truth already includes:

  • workspace_id
  • managed_environment_id
  • report_type
  • payload
  • fingerprint
  • previous_fingerprint

Current report producers already write provider-owned fields such as provider_key into payload. 284 lifts the shared lineage fields into the common descriptor without deleting provider-owned detail.
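A minimal sketch of that lift, assuming a stored report is available as a plain mapping. The function name `lift_report_lineage` and the `DEFAULT_PROVIDER_KEY` constant are hypothetical; the field names come from the list above. The point it illustrates is that the shared fields are copied up while the payload is left untouched.

```python
from copy import deepcopy
from typing import Any, Mapping

# Assumption: "microsoft" is the fallback because it is the only provider emitted today.
DEFAULT_PROVIDER_KEY = "microsoft"


def lift_report_lineage(report: Mapping[str, Any]) -> dict[str, Any]:
    """Copy shared lineage fields into a descriptor-shaped dict.

    The payload is only read, never modified, so provider-owned detail stays in place.
    """
    payload = report.get("payload") or {}
    return {
        "source_family": "stored_report",
        "source_kind": "stored_report",
        "workspace_id": report["workspace_id"],
        "managed_environment_id": report["managed_environment_id"],
        # Reuse the payload value when present, otherwise fall back to the default.
        "provider_key": payload.get("provider_key") or DEFAULT_PROVIDER_KEY,
    }


report = {
    "workspace_id": 1,
    "managed_environment_id": 7,
    "report_type": "example_report",  # illustrative value only
    "payload": {"provider_key": "microsoft", "raw": {"kept": True}},
}
before = deepcopy(report)
lineage = lift_report_lineage(report)
```

After the call, `report` still equals `before`, which is the non-destructive property the paragraph above describes.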

InventoryItem

Existing inventory truth already includes:

  • workspace_id
  • managed_environment_id
  • policy_type
  • external_id
  • platform
  • display_name
  • meta_jsonb
  • last_seen_at
  • last_seen_operation_run_id

policy_type remains provider-owned or legacy artifact detail after 284; it no longer stands alone as the platform's only artifact type label.

Pinned initial descriptor inventories

source_family

| Value | Meaning |
| --- | --- |
| finding | artifact lineage originates from a finding or finding-derived summary |
| stored_report | artifact lineage originates from a stored report |
| evidence_snapshot | artifact lineage is summarized inside an evidence snapshot item or evidence snapshot view model |
| inventory | artifact lineage originates from inventory capture or inventory projection |
| operation_run | artifact lineage originates from operation-run rollup evidence |

source_kind

| Value | Meaning |
| --- | --- |
| model_summary | summary derived directly from one or more model records |
| stored_report | summary or artifact read directly from stored-report persistence |
| operation_rollup | summary derived from operation-run history |
| inventory_projection | summary derived from inventory read models |

source_target_kind

| Value | Meaning |
| --- | --- |
| managed_environment | artifact summarizes environment-wide state |
| governed_subject | artifact describes one governed subject or provider object under the environment |
| provider_connection | artifact primarily describes provider-connection state |
| operation_run | artifact primarily describes one operation run |
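The three pinned inventories can be expressed as closed enumerations. The sketch below is illustrative only: the class names and the `is_pinned` helper are hypothetical, while the string values are exactly those listed above.

```python
from enum import Enum


class SourceFamily(str, Enum):
    FINDING = "finding"
    STORED_REPORT = "stored_report"
    EVIDENCE_SNAPSHOT = "evidence_snapshot"
    INVENTORY = "inventory"
    OPERATION_RUN = "operation_run"


class SourceKind(str, Enum):
    MODEL_SUMMARY = "model_summary"
    STORED_REPORT = "stored_report"
    OPERATION_ROLLUP = "operation_rollup"
    INVENTORY_PROJECTION = "inventory_projection"


class SourceTargetKind(str, Enum):
    MANAGED_ENVIRONMENT = "managed_environment"
    GOVERNED_SUBJECT = "governed_subject"
    PROVIDER_CONNECTION = "provider_connection"
    OPERATION_RUN = "operation_run"


def is_pinned(inventory: type[Enum], raw: str) -> bool:
    """Return True only when raw is one of the pinned values of the given inventory."""
    return raw in {member.value for member in inventory}
```

Note that `stored_report` and `operation_run` each appear in two inventories, so a value is only meaningful together with the field it belongs to.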

New derived contracts

ArtifactSourceDescriptor

Represents the provider-neutral lineage envelope for a finding, evidence summary, stored report, inventory item, or touched review summary.

| Field | Type | Notes |
| --- | --- | --- |
| source_family | string | One of the pinned values above |
| source_kind | string | One of the pinned values above |
| workspace_id | integer | Derived workspace scope anchor for the artifact |
| tenant_id | integer | Derived tenant scope anchor for the artifact |
| provider_key | string | Provider-neutral contract field; current repo truth emits microsoft only |
| provider_connection_id | integer or null | Nullable because historical artifacts may not know the connection |
| managed_environment_id | integer | Required managed-environment anchor inside the derived workspace and tenant scope |
| source_target_kind | string | One of the pinned values above |
| source_target_identifier | string or null | Optional stable target identifier such as governed-subject key, record id, or run id |
| detector_key | string or null | Standardized field for detector or signal identity; no closed catalog in 284 v1 |
| control_key | string or null | Existing canonical-control key when available |
| package_run_id | integer or null | Optional future package hook only; remains null in current runtime |
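As a rough illustration of this contract, the descriptor can be modeled as an immutable value object that checks the three pinned fields on construction. This is a sketch under assumptions, not the repository's implementation: the validation in `__post_init__` and the module-level sets are hypothetical, and only the field names, types, and nullability come from the table.

```python
from dataclasses import dataclass
from typing import Optional

SOURCE_FAMILIES = {"finding", "stored_report", "evidence_snapshot", "inventory", "operation_run"}
SOURCE_KINDS = {"model_summary", "stored_report", "operation_rollup", "inventory_projection"}
SOURCE_TARGET_KINDS = {"managed_environment", "governed_subject", "provider_connection", "operation_run"}


@dataclass(frozen=True)
class ArtifactSourceDescriptor:
    source_family: str
    source_kind: str
    workspace_id: int
    tenant_id: int
    provider_key: str
    managed_environment_id: int
    source_target_kind: str
    # Nullable fields default to None, mirroring "or null" in the table.
    provider_connection_id: Optional[int] = None
    source_target_identifier: Optional[str] = None
    detector_key: Optional[str] = None  # free-form: no closed catalog in 284 v1
    control_key: Optional[str] = None
    package_run_id: Optional[int] = None  # future hook; left as None here

    def __post_init__(self) -> None:
        # Assumption: pinned values are enforced at construction time.
        checks = (
            ("source_family", self.source_family, SOURCE_FAMILIES),
            ("source_kind", self.source_kind, SOURCE_KINDS),
            ("source_target_kind", self.source_target_kind, SOURCE_TARGET_KINDS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")


descriptor = ArtifactSourceDescriptor(
    source_family="finding",
    source_kind="model_summary",
    workspace_id=1,
    tenant_id=1,
    provider_key="microsoft",
    managed_environment_id=7,
    source_target_kind="governed_subject",
    source_target_identifier="subject-123",  # illustrative identifier
)
```

Keeping `detector_key` as an unchecked string reflects the note that 284 v1 defines no closed detector catalog.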

InventoryTypeDescriptor

Represents the inventory-specific type split.

| Field | Type | Notes |
| --- | --- | --- |
| canonical_type | string | Platform-owned type used for top-level summary |
| provider_object_type | string | Raw provider object type such as the existing policy_type value |
| provider_display_type | string | Human-readable provider label for operators |
| legacy_policy_type | string or null | Optional carry-forward for old readers or diagnostics |
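The type split might look like the following sketch. Everything beyond the four field names is an assumption made for illustration: the `CANONICAL_BY_POLICY_TYPE` lookup stands in for whatever metadata source the platform actually uses, its single entry is invented, and the `unclassified` fallback is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup; the key and both values are invented placeholders.
CANONICAL_BY_POLICY_TYPE = {
    "examplePolicyType": ("example_canonical_type", "Example policy"),
}
UNCLASSIFIED = "unclassified"  # hypothetical fallback, not defined by the spec


@dataclass(frozen=True)
class InventoryTypeDescriptor:
    canonical_type: str
    provider_object_type: str
    provider_display_type: str
    legacy_policy_type: Optional[str] = None


def describe_inventory_type(policy_type: str, display_name: Optional[str] = None) -> InventoryTypeDescriptor:
    """Split a raw policy_type into platform-owned and provider-owned parts."""
    canonical, label = CANONICAL_BY_POLICY_TYPE.get(policy_type, (UNCLASSIFIED, None))
    return InventoryTypeDescriptor(
        canonical_type=canonical,
        # The raw value is preserved as provider detail, never as the canonical type.
        provider_object_type=policy_type,
        # Prefer a known label, then the item's display name, then the raw value.
        provider_display_type=label or display_name or policy_type,
        legacy_policy_type=policy_type,
    )


known = describe_inventory_type("examplePolicyType")
unknown = describe_inventory_type("somethingElse", display_name="Something else")
```

In both cases the raw `policy_type` survives in `provider_object_type` and `legacy_policy_type`, while `canonical_type` is always a platform-owned value.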

ArtifactProviderDetail

Nested provider-owned evidence that stays below the shared descriptor.

| Field | Type | Notes |
| --- | --- | --- |
| legacy_finding_type | string or null | Existing finding_type where relevant |
| legacy_report_type | string or null | Existing report_type where relevant |
| legacy_policy_type | string or null | Existing inventory or drift policy_type where relevant |
| provider_object_type | string or null | Raw provider object type |
| provider_display_type | string or null | Provider-owned display label |
| detector_detail | string or null | Provider-facing detector or signal detail |

ArtifactSourceViewModel

Shared summary contract used by touched Filament pages and presenters.

| Field | Type | Notes |
| --- | --- | --- |
| headline | string | Canonical operator-facing summary |
| source_descriptor | ArtifactSourceDescriptor | Shared lineage envelope |
| provider_detail | ArtifactProviderDetail | Nested provider-owned detail |
| control_summary | array or null | Derived control label, key, and status when existing resolver provides it |
| freshness | array or null | Existing freshness or timing metadata |
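To show how the pieces nest, here is a small sketch of the view model with trimmed stand-ins for the two nested contracts. The `build_view_model` helper, the headline format, and the sample values are all assumptions; the only claim taken from the spec is that the headline is canonical while provider-owned values stay in the nested detail. The "array" fields are modeled as dicts here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceDescriptor:  # trimmed stand-in for ArtifactSourceDescriptor
    source_family: str
    provider_key: str
    control_key: Optional[str] = None


@dataclass(frozen=True)
class ProviderDetail:  # trimmed stand-in for ArtifactProviderDetail
    legacy_policy_type: Optional[str] = None
    provider_display_type: Optional[str] = None


@dataclass(frozen=True)
class ArtifactSourceViewModel:
    headline: str
    source_descriptor: SourceDescriptor
    provider_detail: ProviderDetail
    control_summary: Optional[dict] = None
    freshness: Optional[dict] = None


def build_view_model(
    canonical_label: str,
    descriptor: SourceDescriptor,
    detail: ProviderDetail,
    control_summary: Optional[dict] = None,
) -> ArtifactSourceViewModel:
    # Assumed headline format: built only from platform-owned values.
    headline = f"{canonical_label} ({descriptor.source_family})"
    return ArtifactSourceViewModel(
        headline=headline,
        source_descriptor=descriptor,
        provider_detail=detail,
        control_summary=control_summary,
    )


view_model = build_view_model(
    "Example canonical type",
    SourceDescriptor(source_family="inventory", provider_key="microsoft"),
    ProviderDetail(legacy_policy_type="examplePolicyType"),
)
```

A presenter consuming this shape can render the headline without ever reading `provider_detail`, which keeps provider vocabulary out of the top-level summary.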

Relationships

  • One managed environment can own many findings, evidence snapshot items, stored reports, and inventory items.
  • One finding or stored report can contribute one ArtifactSourceDescriptor per surfaced summary.
  • One evidence snapshot can contain many ArtifactSourceDescriptor values, one per item.
  • One inventory item can expose exactly one InventoryTypeDescriptor and one ArtifactSourceDescriptor.
  • One tenant-review section can summarize zero or more underlying artifacts but should surface one canonical source summary per summarized item.

Legacy-read normalization rules

  • If a finding has source = null, derive source_family and source_target_kind from finding_type plus any qualifying evidence fields.
  • If a drift finding only exposes policy_type, derive canonical_type from InventoryPolicyTypeMeta or adjacent subject metadata, keep the raw value as provider_object_type or legacy_policy_type, and never promote it back to the top-level headline.
  • If a stored report payload already includes provider_key, reuse it; otherwise default the descriptor to the current provider for the producing service.
  • If an evidence summary has no single source_record_id, keep source_target_identifier nullable and prefer managed_environment or governed_subject targeting instead of inventing synthetic ids.
  • If inventory has no distinct provider display label, fall back to the best available metadata label while keeping provider_object_type separate from canonical_type.
  • If canonical-control resolution returns no control, control_key remains null rather than forcing a fake mapping.
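Two of these rules, the evidence-target rule and the control-key rule, can be sketched as small pure functions. The function names, the `governed_subject_key` parameter, and the `"key"` entry of the resolver result are hypothetical; the behavior they illustrate (nullable identifier, no synthetic ids, no forced control mapping) follows the rules above.

```python
from typing import Mapping, Optional, Tuple


def resolve_evidence_target(
    source_record_id: Optional[int],
    governed_subject_key: Optional[str] = None,
) -> Tuple[str, Optional[str]]:
    """Return (source_target_kind, source_target_identifier) for an evidence summary."""
    if governed_subject_key is not None:
        # Assumption: a known governed subject is the more specific target.
        return "governed_subject", governed_subject_key
    # Environment-wide summary: keep the identifier null when there is no single record.
    identifier = str(source_record_id) if source_record_id is not None else None
    return "managed_environment", identifier


def resolve_control_key(resolution: Optional[Mapping[str, str]]) -> Optional[str]:
    """Return the canonical control key, or None when the resolver found no control."""
    if not resolution:
        return None
    return resolution.get("key")
```

For example, `resolve_evidence_target(None)` yields a `managed_environment` target with a null identifier instead of a generated placeholder id.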

Explicit non-goals for data modeling

  • no artifact_sources table
  • no persisted package-run ledger
  • no detector registry table or config catalog
  • no control-catalog expansion
  • no full rewrite of provider-native fields out of existing tables