# Data Model: Provider-neutral Artifact Source Taxonomy

## Existing persisted truth reused

### Finding

Existing persisted finding fields already provide the raw inputs for a provider-neutral descriptor:

- `workspace_id`
- `managed_environment_id`
- `finding_type`
- optional `source`
- `title`
- `status`
- `severity`
- `evidence_jsonb`

`finding_type` and `source` remain persisted provider or artifact detail. `284` adds a shared descriptor over them rather than replacing them as raw evidence.

### EvidenceSnapshotItem

Existing evidence snapshot item fields already provide the current evidence-source seam:

- `workspace_id`
- `managed_environment_id`
- `dimension_key`
- `state`
- `required`
- `source_kind`
- `source_record_type`
- `source_record_id`
- `source_fingerprint`
- `measured_at`
- `freshness_at`
- `summary_payload`
- `sort_order`

`284` extends this seam by adding or deriving a provider-neutral descriptor so `source_record_type` stops acting as the only top-level source identity.

### StoredReport

Existing stored-report truth already includes:

- `workspace_id`
- `managed_environment_id`
- `report_type`
- `payload`
- `fingerprint`
- `previous_fingerprint`

Current report producers already write provider-owned fields such as `provider_key` into payload. `284` lifts the shared lineage fields into the common descriptor without deleting provider-owned detail.

### InventoryItem

Existing inventory truth already includes:

- `workspace_id`
- `managed_environment_id`
- `policy_type`
- `external_id`
- `platform`
- `display_name`
- `meta_jsonb`
- `last_seen_at`
- `last_seen_operation_run_id`

`policy_type` remains provider-owned or legacy artifact detail after `284`; it no longer stands alone as the platform's only artifact type label.
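
The "descriptor over raw fields" idea that recurs in the four records above can be sketched in a few lines. This is only an illustration in Python, not the project's implementation: `FindingSourceOverlay`, `overlay_for_finding`, the `legacy_source` field, and the sample `finding_type` value are hypothetical names chosen for the example. The point it tries to show is that the shared label is derived next to the persisted values, which stay untouched.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class FindingSourceOverlay:
    """Hypothetical read-only view layered over a persisted finding row."""

    source_family: str            # shared, provider-neutral label
    legacy_finding_type: str      # raw persisted `finding_type`, kept verbatim
    legacy_source: Optional[str]  # raw persisted `source`, may be null


def overlay_for_finding(row: Mapping[str, Any]) -> FindingSourceOverlay:
    """Derive the overlay without mutating or dropping any persisted field."""
    source = row.get("source")
    return FindingSourceOverlay(
        source_family="finding",
        legacy_finding_type=str(row["finding_type"]),
        legacy_source=None if source is None else str(source),
    )


# Illustrative row; the values are made up for this sketch.
finding_row = {
    "workspace_id": 1,
    "managed_environment_id": 7,
    "finding_type": "drift",
    "source": None,
}
overlay = overlay_for_finding(finding_row)
```

The persisted row is only read, so existing readers of `finding_type` and `source` keep working while new readers consume the overlay.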

## Pinned initial descriptor inventories

### `source_family`

| Value | Meaning |
|---|---|
| `finding` | artifact lineage originates from a finding or finding-derived summary |
| `stored_report` | artifact lineage originates from a stored report |
| `evidence_snapshot` | artifact lineage is summarized inside an evidence snapshot item or evidence snapshot view model |
| `inventory` | artifact lineage originates from inventory capture or inventory projection |
| `operation_run` | artifact lineage originates from operation-run rollup evidence |

### `source_kind`

| Value | Meaning |
|---|---|
| `model_summary` | summary derived directly from one or more model records |
| `stored_report` | summary or artifact read directly from stored-report persistence |
| `operation_rollup` | summary derived from operation-run history |
| `inventory_projection` | summary derived from inventory read models |

### `source_target_kind`

| Value | Meaning |
|---|---|
| `managed_environment` | artifact summarizes environment-wide state |
| `governed_subject` | artifact describes one governed subject or provider object under the environment |
| `provider_connection` | artifact primarily describes provider-connection state |
| `operation_run` | artifact primarily describes one operation run |

## New derived contracts

### ArtifactSourceDescriptor

Represents the provider-neutral lineage envelope for a finding, evidence summary, stored report, inventory item, or touched review summary.

| Field | Type | Notes |
|---|---|---|
| `source_family` | string | One of the pinned values above |
| `source_kind` | string | One of the pinned values above |
| `workspace_id` | integer | Derived workspace scope anchor for the artifact |
| `tenant_id` | integer | Derived tenant scope anchor for the artifact |
| `provider_key` | string | Provider-neutral contract field; current repo truth emits `microsoft` only |
| `provider_connection_id` | integer or null | Nullable because historical artifacts may not know the connection |
| `managed_environment_id` | integer | Required managed-environment anchor inside the derived workspace and tenant scope |
| `source_target_kind` | string | One of the pinned values above |
| `source_target_identifier` | string or null | Optional stable target identifier such as governed-subject key, record id, or run id |
| `detector_key` | string or null | Standardized field for detector or signal identity; no closed catalog in `284` v1 |
| `control_key` | string or null | Existing canonical-control key when available |
| `package_run_id` | integer or null | Optional future package hook only; remains null in current runtime |

### InventoryTypeDescriptor

Represents the inventory-specific type split.

| Field | Type | Notes |
|---|---|---|
| `canonical_type` | string | Platform-owned type used for top-level summary |
| `provider_object_type` | string | Raw provider object type such as the existing `policy_type` value |
| `provider_display_type` | string | Human-readable provider label for operators |
| `legacy_policy_type` | string or null | Optional carry-forward for old readers or diagnostics |

### ArtifactProviderDetail

Nested provider-owned evidence that stays below the shared descriptor.

| Field | Type | Notes |
|---|---|---|
| `legacy_finding_type` | string or null | Existing `finding_type` where relevant |
| `legacy_report_type` | string or null | Existing `report_type` where relevant |
| `legacy_policy_type` | string or null | Existing inventory or drift `policy_type` where relevant |
| `provider_object_type` | string or null | Raw provider object type |
| `provider_display_type` | string or null | Provider-owned display label |
| `detector_detail` | string or null | Provider-facing detector or signal detail |

### ArtifactSourceViewModel

Shared summary contract used by touched Filament pages and presenters.

| Field | Type | Notes |
|---|---|---|
| `headline` | string | Canonical operator-facing summary |
| `source_descriptor` | `ArtifactSourceDescriptor` | Shared lineage envelope |
| `provider_detail` | `ArtifactProviderDetail` | Nested provider-owned detail |
| `control_summary` | array or null | Derived control label, key, and status when existing resolver provides it |
| `freshness` | array or null | Existing freshness or timing metadata |

## Relationships

- One managed environment can own many findings, evidence snapshot items, stored reports, and inventory items.
- One finding or stored report can contribute one `ArtifactSourceDescriptor` per surfaced summary.
- One evidence snapshot can contain many `ArtifactSourceDescriptor` values, one per item.
- One inventory item can expose exactly one `InventoryTypeDescriptor` and one `ArtifactSourceDescriptor`.
- One tenant-review section can summarize zero or more underlying artifacts but should surface one canonical source summary per summarized item.

## Legacy-read normalization rules

- If a finding has `source = null`, derive `source_family` and `source_target_kind` from `finding_type` plus any qualifying evidence fields.
- If a drift finding only exposes `policy_type`, derive `canonical_type` from `InventoryPolicyTypeMeta` or adjacent subject metadata, keep the raw value as `provider_object_type` or `legacy_policy_type`, and never promote it back to the top-level headline.
- If a stored report payload already includes `provider_key`, reuse it; otherwise default the descriptor to the current provider for the producing service.
- If an evidence summary has no single `source_record_id`, keep `source_target_identifier` nullable and prefer `managed_environment` or `governed_subject` targeting instead of inventing synthetic ids.
- If inventory has no distinct provider display label, fall back to the best available metadata label while keeping `provider_object_type` separate from `canonical_type`.
- If canonical-control resolution returns no control, `control_key` remains null rather than forcing a fake mapping.

## Explicit non-goals for data modeling

- no `artifact_sources` table
- no persisted package-run ledger
- no detector registry table or config catalog
- no control-catalog expansion
- no full rewrite of provider-native fields out of existing tables
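
Because the contracts in this document are derived rather than persisted, they can be pictured as plain in-memory value objects. The following is a minimal Python sketch under stated assumptions, not the repository's implementation. The field names and pinned values come from the tables in this document. The helper names `descriptor_for_stored_report` and `inventory_type_for`, their parameters, passing `tenant_id` in from the caller, and targeting `managed_environment` for stored reports are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Pinned inventories copied from the tables in this document.
SOURCE_FAMILIES = frozenset(
    {"finding", "stored_report", "evidence_snapshot", "inventory", "operation_run"}
)
SOURCE_KINDS = frozenset(
    {"model_summary", "stored_report", "operation_rollup", "inventory_projection"}
)
SOURCE_TARGET_KINDS = frozenset(
    {"managed_environment", "governed_subject", "provider_connection", "operation_run"}
)


@dataclass(frozen=True)
class ArtifactSourceDescriptor:
    source_family: str
    source_kind: str
    workspace_id: int
    tenant_id: int
    provider_key: str
    managed_environment_id: int
    source_target_kind: str
    provider_connection_id: Optional[int] = None
    source_target_identifier: Optional[str] = None
    detector_key: Optional[str] = None
    control_key: Optional[str] = None
    package_run_id: Optional[int] = None  # future hook; null in current runtime

    def __post_init__(self) -> None:
        # Reject values outside the pinned inventories.
        checks = (
            ("source_family", self.source_family, SOURCE_FAMILIES),
            ("source_kind", self.source_kind, SOURCE_KINDS),
            ("source_target_kind", self.source_target_kind, SOURCE_TARGET_KINDS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not a pinned value")


@dataclass(frozen=True)
class InventoryTypeDescriptor:
    canonical_type: str
    provider_object_type: str
    provider_display_type: str
    legacy_policy_type: Optional[str] = None


def descriptor_for_stored_report(
    report: Mapping[str, Any],
    tenant_id: int,
    default_provider_key: str,
    control_key: Optional[str] = None,
) -> ArtifactSourceDescriptor:
    """Hypothetical helper: reuse payload `provider_key`, else use the default."""
    payload = report.get("payload") or {}
    return ArtifactSourceDescriptor(
        source_family="stored_report",
        source_kind="stored_report",
        workspace_id=report["workspace_id"],
        tenant_id=tenant_id,
        provider_key=payload.get("provider_key") or default_provider_key,
        managed_environment_id=report["managed_environment_id"],
        # Assumption: no single record id, so target the environment and
        # leave `source_target_identifier` null instead of inventing an id.
        source_target_kind="managed_environment",
        control_key=control_key,  # stays None when no control resolves
    )


def inventory_type_for(
    policy_type: str,
    canonical_type: str,
    display_label: Optional[str],
    fallback_label: str,
) -> InventoryTypeDescriptor:
    """Hypothetical helper: keep the raw type below the canonical type."""
    return InventoryTypeDescriptor(
        canonical_type=canonical_type,
        provider_object_type=policy_type,
        provider_display_type=display_label or fallback_label,
        legacy_policy_type=policy_type,
    )
```

In this sketch the canonical type is passed in by the caller, standing in for whatever `InventoryPolicyTypeMeta` or adjacent subject metadata would supply. Nothing is written back to storage, which matches the non-goal of adding an `artifact_sources` table.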