TenantAtlas/specs/284-provider-neutral-artifact-source-taxonomy/data-model.md
Ahmed Darrazi bf8d59e034
feat: implement provider-neutral artifact source taxonomy
2026-05-09 01:45:12 +02:00

# Data Model: Provider-neutral Artifact Source Taxonomy
## Existing persisted truth reused
### Finding
Existing persisted finding fields already provide the raw inputs for a provider-neutral descriptor:
- `workspace_id`
- `managed_environment_id`
- `finding_type`
- optional `source`
- `title`
- `status`
- `severity`
- `evidence_jsonb`

`finding_type` and `source` remain persisted as provider or artifact detail. `284` layers a shared descriptor over them rather than replacing them as raw evidence.
### EvidenceSnapshotItem
Existing evidence snapshot item fields already provide the current evidence-source seam:
- `workspace_id`
- `managed_environment_id`
- `dimension_key`
- `state`
- `required`
- `source_kind`
- `source_record_type`
- `source_record_id`
- `source_fingerprint`
- `measured_at`
- `freshness_at`
- `summary_payload`
- `sort_order`

`284` extends this seam by adding or deriving a provider-neutral descriptor so `source_record_type` stops acting as the only top-level source identity.
### StoredReport
Existing stored-report truth already includes:
- `workspace_id`
- `managed_environment_id`
- `report_type`
- `payload`
- `fingerprint`
- `previous_fingerprint`

Current report producers already write provider-owned fields such as `provider_key` into the payload. `284` lifts the shared lineage fields into the common descriptor without deleting provider-owned detail.
### InventoryItem
Existing inventory truth already includes:
- `workspace_id`
- `managed_environment_id`
- `policy_type`
- `external_id`
- `platform`
- `display_name`
- `meta_jsonb`
- `last_seen_at`
- `last_seen_operation_run_id`

`policy_type` remains provider-owned or legacy artifact detail after `284`; it no longer stands alone as the platform's only artifact type label.
## Pinned initial descriptor inventories
### `source_family`
| Value | Meaning |
|---|---|
| `finding` | artifact lineage originates from a finding or finding-derived summary |
| `stored_report` | artifact lineage originates from a stored report |
| `evidence_snapshot` | artifact lineage is summarized inside an evidence snapshot item or evidence snapshot view model |
| `inventory` | artifact lineage originates from inventory capture or inventory projection |
| `operation_run` | artifact lineage originates from operation-run rollup evidence |
### `source_kind`
| Value | Meaning |
|---|---|
| `model_summary` | summary derived directly from one or more model records |
| `stored_report` | summary or artifact read directly from stored-report persistence |
| `operation_rollup` | summary derived from operation-run history |
| `inventory_projection` | summary derived from inventory read models |
### `source_target_kind`
| Value | Meaning |
|---|---|
| `managed_environment` | artifact summarizes environment-wide state |
| `governed_subject` | artifact describes one governed subject or provider object under the environment |
| `provider_connection` | artifact primarily describes provider-connection state |
| `operation_run` | artifact primarily describes one operation run |
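
The three pinned inventories above can be read as closed value sets. The following is a minimal sketch in Python (not the repo's implementation language), assuming string-backed enums; the class names and the `parse_pinned` helper are hypothetical and exist only for illustration.

```python
from enum import Enum


class SourceFamily(str, Enum):
    FINDING = "finding"
    STORED_REPORT = "stored_report"
    EVIDENCE_SNAPSHOT = "evidence_snapshot"
    INVENTORY = "inventory"
    OPERATION_RUN = "operation_run"


class SourceKind(str, Enum):
    MODEL_SUMMARY = "model_summary"
    STORED_REPORT = "stored_report"
    OPERATION_ROLLUP = "operation_rollup"
    INVENTORY_PROJECTION = "inventory_projection"


class SourceTargetKind(str, Enum):
    MANAGED_ENVIRONMENT = "managed_environment"
    GOVERNED_SUBJECT = "governed_subject"
    PROVIDER_CONNECTION = "provider_connection"
    OPERATION_RUN = "operation_run"


def parse_pinned(enum_cls, raw):
    """Return the member for a raw string; raises ValueError for unpinned values."""
    return enum_cls(raw)
```

Note that `stored_report` and `operation_run` each appear in two inventories, so a value is only meaningful together with the field it belongs to; keeping one enum per field makes that explicit.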
## New derived contracts
### ArtifactSourceDescriptor
Represents the provider-neutral lineage envelope for a finding, evidence summary, stored report, inventory item, or touched review summary.
| Field | Type | Notes |
|---|---|---|
| `source_family` | string | One of the pinned values above |
| `source_kind` | string | One of the pinned values above |
| `workspace_id` | integer | Derived workspace scope anchor for the artifact |
| `tenant_id` | integer | Derived tenant scope anchor for the artifact |
| `provider_key` | string | Provider-neutral contract field; current repo truth emits `microsoft` only |
| `provider_connection_id` | integer or null | Nullable because historical artifacts may not know the connection |
| `managed_environment_id` | integer | Required managed-environment anchor inside the derived workspace and tenant scope |
| `source_target_kind` | string | One of the pinned values above |
| `source_target_identifier` | string or null | Optional stable target identifier such as governed-subject key, record id, or run id |
| `detector_key` | string or null | Standardized field for detector or signal identity; no closed catalog in `284` v1 |
| `control_key` | string or null | Existing canonical-control key when available |
| `package_run_id` | integer or null | Optional future package hook only; remains null in current runtime |
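
To make the required versus nullable split in this table concrete, here is a hedged sketch of the descriptor as an immutable Python dataclass. It is an illustration under stated assumptions, not the project's actual class: the validation in `__post_init__` and all example ids (`1`, `10`, `"subject-123"`) are invented for the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Pinned inventories copied from the tables in this document.
SOURCE_FAMILIES = {"finding", "stored_report", "evidence_snapshot", "inventory", "operation_run"}
SOURCE_KINDS = {"model_summary", "stored_report", "operation_rollup", "inventory_projection"}
SOURCE_TARGET_KINDS = {"managed_environment", "governed_subject", "provider_connection", "operation_run"}


@dataclass(frozen=True)
class ArtifactSourceDescriptor:
    # Required anchors and pinned classification fields.
    source_family: str
    source_kind: str
    workspace_id: int
    tenant_id: int
    provider_key: str
    managed_environment_id: int
    source_target_kind: str
    # Fields typed "or null" in the table default to None.
    provider_connection_id: Optional[int] = None
    source_target_identifier: Optional[str] = None
    detector_key: Optional[str] = None
    control_key: Optional[str] = None
    package_run_id: Optional[int] = None

    def __post_init__(self) -> None:
        # Assumption of this sketch: unpinned values are rejected at construction.
        checks = (
            ("source_family", self.source_family, SOURCE_FAMILIES),
            ("source_kind", self.source_kind, SOURCE_KINDS),
            ("source_target_kind", self.source_target_kind, SOURCE_TARGET_KINDS),
        )
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not a pinned value")


# Hypothetical example values; only "microsoft" comes from the table above.
example = ArtifactSourceDescriptor(
    source_family="finding",
    source_kind="model_summary",
    workspace_id=1,
    tenant_id=1,
    provider_key="microsoft",
    managed_environment_id=10,
    source_target_kind="governed_subject",
    source_target_identifier="subject-123",
)
```

`detector_key` stays a free-form string here because `284` v1 defines no closed detector catalog, and `package_run_id` is left at `None` to mirror the current runtime.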
### InventoryTypeDescriptor
Represents the inventory-specific type split.
| Field | Type | Notes |
|---|---|---|
| `canonical_type` | string | Platform-owned type used for top-level summary |
| `provider_object_type` | string | Raw provider object type such as the existing `policy_type` value |
| `provider_display_type` | string | Human-readable provider label for operators |
| `legacy_policy_type` | string or null | Optional carry-forward for old readers or diagnostics |
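
The type split can be sketched as a small mapping step from a raw `policy_type` to the descriptor. This is a rough illustration only: the `describe_inventory_type` function, the shape of the metadata mapping, and the `examplePolicy` entry are all hypothetical, and falling back to the raw type when no label exists is an assumption of this sketch rather than a rule from the spec.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class InventoryTypeDescriptor:
    canonical_type: str
    provider_object_type: str
    provider_display_type: str
    legacy_policy_type: Optional[str] = None


def describe_inventory_type(
    policy_type: str,
    type_meta: Mapping[str, Mapping[str, str]],
) -> InventoryTypeDescriptor:
    # Raises KeyError for unmapped types; how to handle those is not covered here.
    meta = type_meta[policy_type]
    return InventoryTypeDescriptor(
        canonical_type=meta["canonical_type"],
        # The raw provider value is kept separate from the canonical type.
        provider_object_type=policy_type,
        # Assumption: use the raw type when no metadata label is available.
        provider_display_type=meta.get("label", policy_type),
        legacy_policy_type=policy_type,
    )


# Hypothetical metadata; neither the key nor the labels come from the repo.
EXAMPLE_META = {
    "examplePolicy": {"canonical_type": "example_canonical", "label": "Example policy"},
    "unlabeledPolicy": {"canonical_type": "example_canonical"},
}
descriptor = describe_inventory_type("examplePolicy", EXAMPLE_META)
```

The point of the split is that `canonical_type` drives the top-level summary while the raw value survives unchanged in `provider_object_type` and `legacy_policy_type` for older readers and diagnostics.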
### ArtifactProviderDetail
Nested provider-owned evidence that stays below the shared descriptor.
| Field | Type | Notes |
|---|---|---|
| `legacy_finding_type` | string or null | Existing `finding_type` where relevant |
| `legacy_report_type` | string or null | Existing `report_type` where relevant |
| `legacy_policy_type` | string or null | Existing inventory or drift `policy_type` where relevant |
| `provider_object_type` | string or null | Raw provider object type |
| `provider_display_type` | string or null | Provider-owned display label |
| `detector_detail` | string or null | Provider-facing detector or signal detail |
### ArtifactSourceViewModel
Shared summary contract used by touched Filament pages and presenters.
| Field | Type | Notes |
|---|---|---|
| `headline` | string | Canonical operator-facing summary |
| `source_descriptor` | `ArtifactSourceDescriptor` | Shared lineage envelope |
| `provider_detail` | `ArtifactProviderDetail` | Nested provider-owned detail |
| `control_summary` | array or null | Derived control label, key, and status when the existing resolver provides it |
| `freshness` | array or null | Existing freshness or timing metadata |
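
The layering of the view model (canonical headline on top, shared descriptor next, provider-owned detail nested below) might look like the following when serialized. This is a sketch under assumptions: the descriptor is represented as a plain dict for brevity, and the headline and `example_finding_type` values are placeholders, not real repo data.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ArtifactProviderDetail:
    legacy_finding_type: Optional[str] = None
    legacy_report_type: Optional[str] = None
    legacy_policy_type: Optional[str] = None
    provider_object_type: Optional[str] = None
    provider_display_type: Optional[str] = None
    detector_detail: Optional[str] = None


@dataclass
class ArtifactSourceViewModel:
    headline: str
    # Stands in for ArtifactSourceDescriptor to keep the sketch short.
    source_descriptor: Dict[str, Any]
    provider_detail: ArtifactProviderDetail
    control_summary: Optional[Dict[str, Any]] = None
    freshness: Optional[Dict[str, Any]] = None


view = ArtifactSourceViewModel(
    headline="Example canonical summary",
    source_descriptor={
        "source_family": "finding",
        "source_kind": "model_summary",
        "provider_key": "microsoft",
    },
    provider_detail=ArtifactProviderDetail(legacy_finding_type="example_finding_type"),
)

# asdict() recurses into the nested dataclass, so provider detail stays
# under its own key instead of leaking into the top-level summary.
payload = asdict(view)
```

A presenter reading `payload` can render `headline` and `source_descriptor` without ever touching `provider_detail`, which is the separation the contract is meant to preserve.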
## Relationships
- One managed environment can own many findings, evidence snapshot items, stored reports, and inventory items.
- One finding or stored report can contribute one `ArtifactSourceDescriptor` per surfaced summary.
- One evidence snapshot can contain many `ArtifactSourceDescriptor` values, one per item.
- One inventory item can expose exactly one `InventoryTypeDescriptor` and one `ArtifactSourceDescriptor`.
- One tenant-review section can summarize zero or more underlying artifacts but should surface one canonical source summary per summarized item.
## Legacy-read normalization rules
- If a finding has `source = null`, derive `source_family` and `source_target_kind` from `finding_type` plus any qualifying evidence fields.
- If a drift finding only exposes `policy_type`, derive `canonical_type` from `InventoryPolicyTypeMeta` or adjacent subject metadata, keep the raw value as `provider_object_type` or `legacy_policy_type`, and never promote it back to the top-level headline.
- If a stored report payload already includes `provider_key`, reuse it; otherwise default the descriptor to the current provider for the producing service.
- If an evidence summary has no single `source_record_id`, keep `source_target_identifier` nullable and prefer `managed_environment` or `governed_subject` targeting instead of inventing synthetic ids.
- If an inventory item has no distinct provider display label, fall back to the best available metadata label while keeping `provider_object_type` separate from `canonical_type`.
- If canonical-control resolution returns no control, `control_key` remains null rather than forcing a fake mapping.
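
Three of the rules above (provider-key fallback, target selection without a record id, and null control keys) can be sketched as small pure functions. The function names, the resolver callable, and the payload shapes are hypothetical; this only illustrates the intended fallbacks, not the actual normalization code.

```python
from typing import Any, Callable, Mapping, Optional, Tuple


def resolve_provider_key(payload: Mapping[str, Any], producer_default: str) -> str:
    """Reuse provider_key from the payload, else the producing service's provider."""
    key = payload.get("provider_key")
    return key if isinstance(key, str) and key else producer_default


def target_without_record_id(
    governed_subject_key: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Pick (source_target_kind, source_target_identifier) when there is no
    single source_record_id, without inventing a synthetic identifier."""
    if governed_subject_key:
        return ("governed_subject", governed_subject_key)
    return ("managed_environment", None)


def resolve_control_key(
    resolver: Callable[[Mapping[str, Any]], Optional[str]],
    artifact: Mapping[str, Any],
) -> Optional[str]:
    """Pass the resolver's answer through; None stays None, never a fake mapping."""
    return resolver(artifact)
```

For example, `resolve_provider_key({}, "microsoft")` yields `"microsoft"`, and `target_without_record_id(None)` yields `("managed_environment", None)`, leaving the identifier null as the rule requires.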
## Explicit non-goals for data modeling
- no `artifact_sources` table
- no persisted package-run ledger
- no detector registry table or config catalog
- no control-catalog expansion
- no full rewrite of provider-native fields out of existing tables