ADR-010: Ontology Lifecycle, Conflict Resolution & Conflict-Driven Investigation
Status: Proposed
Date: 2026-03-29
Author: Atlas Architecture
Depends on: ADR-007 (Workflow Ontology Engine), ADR-008 (EBA Risk Matrix Engine), ADR-008a (Reference Data Registry), ADR-009 (Data Provider Plugin Architecture)
Impacts: ADR-013 (Analyst Interaction Layer), Schema Mapping Designer (Compliance Studio)
Table of Contents
- Context & Problem Statement
- Decision Summary
- Three-Schema Model
- Custom Ontology Schema Lifecycle
- Field Configuration Model
- Conflict Detection & Response
- Conflict-Driven Investigation Triggers
- Mutation Queue & Curation Layer
- Analyst as Data Source
- Live Data Preview
- Schema Persistence & Versioning
- Evaluations
- Data Source Schema Management
- Compliance Studio Organization
- Future Data Source Types
- Relationship Modeling
- Schema Composition & Inheritance
- Scoring Projection Integration
- Database Schema
- Migration Path
- Rejected Alternatives
- Open Questions
1. Context & Problem Statement
Atlas collects KYC/AML data from multiple sources — government registries (KVK, NorthData), investigation agents (SPEPWS, AMLRR, MEBO, ROA), screening databases, and manual analyst input. This data populates an entity ontology, which then feeds a deterministic risk matrix (ADR-008) to produce risk scores.
The current architecture has several gaps:
- No unified schema design tool. The ontology schema (ontology_schema_v3.yaml) is a static YAML file maintained by engineers. Compliance officers cannot design or customize the data model for their jurisdiction, risk appetite, or regulatory requirements without code changes.
- No conflict detection. When two data sources disagree on a field value (KVK says 50 employees, NorthData says 45), the system has no formal mechanism to detect, classify, or respond to the disagreement. Values are silently overwritten based on implicit ordering.
- No conflict-driven investigation. A change in employee count might be noise, but a change in UBO count is a material compliance event that should trigger investigation. There is no way to configure per-field responses to data conflicts.
- No ontology lifecycle. The ontology is populated once during initial investigation and then treated as static. There is no mechanism for ongoing monitoring, scheduled re-verification, or detecting meaningful changes over time.
- No schema persistence or versioning. Schema configurations, field mappings, merge rules, and conflict responses exist only in code. They cannot be saved, versioned, audited, or shared between tenants.
- Analyst input is a second-class citizen. Analyst overrides use a generic field_path + override_value pattern rather than a proper schema with typed fields, validation, and provenance.
- No live validation. Compliance officers designing a schema cannot preview how real data flows through their mappings, making it impossible to validate the design before deploying it.
This ADR defines the ontology lifecycle — from schema design through initial population, ongoing monitoring, conflict detection, conflict-driven investigation, and schema evolution.
2. Decision Summary
Build a Custom Ontology Schema system with the following properties:
- Three-schema model: source schemas (fixed per data provider), custom ontology schemas (designed by compliance officers), and risk matrix schemas (EBA dimensions). Each layer is independently versioned.
- Per-field conflict configuration: each ontology field has a merge strategy (how to pick a value), a conflict response (what to do when sources disagree), and an optional investigation trigger (which agent to invoke when conflict exceeds a threshold).
- Mutation queue with curation: all data changes — from providers, agents, and analysts — enter as proposed mutations. Alert rules evaluate each mutation and route it to auto-accept, flag-for-review, or investigate.
- Analyst as a first-class data source: analysts have their own typed schema per custom ontology, designed alongside other source schemas in the Compliance Studio.
- Schema persistence with immutable versioning: draft schemas are mutable; published schemas are frozen. Every evaluation pins to a specific schema version.
- Live data preview: during schema design, compliance officers can pull real data through their mappings to validate the design.
- Evaluation framework: run a schema against test entities to measure coverage, conflict rates, and scoring completeness before publishing.
3. Three-Schema Model
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ COMPLIANCE STUDIO │
│ │
│ ┌─────────────┐ ┌──────────────────┐ ┌─────────────────────┐ │
│ │ Source │ │ Custom Ontology │ │ Risk Matrix │ │
│ │ Schemas │──▶│ Schema │──▶│ Schema │ │
│ │ (read-only) │ │ (editable) │ │ (EBA dimensions) │ │
│ └─────────────┘ └──────────────────┘ └─────────────────────┘ │
│ │
│ KVK Schema NL Corporate KYC Customer Risk │
│ NorthData Schema ┌─────────────────┐ Geographic Risk │
│ SPEPWS Schema │ LegalEntity │ Product / Service Risk │
│ AMLRR Schema │ Person │ Delivery Channel Risk │
│ MEBO Schema │ Address │ Transaction Risk │
│ ROA Schema │ Evidence │ │
│ Analyst Schema │ Ownership (rel) │ │
│ (per-ontology) └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
Source Schemas (Layer 1)
Source schemas are versioned source contracts. For provider-backed sources, the contract originates in the plugin manifest (mapping_spec.yaml / output_schema from ADR-009). For tenant-defined sources (manual forms, file uploads, database adapters), the contract is authored in Studio and then published as an immutable version.
Each source contract declares the entities it can emit, the fields within each entity, field types, cardinality, and relationship payloads if applicable. The Studio displays the published contract versions as the "left column" in the Schema Mapping Designer.
Key principle: all data sources are equal as evidence producers, but not all sources are governed the same way. Provider contracts are published by plugin authors and are read-only to tenants. Tenant administrators may configure a source adapter (credentials, trust level, endpoint settings, transforms, activation), but they do not mutate the provider contract itself.
Audit rule: every published ontology schema must pin the exact source contract versions it was designed against, and every evaluation or live run must record the contract versions and evidence snapshot used.
Source Contracts vs Tenant Adapters
To keep ownership boundaries clear, ADR-010 distinguishes three artifacts:
- Source contract — immutable schema for what a source can emit
- Tenant source adapter — tenant-specific operational config for how Atlas connects to and trusts that source
- Source-to-ontology mapping — ontology-specific mapping from contract fields into the tenant's custom ontology
This prevents a tenant from "editing KVK" while still allowing tenant-specific configuration and normalization.
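To make the ownership split concrete, here is a minimal Python sketch. The class names (SourceContract, TenantSourceAdapter, SourceToOntologyMapping) and their attributes are hypothetical illustrations, not Atlas APIs. The point is that the contract is frozen, the adapter is freely editable by the tenant, and a mapping is rejected if it names a field the pinned contract does not emit.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Dict, Mapping, Optional


@dataclass(frozen=True)
class SourceContract:
    """Immutable schema for what a source can emit, published per version."""
    source_id: str              # e.g. "kvk"
    version: int
    fields: Mapping[str, str]   # "Entity.field" -> type name

    def __post_init__(self) -> None:
        # Wrap the field map in a read-only proxy so the contract cannot be edited.
        object.__setattr__(self, "fields", MappingProxyType(dict(self.fields)))


@dataclass
class TenantSourceAdapter:
    """Tenant-owned operational config: mutable, but it only references the contract."""
    tenant_id: str
    contract: SourceContract
    trust_level: float
    active: bool = True
    settings: Dict[str, Any] = field(default_factory=dict)  # endpoint, credential refs, transforms


@dataclass(frozen=True)
class SourceToOntologyMapping:
    """Ontology-specific wiring from one contract field into the custom ontology."""
    contract: SourceContract
    source_field: str               # must exist in the pinned contract version
    target_field: str               # e.g. "LegalEntity.legal_name"
    transform: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source_field not in self.contract.fields:
            raise ValueError(
                f"{self.source_field!r} is not emitted by "
                f"{self.contract.source_id} v{self.contract.version}"
            )
```

A tenant can change adapter.trust_level or deactivate the adapter, but any attempt to assign to a contract attribute or add a field to contract.fields raises an error.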
Custom Ontology Schema (Layer 2)
Designed by the compliance officer in the Compliance Studio. This is the core data model for a specific compliance use case (e.g., "NL Corporate KYC", "DE Enhanced Due Diligence", "UK Simplified CDD").
The compliance officer:
- Creates entity groups (LegalEntity, Person, Address, etc.)
- Adds fields to each group with types, descriptions, and constraints
- Defines identity keys for entity resolution
- Configures per-field merge rules, conflict responses, and investigation triggers
- Wires source schema fields to ontology fields (source→ontology mappings)
- Wires ontology fields to risk matrix factors (ontology→matrix mappings)
Risk Matrix Schema (Layer 3)
Defined by the EBA risk matrix configuration (ADR-008). The five EBA dimensions and their factors are the "right column" in the Schema Mapping Designer. Each factor declares which ontology fields it reads, what reduction strategy to apply for multi-value fields, and how to handle missing values.
4. Custom Ontology Schema Lifecycle
┌──────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐
│ DRAFT │────▶│ EVALUATED │────▶│ PUBLISHED │────▶│ ARCHIVED │
│ │ │ │ │ │ │ │
│ Mutable │ │ Validated │ │ Immutable │ │ Frozen │
│ No data │ │ Test data │ │ Live data │ │ Replaced │
│ flow │ │ preview │ │ flow │ │ by newer │
└──────────┘ └───────────┘ └───────────┘ └──────────┘
▲ │
│ │
└────── fork (new version) ──────────┘
States
| State | Mutable | Data Flow | Scoring | Description |
|---|---|---|---|---|
| draft | Yes | Preview only | No | Work in progress. Can be edited freely. Not used for live investigations. |
| evaluated | Yes (minor edits) | Test set | Simulation | Passed evaluation criteria. Test data has been run through the schema to validate coverage and correctness. |
| published | No | Live | Yes | Active production schema. At most one published version per schema_line_id per tenant. Immutable. |
| archived | No | None | Historical only | Superseded by a newer published version. Retained for audit trail. Evaluations performed under this version reference it permanently. |
Versioning
Each schema belongs to a schema line — a named, tenant-scoped identity (e.g., "NL Corporate KYC"). Within a line, versions are sequential integers. Forking a published version creates a new draft at the next version number.
NL Corporate KYC
v1 (published → archived)
v2 (published, active)
v3 (draft, in progress)
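The lifecycle and versioning rules above can be modeled as a small state machine. This is a minimal in-memory sketch with hypothetical names (SchemaLine, SchemaVersion, ALLOWED). It encodes only the transitions drawn in the lifecycle diagram plus the fork rule, and on publish it archives the previously published version so a line never has two. In the database, the same invariant is enforced by the uq_one_published_per_line partial index in §11.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Forward transitions from the lifecycle diagram. Forking is handled separately
# because it creates a new version instead of changing an existing one.
ALLOWED = {
    "draft": {"evaluated"},
    "evaluated": {"published"},
    "published": {"archived"},
    "archived": set(),
}
MUTABLE = {"draft", "evaluated"}  # evaluated allows minor edits only


@dataclass
class SchemaVersion:
    version_number: int
    status: str = "draft"
    forked_from: Optional[int] = None


@dataclass
class SchemaLine:
    name: str  # e.g. "NL Corporate KYC"
    versions: List[SchemaVersion] = field(default_factory=list)

    def get(self, number: int) -> SchemaVersion:
        return self.versions[number - 1]  # versions are sequential from 1

    def new_draft(self, forked_from: Optional[int] = None) -> SchemaVersion:
        version = SchemaVersion(len(self.versions) + 1, forked_from=forked_from)
        self.versions.append(version)
        return version

    def is_mutable(self, number: int) -> bool:
        return self.get(number).status in MUTABLE

    def transition(self, number: int, target: str) -> None:
        version = self.get(number)
        if target not in ALLOWED[version.status]:
            raise ValueError(f"v{number}: {version.status} -> {target} is not allowed")
        if target == "published":
            # At most one published version per line: archive the current one.
            for other in self.versions:
                if other.status == "published":
                    other.status = "archived"
        version.status = target

    def fork(self, number: int) -> SchemaVersion:
        if self.get(number).status != "published":
            raise ValueError("only a published version can be forked")
        return self.new_draft(forked_from=number)
```

Replaying the example above (publish v1, fork and publish v2, fork v3) leaves v1 archived, v2 published, and v3 as a mutable draft.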
5. Field Configuration Model
Each field in the custom ontology has three configuration dimensions:
5.1 Merge Strategy (How to pick a value)
Determines which value "wins" when multiple sources provide data for the same field.
| Strategy | Label | Behavior |
|---|---|---|
| first_available | 1st | Uses the first non-null value, ordered by source trust level (descending) |
| highest_trust | HT | Always picks the value from the highest-trust source, even if a lower-trust source responded first |
| latest | NEW | Uses the most recently updated value regardless of trust |
| manual_only | MAN | Only accepts values from the analyst data source; ignores automated sources |
| any_true | ANY | For booleans: true if any source reports true |
| count_distinct | CNT | For collections: counts unique values across all sources |
| accumulate | ACC | For collections: merges all values from all sources (addresses, trade names, etc.) |
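A minimal sketch of the seven merge strategies over a batch of already-collected source values. Observation and merge are hypothetical names. Two behaviors are assumptions where the table is silent: highest_trust does not fall back to a lower-trust source when the top source returned nothing (which is what distinguishes it from first_available in this sketch), and manual_only takes the most recent analyst value if several exist.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List


@dataclass(frozen=True)
class Observation:
    source_id: str        # e.g. "kvk", "analyst:jvandijk"
    source_type: str      # "provider", "agent", "analyst"
    trust: float          # 0.00 to 1.00, as in source_trust NUMERIC(3,2)
    value: Any            # None means the source returned nothing
    updated_at: datetime


def merge(strategy: str, obs: List[Observation]) -> Any:
    present = [o for o in obs if o.value is not None]
    by_trust = sorted(obs, key=lambda o: o.trust, reverse=True)  # stable on ties
    if strategy == "first_available":
        return next((o.value for o in by_trust if o.value is not None), None)
    if strategy == "highest_trust":
        # Assumption: the top-trust source wins outright, with no fallback.
        return by_trust[0].value if by_trust else None
    if strategy == "latest":
        return max(present, key=lambda o: o.updated_at).value if present else None
    if strategy == "manual_only":
        analyst = [o for o in present if o.source_type == "analyst"]
        return max(analyst, key=lambda o: o.updated_at).value if analyst else None
    if strategy == "any_true":
        return any(o.value is True for o in present)
    if strategy == "count_distinct":
        return len({item for o in present for item in o.value})
    if strategy == "accumulate":
        merged: List[Any] = []
        for o in by_trust:  # higher-trust sources contribute first
            for item in o.value or []:
                if item not in merged:
                    merged.append(item)
        return merged
    raise ValueError(f"unknown merge strategy: {strategy}")
```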
5.2 Conflict Response (What to do when sources disagree)
Determines the system's response when the resolved value differs from values provided by other sources.
| Response | Icon | Behavior |
|---|---|---|
| accept_trusted | Grey shield | Accept the merge strategy winner. Log the discrepancy in the mutation audit trail. No further action. |
| flag_review | Yellow shield | Accept the merge strategy winner provisionally. Create a review task in the analyst inbox. The field is marked as "pending review" in the ontology until an analyst confirms or overrides. |
| freeze_investigate | Red shield | Do not update the field. Freeze at the last analyst-verified value. Trigger an investigation (see §7). The field remains frozen until the investigation completes and an analyst resolves the conflict. |
5.3 Investigation Trigger (Which investigation to launch)
Only applicable when conflict response is freeze_investigate. Configures:
- Agent: which investigation agent to invoke (MEBO for ownership conflicts, ROA for address conflicts, SPEPWS for sanctions changes, etc.)
- Threshold: optional condition to trigger (e.g., "only investigate if delta > 10%" or "only if the value changes from true to false")
- Priority: investigation priority level (critical, high, medium)
- Scope: whether the investigation re-examines just this field or the entire entity group
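The rule that trigger settings only apply to freeze_investigate can be checked when a field is configured rather than at run time. A minimal sketch with hypothetical class names; the defaults mirror the ontology_field_configs column defaults in §11, and treating a missing threshold as "any change" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

PRIORITIES = {"critical", "high", "medium"}
SCOPES = {"field_only", "entity_group", "full_entity", "related_entities"}
RESPONSES = {"accept_trusted", "flag_review", "freeze_investigate"}


@dataclass(frozen=True)
class InvestigationTrigger:
    agent: str                        # e.g. "mebo", "roa", "spepws"
    threshold: Optional[str] = None   # assumption: None means any change triggers
    priority: str = "medium"          # mirrors the column default
    scope: str = "field_only"         # mirrors the column default

    def __post_init__(self) -> None:
        if self.priority not in PRIORITIES:
            raise ValueError(f"unknown priority: {self.priority}")
        if self.scope not in SCOPES:
            raise ValueError(f"unknown scope: {self.scope}")


@dataclass(frozen=True)
class FieldConflictConfig:
    conflict_response: str = "accept_trusted"
    trigger: Optional[InvestigationTrigger] = None

    def __post_init__(self) -> None:
        if self.conflict_response not in RESPONSES:
            raise ValueError(f"unknown conflict response: {self.conflict_response}")
        if self.trigger is not None and self.conflict_response != "freeze_investigate":
            raise ValueError("an investigation trigger requires freeze_investigate")
```

The panel shown in the next subsection would correspond to FieldConflictConfig("freeze_investigate", InvestigationTrigger("mebo", "delta > 5%", "high", "entity_group")).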
Visual Representation in Schema Mapping Designer
Each ontology field row displays:
[Left Port] [Source Badges] [🔑] field_name [MergeRule] [ConflictShield] [REQ] type [×] [Right Port]
The conflict shield is clickable and opens a configuration panel:
┌─────────────────────────────────────┐
│ Conflict Response: ownership_pct │
│ │
│ ○ Accept trusted source │
│ ○ Flag for analyst review │
│ ● Freeze & investigate │
│ │
│ Investigation Agent: [MEBO ▾] │
│ Threshold: [delta > 5%] │
│ Priority: [High ▾] │
│ Scope: [Entity group ▾] │
└─────────────────────────────────────┘
6. Conflict Detection & Response
Detection Flow
Source provides new value
│
▼
┌─────────────────────────┐
│ Compare against current │
│ ontology field value │
│ AND other source values │
├─────────────────────────┤
│ Same value? │──── Yes ──▶ Accept silently
│ │
│ Different value? │──── ▼
└─────────────────────────┘
│
▼
┌─────────────────────────┐
│ Apply merge strategy │
│ to determine "winner" │
├─────────────────────────┤
│ Winner == current? │──── Yes ──▶ Log discrepancy, no change
│ │
│ Winner != current? │──── ▼ ──── Mutation proposed
└─────────────────────────┘
│
▼
┌─────────────────────────┐
│ Apply conflict response │
│ from field config │
├─────────────────────────┤
│ accept_trusted │──── Accept mutation, log audit
│ flag_review │──── Accept mutation, create review task
│ freeze_investigate │──── Reject mutation, freeze field, trigger investigation
└─────────────────────────┘
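The detection flow reduces to a short decision function. This sketch assumes the merge strategy has already been wrapped as a pick_winner callable and that source_values holds the incoming value plus the values other sources currently report. The returned statuses reuse the ontology_mutations.status vocabulary from §8; all function and constant names are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

# conflict_response -> (initial mutation status, side effect), per the flow diagram.
RESPONSE_ACTIONS = {
    "accept_trusted": ("auto_accepted", "log_audit"),
    "flag_review": ("pending_review", "create_review_task"),
    "freeze_investigate": ("frozen", "trigger_investigation"),
}

Outcome = Tuple[str, Optional[str], Optional[str]]


def detect_conflict(current: Any, source_values: List[Any],
                    pick_winner: Callable[[List[Any]], Any],
                    conflict_response: str) -> Outcome:
    """Return (outcome, mutation_status, side_effect) for one field refresh."""
    # Step 1: every source agrees with the current ontology value.
    if all(value == current for value in source_values):
        return ("accept_silently", None, None)
    # Step 2: sources disagree; the merge strategy picks a winner.
    winner = pick_winner(source_values)
    if winner == current:
        return ("log_discrepancy", None, None)
    # Step 3: a mutation is proposed and the field's conflict response decides its fate.
    status, side_effect = RESPONSE_ACTIONS[conflict_response]
    return ("mutation_proposed", status, side_effect)
```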
Conflict Classification
Not all disagreements are equal. The system classifies conflicts by materiality:
| Category | Examples | Typical Response |
|---|---|---|
| Formatting | "Bedrijf ABC B.V." vs "Bedrijf ABC BV" | Normalize, accept_trusted |
| Staleness | Employee count 50 (Jan) vs 52 (Mar) | Accept latest, log |
| Material | UBO count 2 vs 3, sanctions status change | freeze_investigate |
| Contradictory | Status "active" vs "dissolved" | freeze_investigate + critical priority |
The compliance officer configures which category applies to each field via the conflict response setting. The system does not auto-classify — the officer makes the determination based on regulatory requirements and risk appetite.
Conflict Record
Every detected conflict produces an immutable ConflictRecord:
conflict_id: "cf-2026-0329-001"
entity_id: "ent-abc123"
entity_type: "Person"
field_name: "ownership_percentage"
schema_version: "nl-corporate-kyc-v2"
# What happened
current_value: 25.0
current_source: "mebo"
current_verified_at: "2026-02-15T10:30:00Z"
current_verified_by: "analyst:jvandijk"
incoming_value: 33.3
incoming_source: "northdata"
incoming_received_at: "2026-03-29T08:15:00Z"
# How it was handled
conflict_response: "freeze_investigate"
merge_strategy: "highest_trust"
merge_winner: "mebo" # would have picked MEBO's value, but frozen instead
# Investigation (if triggered)
investigation_id: "inv-2026-0329-002"
investigation_agent: "mebo"
investigation_priority: "high"
investigation_status: "pending"
# Resolution (filled later)
resolved_at: null
resolved_by: null
resolved_value: null
resolution_justification: null
7. Conflict-Driven Investigation Triggers
Core Concept
When an ontology field is configured with freeze_investigate, a change in that field's value from a data source does not update the ontology. Instead, it:
- Freezes the field at the last analyst-verified value
- Creates a ConflictRecord
- Triggers an investigation via the workflow engine (ADR-007)
- The investigation agent re-examines the specific question
- The agent's findings are presented to an analyst as evidence
- The analyst resolves the conflict (accept new value, keep old value, or set a different value)
- The resolution unfreezes the field
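The freeze-and-resolve cycle can be sketched as two functions over the field's current state and an immutable conflict record. Names are hypothetical. The resolution fields mirror the ConflictRecord in §6; the record is never edited in place (resolution returns a filled-in copy), and requiring a non-empty justification is an assumption.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class FieldState:
    value: Any                          # last analyst-verified value
    is_frozen: bool = False
    frozen_reason: Optional[str] = None


@dataclass(frozen=True)
class ConflictRecord:
    conflict_id: str
    current_value: Any
    incoming_value: Any
    resolved_at: Optional[datetime] = None
    resolved_by: Optional[str] = None
    resolved_value: Any = None
    resolution_justification: Optional[str] = None


def freeze(state: FieldState) -> None:
    # The incoming value is not written; the field keeps its verified value.
    state.is_frozen = True
    state.frozen_reason = "conflict_investigation"


def resolve(state: FieldState, record: ConflictRecord, decision: str, analyst: str,
            justification: str, custom_value: Any = None) -> ConflictRecord:
    if not state.is_frozen or record.resolved_at is not None:
        raise ValueError("conflict is not open")
    if not justification.strip():
        raise ValueError("a resolution justification is required")
    choices = {
        "accept_new": record.incoming_value,
        "keep_old": record.current_value,
        "set_value": custom_value,
    }
    if decision not in choices or (decision == "set_value" and custom_value is None):
        raise ValueError(f"invalid decision: {decision}")
    state.value = choices[decision]
    state.is_frozen, state.frozen_reason = False, None  # resolution unfreezes the field
    return replace(record, resolved_at=datetime.now(timezone.utc),
                   resolved_by=analyst, resolved_value=state.value,
                   resolution_justification=justification)
```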
Integration with Scheduled Re-Investigation
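The conflict-driven investigation mechanism complements scheduled re-investigation: scheduled re-checks, external webhooks, and manual triggers all feed fresh source data through the same per-field conflict rules.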
INITIAL KYC
│
▼
┌─────────────────┐
│ Populate ontology│
│ from all sources │
│ Auto-accept │
│ (baseline) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Analyst verifies │◀─────── First human review
│ Marks fields as │
│ "verified" │
└────────┬────────┘
│
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
Scheduled External Manual
Re-check Webhook Trigger
(e.g., 90d) (source push) (analyst)
│ │ │
└────────────┼────────────┘
│
▼
┌─────────────────┐
│ Fresh data from │
│ sources │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Compare against │
│ verified ontology│──── No conflicts ──▶ Log, done
│ │
│ Conflicts found │──── ▼
└────────┬────────┘
│
▼
┌─────────────────┐
│ Per-field conflict│
│ response rules │
│ fire │
└────────┬────────┘
│
┌────────────┼────────────┐
│ │ │
▼ ▼ ▼
accept_ flag_ freeze_
trusted review investigate
│ │ │
│ │ ▼
│ │ ┌─────────────┐
│ │ │ Launch │
│ │ │ targeted │
│ │ │ investigation│
│ │ └──────┬──────┘
│ │ │
│ │ ▼
│ │ ┌─────────────┐
│ │ │ Agent │
│ │ │ findings → │
│ │ │ Analyst │
│ │ │ resolution │
│ │ └──────┬──────┘
│ │ │
└────────────┼────────────┘
│
▼
┌─────────────────┐
│ Updated ontology │
│ Re-score via │
│ risk matrix │
└─────────────────┘
Investigation Scoping
When a conflict triggers an investigation, the scope can be configured:
| Scope | Behavior | Example |
|---|---|---|
| field_only | Re-investigate just the conflicting field | Ownership percentage changed → MEBO re-verifies that specific ownership chain |
| entity_group | Re-investigate the entire entity group | Any Person field conflict → MEBO re-examines all UBO/management data |
| full_entity | Re-investigate the entire entity across all sources | Material conflict on a core identity field → full re-investigation |
| related_entities | Re-investigate the entity and all related entities | Sanctions status change → re-screen all related persons and entities |
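A minimal sketch of how the four scopes could translate into investigation targets. investigation_targets and the "*" wildcard are hypothetical, and related_entities is assumed to mean direct neighbors in the entity graph rather than the full transitive closure.

```python
from typing import Dict, Iterable, List, Tuple

Target = Tuple[str, str, str]  # (entity_id, entity_group, field); "*" means all


def investigation_targets(scope: str, entity_id: str, group: str, field_name: str,
                          related: Dict[str, Iterable[str]]) -> List[Target]:
    if scope == "field_only":
        return [(entity_id, group, field_name)]
    if scope == "entity_group":
        return [(entity_id, group, "*")]
    if scope == "full_entity":
        return [(entity_id, "*", "*")]
    if scope == "related_entities":
        # Assumption: direct neighbors only, not the transitive closure.
        ids = {entity_id, *related.get(entity_id, ())}
        return [(e, "*", "*") for e in sorted(ids)]
    raise ValueError(f"unknown investigation scope: {scope}")
```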
Threshold Expressions
Thresholds can be simple or compound:
# Simple: any change triggers investigation
threshold: "changed"
# Delta: only if the change exceeds a percentage
threshold: "delta > 10%"
# Boolean flip: only false→true (new risk appearing)
threshold: "false_to_true"
# Value change: specific value transitions
threshold: "value_not_in(active, dormant)"
# Compound: multiple conditions
threshold: "delta > 5% AND incoming_trust < 0.9"
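The threshold grammar is only shown by example, so the following is one possible evaluator, not a specification. It is a hypothetical sketch that supports exactly the clause forms listed above, joins clauses with AND only (no OR or parentheses), measures delta relative to the current value, and treats any change away from zero as exceeding a percentage threshold; all of these are assumptions.

```python
import operator
import re
from typing import Any

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}
_NUM = r"(\d+(?:\.\d+)?)"
_DELTA = re.compile(r"delta\s*(>=|>|<=|<)\s*" + _NUM + r"\s*%")
_TRUST = re.compile(r"incoming_trust\s*(>=|>|<=|<)\s*" + _NUM)
_NOT_IN = re.compile(r"value_not_in\((.*)\)")


def _clause(clause: str, old: Any, new: Any, incoming_trust: float) -> bool:
    if clause == "changed":
        return old != new
    if clause == "false_to_true":
        return old is False and new is True
    m = _DELTA.fullmatch(clause)
    if m:
        if old == new:
            return False
        if old == 0:
            return True  # assumption: any move away from zero exceeds a % threshold
        pct = abs(new - old) * 100 / abs(old)  # relative to the current value
        return _OPS[m.group(1)](pct, float(m.group(2)))
    m = _TRUST.fullmatch(clause)
    if m:
        return _OPS[m.group(1)](incoming_trust, float(m.group(2)))
    m = _NOT_IN.fullmatch(clause)
    if m:
        allowed = {v.strip() for v in m.group(1).split(",")}
        return str(new) not in allowed
    raise ValueError(f"unsupported threshold clause: {clause!r}")


def breaches(expr: str, old: Any, new: Any, incoming_trust: float = 1.0) -> bool:
    """True if the change old -> new satisfies every AND-ed clause of expr."""
    return all(_clause(c.strip(), old, new, incoming_trust) for c in expr.split(" AND "))
```

With the ConflictRecord example from §6, breaches("delta > 5%", 25.0, 33.3) is true, because the change is 33.2% of the current value.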
8. Mutation Queue & Curation Layer
Architecture
All data changes — from providers, agents, analysts, and conflict resolutions — flow through a unified mutation queue. The queue is subject-oriented, not entity-only: a mutation can target an entity field, a relationship field, or a collection member.
┌──────────────────────────────────────────────────────────────────┐
│ MUTATION QUEUE │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │
│ │ Provider │ │ Agent │ │ Analyst │ │ Conflict │ │
│ │ Mutations │ │ Evidence │ │ Input │ │ Resolutions │ │
│ └─────┬────┘ └─────┬────┘ └─────┬────┘ └────────┬─────────┘ │
│ │ │ │ │ │
│ └─────────────┼─────────────┼─────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌────────────────────────┐ │
│ │ Deterministic Evidence │ │
│ │ Mapper │ │
│ │ (source schema → │ │
│ │ ontology mutations) │ │
│ └───────────┬────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Conflict Detection │ │
│ │ (compare against │ │
│ │ current ontology) │ │
│ └───────────┬────────────┘ │
│ │ │
│ ┌──────────┼──────────┐ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ Auto-Accept Review Investigate │
│ │ Queue Trigger │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌────────┐ ┌────────┐ ┌────────────┐ │
│ │ Apply │ │ Analyst│ │ Freeze + │ │
│ │ to │ │ Inbox │ │ Launch │ │
│ │ Entity │ │ │ │ Investigation│ │
│ │ Graph │ │ │ │ │ │
│ └───┬────┘ └───┬────┘ └──────┬─────┘ │
│ │ │ │ │
│ └───────────┼──────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────┐ │
│ │ Scoring Projection │ │
│ │ (read accepted graph, │ │
│ │ compute risk scores) │ │
│ └────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Mutation Record
Every proposed change is recorded as an immutable mutation. Mutations address a subject instance so the same mechanism works for:
- entity attributes (LegalEntity.legal_name)
- relationship attributes (BENEFICIAL_OWNER_OF.ownership_percentage)
- collection members (trade_names["Acme Logistics"])
The runtime model must therefore identify both the subject kind and the stable path within that subject:
CREATE TABLE ontology_mutations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
-- What object is being mutated
subject_kind TEXT NOT NULL, -- 'entity', 'relationship', 'collection_member'
entity_id UUID,
relationship_instance_id UUID,
subject_type TEXT NOT NULL, -- entity_type or relationship_type
field_path TEXT NOT NULL, -- 'legal_name', 'ownership_percentage', 'addresses[registered].city'
cardinality_key TEXT, -- stable identity for repeated items where needed
-- What changed
previous_value JSONB,
proposed_value JSONB NOT NULL,
source_type TEXT NOT NULL, -- 'provider', 'agent', 'analyst', 'conflict_resolution'
source_id TEXT NOT NULL, -- e.g., 'kvk', 'mebo', 'analyst:jvandijk'
source_contract_version_id UUID,
source_trust NUMERIC(3,2),
-- Conflict detection
conflict_detected BOOLEAN NOT NULL DEFAULT FALSE,
conflict_id UUID REFERENCES conflict_records(id),
-- Resolution
status TEXT NOT NULL DEFAULT 'proposed',
-- 'proposed', 'auto_accepted', 'pending_review', 'accepted', 'rejected', 'frozen'
resolved_at TIMESTAMPTZ,
resolved_by TEXT, -- 'system:auto_accept', 'system:freeze', 'analyst:jvandijk'
-- Audit
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
investigation_id UUID,
batch_id UUID, -- groups mutations from the same source refresh
evidence_snapshot_id UUID,
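CHECK (status IN ('proposed', 'auto_accepted', 'pending_review', 'accepted', 'rejected', 'frozen')),
CHECK (
(subject_kind = 'entity' AND entity_id IS NOT NULL AND relationship_instance_id IS NULL) OR
(subject_kind = 'relationship' AND relationship_instance_id IS NOT NULL) OR
(subject_kind = 'collection_member' AND cardinality_key IS NOT NULL
AND (entity_id IS NOT NULL OR relationship_instance_id IS NOT NULL))
)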
);
CREATE INDEX idx_mutations_subject ON ontology_mutations(subject_kind, entity_id, relationship_instance_id, field_path);
CREATE INDEX idx_mutations_status ON ontology_mutations(status) WHERE status IN ('proposed', 'pending_review', 'frozen');
CREATE INDEX idx_mutations_conflict ON ontology_mutations(conflict_id) WHERE conflict_detected = TRUE;
Provenance Chain
Every accepted mutation updates the graph AND records field-level provenance against the same subject instance:
CREATE TABLE field_provenance (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
subject_kind TEXT NOT NULL, -- 'entity', 'relationship', 'collection_member'
entity_id UUID,
relationship_instance_id UUID,
subject_type TEXT NOT NULL,
field_path TEXT NOT NULL,
cardinality_key TEXT,
subject_key TEXT NOT NULL, -- normalized key for current-subject uniqueness
current_value JSONB NOT NULL,
source_type TEXT NOT NULL,
source_id TEXT NOT NULL,
source_contract_version_id UUID,
source_trust NUMERIC(3,2),
verified_by TEXT, -- analyst who verified this value
verified_at TIMESTAMPTZ,
mutation_id UUID NOT NULL REFERENCES ontology_mutations(id),
schema_version_id UUID NOT NULL,
is_frozen BOOLEAN NOT NULL DEFAULT FALSE,
frozen_reason TEXT, -- 'conflict_investigation' etc.
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(subject_key)
);
Relationship Instance Identity
Relationship data is not modeled as a loose edge plus attributes. Each accepted relationship has a stable relationship_instance_id derived from:
- relationship type
- source and target entity IDs
- any schema-declared identity fields on the relationship
This lets Atlas distinguish:
- a changed ownership percentage on the same beneficial-owner relationship
- a relationship ending and a new one starting
- two separate concurrent relationships of the same type with different effective periods
Without this identity model, conflict detection on ownership and control data becomes unreliable.
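One way to derive a stable relationship_instance_id from the three inputs listed above is a name-based UUIDv5 over a canonical encoding. This is a sketch, not the Atlas implementation: the namespace UUID and the function name are hypothetical, and which relationship fields count as identity fields (here effective_from) is up to the schema. Attributes such as ownership_percentage are deliberately left out, so changing them keeps the same ID, whereas a different effective period yields a new one.

```python
import json
import uuid
from typing import Any, Mapping, Optional

# Hypothetical namespace: any fixed UUID works, as long as it never changes.
ATLAS_REL_NAMESPACE = uuid.UUID("0b7c6f2e-5d1a-4c3e-9f21-6a8d2e4b1c90")


def relationship_instance_id(rel_type: str, source_entity_id: str, target_entity_id: str,
                             identity_fields: Optional[Mapping[str, Any]] = None) -> uuid.UUID:
    # Canonical encoding: lowercase IDs, sorted identity keys, no whitespace.
    name = json.dumps(
        [rel_type, source_entity_id.lower(), target_entity_id.lower(), dict(identity_fields or {})],
        sort_keys=True, separators=(",", ":"), default=str,
    )
    return uuid.uuid5(ATLAS_REL_NAMESPACE, name)
```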
9. Analyst as Data Source
Design Principle
The analyst is a first-class data source with their own typed schema, just like KVK or NorthData. The difference is that the analyst schema is per custom ontology — the compliance officer defines what an analyst can contribute when they design the ontology.
Analyst Schema Design
In the Schema Mapping Designer, the analyst appears in the data source library alongside registries and agents. When the compliance officer adds the analyst as a source, they define the analyst's output fields:
analyst_schema:
schema_for: "nl-corporate-kyc-v2"
entities:
- entity: "Qualitative Assessment"
fields:
- name: risk_justification
type: string
description: "Free-text justification for risk assessment"
- name: customer_interview_date
type: date
description: "Date of last customer interview"
- name: interview_notes
type: string
description: "Summary of customer interview"
- name: manual_risk_override
type: enum
description: "low, medium, high, critical"
values: [low, medium, high, critical]
- name: override_reason
type: string
description: "Justification for manual risk override"
- entity: "Verification"
fields:
- name: ubo_verified
type: boolean
description: "Analyst has manually verified UBO structure"
- name: address_verified
type: boolean
description: "Analyst has physically or digitally verified the address"
- name: verification_method
type: enum
values: [site_visit, google_maps, phone_call, video_call, document_review]
- name: verification_date
type: date
- name: verification_notes
type: string
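Because the analyst schema is typed, analyst input can be validated like any other source payload before it enters the mutation queue. A minimal sketch, assuming the YAML above has already been loaded into a dict (only the Verification entity is transcribed here) and that dates arrive as ISO 8601 strings; the function name is hypothetical.

```python
from datetime import date
from typing import Any, Dict, List

# The "Verification" entity from the analyst_schema above, as a parsed dict.
ANALYST_SCHEMA: Dict[str, Dict[str, Dict[str, Any]]] = {
    "Verification": {
        "ubo_verified": {"type": "boolean"},
        "address_verified": {"type": "boolean"},
        "verification_method": {
            "type": "enum",
            "values": ["site_visit", "google_maps", "phone_call", "video_call", "document_review"],
        },
        "verification_date": {"type": "date"},
        "verification_notes": {"type": "string"},
    },
}


def validate_analyst_input(entity: str, payload: Dict[str, Any],
                           schema: Dict[str, Dict[str, Dict[str, Any]]] = ANALYST_SCHEMA) -> List[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    fields = schema.get(entity)
    if fields is None:
        return [f"unknown analyst entity: {entity}"]
    errors = []
    for name, value in payload.items():
        spec = fields.get(name)
        if spec is None:
            errors.append(f"{entity}.{name}: not in analyst schema")
            continue
        kind = spec["type"]
        if kind == "string" and not isinstance(value, str):
            errors.append(f"{entity}.{name}: expected string")
        elif kind == "boolean" and not isinstance(value, bool):
            errors.append(f"{entity}.{name}: expected boolean")
        elif kind == "enum" and value not in spec["values"]:
            errors.append(f"{entity}.{name}: {value!r} not in {spec['values']}")
        elif kind == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors.append(f"{entity}.{name}: expected ISO date")
    return errors
```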
Analyst Fields That No Automated Source Can Provide
Some ontology fields should only accept analyst input. These are fields where human judgment is required:
- Qualitative risk assessments ("this customer's business model is unusually complex for their stated industry")
- Interview-based findings ("customer could not explain their ownership structure")
- Verification confirmations ("I visited the registered office and confirmed it's a real business")
- Override justifications ("despite the automated high-risk score, I'm overriding to medium because...")
These fields use the manual_only merge strategy: values from automated sources are ignored, and the field is populated only by analyst input through the curation layer.
10. Live Data Preview
Purpose
During schema design, the compliance officer needs to validate that their mappings produce correct results with real data. The live preview pulls data from configured sources, runs it through the mapping pipeline, and shows the populated ontology.
Preview Flow
Officer clicks "Preview" in Schema Mapping Designer
│
▼
┌─────────────────────────┐
│ Select test entity │
│ (e.g., KVK# 12345678) │
└────────┬────────────────┘
│
▼
┌─────────────────────────┐
│ Call active source APIs │
│ with test entity ID │
│ (KVK, NorthData, etc.) │
└────────┬────────────────┘
│
▼
┌─────────────────────────┐
│ Execute source→ontology │
│ mappings (client-side │
│ or via preview API) │
└────────┬────────────────┘
│
▼
┌─────────────────────────┐
│ Show populated ontology │
│ with field-level detail: │
│ │
│ legal_name: "Acme BV" │
│ └─ KVK: "Acme BV" │ ← source values
│ └─ ND: "Acme B.V." │ ← shows conflict
│ └─ Winner: KVK (HT) │ ← merge result
│ └─ Conflict: format │ ← classification
│ │
│ ownership_pct: FROZEN │
│ └─ MEBO: 25.0% │ ← source values
│ └─ ND: 33.3% │ ← conflict
│ └─ Response: INVEST │ ← would trigger
│ │
│ is_sanctioned: false │
│ └─ SPEPWS: false │
│ └─ (no conflict) │
└──────────────────────────┘
Preview Detail View
For each populated field, the preview shows:
| Column | Description |
|---|---|
| Field | Ontology field name |
| Resolved Value | The value after applying merge strategy |
| Sources | Each source that provided a value, with the raw value and trust level |
| Merge Rule | Which strategy was applied |
| Conflict? | Whether sources disagreed |
| Response | What would happen if this were live (accept/flag/investigate) |
| Transform | Whether a transform was applied and what it did |
Coverage Report
The preview also generates a coverage summary:
Schema Coverage for "Acme BV" (KVK 12345678)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Ontology fields: 36
Fields populated: 28 (78%)
Fields empty: 8 (22%)
Required fields OK: 12/14 (86%)
Required fields missing: 2 (jurisdiction, channel_type)
Source utilization:
KVK: 12 fields mapped, 12 populated (100%)
NorthData: 8 fields mapped, 7 populated (88%)
SPEPWS: 4 fields mapped, 2 populated (50%) ← low coverage
MEBO: 3 fields mapped, 3 populated (100%)
ROA: 5 fields mapped, 4 populated (80%)
Analyst: 4 fields mapped, 0 populated (awaiting analyst input)
Conflicts detected: 3
ownership_percentage: MEBO (25.0%) vs NorthData (33.3%) → FREEZE
total_employees: KVK (50) vs NorthData (45) → ACCEPT_TRUSTED
legal_name: KVK ("Acme BV") vs NorthData ("Acme B.V.") → ACCEPT_TRUSTED
Risk Matrix completeness:
Customer Risk: 5/5 factors mapped (100%)
Geographic Risk: 2/3 factors mapped (67%) ← transaction_geography unmapped
Product/Service: 2/2 factors mapped (100%)
Delivery Channel: 1/2 factors mapped (50%) ← digital_presence unmapped
Transaction Risk: 0/2 factors mapped (0%) ← no source for transaction data
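The headline numbers in the coverage report are simple ratios. The sketch below computes them from per-field preview results. As in the example report, where the per-source mapping counts add up to the 36 ontology fields, it assumes each field is fed by exactly one source; percentages are rounded half-up. FieldResult and coverage are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class FieldResult:
    name: str
    required: bool
    source: str            # source mapped to this field (one per field in this sketch)
    value: Optional[Any]   # None means the field was not populated


def pct(part: int, whole: int) -> int:
    # Integer round-half-up (Python's round() would round halves to even).
    return (200 * part + whole) // (2 * whole) if whole else 0


def coverage(results: List[FieldResult]) -> Dict[str, Any]:
    populated = sum(r.value is not None for r in results)
    required = [r for r in results if r.required]
    missing = sorted(r.name for r in required if r.value is None)
    per_source: Dict[str, Tuple[int, int]] = {}
    for r in results:
        mapped, filled = per_source.get(r.source, (0, 0))
        per_source[r.source] = (mapped + 1, filled + int(r.value is not None))
    required_ok = len(required) - len(missing)
    return {
        "fields": len(results),
        "populated": (populated, pct(populated, len(results))),
        "required_ok": (required_ok, len(required), pct(required_ok, len(required))),
        "required_missing": missing,
        "sources": {s: (m, f, pct(f, m)) for s, (m, f) in per_source.items()},
    }
```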
11. Schema Persistence & Versioning
Database Schema
-- Schema line: named, tenant-scoped identity
CREATE TABLE ontology_schema_lines (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
description TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT NOT NULL,
UNIQUE(tenant_id, name)
);
-- Schema version: immutable once published
CREATE TABLE ontology_schema_versions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_line_id UUID NOT NULL REFERENCES ontology_schema_lines(id),
version_number INTEGER NOT NULL,
status TEXT NOT NULL DEFAULT 'draft',
-- 'draft', 'evaluated', 'published', 'archived'
-- Schema content (JSONB for flexibility during design, compiled to typed schema on publish)
entity_groups JSONB NOT NULL, -- entity definitions with fields
source_mappings JSONB NOT NULL, -- source→ontology field mappings
matrix_mappings JSONB NOT NULL, -- ontology→matrix field mappings
field_configs JSONB NOT NULL, -- per-field merge rules, conflict responses, investigation triggers
active_sources JSONB NOT NULL, -- which tenant source adapters are active
analyst_schema JSONB, -- analyst data source schema
pinned_source_contracts JSONB NOT NULL, -- exact source contract version IDs used by this schema
-- Metadata
risk_matrix_version UUID REFERENCES risk_matrix_versions(id),
forked_from UUID REFERENCES ontology_schema_versions(id),
published_at TIMESTAMPTZ,
published_by TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(schema_line_id, version_number),
CHECK (status IN ('draft', 'evaluated', 'published', 'archived'))
);
-- Enforce: at most one published version per schema line
CREATE UNIQUE INDEX uq_one_published_per_line
ON ontology_schema_versions(schema_line_id)
WHERE status = 'published';
-- Source mappings (denormalized for query performance, source of truth is JSONB above)
CREATE TABLE ontology_source_mappings (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
source_id TEXT NOT NULL,
source_entity TEXT NOT NULL,
source_field TEXT NOT NULL,
target_entity TEXT NOT NULL,
target_field TEXT NOT NULL,
transform TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
-- Field conflict configuration
CREATE TABLE ontology_field_configs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
entity_type TEXT NOT NULL,
field_name TEXT NOT NULL,
-- Identity
is_identity_key BOOLEAN NOT NULL DEFAULT FALSE,
is_required BOOLEAN NOT NULL DEFAULT FALSE,
-- Merge
merge_strategy TEXT NOT NULL DEFAULT 'first_available',
-- 'first_available', 'highest_trust', 'latest', 'manual_only', 'any_true', 'count_distinct', 'accumulate'
-- Conflict
conflict_response TEXT NOT NULL DEFAULT 'accept_trusted',
-- 'accept_trusted', 'flag_review', 'freeze_investigate'
-- Investigation trigger (only if conflict_response = 'freeze_investigate')
investigation_agent TEXT,
investigation_threshold TEXT, -- threshold expression
investigation_priority TEXT DEFAULT 'medium',
investigation_scope TEXT DEFAULT 'field_only',
-- 'field_only', 'entity_group', 'full_entity', 'related_entities'
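CHECK (merge_strategy IN ('first_available', 'highest_trust', 'latest', 'manual_only', 'any_true', 'count_distinct', 'accumulate')),
CHECK (conflict_response IN ('accept_trusted', 'flag_review', 'freeze_investigate')),
CHECK (investigation_priority IN ('critical', 'high', 'medium')),
CHECK (investigation_scope IN ('field_only', 'entity_group', 'full_entity', 'related_entities')),
-- Trigger settings are only meaningful for freeze_investigate
CHECK (conflict_response = 'freeze_investigate' OR (investigation_agent IS NULL AND investigation_threshold IS NULL)),
UNIQUE(schema_version_id, entity_type, field_name)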
);
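The merge_strategy values above are only names, and how each one picks a value is easier to see in code. The following is a minimal Python sketch, not Atlas's resolver: Observation and resolve_field are hypothetical, trust stands in for a tenant adapter's trust_level, and reading first_available as "first in a configured source order" is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Observation:
    """One source's reported value for a field (hypothetical shape)."""
    source_id: str
    value: Any
    trust: float            # e.g. the tenant adapter's trust_level
    received_at: datetime


def resolve_field(strategy: str, observations: list[Observation],
                  source_order: Optional[list[str]] = None) -> Any:
    """Pick a field value according to a merge_strategy name (sketch)."""
    present = [o for o in observations if o.value is not None]
    if not present:
        return None
    if strategy == "first_available":
        # Assumption: "first" follows a configured source order, else the
        # order in which observations were supplied (sort is stable).
        if source_order:
            rank = {s: i for i, s in enumerate(source_order)}
            present.sort(key=lambda o: rank.get(o.source_id, len(rank)))
        return present[0].value
    if strategy == "highest_trust":
        return max(present, key=lambda o: o.trust).value
    if strategy == "latest":
        return max(present, key=lambda o: o.received_at).value
    if strategy == "any_true":
        return any(bool(o.value) for o in present)
    if strategy == "count_distinct":
        return len({o.value for o in present})   # values must be hashable
    if strategy == "accumulate":
        # Every distinct value, oldest first (values only need to support ==).
        out: list[Any] = []
        for o in sorted(present, key=lambda o: o.received_at):
            if o.value not in out:
                out.append(o.value)
        return out
    if strategy == "manual_only":
        # Only analyst-entered values count; the most recent one wins.
        analyst = [o for o in present if o.source_id == "analyst"]
        return max(analyst, key=lambda o: o.received_at).value if analyst else None
    raise ValueError(f"unknown merge strategy: {strategy}")
```

For the employee-count disagreement from the problem statement (KVK reports 50 at trust 0.9, NorthData reports 45 later), highest_trust keeps 50 while latest returns 45, which is why the strategy is configured per field rather than globally.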
Published ontology schemas are immutable compilations. On publish, Atlas stores the fully resolved ontology schema plus the exact source contract versions and matrix version it was validated against. If a provider later ships a new contract version, existing published ontology schemas are unaffected until a new ontology version is created and evaluated against that newer contract.
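A minimal sketch of the publish step under stated assumptions: publishing requires the 'evaluated' status (matching the draft → evaluated → published gate listed in the migration path), the previously published version of the same line is archived so the uq_one_published_per_line invariant holds, and the pinned contract set is stored as a read-only mapping. SchemaVersion and publish are hypothetical in-memory stand-ins for the ontology_schema_versions table, not a persistence layer.

```python
from dataclasses import dataclass, field, replace
from datetime import datetime, timezone
from types import MappingProxyType
from typing import Mapping, Optional


@dataclass(frozen=True)
class SchemaVersion:
    """Hypothetical in-memory mirror of an ontology_schema_versions row."""
    id: str
    schema_line_id: str
    version_number: int
    status: str  # 'draft' | 'evaluated' | 'published' | 'archived'
    source_contract_set: Mapping[str, str] = field(default_factory=dict)
    risk_matrix_version: Optional[str] = None
    published_at: Optional[datetime] = None
    published_by: Optional[str] = None


def publish(versions: list[SchemaVersion], version_id: str, user: str,
            contract_set: Mapping[str, str], matrix_version: str) -> list[SchemaVersion]:
    """Publish one version and archive the line's previously published one.

    Mirrors uq_one_published_per_line: at most one published row per line.
    """
    target = next((v for v in versions if v.id == version_id), None)
    if target is None:
        raise LookupError(f"unknown schema version {version_id}")
    if target.status != "evaluated":
        raise ValueError("only evaluated versions can be published")
    result = []
    for v in versions:
        if v.id == version_id:
            result.append(replace(
                v,
                status="published",
                # Pin exact contract and matrix versions in a read-only view.
                source_contract_set=MappingProxyType(dict(contract_set)),
                risk_matrix_version=matrix_version,
                published_at=datetime.now(timezone.utc),
                published_by=user,
            ))
        elif v.schema_line_id == target.schema_line_id and v.status == "published":
            result.append(replace(v, status="archived"))
        else:
            result.append(v)
    return result
```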
12. Evaluations
Purpose
Before publishing a schema, the compliance officer runs evaluations to validate coverage, correctness, and scoring completeness. An evaluation runs the schema against a set of test entities and produces a quality report.
Evaluations have two modes:
- snapshot mode (default) — uses frozen evidence snapshots and pinned source contract versions; this is the authoritative publish gate
- live preview mode — pulls fresh data from active sources; this is exploratory and explicitly non-deterministic
Evaluation Types
| Type | What It Tests | Pass Criteria |
|---|---|---|
| Coverage | How many ontology fields get populated for each test entity | ≥ threshold (e.g., 80% of required fields) |
| Source Agreement | How often sources agree vs conflict across the test set | Conflict rate below threshold |
| Transform Correctness | Whether transforms produce valid output | No transform errors |
| Matrix Completeness | Whether every required risk factor receives a value | All required factors mapped and populated |
| Scoring Consistency | Whether the same entity produces the same risk score across runs in snapshot mode | Deterministic (0 variance) |
| Regression | Whether the new schema version produces different scores than the previous version for the same entities | Diff report showing score changes |
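The Coverage and Scoring Consistency criteria reduce to simple arithmetic. This Python sketch assumes coverage is averaged per test entity (the table does not say whether it is averaged or pooled across the set) and that consistency means zero population variance of an entity's score across repeated snapshot runs. The function names and the example field names used with it are hypothetical.

```python
from statistics import pvariance
from typing import Any, Mapping, Sequence


def coverage_score(entity: Mapping[str, Any], required: Sequence[str]) -> float:
    """Percentage of required ontology fields holding a non-null value."""
    if not required:
        return 100.0
    filled = sum(1 for f in required if entity.get(f) is not None)
    return 100.0 * filled / len(required)


def suite_coverage(entities: Sequence[Mapping[str, Any]], required: Sequence[str]) -> float:
    """Mean per-entity coverage across the test set (one possible aggregation)."""
    if not entities:
        return 0.0
    return sum(coverage_score(e, required) for e in entities) / len(entities)


def scoring_variance(scores_across_runs: Sequence[float]) -> float:
    """Population variance of one entity's risk score over repeated snapshot runs."""
    return pvariance(scores_across_runs) if len(scores_across_runs) > 1 else 0.0


def coverage_and_consistency_pass(coverage: float, variances: Sequence[float],
                                  threshold: float = 80.0) -> bool:
    """Coverage >= threshold and zero scoring variance for every entity."""
    return coverage >= threshold and all(v == 0 for v in variances)
```

With four required fields, an entity missing one of them scores 75%; paired with a fully populated entity, the suite averages 87.5% and clears the example 80% threshold.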
Evaluation Workflow
1. Officer selects a test set (list of entity IDs or a saved test suite)
2. System resolves the exact source contract versions pinned by the draft schema
3. In snapshot mode, system loads frozen evidence snapshots for the test entities; in live preview mode, it pulls fresh data and creates temporary snapshots
4. System runs data through the schema (mappings, merge rules, conflict detection)
5. System populates the ontology and computes risk scores
6. System generates the evaluation report, including contract versions and snapshot IDs
7. If all required snapshot-mode criteria pass, schema status moves to "evaluated"
8. Officer reviews report and decides to publish or iterate
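Step 7 is a small state transition. The sketch below assumes hypothetical gate thresholds (the 80% coverage example from the table, plus an illustrative 10% maximum conflict rate) and treats matrix_completeness as a percentage. EvaluationRun mirrors a few evaluation_runs columns and is not an existing Atlas type. Regression is omitted because it produces a diff report rather than a pass/fail result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationRun:
    """Hypothetical subset of an evaluation_runs row (percentages 0-100)."""
    evaluation_mode: str      # 'snapshot' | 'live_preview'
    status: str               # 'running' | 'completed' | 'failed'
    coverage_score: float
    conflict_rate: float
    matrix_completeness: float
    transform_errors: int
    scoring_variance: float


def next_schema_status(current: str, run: EvaluationRun,
                       min_coverage: float = 80.0,
                       max_conflict_rate: float = 10.0) -> str:
    """Schema status after a run: only a completed snapshot run whose
    gates all pass moves a draft to 'evaluated'."""
    if current != "draft" or run.status != "completed":
        return current
    if run.evaluation_mode != "snapshot":
        return current  # live preview is exploratory, never a publish gate
    gates = (
        run.coverage_score >= min_coverage,
        run.conflict_rate < max_conflict_rate,
        run.matrix_completeness == 100.0,
        run.transform_errors == 0,
        run.scoring_variance == 0,
    )
    return "evaluated" if all(gates) else current
```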
Test Suites
Test suites are tenant-scoped, reusable collections of test entities:
CREATE TABLE evaluation_test_suites (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
name TEXT NOT NULL,
description TEXT,
entity_ids UUID[] NOT NULL, -- list of entity IDs to test
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT NOT NULL
);
CREATE TABLE evaluation_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
test_suite_id UUID REFERENCES evaluation_test_suites(id),
evaluation_mode TEXT NOT NULL DEFAULT 'snapshot',
-- 'snapshot', 'live_preview'
source_contract_set JSONB NOT NULL,
evidence_snapshot_ids UUID[] NOT NULL DEFAULT '{}',
status TEXT NOT NULL DEFAULT 'running',
-- 'running', 'completed', 'failed'
-- Results
coverage_score NUMERIC(5,2),
conflict_rate NUMERIC(5,2),
matrix_completeness NUMERIC(5,2),
transform_errors INTEGER DEFAULT 0,
scoring_variance NUMERIC(5,4),
report JSONB, -- full detailed report
started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
completed_at TIMESTAMPTZ,
run_by TEXT NOT NULL
);
13. Data Source Schema Management
Every Data Source Has an Ontology Schema
A core principle of the three-schema model is that every data source — whether a government registry, AI agent, screening database, file import, or manual analyst — must declare an output schema. This schema defines the entities and fields that the source can produce. Without a declared schema, a source cannot participate in the mapping pipeline.
The schema is stored as a versioned source contract. Contracts are immutable once published. A source adapter may be reconfigured per tenant without changing the contract version it emits against.
Managing Source Schemas in Compliance Studio
The Compliance Studio provides a dedicated Data Sources section where compliance officers and administrators can:
- Browse the data source library — all installed data provider plugins (ADR-009), organized by category (Registry, Investigation, Screening, Analyst)
- View published source contracts — inspect the output schema of any data source: which entities it produces, which fields within each entity, field types, and descriptions
- Configure source adapters — per-tenant trust levels, active/inactive status, API credentials (masked), rate limits, and sandbox vs production mode
- Create custom analyst schemas — design the analyst data source schema for a specific ontology (what fields can an analyst contribute)
- Register new tenant-owned data sources — add database connections, file upload schemas, webhook endpoints, or custom API integrations as new sources with declared output schemas
- Health monitoring — view source health status, last successful call, error rates, and latency metrics
Source Schema Registry
Source management is split into three stores:
- Source definitions: stable source identities such as kvk, northdata, and analyst
- Source contract versions: immutable published output schemas
- Tenant source adapters: per-tenant connection, trust, and activation settings
This separation ensures:
- provider contract evolution is explicit and auditable
- tenant operational config can change without invalidating published ontology schemas
- evaluations and live runs can be replayed against the exact contract version used at the time
CREATE TABLE data_source_definitions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_id TEXT NOT NULL UNIQUE, -- 'kvk', 'northdata', 'analyst'
source_type TEXT NOT NULL, -- 'rest_api', 'database', 'file', 'webhook', 'agent', 'manual'
category TEXT NOT NULL, -- 'registry', 'investigation', 'screening', 'analyst'
display_name TEXT NOT NULL,
description TEXT
);
CREATE TABLE data_source_contract_versions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_definition_id UUID NOT NULL REFERENCES data_source_definitions(id),
version_number INTEGER NOT NULL,
status TEXT NOT NULL DEFAULT 'draft',
-- 'draft', 'published', 'archived'
output_schema JSONB NOT NULL,
capabilities TEXT[] NOT NULL DEFAULT '{}',
published_at TIMESTAMPTZ,
published_by TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(source_definition_id, version_number)
);
CREATE UNIQUE INDEX uq_one_published_contract_per_source
ON data_source_contract_versions(source_definition_id)
WHERE status = 'published';
CREATE TABLE tenant_source_adapters (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
source_definition_id UUID NOT NULL REFERENCES data_source_definitions(id),
pinned_contract_version_id UUID NOT NULL REFERENCES data_source_contract_versions(id),
trust_level NUMERIC(3,2) NOT NULL DEFAULT 0.5,
connection_config JSONB,
is_active BOOLEAN NOT NULL DEFAULT TRUE,
health_status TEXT DEFAULT 'unknown',
last_health_check TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
UNIQUE(tenant_id, source_definition_id)
);
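The source_contract_set recorded on each evaluation run could be assembled from the tenant's adapters. This sketch assumes the contract version for each source comes from the active adapter's pinned_contract_version_id, and that every source referenced by the schema's mappings must have an active adapter. TenantSourceAdapter and build_contract_set are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TenantSourceAdapter:
    """Hypothetical subset of a tenant_source_adapters row."""
    source_id: str                       # data_source_definitions.source_id
    pinned_contract_version_id: str
    is_active: bool = True


def build_contract_set(mapped_sources: Iterable[str],
                       adapters: Iterable[TenantSourceAdapter]) -> dict[str, str]:
    """Map every source used by a schema's mappings to its pinned contract.

    Raises LookupError when a mapped source has no active adapter, so a run
    never silently falls back to an unpinned contract.
    """
    active = {a.source_id: a.pinned_contract_version_id for a in adapters if a.is_active}
    needed = sorted(set(mapped_sources))
    missing = [s for s in needed if s not in active]
    if missing:
        raise LookupError(f"no active adapter for: {', '.join(missing)}")
    return {s: active[s] for s in needed}
```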
The analyst source is a special case:
- the analyst runtime adapter is tenant-owned
- the analyst contract is generated from the published ontology schema and versioned alongside it
- analyst input therefore remains typed, auditable, and replayable like any other source
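One way the analyst contract might be generated from a resolved ontology schema, reusing the entities/fields shape of a plugin manifest's output_schema. The sketch assumes every ontology field is analyst-contributable and optional, and that the contract's version number simply follows the schema's; analyst_contract is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Mapping


def analyst_contract(ontology_fields: Mapping[str, Mapping[str, Any]],
                     schema_version_number: int) -> dict:
    """Derive an analyst output_schema from resolved "Entity.field" definitions."""
    by_entity: dict[str, list] = defaultdict(list)
    for path, spec in sorted(ontology_fields.items()):
        entity, name = path.split(".", 1)
        # Assumption: analysts may supply any field, so none is required.
        by_entity[entity].append({"name": name, "type": spec["type"], "required": False})
    return {
        "source_id": "analyst",
        "version_number": schema_version_number,  # versioned alongside the schema
        "output_schema": {
            "entities": [{"entity": e, "fields": fs} for e, fs in by_entity.items()],
        },
    }
```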
Linking Data Mappings to Risk Matrices
A published data mapping (custom ontology schema) can be linked to one or more risk matrix versions. This link establishes the full pipeline: sources → ontology → matrix → scores.
CREATE TABLE schema_matrix_links (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
risk_matrix_version_id UUID NOT NULL REFERENCES risk_matrix_versions(id),
is_primary BOOLEAN NOT NULL DEFAULT FALSE, -- primary link used for scoring
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT NOT NULL,
UNIQUE(schema_version_id, risk_matrix_version_id)
);
-- At most one primary link per schema version
CREATE UNIQUE INDEX uq_primary_matrix_link
ON schema_matrix_links(schema_version_id)
WHERE is_primary = TRUE;
This means:
- A "NL Corporate KYC v2" schema can be linked to "EBA Standard Matrix v3" for standard scoring
- The same schema could also be linked to "Enhanced DD Matrix v1" for high-risk customers
- The Schema Mapping Designer shows the linked matrix in the right column
- When switching the linked matrix, the right column updates to show the new matrix's factors and required mappings
- Evaluations run against a specific schema + matrix combination
14. Compliance Studio Organization
Navigation Structure
The Compliance Studio is organized around the three layers of the pipeline, from data sources through ontology design to risk scoring. Each section provides the tools for one layer.
Compliance Studio
├── Data Sources ← Layer 1: Where data comes from
│ ├── Source Library ← Browse/manage installed sources
│ ├── Source Schemas ← View published source contracts
│ └── Health Monitor ← Source connectivity & health
│
├── Schema Designer ← Layer 2: How data is unified
│ ├── Data Mappings ← Create/edit custom ontology schemas + mappings
│ └── Schema Versions ← Version history, publish, fork
│
├── Risk Matrices ← Layer 3: How risk is scored
│ ├── Matrix Editor ← Configure EBA dimensions & factors
│ └── Risk Categories ← Reference data (country lists, industry codes, etc.)
│
├── Evaluations ← Validation: Does it all work?
│ ├── Schema Evaluations ← Coverage, conflict, completeness reports
│ └── Scoring Evaluations ← End-to-end scoring validation
│
└── Workflows ← Orchestration: When does it run?
├── Workflow Schemas ← KYC workflow definitions
└── Investigation Templates ← Conflict-triggered investigation configs
Menu Items and Their Purpose
| Menu Item | Icon | Path | Purpose |
|---|---|---|---|
| Source Library | database | /studio/sources | Browse all available data sources, activate/deactivate per tenant, configure credentials and trust levels |
| Source Schemas | diagram-tree | /studio/sources/schemas | View published source contracts, inspect version history, create custom analyst contracts, and register tenant-owned sources. |
| Health Monitor | pulse | /studio/sources/health | Real-time dashboard of source connectivity, error rates, latency, and last successful data pull |
| Data Mappings | data-connection | /studio/mappings | The Schema Mapping Designer — the visual three-column builder where you wire sources → ontology → matrix |
| Schema Versions | git-branch | /studio/mappings/versions | List all schema versions per line, their status (draft/evaluated/published/archived), diff between versions |
| Matrix Editor | heat-grid | /studio/matrices | Configure EBA risk dimensions, factors, weights, aggregation functions. Link matrices to data mappings. |
| Risk Categories | th-derived | /studio/risk-categories | Manage reference data: high-risk country lists, industry risk classifications, PEP categories, etc. |
| Schema Evaluations | lab-test | /studio/evaluations/schema | Run and review schema evaluations: coverage, conflict rates, source utilization |
| Scoring Evaluations | timeline-bar-chart | /studio/evaluations/scoring | Run and review end-to-end scoring: schema + matrix → risk scores for test entities |
| Workflow Schemas | flows | /studio/workflows | Define KYC onboarding workflows, re-investigation schedules, monitoring frequencies |
| Investigation Templates | search | /studio/workflows/investigations | Configure conflict-driven investigation templates: which agent, what threshold, what scope |
Design Principle: Pipeline Visibility
The Studio menu follows the data pipeline left-to-right: Sources → Ontology → Matrix → Evaluation → Workflow. A compliance officer designing a new KYC program would naturally work through the menu top-to-bottom:
- Configure their data sources (which registries, agents, and screening databases to use)
- Design their ontology (what entity model captures their compliance requirements)
- Link a risk matrix (how ontology fields map to EBA risk factors)
- Evaluate (does the pipeline produce correct, complete results)
- Deploy via workflow (when and how the pipeline runs for real investigations)
15. Future Data Source Types
Extensibility Model
The data provider plugin architecture (ADR-009) already supports adding new sources via plugin manifests. This ADR extends that model to support diverse data source types beyond REST APIs.
| Source Type | Connection | Schema Declaration | Example |
|---|---|---|---|
| REST API | URL + auth headers + rate limits | Response JSON schema → output fields | KVK, NorthData |
| Database | Connection string + query template | Result set columns → output fields | Internal CRM, AML database |
| File Upload | Upload endpoint + format spec | Parsed columns/fields → output fields | CSV watchlists, Excel reports |
| Webhook | Incoming webhook URL + payload schema | Payload fields → output fields | Real-time screening alerts |
| AI Agent | Agent configuration + prompt template | Agent output schema → output fields | MEBO, AMLRR, ROA |
| Manual Entry | Form schema in Compliance Studio | Form fields → output fields | Analyst input |
| Aggregator | Combines multiple sub-sources | Union of sub-source schemas | NorthData (aggregates multiple registries) |
Plugin Manifest Extension
# Extended plugin manifest for ADR-010 compatibility
plugin:
  id: "kvk-nl"
  version: "1.0.0"
  type: "rest_api"                 # NEW: source type
  connection:                      # NEW: connection configuration
    type: "rest_api"
    base_url: "https://api.kvk.nl/api"
    auth:
      type: "api_key"
      header: "apikey"
    rate_limit:
      requests_per_minute: 60
    sandbox:
      base_url: "https://api.kvk.nl/test/api"
  output_schema:                   # Replaces mapping_spec.yaml
    entities:
      - entity: "Registration"
        fields:
          - name: kvkNummer
            type: string
            required: true
          - name: statutaireNaam
            type: string
            required: true
          # ...
  health_check:
    endpoint: "/v2/zoeken?kvkNummer=00000000"
    expected_status: 200
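Declaring output_schema in the manifest lets ingestion reject records that do not fit the contract. A minimal Python sketch, assuming the manifest has already been parsed into a dict (for example by a YAML loader, not shown) and using an illustrative mapping from manifest type names to Python types; validate_record and PY_TYPES are hypothetical.

```python
from typing import Any, Mapping

# Assumed mapping from manifest type names to Python types (illustrative).
PY_TYPES: dict[str, Any] = {"string": str, "number": (int, float), "boolean": bool}


def validate_record(output_schema: Mapping[str, Any], entity: str,
                    record: Mapping[str, Any]) -> list[str]:
    """Check one source record against a manifest's output_schema.

    Returns a list of problems; an empty list means the record conforms.
    """
    spec = next((e for e in output_schema["entities"] if e["entity"] == entity), None)
    if spec is None:
        return [f"entity {entity!r} is not declared"]
    declared = {f["name"]: f for f in spec["fields"]}
    problems = []
    for name, fdef in declared.items():
        value = record.get(name)
        if value is None:
            if fdef.get("required", False):
                problems.append(f"missing required field {name!r}")
            continue
        expected = PY_TYPES.get(fdef["type"])
        # bool is a subclass of int, so reject it explicitly for numbers.
        bad_bool = fdef["type"] == "number" and isinstance(value, bool)
        if bad_bool or (expected is not None and not isinstance(value, expected)):
            problems.append(f"{name!r} should be {fdef['type']}")
    for name in sorted(set(record) - set(declared)):
        problems.append(f"undeclared field {name!r}")
    return problems
```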
16. Relationship Modeling
Gap in Current Design
The current schema models entities (LegalEntity, Person, Address) but not the relationships between them. "Person X is UBO of LegalEntity Y" is a relationship with its own attributes (ownership percentage, control type, effective date).
Relationship Types in Custom Ontology
The custom ontology schema supports relationship type definitions alongside entity types:
relationship_types:
  - type: "BENEFICIAL_OWNER_OF"
    source_entity: "Person"
    target_entity: "LegalEntity"
    fields:
      - name: ownership_percentage
        type: number
        conflict_response: freeze_investigate
        investigation_agent: mebo
      - name: control_type
        type: enum
        values: [direct, indirect, nominee]
      - name: effective_date
        type: date
      - name: verified_by
        type: string
        merge_strategy: manual_only
  - type: "REGISTERED_AT"
    source_entity: "LegalEntity"
    target_entity: "Address"
    fields:
      - name: address_type
        type: enum
        values: [registered, operational, postal]
      - name: effective_date
        type: date
  - type: "RELATED_TO"
    source_entity: "LegalEntity"
    target_entity: "LegalEntity"
    fields:
      - name: relationship_type
        type: enum
        values: [parent, subsidiary, sister, joint_venture]
      - name: ownership_percentage
        type: number
Relationship fields follow the same merge strategy, conflict response, and investigation trigger model as entity fields, but runtime processing treats each relationship as a first-class subject with its own identity, provenance, and lifecycle.
Relationship Runtime Model
At publish time, every relationship type must declare either:
- an explicit identity key for matching repeated relationships, or
- a rule instructing Atlas to derive identity from (source_entity_id, target_entity_id, relationship_type, identity fields...)
At runtime:
- Incoming relationship evidence is matched to an existing relationship_instance_id
- If matched, field-level mutations are evaluated on that relationship instance
- If unmatched, a new relationship instance is proposed
- If a previously current relationship is no longer observed, Atlas records an end event rather than deleting it
This aligns relationship handling with ADR-009's is_current and event-sourcing requirements and avoids collapsing "changed edge attributes" into "different edge exists".
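A sketch of the derived-identity rule and the runtime matching steps. It assumes identity is the tuple (source_entity_id, target_entity_id, relationship_type, identity fields...) and that current maps the keys of is_current relationships to their relationship_instance_id; relationship_key, reconcile, and ReconcileResult are hypothetical. Because ownership_percentage is not part of the key in the example, a changed percentage matches the existing instance and becomes a field-level mutation instead of a new edge.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Sequence


def relationship_key(source_id: str, target_id: str, rel_type: str,
                     attrs: Mapping[str, Any], identity_fields: Sequence[str]) -> tuple:
    """Derived identity: endpoints, type, then any declared identity fields."""
    return (source_id, target_id, rel_type, *(attrs.get(f) for f in identity_fields))


@dataclass
class ReconcileResult:
    matched: dict    # identity key -> existing relationship_instance_id
    proposed: list   # identity keys with no existing instance
    ended: list      # instance ids no longer observed (end events)


def reconcile(current: Mapping[tuple, str], observed: Sequence[tuple]) -> ReconcileResult:
    """Compare observed relationship keys with current instances.

    Nothing is deleted: vanished relationships are reported as end events.
    """
    seen = set(observed)
    matched = {k: current[k] for k in observed if k in current}
    proposed = [k for k in dict.fromkeys(observed) if k not in current]
    ended = [iid for k, iid in current.items() if k not in seen]
    return ReconcileResult(matched, proposed, ended)
```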
17. Schema Composition & Inheritance
Use Case
A compliance team needs multiple schema variants that share a common base:
- NL Corporate KYC — Simplified (low-risk customers, fewer fields)
- NL Corporate KYC — Standard (standard due diligence)
- NL Corporate KYC — Enhanced (high-risk, additional fields and stricter conflict responses)
Inheritance Model
A schema can declare a parent. It inherits all entity groups, fields, mappings, and configurations from the parent, and can:
- Add new entity groups or fields
- Override field configurations (e.g., change conflict_response from accept_trusted to freeze_investigate)
- Remove fields that are not needed (mark them as excluded)
A child schema cannot change field types or identity keys; those are fixed by the parent.
ALTER TABLE ontology_schema_versions
ADD COLUMN parent_version_id UUID REFERENCES ontology_schema_versions(id),
ADD COLUMN inheritance_overrides JSONB; -- field-level overrides on parent
The effective schema is computed by merging the parent's definitions with the child's overrides. This is resolved at publish time — the published version contains the fully resolved schema, not a reference to the parent.
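Publish-time resolution of a child schema can be sketched as a dictionary merge with two guards. The sketch flattens fields to "Entity.field" keys, a hypothetical representation of inheritance_overrides; resolve_effective_schema and FIXED are likewise illustrative names, and treating is_identity_key as false when absent is an assumption.

```python
import copy
from typing import Any, Mapping

# Properties a child may not change, with the default assumed when absent.
FIXED = {"type": None, "is_identity_key": False}


def resolve_effective_schema(parent: Mapping[str, Mapping[str, Any]],
                             overrides: Mapping[str, Mapping[str, Any]]) -> dict:
    """Merge parent field definitions with a child's overrides.

    Overrides may add fields, change configuration, or set excluded=True;
    changing a fixed property raises ValueError.
    """
    effective = copy.deepcopy(dict(parent))
    for path, change in overrides.items():
        if path not in effective:
            effective[path] = dict(change)  # field added by the child
            continue
        for key, default in FIXED.items():
            if key in change and change[key] != effective[path].get(key, default):
                raise ValueError(f"{path}: {key} is fixed by the parent")
        effective[path] = {**effective[path], **change}
    # The published version stores this fully resolved result, minus exclusions.
    return {p: cfg for p, cfg in effective.items() if not cfg.get("excluded", False)}
```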
18. Scoring Projection Integration
Compile-Time Validation
When a schema version is published, the system validates that the risk matrix schema's required factors are satisfiable:
For each required factor in the risk matrix:
1. Identify the ontology field(s) it reads (from matrix mappings)
2. Verify those fields exist in the custom ontology
3. Verify at least one source mapping feeds those fields
4. Verify the reduction strategy is compatible with the field type
5. If any check fails → block publish with a clear error message
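Steps 2 to 5 can be expressed as a function that collects blocking errors, assuming step 1 has already produced the ontology field paths each required factor reads. MatrixFactor, validate_publish, and the reduction names in COMPATIBLE are hypothetical; the real reduction vocabulary and compatibility rules are not specified in this section.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class MatrixFactor:
    """Hypothetical view of a required risk-matrix factor."""
    name: str
    reads: tuple[str, ...]     # ontology field paths (step 1)
    reduction: str


# Assumed reduction/type compatibility table (illustrative, not normative).
COMPATIBLE = {
    "max": {"number"}, "min": {"number"}, "sum": {"number"},
    "any_true": {"boolean"},
    "count": {"number", "string", "boolean", "enum", "date"},
}


def validate_publish(factors: Sequence[MatrixFactor],
                     ontology_types: Mapping[str, str],
                     mapped_targets: set[str]) -> list[str]:
    """Return blocking errors; an empty list means publish may proceed."""
    errors = []
    for f in factors:
        for path in f.reads:
            if path not in ontology_types:                        # step 2
                errors.append(f"{f.name}: field {path} is not in the ontology")
                continue
            if path not in mapped_targets:                        # step 3
                errors.append(f"{f.name}: no source mapping feeds {path}")
            field_type = ontology_types[path]
            if field_type not in COMPATIBLE.get(f.reduction, set()):  # step 4
                errors.append(f"{f.name}: reduction {f.reduction} is incompatible "
                              f"with {field_type} field {path}")
    return errors  # step 5: the caller blocks publish if this is non-empty
```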
Runtime Scoring Flow
     Published Schema v2
              │
              ▼
┌───────────────────────────┐
│ Scoring Projection        │
│                           │
│ Reads: accepted graph     │
│ snapshot (entities +      │
│ relationships, excluding  │
│ frozen subjects)          │
│                           │
│ For each matrix factor:   │
│  - Find ontology field    │
│  - Apply reduction        │
│  - Handle missing values  │
│  - Compute factor score   │
│                           │
│ Aggregate dimension       │
│ scores → overall risk     │
└─────────────┬─────────────┘
              │
              ▼
  Risk Score + Audit Record
  (pinned to schema v2 +
   matrix version +
   source contract set +
   evidence snapshot)
Frozen fields are excluded from scoring by default. A frozen field means its value is under investigation and cannot be relied upon. The scoring projection uses the missing_strategy for that factor (typically flag_for_review) and the evaluation is marked as "provisional — pending investigation."
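The frozen-field rule amounts to treating a frozen input as missing and then applying the factor's missing_strategy. A sketch, assuming only flag_for_review is handled and that any factor whose input was frozen marks the overall result provisional; score_factor, is_provisional, FactorResult, and the scorer callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


@dataclass(frozen=True)
class FactorResult:
    score: Optional[float]
    flagged_for_review: bool
    provisional: bool   # True when a frozen input forced the missing path


def score_factor(field_path: str, values: Mapping[str, Any], frozen: set[str],
                 scorer: Callable[[Any], float],
                 missing_strategy: str = "flag_for_review") -> FactorResult:
    """Score one matrix factor, treating a frozen field as missing."""
    is_frozen = field_path in frozen
    value = None if is_frozen else values.get(field_path)
    if value is not None:
        return FactorResult(scorer(value), flagged_for_review=False, provisional=False)
    if missing_strategy == "flag_for_review":
        return FactorResult(None, flagged_for_review=True, provisional=is_frozen)
    raise ValueError(f"missing_strategy {missing_strategy!r} is not covered by this sketch")


def is_provisional(results: list[FactorResult]) -> bool:
    """The evaluation is provisional if any factor depended on a frozen field."""
    return any(r.provisional for r in results)
```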
19. Database Schema
New Tables Summary
| Table | Purpose |
|---|---|
| ontology_schema_lines | Named schema identities per tenant |
| ontology_schema_versions | Versioned schema definitions (draft/evaluated/published/archived) |
| ontology_source_mappings | Denormalized source→ontology field mappings |
| ontology_field_configs | Per-field merge, conflict, and investigation configuration |
| ontology_mutations | Immutable record of all proposed subject mutations |
| field_provenance | Current field values with source provenance |
| conflict_records | Detected conflicts with resolution tracking |
| data_source_definitions | Stable source identities |
| data_source_contract_versions | Versioned immutable source contracts |
| tenant_source_adapters | Tenant-specific operational source config |
| schema_matrix_links | Links between schema versions and risk matrix versions (at most one primary per schema version) |
| evaluation_test_suites | Reusable test entity collections |
| evaluation_runs | Evaluation execution results |
Conflict Records Table
CREATE TABLE conflict_records (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id UUID NOT NULL REFERENCES tenants(id),
schema_version_id UUID NOT NULL REFERENCES ontology_schema_versions(id),
subject_kind TEXT NOT NULL, -- 'entity', 'relationship', 'collection_member'
entity_id UUID,
relationship_instance_id UUID,
subject_type TEXT NOT NULL,
field_path TEXT NOT NULL,
cardinality_key TEXT,
-- Conflict details
current_value JSONB,
current_source TEXT NOT NULL,
current_verified_at TIMESTAMPTZ,
current_verified_by TEXT,
incoming_value JSONB NOT NULL,
incoming_source TEXT NOT NULL,
incoming_source_contract_version_id UUID,
incoming_received_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
-- Configuration that triggered this
merge_strategy TEXT NOT NULL,
conflict_response TEXT NOT NULL,
-- Investigation (if triggered)
investigation_id UUID,
investigation_agent TEXT,
investigation_priority TEXT,
investigation_scope TEXT,
-- Resolution
status TEXT NOT NULL DEFAULT 'open',
-- 'open', 'investigating', 'resolved', 'auto_resolved'
resolved_at TIMESTAMPTZ,
resolved_by TEXT,
resolved_value JSONB,
resolution_type TEXT,
-- 'accept_incoming', 'keep_current', 'override', 'escalated'
resolution_justification TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE INDEX idx_conflicts_open ON conflict_records(tenant_id, status) WHERE status IN ('open', 'investigating');
CREATE INDEX idx_conflicts_subject ON conflict_records(subject_kind, entity_id, relationship_instance_id, field_path);
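A sketch of how a detected disagreement might become a conflict_records row. It assumes an initial status per conflict_response (accept_trusted → auto_resolved, flag_review → open, freeze_investigate → investigating), that a missing current value is initial population rather than a conflict, and that accept_trusted compares adapter trust levels. open_conflict is hypothetical, and investigation_threshold is not evaluated because its expression language is still an open question.

```python
from typing import Any, Mapping, Optional

# Assumed initial conflict_records.status for each conflict_response.
INITIAL_STATUS = {
    "accept_trusted": "auto_resolved",
    "flag_review": "open",
    "freeze_investigate": "investigating",
}


def open_conflict(config: Mapping[str, Any], field_path: str,
                  current: Any, current_source: str,
                  incoming: Any, incoming_source: str,
                  trust: Mapping[str, float]) -> Optional[dict]:
    """Return a conflict_records-shaped dict, or None when there is no conflict."""
    if current is None or current == incoming:
        return None  # initial population or agreement
    response = config["conflict_response"]
    record = {
        "field_path": field_path,
        "current_value": current, "current_source": current_source,
        "incoming_value": incoming, "incoming_source": incoming_source,
        "merge_strategy": config["merge_strategy"],
        "conflict_response": response,
        "status": INITIAL_STATUS[response],
    }
    if response == "accept_trusted":
        # Resolve at once in favour of the more trusted source (ties keep current).
        take_incoming = trust.get(incoming_source, 0.0) > trust.get(current_source, 0.0)
        record["resolved_value"] = incoming if take_incoming else current
        record["resolution_type"] = "accept_incoming" if take_incoming else "keep_current"
    elif response == "freeze_investigate":
        # Defaults mirror the ontology_field_configs column defaults.
        record["investigation_agent"] = config.get("investigation_agent")
        record["investigation_priority"] = config.get("investigation_priority", "medium")
        record["investigation_scope"] = config.get("investigation_scope", "field_only")
    return record
```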
20. Migration Path
Phase 1: Schema Persistence (Sprint 1)
- Database tables for schema lines, versions, field configs
- Versioned source contracts and tenant source adapters
- Save/load from Schema Mapping Designer
- Draft/publish lifecycle
- No conflict detection yet — pure schema management
Phase 2: Mutation Queue & Provenance (Sprint 2)
- Mutation queue table and processing pipeline
- Field provenance tracking
- Relationship instance identity and relationship-level mutations
- Merge strategy execution (deterministic value resolution)
- Auto-accept for initial population (baseline)
Phase 3: Conflict Detection & Response (Sprint 3)
- Conflict detection during mutation processing
- accept_trusted and flag_review conflict responses
- Conflict records table
- Analyst review inbox for flagged conflicts
Phase 4: Conflict-Driven Investigation (Sprint 4)
- freeze_investigate conflict response
- Investigation trigger mechanism (via Temporal workflows, ADR-007)
- Field freezing with provisional scoring
- Investigation resolution flow
Phase 5: Live Preview & Evaluations (Sprint 5)
- Preview API that pulls data and runs through mappings
- Snapshot-backed evaluation framework with test suites
- Coverage, conflict, and scoring reports
- Schema status gates (draft → evaluated → published)
Phase 6: Schema Composition (Sprint 6)
- Parent/child schema inheritance
- Inheritance override resolution at publish time
- Relationship type definitions in custom ontology
21. Rejected Alternatives
Alternative 1: Auto-classify conflicts by field type
Rejected because conflict materiality is a compliance judgment, not a technical property. The same field (employee count) might be material for one client and immaterial for another. The compliance officer must configure this explicitly.
Alternative 2: Real-time conflict resolution (no queue)
Rejected because conflict handling may involve human judgment, investigation, and audit trails. A synchronous model would block data ingestion on analyst availability.
Alternative 3: Global merge strategy (same rule for all fields)
Rejected because different fields have fundamentally different semantics. A boolean flag (is_sanctioned) needs any_true; a name needs highest_trust; an address collection needs accumulate. Per-field configuration is essential.
Alternative 4: Separate ontology and conflict systems
Rejected because conflict detection is inherent to the ontology update process. Separating them would require duplicating entity state tracking and create consistency risks.
Alternative 5: Duplicate entities on conflict instead of freezing fields
Rejected because entity duplication creates identity resolution problems downstream. The conflict is about a field value, not about entity identity. Freezing the field preserves the entity graph integrity while quarantining the uncertain data.
Alternative 6: Mutable source schemas per tenant
Rejected because it blurs the boundary between provider contract, tenant adapter, and ontology mapping. Mutable source schemas would make replay and auditability fragile. Source contracts must be versioned and immutable once published; tenant customization belongs in adapters and ontology mappings.
Alternative 7: Live-data-only evaluations
Rejected because upstream data drift would make publish gates non-deterministic. Live preview is useful for exploration, but authoritative evaluation must run against frozen evidence snapshots and pinned source contract versions.
22. Open Questions
- Threshold expression language: should we use a simple DSL (delta > 10%), a JSON rules format, or reuse the existing rule engine from ADR-007?
- Conflict resolution SLA: should there be a configurable time limit on how long a field can remain frozen before auto-escalating?
- Cross-entity conflict correlation: if multiple fields across related entities conflict simultaneously (e.g., ownership changes across a corporate structure), should the system detect and group these as a single investigation?
- Schema marketplace: should published schemas be shareable across tenants (e.g., an "NL Corporate KYC" schema that any Dutch-focused client can adopt and customize)?
- Backward compatibility: when a new schema version adds fields, how should existing entities be handled? Lazy migration (compute on read) vs eager migration (backfill on publish)?
- Evidence retention policy: how long should frozen evidence snapshots be retained for replay, regulator lookback, and regression testing?
- Regulatory mapping: should the schema designer be able to tag fields with regulatory references (e.g., "this field satisfies EBA GL 2.5(a)") for audit purposes?