ADR-010: Ontology Lifecycle, Conflict Resolution & Conflict-Driven Investigation

Status: Proposed
Date: 2026-03-29
Author: Atlas Architecture
Depends on: ADR-007 (Workflow Ontology Engine), ADR-008 (EBA Risk Matrix Engine), ADR-008a (Reference Data Registry), ADR-009 (Data Provider Plugin Architecture)
Impacts: ADR-013 (Analyst Interaction Layer), Schema Mapping Designer (Compliance Studio)

Context & Problem Statement
Decision Summary
Three-Schema Model
Custom Ontology Schema Lifecycle
Field Configuration Model
Conflict Detection & Response
Conflict-Driven Investigation Triggers
Mutation Queue & Curation Layer
Analyst as Data Source
Live Data Preview
Schema Persistence & Versioning
Evaluations
Data Source Schema Management
Compliance Studio Organization
Future Data Source Types
Relationship Modeling
Schema Composition & Inheritance
Scoring Projection Integration
Database Schema
Migration Path
Rejected Alternatives
Open Questions

1. Context & Problem Statement

Atlas collects KYC/AML data from multiple sources — government registries (KVK, NorthData), investigation agents (SPEPWS, AMLRR, MEBO, ROA), screening databases, and manual analyst input. This data populates an entity ontology, which then feeds a deterministic risk matrix (ADR-008) to produce risk scores.

The current architecture has several gaps:

No unified schema design tool. The ontology schema (ontology_schema_v3.yaml) is a static YAML file maintained by engineers. Compliance officers cannot design or customize the data model for their jurisdiction, risk appetite, or regulatory requirements without code changes.
No conflict detection. When two data sources disagree on a field value — KVK says 50 employees, NorthData says 45 — the system has no formal mechanism to detect, classify, or respond to the disagreement. Values are silently overwritten based on implicit ordering.
No conflict-driven investigation. A change in employee count might be noise, but a change in UBO count is a material compliance event that should trigger investigation. There is no way to configure per-field responses to data conflicts.
No ontology lifecycle. The ontology is populated once during initial investigation and then treated as static. There is no mechanism for ongoing monitoring, scheduled re-verification, or detecting meaningful changes over time.
No schema persistence or versioning. Schema configurations, field mappings, merge rules, and conflict responses exist only in code. They cannot be saved, versioned, audited, or shared between tenants.
Analyst input is a second-class citizen. Analyst overrides use a generic field_path + override_value pattern rather than having a proper schema with typed fields, validation, and provenance.
No live validation. Compliance officers designing a schema cannot preview how real data flows through their mappings, making it impossible to validate the design before deploying it.

This ADR defines the ontology lifecycle — from schema design through initial population, ongoing monitoring, conflict detection, conflict-driven investigation, and schema evolution.

2. Decision Summary

Build a Custom Ontology Schema system with the following properties:

Three-schema model: source schemas (fixed per data provider), custom ontology schemas (designed by compliance officers), and risk matrix schemas (EBA dimensions). Each layer is independently versioned.
Per-field conflict configuration: each ontology field has a merge strategy (how to pick a value), a conflict response (what to do when sources disagree), and an optional investigation trigger (which agent to invoke when conflict exceeds a threshold).
Mutation queue with curation: all data changes — from providers, agents, and analysts — enter as proposed mutations. Alert rules evaluate each mutation and route it to auto-accept, flag-for-review, or investigate.
Analyst as a first-class data source: analysts have their own typed schema per custom ontology, designed alongside other source schemas in the Compliance Studio.
Schema persistence with immutable versioning: draft schemas are mutable; published schemas are frozen. Every evaluation pins to a specific schema version.
Live data preview: during schema design, compliance officers can pull real data through their mappings to validate the design.
Evaluation framework: run a schema against test entities to measure coverage, conflict rates, and scoring completeness before publishing.

3. Three-Schema Model

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        COMPLIANCE STUDIO                             │
│                                                                      │
│  ┌─────────────┐   ┌──────────────────┐   ┌─────────────────────┐  │
│  │ Source       │   │ Custom Ontology   │   │ Risk Matrix         │  │
│  │ Schemas      │──▶│ Schema            │──▶│ Schema              │  │
│  │ (read-only)  │   │ (editable)        │   │ (EBA dimensions)    │  │
│  └─────────────┘   └──────────────────┘   └─────────────────────┘  │
│                                                                      │
│  KVK Schema          NL Corporate KYC      Customer Risk            │
│  NorthData Schema    ┌─────────────────┐   Geographic Risk          │
│  SPEPWS Schema       │ LegalEntity     │   Product / Service Risk   │
│  AMLRR Schema        │ Person          │   Delivery Channel Risk    │
│  MEBO Schema         │ Address         │   Transaction Risk         │
│  ROA Schema          │ Evidence        │                            │
│  Analyst Schema      │ Ownership (rel) │                            │
│  (per-ontology)      └─────────────────┘                            │
└─────────────────────────────────────────────────────────────────────┘

Source Schemas (Layer 1)

Source schemas are versioned source contracts. For provider-backed sources, the contract originates in the plugin manifest (mapping_spec.yaml / output_schema from ADR-009). For tenant-defined sources (manual forms, file uploads, database adapters), the contract is authored in Studio and then published as an immutable version.

Each source contract declares the entities it can emit, the fields within each entity, field types, cardinality, and relationship payloads if applicable. The Studio displays the published contract versions as the "left column" in the Schema Mapping Designer.

Key principle: all data sources are equal as evidence producers, but not all sources are governed the same way. Provider contracts are published by plugin authors and are read-only to tenants. Tenant administrators may configure a source adapter (credentials, trust level, endpoint settings, transforms, activation), but they do not mutate the provider contract itself.

Audit rule: every published ontology schema must pin the exact source contract versions it was designed against, and every evaluation or live run must record the contract versions and evidence snapshot used.

Source Contracts vs Tenant Adapters

To keep ownership boundaries clear, ADR-010 distinguishes three artifacts:

Source contract — immutable schema for what a source can emit
Tenant source adapter — tenant-specific operational config for how Atlas connects to and trusts that source
Source-to-ontology mapping — ontology-specific mapping from contract fields into the tenant's custom ontology

This prevents a tenant from "editing KVK" while still allowing tenant-specific configuration and normalization.
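The three artifacts can be illustrated with a minimal Python sketch. The class and field names (SourceContract, TenantSourceAdapter, SourceToOntologyMapping) are hypothetical, not an existing Atlas API. The contract is a frozen dataclass, so tenant code cannot mutate it. The adapter holds mutable tenant configuration. The mapping pins a specific contract version and rejects fields that contract does not declare.

```python
from dataclasses import dataclass, field, FrozenInstanceError
from typing import Dict, Mapping, Tuple


@dataclass(frozen=True)
class SourceContract:
    """Immutable, versioned description of what a source can emit (hypothetical shape)."""
    source_id: str                 # e.g. "kvk"
    version: int                   # published contract version
    fields: Tuple[str, ...]        # field paths the source can emit


@dataclass
class TenantSourceAdapter:
    """Tenant-specific operational config; it references a contract but never edits it."""
    tenant_id: str
    contract: SourceContract
    trust_level: float
    active: bool = True
    endpoint_settings: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class SourceToOntologyMapping:
    """Maps contract fields into one custom ontology version, pinning the contract version."""
    contract: SourceContract
    ontology_schema_version: str
    field_map: Mapping[str, str]   # contract field -> ontology field

    def __post_init__(self) -> None:
        # Reject mappings that reference fields the pinned contract does not declare.
        unknown = sorted(set(self.field_map) - set(self.contract.fields))
        if unknown:
            raise ValueError(
                f"not in {self.contract.source_id} v{self.contract.version}: {unknown}"
            )


kvk_v3 = SourceContract("kvk", 3, ("kvk_number", "statutory_name", "employees"))
adapter = TenantSourceAdapter("tenant-a", kvk_v3, trust_level=0.95)
adapter.trust_level = 0.9          # allowed: tenant-level configuration
mapping = SourceToOntologyMapping(
    kvk_v3, "nl-corporate-kyc-v2", {"statutory_name": "legal_name"}
)

try:
    kvk_v3.version = 4             # rejected: contracts are immutable to tenants
except FrozenInstanceError:
    pass
```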

Custom Ontology Schema (Layer 2)

Designed by the compliance officer in the Compliance Studio. This is the core data model for a specific compliance use case (e.g., "NL Corporate KYC", "DE Enhanced Due Diligence", "UK Simplified CDD").

The compliance officer:

Creates entity groups (LegalEntity, Person, Address, etc.)
Adds fields to each group with types, descriptions, and constraints
Defines identity keys for entity resolution
Configures per-field merge rules, conflict responses, and investigation triggers
Wires source schema fields to ontology fields (source→ontology mappings)
Wires ontology fields to risk matrix factors (ontology→matrix mappings)

Risk Matrix Schema (Layer 3)

Defined by the EBA risk matrix configuration (ADR-008). The five EBA dimensions and their factors are the "right column" in the Schema Mapping Designer. Each factor declares which ontology fields it reads, what reduction strategy to apply for multi-value fields, and how to handle missing values.

4. Custom Ontology Schema Lifecycle

┌──────────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐
│  DRAFT   │────▶│ EVALUATED │────▶│ PUBLISHED │────▶│ ARCHIVED │
│          │     │           │     │           │     │          │
│ Mutable  │     │ Validated │     │ Immutable │     │ Frozen   │
│ No data  │     │ Test data │     │ Live data │     │ Replaced │
│ flow     │     │ preview   │     │ flow      │     │ by newer │
└──────────┘     └───────────┘     └───────────┘     └──────────┘
     ▲                                    │
     │                                    │
     └────── fork (new version) ──────────┘

States

State	Mutable	Data Flow	Scoring	Description
`draft`	Yes	Preview only	No	Work in progress. Can be edited freely. Not used for live investigations.
`evaluated`	Yes (minor edits)	Test set	Simulation	Passed evaluation criteria. Test data has been run through the schema to validate coverage and correctness.
`published`	No	Live	Yes	Active production schema. At most one published version per `schema_line_id` per tenant. Immutable.
`archived`	No	None	Historical only	Superseded by a newer published version. Retained for audit trail. Evaluations performed under this version reference it permanently.

Versioning

Each schema belongs to a schema line — a named, tenant-scoped identity (e.g., "NL Corporate KYC"). Within a line, versions are sequential integers. Forking a published version creates a new draft at the next version number.

NL Corporate KYC
  v1 (published → archived)
  v2 (published, active)
  v3 (draft, in progress)
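The lifecycle and versioning rules can be sketched as a small state machine. This minimal Python illustration assumes in-memory storage and uses hypothetical names (State, SchemaLine, ALLOWED). In practice, the single-published-version rule would more likely be enforced in the database. Publishing a version archives the previously published one. Forking a published version creates a draft at the next version number, which reproduces the NL Corporate KYC example above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class State(Enum):
    DRAFT = "draft"
    EVALUATED = "evaluated"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Forward transitions from the lifecycle diagram; forking is handled by fork().
ALLOWED = {
    State.DRAFT: {State.EVALUATED},
    State.EVALUATED: {State.PUBLISHED},
    State.PUBLISHED: {State.ARCHIVED},
    State.ARCHIVED: set(),
}


@dataclass
class SchemaLine:
    tenant_id: str
    name: str
    versions: Dict[int, State] = field(default_factory=dict)

    def new_draft(self) -> int:
        """Create a draft at the next sequential version number."""
        version = max(self.versions, default=0) + 1
        self.versions[version] = State.DRAFT
        return version

    def fork(self, version: int) -> int:
        """Fork a published version into a new draft."""
        if self.versions[version] is not State.PUBLISHED:
            raise ValueError(f"v{version} is not published and cannot be forked")
        return self.new_draft()

    def transition(self, version: int, target: State) -> None:
        current = self.versions[version]
        if target not in ALLOWED[current]:
            raise ValueError(f"v{version}: {current.value} -> {target.value} not allowed")
        if target is State.PUBLISHED:
            # At most one published version per line: archive the one being superseded.
            for v, state in self.versions.items():
                if state is State.PUBLISHED:
                    self.versions[v] = State.ARCHIVED
        self.versions[version] = target


line = SchemaLine("tenant-a", "NL Corporate KYC")
v1 = line.new_draft()
line.transition(v1, State.EVALUATED)
line.transition(v1, State.PUBLISHED)
v2 = line.fork(v1)
line.transition(v2, State.EVALUATED)
line.transition(v2, State.PUBLISHED)   # archives v1
v3 = line.fork(v2)                     # v3 is a draft; v2 stays published
```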

5. Field Configuration Model

Each field in the custom ontology has three configuration dimensions:

5.1 Merge Strategy (How to pick a value)

Determines which value "wins" when multiple sources provide data for the same field.

Strategy	Label	Behavior
`first_available`	1st	Uses the first non-null value, ordered by source trust level (descending)
`highest_trust`	HT	Always picks the value from the highest-trust source, even if a lower-trust source responded first
`latest`	NEW	Uses the most recently updated value regardless of trust
`manual_only`	MAN	Only accepts values from the analyst data source; ignores automated sources
`any_true`	ANY	For booleans: true if any source reports true
`count_distinct`	CNT	For collections: counts unique values across all sources
`accumulate`	ACC	For collections: merges all values from all sources (addresses, trade names, etc.)
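The merge strategies can be illustrated with a minimal Python sketch. It assumes a hypothetical SourceValue record that carries a trust level, a timestamp, and a source_type (using the provider/agent/analyst vocabulary of the mutation table) for each source. The sketch covers six of the seven strategies; first_available is left out for brevity. `None` means no usable value was found.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, List, Optional


@dataclass(frozen=True)
class SourceValue:
    source_id: str        # e.g. "kvk", "analyst:jvandijk"
    source_type: str      # "provider", "agent" or "analyst"
    value: Any
    trust: float
    updated_at: datetime


def merge(strategy: str, values: List[SourceValue]) -> Optional[Any]:
    """Resolve one field from per-source values; None means no usable value."""
    present = [v for v in values if v.value is not None]
    if strategy == "highest_trust":
        return max(present, key=lambda v: v.trust).value if present else None
    if strategy == "latest":
        return max(present, key=lambda v: v.updated_at).value if present else None
    if strategy == "manual_only":
        # Automated sources are ignored; only analyst values are considered.
        manual = [v for v in present if v.source_type == "analyst"]
        return max(manual, key=lambda v: v.updated_at).value if manual else None
    if strategy == "any_true":
        return any(v.value is True for v in present) if present else None
    if strategy == "count_distinct":
        return len({item for v in present for item in v.value})
    if strategy == "accumulate":
        merged: List[Any] = []
        for v in sorted(present, key=lambda v: v.trust, reverse=True):
            for item in v.value:
                if item not in merged:  # keep first occurrence, highest trust first
                    merged.append(item)
        return merged
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


kvk = SourceValue("kvk", "provider", 50, 0.95, datetime(2026, 1, 10))
nd = SourceValue("northdata", "provider", 45, 0.80, datetime(2026, 3, 29))
```

With the employee-count example from §1, `highest_trust` keeps KVK's 50 and `latest` picks NorthData's 45.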

5.2 Conflict Response (What to do when sources disagree)

Determines the system's response when the resolved value differs from values provided by other sources.

Response	Icon	Behavior
`accept_trusted`	Grey shield	Accept the merge strategy winner. Log the discrepancy in the mutation audit trail. No further action.
`flag_review`	Yellow shield	Accept the merge strategy winner provisionally. Create a review task in the analyst inbox. The field is marked as "pending review" in the ontology until an analyst confirms or overrides.
`freeze_investigate`	Red shield	Do not update the field. Freeze at the last analyst-verified value. Trigger an investigation (see §7). The field remains frozen until the investigation completes and an analyst resolves the conflict.

5.3 Investigation Trigger (Which investigation to launch)

Only applicable when conflict response is freeze_investigate. Configures:

Agent: which investigation agent to invoke (MEBO for ownership conflicts, ROA for address conflicts, SPEPWS for sanctions changes, etc.)
Threshold: optional condition to trigger (e.g., "only investigate if delta > 10%" or "only if the value changes from true to false")
Priority: investigation priority level (critical, high, medium)
Scope: whether the investigation re-examines just this field or the entire entity group

Visual Representation in Schema Mapping Designer

Each ontology field row displays:

[Left Port] [Source Badges] [🔑] field_name [MergeRule] [ConflictShield] [REQ] type [×] [Right Port]

The conflict shield is clickable and opens a configuration panel:

┌─────────────────────────────────────┐
│ Conflict Response: ownership_pct    │
│                                     │
│ ○ Accept trusted source             │
│ ○ Flag for analyst review           │
│ ● Freeze & investigate              │
│                                     │
│ Investigation Agent: [MEBO ▾]       │
│ Threshold: [delta > 5%]             │
│ Priority: [High ▾]                  │
│ Scope: [Entity group ▾]             │
└─────────────────────────────────────┘

6. Conflict Detection & Response

Detection Flow

Source provides new value
        │
        ▼
┌─────────────────────────┐
│ Compare against current  │
│ ontology field value     │
│ AND other source values  │
├─────────────────────────┤
│ Same value?             │──── Yes ──▶ Accept silently
│                         │
│ Different value?        │──── ▼
└─────────────────────────┘
        │
        ▼
┌─────────────────────────┐
│ Apply merge strategy     │
│ to determine "winner"    │
├─────────────────────────┤
│ Winner == current?      │──── Yes ──▶ Log discrepancy, no change
│                         │
│ Winner != current?      │──── ▼ ──── Mutation proposed
└─────────────────────────┘
        │
        ▼
┌─────────────────────────┐
│ Apply conflict response  │
│ from field config        │
├─────────────────────────┤
│ accept_trusted          │──── Accept mutation, log audit
│ flag_review             │──── Accept mutation, create review task
│ freeze_investigate      │──── Reject mutation, freeze field, trigger investigation
└─────────────────────────┘
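The detection flow above can be condensed into a single decision function. This minimal Python sketch makes two simplifying assumptions. First, the caller has already applied the field's merge strategy to obtain `winner`. Second, the current value stands in for the last analyst-verified value. Outcome and handle_incoming are hypothetical names. The mutation status strings match the status values of the ontology_mutations table in §8.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass(frozen=True)
class Outcome:
    action: str                       # what the pipeline does next
    mutation_status: Optional[str]    # ontology_mutations.status; None if no mutation
    field_value: Any                  # ontology value after this step
    investigate: bool = False


def handle_incoming(current: Any, source_values: List[Any], winner: Any, response: str) -> Outcome:
    # Every source agrees with the current ontology value: accept silently.
    if all(v == current for v in source_values):
        return Outcome("accept_silently", None, current)
    # Sources disagree, but the merge strategy still picks the current value.
    if winner == current:
        return Outcome("log_discrepancy", None, current)
    # Winner differs from current: a mutation is proposed; apply the conflict response.
    if response == "accept_trusted":
        return Outcome("accept_and_audit", "auto_accepted", winner)
    if response == "flag_review":
        return Outcome("accept_and_create_review_task", "pending_review", winner)
    if response == "freeze_investigate":
        # The mutation is not applied; the field keeps its current value.
        return Outcome("freeze_and_investigate", "frozen", current, investigate=True)
    raise ValueError(f"unknown conflict response: {response}")
```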

Conflict Classification

Not all disagreements are equal. The system classifies conflicts by materiality:

Category	Examples	Typical Response
Formatting	"Bedrijf ABC B.V." vs "Bedrijf ABC BV"	Normalize, accept_trusted
Staleness	Employee count 50 (Jan) vs 52 (Mar)	Accept latest, log
Material	UBO count 2 vs 3, sanctions status change	freeze_investigate
Contradictory	Status "active" vs "dissolved"	freeze_investigate + critical priority

The compliance officer configures which category applies to each field via the conflict response setting. The system does not auto-classify — the officer makes the determination based on regulatory requirements and risk appetite.

Conflict Record

Every detected conflict produces an immutable ConflictRecord:

conflict_id: "cf-2026-0329-001"
entity_id: "ent-abc123"
entity_type: "Person"
field_name: "ownership_percentage"
schema_version: "nl-corporate-kyc-v2"

# What happened
current_value: 25.0
current_source: "mebo"
current_verified_at: "2026-02-15T10:30:00Z"
current_verified_by: "analyst:jvandijk"

incoming_value: 33.3
incoming_source: "northdata"
incoming_received_at: "2026-03-29T08:15:00Z"

# How it was handled
conflict_response: "freeze_investigate"
merge_strategy: "highest_trust"
merge_winner: "mebo"  # would have picked MEBO's value, but frozen instead

# Investigation (if triggered)
investigation_id: "inv-2026-0329-002"
investigation_agent: "mebo"
investigation_priority: "high"
investigation_status: "pending"

# Resolution (filled later)
resolved_at: null
resolved_by: null
resolved_value: null
resolution_justification: null

7. Conflict-Driven Investigation Triggers

Core Concept

When an ontology field is configured with freeze_investigate, a change in that field's value from a data source does not update the ontology. Instead, it:

Freezes the field at the last analyst-verified value
Creates a ConflictRecord
Triggers an investigation via the workflow engine (ADR-007)
The investigation agent re-examines the specific question
The agent's findings are presented to an analyst as evidence
The analyst resolves the conflict (accept new value, keep old value, or set a different value)
The resolution unfreezes the field
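The freeze-and-resolve steps can be illustrated with a minimal Python sketch built around a hypothetical FrozenField record. The three resolution choices mirror the analyst options listed above. Requiring a non-empty justification is an assumption, modeled on the resolution_justification field of the ConflictRecord.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FrozenField:
    field_name: str
    verified_value: Any               # last analyst-verified value, kept while frozen
    incoming_value: Any               # value that triggered the conflict
    is_frozen: bool = True
    resolved_value: Any = None
    resolved_by: Optional[str] = None
    justification: Optional[str] = None

    def resolve(self, decision: str, analyst: str, justification: str,
                other_value: Any = None) -> Any:
        """Apply the analyst's decision and unfreeze the field."""
        if not self.is_frozen:
            raise ValueError(f"{self.field_name} is not frozen")
        if not justification.strip():
            raise ValueError("a resolution justification is required")
        if decision == "accept_new":
            self.resolved_value = self.incoming_value
        elif decision == "keep_old":
            self.resolved_value = self.verified_value
        elif decision == "set_different":
            if other_value is None:
                raise ValueError("set_different needs a value")
            self.resolved_value = other_value
        else:
            raise ValueError(f"unknown decision: {decision}")
        self.resolved_by = f"analyst:{analyst}"
        self.justification = justification
        self.is_frozen = False            # the resolution unfreezes the field
        return self.resolved_value


field_state = FrozenField("ownership_percentage", verified_value=25.0, incoming_value=33.3)
```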

Integration with Scheduled Re-Investigation

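The conflict-driven investigation mechanism complements scheduled re-investigations. Fresh data may arrive from a scheduled re-check, an external webhook, or a manual trigger; in every case it passes through the same per-field conflict response rules: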

                    INITIAL KYC
                        │
                        ▼
              ┌─────────────────┐
              │ Populate ontology│
              │ from all sources │
              │ Auto-accept      │
              │ (baseline)       │
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ Analyst verifies │◀─────── First human review
              │ Marks fields as  │
              │ "verified"       │
              └────────┬────────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
          ▼            ▼            ▼
   Scheduled      External      Manual
   Re-check       Webhook       Trigger
   (e.g., 90d)    (source push) (analyst)
          │            │            │
          └────────────┼────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ Fresh data from  │
              │ sources          │
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ Compare against  │
              │ verified ontology│──── No conflicts ──▶ Log, done
              │                  │
              │ Conflicts found  │──── ▼
              └────────┬────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ Per-field conflict│
              │ response rules   │
              │ fire             │
              └────────┬────────┘
                       │
          ┌────────────┼────────────┐
          │            │            │
          ▼            ▼            ▼
     accept_      flag_        freeze_
     trusted      review       investigate
          │            │            │
          │            │            ▼
          │            │     ┌─────────────┐
          │            │     │ Launch       │
          │            │     │ targeted     │
          │            │     │ investigation│
          │            │     └──────┬──────┘
          │            │            │
          │            │            ▼
          │            │     ┌─────────────┐
          │            │     │ Agent        │
          │            │     │ findings →   │
          │            │     │ Analyst      │
          │            │     │ resolution   │
          │            │     └──────┬──────┘
          │            │            │
          └────────────┼────────────┘
                       │
                       ▼
              ┌─────────────────┐
              │ Updated ontology │
              │ Re-score via     │
              │ risk matrix      │
              └─────────────────┘

Investigation Scoping

When a conflict triggers an investigation, the scope can be configured:

Scope	Behavior	Example
`field_only`	Re-investigate just the conflicting field	Ownership percentage changed → MEBO re-verifies that specific ownership chain
`entity_group`	Re-investigate the entire entity group	Any Person field conflict → MEBO re-examines all UBO/management data
`full_entity`	Re-investigate the entire entity across all sources	Material conflict on a core identity field → full re-investigation
`related_entities`	Re-investigate the entity and all related entities	Sanctions status change → re-screen all related persons and entities

Threshold Expressions

Thresholds can be simple or compound:

# Simple: any change triggers investigation
threshold: "changed"

# Delta: only if the change exceeds a percentage
threshold: "delta > 10%"

# Boolean flip: only false→true (new risk appearing)
threshold: "false_to_true"

# Value change: specific value transitions
threshold: "value_not_in(active, dormant)"

# Compound: multiple conditions
threshold: "delta > 5% AND incoming_trust < 0.9"
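No formal grammar for thresholds is defined here, so the following Python sketch shows only one way the example expressions could be evaluated. It assumes these hypothetical semantics:

- delta is the relative change against the current value;
- value_not_in triggers when the incoming value falls outside the listed set;
- incoming_trust is the trust level of the incoming source;
- compound expressions support only AND.

```python
import operator
import re
from typing import Any

_COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def _delta_pct(old: float, new: float) -> float:
    """Relative change in percent; any change from zero counts as unbounded."""
    if old == 0:
        return 0.0 if new == 0 else float("inf")
    return abs(new - old) / abs(old) * 100.0


def _clause(clause: str, old: Any, new: Any, incoming_trust: float) -> bool:
    if clause == "changed":
        return old != new
    if clause == "false_to_true":
        return old is False and new is True
    m = re.fullmatch(r"delta\s*>\s*(\d+(?:\.\d+)?)%", clause)
    if m:
        return _delta_pct(float(old), float(new)) > float(m.group(1))
    m = re.fullmatch(r"value_not_in\((.*)\)", clause)
    if m:
        allowed = {v.strip() for v in m.group(1).split(",")}
        return str(new) not in allowed
    m = re.fullmatch(r"incoming_trust\s*(<=|>=|<|>)\s*(\d+(?:\.\d+)?)", clause)
    if m:
        return _COMPARE[m.group(1)](incoming_trust, float(m.group(2)))
    raise ValueError(f"unsupported threshold clause: {clause!r}")


def threshold_met(expr: str, old: Any, new: Any, incoming_trust: float) -> bool:
    """Evaluate a threshold; compound expressions are AND-joined clauses."""
    return all(_clause(c.strip(), old, new, incoming_trust) for c in expr.split(" AND "))
```

With the ownership example from §6, a move from 25.0% to 33.3% is a relative change of about 33%. It therefore passes `delta > 10%`, whereas an employee count moving from 50 to 52 (4%) does not.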

8. Mutation Queue & Curation Layer

Architecture

All data changes — from providers, agents, analysts, and conflict resolutions — flow through a unified mutation queue. The queue is subject-oriented, not entity-only: a mutation can target an entity field, a relationship field, or a collection member.

┌──────────────────────────────────────────────────────────────────┐
│                       MUTATION QUEUE                              │
│                                                                   │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│  │ Provider  │  │ Agent    │  │ Analyst  │  │ Conflict         │ │
│  │ Mutations │  │ Evidence │  │ Input    │  │ Resolutions      │ │
│  └─────┬────┘  └─────┬────┘  └─────┬────┘  └────────┬─────────┘ │
│        │             │             │                 │            │
│        └─────────────┼─────────────┼─────────────────┘            │
│                      │             │                              │
│                      ▼             ▼                              │
│              ┌────────────────────────┐                           │
│              │ Deterministic Evidence  │                           │
│              │ Mapper                  │                           │
│              │ (source schema →        │                           │
│              │  ontology mutations)    │                           │
│              └───────────┬────────────┘                           │
│                          │                                        │
│                          ▼                                        │
│              ┌────────────────────────┐                           │
│              │ Conflict Detection      │                           │
│              │ (compare against        │                           │
│              │  current ontology)      │                           │
│              └───────────┬────────────┘                           │
│                          │                                        │
│               ┌──────────┼──────────┐                             │
│               │          │          │                              │
│               ▼          ▼          ▼                              │
│          Auto-Accept  Review    Investigate                       │
│               │       Queue     Trigger                           │
│               │          │          │                              │
│               ▼          ▼          ▼                              │
│         ┌────────┐  ┌────────┐  ┌────────────┐                   │
│         │ Apply  │  │ Analyst│  │ Freeze +   │                   │
│         │ to     │  │ Inbox  │  │ Launch     │                   │
│         │ Entity │  │        │  │ Investigation│                  │
│         │ Graph  │  │        │  │            │                    │
│         └───┬────┘  └───┬────┘  └──────┬─────┘                   │
│             │           │              │                           │
│             └───────────┼──────────────┘                           │
│                         │                                         │
│                         ▼                                         │
│              ┌────────────────────────┐                           │
│              │ Scoring Projection      │                           │
│              │ (read accepted graph,   │                           │
│              │  compute risk scores)   │                           │
│              └────────────────────────┘                           │
└──────────────────────────────────────────────────────────────────┘

Mutation Record

Every proposed change is recorded as an immutable mutation. Mutations address a subject instance so the same mechanism works for:

entity attributes (LegalEntity.legal_name)
relationship attributes (BENEFICIAL_OWNER_OF.ownership_percentage)
collection members (trade_names["Acme Logistics"])

The runtime model must therefore identify both the subject kind and the stable path within that subject:

CREATE TABLE ontology_mutations (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id               UUID NOT NULL REFERENCES tenants(id),
    schema_version_id       UUID NOT NULL REFERENCES ontology_schema_versions(id),

    -- What object is being mutated
    subject_kind            TEXT NOT NULL,   -- 'entity', 'relationship', 'collection_member'
    entity_id               UUID,
    relationship_instance_id UUID,
    subject_type            TEXT NOT NULL,   -- entity_type or relationship_type
    field_path              TEXT NOT NULL,   -- 'legal_name', 'ownership_percentage', 'addresses[registered].city'
    cardinality_key         TEXT,            -- stable identity for repeated items where needed

    -- What changed
    previous_value          JSONB,
    proposed_value          JSONB NOT NULL,
    source_type             TEXT NOT NULL,  -- 'provider', 'agent', 'analyst', 'conflict_resolution'
    source_id               TEXT NOT NULL,  -- e.g., 'kvk', 'mebo', 'analyst:jvandijk'
    source_contract_version_id UUID,
    source_trust            NUMERIC(3,2),

    -- Conflict detection
    conflict_detected       BOOLEAN NOT NULL DEFAULT FALSE,
    conflict_id             UUID REFERENCES conflict_records(id),

    -- Resolution
    status                  TEXT NOT NULL DEFAULT 'proposed',
    -- 'proposed', 'auto_accepted', 'pending_review', 'accepted', 'rejected', 'frozen'
    resolved_at             TIMESTAMPTZ,
    resolved_by             TEXT,  -- 'system:auto_accept', 'system:freeze', 'analyst:jvandijk'

    -- Audit
    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    investigation_id        UUID,
    batch_id                UUID,  -- groups mutations from the same source refresh
    evidence_snapshot_id    UUID,

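    CHECK (status IN ('proposed', 'auto_accepted', 'pending_review', 'accepted', 'rejected', 'frozen')),
    CHECK (
        (subject_kind = 'entity' AND entity_id IS NOT NULL AND relationship_instance_id IS NULL) OR
        (subject_kind = 'relationship' AND relationship_instance_id IS NOT NULL) OR
        (subject_kind = 'collection_member' AND cardinality_key IS NOT NULL
            AND (entity_id IS NOT NULL OR relationship_instance_id IS NOT NULL))
    )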
);

CREATE INDEX idx_mutations_subject ON ontology_mutations(subject_kind, entity_id, relationship_instance_id, field_path);
CREATE INDEX idx_mutations_status ON ontology_mutations(status) WHERE status IN ('proposed', 'pending_review', 'frozen');
CREATE INDEX idx_mutations_conflict ON ontology_mutations(conflict_id) WHERE conflict_detected = TRUE;

Provenance Chain

Every accepted mutation updates the graph AND records field-level provenance against the same subject instance:

CREATE TABLE field_provenance (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    subject_kind            TEXT NOT NULL,   -- 'entity', 'relationship', 'collection_member'
    entity_id               UUID,
    relationship_instance_id UUID,
    subject_type            TEXT NOT NULL,
    field_path              TEXT NOT NULL,
    cardinality_key         TEXT,
    subject_key             TEXT NOT NULL,   -- normalized key for current-subject uniqueness
    current_value           JSONB NOT NULL,
    source_type             TEXT NOT NULL,
    source_id               TEXT NOT NULL,
    source_contract_version_id UUID,
    source_trust            NUMERIC(3,2),
    verified_by             TEXT,           -- analyst who verified this value
    verified_at             TIMESTAMPTZ,
    mutation_id             UUID NOT NULL REFERENCES ontology_mutations(id),
    schema_version_id       UUID NOT NULL,
    is_frozen               BOOLEAN NOT NULL DEFAULT FALSE,
    frozen_reason           TEXT,           -- 'conflict_investigation' etc.
    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(subject_key)
);
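The table relies on subject_key being a deterministic, normalized key, but its format is not specified here. The following minimal Python sketch assumes a hypothetical ent:/rel: prefix scheme and treats cardinality keys as case- and whitespace-insensitive. The per-kind rules follow the three subject kinds defined for mutations.

```python
from typing import Optional
from uuid import UUID


def subject_key(subject_kind: str, field_path: str,
                entity_id: Optional[UUID] = None,
                relationship_instance_id: Optional[UUID] = None,
                cardinality_key: Optional[str] = None) -> str:
    """Build a normalized key identifying one current field value (hypothetical format)."""
    if subject_kind == "entity":
        if entity_id is None:
            raise ValueError("entity subjects need entity_id")
        owner = f"ent:{entity_id}"
    elif subject_kind == "relationship":
        if relationship_instance_id is None:
            raise ValueError("relationship subjects need relationship_instance_id")
        owner = f"rel:{relationship_instance_id}"
    elif subject_kind == "collection_member":
        # Exactly one parent (entity or relationship) plus a cardinality key.
        if cardinality_key is None or (entity_id is None) == (relationship_instance_id is None):
            raise ValueError("collection members need a cardinality_key and exactly one parent")
        owner = f"ent:{entity_id}" if entity_id is not None else f"rel:{relationship_instance_id}"
    else:
        raise ValueError(f"unknown subject_kind: {subject_kind}")
    key = f"{owner}/{field_path.strip()}"
    if cardinality_key is not None:
        # Collapse whitespace and case so "Acme  Logistics" and "acme logistics" collide.
        key += "#" + " ".join(cardinality_key.split()).casefold()
    return key
```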

Relationship Instance Identity

Relationship data is not modeled as a loose edge plus attributes. Each accepted relationship has a stable relationship_instance_id derived from:

relationship type
source and target entity IDs
any schema-declared identity fields on the relationship

This lets Atlas distinguish:

a changed ownership percentage on the same beneficial-owner relationship
a relationship ending and a new one starting
two separate concurrent relationships of the same type with different effective periods

Without this identity model, conflict detection on ownership and control data becomes unreliable.
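One way to make relationship_instance_id deterministic is a name-based UUID (version 5) computed over the three identity inputs listed above. This minimal Python sketch assumes that effective_from is the schema-declared identity field and uses a hypothetical namespace UUID. `uuid.uuid5` is the standard-library function. Because ownership_percentage is not part of the identity, a changed percentage keeps the same ID, while a new effective period produces a new one.

```python
import json
import uuid
from typing import Mapping

# Hypothetical fixed namespace; it must never change once IDs have been issued.
RELATIONSHIP_NS = uuid.UUID("6f1c2b1e-0000-4000-8000-000000000000")


def relationship_instance_id(rel_type: str, source_entity: uuid.UUID,
                             target_entity: uuid.UUID,
                             identity_fields: Mapping[str, str]) -> uuid.UUID:
    """Derive a stable ID from type, endpoints and schema-declared identity fields."""
    # JSON encoding with sorted identity fields keeps the input unambiguous and order-independent.
    payload = json.dumps(
        [rel_type, str(source_entity), str(target_entity), sorted(identity_fields.items())],
        separators=(",", ":"),
    )
    return uuid.uuid5(RELATIONSHIP_NS, payload)


person, company = uuid.UUID(int=1), uuid.UUID(int=2)
current = relationship_instance_id("BENEFICIAL_OWNER_OF", person, company,
                                   {"effective_from": "2021-01-01"})
successor = relationship_instance_id("BENEFICIAL_OWNER_OF", person, company,
                                     {"effective_from": "2026-03-01"})
```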

9. Analyst as Data Source

Design Principle

The analyst is a first-class data source with their own typed schema, just like KVK or NorthData. The difference is that the analyst schema is per custom ontology — the compliance officer defines what an analyst can contribute when they design the ontology.

Analyst Schema Design

In the Schema Mapping Designer, the analyst appears in the data source library alongside registries and agents. When the compliance officer adds the analyst as a source, they define the analyst's output fields:

analyst_schema:
  schema_for: "nl-corporate-kyc-v2"
  entities:
    - entity: "Qualitative Assessment"
      fields:
        - name: risk_justification
          type: string
          description: "Free-text justification for risk assessment"
        - name: customer_interview_date
          type: date
          description: "Date of last customer interview"
        - name: interview_notes
          type: string
          description: "Summary of customer interview"
        - name: manual_risk_override
          type: enum
          description: "Analyst-assigned risk level that overrides the computed risk score"
          values: [low, medium, high, critical]
        - name: override_reason
          type: string
          description: "Justification for manual risk override"

    - entity: "Verification"
      fields:
        - name: ubo_verified
          type: boolean
          description: "Analyst has manually verified UBO structure"
        - name: address_verified
          type: boolean
          description: "Analyst has physically or digitally verified the address"
        - name: verification_method
          type: enum
          values: [site_visit, google_maps, phone_call, video_call, document_review]
        - name: verification_date
          type: date
        - name: verification_notes
          type: string
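Because the analyst schema is typed, analyst submissions can be checked before they enter the mutation queue. This minimal Python sketch assumes the schema has already been loaded into a dictionary (only the Verification entity is shown) and that dates arrive as ISO strings. validate_analyst_input is a hypothetical name.

```python
from datetime import date
from typing import Any, Dict, List

# Mirrors the "Verification" entity of the analyst schema above.
VERIFICATION_FIELDS: Dict[str, Dict[str, Any]] = {
    "ubo_verified": {"type": "boolean"},
    "address_verified": {"type": "boolean"},
    "verification_method": {
        "type": "enum",
        "values": ["site_visit", "google_maps", "phone_call", "video_call", "document_review"],
    },
    "verification_date": {"type": "date"},
    "verification_notes": {"type": "string"},
}


def validate_analyst_input(spec: Dict[str, Dict[str, Any]], submission: Dict[str, Any]) -> List[str]:
    """Return validation errors; an empty list means the submission matches the schema."""
    errors: List[str] = []
    for name, value in submission.items():
        field_spec = spec.get(name)
        if field_spec is None:
            errors.append(f"{name}: not defined in the analyst schema")
            continue
        kind = field_spec["type"]
        if kind == "boolean" and not isinstance(value, bool):
            errors.append(f"{name}: expected boolean")
        elif kind == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        elif kind == "enum" and value not in field_spec["values"]:
            errors.append(f"{name}: {value!r} not in {field_spec['values']}")
        elif kind == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                errors.append(f"{name}: expected ISO date (YYYY-MM-DD)")
    return errors
```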

Analyst Fields That No Automated Source Can Provide

Some ontology fields should only accept analyst input. These are fields where human judgment is required:

Qualitative risk assessments ("this customer's business model is unusually complex for their stated industry")
Interview-based findings ("customer could not explain their ownership structure")
Verification confirmations ("I visited the registered office and confirmed it's a real business")
Override justifications ("despite the automated high-risk score, I'm overriding to medium because...")

These fields use the manual_only merge strategy: values from automated sources are ignored, and the field is populated only by analyst input through the curation layer.

10. Live Data Preview

Purpose

During schema design, the compliance officer needs to validate that their mappings produce correct results with real data. The live preview pulls data from configured sources, runs it through the mapping pipeline, and shows the populated ontology.

Preview Flow

Officer clicks "Preview" in Schema Mapping Designer
          │
          ▼
┌──────────────────────────┐
│ Select test entity       │
│ (e.g., KVK# 12345678)    │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ Call active source APIs  │
│ with test entity ID      │
│ (KVK, NorthData, etc.)   │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ Execute source→ontology  │
│ mappings (client-side    │
│ or via preview API)      │
└─────────┬────────────────┘
          │
          ▼
┌──────────────────────────┐
│ Show populated ontology  │
│ with field-level detail: │
│                          │
│ legal_name: "Acme BV"    │
│   └─ KVK: "Acme BV"      │  ← source values
│   └─ ND: "Acme B.V."     │  ← shows conflict
│   └─ Winner: KVK (HT)    │  ← merge result
│   └─ Conflict: format    │  ← classification
│                          │
│ ownership_pct: FROZEN    │
│   └─ MEBO: 25.0%         │  ← source values
│   └─ ND: 33.3%           │  ← conflict
│   └─ Response: INVEST    │  ← would trigger
│                          │
│ is_sanctioned: false     │
│   └─ SPEPWS: false       │
│   └─ (no conflict)       │
└──────────────────────────┘

Preview Detail View

For each populated field, the preview shows:

Column	Description
Field	Ontology field name
Resolved Value	The value after applying merge strategy
Sources	Each source that provided a value, with the raw value and trust level
Merge Rule	Which strategy was applied
Conflict?	Whether sources disagreed
Response	What would happen if this were live (accept/flag/investigate)
Transform	Whether a transform was applied and what it did
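The sketch below shows how a single detail-view row could be assembled. It assumes a highest_trust merge rule and treats any source whose raw value differs from the resolved value as a conflict. SourceValue, PreviewRow, build_preview_row and the trust levels in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class SourceValue:
    source_id: str
    raw_value: Any
    trust_level: float   # hypothetical tenant trust level


@dataclass
class PreviewRow:
    field_name: str
    resolved_value: Any
    sources: list
    merge_rule: str
    conflict: bool
    response: str
    transform: Optional[str] = None


def build_preview_row(field_name: str, sources: list, conflict_response: str,
                      transform: Optional[str] = None) -> PreviewRow:
    """Assemble one detail-view row, assuming a highest_trust merge rule."""
    winner = max(sources, key=lambda s: s.trust_level)
    # A conflict is any source whose raw value differs from the resolved value.
    conflict = any(s.raw_value != winner.raw_value for s in sources)
    return PreviewRow(
        field_name=field_name,
        resolved_value=winner.raw_value,
        sources=list(sources),
        merge_rule="highest_trust",
        conflict=conflict,
        # With no conflict the value is simply accepted; otherwise the field's
        # configured conflict_response says what would happen in a live run.
        response=conflict_response if conflict else "accept",
        transform=transform,
    )


row = build_preview_row(
    "legal_name",
    [SourceValue("kvk", "Acme BV", 0.9), SourceValue("northdata", "Acme B.V.", 0.7)],
    conflict_response="accept_trusted",
)
# row.resolved_value == "Acme BV", row.conflict is True, row.response == "accept_trusted"
```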

Coverage Report

The preview also generates a coverage summary:

Schema Coverage for "Acme BV" (KVK 12345678)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Ontology fields:        36
Fields populated:       28 (78%)
Fields empty:            8 (22%)
Required fields OK:     12/14 (86%)
Required fields missing: 2 (jurisdiction, channel_type)

Source utilization:
  KVK:      12 fields mapped, 12 populated (100%)
  NorthData: 8 fields mapped, 7 populated (88%)
  SPEPWS:    4 fields mapped, 2 populated (50%)   ← low coverage
  MEBO:      3 fields mapped, 3 populated (100%)
  ROA:       5 fields mapped, 4 populated (80%)
  Analyst:   4 fields mapped, 0 populated (awaiting analyst input)

Conflicts detected: 3
  ownership_percentage: MEBO (25.0%) vs NorthData (33.3%) → FREEZE
  total_employees: KVK (50) vs NorthData (45) → ACCEPT_TRUSTED
  legal_name: KVK ("Acme BV") vs NorthData ("Acme B.V.") → ACCEPT_TRUSTED

Risk Matrix completeness:
  Customer Risk:      5/5 factors mapped (100%)
  Geographic Risk:    2/3 factors mapped (67%)   ← transaction_geography unmapped
  Product/Service:    2/2 factors mapped (100%)
  Delivery Channel:   1/2 factors mapped (50%)   ← digital_presence unmapped
  Transaction Risk:   0/2 factors mapped (0%)    ← no source for transaction data
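The coverage figures can be derived from per-field population status. The sketch below simplifies by attributing each field to a single mapped source, whereas the report above counts mappings per source. FieldStatus, pct and coverage_summary are hypothetical names. Percentages are rounded half up, which reproduces the sample figures: 28 of 36 fields gives 78%, 12 of 14 required fields gives 86%, and 7 of 8 NorthData fields gives 88%.

```python
from dataclasses import dataclass


@dataclass
class FieldStatus:
    name: str
    required: bool
    populated: bool
    source_id: str   # simplification: one mapped source per field


def pct(part: int, whole: int) -> int:
    """Whole-number percentage rounded half up (Python's round() rounds half to even)."""
    if whole == 0:
        return 0
    return (200 * part + whole) // (2 * whole)


def coverage_summary(fields: list) -> dict:
    total = len(fields)
    populated = sum(f.populated for f in fields)
    required = [f for f in fields if f.required]
    required_ok = sum(f.populated for f in required)
    per_source = {}  # source_id -> [mapped, populated]
    for f in fields:
        counts = per_source.setdefault(f.source_id, [0, 0])
        counts[0] += 1
        counts[1] += int(f.populated)
    return {
        "fields_populated_pct": pct(populated, total),
        "fields_empty_pct": pct(total - populated, total),
        "required_ok": f"{required_ok}/{len(required)}",
        "required_missing": [f.name for f in required if not f.populated],
        "source_utilization": {s: pct(p, m) for s, (m, p) in per_source.items()},
    }
```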

11. Schema Persistence & Versioning

Database Schema

-- Schema line: named, tenant-scoped identity
CREATE TABLE ontology_schema_lines (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    name            TEXT NOT NULL,
    description     TEXT,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by      TEXT NOT NULL,

    UNIQUE(tenant_id, name)
);

-- Schema version: immutable once published
CREATE TABLE ontology_schema_versions (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    schema_line_id      UUID NOT NULL REFERENCES ontology_schema_lines(id),
    version_number      INTEGER NOT NULL,
    status              TEXT NOT NULL DEFAULT 'draft',
    -- 'draft', 'evaluated', 'published', 'archived'

    -- Schema content (JSONB for flexibility during design, compiled to typed schema on publish)
    entity_groups       JSONB NOT NULL,      -- entity definitions with fields
    source_mappings     JSONB NOT NULL,      -- source→ontology field mappings
    matrix_mappings     JSONB NOT NULL,      -- ontology→matrix field mappings
    field_configs       JSONB NOT NULL,      -- per-field merge rules, conflict responses, investigation triggers
    active_sources      JSONB NOT NULL,      -- which tenant source adapters are active
    analyst_schema      JSONB,               -- analyst data source schema
    pinned_source_contracts JSONB NOT NULL,  -- exact source contract version IDs used by this schema

    -- Metadata
    risk_matrix_version UUID REFERENCES risk_matrix_versions(id),
    forked_from         UUID REFERENCES ontology_schema_versions(id),
    published_at        TIMESTAMPTZ,
    published_by        TEXT,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(schema_line_id, version_number),
    CHECK (status IN ('draft', 'evaluated', 'published', 'archived'))
);

-- Enforce: at most one published version per schema line
CREATE UNIQUE INDEX uq_one_published_per_line
    ON ontology_schema_versions(schema_line_id)
    WHERE status = 'published';

-- Source mappings (denormalized for query performance, source of truth is JSONB above)
CREATE TABLE ontology_source_mappings (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    schema_version_id   UUID NOT NULL REFERENCES ontology_schema_versions(id),
    source_id           TEXT NOT NULL,
    source_entity       TEXT NOT NULL,
    source_field        TEXT NOT NULL,
    target_entity       TEXT NOT NULL,
    target_field        TEXT NOT NULL,
    transform           TEXT,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- Field conflict configuration
CREATE TABLE ontology_field_configs (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    schema_version_id   UUID NOT NULL REFERENCES ontology_schema_versions(id),
    entity_type         TEXT NOT NULL,
    field_name          TEXT NOT NULL,

    -- Identity
    is_identity_key     BOOLEAN NOT NULL DEFAULT FALSE,
    is_required         BOOLEAN NOT NULL DEFAULT FALSE,

    -- Merge
    merge_strategy      TEXT NOT NULL DEFAULT 'first_available',
    -- 'first_available', 'highest_trust', 'latest', 'manual_only', 'any_true', 'count_distinct', 'accumulate'

    -- Conflict
    conflict_response   TEXT NOT NULL DEFAULT 'accept_trusted',
    -- 'accept_trusted', 'flag_review', 'freeze_investigate'

    -- Investigation trigger (only if conflict_response = 'freeze_investigate')
    investigation_agent TEXT,
    investigation_threshold TEXT,   -- threshold expression
    investigation_priority TEXT DEFAULT 'medium',
    investigation_scope TEXT DEFAULT 'field_only',
    -- 'field_only', 'entity_group', 'full_entity', 'related_entities'

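    UNIQUE(schema_version_id, entity_type, field_name),
    CHECK (merge_strategy IN ('first_available', 'highest_trust', 'latest', 'manual_only', 'any_true', 'count_distinct', 'accumulate')),
    CHECK (conflict_response IN ('accept_trusted', 'flag_review', 'freeze_investigate')),
    CHECK (investigation_scope IN ('field_only', 'entity_group', 'full_entity', 'related_entities')),
    -- An investigation agent is only meaningful for freeze_investigate
    CHECK (conflict_response = 'freeze_investigate' OR investigation_agent IS NULL)
);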

Published ontology schemas are immutable compilations. On publish, Atlas stores the fully resolved ontology schema plus the exact source contract versions and matrix version it was validated against. If a provider later ships a new contract version, existing published ontology schemas are unaffected until a new ontology version is created and evaluated against that newer contract.
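The following Python sketch shows the status lifecycle together with the one-published-version rule. It makes three assumptions. The path runs forward only (draft → evaluated → published → archived). Only drafts are editable. Publishing a new version archives the previously published one. That last point is how this sketch satisfies uq_one_published_per_line; in the database, both updates would need to happen in one transaction.

```python
from dataclasses import dataclass, field
from typing import Optional

# Forward-only lifecycle assumed from the status values in ontology_schema_versions.
ALLOWED_TRANSITIONS = {
    "draft": {"evaluated"},
    "evaluated": {"published"},
    "published": {"archived"},
    "archived": set(),
}


@dataclass
class SchemaVersion:
    version_number: int
    content: dict
    status: str = "draft"

    def edit(self, new_content: dict) -> None:
        # Assumption: only drafts are editable; anything else must be forked.
        if self.status != "draft":
            raise PermissionError(f"v{self.version_number} is {self.status}; fork a new draft")
        self.content = dict(new_content)


@dataclass
class SchemaLine:
    name: str
    versions: dict = field(default_factory=dict)

    def new_draft(self, content: dict) -> SchemaVersion:
        number = max(self.versions, default=0) + 1
        version = SchemaVersion(number, dict(content))
        self.versions[number] = version
        return version

    def transition(self, number: int, target: str) -> None:
        version = self.versions[number]
        if target not in ALLOWED_TRANSITIONS[version.status]:
            raise ValueError(f"v{number}: {version.status} -> {target} not allowed")
        if target == "published":
            # Mirrors uq_one_published_per_line: retire the current published version.
            for other in self.versions.values():
                if other.status == "published":
                    other.status = "archived"
        version.status = target

    def published(self) -> Optional[SchemaVersion]:
        return next((v for v in self.versions.values() if v.status == "published"), None)
```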

12. Evaluations

Purpose

Before publishing a schema, the compliance officer runs evaluations to validate coverage, correctness, and scoring completeness. An evaluation runs the schema against a set of test entities and produces a quality report.

Evaluations have two modes:

snapshot mode (default) — uses frozen evidence snapshots and pinned source contract versions; this is the authoritative publish gate
live preview mode — pulls fresh data from active sources; this is exploratory and explicitly non-deterministic

Evaluation Types

Type	What It Tests	Pass Criteria
Coverage	How many ontology fields get populated for each test entity	≥ threshold (e.g., 80% of required fields)
Source Agreement	How often sources agree vs conflict across the test set	Conflict rate below threshold
Transform Correctness	Whether transforms produce valid output	No transform errors
Matrix Completeness	Whether every required risk factor receives a value	All required factors mapped and populated
Scoring Consistency	Whether the same entity produces the same risk score across runs in snapshot mode	Deterministic (0 variance)
Regression	Whether the new schema version produces different scores than the previous version for the same entities	Diff report showing score changes

Evaluation Workflow

Officer selects a test set (list of entity IDs or a saved test suite)
System resolves the exact source contract versions pinned by the draft schema
In snapshot mode, system loads frozen evidence snapshots for the test entities; in live preview mode, it pulls fresh data and creates temporary snapshots
System runs data through the schema (mappings, merge rules, conflict detection)
System populates the ontology and computes risk scores
System generates the evaluation report, including contract versions and snapshot IDs
If all required snapshot-mode criteria pass, schema status moves to "evaluated"
Officer reviews report and decides to publish or iterate
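The gating rule in the workflow above can be written as a pure function over an evaluation_runs row: only snapshot-mode runs that meet every criterion move a draft to "evaluated". The thresholds (80% coverage, conflict rate under 10%) are hypothetical placeholders for tenant configuration. The Regression evaluation is omitted because it produces a diff report for the officer rather than a pass/fail result.

```python
from dataclasses import dataclass


@dataclass
class EvaluationRun:
    evaluation_mode: str        # 'snapshot' or 'live_preview'
    coverage_score: float       # % of required fields populated
    conflict_rate: float        # % of compared fields where sources disagreed
    matrix_completeness: float  # % of required risk factors mapped and populated
    transform_errors: int
    scoring_variance: float     # score variance across repeated snapshot runs


# Hypothetical thresholds; real values would be tenant configuration.
MIN_COVERAGE = 80.0
MAX_CONFLICT_RATE = 10.0


def gate_failures(run: EvaluationRun) -> list:
    """Return the reasons a run cannot move a draft to 'evaluated' (empty = pass)."""
    failures = []
    if run.evaluation_mode != "snapshot":
        failures.append("live_preview runs are exploratory and never gate publishing")
    if run.coverage_score < MIN_COVERAGE:
        failures.append(f"coverage {run.coverage_score}% below {MIN_COVERAGE}%")
    if run.conflict_rate >= MAX_CONFLICT_RATE:
        failures.append(f"conflict rate {run.conflict_rate}% not below {MAX_CONFLICT_RATE}%")
    if run.transform_errors > 0:
        failures.append(f"{run.transform_errors} transform error(s)")
    if run.matrix_completeness < 100.0:
        failures.append("required risk factors not all mapped and populated")
    if run.scoring_variance != 0:
        failures.append("scores are not deterministic across snapshot runs")
    return failures


def next_status(current: str, run: EvaluationRun) -> str:
    return "evaluated" if current == "draft" and not gate_failures(run) else current
```

Because gate_failures returns every reason, the evaluation report can list all failing criteria at once rather than stopping at the first.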

Test Suites

Test suites are tenant-scoped, reusable collections of test entities:

CREATE TABLE evaluation_test_suites (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id       UUID NOT NULL REFERENCES tenants(id),
    name            TEXT NOT NULL,
    description     TEXT,
    entity_ids      UUID[] NOT NULL,   -- list of entity IDs to test
    created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by      TEXT NOT NULL
);

CREATE TABLE evaluation_runs (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    schema_version_id   UUID NOT NULL REFERENCES ontology_schema_versions(id),
    test_suite_id       UUID REFERENCES evaluation_test_suites(id),
    evaluation_mode     TEXT NOT NULL DEFAULT 'snapshot',
    -- 'snapshot', 'live_preview'
    source_contract_set JSONB NOT NULL,
    evidence_snapshot_ids UUID[] NOT NULL DEFAULT '{}',
    status              TEXT NOT NULL DEFAULT 'running',
    -- 'running', 'completed', 'failed'

    -- Results
    coverage_score      NUMERIC(5,2),
    conflict_rate       NUMERIC(5,2),
    matrix_completeness NUMERIC(5,2),
    transform_errors    INTEGER DEFAULT 0,
    scoring_variance    NUMERIC(5,4),

    report              JSONB,          -- full detailed report
    started_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    completed_at        TIMESTAMPTZ,
    run_by              TEXT NOT NULL
);

13. Data Source Schema Management

Every Data Source Has an Ontology Schema

A core principle of the three-schema model is that every data source — whether a government registry, AI agent, screening database, file import, or manual analyst — must declare an output schema. This schema defines the entities and fields that the source can produce. Without a declared schema, a source cannot participate in the mapping pipeline.

The schema is stored as a versioned source contract. Contracts are immutable once published. A source adapter may be reconfigured per tenant without changing the contract version it emits against.
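Because a published contract is immutable, a source payload can always be checked against the exact contract version an adapter is pinned to. The sketch below validates a payload against an output_schema shaped like the plugin manifest in section 15. KVK_CONTRACT, type_ok and validate_payload are hypothetical. The type vocabulary (string, boolean, number, enum, date) is taken from the field types used in this ADR's examples.

```python
from datetime import date

# Hypothetical published contract, shaped like the plugin manifest's output_schema.
KVK_CONTRACT = {
    "entities": [
        {"entity": "Registration", "fields": [
            {"name": "kvkNummer", "type": "string", "required": True},
            {"name": "statutaireNaam", "type": "string", "required": True},
        ]},
    ],
}


def type_ok(spec: dict, value) -> bool:
    kind = spec["type"]
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "enum":
        return value in spec.get("values", [])
    if kind == "date":
        try:
            date.fromisoformat(value)
            return True
        except (TypeError, ValueError):
            return False
    return False  # unknown type: reject rather than guess


def validate_payload(contract: dict, payload: dict) -> list:
    """Check one source payload {entity: {field: value}} against a contract version."""
    errors = []
    declared = {e["entity"]: {f["name"]: f for f in e["fields"]} for e in contract["entities"]}
    for entity, record in payload.items():
        specs = declared.get(entity)
        if specs is None:
            errors.append(f"undeclared entity: {entity}")
            continue
        errors += [f"{entity}.{name}: not in contract" for name in record if name not in specs]
        for name, spec in specs.items():
            value = record.get(name)
            if value is None:
                if spec.get("required"):
                    errors.append(f"{entity}.{name}: required field missing")
            elif not type_ok(spec, value):
                errors.append(f"{entity}.{name}: expected {spec['type']}")
    return errors
```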

Managing Source Schemas in Compliance Studio

The Compliance Studio provides a dedicated Data Sources section where compliance officers and administrators can:

Browse the data source library — all installed data provider plugins (ADR-009), organized by category (Registry, Investigation, Screening, Analyst)
View published source contracts — inspect the output schema of any data source: which entities it produces, which fields within each entity, field types, and descriptions
Configure source adapters — per-tenant trust levels, active/inactive status, API credentials (masked), rate limits, and sandbox vs production mode
Create custom analyst schemas — design the analyst data source schema for a specific ontology (what fields can an analyst contribute)
Register new tenant-owned data sources — add database connections, file upload schemas, webhook endpoints, or custom API integrations as new sources with declared output schemas
Health monitoring — view source health status, last successful call, error rates, and latency metrics

Source Schema Registry

Source management is split into three stores:

Source definitions — stable source identities such as kvk, northdata, analyst
Source contract versions — immutable published output schemas
Tenant source adapters — per-tenant connection, trust, and activation settings

This separation ensures:

provider contract evolution is explicit and auditable
tenant operational config can change without invalidating published ontology schemas
evaluations and live runs can be replayed against the exact contract version used at the time

CREATE TABLE data_source_definitions (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_id           TEXT NOT NULL UNIQUE,   -- 'kvk', 'northdata', 'analyst'
    source_type         TEXT NOT NULL,          -- 'rest_api', 'database', 'file', 'webhook', 'agent', 'manual'
    category            TEXT NOT NULL,          -- 'registry', 'investigation', 'screening', 'analyst'
    display_name        TEXT NOT NULL,
    description         TEXT
);

CREATE TABLE data_source_contract_versions (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    source_definition_id UUID NOT NULL REFERENCES data_source_definitions(id),
    version_number      INTEGER NOT NULL,
    status              TEXT NOT NULL DEFAULT 'draft',
    -- 'draft', 'published', 'archived'
    output_schema       JSONB NOT NULL,
    capabilities        TEXT[] NOT NULL DEFAULT '{}',
    published_at        TIMESTAMPTZ,
    published_by        TEXT,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(source_definition_id, version_number)
);

CREATE UNIQUE INDEX uq_one_published_contract_per_source
    ON data_source_contract_versions(source_definition_id)
    WHERE status = 'published';

CREATE TABLE tenant_source_adapters (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id           UUID NOT NULL REFERENCES tenants(id),
    source_definition_id UUID NOT NULL REFERENCES data_source_definitions(id),
    pinned_contract_version_id UUID NOT NULL REFERENCES data_source_contract_versions(id),
    trust_level         NUMERIC(3,2) NOT NULL DEFAULT 0.5,
    connection_config   JSONB,
    is_active           BOOLEAN NOT NULL DEFAULT TRUE,
    health_status       TEXT DEFAULT 'unknown',
    last_health_check   TIMESTAMPTZ,
    created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    UNIQUE(tenant_id, source_definition_id)
);

The analyst source is a special case:

the analyst runtime adapter is tenant-owned
the analyst contract is generated from the published ontology schema and versioned alongside it
analyst input therefore remains typed, auditable, and replayable like any other source
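The sketch below shows how the analyst contract might be generated from a published schema version. It assumes three things. First, the entity_groups JSONB holds a list of {entity, fields} objects. Second, every manual_only field is included, along with any field carrying a hypothetical analyst_input flag. Third, the contract's version number tracks the schema version. The entity name in the test data is likewise hypothetical.

```python
def generate_analyst_contract(schema_version: dict) -> dict:
    """Derive the analyst source contract from a published ontology schema version."""
    entities = {}
    for group in schema_version["entity_groups"]:
        for spec in group["fields"]:
            # manual_only fields are analyst-only by definition; 'analyst_input' is a
            # hypothetical opt-in flag for other fields an analyst may contribute.
            if spec.get("merge_strategy") == "manual_only" or spec.get("analyst_input"):
                entities.setdefault(group["entity"], []).append(
                    {k: spec[k] for k in ("name", "type", "values") if k in spec}
                )
    return {
        "source_id": "analyst",
        # Assumption: the contract version tracks the schema version it was generated from.
        "version_number": schema_version["version_number"],
        "entities": [{"entity": name, "fields": fields} for name, fields in entities.items()],
    }
```

The output uses the same entities/fields shape as other source contracts, so analyst input can be validated and replayed through the same machinery.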

Linking Data Mappings to Risk Matrices

A published data mapping (custom ontology schema) can be linked to one or more risk matrix versions. This link establishes the full pipeline: sources → ontology → matrix → scores.

CREATE TABLE schema_matrix_links (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    schema_version_id       UUID NOT NULL REFERENCES ontology_schema_versions(id),
    risk_matrix_version_id  UUID NOT NULL REFERENCES risk_matrix_versions(id),
    is_primary              BOOLEAN NOT NULL DEFAULT FALSE,  -- primary link used for scoring
    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    created_by              TEXT NOT NULL,

    UNIQUE(schema_version_id, risk_matrix_version_id)
);

-- At most one primary link per schema version
CREATE UNIQUE INDEX uq_primary_matrix_link
    ON schema_matrix_links(schema_version_id)
    WHERE is_primary = TRUE;

This means:

A "NL Corporate KYC v2" schema can be linked to "EBA Standard Matrix v3" for standard scoring
The same schema could also be linked to "Enhanced DD Matrix v1" for high-risk customers
The Schema Mapping Designer shows the linked matrix in the right column
When switching the linked matrix, the right column updates to show the new matrix's factors and required mappings
Evaluations run against a specific schema + matrix combination

14. Compliance Studio Organization

The Compliance Studio is organized around the three layers of the pipeline, from data sources through ontology design to risk scoring. Each section provides the tools for one layer.

Compliance Studio
├── Data Sources                  ← Layer 1: Where data comes from
│   ├── Source Library            ← Browse/manage installed sources
│   ├── Source Schemas            ← View published source contracts
│   └── Health Monitor            ← Source connectivity & health
│
├── Schema Designer               ← Layer 2: How data is unified
│   ├── Data Mappings             ← Create/edit custom ontology schemas + mappings
│   └── Schema Versions           ← Version history, publish, fork
│
├── Risk Matrices                 ← Layer 3: How risk is scored
│   ├── Matrix Editor             ← Configure EBA dimensions & factors
│   └── Risk Categories           ← Reference data (country lists, industry codes, etc.)
│
├── Evaluations                   ← Validation: Does it all work?
│   ├── Schema Evaluations        ← Coverage, conflict, completeness reports
│   └── Scoring Evaluations       ← End-to-end scoring validation
│
└── Workflows                     ← Orchestration: When does it run?
    ├── Workflow Schemas          ← KYC workflow definitions
    └── Investigation Templates   ← Conflict-triggered investigation configs

Menu Item	Icon	Path	Purpose
Source Library	`database`	`/studio/sources`	Browse all available data sources, activate/deactivate per tenant, configure credentials and trust levels
Source Schemas	`diagram-tree`	`/studio/sources/schemas`	View published source contracts, inspect version history, create custom analyst contracts, and register tenant-owned sources.
Health Monitor	`pulse`	`/studio/sources/health`	Real-time dashboard of source connectivity, error rates, latency, and last successful data pull
Data Mappings	`data-connection`	`/studio/mappings`	The Schema Mapping Designer — the visual three-column builder where you wire sources → ontology → matrix
Schema Versions	`git-branch`	`/studio/mappings/versions`	List all schema versions per line, their status (draft/evaluated/published/archived), diff between versions
Matrix Editor	`heat-grid`	`/studio/matrices`	Configure EBA risk dimensions, factors, weights, aggregation functions. Link matrices to data mappings.
Risk Categories	`th-derived`	`/studio/risk-categories`	Manage reference data: high-risk country lists, industry risk classifications, PEP categories, etc.
Schema Evaluations	`lab-test`	`/studio/evaluations/schema`	Run and review schema evaluations: coverage, conflict rates, source utilization
Scoring Evaluations	`timeline-bar-chart`	`/studio/evaluations/scoring`	Run and review end-to-end scoring: schema + matrix → risk scores for test entities
Workflow Schemas	`flows`	`/studio/workflows`	Define KYC onboarding workflows, re-investigation schedules, monitoring frequencies
Investigation Templates	`search`	`/studio/workflows/investigations`	Configure conflict-driven investigation templates: which agent, what threshold, what scope

Design Principle: Pipeline Visibility

The Studio menu follows the data pipeline left-to-right: Sources → Ontology → Matrix → Evaluation → Workflow. A compliance officer designing a new KYC program would naturally work through the menu top-to-bottom:

Configure their data sources (which registries, agents, and screening databases to use)
Design their ontology (what entity model captures their compliance requirements)
Link a risk matrix (how ontology fields map to EBA risk factors)
Evaluate (does the pipeline produce correct, complete results)
Deploy via workflow (when and how the pipeline runs for real investigations)

15. Future Data Source Types

Extensibility Model

The data provider plugin architecture (ADR-009) already supports adding new sources via plugin manifests. This ADR extends that model to support diverse data source types beyond REST APIs.

Source Type	Connection	Schema Declaration	Example
REST API	URL + auth headers + rate limits	Response JSON schema → output fields	KVK, NorthData
Database	Connection string + query template	Result set columns → output fields	Internal CRM, AML database
File Upload	Upload endpoint + format spec	Parsed columns/fields → output fields	CSV watchlists, Excel reports
Webhook	Incoming webhook URL + payload schema	Payload fields → output fields	Real-time screening alerts
AI Agent	Agent configuration + prompt template	Agent output schema → output fields	MEBO, AMLRR, ROA
Manual Entry	Form schema in Compliance Studio	Form fields → output fields	Analyst input
Aggregator	Combines multiple sub-sources	Union of sub-source schemas	NorthData (aggregates multiple registries)

Plugin Manifest Extension

# Extended plugin manifest for ADR-010 compatibility
plugin:
  id: "kvk-nl"
  version: "1.0.0"
  type: "rest_api"           # NEW: source type

  connection:                 # NEW: connection configuration
    type: "rest_api"
    base_url: "https://api.kvk.nl/api"
    auth:
      type: "api_key"
      header: "apikey"
    rate_limit:
      requests_per_minute: 60
    sandbox:
      base_url: "https://api.kvk.nl/test/api"

  output_schema:              # Replaces mapping_spec.yaml
    entities:
      - entity: "Registration"
        fields:
          - name: kvkNummer
            type: string
            required: true
          - name: statutaireNaam
            type: string
            required: true
          # ...

  health_check:
    endpoint: "/v2/zoeken?kvkNummer=00000000"
    expected_status: 200

16. Relationship Modeling

Gap in Current Design

The current schema models entities (LegalEntity, Person, Address) but not the relationships between them. "Person X is UBO of LegalEntity Y" is a relationship with its own attributes (ownership percentage, control type, effective date).

Relationship Types in Custom Ontology

The custom ontology schema supports relationship type definitions alongside entity types:

relationship_types:
  - type: "BENEFICIAL_OWNER_OF"
    source_entity: "Person"
    target_entity: "LegalEntity"
    fields:
      - name: ownership_percentage
        type: number
        conflict_response: freeze_investigate
        investigation_agent: mebo
      - name: control_type
        type: enum
        values: [direct, indirect, nominee]
      - name: effective_date
        type: date
      - name: verified_by
        type: string
        merge_strategy: manual_only

  - type: "REGISTERED_AT"
    source_entity: "LegalEntity"
    target_entity: "Address"
    fields:
      - name: address_type
        type: enum
        values: [registered, operational, postal]
      - name: effective_date
        type: date

  - type: "RELATED_TO"
    source_entity: "LegalEntity"
    target_entity: "LegalEntity"
    fields:
      - name: relationship_type
        type: enum
        values: [parent, subsidiary, sister, joint_venture]
      - name: ownership_percentage
        type: number

Relationship fields follow the same merge strategy, conflict response, and investigation trigger model as entity fields, but runtime processing treats each relationship as a first-class subject with its own identity, provenance, and lifecycle.

Relationship Runtime Model

At publish time, every relationship type must declare either:

an explicit identity key for matching repeated relationships, or
a rule instructing Atlas to derive identity from the tuple (source_entity_id, target_entity_id, relationship_type) plus any declared identity fields

At runtime:

Incoming relationship evidence is matched to an existing relationship_instance_id
If matched, field-level mutations are evaluated on that relationship instance
If unmatched, a new relationship instance is proposed
If a previously current relationship is no longer observed, Atlas records an end event rather than deleting it

This aligns relationship handling with ADR-009's is_current and event-sourcing requirements and avoids collapsing "changed edge attributes" into "different edge exists".
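The runtime steps above can be sketched as a reconciliation pass. It derives an identity key for each piece of relationship evidence and then decides whether to update an existing instance, propose a new one, or end one that is no longer observed. The action names and data shapes are hypothetical. The sketch also assumes each run reports the complete set of relationships for the subject; that assumption is what allows a missing relationship to be ended rather than deleted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipEvidence:
    relationship_type: str          # e.g. "BENEFICIAL_OWNER_OF"
    source_entity_id: str
    target_entity_id: str
    fields: tuple = ()              # (name, value) pairs, e.g. (("ownership_percentage", 25.0),)


def identity_key(ev: RelationshipEvidence, identity_fields: tuple = ()) -> tuple:
    """Derived identity: both endpoints, the type, and any declared identity fields."""
    attrs = dict(ev.fields)
    return (ev.source_entity_id, ev.target_entity_id, ev.relationship_type,
            tuple(attrs.get(name) for name in identity_fields))


def reconcile(current: dict, observed: list, identity_fields: tuple = ()) -> list:
    """current: identity key -> relationship_instance_id for currently open instances.

    Assumes `observed` is the complete set of relationships reported in this run,
    so an unobserved current instance is ended rather than deleted.
    """
    actions, seen = [], set()
    for ev in observed:
        key = identity_key(ev, identity_fields)
        seen.add(key)
        if key in current:
            # Same edge: changed attributes become field-level mutations.
            actions.append(("evaluate_mutations", current[key], ev))
        else:
            actions.append(("propose_instance", None, ev))
    for key, instance_id in current.items():
        if key not in seen:
            actions.append(("record_end_event", instance_id, None))
    return actions
```

Attributes such as ownership_percentage are not part of the derived key. A changed percentage therefore yields a mutation on the same instance instead of a new edge.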

17. Schema Composition & Inheritance

Use Case

A compliance team needs multiple schema variants that share a common base:

NL Corporate KYC — Simplified (low-risk customers, fewer fields)
NL Corporate KYC — Standard (standard due diligence)
NL Corporate KYC — Enhanced (high-risk, additional fields and stricter conflict responses)

Inheritance Model

A schema can declare a parent. It inherits all entity groups, fields, mappings, and configurations from the parent, and can:

Add new entity groups or fields
Override field configurations (e.g., change conflict_response from accept_trusted to freeze_investigate)
Remove fields that are not needed (mark as excluded)

A child cannot change field types or identity keys; those are fixed by the parent.

ALTER TABLE ontology_schema_versions
    ADD COLUMN parent_version_id UUID REFERENCES ontology_schema_versions(id),
    ADD COLUMN inheritance_overrides JSONB;  -- field-level overrides on parent

The effective schema is computed by merging the parent's definitions with the child's overrides. This is resolved at publish time — the published version contains the fully resolved schema, not a reference to the parent.
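Publish-time inheritance resolution can be sketched as a deep copy of the parent, followed by adds, overrides, and exclusions. The shape of inheritance_overrides used here is hypothetical, as are the field names in the test data. The sketch rejects overrides that change type or is_identity_key. It additionally assumes that identity-key fields cannot be excluded.

```python
import copy

FIXED_BY_PARENT = ("type", "is_identity_key")


def resolve_effective_schema(parent: dict, overrides: dict) -> dict:
    """parent: {entity: {field: config}}; overrides: hypothetical inheritance_overrides shape
    {"add": {...}, "override": {...}, "exclude": [(entity, field), ...]}."""
    effective = copy.deepcopy(parent)          # never mutate the parent version
    for entity, fields in overrides.get("add", {}).items():
        target = effective.setdefault(entity, {})
        for name, cfg in fields.items():
            if name in target:
                raise ValueError(f"{entity}.{name} exists in parent; use 'override'")
            target[name] = dict(cfg)
    for entity, fields in overrides.get("override", {}).items():
        for name, patch in fields.items():
            base = effective.get(entity, {}).get(name)
            if base is None:
                raise ValueError(f"{entity}.{name}: nothing to override")
            for key in FIXED_BY_PARENT:
                if key in patch and patch[key] != base.get(key):
                    raise ValueError(f"{entity}.{name}: '{key}' is fixed by the parent")
            base.update(patch)
    for entity, name in overrides.get("exclude", []):
        fields = effective.get(entity, {})
        # Assumption: identity keys cannot be excluded, since the parent fixes them.
        if fields.get(name, {}).get("is_identity_key"):
            raise ValueError(f"{entity}.{name}: identity keys cannot be excluded")
        fields.pop(name, None)
    return effective
```

The returned dictionary is the fully resolved schema that a published version would store. No reference to the parent survives.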

18. Scoring Projection Integration

Compile-Time Validation

When a schema version is published, the system validates that the risk matrix schema's required factors are satisfiable:

For each required factor in the risk matrix:
Identify the ontology field(s) it reads (from matrix mappings)
Verify those fields exist in the custom ontology
Verify at least one source mapping feeds those fields
Verify the reduction strategy is compatible with the field type
If any check fails → block publish with a clear error message
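The checks above can be sketched as a function that returns every failure rather than stopping at the first, so the publish error can list them all. MatrixFactor and the reduction-to-type compatibility table are hypothetical; this ADR does not define the reduction strategy names.

```python
from dataclasses import dataclass

# Hypothetical compatibility between reduction strategies and ontology field types.
REDUCTION_TYPES = {
    "max": {"number"},
    "min": {"number"},
    "any_true": {"boolean"},
    "lookup": {"string", "enum"},
    "count": {"number", "string", "enum", "boolean", "date"},
}


@dataclass
class MatrixFactor:
    name: str
    required: bool
    reads: list        # ontology field paths, e.g. "LegalEntity.country"
    reduction: str


def validate_publish(factors: list, ontology_fields: dict, source_mappings: list) -> list:
    """ontology_fields: path -> type; source_mappings: (source_id, target_path) pairs.

    An empty result means the required factors are satisfiable and publish may proceed.
    """
    fed = {target for _, target in source_mappings}
    errors = []
    for f in factors:
        if not f.required:
            continue
        if not f.reads:
            errors.append(f"{f.name}: no ontology field mapped")
        for path in f.reads:
            if path not in ontology_fields:
                errors.append(f"{f.name}: reads {path}, which is not in the ontology")
                continue
            if path not in fed:
                errors.append(f"{f.name}: no source mapping feeds {path}")
            if ontology_fields[path] not in REDUCTION_TYPES.get(f.reduction, set()):
                errors.append(f"{f.name}: reduction '{f.reduction}' is incompatible "
                              f"with {ontology_fields[path]} field {path}")
    return errors
```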

Runtime Scoring Flow

Published Schema v2
          │
          ▼
┌───────────────────────────┐
│ Scoring Projection        │
│                           │
│ Reads: accepted graph     │
│ snapshot (entities +      │
│ relationships, excluding  │
│ frozen subjects)          │
│                           │
│ For each matrix factor:   │
│   - Find ontology field   │
│   - Apply reduction       │
│   - Handle missing values │
│   - Compute factor score  │
│                           │
│ Aggregate dimension       │
│ scores → overall risk     │
└─────────┬─────────────────┘
          │
          ▼
   Risk Score + Audit Record
  (pinned to schema v2 +
   matrix version +
   source contract set +
   evidence snapshot)

Frozen fields are excluded from scoring by default. A frozen field means its value is under investigation and cannot be relied upon. The scoring projection uses the missing_strategy for that factor (typically flag_for_review) and the evaluation is marked as "provisional — pending investigation."
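The sketch below shows how a scoring projection might skip frozen fields, fall back to the factor's missing_strategy, and mark the result provisional. Only flag_for_review comes from this ADR. worst_case is a hypothetical alternative included to show where other missing strategies would plug in, and the factor table shape is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FieldState:
    value: object
    frozen: bool = False   # True while the field is under investigation


def project_scores(factors: dict, state: dict, max_score: int = 3):
    """factors: name -> (field_path, score_fn, missing_strategy); state: path -> FieldState."""
    scores, review, provisional = {}, [], False
    for name, (path, score_fn, missing_strategy) in factors.items():
        fs = state.get(path)
        if fs is not None and not fs.frozen and fs.value is not None:
            scores[name] = score_fn(fs.value)
            continue
        if fs is not None and fs.frozen:
            provisional = True  # a frozen value cannot be relied upon
        if missing_strategy == "flag_for_review":
            scores[name] = None
            review.append(name)
        elif missing_strategy == "worst_case":  # hypothetical alternative strategy
            scores[name] = max_score
        else:
            raise ValueError(f"unknown missing_strategy: {missing_strategy}")
    # The ADR labels such results "provisional, pending investigation".
    return scores, review, "provisional" if provisional else "final"
```

A frozen field is treated exactly like a missing one for scoring purposes. Unlike a genuinely missing value, however, it also marks the result provisional, so the score can be recomputed once the investigation resolves.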

19. Database Schema

New Tables Summary

Table	Purpose
`ontology_schema_lines`	Named schema identities per tenant
`ontology_schema_versions`	Versioned schema definitions (draft/evaluated/published/archived)
`ontology_source_mappings`	Denormalized source→ontology field mappings
`ontology_field_configs`	Per-field merge, conflict, and investigation configuration
`ontology_mutations`	Immutable record of all proposed subject mutations
`field_provenance`	Current field values with source provenance
`conflict_records`	Detected conflicts with resolution tracking
`data_source_definitions`	Stable source identities
`data_source_contract_versions`	Versioned immutable source contracts
`tenant_source_adapters`	Tenant-specific operational source config
`evaluation_test_suites`	Reusable test entity collections
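`evaluation_runs`	Evaluation execution results
`schema_matrix_links`	Links from schema versions to risk matrix versions (at most one primary link per schema version)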

Conflict Records Table

CREATE TABLE conflict_records (
    id                      UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id               UUID NOT NULL REFERENCES tenants(id),
    schema_version_id       UUID NOT NULL REFERENCES ontology_schema_versions(id),
    subject_kind            TEXT NOT NULL,   -- 'entity', 'relationship', 'collection_member'
    entity_id               UUID,
    relationship_instance_id UUID,
    subject_type            TEXT NOT NULL,
    field_path              TEXT NOT NULL,
    cardinality_key         TEXT,

    -- Conflict details
    current_value           JSONB,
    current_source          TEXT NOT NULL,
    current_verified_at     TIMESTAMPTZ,
    current_verified_by     TEXT,
    incoming_value          JSONB NOT NULL,
    incoming_source         TEXT NOT NULL,
    incoming_source_contract_version_id UUID,
    incoming_received_at    TIMESTAMPTZ NOT NULL DEFAULT NOW(),

    -- Configuration that triggered this
    merge_strategy          TEXT NOT NULL,
    conflict_response       TEXT NOT NULL,

    -- Investigation (if triggered)
    investigation_id        UUID,
    investigation_agent     TEXT,
    investigation_priority  TEXT,
    investigation_scope     TEXT,

    -- Resolution
    status                  TEXT NOT NULL DEFAULT 'open',
    -- 'open', 'investigating', 'resolved', 'auto_resolved'
    resolved_at             TIMESTAMPTZ,
    resolved_by             TEXT,
    resolved_value          JSONB,
    resolution_type         TEXT,
    -- 'accept_incoming', 'keep_current', 'override', 'escalated'
    resolution_justification TEXT,

    created_at              TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE INDEX idx_conflicts_open ON conflict_records(tenant_id, status) WHERE status IN ('open', 'investigating');
CREATE INDEX idx_conflicts_subject ON conflict_records(subject_kind, entity_id, relationship_instance_id, field_path);

20. Migration Path

Phase 1: Schema Persistence (Sprint 1)

Database tables for schema lines, versions, field configs
Versioned source contracts and tenant source adapters
Save/load from Schema Mapping Designer
Draft/publish lifecycle
No conflict detection yet — pure schema management

Phase 2: Mutation Queue & Provenance (Sprint 2)

Mutation queue table and processing pipeline
Field provenance tracking
Relationship instance identity and relationship-level mutations
Merge strategy execution (deterministic value resolution)
Auto-accept for initial population (baseline)

Phase 3: Conflict Detection & Response (Sprint 3)

Conflict detection during mutation processing
accept_trusted and flag_review conflict responses
Conflict records table
Analyst review inbox for flagged conflicts

Phase 4: Conflict-Driven Investigation (Sprint 4)

freeze_investigate conflict response
Investigation trigger mechanism (via Temporal workflows, ADR-007)
Field freezing with provisional scoring
Investigation resolution flow

Phase 5: Live Preview & Evaluations (Sprint 5)

Preview API that pulls data and runs through mappings
Snapshot-backed evaluation framework with test suites
Coverage, conflict, and scoring reports
Schema status gates (draft → evaluated → published)

Phase 6: Schema Composition (Sprint 6)

Parent/child schema inheritance
Inheritance override resolution at publish time
Relationship type definitions in custom ontology

21. Rejected Alternatives

Alternative 1: Auto-classify conflicts by field type

Rejected because conflict materiality is a compliance judgment, not a technical property. The same field (employee count) might be material for one client and immaterial for another. The compliance officer must configure this explicitly.

Alternative 2: Real-time conflict resolution (no queue)

Rejected because conflict handling may involve human judgment, investigation, and audit trails. A synchronous model would block data ingestion on analyst availability.

Alternative 3: Global merge strategy (same rule for all fields)

Rejected because different fields have fundamentally different semantics. A boolean flag (is_sanctioned) needs any_true; a name needs highest_trust; an address collection needs accumulate. Per-field configuration is essential.

Alternative 4: Separate ontology and conflict systems

Rejected because conflict detection is inherent to the ontology update process. Separating them would require duplicating entity state tracking and create consistency risks.

Alternative 5: Duplicate entities on conflict instead of freezing fields

Rejected because entity duplication creates identity resolution problems downstream. The conflict is about a field value, not about entity identity. Freezing the field preserves the entity graph integrity while quarantining the uncertain data.

Alternative 6: Mutable source schemas per tenant

Rejected because it blurs the boundary between provider contract, tenant adapter, and ontology mapping. Mutable source schemas would make replay and auditability fragile. Source contracts must be versioned and immutable once published; tenant customization belongs in adapters and ontology mappings.
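A minimal sketch of publish-once source contracts, using a hypothetical in-memory SourceContractRegistry keyed by (source, version). Re-publishing an existing version is rejected. Published field maps are exposed read-only through types.MappingProxyType, so a change can only arrive as a new version.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Optional, Tuple


class SourceContractRegistry:
    """Hypothetical registry: each (source, version) can be published exactly once."""

    def __init__(self) -> None:
        self._contracts: Dict[Tuple[str, int], Mapping[str, str]] = {}

    def publish(self, source: str, version: int, fields: Dict[str, str]) -> None:
        key = (source, version)
        if key in self._contracts:
            raise ValueError(f"{source} v{version} is already published and immutable")
        # Copy, then wrap read-only (shallow: nested values are not frozen).
        self._contracts[key] = MappingProxyType(dict(fields))

    def get(self, source: str, version: int) -> Mapping[str, str]:
        return self._contracts[(source, version)]

    def latest_version(self, source: str) -> Optional[int]:
        versions = [v for (s, v) in self._contracts if s == source]
        return max(versions) if versions else None
```

Replays and audits can then pin a (source, version) pair and rely on it never changing underneath them.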

Alternative 7: Live-data-only evaluations

Rejected because upstream data drift would make publish gates non-deterministic. Live preview is useful for exploration, but authoritative evaluation must run against frozen evidence snapshots and pinned source contract versions.
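The determinism argument can be made concrete with a small sketch: an evaluation that reads only from a frozen snapshot and records a SHA-256 fingerprint of its pinned inputs. Everything here is hypothetical: the run_evaluation() and evaluation_fingerprint() helpers, the snapshot layout, the schema version label, and the coverage metric. The point is that identical inputs always yield an identical result and fingerprint, which an evaluation against live data cannot guarantee.

```python
import hashlib
import json
from typing import Any, Dict, List


def evaluation_fingerprint(schema_version: str, snapshot: Dict[str, Any],
                           contract_versions: Dict[str, int]) -> str:
    """Content hash of all evaluation inputs; key order does not matter."""
    payload = json.dumps(
        {"schema": schema_version, "snapshot": snapshot, "contracts": contract_versions},
        sort_keys=True, separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_evaluation(schema_version: str, required_fields: List[str],
                   snapshot: Dict[str, Any], contract_versions: Dict[str, int]) -> Dict[str, Any]:
    """Compute a coverage ratio purely from the frozen snapshot."""
    entities = snapshot["entities"]
    populated = sum(1 for e in entities for f in required_fields if e.get(f) is not None)
    total = len(entities) * len(required_fields)
    return {
        "fingerprint": evaluation_fingerprint(schema_version, snapshot, contract_versions),
        "coverage": populated / total if total else 1.0,
    }
```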

22. Open Questions

Threshold expression language: should we use a simple DSL (delta > 10%), a JSON rules format, or reuse the existing rule engine from ADR-007?
Conflict resolution SLA: should there be a configurable time limit on how long a field can remain frozen before auto-escalating?
Cross-entity conflict correlation: if multiple fields across related entities conflict simultaneously (e.g., ownership changes across a corporate structure), should the system detect and group these as a single investigation?
Schema marketplace: should published schemas be shareable across tenants (e.g., an "NL Corporate KYC" schema that any Dutch-focused client can adopt and customize)?
Backward compatibility: when a new schema version adds fields, how should existing entities be handled? Lazy migration (compute on read) vs eager migration (backfill on publish)?
Evidence retention policy: how long should frozen evidence snapshots be retained for replay, regulator lookback, and regression testing?
Regulatory mapping: should the schema designer be able to tag fields with regulatory references (e.g., "this field satisfies EBA GL 2.5(a)") for audit purposes?
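To make the first option in the threshold question concrete, here is a sketch of how small a delta DSL could be. The grammar is an assumption extrapolated from the delta > 10% example, not a decided design: the word delta, a comparison operator, a number, and an optional % for relative change. Treating any change from zero as an infinite relative change is likewise an assumption. The threshold_exceeded() helper is hypothetical.

```python
import re

# delta <op> <number>[%]; ">=" and "<=" are listed before ">" and "<" so they match first.
_PATTERN = re.compile(r"^\s*delta\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)\s*(%?)\s*$")

_OPS = {
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
}


def threshold_exceeded(expr: str, old: float, new: float) -> bool:
    """Evaluate a hypothetical delta expression against an old and a new value."""
    m = _PATTERN.match(expr)
    if not m:
        raise ValueError(f"unsupported threshold expression: {expr!r}")
    op, number, percent = m.groups()
    limit = float(number)
    delta = abs(new - old)
    if percent:
        if old == 0:
            # Assumed convention: any change from zero is an infinite relative change.
            delta = float("inf") if delta else 0.0
        else:
            delta = delta * 100.0 / abs(old)
    return _OPS[op](delta, limit)
```

Under this grammar, the employee-count example from Section 1 (50 vs 45, a 10% change) would not exceed "delta > 10%". By contrast, "delta > 0" on a UBO count would fire on any change.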
