Claims & survivorship

Atlas does not store a single "true" value for each attribute and overwrite it as new data arrives. Instead, every attribute can carry multiple claims — one per provider — and survivorship rules decide which claim is presented as canonical. This preserves provenance and makes disagreement auditable. The decision logic lives in src/ontology/survivorship.py, deal_breaker_consolidator.py, and the reconciler.

Why claims

A claim records who said it, with what confidence, and from what source. Because claims are kept, Atlas can answer "why does the report say Acme Corp?" and show the competing assertions and the rule that chose between them. See ADR-020.

The survivorship decision

Strategies

Each attribute in the ontology declares a strategy:

Strategy	Behaviour	Example field
`most_trusted`	Use the value from the most trusted source	`legal_name`, `registration_number`
`most_recent`	Use the most recently retrieved value	volatile status fields
`most_complete`	Prefer a non-null / non-empty value	sparse attributes
`combine`	Merge arrays / union values	`registration_numbers`

Trust weights

Ties within most_trusted are resolved by two weight tables from the ontology:

Module trust — relative authority of OSINT modules (cir 10 → dfwo 3).
Provider trust — per-field confidence for external providers (e.g. NorthData registration_number 0.99, KVK postal_code 0.95, OSINT _default 0.70).

Protected fields

Some fields must never be silently overwritten by a provider — the PEP and sanctions flags on Person and LegalEntity. They originate from screening crews (SPEPWS, AMLRR), and enforce_protected_fields() guarantees a registry provider cannot clobber them. This is load-bearing compliance behaviour, covered by tests.

Deal-breaker consolidation

For multi-value fields where claims genuinely conflict (and a simple merge would be wrong), deal_breaker_consolidator.py can call an LLM to consolidate — "which of these values are real?" — using the survivorship weights as context, with a deterministic fallback if the LLM is unavailable.
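The "LLM first, deterministic fallback" shape described above can be sketched roughly as follows. This is only an illustration: the `consolidate` and `deterministic_fallback` names, the `(value, provider, weight)` tuple layout, and the fallback rule itself (keep the values asserted at the highest weight) are assumptions, not the behaviour of deal_breaker_consolidator.py.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical candidate layout: (value, provider, survivorship weight).
Candidate = Tuple[str, str, float]


def deterministic_fallback(candidates: List[Candidate]) -> List[str]:
    """Assumed rule: keep the distinct values asserted at the highest weight."""
    top = max(weight for _, _, weight in candidates)
    kept: List[str] = []
    for value, _, weight in candidates:
        if weight == top and value not in kept:
            kept.append(value)
    return kept


def consolidate(
    candidates: List[Candidate],
    llm: Optional[Callable[[List[Candidate]], List[str]]] = None,
) -> List[str]:
    """Ask the LLM which values are real; fall back if it fails or misbehaves."""
    if llm is not None:
        try:
            chosen = llm(candidates)
            allowed = [value for value, _, _ in candidates]
            # Accept only answers drawn from the candidate set.
            if chosen and all(value in allowed for value in chosen):
                return list(chosen)
        except Exception:
            pass  # LLM unavailable: use the deterministic rule instead
    return deterministic_fallback(candidates)
```

Passing the LLM in as a plain callable keeps the fallback path easy to exercise in tests, which matters if the fallback is what runs during an outage.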

Every decision is a mutation

Whatever survivorship decides is written as a mutation (before/after, with provider attribution). Significant changes can raise a conflict for human review. This is the audit trail described in Mutation queue, and it is what lets an analyst trust — and defend — the canonical value.

Deep dive: how a winner is chosen

This reflects src/ontology/survivorship.py, reconciliation.py, and canonical_synthesis.py.

The pairwise decision

resolve_field_value(existing, existing_provider, new, new_provider, field) decides each field with an explicit branch order:

"More complete" (_is_more_complete) prefers longer non-placeholder strings, longer lists, and dicts with more keys — and explicitly skips placeholders like "unknown", "n/a", "not found".

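The completeness rule described above could look roughly like the sketch below. `is_more_complete` is a hypothetical stand-in for `_is_more_complete`; the placeholder set contains only the three examples named in the text (the real list may be longer), and treating `None` and empty strings as placeholders is an assumption.

```python
from typing import Any

# Only the placeholders named in the text; the real list may be longer.
PLACEHOLDERS = {"unknown", "n/a", "not found"}


def is_placeholder(value: Any) -> bool:
    """True for None, blank strings, and known placeholder strings."""
    if value is None:
        return True
    if isinstance(value, str):
        stripped = value.strip().lower()
        return stripped == "" or stripped in PLACEHOLDERS
    return False


def is_more_complete(new: Any, existing: Any) -> bool:
    """Return True when `new` carries more information than `existing`."""
    if is_placeholder(new):
        return False  # a placeholder never wins, whatever its length
    if is_placeholder(existing):
        return True   # any real value beats a placeholder
    if isinstance(new, str) and isinstance(existing, str):
        return len(new.strip()) > len(existing.strip())
    if isinstance(new, list) and isinstance(existing, list):
        return len(new) > len(existing)
    if isinstance(new, dict) and isinstance(existing, dict):
        return len(new) > len(existing)  # more keys
    return False      # mismatched types: keep the existing value
```

Checking for placeholders before comparing lengths is the important ordering: "not found" is longer than "Acme" but must still lose.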
Trust lookup

get_provider_trust(provider, field) reads the ontology's provider_trust table via the SchemaCache; an unknown provider/field falls back to 0.50. Reconciliation combines field confidence with module trust as effective_confidence = (field_confidence + module_trust) / 2.
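A minimal sketch of the lookup and the averaging formula, using a plain dict in place of the SchemaCache. The lowercase provider keys, the lookup order (field, then the provider's `_default`, then 0.50), and the assumption that `module_trust` has already been scaled to the same 0..1 range as the field confidence are illustrative choices, not confirmed details.

```python
from typing import Dict

FALLBACK_TRUST = 0.50  # used when the provider or field is unknown

# Values taken from the examples in this page; the real table is larger.
PROVIDER_TRUST: Dict[str, Dict[str, float]] = {
    "northdata": {"registration_number": 0.99},
    "kvk": {"postal_code": 0.95},
    "osint": {"_default": 0.70},
}


def get_provider_trust(
    provider: str,
    field: str,
    table: Dict[str, Dict[str, float]] = PROVIDER_TRUST,
) -> float:
    """Per-field trust, then the provider's _default, then the 0.50 fallback."""
    fields = table.get(provider.lower(), {})
    if field in fields:
        return fields[field]
    return fields.get("_default", FALLBACK_TRUST)


def effective_confidence(field_confidence: float, module_trust: float) -> float:
    """Plain average of the two inputs (both assumed to be on a 0..1 scale)."""
    return (field_confidence + module_trust) / 2
```

For example, `get_provider_trust("osint", "legal_name")` resolves through the `_default` entry, while a provider absent from the table gets the 0.50 fallback.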

Strategy implementations

The reconciler (reconciliation.py) applies the per-field strategy and emits a conflict record when multiple distinct values exist:

Strategy	Implementation
`most_trusted`	max by effective confidence; ties → module then provider trust
`most_recent`	max by timestamp
`most_complete`	longest / non-empty
`most_specific`	rank by part-count + length + confidence (e.g. "England & Wales" > "UK")
`combine` / `aggregate`	union arrays; boolean OR for flags (a `true` PEP wins)
`canonical`	normalize to canonical form (proper case, ISO codes)
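Three of the strategies in the table can be sketched as small reducers over a list of claims. The `Claim` dataclass, the `reconcile` function, and the shape of the conflict record are hypothetical; tie-breaking by module and provider trust, `most_specific`, and `canonical` are left out to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Dict, List, Optional, Tuple


@dataclass
class Claim:
    value: Any
    provider: str
    effective_confidence: float
    retrieved_at: datetime


def most_trusted(claims: List[Claim]) -> Any:
    return max(claims, key=lambda c: c.effective_confidence).value


def most_recent(claims: List[Claim]) -> Any:
    return max(claims, key=lambda c: c.retrieved_at).value


def combine(claims: List[Claim]) -> Any:
    values = [c.value for c in claims]
    if all(isinstance(v, bool) for v in values):
        return any(values)  # boolean OR: a single True flag wins
    merged: List[Any] = []
    for v in values:        # order-preserving union of scalars and lists
        for item in (v if isinstance(v, list) else [v]):
            if item not in merged:
                merged.append(item)
    return merged


STRATEGIES: Dict[str, Callable[[List[Claim]], Any]] = {
    "most_trusted": most_trusted,
    "most_recent": most_recent,
    "combine": combine,
}


def reconcile(strategy: str, claims: List[Claim]) -> Tuple[Any, Optional[dict]]:
    """Apply a strategy and report a conflict when distinct values exist."""
    distinct: List[Any] = []
    for c in claims:
        if c.value not in distinct:
            distinct.append(c.value)
    conflict = {"values": distinct} if len(distinct) > 1 else None
    return STRATEGIES[strategy](claims), conflict
```

Keeping each strategy as a pure function of the claim list makes the choice of winner reproducible from the stored claims alone.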

Claim-plus-rank (Phase 110) and the read path

Each attribute can have many entity_claims rows. recompute_preferred_for_entity() does a pairwise reduction across all claims for an (entity_id, attribute_path) and moves the is_preferred flag to the winner. Because the partial unique index entity_claims_one_preferred is checked per modified row, the move is a demote-then-promote within one transaction (the previously preferred claim is cleared before the new winner is set), since a single combined UPDATE could transiently violate the index. The read path, synthesize_entity_attributes() (canonical_synthesis.py), then:

Each result is a CanonicalAttribute(value, source, trust, updated_at, alternative_claims?). When only one source contributed, alternative_claims is absent from the JSON (not null) — the model_dump(exclude_none=True) contract. Paths with no preferred claim fall back to the legacy entity_data + _field_provenance (source "legacy", trust 0.5).
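The demote-then-promote move of is_preferred described earlier in this section can be tried out with the sketch below. It uses an in-memory SQLite database as a stand-in for the real store, so that the example runs anywhere; the `value` and `id` columns, the sample rows, and the `move_preferred` helper are hypothetical. Only the table name, the index name, and the is_preferred flag come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE entity_claims (
        id INTEGER PRIMARY KEY,
        entity_id TEXT NOT NULL,
        attribute_path TEXT NOT NULL,
        value TEXT,
        is_preferred INTEGER NOT NULL DEFAULT 0
    );
    -- Partial unique index: at most one preferred claim per attribute path.
    CREATE UNIQUE INDEX entity_claims_one_preferred
        ON entity_claims (entity_id, attribute_path)
        WHERE is_preferred = 1;
    INSERT INTO entity_claims VALUES
        (1, 'e1', 'legal_name', 'Acme', 1),
        (2, 'e1', 'legal_name', 'Acme Corp', 0);
    """
)


def move_preferred(conn, entity_id, attribute_path, winner_id):
    """Demote the current preferred claim, then promote the winner."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "UPDATE entity_claims SET is_preferred = 0 "
            "WHERE entity_id = ? AND attribute_path = ? "
            "AND is_preferred = 1 AND id != ?",
            (entity_id, attribute_path, winner_id),
        )
        conn.execute(
            "UPDATE entity_claims SET is_preferred = 1 WHERE id = ?",
            (winner_id,),
        )


move_preferred(conn, "e1", "legal_name", winner_id=2)
```

Running the two statements in the opposite order would try to set a second preferred row while the first is still flagged, which the unique index rejects.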

Protected fields, precisely

enforce_protected_fields() strips protected fields from an incoming payload unless the provider is authorized (spepws, amlrr). On Person the protected set is is_pep, pep_position, pep_source, pep_details, pep_category, is_sanctioned, sanctions_matches, sanctions_source, sanctions_lists.
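A minimal sketch of the stripping rule, assuming a `(entity_type, payload, provider)` signature and a case-insensitive provider match, neither of which is confirmed by the text. Only the Person set is listed above, so the LegalEntity set is left out rather than guessed.

```python
from typing import Any, Dict, FrozenSet

AUTHORIZED_PROVIDERS = {"spepws", "amlrr"}

# Person set as listed in the text; other entity types are omitted here.
PROTECTED_FIELDS: Dict[str, FrozenSet[str]] = {
    "Person": frozenset({
        "is_pep", "pep_position", "pep_source", "pep_details", "pep_category",
        "is_sanctioned", "sanctions_matches", "sanctions_source",
        "sanctions_lists",
    }),
}


def enforce_protected_fields(
    entity_type: str, payload: Dict[str, Any], provider: str
) -> Dict[str, Any]:
    """Return a copy of payload without protected fields, unless authorized."""
    if provider.lower() in AUTHORIZED_PROVIDERS:
        return dict(payload)
    protected = PROTECTED_FIELDS.get(entity_type, frozenset())
    return {key: val for key, val in payload.items() if key not in protected}
```

Returning a filtered copy instead of mutating the payload keeps the original provider response available for the audit trail.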

Invariants

One winner per attribute path (the orphan-loser guard prevents a loser surfacing as canonical).
Single-source omission: alternative_claims is None, never [].
JSONB decode discipline: claim leaves are decoded before winner selection (the asyncpg pool registers no jsonb codec, so raw rows arrive as JSON text) — so a placeholder like "unknown" can never beat a real "active" on an encoded-length tie-break, and a numeric registration number never coerces to int.
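The JSONB decode invariant is easiest to see with concrete strings. In the sketch below, `decode_leaf` and `pick_longer` are hypothetical helpers, and the placeholder set is limited to the examples named earlier on this page.

```python
import json

PLACEHOLDERS = {"unknown", "n/a", "not found"}


def decode_leaf(raw: str):
    """Decode one JSON-text leaf exactly once."""
    return json.loads(raw)


def pick_longer(raw_a: str, raw_b: str):
    """Decode both leaves, drop placeholders, then prefer the longer value."""
    decoded = [decode_leaf(raw_a), decode_leaf(raw_b)]
    real = [
        v for v in decoded
        if not (isinstance(v, str) and v.strip().lower() in PLACEHOLDERS)
    ]
    if not real:
        return decoded[0]
    return max(real, key=lambda v: len(str(v)))


# Comparing the encoded text picks the placeholder: '"unknown"' is 9
# characters including quotes, '"active"' is only 8.
naive_winner = max(['"unknown"', '"active"'], key=len)

# Decoding first lets the placeholder check run on the real value.
safe_winner = pick_longer('"unknown"', '"active"')

# A registration number stored as a JSON string stays a string.
registration = decode_leaf('"01234567"')
```

Decoding exactly once matters in the other direction too: running a second decode over an already-decoded digit string is what would turn a registration number into an int.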

To change policy without code, see Add a risk matrix and the ontology's survivorship config.
