The ontology (v3.5)

The ontology is the heart of Atlas. It defines the entity types, their attributes, the relationships between them, the risk categories, and, crucially, the survivorship rules that decide which provider's data wins when sources disagree. It is a versioned YAML document (schemas/ontology/ontology_v3.5.yaml, currently v3.5.2) that is published to the database by a migration and loaded by the app at startup.

Versioning

The file name stays ontology_v3.5.yaml across minor bumps (v3.5 → v3.5.1 → v3.5.2); history lives in git. A companion migration (migrations/V130__ontology_schema_v3_5_2.sql) publishes the YAML byte-for-byte into the ontology_schema_versions table, and a CI drift check enforces parity between the file and the published copy. The active schema is loaded via SchemaCache.initialize(pool), and the app refuses to boot if the schema cannot be loaded.
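The parity rule can be illustrated with a small sketch. This is not the actual CI job: it assumes both copies of the YAML are already available as bytes (one read from disk, one extracted from the migration or queried from ontology_schema_versions), and the helper names sha256_hex and check_parity are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def check_parity(file_bytes: bytes, published_bytes: bytes) -> bool:
    """True only if the YAML on disk and the published copy are byte-equal.

    Hypothetical helper: comparing digests gives the same answer as comparing
    the bytes directly, but a digest is easier to print in a CI log.
    """
    return sha256_hex(file_bytes) == sha256_hex(published_bytes)


# Illustrative content only, not the real ontology file.
on_disk = b"version: 3.5.2\n"
print(check_parity(on_disk, b"version: 3.5.2\n"))   # identical bytes
print(check_parity(on_disk, b"version: 3.5.2 \n"))  # one extra space counts as drift
```

Because parity is byte-level, even whitespace or line-ending changes in the YAML would need a matching migration under this reading.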

Entity types

The eight entity types are:

Entity type       Purpose
LegalEntity       Companies and organisations (the primary subject)
Person            Individuals: directors, shareholders, UBOs
Address           Registered and operating addresses
Document          Filings and source documents
Domain            Web domains owned by an entity
SanctionsMatch    A match against a sanctions list
PEPExposure       Politically-exposed-person exposure
AdverseMedia      Adverse media mentions (with sentiment)

Relationship types

Relationship      From → To                  Meaning
Directorship      Person → LegalEntity       Director / officer role
Ownership         Entity → LegalEntity       Shareholding / beneficial ownership
RegisteredAt      LegalEntity → Address      Registered/operating address
DocumentsEntity   Document → Entity          A document evidences an entity
OwnsDomain        LegalEntity → Domain       Domain ownership
MatchedTo         Person → SanctionsMatch    Screening match
MentionedIn       Entity → AdverseMedia      Adverse-media mention
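The From → To column of the table above can be read as an endpoint constraint on each relationship. The sketch below checks an edge against it. It is illustrative only: the names ENTITY_TYPES, RELATIONSHIP_ENDPOINTS and is_valid_edge are hypothetical, and it assumes that "Entity" in the table means any of the eight entity types.

```python
ENTITY_TYPES = {
    "LegalEntity", "Person", "Address", "Document",
    "Domain", "SanctionsMatch", "PEPExposure", "AdverseMedia",
}

# Assumption: "Entity" in the table means "any entity type", modelled as ANY.
ANY = "*"

# (from, to) endpoint types per relationship, transcribed from the table.
RELATIONSHIP_ENDPOINTS = {
    "Directorship": ("Person", "LegalEntity"),
    "Ownership": (ANY, "LegalEntity"),
    "RegisteredAt": ("LegalEntity", "Address"),
    "DocumentsEntity": ("Document", ANY),
    "OwnsDomain": ("LegalEntity", "Domain"),
    "MatchedTo": ("Person", "SanctionsMatch"),
    "MentionedIn": (ANY, "AdverseMedia"),
}


def is_valid_edge(relationship: str, from_type: str, to_type: str) -> bool:
    """Check that an edge uses known types and matches the table's endpoints."""
    endpoints = RELATIONSHIP_ENDPOINTS.get(relationship)
    if endpoints is None:
        return False  # unknown relationship type
    if from_type not in ENTITY_TYPES or to_type not in ENTITY_TYPES:
        return False  # unknown entity type
    expected_from, expected_to = endpoints
    return expected_from in (ANY, from_type) and expected_to in (ANY, to_type)


print(is_valid_edge("Directorship", "Person", "LegalEntity"))
print(is_valid_edge("Directorship", "LegalEntity", "Person"))
```

The first call follows the table's direction and passes; the second reverses the endpoints and is rejected.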

Risk categories

Findings are classified into eleven categories, used by both module rules and the risk matrix:

sanctions · pep · adverse_media · ownership · governance · corporate_status · jurisdiction · digital_footprint · regulatory · data_quality · secrecy_jurisdiction

Survivorship rules

When two sources disagree about an attribute, survivorship decides the winner. The ontology declares four strategies, a per-attribute default, and trust weights at two levels.

  • Strategies: most_trusted, most_recent, most_complete, combine. Each attribute in the schema declares which strategy applies (for example legal_name: most_trusted, registration_numbers: combine).
  • Module trust ranks the OSINT modules by authority — cir (company registry) is most authoritative at 10, down to dfwo (digital footprint) at 3.
  • Provider trust assigns per-field confidence to external providers. For example NorthData's registration_number is trusted at 0.99, while OSINT's _default is 0.70 and KVK's Dutch postal_code is 0.95.
  • Protected fields can never be silently overwritten by a provider. On Person these include is_pep, pep_position, is_sanctioned, and sanctions_matches; they originate from the screening crews (SPEPWS, AMLRR), not from registries.
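To make the four strategies and the protected-field rule concrete, here is a minimal sketch of how one attribute might be resolved. It is not Atlas's merge code: the Claim class and the trust and resolve functions are hypothetical, the registration numbers are made up, the fallback to 0.0 for unknown providers is an assumption, and so is the reading of most_complete as "longest value". Only the three trust weights and the protected field names come from the text above.

```python
from dataclasses import dataclass
from datetime import date

# Trust weights quoted in the text; the full table would come from the YAML.
PROVIDER_TRUST = {
    "NorthData": {"registration_number": 0.99},
    "OSINT": {"_default": 0.70},
    "KVK": {"postal_code": 0.95},
}
# Protected Person fields and the screening crews that may set them.
PROTECTED_FIELDS = {"is_pep", "pep_position", "is_sanctioned", "sanctions_matches"}
SCREENING_CREWS = {"SPEPWS", "AMLRR"}


@dataclass
class Claim:
    source: str
    field: str
    value: object
    observed: date


def trust(claim: Claim) -> float:
    """Per-field trust, then the provider's _default, then 0.0 (assumed)."""
    weights = PROVIDER_TRUST.get(claim.source, {})
    return weights.get(claim.field, weights.get("_default", 0.0))


def resolve(strategy: str, claims: list[Claim]):
    """Pick the surviving value for one attribute under the given strategy."""
    if claims[0].field in PROTECTED_FIELDS:
        # Claims from anything other than a screening crew are ignored.
        claims = [c for c in claims if c.source in SCREENING_CREWS]
        if not claims:
            return None  # nothing authoritative: leave the field unchanged
    if strategy == "most_trusted":
        return max(claims, key=trust).value
    if strategy == "most_recent":
        return max(claims, key=lambda c: c.observed).value
    if strategy == "most_complete":
        # Assumption: "complete" is approximated by the longest string form.
        return max(claims, key=lambda c: len(str(c.value))).value
    if strategy == "combine":
        merged = []
        for c in claims:
            values = c.value if isinstance(c.value, list) else [c.value]
            for v in values:
                if v not in merged:  # keep first-seen order, drop duplicates
                    merged.append(v)
        return merged
    raise ValueError(f"unknown strategy: {strategy}")


claims = [
    Claim("OSINT", "registration_number", "HRB 1234", date(2024, 5, 1)),
    Claim("NorthData", "registration_number", "HRB 12345", date(2023, 1, 1)),
]
print(resolve("most_trusted", claims))  # NorthData's value (0.99 beats 0.70)
print(resolve("most_recent", claims))   # OSINT's value (observed later)
```

The same pair of claims produces different winners under different strategies, which is why each attribute declares its own strategy in the schema.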

See Claims & survivorship for how these rules execute at merge time, ADR-010 for ontology lifecycle, and ADR-020 for field provenance.

Crew output contract

The ontology also defines the JSON contract every OSINT crew must return — entities, relationships, risk_indicators, and a summary with a data_quality_score. This is what binds the LLM crews to the ontology: their output is validated and mapped into the entity types above before resolution.
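A shape check for this contract might look like the sketch below. It covers only the four top-level keys and data_quality_score named above; the function name validate_crew_output is hypothetical, and treating entities, relationships and risk_indicators as JSON arrays is an assumption, since the field-level types are defined in the schema reference rather than here.

```python
import json

REQUIRED_KEYS = ("entities", "relationships", "risk_indicators", "summary")


def validate_crew_output(raw: str) -> list[str]:
    """Return contract violations; an empty list means the shape is acceptable."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["top level must be a JSON object"]

    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in payload]
    for key in ("entities", "relationships", "risk_indicators"):
        # Assumption: these three are arrays; the text does not state their types.
        if key in payload and not isinstance(payload[key], list):
            errors.append(f"{key} must be a list")

    summary = payload.get("summary")
    if isinstance(summary, dict):
        if "data_quality_score" not in summary:
            errors.append("summary.data_quality_score is missing")
    elif "summary" in payload:
        errors.append("summary must be an object")
    return errors


good = json.dumps({
    "entities": [],
    "relationships": [],
    "risk_indicators": [],
    "summary": {"data_quality_score": 0.8},  # illustrative value
})
print(validate_crew_output(good))  # []
print(validate_crew_output("{}"))  # one "missing key" message per required key
```

Returning a list of violations rather than raising on the first one lets a caller log every problem in a crew's response at once.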

For the field-by-field reference, see Reference → Ontology schema.