Plugin architecture

Data providers are integrated as plugins — self-contained directories under plugins/, each declaring its metadata, credentials schema, capabilities, and ontology mapping. The plugin system is declarative: adding a provider is mostly authoring YAML plus a thin client and mapper, not changing the core. See ADR-009.

Anatomy of a plugin

| File | Purpose |
| --- | --- |
| plugin.yaml | Identity, version, execution mode, connection, credentials schema, capabilities |
| mapping_spec.yaml | Declarative field mapping to ontology entity types/attributes, with confidence |
| client.py | Transport: HTTP client (sync) or crew entry (async) |
| mapper.py | Transforms a raw provider response into ontology entities + claims |
| transforms.py | Field-level normalisation/coercion |
| tests/ | Plugin-specific tests |

The bundled plugins

| Plugin | Mode | Source | Notes |
| --- | --- | --- | --- |
| _example | sync | template | Reference implementation for new plugins |
| kvk | sync | Dutch Chamber of Commerce | Registry data for NL companies |
| northdata | sync | NorthData | Financials, sheets, events, ownership |
| osint | async | OSINT crews | Seven modules orchestrated via Temporal |

plugin.yaml

# Representative: see plugins/kvk/plugin.yaml
plugin_schema_version: "1"
plugin: "kvk"
version: "1.0.0"
display_name: "Dutch Chamber of Commerce"
provider_type: "sync"           # or "investigation" (async)
allow_platform_fallback: false  # tenant must bring its own credentials

connection:
  base_url: "https://api.kvk.nl/v1"  # async plugins use an async:// sentinel
  rate_limits: { requests_per_minute: 100 }
  timeout: 30

credentials:
  schema:  # JSON Schema; drives the Studio credentials form
    type: object
    required: ["api_key"]
    properties:
      api_key: { type: string, title: "KVK API Key", format: "password" }

capabilities:
  jurisdictions: ["NL"]
  entity_types: ["LegalEntity", "Person", "Address"]

execution:
  mode: "sync"

The PluginLoader (src/integrations/plugin_loader.py) parses these into typed PluginSpec, ExecutionSpec, CredentialsSpec, and CapabilitiesSpec models. A mis-authored plugin raises PluginValidationError, which the API surfaces as HTTP 503 — the platform fails closed rather than silently ignoring a broken provider.
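As a rough illustration of the typed-model idea, the sketch below turns an already-parsed plugin.yaml mapping into frozen dataclasses and fails closed on bad input. It is not the real PluginLoader: it swaps Pydantic for stdlib dataclasses to stay dependency-free, covers only a subset of fields, and assumes the valid modes are "sync" and "async". The names parse_plugin and status_for are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


class PluginValidationError(Exception):
    """Raised when plugin.yaml content does not match the expected shape."""


@dataclass(frozen=True)
class ExecutionSpec:
    mode: str  # assumed to be "sync" or "async"


@dataclass(frozen=True)
class CapabilitiesSpec:
    jurisdictions: tuple[str, ...]
    entity_types: tuple[str, ...]


@dataclass(frozen=True)
class PluginSpec:
    plugin: str
    version: str
    execution: ExecutionSpec
    capabilities: CapabilitiesSpec


def parse_plugin(raw: dict[str, Any]) -> PluginSpec:
    """Turn an already-parsed plugin.yaml mapping into typed models (hypothetical helper)."""
    try:
        mode = raw["execution"]["mode"]
        caps = raw["capabilities"]
        spec = PluginSpec(
            plugin=raw["plugin"],
            version=raw["version"],
            execution=ExecutionSpec(mode=mode),
            capabilities=CapabilitiesSpec(
                jurisdictions=tuple(caps["jurisdictions"]),
                entity_types=tuple(caps["entity_types"]),
            ),
        )
    except (KeyError, TypeError) as exc:
        # Missing keys or wrongly typed sections both count as a mis-authored plugin.
        raise PluginValidationError(f"invalid plugin.yaml: {exc!r}") from exc
    if mode not in ("sync", "async"):
        raise PluginValidationError(f"execution: mode: unsupported value {mode!r}")
    return spec


def status_for(raw: dict[str, Any]) -> int:
    """Hypothetical API-layer helper: fail closed with 503 on a broken plugin."""
    try:
        parse_plugin(raw)
    except PluginValidationError:
        return 503
    return 200
```

The point of the second helper is that a broken provider produces an explicit error status instead of being skipped.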

mapping_spec.yaml

# Representative: see plugins/kvk/mapping_spec.yaml
version: "1.0"
plugin: "kvk"
target_schema: "ontology_schema_v3_5"
provider_id: "kvk"

entity_mappings:
  - source_type: "company"
    target_type: "LegalEntity"
    confidence: 0.95
    field_mappings:
      - { source_field: "name", target_field: "legal_name", required: true }
      - { source_field: "kvk_number", target_field: "registration_number", required: true }
      - { source_field: "country_code", target_field: "country_code", confidence: 0.95 }

relationship_mappings:
  - { source_rel: "director_of", target_rel: "Directorship",
      source_entity: "person", target_entity: "company" }

The confidence values here feed the survivorship weights. The target_schema is validated against the active ontology at boot by the TranslationRegistry — see Ingestion pipelines.
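To make the field mapping concrete, here is a minimal sketch of how a mapper might apply one entity mapping to a raw record. It is an illustration, not the actual mapper.py. It assumes that a field-level confidence overrides the entity-level one and that a missing required field is an error; the source states neither. The function name apply_entity_mapping and the output shape are hypothetical.

```python
from typing import Any

# Mirrors the "company" entity mapping from the representative spec.
ENTITY_MAPPING: dict[str, Any] = {
    "source_type": "company",
    "target_type": "LegalEntity",
    "confidence": 0.95,
    "field_mappings": [
        {"source_field": "name", "target_field": "legal_name", "required": True},
        {"source_field": "kvk_number", "target_field": "registration_number", "required": True},
        {"source_field": "country_code", "target_field": "country_code", "confidence": 0.95},
    ],
}


def apply_entity_mapping(mapping: dict[str, Any], record: dict[str, Any]) -> dict[str, Any]:
    """Map one raw provider record to an entity type plus per-field claims."""
    claims = []
    for fm in mapping["field_mappings"]:
        value = record.get(fm["source_field"])
        if value is None:
            if fm.get("required", False):
                raise ValueError(f"missing required field {fm['source_field']!r}")
            continue  # optional field absent: emit no claim
        claims.append(
            {
                "field": fm["target_field"],
                "value": value,
                # Assumption: field-level confidence wins over the entity-level default.
                "confidence": fm.get("confidence", mapping["confidence"]),
            }
        )
    return {"entity_type": mapping["target_type"], "claims": claims}


# Made-up sample record, for illustration only.
result = apply_entity_mapping(ENTITY_MAPPING, {"name": "Acme B.V.", "kvk_number": "12345678"})
```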

How a plugin is selected at runtime

Credentials are stored encrypted per tenant (data_provider_credentials, AES-GCM via EnvKeyEncryptor) and decrypted only when a client is instantiated. See ADR-024 and Security & multi-tenancy.

Source contracts

In parallel to mapping specs, source contracts (src/source_contracts/) declare which ontology fields each provider is allowed to emit. This bounds what a provider can assert and supports the multi-source disagreement enforcement of ADR-017. See API → Source contracts.
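The idea of bounding what a provider can assert can be sketched as a simple allow-list check. The contract table, its "Entity.field" key format, and the function name below are all hypothetical; the real contracts in src/source_contracts/ may be structured quite differently.

```python
# Hypothetical contract table: provider id -> ontology fields it may emit.
SOURCE_CONTRACTS: dict[str, frozenset[str]] = {
    "kvk": frozenset({"LegalEntity.legal_name", "LegalEntity.registration_number"}),
}


def contract_violations(provider_id: str, emitted_fields: list[str]) -> list[str]:
    """Return the emitted fields that the provider's contract does not allow."""
    # A provider with no contract is allowed nothing (an assumption: fail closed).
    allowed = SOURCE_CONTRACTS.get(provider_id, frozenset())
    return [field for field in emitted_fields if field not in allowed]
```

In this sketch a caller would reject or strip any claim whose field appears in the returned list before it reaches the merge step.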


Deep dive: loader, registry, and credential resolution

This section reflects src/integrations/plugin_loader.py, mapping_spec.py, translation_registry.py, and provider_router.py.

Loader validation is aggregated and fail-loud

PluginLoader.load() doesn't stop at the first problem — it accumulates missing-file, YAML-parse, Pydantic-field, credentials-schema, and mapping-spec errors and raises them together, each formatted as path: field: message. YAML is parsed with safe_load only. The credentials.schema is itself meta-validated as JSON Schema Draft-7 at load time.
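The accumulate-then-raise pattern can be sketched as follows. This is a simplified stand-in rather than the real loader: it checks only a few fields on an already-parsed mapping, and the validate_plugin name, the error messages, and the set of accepted modes are assumptions. Only the path: field: message format comes from the text above.

```python
from typing import Any


class PluginValidationError(Exception):
    """Carries every problem found in one pass, not just the first."""

    def __init__(self, errors: list[str]) -> None:
        super().__init__("\n".join(errors))
        self.errors = errors


def validate_plugin(path: str, raw: dict[str, Any]) -> None:
    """Collect all problems, then raise them together (hypothetical helper)."""
    errors: list[str] = []
    for field in ("plugin", "version", "execution"):
        if field not in raw:
            errors.append(f"{path}: {field}: field required")
    execution = raw.get("execution")
    if "execution" in raw:
        mode = execution.get("mode") if isinstance(execution, dict) else None
        if mode not in ("sync", "async"):  # assumed set of valid modes
            errors.append(f"{path}: execution.mode: unsupported value {mode!r}")
    if errors:
        raise PluginValidationError(errors)
```

Reporting everything at once lets a plugin author fix a broken spec in one round trip instead of discovering errors one reload at a time.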

TranslationRegistry build (Phase 114)

At boot, the registry projects the union of all plugin mapping specs against the active ontology.

The result is an immutable (entity_type, flat_key) → "Entity.field" index. A target_schema that isn't an exact match for the active family, or any field that doesn't exist on the entity type, fails the boot — drift is impossible to ship. The registry is installed as a singleton from both the FastAPI lifespan and the Temporal worker startup.
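A minimal sketch of such a boot-time build is shown below. It is illustrative only: it treats flat_key as the mapping's target_field, treats the schema check as plain string equality, and uses a tiny hard-coded ontology. These are assumptions, as are the names build_index, ACTIVE_SCHEMA, and ONTOLOGY.

```python
from types import MappingProxyType
from typing import Any, Mapping

ACTIVE_SCHEMA = "ontology_schema_v3_5"
# Hypothetical slice of the active ontology: entity type -> known fields.
ONTOLOGY: dict[str, set[str]] = {
    "LegalEntity": {"legal_name", "registration_number", "country_code"},
}


def build_index(specs: list[dict[str, Any]]) -> Mapping[tuple[str, str], str]:
    """Project all mapping specs onto the ontology; any drift aborts the build."""
    index: dict[tuple[str, str], str] = {}
    for spec in specs:
        if spec["target_schema"] != ACTIVE_SCHEMA:
            raise RuntimeError(
                f"{spec['plugin']}: target_schema {spec['target_schema']!r} is not the active schema"
            )
        for entity_mapping in spec["entity_mappings"]:
            entity = entity_mapping["target_type"]
            for field_mapping in entity_mapping["field_mappings"]:
                field = field_mapping["target_field"]
                if field not in ONTOLOGY.get(entity, set()):
                    raise RuntimeError(f"{spec['plugin']}: unknown field {entity}.{field}")
                index[(entity, field)] = f"{entity}.{field}"
    # Read-only view: callers cannot mutate the index after boot.
    return MappingProxyType(index)
```

Wrapping the dict in MappingProxyType is one simple way to get the immutability described above; item assignment on the returned view raises TypeError.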

Credential resolution — the five-branch chain

ProviderRouter._resolve_credentials() is the security-critical path; it never reads env vars from a plugin.

Branch (a), the platform-fallback branch, emits a sampled error log (once per provider per 60s) because platform fallback is a security-relevant event. Decryption always returns a fresh dict, never the underlying reference.
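The two behaviours described here, sampled logging and returning a fresh dict, can be sketched in isolation. This is not the ProviderRouter code and it does not reproduce the resolution chain. SampledLogger and fresh_credentials are hypothetical names, the clock is injectable only to make the sketch testable, and the deep copy is one possible way to avoid handing out a shared reference.

```python
import copy
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("provider_router")


class SampledLogger:
    """Emit at most one error log per provider per interval."""

    def __init__(
        self,
        interval_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._interval_s = interval_s
        self._clock = clock
        self._last_emit: dict[str, float] = {}

    def error(self, provider_id: str, message: str) -> bool:
        """Log if the provider's window has elapsed; return whether a log was emitted."""
        now = self._clock()
        last = self._last_emit.get(provider_id)
        if last is not None and now - last < self._interval_s:
            return False  # still inside the sampling window: suppress
        self._last_emit[provider_id] = now
        logger.error("%s: %s", provider_id, message)
        return True


def fresh_credentials(decrypted: dict[str, Any]) -> dict[str, Any]:
    """Return a copy so callers can never mutate a cached or shared dict."""
    return copy.deepcopy(decrypted)
```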

Multi-provider trust merge

When several providers answer for the same company, fetch_company_complete() runs them in three tiers: primary (a single provider), supplementary (run in parallel with asyncio.gather), and fallback (run sequentially, and only if the primary failed and nothing was collected). merge_provider_responses() then keeps the highest-trust value per field and records a conflict when two providers whose trust scores are within 0.02 of each other disagree.
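The merge rule can be illustrated with a small sketch. It is one reading of the description, not the actual merge_provider_responses(): it assumes trust is a single score per provider and that a conflict is recorded against the winning provider only. ProviderResponse, merge_responses, and the conflict record shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

TRUST_DELTA = 0.02  # from the text: disagreements inside this band are conflicts


@dataclass(frozen=True)
class ProviderResponse:
    provider_id: str
    trust: float  # assumption: one trust score per provider
    fields: dict[str, Any]


def merge_responses(
    responses: list[ProviderResponse],
) -> tuple[dict[str, Any], list[dict[str, str]]]:
    """Keep the highest-trust value per field; flag near-equal-trust disagreements."""
    merged: dict[str, Any] = {}
    winners: dict[str, ProviderResponse] = {}
    conflicts: list[dict[str, str]] = []
    # Visit providers from most to least trusted, so the first value seen wins.
    for resp in sorted(responses, key=lambda r: r.trust, reverse=True):
        for field, value in resp.fields.items():
            winner = winners.get(field)
            if winner is None:
                winners[field] = resp
                merged[field] = value
            elif value != merged[field] and winner.trust - resp.trust <= TRUST_DELTA:
                conflicts.append(
                    {"field": field, "kept": winner.provider_id, "rejected": resp.provider_id}
                )
    return merged, conflicts
```

A lower-trust provider that disagrees by a wide margin is simply outvoted in this sketch; only a near-tie is treated as worth recording.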

To add a provider, see Add a provider.