Plugin architecture
Data providers are integrated as plugins — self-contained directories under plugins/, each
declaring its metadata, credentials schema, capabilities, and ontology mapping. The plugin system
is declarative: adding a provider is mostly authoring YAML plus a thin client and mapper, not
changing the core. See ADR-009.
Anatomy of a plugin
| File | Purpose |
|---|---|
| plugin.yaml | Identity, version, execution mode, connection, credentials schema, capabilities |
| mapping_spec.yaml | Declarative field mapping to ontology entity types/attributes, with confidence |
| client.py | Transport: HTTP client (sync) or crew entry (async) |
| mapper.py | Transforms a raw provider response into ontology entities + claims |
| transforms.py | Field-level normalisation/coercion |
| tests/ | Plugin-specific tests |
The bundled plugins
| Plugin | Mode | Source | Notes |
|---|---|---|---|
| _example | sync | template | Reference implementation for new plugins |
| kvk | sync | Dutch Chamber of Commerce | Registry data for NL companies |
| northdata | sync | NorthData | Financials, sheets, events, ownership |
| osint | async | OSINT crews | Seven modules orchestrated via Temporal |
plugin.yaml
# Representative — see plugins/kvk/plugin.yaml
plugin_schema_version: "1"
plugin: "kvk"
version: "1.0.0"
display_name: "Dutch Chamber of Commerce"
provider_type: "sync" # or "investigation" (async)
allow_platform_fallback: false # tenant must bring its own credentials
connection:
  base_url: "https://api.kvk.nl/v1" # async plugins use an async:// sentinel
  rate_limits: { requests_per_minute: 100 }
  timeout: 30
credentials:
  schema: # JSON Schema; drives the Studio credentials form
    type: object
    required: ["api_key"]
    properties:
      api_key: { type: string, title: "KVK API Key", format: "password" }
capabilities:
  jurisdictions: ["NL"]
  entity_types: ["LegalEntity", "Person", "Address"]
execution:
  mode: "sync"
The PluginLoader (src/integrations/plugin_loader.py) parses these into typed PluginSpec,
ExecutionSpec, CredentialsSpec, and CapabilitiesSpec models. A mis-authored plugin raises
PluginValidationError, which the API surfaces as HTTP 503 — the platform fails closed rather
than silently ignoring a broken provider.
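As a rough illustration of parsing into typed models and failing closed, the sketch below uses stdlib dataclasses on an already-parsed plugin.yaml mapping. The field subset and the parse_plugin helper are assumptions made for this example; the real models in src/integrations/plugin_loader.py may be shaped differently.

```python
from __future__ import annotations

from dataclasses import dataclass


class PluginValidationError(Exception):
    """Raised when a plugin.yaml document cannot be turned into a spec."""


@dataclass(frozen=True)
class ExecutionSpec:
    mode: str


@dataclass(frozen=True)
class CapabilitiesSpec:
    jurisdictions: tuple[str, ...]
    entity_types: tuple[str, ...]


@dataclass(frozen=True)
class PluginSpec:
    plugin: str
    version: str
    execution: ExecutionSpec
    capabilities: CapabilitiesSpec


def parse_plugin(doc: dict) -> PluginSpec:
    """Build a PluginSpec from a parsed plugin.yaml mapping (hypothetical helper)."""
    try:
        caps = doc["capabilities"]
        return PluginSpec(
            plugin=doc["plugin"],
            version=doc["version"],
            execution=ExecutionSpec(mode=doc["execution"]["mode"]),
            capabilities=CapabilitiesSpec(
                jurisdictions=tuple(caps["jurisdictions"]),
                entity_types=tuple(caps["entity_types"]),
            ),
        )
    except (KeyError, TypeError) as exc:
        # Fail closed: a malformed document never yields a half-built spec.
        raise PluginValidationError(f"invalid plugin.yaml: {exc!r}") from exc
```

An API layer could catch PluginValidationError at its boundary and translate it into the HTTP 503 response described above.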
mapping_spec.yaml
# Representative — see plugins/kvk/mapping_spec.yaml
version: "1.0"
plugin: "kvk"
target_schema: "ontology_schema_v3_5"
provider_id: "kvk"
entity_mappings:
  - source_type: "company"
    target_type: "LegalEntity"
    confidence: 0.95
    field_mappings:
      - { source_field: "name", target_field: "legal_name", required: true }
      - { source_field: "kvk_number", target_field: "registration_number", required: true }
      - { source_field: "country_code", target_field: "country_code", confidence: 0.95 }
relationship_mappings:
  - { source_rel: "director_of", target_rel: "Directorship",
      source_entity: "person", target_entity: "company" }
The confidence values here feed the survivorship weights. The
target_schema is validated against the active ontology at boot by the TranslationRegistry —
see Ingestion pipelines.
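To make the mapping concrete, here is a minimal sketch of applying one entity mapping to a raw provider record. The apply_entity_mapping helper, the claim shape, and the rule that a field without its own confidence inherits the entity-level value are all assumptions for illustration, not the behaviour of the real mapper.py.

```python
from typing import Any

# Mirrors the "company" entity mapping from the representative spec above.
ENTITY_MAPPING: dict[str, Any] = {
    "source_type": "company",
    "target_type": "LegalEntity",
    "confidence": 0.95,
    "field_mappings": [
        {"source_field": "name", "target_field": "legal_name", "required": True},
        {"source_field": "kvk_number", "target_field": "registration_number", "required": True},
        {"source_field": "country_code", "target_field": "country_code", "confidence": 0.95},
    ],
}


def apply_entity_mapping(mapping: dict[str, Any], raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw record to an entity with per-field claims (hypothetical helper)."""
    claims = []
    for fm in mapping["field_mappings"]:
        value = raw.get(fm["source_field"])
        if value is None:
            if fm.get("required", False):
                raise ValueError(f"missing required field: {fm['source_field']}")
            continue  # optional field absent: emit no claim
        claims.append({
            "field": fm["target_field"],
            "value": value,
            # Assumption: fields without their own confidence inherit the entity's.
            "confidence": fm.get("confidence", mapping["confidence"]),
        })
    return {"type": mapping["target_type"], "claims": claims}
```

Calling it with `{"name": "Acme B.V.", "kvk_number": "12345678"}` yields a LegalEntity with two claims; omitting kvk_number raises ValueError because that field is marked required.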
How a plugin is selected at runtime
Credentials are stored encrypted per tenant (data_provider_credentials, AES-GCM via
EnvKeyEncryptor) and decrypted only when a client is instantiated. See ADR-024 and
Security & multi-tenancy.
Source contracts
In parallel to mapping specs, source contracts (src/source_contracts/) declare which ontology
fields each provider is allowed to emit. This bounds what a provider can assert and supports the
multi-source disagreement enforcement of ADR-017. See API → Source contracts.
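The idea of bounding what a provider may assert can be sketched as a per-provider allowlist check. Everything named here (SOURCE_CONTRACTS, enforce_contract, ContractViolation, the "Entity.field" key format) is hypothetical, and whether the real enforcement rejects or drops out-of-contract fields is not stated in this document; this sketch rejects.

```python
class ContractViolation(Exception):
    """Raised when a provider emits a field outside its source contract."""


# Hypothetical contracts: provider id -> "Entity.field" names it may emit.
SOURCE_CONTRACTS: dict[str, frozenset[str]] = {
    "kvk": frozenset({
        "LegalEntity.legal_name",
        "LegalEntity.registration_number",
        "LegalEntity.country_code",
    }),
}


def enforce_contract(provider_id: str, emitted: list[str]) -> None:
    """Reject any emitted field that the provider's contract does not list."""
    contract = SOURCE_CONTRACTS.get(provider_id)
    if contract is None:
        # No contract means no permission to assert anything.
        raise ContractViolation(f"no source contract for provider {provider_id!r}")
    extra = sorted(set(emitted) - contract)
    if extra:
        raise ContractViolation(f"{provider_id} may not emit: {', '.join(extra)}")
```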
Deep dive: loader, registry, and credential resolution
This reflects src/integrations/ — plugin_loader.py, mapping_spec.py, translation_registry.py,
provider_router.py.
Loader validation is aggregated and fail-loud
PluginLoader.load() doesn't stop at the first problem — it accumulates missing-file,
YAML-parse, Pydantic-field, credentials-schema, and mapping-spec errors and raises them together,
each formatted as path: field: message. YAML is parsed with safe_load only. The
credentials.schema is itself meta-validated as JSON Schema Draft-7 at load time.
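The accumulate-then-raise pattern can be sketched as follows. This is a simplified stand-in, not the loader's code: the required-key list, the allowed mode values, and the validate_plugin_doc helper are assumptions, and the real loader also covers missing files, YAML parsing, and schema meta-validation.

```python
class PluginValidationError(Exception):
    """Carries every problem found in one pass over a plugin."""

    def __init__(self, errors: list[str]) -> None:
        self.errors = list(errors)
        super().__init__("\n".join(self.errors))


# Assumed for illustration; the real required fields may differ.
REQUIRED_TOP_LEVEL = ("plugin", "version", "credentials", "execution")
ALLOWED_MODES = ("sync", "async")


def validate_plugin_doc(path: str, doc: dict) -> None:
    """Collect all problems, then raise once with 'path: field: message' lines."""
    errors: list[str] = []
    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            errors.append(f"{path}: {key}: field required")
    execution = doc.get("execution")
    if execution is not None:
        mode = execution.get("mode") if isinstance(execution, dict) else None
        if mode not in ALLOWED_MODES:
            errors.append(f"{path}: execution.mode: unsupported value {mode!r}")
    if errors:
        raise PluginValidationError(errors)
```

Reporting every problem at once means a plugin author can fix a broken plugin in one round trip instead of rerunning the loader after each individual error.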
TranslationRegistry build (Phase 114)
At boot, the registry projects the union of all plugin mapping specs against the active ontology:
The result is an immutable (entity_type, flat_key) → "Entity.field" index. A target_schema that
isn't an exact match for the active family, or any field that doesn't exist on the entity type,
fails the boot — drift is impossible to ship. The registry is installed as a singleton from both
the FastAPI lifespan and the Temporal worker startup.
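A minimal sketch of such a boot-time build is shown below. It assumes that flat_key is the provider's source field name, uses a toy in-memory ontology, and invents the build_index helper and RegistryBuildError; the real registry in translation_registry.py may differ on all of these.

```python
from types import MappingProxyType
from typing import Mapping

ACTIVE_SCHEMA = "ontology_schema_v3_5"
# Toy ontology for illustration: entity type -> fields defined on it.
ONTOLOGY = {"LegalEntity": {"legal_name", "registration_number", "country_code"}}


class RegistryBuildError(Exception):
    """Raised at boot when a mapping spec does not fit the active ontology."""


def build_index(specs: list[dict]) -> Mapping[tuple[str, str], str]:
    """Project all mapping specs into a read-only (entity_type, flat_key) index."""
    index: dict[tuple[str, str], str] = {}
    for spec in specs:
        if spec["target_schema"] != ACTIVE_SCHEMA:  # exact match only
            raise RegistryBuildError(
                f"{spec['plugin']}: target_schema {spec['target_schema']!r} "
                f"does not match {ACTIVE_SCHEMA!r}"
            )
        for em in spec["entity_mappings"]:
            entity = em["target_type"]
            known = ONTOLOGY.get(entity)
            if known is None:
                raise RegistryBuildError(f"{spec['plugin']}: unknown entity type {entity}")
            for fm in em["field_mappings"]:
                if fm["target_field"] not in known:
                    raise RegistryBuildError(
                        f"{spec['plugin']}: {entity}.{fm['target_field']} does not exist"
                    )
                # Assumption: flat_key is the provider-side source field name.
                index[(entity, fm["source_field"])] = f"{entity}.{fm['target_field']}"
    return MappingProxyType(index)  # read-only view; item assignment raises TypeError
```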
Credential resolution — the five-branch chain
ProviderRouter._resolve_credentials() is the security-critical path; it never reads env vars from a
plugin:
Branch (a) emits a sampled error log (once per provider per 60s) because platform fallback is a
security-relevant event. Decryption always returns a fresh dict, never the underlying reference.
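Two of the properties above, the once-per-provider-per-60s log sampling and the fresh-dict guarantee, can be sketched in isolation. SampledLogGate and fresh_credentials are hypothetical names, and this sketch does not attempt to reproduce the resolution chain itself.

```python
import copy
import time
from typing import Callable


class SampledLogGate:
    """Allows at most one log line per provider per interval."""

    def __init__(self, interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = interval_s
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last: dict[str, float] = {}

    def should_log(self, provider_id: str) -> bool:
        now = self._clock()
        last = self._last.get(provider_id)
        if last is not None and now - last < self._interval:
            return False  # still inside this provider's quiet window
        self._last[provider_id] = now
        return True


def fresh_credentials(decrypted: dict) -> dict:
    """Return a deep copy so callers can never mutate the stored reference."""
    return copy.deepcopy(decrypted)
```

A caller would check `gate.should_log(provider_id)` before emitting the platform-fallback error, which keeps a misconfigured tenant from flooding the log while still surfacing the event.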
Multi-provider trust merge
When several providers answer for the same company, fetch_company_complete() runs them in three
tiers: primary (one provider), supplementary (in parallel via asyncio.gather), and fallback
(sequential, used only if the primary failed and nothing was collected).
merge_provider_responses() then keeps the highest-trust value per field and records a conflict
when two providers whose trust scores are within 0.02 of each other disagree.
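The merge rule can be sketched as below. The ProviderResponse shape, the trust scores, and the merge_responses helper are illustrative assumptions; the sketch also assumes that "within 0.02" is inclusive and that a lower-trust value is only compared against the winning value for that field.

```python
from dataclasses import dataclass
from typing import Any

TRUST_DELTA = 0.02


@dataclass(frozen=True)
class ProviderResponse:
    provider: str
    trust: float
    fields: dict[str, Any]


def merge_responses(
    responses: list[ProviderResponse], delta: float = TRUST_DELTA
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Keep the highest-trust value per field; record near-tie disagreements."""
    merged: dict[str, Any] = {}
    winners: dict[str, ProviderResponse] = {}
    conflicts: list[dict[str, Any]] = []
    # Visit providers from most to least trusted so the first value seen wins.
    for resp in sorted(responses, key=lambda r: r.trust, reverse=True):
        for name, value in resp.fields.items():
            if name not in winners:
                winners[name] = resp
                merged[name] = value
                continue
            top = winners[name]
            if value != merged[name] and top.trust - resp.trust <= delta:
                conflicts.append({
                    "field": name,
                    "providers": (top.provider, resp.provider),
                    "values": (merged[name], value),
                })
    return merged, conflicts
```

With this rule, a much less trusted provider that disagrees is silently overridden, while a disagreement between two near-equally trusted providers is surfaced for review.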
To add a provider, see Add a provider.