Add a data provider (plugin)
A data provider is integrated as a plugin — a directory under plugins/ with two YAML
contracts and, for sync providers, a thin client class. This guide walks the full path. Read
Plugins and Ingestion pipelines first.
1. Create the plugin directory
mkdir plugins/<slug>
# copy plugins/kvk as a reference template
A plugin needs at minimum plugin.yaml and mapping_spec.yaml; the TranslationRegistry only
discovers directories that contain both (and it skips names starting with _, e.g. _example).
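The discovery rule above can be sketched in a few lines. This is a minimal illustration, not the TranslationRegistry's actual code: the function name `discover_plugin_dirs` and the `REQUIRED_FILES` constant are hypothetical, and only the two rules stated above (both files present, no leading underscore) are modelled.

```python
from pathlib import Path

# The two contracts every plugin directory must contain.
REQUIRED_FILES = ("plugin.yaml", "mapping_spec.yaml")


def discover_plugin_dirs(plugins_root: Path) -> list[str]:
    """Return the slugs of directories that would be discovered.

    Hypothetical helper: a directory counts only if it holds both
    contract files and its name does not start with an underscore.
    """
    slugs = []
    for entry in sorted(plugins_root.iterdir()):
        if not entry.is_dir() or entry.name.startswith("_"):
            continue
        if all((entry / name).is_file() for name in REQUIRED_FILES):
            slugs.append(entry.name)
    return slugs
```

Running a check like this locally before pushing is a quick way to confirm that a new directory is not being skipped because one of the two files is missing or misnamed.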
2. Author plugin.yaml
The schema is formally versioned (plugin_schema_version: "1") and validated by PluginLoader
against Pydantic models, with the credentials.schema meta-validated as JSON Schema Draft-7.
plugin_schema_version: "1"
plugin: "<slug>"                        # must equal the directory name
version: "1.0.0"
display_name: "My Registry"
description: "Company registry for X"
provider_type: "company_registry"
allow_platform_fallback: false          # secure-by-default; tenants bring their own creds
connection:
  base_url: "https://api.example.com"   # async plugins use an async:// sentinel
  rate_limits:
    requests_per_minute: 60
credentials:
  schema:                               # JSON Schema Draft-7; drives the Studio credentials form
    $schema: "http://json-schema.org/draft-07/schema#"
    type: object
    required: ["api_key"]
    properties:
      api_key: { type: string, minLength: 8, format: "password" }
capabilities:
  jurisdictions: ["RO"]
  entity_types: ["LegalEntity", "Address"]
  endpoints: ["search", "details"]
execution:
  mode: "sync"                          # "sync" (HTTP) or "async" (Temporal)
  # workflow_name: "MyWorkflow"         # REQUIRED when mode: async
PluginLoader aggregates all errors — a missing file, a bad field, and an invalid credentials
schema surface together on first load, formatted as path: field: message. A mis-authored plugin
raises PluginValidationError → HTTP 503.
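To illustrate what "aggregates all errors" means in practice, here is a minimal sketch that collects every problem before reporting. It is not PluginLoader itself (which validates against Pydantic models): the function `validate_manifest`, the message wording, and the choice of checks are assumptions, with the rules taken from the comments in the example manifest above.

```python
def validate_manifest(manifest: dict, directory_name: str,
                      path: str = "plugin.yaml") -> list[str]:
    """Collect every problem instead of stopping at the first one.

    Hypothetical stand-in for the loader: each entry is formatted
    as "path: field: message".
    """
    errors = []

    def fail(field: str, message: str) -> None:
        errors.append(f"{path}: {field}: {message}")

    if manifest.get("plugin_schema_version") != "1":
        fail("plugin_schema_version", 'expected "1"')

    if manifest.get("plugin") != directory_name:
        fail("plugin", f"must equal the directory name {directory_name!r}")

    execution = manifest.get("execution") or {}
    mode = execution.get("mode")
    if mode not in ("sync", "async"):
        fail("execution.mode", 'must be "sync" or "async"')
    elif mode == "async" and not execution.get("workflow_name"):
        fail("execution.workflow_name", "required when mode is async")

    return errors
```

The point of the pattern is that a manifest with three mistakes yields three lines in one pass, so the author does not have to fix and reload repeatedly.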
3. Author mapping_spec.yaml
This maps the provider's fields to the ontology. The target_schema must
match the active ontology family exactly (ontology_schema_v3_5) or the boot fails.
plugin: "<slug>"
version: "1.0.0"
target_schema: "ontology_schema_v3_5"
entity_mappings:
  LegalEntity:
    source: "$.company"                 # JSONPath root
    identity:
      registration_number: "$.regNo"
      jurisdiction: "'RO'"
    attributes:
      legal_name: "$.name"
      entity_type:
        source: "$.form"
        transform: "form_to_entity_type"
relationship_mappings:
  RegisteredAt:
    source_entity: "LegalEntity"
    target_entity: "Address"
    attributes: { address_type: "'registered'" }
transforms:
  form_to_entity_type:
    "SRL": "Private Limited Company"
    _default: "Other"
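The sketch below shows one plausible reading of how such a spec is applied to a provider record. It is an assumption-laden illustration, not the platform's translation engine: `resolve` and `map_attributes` are hypothetical names, only simple dotted paths are supported rather than full JSONPath, field paths are assumed to be relative to the entity's `source` root, and the sample record is invented.

```python
def resolve(expr: str, record: dict):
    """Resolve a quoted literal ('RO') or a simple dotted path ($.a.b)."""
    if len(expr) >= 2 and expr[0] == expr[-1] == "'":
        return expr[1:-1]  # literal value, quotes stripped
    value = record
    for key in expr.removeprefix("$.").split("."):
        value = value[key]
    return value


def map_attributes(attributes: dict, record: dict, transforms: dict) -> dict:
    """Apply attribute rules; a dict rule routes through a lookup table."""
    out = {}
    for field, rule in attributes.items():
        if isinstance(rule, str):
            out[field] = resolve(rule, record)
        else:
            raw = resolve(rule["source"], record)
            table = transforms[rule["transform"]]
            # Fall back to the table's _default entry for unknown values.
            out[field] = table.get(raw, table.get("_default"))
    return out


# Invented sample record, standing in for the object found at $.company.
company = {"name": "Acme SRL", "regNo": "J40/1/2020", "form": "SA"}
attributes = {
    "legal_name": "$.name",
    "entity_type": {"source": "$.form", "transform": "form_to_entity_type"},
}
transforms = {
    "form_to_entity_type": {"SRL": "Private Limited Company", "_default": "Other"},
}
mapped = map_attributes(attributes, company, transforms)
```

Here the form code "SA" is absent from the lookup table, so `entity_type` falls through to the `_default` value, which is why a `_default` entry is worth including in every transform.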
The per-field confidence and the survivorship weights for your provider live in the
ontology's provider_trust — add a sub-block for your
provider there so survivorship knows how much to trust it.
4. Implement the provider (sync)
For mode: sync, implement a DataProvider subclass (see src/integrations/base.py) and register
it:
# plugins/<slug>/client.py (representative)
from src.integrations.base import DataProvider, ProviderResponse, ProviderEntity
from src.integrations.provider_router import register_provider


@register_provider("<slug>")
class MyProvider(DataProvider):
    name = "<slug>"
    supported_countries = ["RO"]
    trust_level = 0.9

    async def fetch_company_complete(self, registration_number, country_code, **options):
        raw = await self._get(f"/details?reg={registration_number}")
        return ProviderResponse(
            raw_response=raw,
            entities=[ProviderEntity(
                entity_type="LegalEntity",
                external_id=raw["id"],
                data={"legal_name": raw["name"], "registration_number": raw["regNo"]},
                confidence=0.95, source_provider="<slug>",
            )],
            relationships=[], metadata={},
        )
For mode: async, you don't write a sync client — you declare a workflow_name and implement the
workflow/activities (see Add an OSINT module for the agentic pattern).
5. Credentials — never read env vars directly
Credentials are resolved by the ProviderRouter through a five-branch chain and decrypted from
per-tenant encrypted storage. Your client receives config.credentials already resolved — never
reach for os.environ.
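A minimal sketch of the intended shape follows, assuming the client is handed a config object with a `credentials` mapping. `ProviderConfig` here is a hypothetical stand-in for whatever the router actually passes in, and the bearer-token header is only an example of how a provider might consume the key.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical stand-in for the config object supplied by the router."""
    base_url: str
    credentials: dict = field(default_factory=dict)


class MyProviderClient:
    def __init__(self, config: ProviderConfig):
        self._base_url = config.base_url.rstrip("/")
        # Already resolved and decrypted for the calling tenant;
        # no environment lookup happens in the client.
        self._api_key = config.credentials["api_key"]

    def auth_headers(self) -> dict:
        # Assumed auth scheme; the real provider's API dictates this.
        return {"Authorization": f"Bearer {self._api_key}"}
```

Keeping the lookup out of the client means the same class works unchanged for every tenant, and tests can inject a fake key without touching process state.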
6. Test against the ontology
- Unit tests in tests/plugins/<slug>/.
- The CI harness validates your mapping_spec.yaml against the active ontology: every mapped field must exist on the target entity type, or the TranslationRegistry build fails.
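The field-existence check can be approximated locally before CI runs. The sketch below is an assumption, not the harness itself: `unmapped_fields` is a hypothetical helper, and the ontology is reduced to a plain dict of entity type to allowed field names rather than being loaded from the real ontology files.

```python
def unmapped_fields(mapping_spec: dict, ontology: dict) -> list[str]:
    """List mapped fields that the target entity type does not define.

    `ontology` is a simplified stand-in: entity type -> set of field names.
    """
    problems = []
    for entity_type, mapping in mapping_spec.get("entity_mappings", {}).items():
        known = ontology.get(entity_type)
        if known is None:
            problems.append(f"{entity_type}: unknown entity type")
            continue
        for section in ("identity", "attributes"):
            for field_name in mapping.get(section, {}):
                if field_name not in known:
                    problems.append(
                        f"{entity_type}.{field_name}: not defined on entity type"
                    )
    return problems
```

A unit test in tests/plugins/<slug>/ could load the parsed mapping_spec.yaml, call a helper like this, and assert that the returned list is empty.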
What gets validated at boot
If anything is wrong — wrong target_schema, a field that isn't on the entity type, a malformed
credentials schema — the platform refuses to start, with every problem reported at once. That's the
safety net: a broken provider can't reach production silently.
Next: Add an OSINT module for async/agentic providers.