Add a data provider (plugin)

A data provider is integrated as a plugin — a directory under plugins/ with two YAML contracts and, for sync providers, a thin client class. This guide walks the full path. Read Plugins and Ingestion pipelines first.

1. Create the plugin directory

mkdir plugins/<slug>
# copy plugins/kvk as a reference template

A plugin needs at minimum plugin.yaml and mapping_spec.yaml; the TranslationRegistry only discovers directories that contain both (and it skips names starting with _, e.g. _example).
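The discovery rule above can be sketched in a few lines. This is an illustration only, not the TranslationRegistry implementation; the function name `discover_plugin_dirs` and the `REQUIRED_FILES` constant are hypothetical.

```python
from pathlib import Path

# The two contracts a directory must contain to count as a plugin.
REQUIRED_FILES = ("plugin.yaml", "mapping_spec.yaml")


def discover_plugin_dirs(plugins_root: Path) -> list[str]:
    """Return the slugs of directories that qualify as plugins (hypothetical helper)."""
    slugs = []
    for entry in sorted(plugins_root.iterdir()):
        if not entry.is_dir():
            continue  # stray files under plugins/ are ignored
        if entry.name.startswith("_"):
            continue  # names such as _example are skipped
        if all((entry / name).is_file() for name in REQUIRED_FILES):
            slugs.append(entry.name)
    return slugs
```

A directory that holds only one of the two files is silently left out under this sketch, so a new provider that does not show up is worth checking for a missing mapping_spec.yaml first.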

2. Author plugin.yaml

The schema is formally versioned (plugin_schema_version: "1") and validated by PluginLoader against Pydantic models, with the credentials.schema meta-validated as JSON Schema Draft-7.

plugin_schema_version: "1"
plugin: "<slug>" # must equal the directory name
version: "1.0.0"
display_name: "My Registry"
description: "Company registry for X"
provider_type: "company_registry"
allow_platform_fallback: false # secure-by-default; tenants bring their own creds

connection:
  base_url: "https://api.example.com" # async plugins use an async:// sentinel
  rate_limits:
    requests_per_minute: 60

credentials:
  schema: # JSON Schema Draft-7; drives the Studio credentials form
    $schema: "http://json-schema.org/draft-07/schema#"
    type: object
    required: ["api_key"]
    properties:
      api_key: { type: string, minLength: 8, format: "password" }

capabilities:
  jurisdictions: ["RO"]
  entity_types: ["LegalEntity", "Address"]
  endpoints: ["search", "details"]

execution:
  mode: "sync" # "sync" (HTTP) or "async" (Temporal)
  # workflow_name: "MyWorkflow" # REQUIRED when mode: async

Fail-loud validation

PluginLoader aggregates all errors: a missing file, a bad field, and an invalid credentials schema surface together on first load, each formatted as path: field: message. A mis-authored plugin raises PluginValidationError, which surfaces as HTTP 503.
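The collect-then-report pattern can be pictured with a stdlib-only sketch. It is not PluginLoader (which validates against Pydantic models and JSON Schema Draft-7); the function name `collect_plugin_errors` is hypothetical, the manifest is assumed to be already parsed into a dict, and only rules stated in this guide are checked.

```python
from pathlib import Path


def collect_plugin_errors(plugin_dir: Path, manifest: dict) -> list[str]:
    """Gather every problem instead of stopping at the first (hypothetical helper)."""
    errors = []
    manifest_path = plugin_dir / "plugin.yaml"

    # Both contract files must be present.
    for name in ("plugin.yaml", "mapping_spec.yaml"):
        if not (plugin_dir / name).is_file():
            errors.append(f"{plugin_dir / name}: <file>: missing")

    # The plugin field must equal the directory name.
    if manifest.get("plugin") != plugin_dir.name:
        errors.append(
            f"{manifest_path}: plugin: must equal directory name {plugin_dir.name!r}"
        )

    # execution.mode is "sync" or "async"; async requires workflow_name.
    execution = manifest.get("execution", {})
    mode = execution.get("mode")
    if mode not in ("sync", "async"):
        errors.append(f"{manifest_path}: execution.mode: must be 'sync' or 'async'")
    elif mode == "async" and not execution.get("workflow_name"):
        errors.append(
            f"{manifest_path}: execution.workflow_name: required when mode is async"
        )
    return errors
```

Returning a list rather than raising on the first failure is what lets an author fix a missing file, a wrong slug, and a missing workflow name in one pass.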

3. Author mapping_spec.yaml

This maps the provider's fields to the ontology. The target_schema must match the active ontology family exactly (ontology_schema_v3_5) or the boot fails.

plugin: "<slug>"
version: "1.0.0"
target_schema: "ontology_schema_v3_5"

entity_mappings:
  LegalEntity:
    source: "$.company" # JSONPath root
    identity:
      registration_number: "$.regNo"
      jurisdiction: "'RO'"
    attributes:
      legal_name: "$.name"
      entity_type:
        source: "$.form"
        transform: "form_to_entity_type"

relationship_mappings:
  RegisteredAt:
    source_entity: "LegalEntity"
    target_entity: "Address"
    attributes: { address_type: "'registered'" }

transforms:
  form_to_entity_type:
    "SRL": "Private Limited Company"
    _default: "Other"

The per-field confidence and the survivorship weights for your provider live in the ontology's provider_trust — add a sub-block for your provider there so survivorship knows how much to trust it.
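To make the expression styles in the mapping spec concrete, here is a minimal sketch of how they could be evaluated. It assumes three things the guide does not spell out: a value wrapped in single quotes is a constant, field paths are resolved relative to the entity's source root, and _default is the fallback when a transform table has no matching key. The helpers `resolve` and `apply_transform` are hypothetical and handle only simple dotted paths, not full JSONPath.

```python
from typing import Any


def resolve(expr: str, record: dict) -> Any:
    """Evaluate a quoted constant or a simple '$.a.b' path (hypothetical helper)."""
    if len(expr) >= 2 and expr[0] == expr[-1] == "'":
        return expr[1:-1]  # "'RO'" is treated as the literal RO
    if expr.startswith("$."):
        value: Any = record
        for key in expr[2:].split("."):
            value = value[key]  # raises KeyError if the provider omitted the field
        return value
    raise ValueError(f"unsupported expression: {expr!r}")


def apply_transform(table: dict, value: str) -> str:
    """Look up value in a transform table, falling back to _default (assumed)."""
    return table.get(value, table["_default"])


# Sample payload shaped like the spec's "$.company" root; the values are made up.
payload = {"company": {"regNo": "J40/1", "name": "Acme", "form": "SRL"}}
company = resolve("$.company", payload)
transforms = {"SRL": "Private Limited Company", "_default": "Other"}

entity = {
    "registration_number": resolve("$.regNo", company),
    "jurisdiction": resolve("'RO'", company),
    "legal_name": resolve("$.name", company),
    "entity_type": apply_transform(transforms, resolve("$.form", company)),
}
```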

4. Implement the provider (sync)

For mode: sync, implement a DataProvider subclass (see src/integrations/base.py) and register it:

# plugins/<slug>/client.py (representative)
from src.integrations.base import DataProvider, ProviderResponse, ProviderEntity
from src.integrations.provider_router import register_provider


@register_provider("<slug>")
class MyProvider(DataProvider):
    name = "<slug>"
    supported_countries = ["RO"]
    trust_level = 0.9

    async def fetch_company_complete(self, registration_number, country_code, **options):
        raw = await self._get(f"/details?reg={registration_number}")
        return ProviderResponse(
            raw_response=raw,
            entities=[ProviderEntity(
                entity_type="LegalEntity",
                external_id=raw["id"],
                data={"legal_name": raw["name"], "registration_number": raw["regNo"]},
                confidence=0.95,
                source_provider="<slug>",
            )],
            relationships=[],
            metadata={},
        )

For mode: async, you don't write a sync client — you declare a workflow_name and implement the workflow/activities (see Add an OSINT module for the agentic pattern).

5. Credentials — never read env vars directly

Credentials are resolved by the ProviderRouter through a five-branch chain and decrypted from per-tenant encrypted storage. Your client receives config.credentials already resolved — never reach for os.environ.
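In practice this means the client takes its secrets from the config object it is handed. The sketch below illustrates that shape only: `ProviderConfig`, `MyProviderClient`, and the bearer-token header are hypothetical stand-ins, not the platform's actual classes or the auth scheme of any real provider.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical stand-in for the config the router passes to a client."""
    base_url: str
    credentials: dict = field(default_factory=dict)


class MyProviderClient:
    """Reads secrets only from the injected config, never from os.environ."""

    def __init__(self, config: ProviderConfig) -> None:
        self._base_url = config.base_url.rstrip("/")
        # Already resolved and decrypted upstream; a missing key fails loudly here.
        self._api_key = config.credentials["api_key"]

    def auth_headers(self) -> dict:
        # Assumed header scheme, for illustration only.
        return {"Authorization": f"Bearer {self._api_key}"}
```

Keeping the lookup in the constructor also makes the client easy to unit test: a test builds a config with a dummy key and never touches the process environment.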

6. Test against the ontology

  • Unit tests in tests/plugins/<slug>/.
  • The CI harness validates your mapping_spec.yaml against the active ontology — every mapped field must exist on the target entity type, or the TranslationRegistry build fails.
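The two checks described above (matching target_schema, and every mapped field existing on the target entity type) can be reproduced locally with a small sketch. It assumes the ontology can be reduced to a dict of entity type to field names; that shape and the function name `check_mapping` are hypothetical, and the real harness may check more.

```python
def check_mapping(spec: dict, ontology: dict, active_schema: str) -> list[str]:
    """Return every mismatch between a parsed mapping spec and the ontology."""
    problems = []

    # The target schema must match the active ontology family exactly.
    target = spec.get("target_schema")
    if target != active_schema:
        problems.append(f"target_schema: expected {active_schema!r}, got {target!r}")

    # Every identity and attribute field must exist on the entity type.
    for entity_type, mapping in spec.get("entity_mappings", {}).items():
        known_fields = ontology.get(entity_type)
        if known_fields is None:
            problems.append(f"entity_mappings.{entity_type}: unknown entity type")
            continue
        mapped = {**mapping.get("identity", {}), **mapping.get("attributes", {})}
        for field_name in mapped:
            if field_name not in known_fields:
                problems.append(
                    f"entity_mappings.{entity_type}.{field_name}: "
                    f"not a field of {entity_type}"
                )
    return problems
```

A unit test under tests/plugins/<slug>/ could load mapping_spec.yaml, call a check like this, and assert that the returned list is empty.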

What gets validated at boot

If anything is wrong — wrong target_schema, a field that isn't on the entity type, a malformed credentials schema — the platform refuses to start, with every problem reported at once. That's the safety net: a broken provider can't reach production silently.

Next: Add an OSINT module for async/agentic providers.