Data model & persistence
Atlas persists data in two complementary stores: PostgreSQL is the system of record for the canonical ontology and all operational data, and Neo4j holds a synced property graph optimised for relationship traversal. Alongside them, MinIO stores binary report documents and Redis provides caching.
Two stores, one truth
| | PostgreSQL | Neo4j |
|---|---|---|
| Role | System of record | Read projection |
| Holds | Entities, attributes, claims, relationships, mutations, and all operational tables | Entities & relationships as a property graph |
| Optimised for | Tenant-scoped CRUD, transactions, RLS | Multi-hop traversal (ownership, shared addresses, common connections) |
| Written by | API + Temporal activities | Neo4jSyncService from SQL changes |
The graph is always downstream of PostgreSQL. Writes land in PostgreSQL first, then a sync activity projects them into Neo4j. A parity service audits divergence. See Graph sync.
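The parity audit can be pictured as a set comparison between the two stores. The sketch below is illustrative only: `ParityReport` and `diff_entity_ids` are hypothetical names, and the real parity service may well compare more than entity ids (relationships, properties, timestamps).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParityReport:
    """Hypothetical result type: ids present in one store but not the other."""

    missing_in_graph: frozenset[str]
    orphaned_in_graph: frozenset[str]

    @property
    def in_parity(self) -> bool:
        return not self.missing_in_graph and not self.orphaned_in_graph


def diff_entity_ids(postgres_ids, neo4j_ids) -> ParityReport:
    """Compare entity ids from the system of record against the graph projection."""
    pg, graph = frozenset(postgres_ids), frozenset(neo4j_ids)
    return ParityReport(
        missing_in_graph=pg - graph,   # written to PostgreSQL, not yet projected
        orphaned_in_graph=graph - pg,  # in Neo4j with no PostgreSQL source row
    )
```

Because PostgreSQL is the system of record, any divergence in this sketch would be resolved by re-projecting from SQL, never by writing back from the graph.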
Core relational tables
The ontology core models entities and their attributes as claims, so that multiple providers can assert different values for the same field. The shape below is illustrative:
-- Representative structure — see src/database/entity_repository.py and migrations/
entities (id, tenant_id, entity_type, name, properties JSONB, created_at, updated_at)
relationships (id, tenant_id, source_id, target_id, rel_type, properties JSONB, …)
attributes (id, entity_id, attr_name, attr_value, provider_id, confidence, source_url, …)
claims (id, entity_id, claim_type, claim_data JSONB, supporting_evidence JSONB, …)
mutations (id, entity_id, provider_id, mutation_type, before_value JSONB, after_value JSONB, …)
conflicts (id, entity_id, mutation_ids, status, resolved_at, resolution_strategy)
Every table that holds tenant data carries a tenant_id and is guarded by an RLS policy. See
Security & multi-tenancy.
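To make the claims idea concrete, here is a minimal sketch of two providers asserting different values for the same field. `AttributeClaim` mirrors a subset of the representative `attributes` columns above; the class, the `group_by_attribute` helper, and the sample provider ids and values are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeClaim:
    """Subset of the representative `attributes` columns."""

    entity_id: str
    attr_name: str
    attr_value: str
    provider_id: str
    confidence: float


def group_by_attribute(claims):
    """Collect every provider's assertion per attribute, without picking a winner."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[claim.attr_name].append(claim)
    return dict(grouped)


# Sample data (invented): two providers disagree on the same field.
claims = [
    AttributeClaim("ent-1", "registered_address", "1 High St", "provider_a", 0.9),
    AttributeClaim("ent-1", "registered_address", "1 High Street", "provider_b", 0.7),
    AttributeClaim("ent-1", "status", "active", "provider_a", 0.95),
]
by_attr = group_by_attribute(claims)
```

Both address values are kept side by side; choosing which one to surface is a separate concern and is not shown here.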
Operational tables
Beyond the ontology core, PostgreSQL holds the operational domain:
| Domain | Tables (representative) |
|---|---|
| Investigations | investigations, investigation_logs, provider_runs, evidence |
| Reports | reports |
| Companies | companies, portfolio status history |
| Risk | risk_scores, risk_matrix_schemas, risk_matrix_assignments, risk_matrix_evaluations |
| Tenancy & config | tenants, data_provider_credentials, agent_configurations, agent_prompts, ontology_schema_versions, reference_datasets |
Repositories
Data access is mediated by ~21 repository classes under src/database/ rather than ad-hoc SQL in
handlers. This keeps tenant-scoping, RLS, and query patterns in one place.
| Repository | Responsibility |
|---|---|
| OntologyPersistenceRepository | Entity & relationship persistence, merges (entity_repository.py, large) |
| CompanyRepository | Company registry and subject-entity lookup |
| InvestigationRepository / ReportRepository | Investigation & report lifecycle |
| EvidenceRepository | Finding evidence artefacts |
| ProviderRunRepository | Provider execution logs |
| RiskRepository | Risk scores & indicators |
| MutationRepository / conflict repo | Provenance & conflict tracking (src/mutation_queue/) |
| TenantRepository, SettingsRepository, SchemaRepository, SegmentsRepository, DataProviderRepository | Tenancy, settings, schema versions, segments, providers |
Connection pool & RLS
A single DatabasePool (asyncpg) is initialised at startup. The application connects through a
restricted atlas_app role so that RLS is enforced rather than bypassed by a superuser. The
current tenant is set on the session per request, and policies of the form
tenant_id = current_setting('app.current_tenant_id') filter every query. See
Security & multi-tenancy for the full model and the relevant ADR.
Migrations
Schema is versioned with Flyway (the migrations/ directory, 130+ versioned scripts) and
applied automatically at pod/container startup before the API serves traffic. See
Operations → Deployment.
Deep dive: pool, RLS binding, and JSONB discipline
This reflects src/database/ — connection.py, tenant_session.py, the repositories.
Two connection patterns
Tenant-owned tables are always read through a session that sets app.current_tenant_id; truly
platform-shared tables (provider registry, MCP servers, coverage) use a raw pool connection. The
distinction is deliberate — RLS-protected tables must carry a tenant context or they return
nothing.
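One way to picture the tenant-bound pattern is a small context manager that sets the tenant before yielding the connection. This is a sketch under assumptions, not the contents of tenant_session.py: `tenant_connection` is a hypothetical name, `pool` is assumed to behave like an asyncpg pool, and the setting is scoped to a transaction here so it cannot leak to the next borrower of a pooled connection.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def tenant_connection(pool, tenant_id: str):
    """Yield a connection whose transaction carries the tenant context.

    `pool` is assumed to be an asyncpg-style pool (hypothetical wiring).
    """
    async with pool.acquire() as conn:
        async with conn.transaction():
            # set_config(name, value, is_local): with is_local=true the value
            # is discarded when the transaction ends.
            await conn.execute(
                "SELECT set_config('app.current_tenant_id', $1, true)",
                tenant_id,
            )
            yield conn
```

Platform-shared tables would skip this wrapper and use a plain `pool.acquire()`, which is why querying an RLS-protected table that way returns no rows.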
Repositories own the SQL
Handlers never write ad-hoc SQL. ~21 repositories under src/database/ encapsulate query patterns,
so tenant scoping, JSONB handling, and RLS are applied consistently. New endpoints inject a
repository via Depends (see Backend).
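A repository in this style might look like the sketch below. Everything in it is hypothetical: `InvestigationReader` is not one of the actual repository classes, the `status` and `created_at` columns are assumed, and the connection is assumed to be an asyncpg-style connection that already carries the tenant context. A handler would receive an instance through `Depends` instead of building the query itself.

```python
class InvestigationReader:
    """Hypothetical minimal repository: owns its SQL, takes a tenant-bound connection."""

    # Column names are assumptions for illustration.
    _LIST_SQL = (
        "SELECT id, status FROM investigations "
        "ORDER BY created_at DESC LIMIT $1"
    )

    def __init__(self, conn):
        self._conn = conn  # assumed asyncpg-style connection with tenant context set

    async def list_recent(self, limit: int = 20):
        # No tenant_id predicate: the RLS policy is assumed to filter the rows.
        rows = await self._conn.fetch(self._LIST_SQL, limit)
        return [dict(row) for row in rows]
```

Keeping the query text inside the class means a change to tenant scoping or column handling touches one file, not every handler.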
JSONB decode discipline
A subtle but load-bearing rule: top-level JSONB columns (entity_data, provenance) are decoded at
the top level only — individual leaves are not re-parsed — so a numeric-looking registration
number like "007" never silently coerces to an int. The claims read path
(Claims & survivorship)
depends on this to preserve exact source values.
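The rule can be illustrated with the standard `json` module. `decode_jsonb_column` is a hypothetical helper and the sample payload is invented; the point is only that the column is decoded once and string leaves are returned exactly as stored.

```python
import json


def decode_jsonb_column(raw):
    """Decode a JSONB column value exactly once, leaving every leaf untouched."""
    if raw is None or isinstance(raw, (dict, list)):
        return raw  # NULL, or already decoded by the driver
    return json.loads(raw)


# Invented sample: a registration number with leading zeros, a real number,
# and a string leaf that happens to contain JSON text.
row_value = '{"registration_number": "007", "employees": 12, "note": "{\\"nested\\": 1}"}'
entity_data = decode_jsonb_column(row_value)

print(repr(entity_data["registration_number"]))  # '007'
print(type(entity_data["note"]).__name__)        # str
```

A decoder that walked the result and tried to re-parse or numerically coerce each string leaf would turn `"007"` into `7` and lose the source value.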
asyncpg parameter binding
Per project convention, JSON parameters use CAST(:param AS jsonb) rather than a ::jsonb cast on a
bind parameter — asyncpg binds positionally and the explicit CAST keeps parameterization correct.
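As a sketch of the convention, the statement below updates the `properties` column from the representative `entities` shape. It assumes a named-parameter layer that translates `:name` placeholders into positional ones before the query reaches asyncpg; which layer the project uses is not stated here, and `build_update_params` is a hypothetical helper.

```python
import json

# Explicit CAST keeps the placeholder unambiguous; note there is no "::" in the text.
UPDATE_PROPERTIES_SQL = (
    "UPDATE entities "
    "SET properties = CAST(:properties AS jsonb), updated_at = now() "
    "WHERE id = :entity_id"
)


def build_update_params(entity_id: str, properties: dict) -> dict:
    """Serialise the payload to JSON text; the CAST in the SQL converts it to jsonb."""
    return {"entity_id": entity_id, "properties": json.dumps(properties)}


params = build_update_params("ent-1", {"registration_number": "007"})
```

The payload travels as a plain text parameter and the database performs the conversion, so the statement never interpolates JSON into the SQL string.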
To add a tenant-scoped table and endpoint, see Add a frontend feature → backend.