Data model & persistence

Atlas persists data in two complementary stores: PostgreSQL is the system of record for the canonical ontology and all operational data; Neo4j holds a synced property graph optimised for relationship traversal. MinIO stores binary report documents and Redis backs caching.

Two stores, one truth

	PostgreSQL	Neo4j
Role	System of record	Read projection
Holds	Entities, attributes, claims, relationships, mutations, and all operational tables	Entities & relationships as a property graph
Optimised for	Tenant-scoped CRUD, transactions, RLS	Multi-hop traversal (ownership, shared addresses, common connections)
Written by	API + Temporal activities	`Neo4jSyncService` from SQL changes

The graph is always downstream of PostgreSQL. Writes land in PostgreSQL first, then a sync activity projects them into Neo4j. A parity service audits divergence. See Graph sync.
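
The parity audit mentioned above can be pictured as a set comparison between the two stores. The following is a minimal sketch, not the actual parity service: `ParityReport` and `check_parity` are hypothetical names, and it assumes entity IDs have already been read from PostgreSQL and Neo4j into plain Python sets.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParityReport:
    # IDs present in PostgreSQL but not yet projected into the graph.
    missing_in_graph: frozenset[str]
    # IDs present in the graph but no longer in PostgreSQL.
    orphaned_in_graph: frozenset[str]

    @property
    def in_sync(self) -> bool:
        return not self.missing_in_graph and not self.orphaned_in_graph


def check_parity(pg_ids: set[str], graph_ids: set[str]) -> ParityReport:
    """Compare IDs from the system of record against the graph projection."""
    return ParityReport(
        missing_in_graph=frozenset(pg_ids - graph_ids),
        orphaned_in_graph=frozenset(graph_ids - pg_ids),
    )


report = check_parity({"e1", "e2", "e3"}, {"e2", "e3", "e4"})
```

Because PostgreSQL is the system of record, both kinds of divergence would be repaired by re-projecting from PostgreSQL, never by writing back from the graph.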

Core relational tables

The ontology core models entities and their attributes as claims, so that multiple providers can assert different values for the same field. An illustrative shape:

-- Representative structure — see src/database/entity_repository.py and migrations/
entities       (id, tenant_id, entity_type, name, properties JSONB, created_at, updated_at)
relationships  (id, tenant_id, source_id, target_id, rel_type, properties JSONB, …)
attributes     (id, entity_id, attr_name, attr_value, provider_id, confidence, source_url, …)
claims         (id, entity_id, claim_type, claim_data JSONB, supporting_evidence JSONB, …)
mutations      (id, entity_id, provider_id, mutation_type, before_value JSONB, after_value JSONB, …)
conflicts      (id, entity_id, mutation_ids, status, resolved_at, resolution_strategy)

Every table that holds tenant data carries a tenant_id and is guarded by an RLS policy. See Security & multi-tenancy.
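
To illustrate why attributes are stored as per-provider claims, the sketch below groups attribute rows by field and reports the fields where providers disagree. The field names follow the representative `attributes` columns above; `AttributeClaim`, `find_disagreements`, and the provider names are hypothetical and only serve the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeClaim:
    entity_id: str
    attr_name: str
    attr_value: str
    provider_id: str
    confidence: float


def find_disagreements(claims):
    """Return {(entity_id, attr_name): [claims]} for fields with more than one distinct value."""
    by_field = defaultdict(list)
    for claim in claims:
        by_field[(claim.entity_id, claim.attr_name)].append(claim)
    return {
        field: rows
        for field, rows in by_field.items()
        if len({row.attr_value for row in rows}) > 1
    }


claims = [
    AttributeClaim("e1", "registered_address", "1 High St", "provider_a", 0.9),
    AttributeClaim("e1", "registered_address", "1 High Street", "provider_b", 0.7),
    AttributeClaim("e1", "status", "active", "provider_a", 0.9),
    AttributeClaim("e1", "status", "active", "provider_b", 0.8),
]
disagreements = find_disagreements(claims)
```

Both values for `registered_address` are kept with their provider and confidence, so a later resolution step can choose between them without losing either source.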

Operational tables

Beyond the ontology core, PostgreSQL holds the operational domain:

Domain	Tables (representative)
Investigations	`investigations`, `investigation_logs`, `provider_runs`, `evidence`
Reports	`reports`
Companies	`companies`, portfolio status history
Risk	`risk_scores`, `risk_matrix_schemas`, `risk_matrix_assignments`, `risk_matrix_evaluations`
Tenancy & config	`tenants`, `data_provider_credentials`, `agent_configurations`, `agent_prompts`, `ontology_schema_versions`, `reference_datasets`

Repositories

Data access is mediated by ~21 repository classes under src/database/ rather than ad-hoc SQL in handlers. This keeps tenant-scoping, RLS, and query patterns in one place.

Repository	Responsibility
`OntologyPersistenceRepository`	Entity & relationship persistence, merges (`entity_repository.py`, large)
`CompanyRepository`	Company registry and subject-entity lookup
`InvestigationRepository` / `ReportRepository`	Investigation & report lifecycle
`EvidenceRepository`	Finding evidence artefacts
`ProviderRunRepository`	Provider execution logs
`RiskRepository`	Risk scores & indicators
`MutationRepository` / conflict repo	Provenance & conflict tracking (`src/mutation_queue/`)
`TenantRepository`, `SettingsRepository`, `SchemaRepository`, `SegmentsRepository`, `DataProviderRepository`	Tenancy, settings, schema versions, segments, providers

Connection pool & RLS

A single DatabasePool (asyncpg) is initialised at startup. The application connects through a restricted atlas_app role so that RLS is enforced rather than bypassed by a superuser. The current tenant is set on the session per request, and policies of the form tenant_id = current_setting('app.current_tenant_id') filter every query. See Security & multi-tenancy for the full model and the relevant ADR.

Migrations

Schema is versioned with Flyway (the migrations/ directory, 130+ versioned scripts) and applied automatically at pod/container startup before the API serves traffic. See Operations → Deployment.

Deep dive: pool, RLS binding, and JSONB discipline

This section reflects src/database/: connection.py, tenant_session.py, and the repositories.

Two connection patterns

Tenant-owned tables are always accessed through a session that sets app.current_tenant_id; truly platform-shared tables (provider registry, MCP servers, coverage) use a raw pool connection. The distinction is deliberate: without a tenant context, queries against RLS-protected tables return no rows.
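
The two patterns can be sketched as a pair of async context managers. This is an illustration under stated assumptions, not the contents of tenant_session.py: `tenant_connection` and `shared_connection` are hypothetical names, `pool` is assumed to behave like an asyncpg pool, and the sketch scopes the setting to a transaction through PostgreSQL's `set_config(..., true)`, which may differ from how the project binds the tenant.

```python
from contextlib import asynccontextmanager


@asynccontextmanager
async def tenant_connection(pool, tenant_id: str, setting: str = "app.current_tenant_id"):
    """Yield a connection whose open transaction carries the tenant context for RLS."""
    async with pool.acquire() as conn:
        async with conn.transaction():
            # The third argument (is_local = true) limits the value to this
            # transaction, so it does not leak to the next user of the
            # pooled connection.
            await conn.execute("SELECT set_config($1, $2, true)", setting, tenant_id)
            yield conn


@asynccontextmanager
async def shared_connection(pool):
    """Yield a raw pool connection for platform-shared tables (no tenant context)."""
    async with pool.acquire() as conn:
        yield conn
```

A caller would write `async with tenant_connection(pool, tenant_id) as conn:` around queries on tenant-owned tables, and `shared_connection(pool)` only for tables that have no RLS policy.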

Repositories own the SQL

Handlers never write ad-hoc SQL. ~21 repositories under src/database/ encapsulate query patterns, so tenant scoping, JSONB handling, and RLS are applied consistently. New endpoints inject a repository via Depends (see Backend).
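
As a rough picture of what "repositories own the SQL" means, the sketch below keeps the query text inside the class and exposes only a typed method to handlers. `EntityReadRepository` and `list_by_type` are hypothetical names, the columns come from the representative `entities` shape, and `conn` is assumed to be an asyncpg-style connection that already carries the tenant context. The Depends wiring is left out.

```python
class EntityReadRepository:
    """Hypothetical repository: handlers call methods and never see the SQL."""

    # No tenant_id predicate here: in this sketch the RLS policy on the
    # tenant-bound connection is what filters rows to the current tenant.
    _LIST_BY_TYPE = (
        "SELECT id, entity_type, name "
        "FROM entities "
        "WHERE entity_type = $1 "
        "ORDER BY name "
        "LIMIT $2"
    )

    def __init__(self, conn):
        self._conn = conn

    async def list_by_type(self, entity_type: str, limit: int = 50) -> list[dict]:
        rows = await self._conn.fetch(self._LIST_BY_TYPE, entity_type, limit)
        # Convert driver records to plain dicts before they leave the repository.
        return [dict(row) for row in rows]
```

Keeping the statement as a class constant means a change to query shape or tenant handling is made in one place instead of in every handler.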

JSONB decode discipline

A subtle but load-bearing rule: top-level JSONB columns (entity_data, provenance) are decoded once, at the top level only. Individual leaves are never re-parsed, so a numeric-looking registration number like "007" never silently coerces to an int. The claims read path (Claims & survivorship) depends on this to preserve exact source values.
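
The rule is easiest to see next to the anti-pattern it prevents. In this minimal sketch, `decode_jsonb_column` and `overeager_decode` are hypothetical helpers and the payload is made up; only the "007" case comes from the text above.

```python
import json


def decode_jsonb_column(raw):
    """Decode a JSONB column value once, at the top level only."""
    if isinstance(raw, (str, bytes, bytearray)):
        return json.loads(raw)
    return raw  # already a Python object, nothing more to decode


def overeager_decode(value):
    """Anti-pattern, shown for contrast: re-interprets every string leaf."""
    if isinstance(value, dict):
        return {key: overeager_decode(item) for key, item in value.items()}
    if isinstance(value, list):
        return [overeager_decode(item) for item in value]
    if isinstance(value, str) and value.isdigit():
        return int(value)  # "007" silently becomes 7 here
    return value


raw = '{"registration_number": "007", "employees": 12}'
entity_data = decode_jsonb_column(raw)
coerced = overeager_decode(entity_data)
```

After the single top-level decode, `entity_data["registration_number"]` is still the string `"007"`, while the over-eager variant has lost the leading zeros.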

asyncpg parameter binding

Per project convention, JSON parameters use CAST(:param AS jsonb) rather than a ::jsonb cast on a bind parameter. asyncpg binds positionally, and the explicit CAST keeps parameterization correct.
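
A statement written to this convention might look like the sketch below. The `:name` placeholders suggest a named-parameter layer in front of asyncpg, which is an assumption here; the statement, `UPDATE_PROPERTIES_SQL`, and `build_update_params` are illustrative, with the table and columns taken from the representative `entities` shape.

```python
import json

# Named-parameter SQL in the style the convention describes. The explicit CAST
# avoids writing "::" directly against a ":name" placeholder.
UPDATE_PROPERTIES_SQL = """
UPDATE entities
SET properties = CAST(:properties AS jsonb),
    updated_at = now()
WHERE id = :entity_id
"""


def build_update_params(entity_id: str, properties: dict) -> dict:
    """Serialise the JSON payload to text; the CAST in the SQL turns it into jsonb."""
    return {"entity_id": entity_id, "properties": json.dumps(properties)}


params = build_update_params("e1", {"registration_number": "007"})
```

The payload travels as a bound text parameter and is never spliced into the SQL string, so parameterization is preserved.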

To add a tenant-scoped table and endpoint, see Add a frontend feature → backend.
