ADR-022: Multi-Tenancy Architecture
| Field | Value |
|---|---|
| Status | Draft |
| Date | 2026-04-08 |
| Relates | ADR-008, ADR-008a, ADR-009, ADR-019, ADR-020, ADR-021 |
1. Problem Statement
Atlas currently operates as a single-tenant application. While the auth layer correctly extracts a tenant_id from the Keycloak JWT issuer claim (realm name), the database schema, API layer, Neo4j graph, file storage, and workflow engine do not enforce tenant isolation. An authenticated user can access data belonging to any tenant in the system.
For Atlas to serve multiple compliance teams (each representing a separate obliged entity under AMLR), every layer must enforce strict tenant boundaries. A compliance officer at Organization A must never see, query, or influence data belonging to Organization B — this is both a security requirement and a regulatory one, since AML data includes personal information (PEP status, beneficial ownership, sanctions matches) protected under GDPR.
2. Decision: One Keycloak Realm Per Tenant, Row-Level Tenant Isolation
Each organization gets its own Keycloak realm, providing full isolation of users, roles, sessions, and identity provider configuration. The realm name maps directly to the tenant identifier used throughout the platform.
Data isolation uses row-level tenant scoping: every tenant-owned table includes a tenant_id column, and every query filters by it. This is simpler to implement and operate than schema-per-tenant or database-per-tenant, and scales to hundreds of tenants without operational overhead.
3. Keycloak Realm Architecture
One Realm Per Organization
Each tenant gets a dedicated Keycloak realm with:
- Independent user directory (no shared users between tenants)
- Tenant-specific identity provider configuration (Azure AD, Google Workspace, SAML, username/password)
- Tenant-specific MFA policies and password complexity rules
- Tenant-specific roles (analyst, compliance_officer, admin) that map to Atlas permissions
- Session isolation (no cross-tenant session leakage)
Realm Provisioning
New realms are created via the Keycloak Admin REST API during tenant onboarding:
- Create realm with naming convention:
atlas-{org_slug}(e.g.,atlas-acme) - Create the
atlas-frontendclient with appropriate redirect URIs - Configure default roles:
analyst,compliance_officer,admin - Set password policies per tenant's requirements
- Optionally configure external IdP (Azure AD, Google Workspace)
Tenant Discovery at Login
Subdomain-based routing: acme.atlas.app resolves to realm atlas-acme. The frontend reads the subdomain, constructs the Keycloak OIDC discovery URL, and initiates the auth flow against the correct realm. No organization selector screen needed.
Fallback for custom domains: a tenant configuration table maps custom domains to realm names.
JWT to Tenant Mapping
The existing extract_tenant_from_issuer() in auth.py already handles this:
JWT issuer: https://auth.atlas.app/realms/atlas-acme
Extracted tenant_id: atlas-acme
The AuthContext.tenant_id is populated on every request. What's missing is enforcement downstream.
4. Database Tenant Isolation
Strategy: Row-Level Isolation with tenant_id Column
Every tenant-owned table gets a tenant_id VARCHAR(100) NOT NULL column. Queries include WHERE tenant_id = $tenant_id in all operations. PostgreSQL Row-Level Security (RLS) is added as a defense-in-depth layer.
Tenant ID Type
VARCHAR(100) rather than UUID because the tenant ID is the Keycloak realm name (a human-readable slug), not a generated UUID. This makes debugging, logging, and manual queries straightforward.
Tables Requiring tenant_id
Tier 1 — Core Data (must ship together)
| Table | Current State | Notes |
|---|---|---|
companies | No tenant_id | Portfolio is per-tenant |
investigations | No tenant_id | Investigation data is per-tenant |
ontology_entities | No tenant_id | Entity graph is per-tenant |
ontology_relationships | No tenant_id | Relationship graph is per-tenant |
ontology_findings | No tenant_id | Findings are per-tenant |
ontology_risk_indicators | No tenant_id | Risk indicators are per-tenant |
risk_matrix_schemas | No tenant_id | Each tenant has their own matrices |
risk_matrix_evaluations | No tenant_id | Evaluation results are per-tenant |
company_matrix_assignments | No tenant_id | Assignments are per-tenant |
investigation_evidence | No tenant_id | Evidence is per-tenant |
workflow_schemas | No tenant_id | Workflow definitions are per-tenant |
workflow_executions | No tenant_id | Execution state is per-tenant |
workflow_tasks | No tenant_id | Tasks are per-tenant |
portal_drafts | No tenant_id | Portal content is per-tenant |
Tier 2 — Supporting Data
| Table | Current State | Notes |
|---|---|---|
data_providers | No tenant_id | Provider credentials are per-tenant |
data_provider_responses | No tenant_id | Response data is per-tenant |
ontology_mutations | No tenant_id | Mutation queue is per-tenant |
conflict_records | Has tenant_id | Already correct |
conflict_review_tasks | Has tenant_id | Already correct |
evaluation_runs | No tenant_id | Test runs are per-tenant |
events | No tenant_id | Event stream is per-tenant |
entity_projections | No tenant_id | Projections are per-tenant |
entity_cache | No tenant_id | Cache is per-tenant |
Shared / System-Wide (no tenant_id needed)
| Table | Reason |
|---|---|
reference_data_types | System-wide type definitions (is_system=true for built-in) |
reference_datasets | Already has tenant_id with NULL = system default |
ontology_schema_versions | Schema definitions are shared (data is tenant-scoped) |
app_settings | System configuration |
prompt_templates | System-wide AI prompts |
segments | Shared segment definitions |
Migration Strategy
- V103: Add
tenant_id VARCHAR(100) NULLto all Tier 1 tables - V104: Backfill existing data with a default tenant (
'default'or the single existing realm name) - V105: Add
NOT NULLconstraint after backfill - V106: Add composite indexes on
(tenant_id, id)and(tenant_id, status)for common query patterns - V107: Enable PostgreSQL Row-Level Security as defense-in-depth
Row-Level Security (Defense in Depth)
RLS policies ensure that even if application code misses a WHERE clause, the database rejects cross-tenant access:
ALTER TABLE companies ENABLE ROW LEVEL SECURITY;
CREATE POLICY companies_tenant_isolation ON companies
USING (tenant_id = current_setting('app.current_tenant_id'));
-- Set per-connection at the start of each request
SET app.current_tenant_id = 'atlas-acme';
The API middleware sets app.current_tenant_id on each database connection before executing queries. RLS acts as a safety net — the application queries should still include explicit WHERE tenant_id = $1 for clarity and index usage, but RLS catches any that miss it.
5. API Layer Enforcement
Tenant Middleware
Every request already has AuthContext.tenant_id from the JWT. A new middleware layer (or extension of the existing auth middleware) must:
- Extract
tenant_idfromAuthContext - Set
app.current_tenant_idon the database connection (for RLS) - Pass
tenant_idto all repository calls
Repository Pattern
All repository methods gain a tenant_id parameter. The mutation_queue repository is the reference implementation:
# CORRECT pattern (from mutation_queue/repository.py)
async def list_open(self, tenant_id: str) -> list[dict]:
return await self._pool.fetch(
"SELECT * FROM conflict_records WHERE tenant_id = $1 AND status = 'open'",
tenant_id,
)
Every repository must follow this pattern. List operations filter by tenant. Get-by-ID operations verify tenant ownership. Create operations set tenant_id. Update and delete operations include tenant_id in the WHERE clause.
Cross-Tenant Access Prevention
GET /investigations/{id}must verifyinvestigation.tenant_id == auth.tenant_idGET /companiesmust return onlyWHERE tenant_id = auth.tenant_id- All search and list endpoints must include tenant filtering
- Any endpoint that accepts an ID must verify tenant ownership before returning data
6. Neo4j Graph Tenant Isolation
Strategy: tenant_id Property + Label-Based Partitioning
Every node in Neo4j gets a tenant_id property. Additionally, tenant-specific labels are applied for performance:
// Node creation
CREATE (e:LegalEntity:Tenant_atlas_acme {
id: $id,
tenant_id: 'atlas-acme',
name: 'Acme Corp',
...
})
// Tenant-scoped query
MATCH (e:LegalEntity:Tenant_atlas_acme {id: $id})
RETURN e
The dual approach (property + label) provides:
- Property: Used in WHERE clauses for flexibility, stored in indexes
- Label: Used for partition-like performance — Neo4j can skip entire label groups during traversal. Index creation per tenant label ensures fast lookups.
CypherGenerator Changes
upsert_entity() gains a tenant_id parameter. All generated Cypher includes tenant scoping:
def upsert_entity(
entity_type: str,
entity_id: str,
entity_data: dict,
tenant_id: str, # NEW
company_id: str | None = None,
investigation_id: str | None = None,
) -> GeneratedQuery:
tenant_label = f"Tenant_{tenant_id.replace('-', '_')}"
return GeneratedQuery(
cypher=f"""
MERGE (e:{entity_type}:{tenant_label} {{id: $id}})
SET e += $props, e.tenant_id = $tenant_id
""",
params={"id": entity_id, "props": entity_data, "tenant_id": tenant_id},
)
neo4j_sync.py Changes
The sync process receives tenant_id from the source PostgreSQL row and includes it in all node/relationship creation. The sync query filters source rows by tenant.
Graph Query Isolation
All graph repository methods must include tenant scoping:
// Entity lookup
MATCH (e:LegalEntity {id: $id, tenant_id: $tenant_id})
RETURN e
// Relationship traversal — CRITICAL: both ends must be tenant-scoped
MATCH (a {tenant_id: $tenant_id})-[r]-(b {tenant_id: $tenant_id})
WHERE a.id = $entity_id
RETURN a, r, b
// Never traverse across tenant boundaries
7. File Storage Tenant Isolation
MinIO Object Key Prefixing
All file storage keys include the tenant as a prefix:
Current: evidence/{investigation_id}/{filename}
New: {tenant_id}/evidence/{investigation_id}/{filename}
This provides:
- Logical isolation in a shared bucket
- Easy per-tenant storage metrics (prefix-based size queries)
- Option to use separate buckets per tenant in the future (configuration change, not code change)
Presigned URL Validation
When generating presigned URLs for file access, the API must verify that the requesting user's tenant_id matches the file's tenant prefix. The object key itself encodes the tenant boundary.
8. Temporal Workflow Tenant Isolation
Workflow Execution Context
The Temporal workflow execution payload must include tenant_id:
@dataclass
class InvestigationContext:
investigation_id: str
company_id: str
tenant_id: str # NEW — propagated to all activities
...
Activity Tenant Scoping
Every Temporal activity receives tenant_id through the context and passes it to all repository and service calls. Activities must not query data outside the tenant boundary.
Task Queue Isolation
Option A: Shared task queue, tenant_id in payload (simpler, current approach extended)
Option B: Per-tenant task queues (investigation-queue-atlas-acme) for full isolation
Recommendation: Option A for now. Per-tenant queues add operational complexity (queue management, worker scaling per tenant) without a clear benefit until tenant count is high.
9. Shared vs Tenant-Scoped Resources
Shared (System-Wide)
| Resource | Rationale |
|---|---|
| Ontology schema definitions | Schema structure is platform-wide; data is tenant-scoped |
| Reference data types (is_system=true) | Built-in category types are shared |
| System default reference datasets (tenant_id IS NULL) | FATF lists, EU high-risk countries are global facts |
| Prompt templates | AI prompts are platform-managed |
| Application settings | System configuration |
Tenant-Scoped
| Resource | Rationale |
|---|---|
| Companies and investigations | Core business data |
| Ontology entity data | KYC/CDD data is confidential per tenant |
| Risk matrices and evaluations | Each tenant has their own risk methodology |
| Workflow schemas and executions | Each tenant has their own compliance processes |
| Evidence and findings | Investigation artifacts are per-tenant |
| Custom reference datasets (tenant_id IS NOT NULL) | Tenant-specific risk classifications |
| User-created category types (ADR-021) | Per-tenant risk category customization |
| Data provider configurations | Per-tenant API credentials and settings |
| Neo4j graph data | Entity relationship graph is per-tenant |
Tenant Override Pattern (Already Implemented)
The reference data resolver already supports tenant overrides: tenant-specific datasets take precedence over system defaults. This pattern extends naturally — a tenant can customize country risk lists while inheriting the system default for datasets they haven't overridden.
10. Implementation Phases
Phase 1: Database Foundation
- Add
tenant_idto all Tier 1 tables (nullable) - Backfill existing data with default tenant
- Add NOT NULL constraints
- Add composite indexes
- Enable RLS policies
Phase 2: API & Repository Enforcement
- Add
tenant_idparameter to all repository methods - Update all API routes to pass
tenant_idfrom AuthContext - Add tenant verification middleware (get-by-ID verifies ownership)
- Integration tests for cross-tenant access prevention
Phase 3: Neo4j Graph Isolation
- Add
tenant_idproperty to all nodes - Add tenant-specific labels
- Update CypherGenerator for tenant scoping
- Update neo4j_sync to include tenant context
- Update all graph queries
Phase 4: File Storage & Workflows
- Add tenant prefix to MinIO object keys
- Migrate existing files to tenant-prefixed paths
- Add
tenant_idto Temporal workflow context - Update all activities for tenant scoping
Phase 5: Keycloak Realm Provisioning
- Tenant onboarding API (create realm, configure client, set up roles)
- Subdomain-based realm discovery in frontend
- Custom domain mapping table
- Tenant admin UI for managing their realm settings
11. Testing Strategy
Isolation Tests
Every repository method must have a test that:
- Creates data for Tenant A
- Creates data for Tenant B
- Queries as Tenant A and verifies only Tenant A's data is returned
- Attempts to access Tenant B's data by ID and verifies rejection (403 or 404)
Neo4j Isolation Tests
- Create entities for both tenants
- Run graph traversal as Tenant A
- Verify no Tenant B nodes or relationships appear in results
- Verify that relationship traversal cannot cross tenant boundaries
RLS Verification
- Set
app.current_tenant_idto Tenant A - Run a raw SELECT without WHERE clause
- Verify RLS policy filters results to Tenant A only
12. Open Questions
-
Tenant ID format. Should tenant_id be the raw realm name (
atlas-acme) or a separate UUID with a mapping table? Realm names are human-readable and stable, but if a realm is renamed, all data references break. A mapping table adds indirection but decouples identity from naming. -
Cross-tenant reporting. Will there ever be a need for a super-admin view across all tenants (e.g., platform-level analytics)? If so, RLS policies need a bypass mechanism for the admin role.
-
Data migration for existing single-tenant deployment. What tenant_id should existing data receive? Options: the current realm name, a
defaultsentinel, or prompt the operator to specify during migration. -
Neo4j database-per-tenant vs shared database. For very large tenants, a separate Neo4j database (supported in Neo4j 5+) provides physical isolation. This is an operational decision that doesn't affect the application code if tenant_id scoping is implemented correctly.