Skip to main content

ADR-022: Multi-Tenancy Architecture

FieldValue
StatusDraft
Date2026-04-08
RelatesADR-008, ADR-008a, ADR-009, ADR-019, ADR-020, ADR-021

1. Problem Statement

Atlas currently operates as a single-tenant application. While the auth layer correctly extracts a tenant_id from the Keycloak JWT issuer claim (realm name), the database schema, API layer, Neo4j graph, file storage, and workflow engine do not enforce tenant isolation. An authenticated user can access data belonging to any tenant in the system.

For Atlas to serve multiple compliance teams (each representing a separate obliged entity under AMLR), every layer must enforce strict tenant boundaries. A compliance officer at Organization A must never see, query, or influence data belonging to Organization B — this is both a security requirement and a regulatory one, since AML data includes personal information (PEP status, beneficial ownership, sanctions matches) protected under GDPR.


2. Decision: One Keycloak Realm Per Tenant, Row-Level Tenant Isolation

Each organization gets its own Keycloak realm, providing full isolation of users, roles, sessions, and identity provider configuration. The realm name maps directly to the tenant identifier used throughout the platform.

Data isolation uses row-level tenant scoping: every tenant-owned table includes a tenant_id column, and every query filters by it. This is simpler to implement and operate than schema-per-tenant or database-per-tenant, and scales to hundreds of tenants without operational overhead.


3. Keycloak Realm Architecture

One Realm Per Organization

Each tenant gets a dedicated Keycloak realm with:

  • Independent user directory (no shared users between tenants)
  • Tenant-specific identity provider configuration (Azure AD, Google Workspace, SAML, username/password)
  • Tenant-specific MFA policies and password complexity rules
  • Tenant-specific roles (analyst, compliance_officer, admin) that map to Atlas permissions
  • Session isolation (no cross-tenant session leakage)

Realm Provisioning

New realms are created via the Keycloak Admin REST API during tenant onboarding:

  1. Create realm with naming convention: atlas-{org_slug} (e.g., atlas-acme)
  2. Create the atlas-frontend client with appropriate redirect URIs
  3. Configure default roles: analyst, compliance_officer, admin
  4. Set password policies per tenant's requirements
  5. Optionally configure external IdP (Azure AD, Google Workspace)

Tenant Discovery at Login

Subdomain-based routing: acme.atlas.app resolves to realm atlas-acme. The frontend reads the subdomain, constructs the Keycloak OIDC discovery URL, and initiates the auth flow against the correct realm. No organization selector screen needed.

Fallback for custom domains: a tenant configuration table maps custom domains to realm names.

JWT to Tenant Mapping

The existing extract_tenant_from_issuer() in auth.py already handles this:

JWT issuer: https://auth.atlas.app/realms/atlas-acme
Extracted tenant_id: atlas-acme

The AuthContext.tenant_id is populated on every request. What's missing is enforcement downstream.


4. Database Tenant Isolation

Strategy: Row-Level Isolation with tenant_id Column

Every tenant-owned table gets a tenant_id VARCHAR(100) NOT NULL column. Queries include WHERE tenant_id = $tenant_id in all operations. PostgreSQL Row-Level Security (RLS) is added as a defense-in-depth layer.

Tenant ID Type

VARCHAR(100) rather than UUID because the tenant ID is the Keycloak realm name (a human-readable slug), not a generated UUID. This makes debugging, logging, and manual queries straightforward.

Tables Requiring tenant_id

Tier 1 — Core Data (must ship together)

TableCurrent StateNotes
companiesNo tenant_idPortfolio is per-tenant
investigationsNo tenant_idInvestigation data is per-tenant
ontology_entitiesNo tenant_idEntity graph is per-tenant
ontology_relationshipsNo tenant_idRelationship graph is per-tenant
ontology_findingsNo tenant_idFindings are per-tenant
ontology_risk_indicatorsNo tenant_idRisk indicators are per-tenant
risk_matrix_schemasNo tenant_idEach tenant has their own matrices
risk_matrix_evaluationsNo tenant_idEvaluation results are per-tenant
company_matrix_assignmentsNo tenant_idAssignments are per-tenant
investigation_evidenceNo tenant_idEvidence is per-tenant
workflow_schemasNo tenant_idWorkflow definitions are per-tenant
workflow_executionsNo tenant_idExecution state is per-tenant
workflow_tasksNo tenant_idTasks are per-tenant
portal_draftsNo tenant_idPortal content is per-tenant

Tier 2 — Supporting Data

TableCurrent StateNotes
data_providersNo tenant_idProvider credentials are per-tenant
data_provider_responsesNo tenant_idResponse data is per-tenant
ontology_mutationsNo tenant_idMutation queue is per-tenant
conflict_recordsHas tenant_idAlready correct
conflict_review_tasksHas tenant_idAlready correct
evaluation_runsNo tenant_idTest runs are per-tenant
eventsNo tenant_idEvent stream is per-tenant
entity_projectionsNo tenant_idProjections are per-tenant
entity_cacheNo tenant_idCache is per-tenant

Shared / System-Wide (no tenant_id needed)

TableReason
reference_data_typesSystem-wide type definitions (is_system=true for built-in)
reference_datasetsAlready has tenant_id with NULL = system default
ontology_schema_versionsSchema definitions are shared (data is tenant-scoped)
app_settingsSystem configuration
prompt_templatesSystem-wide AI prompts
segmentsShared segment definitions

Migration Strategy

  1. V103: Add tenant_id VARCHAR(100) NULL to all Tier 1 tables
  2. V104: Backfill existing data with a default tenant ('default' or the single existing realm name)
  3. V105: Add NOT NULL constraint after backfill
  4. V106: Add composite indexes on (tenant_id, id) and (tenant_id, status) for common query patterns
  5. V107: Enable PostgreSQL Row-Level Security as defense-in-depth

Row-Level Security (Defense in Depth)

RLS policies ensure that even if application code misses a WHERE clause, the database rejects cross-tenant access:

ALTER TABLE companies ENABLE ROW LEVEL SECURITY;

CREATE POLICY companies_tenant_isolation ON companies
USING (tenant_id = current_setting('app.current_tenant_id'));

-- Set per-connection at the start of each request
SET app.current_tenant_id = 'atlas-acme';

The API middleware sets app.current_tenant_id on each database connection before executing queries. RLS acts as a safety net — the application queries should still include explicit WHERE tenant_id = $1 for clarity and index usage, but RLS catches any that miss it.


5. API Layer Enforcement

Tenant Middleware

Every request already has AuthContext.tenant_id from the JWT. A new middleware layer (or extension of the existing auth middleware) must:

  1. Extract tenant_id from AuthContext
  2. Set app.current_tenant_id on the database connection (for RLS)
  3. Pass tenant_id to all repository calls

Repository Pattern

All repository methods gain a tenant_id parameter. The mutation_queue repository is the reference implementation:

# CORRECT pattern (from mutation_queue/repository.py)
async def list_open(self, tenant_id: str) -> list[dict]:
return await self._pool.fetch(
"SELECT * FROM conflict_records WHERE tenant_id = $1 AND status = 'open'",
tenant_id,
)

Every repository must follow this pattern. List operations filter by tenant. Get-by-ID operations verify tenant ownership. Create operations set tenant_id. Update and delete operations include tenant_id in the WHERE clause.

Cross-Tenant Access Prevention

  • GET /investigations/{id} must verify investigation.tenant_id == auth.tenant_id
  • GET /companies must return only WHERE tenant_id = auth.tenant_id
  • All search and list endpoints must include tenant filtering
  • Any endpoint that accepts an ID must verify tenant ownership before returning data

6. Neo4j Graph Tenant Isolation

Strategy: tenant_id Property + Label-Based Partitioning

Every node in Neo4j gets a tenant_id property. Additionally, tenant-specific labels are applied for performance:

// Node creation
CREATE (e:LegalEntity:Tenant_atlas_acme {
id: $id,
tenant_id: 'atlas-acme',
name: 'Acme Corp',
...
})

// Tenant-scoped query
MATCH (e:LegalEntity:Tenant_atlas_acme {id: $id})
RETURN e

The dual approach (property + label) provides:

  • Property: Used in WHERE clauses for flexibility, stored in indexes
  • Label: Used for partition-like performance — Neo4j can skip entire label groups during traversal. Index creation per tenant label ensures fast lookups.

CypherGenerator Changes

upsert_entity() gains a tenant_id parameter. All generated Cypher includes tenant scoping:

def upsert_entity(
entity_type: str,
entity_id: str,
entity_data: dict,
tenant_id: str, # NEW
company_id: str | None = None,
investigation_id: str | None = None,
) -> GeneratedQuery:
tenant_label = f"Tenant_{tenant_id.replace('-', '_')}"
return GeneratedQuery(
cypher=f"""
MERGE (e:{entity_type}:{tenant_label} {{id: $id}})
SET e += $props, e.tenant_id = $tenant_id
""",
params={"id": entity_id, "props": entity_data, "tenant_id": tenant_id},
)

neo4j_sync.py Changes

The sync process receives tenant_id from the source PostgreSQL row and includes it in all node/relationship creation. The sync query filters source rows by tenant.

Graph Query Isolation

All graph repository methods must include tenant scoping:

// Entity lookup
MATCH (e:LegalEntity {id: $id, tenant_id: $tenant_id})
RETURN e

// Relationship traversal — CRITICAL: both ends must be tenant-scoped
MATCH (a {tenant_id: $tenant_id})-[r]-(b {tenant_id: $tenant_id})
WHERE a.id = $entity_id
RETURN a, r, b

// Never traverse across tenant boundaries

7. File Storage Tenant Isolation

MinIO Object Key Prefixing

All file storage keys include the tenant as a prefix:

Current: evidence/{investigation_id}/{filename}
New: {tenant_id}/evidence/{investigation_id}/{filename}

This provides:

  • Logical isolation in a shared bucket
  • Easy per-tenant storage metrics (prefix-based size queries)
  • Option to use separate buckets per tenant in the future (configuration change, not code change)

Presigned URL Validation

When generating presigned URLs for file access, the API must verify that the requesting user's tenant_id matches the file's tenant prefix. The object key itself encodes the tenant boundary.


8. Temporal Workflow Tenant Isolation

Workflow Execution Context

The Temporal workflow execution payload must include tenant_id:

@dataclass
class InvestigationContext:
investigation_id: str
company_id: str
tenant_id: str # NEW — propagated to all activities
...

Activity Tenant Scoping

Every Temporal activity receives tenant_id through the context and passes it to all repository and service calls. Activities must not query data outside the tenant boundary.

Task Queue Isolation

Option A: Shared task queue, tenant_id in payload (simpler, current approach extended) Option B: Per-tenant task queues (investigation-queue-atlas-acme) for full isolation

Recommendation: Option A for now. Per-tenant queues add operational complexity (queue management, worker scaling per tenant) without a clear benefit until tenant count is high.


9. Shared vs Tenant-Scoped Resources

Shared (System-Wide)

ResourceRationale
Ontology schema definitionsSchema structure is platform-wide; data is tenant-scoped
Reference data types (is_system=true)Built-in category types are shared
System default reference datasets (tenant_id IS NULL)FATF lists, EU high-risk countries are global facts
Prompt templatesAI prompts are platform-managed
Application settingsSystem configuration

Tenant-Scoped

ResourceRationale
Companies and investigationsCore business data
Ontology entity dataKYC/CDD data is confidential per tenant
Risk matrices and evaluationsEach tenant has their own risk methodology
Workflow schemas and executionsEach tenant has their own compliance processes
Evidence and findingsInvestigation artifacts are per-tenant
Custom reference datasets (tenant_id IS NOT NULL)Tenant-specific risk classifications
User-created category types (ADR-021)Per-tenant risk category customization
Data provider configurationsPer-tenant API credentials and settings
Neo4j graph dataEntity relationship graph is per-tenant

Tenant Override Pattern (Already Implemented)

The reference data resolver already supports tenant overrides: tenant-specific datasets take precedence over system defaults. This pattern extends naturally — a tenant can customize country risk lists while inheriting the system default for datasets they haven't overridden.


10. Implementation Phases

Phase 1: Database Foundation

  • Add tenant_id to all Tier 1 tables (nullable)
  • Backfill existing data with default tenant
  • Add NOT NULL constraints
  • Add composite indexes
  • Enable RLS policies

Phase 2: API & Repository Enforcement

  • Add tenant_id parameter to all repository methods
  • Update all API routes to pass tenant_id from AuthContext
  • Add tenant verification middleware (get-by-ID verifies ownership)
  • Integration tests for cross-tenant access prevention

Phase 3: Neo4j Graph Isolation

  • Add tenant_id property to all nodes
  • Add tenant-specific labels
  • Update CypherGenerator for tenant scoping
  • Update neo4j_sync to include tenant context
  • Update all graph queries

Phase 4: File Storage & Workflows

  • Add tenant prefix to MinIO object keys
  • Migrate existing files to tenant-prefixed paths
  • Add tenant_id to Temporal workflow context
  • Update all activities for tenant scoping

Phase 5: Keycloak Realm Provisioning

  • Tenant onboarding API (create realm, configure client, set up roles)
  • Subdomain-based realm discovery in frontend
  • Custom domain mapping table
  • Tenant admin UI for managing their realm settings

11. Testing Strategy

Isolation Tests

Every repository method must have a test that:

  1. Creates data for Tenant A
  2. Creates data for Tenant B
  3. Queries as Tenant A and verifies only Tenant A's data is returned
  4. Attempts to access Tenant B's data by ID and verifies rejection (403 or 404)

Neo4j Isolation Tests

  1. Create entities for both tenants
  2. Run graph traversal as Tenant A
  3. Verify no Tenant B nodes or relationships appear in results
  4. Verify that relationship traversal cannot cross tenant boundaries

RLS Verification

  1. Set app.current_tenant_id to Tenant A
  2. Run a raw SELECT without WHERE clause
  3. Verify RLS policy filters results to Tenant A only

12. Open Questions

  1. Tenant ID format. Should tenant_id be the raw realm name (atlas-acme) or a separate UUID with a mapping table? Realm names are human-readable and stable, but if a realm is renamed, all data references break. A mapping table adds indirection but decouples identity from naming.

  2. Cross-tenant reporting. Will there ever be a need for a super-admin view across all tenants (e.g., platform-level analytics)? If so, RLS policies need a bypass mechanism for the admin role.

  3. Data migration for existing single-tenant deployment. What tenant_id should existing data receive? Options: the current realm name, a default sentinel, or prompt the operator to specify during migration.

  4. Neo4j database-per-tenant vs shared database. For very large tenants, a separate Neo4j database (supported in Neo4j 5+) provides physical isolation. This is an operational decision that doesn't affect the application code if tenant_id scoping is implemented correctly.