ADR-022: Multi-Tenancy Architecture

Field	Value
Status	Draft
Date	2026-04-08
Relates	ADR-008, ADR-008a, ADR-009, ADR-019, ADR-020, ADR-021

1. Problem Statement

Atlas currently operates as a single-tenant application. While the auth layer correctly extracts a tenant_id from the Keycloak JWT issuer claim (realm name), the database schema, API layer, Neo4j graph, file storage, and workflow engine do not enforce tenant isolation. An authenticated user can access data belonging to any tenant in the system.

For Atlas to serve multiple compliance teams (each representing a separate obliged entity under AMLR), every layer must enforce strict tenant boundaries. A compliance officer at Organization A must never see, query, or influence data belonging to Organization B — this is both a security requirement and a regulatory one, since AML data includes personal information (PEP status, beneficial ownership, sanctions matches) protected under GDPR.

2. Decision: One Keycloak Realm Per Tenant, Row-Level Tenant Isolation

Each organization gets its own Keycloak realm, providing full isolation of users, roles, sessions, and identity provider configuration. The realm name maps directly to the tenant identifier used throughout the platform.

Data isolation uses row-level tenant scoping: every tenant-owned table includes a tenant_id column, and every query filters by it. This is simpler to implement and operate than schema-per-tenant or database-per-tenant, and scales to hundreds of tenants without operational overhead.

3. Keycloak Realm Architecture

One Realm Per Organization

Each tenant gets a dedicated Keycloak realm with:

Independent user directory (no shared users between tenants)
Tenant-specific identity provider configuration (Azure AD, Google Workspace, SAML, username/password)
Tenant-specific MFA policies and password complexity rules
Tenant-specific roles (analyst, compliance_officer, admin) that map to Atlas permissions
Session isolation (no cross-tenant session leakage)

Realm Provisioning

New realms are created via the Keycloak Admin REST API during tenant onboarding:

Create realm with naming convention: atlas-{org_slug} (e.g., atlas-acme)
Create the atlas-frontend client with appropriate redirect URIs
Configure default roles: analyst, compliance_officer, admin
Set password policies per tenant's requirements
Optionally configure external IdP (Azure AD, Google Workspace)

Subdomain-based routing: acme.atlas.app resolves to realm atlas-acme. The frontend reads the subdomain, constructs the Keycloak OIDC discovery URL, and initiates the auth flow against the correct realm. No organization selector screen needed.

Fallback for custom domains: a tenant configuration table maps custom domains to realm names.

JWT to Tenant Mapping

The existing extract_tenant_from_issuer() in auth.py already handles this:

JWT issuer: https://auth.atlas.app/realms/atlas-acme
Extracted tenant_id: atlas-acme

The AuthContext.tenant_id is populated on every request. What's missing is enforcement downstream.

4. Database Tenant Isolation

Strategy: Row-Level Isolation with tenant_id Column

Every tenant-owned table gets a tenant_id VARCHAR(100) NOT NULL column. Queries include WHERE tenant_id = $tenant_id in all operations. PostgreSQL Row-Level Security (RLS) is added as a defense-in-depth layer.

Tenant ID Type

VARCHAR(100) rather than UUID because the tenant ID is the Keycloak realm name (a human-readable slug), not a generated UUID. This makes debugging, logging, and manual queries straightforward.

Tables Requiring tenant_id

Tier 1 — Core Data (must ship together)

Table	Current State	Notes
`companies`	No tenant_id	Portfolio is per-tenant
`investigations`	No tenant_id	Investigation data is per-tenant
`ontology_entities`	No tenant_id	Entity graph is per-tenant
`ontology_relationships`	No tenant_id	Relationship graph is per-tenant
`ontology_findings`	No tenant_id	Findings are per-tenant
`ontology_risk_indicators`	No tenant_id	Risk indicators are per-tenant
`risk_matrix_schemas`	No tenant_id	Each tenant has their own matrices
`risk_matrix_evaluations`	No tenant_id	Evaluation results are per-tenant
`company_matrix_assignments`	No tenant_id	Assignments are per-tenant
`investigation_evidence`	No tenant_id	Evidence is per-tenant
`workflow_schemas`	No tenant_id	Workflow definitions are per-tenant
`workflow_executions`	No tenant_id	Execution state is per-tenant
`workflow_tasks`	No tenant_id	Tasks are per-tenant
`portal_drafts`	No tenant_id	Portal content is per-tenant

Tier 2 — Supporting Data

Table	Current State	Notes
`data_providers`	No tenant_id	Provider credentials are per-tenant
`data_provider_responses`	No tenant_id	Response data is per-tenant
`ontology_mutations`	No tenant_id	Mutation queue is per-tenant
`conflict_records`	Has tenant_id	Already correct
`conflict_review_tasks`	Has tenant_id	Already correct
`evaluation_runs`	No tenant_id	Test runs are per-tenant
`events`	No tenant_id	Event stream is per-tenant
`entity_projections`	No tenant_id	Projections are per-tenant
`entity_cache`	No tenant_id	Cache is per-tenant

Shared / System-Wide (no tenant_id needed)

Table	Reason
`reference_data_types`	System-wide type definitions (is_system=true for built-in)
`reference_datasets`	Already has tenant_id with NULL = system default
`ontology_schema_versions`	Schema definitions are shared (data is tenant-scoped)
`app_settings`	System configuration
`prompt_templates`	System-wide AI prompts
`segments`	Shared segment definitions

Migration Strategy

V103: Add tenant_id VARCHAR(100) NULL to all Tier 1 tables
V104: Backfill existing data with a default tenant ('default' or the single existing realm name)
V105: Add NOT NULL constraint after backfill
V106: Add composite indexes on (tenant_id, id) and (tenant_id, status) for common query patterns
V107: Enable PostgreSQL Row-Level Security as defense-in-depth

Row-Level Security (Defense in Depth)

RLS policies ensure that even if application code misses a WHERE clause, the database rejects cross-tenant access:

ALTER TABLE companies ENABLE ROW LEVEL SECURITY;

CREATE POLICY companies_tenant_isolation ON companies
  USING (tenant_id = current_setting('app.current_tenant_id'));

-- Set per-connection at the start of each request
SET app.current_tenant_id = 'atlas-acme';

The API middleware sets app.current_tenant_id on each database connection before executing queries. RLS acts as a safety net — the application queries should still include explicit WHERE tenant_id = $1 for clarity and index usage, but RLS catches any that miss it.

5. API Layer Enforcement

Tenant Middleware

Every request already has AuthContext.tenant_id from the JWT. A new middleware layer (or extension of the existing auth middleware) must:

Extract tenant_id from AuthContext
Set app.current_tenant_id on the database connection (for RLS)
Pass tenant_id to all repository calls

Repository Pattern

All repository methods gain a tenant_id parameter. The mutation_queue repository is the reference implementation:

# CORRECT pattern (from mutation_queue/repository.py)
async def list_open(self, tenant_id: str) -> list[dict]:
    return await self._pool.fetch(
        "SELECT * FROM conflict_records WHERE tenant_id = $1 AND status = 'open'",
        tenant_id,
    )

Every repository must follow this pattern. List operations filter by tenant. Get-by-ID operations verify tenant ownership. Create operations set tenant_id. Update and delete operations include tenant_id in the WHERE clause.

Cross-Tenant Access Prevention

GET /investigations/{id} must verify investigation.tenant_id == auth.tenant_id
GET /companies must return only WHERE tenant_id = auth.tenant_id
All search and list endpoints must include tenant filtering
Any endpoint that accepts an ID must verify tenant ownership before returning data

6. Neo4j Graph Tenant Isolation

Strategy: tenant_id Property + Label-Based Partitioning

Every node in Neo4j gets a tenant_id property. Additionally, tenant-specific labels are applied for performance:

// Node creation
CREATE (e:LegalEntity:Tenant_atlas_acme {
  id: $id,
  tenant_id: 'atlas-acme',
  name: 'Acme Corp',
  ...
})

// Tenant-scoped query
MATCH (e:LegalEntity:Tenant_atlas_acme {id: $id})
RETURN e

The dual approach (property + label) provides:

Property: Used in WHERE clauses for flexibility, stored in indexes
Label: Used for partition-like performance — Neo4j can skip entire label groups during traversal. Index creation per tenant label ensures fast lookups.

CypherGenerator Changes

upsert_entity() gains a tenant_id parameter. All generated Cypher includes tenant scoping:

def upsert_entity(
    entity_type: str,
    entity_id: str,
    entity_data: dict,
    tenant_id: str,  # NEW
    company_id: str | None = None,
    investigation_id: str | None = None,
) -> GeneratedQuery:
    tenant_label = f"Tenant_{tenant_id.replace('-', '_')}"
    return GeneratedQuery(
        cypher=f"""
        MERGE (e:{entity_type}:{tenant_label} {{id: $id}})
        SET e += $props, e.tenant_id = $tenant_id
        """,
        params={"id": entity_id, "props": entity_data, "tenant_id": tenant_id},
    )

neo4j_sync.py Changes

The sync process receives tenant_id from the source PostgreSQL row and includes it in all node/relationship creation. The sync query filters source rows by tenant.

Graph Query Isolation

All graph repository methods must include tenant scoping:

// Entity lookup
MATCH (e:LegalEntity {id: $id, tenant_id: $tenant_id})
RETURN e

// Relationship traversal — CRITICAL: both ends must be tenant-scoped
MATCH (a {tenant_id: $tenant_id})-[r]-(b {tenant_id: $tenant_id})
WHERE a.id = $entity_id
RETURN a, r, b

// Never traverse across tenant boundaries

7. File Storage Tenant Isolation

MinIO Object Key Prefixing

All file storage keys include the tenant as a prefix:

Current:  evidence/{investigation_id}/{filename}
New:      {tenant_id}/evidence/{investigation_id}/{filename}

This provides:

Logical isolation in a shared bucket
Easy per-tenant storage metrics (prefix-based size queries)
Option to use separate buckets per tenant in the future (configuration change, not code change)

Presigned URL Validation

When generating presigned URLs for file access, the API must verify that the requesting user's tenant_id matches the file's tenant prefix. The object key itself encodes the tenant boundary.

8. Temporal Workflow Tenant Isolation

Workflow Execution Context

The Temporal workflow execution payload must include tenant_id:

@dataclass
class InvestigationContext:
    investigation_id: str
    company_id: str
    tenant_id: str  # NEW — propagated to all activities
    ...

Activity Tenant Scoping

Every Temporal activity receives tenant_id through the context and passes it to all repository and service calls. Activities must not query data outside the tenant boundary.

Task Queue Isolation

Option A: Shared task queue, tenant_id in payload (simpler, current approach extended) Option B: Per-tenant task queues (investigation-queue-atlas-acme) for full isolation

Recommendation: Option A for now. Per-tenant queues add operational complexity (queue management, worker scaling per tenant) without a clear benefit until tenant count is high.

9. Shared vs Tenant-Scoped Resources

Shared (System-Wide)

Resource	Rationale
Ontology schema definitions	Schema structure is platform-wide; data is tenant-scoped
Reference data types (is_system=true)	Built-in category types are shared
System default reference datasets (tenant_id IS NULL)	FATF lists, EU high-risk countries are global facts
Prompt templates	AI prompts are platform-managed
Application settings	System configuration

Tenant-Scoped

Resource	Rationale
Companies and investigations	Core business data
Ontology entity data	KYC/CDD data is confidential per tenant
Risk matrices and evaluations	Each tenant has their own risk methodology
Workflow schemas and executions	Each tenant has their own compliance processes
Evidence and findings	Investigation artifacts are per-tenant
Custom reference datasets (tenant_id IS NOT NULL)	Tenant-specific risk classifications
User-created category types (ADR-021)	Per-tenant risk category customization
Data provider configurations	Per-tenant API credentials and settings
Neo4j graph data	Entity relationship graph is per-tenant

Tenant Override Pattern (Already Implemented)

The reference data resolver already supports tenant overrides: tenant-specific datasets take precedence over system defaults. This pattern extends naturally — a tenant can customize country risk lists while inheriting the system default for datasets they haven't overridden.

10. Implementation Phases

Phase 1: Database Foundation

Add tenant_id to all Tier 1 tables (nullable)
Backfill existing data with default tenant
Add NOT NULL constraints
Add composite indexes
Enable RLS policies

Phase 2: API & Repository Enforcement

Add tenant_id parameter to all repository methods
Update all API routes to pass tenant_id from AuthContext
Add tenant verification middleware (get-by-ID verifies ownership)
Integration tests for cross-tenant access prevention

Phase 3: Neo4j Graph Isolation

Add tenant_id property to all nodes
Add tenant-specific labels
Update CypherGenerator for tenant scoping
Update neo4j_sync to include tenant context
Update all graph queries

Phase 4: File Storage & Workflows

Add tenant prefix to MinIO object keys
Migrate existing files to tenant-prefixed paths
Add tenant_id to Temporal workflow context
Update all activities for tenant scoping

Phase 5: Keycloak Realm Provisioning

Tenant onboarding API (create realm, configure client, set up roles)
Subdomain-based realm discovery in frontend
Custom domain mapping table
Tenant admin UI for managing their realm settings

11. Testing Strategy

Isolation Tests

Every repository method must have a test that:

Creates data for Tenant A
Creates data for Tenant B
Queries as Tenant A and verifies only Tenant A's data is returned
Attempts to access Tenant B's data by ID and verifies rejection (403 or 404)

Neo4j Isolation Tests

Create entities for both tenants
Run graph traversal as Tenant A
Verify no Tenant B nodes or relationships appear in results
Verify that relationship traversal cannot cross tenant boundaries

RLS Verification

Set app.current_tenant_id to Tenant A
Run a raw SELECT without WHERE clause
Verify RLS policy filters results to Tenant A only

12. Open Questions

Tenant ID format. Should tenant_id be the raw realm name (atlas-acme) or a separate UUID with a mapping table? Realm names are human-readable and stable, but if a realm is renamed, all data references break. A mapping table adds indirection but decouples identity from naming.
Cross-tenant reporting. Will there ever be a need for a super-admin view across all tenants (e.g., platform-level analytics)? If so, RLS policies need a bypass mechanism for the admin role.
Data migration for existing single-tenant deployment. What tenant_id should existing data receive? Options: the current realm name, a default sentinel, or prompt the operator to specify during migration.
Neo4j database-per-tenant vs shared database. For very large tenants, a separate Neo4j database (supported in Neo4j 5+) provides physical isolation. This is an operational decision that doesn't affect the application code if tenant_id scoping is implemented correctly.

1. Problem Statement​

2. Decision: One Keycloak Realm Per Tenant, Row-Level Tenant Isolation​

3. Keycloak Realm Architecture​

One Realm Per Organization​

Realm Provisioning​

Tenant Discovery at Login​

JWT to Tenant Mapping​

4. Database Tenant Isolation​

Strategy: Row-Level Isolation with tenant_id Column​

Tenant ID Type​

Tables Requiring tenant_id​

Tier 1 — Core Data (must ship together)​

Tier 2 — Supporting Data​

Shared / System-Wide (no tenant_id needed)​

Migration Strategy​

Row-Level Security (Defense in Depth)​

5. API Layer Enforcement​

Tenant Middleware​

Repository Pattern​

Cross-Tenant Access Prevention​

6. Neo4j Graph Tenant Isolation​

Strategy: tenant_id Property + Label-Based Partitioning​

CypherGenerator Changes​

neo4j_sync.py Changes​

Graph Query Isolation​

7. File Storage Tenant Isolation​

MinIO Object Key Prefixing​

Presigned URL Validation​

8. Temporal Workflow Tenant Isolation​

Workflow Execution Context​

Activity Tenant Scoping​

Task Queue Isolation​

9. Shared vs Tenant-Scoped Resources​

Shared (System-Wide)​

Tenant-Scoped​

Tenant Override Pattern (Already Implemented)​

10. Implementation Phases​

Phase 1: Database Foundation​

Phase 2: API & Repository Enforcement​

Phase 3: Neo4j Graph Isolation​

Phase 4: File Storage & Workflows​

Phase 5: Keycloak Realm Provisioning​

11. Testing Strategy​

Isolation Tests​

Neo4j Isolation Tests​

RLS Verification​

12. Open Questions​