Security & multi-tenancy

Atlas is multi-tenant by design, and tenant isolation is enforced where it is hardest to bypass: in the database, with PostgreSQL Row-Level Security (RLS). Authentication is delegated to Keycloak, provider credentials are encrypted per tenant, and the architecture is fail-closed — a missing tenant context denies access rather than leaking data. See ADR-022 and ADR-024.

Three layers of isolation

Identity — each tenant has its own Keycloak realm. A JWT carries the tenant and user.
Application — PlatformAuthMiddleware validates the JWT, extracts the tenant, and stores it on request.state. require_tenant denies any request without a tenant context.
Database — the request's tenant is set on the connection (SET app.current_tenant), and RLS policies filter every query. Even a bug in a handler cannot return another tenant's rows.
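The application-layer guard can be pictured with a small sketch. It is not the Atlas implementation: RequestState and TenantContextMissing are hypothetical names, and the real require_tenant presumably operates on the framework's request object rather than a dataclass.

```python
from dataclasses import dataclass
from typing import Optional


class TenantContextMissing(Exception):
    """Hypothetical error for a request that carries no tenant."""


@dataclass
class RequestState:
    """Hypothetical stand-in for request.state after the middleware has run."""
    tenant_id: Optional[str] = None
    user_id: Optional[str] = None


def require_tenant(state: RequestState) -> str:
    """Return the tenant id, or fail closed when it is absent or empty."""
    if not state.tenant_id:
        # No default tenant is ever substituted: a missing tenant is an error.
        raise TenantContextMissing("no tenant context on request")
    return state.tenant_id
```

A handler would call the guard before touching any data, so a request without a tenant never reaches the database.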

Request → tenant → RLS

The application connects through a restricted atlas_app role specifically so that RLS is enforced — a superuser connection would bypass policies. A guarded allow_owner_db_fallback flag exists only for emergency recovery.
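As an illustration of the binding step, the sketch below sets the tenant on a connection through PostgreSQL's set_config function, which accepts bind parameters where a plain SET statement does not. Several things here are assumptions: the bind_tenant name, the psycopg-style %s placeholders, the transaction-local scope (is_local = true), and the default setting name, which must match whatever the RLS policies actually read.

```python
from typing import Any, Protocol


class Cursor(Protocol):
    """Minimal DB-API style cursor; any driver cursor with execute() fits."""

    def execute(self, sql: str, params: tuple[Any, ...]) -> Any: ...


def bind_tenant(cursor: Cursor, tenant_id: str,
                setting: str = "app.current_tenant") -> None:
    """Bind the request's tenant to the current transaction (hypothetical helper)."""
    if not tenant_id:
        # Fail closed: never issue the statement with an empty tenant.
        raise ValueError("tenant_id is required")
    # set_config(name, value, is_local): is_local=true limits the value to the
    # current transaction, so it cannot leak to the next user of a pooled connection.
    cursor.execute("SELECT set_config(%s, %s, true)", (setting, tenant_id))
```

Passing the tenant as a bind parameter, rather than formatting it into the SQL text, keeps a hostile tenant identifier from altering the statement.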

Why isolation can't leak

If tenant B asks for a record that belongs to tenant A, PostgreSQL appends the RLS predicate to the query, the query returns zero rows, and the API responds 404. No code path returns the row.
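The effect can be modelled in a few lines of plain Python. This is a toy in-memory model, not PostgreSQL: the ROWS table, visible_rows, and get_record are all hypothetical, and the list comprehension stands in for the policy predicate that the database applies on its own.

```python
from typing import Optional

# Hypothetical table contents: one record per tenant.
ROWS = [
    {"id": 1, "tenant_id": "tenant-a", "name": "alpha"},
    {"id": 2, "tenant_id": "tenant-b", "name": "beta"},
]


def visible_rows(current_tenant: Optional[str]) -> list[dict]:
    """Model of the policy: a row is visible only to its own tenant.

    With no tenant bound, nothing matches and the result is empty.
    """
    return [
        row for row in ROWS
        if current_tenant is not None and row["tenant_id"] == current_tenant
    ]


def get_record(record_id: int, current_tenant: Optional[str]) -> tuple[int, Optional[dict]]:
    """Handler model: it only ever sees rows that passed the policy."""
    for row in visible_rows(current_tenant):
        if row["id"] == record_id:
            return 200, row
    return 404, None
```

In this model, get_record(1, "tenant-b") yields a 404 even though record 1 exists, because the handler never receives tenant A's row to begin with.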

Authentication & authorization

Authentication — Keycloak OIDC; the backend validates JWTs (src/integrations/keycloak_client.py, src/api/auth.py). The frontend uses keycloak-js with PKCE over a same-origin proxy.
Authorization — role-based. Roles such as admin, analyst, viewer, and workflow_editor gate endpoints; Studio/Settings are admin-only. Workflow-level RBAC lives in src/workflows/auth.py.
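A role guard of this kind might look like the following sketch. The require_roles decorator, the Forbidden exception, and the open_settings handler are hypothetical; the actual guards in src/api/auth.py and src/workflows/auth.py may be structured differently.

```python
from functools import wraps
from typing import Callable, Iterable


class Forbidden(Exception):
    """Hypothetical error for a caller who lacks every allowed role."""


def require_roles(*allowed: str) -> Callable:
    """Build a decorator that admits callers holding at least one allowed role."""
    def decorator(handler: Callable) -> Callable:
        @wraps(handler)
        def wrapper(user_roles: Iterable[str], *args, **kwargs):
            if not set(allowed) & set(user_roles):
                raise Forbidden(f"requires one of: {', '.join(allowed)}")
            return handler(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@require_roles("admin")
def open_settings(user_roles: Iterable[str]) -> str:
    """Stand-in for an admin-only Settings endpoint."""
    return "settings"
```

Here open_settings(["admin"]) succeeds, while open_settings(["viewer"]) raises Forbidden before the handler body runs.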

Credential encryption

Provider and LLM credentials are stored encrypted at rest in data_provider_credentials and decrypted only when a client is instantiated. EnvKeyEncryptor (src/security/credential_encryption.py) uses AES-GCM with a key from the environment.

Credential resolution is tenant-scoped with a fallback chain; if no usable credential exists for a required provider, the system raises MissingTenantCredentialsError (HTTP 424) rather than falling back to someone else's key. See Plugins.
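The resolution rule can be sketched as below. The steps of the real fallback chain are not described here, so the chain is modelled as an ordered list of hypothetical lookup functions, each keyed by the same tenant. Only the MissingTenantCredentialsError name and the 424 status come from the text above; the status_code attribute and the resolve_credential helper are assumptions.

```python
from typing import Callable, Optional, Sequence


class MissingTenantCredentialsError(Exception):
    """Raised when a tenant has no usable credential for a required provider."""

    status_code = 424  # assumed attribute; the HTTP status itself is documented

    def __init__(self, tenant_id: str, provider: str) -> None:
        super().__init__(f"tenant {tenant_id!r} has no usable credential for {provider!r}")
        self.tenant_id = tenant_id
        self.provider = provider


# A lookup takes (tenant_id, provider) and returns a credential or None.
Lookup = Callable[[str, str], Optional[str]]


def resolve_credential(tenant_id: str, provider: str, chain: Sequence[Lookup]) -> str:
    """Try each lookup in order for this tenant only; fail closed if all miss."""
    for lookup in chain:
        credential = lookup(tenant_id, provider)
        if credential:
            return credential
    # Every lookup was scoped to tenant_id, so there is nothing else to borrow.
    raise MissingTenantCredentialsError(tenant_id, provider)
```

Because every lookup receives the caller's tenant_id, exhausting the chain can only end in the error, never in another tenant's key.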

Defense-in-depth summary

Control	Mechanism
Tenant data isolation	PostgreSQL RLS via restricted `atlas_app` role
Identity	Keycloak realm per tenant, OIDC/JWT
Authorization	Role-based endpoint guards
Secrets at rest	AES-GCM credential encryption
Fail-closed	`require_tenant`, schema/registry boot checks, 424 on missing creds
Rate limiting	slowapi middleware + Redis
Transport	nginx ingress + TLS (cert-manager)

Deep dive: enforcement internals

This reflects src/database/connection.py, src/api/auth.py, src/api/rate_limit.py, and src/security/credential_encryption.py.

The fail-closed pool

The pool authenticates as the restricted atlas_app role. Tenant tables are declared with FORCE ROW LEVEL SECURITY, which applies policies even to the table owner, so the application role cannot bypass RLS.

If the atlas_app credentials fail and the emergency ALLOW_OWNER_DB_FALLBACK flag is not set, the service refuses to boot rather than silently falling back to an RLS-bypassing owner role.
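The boot decision reduces to a small truth table, sketched here. The select_db_role function, the BootError exception, and the "owner" role label are hypothetical; the real logic lives in src/database/connection.py and may differ in detail.

```python
import logging

logger = logging.getLogger("atlas.db")


class BootError(RuntimeError):
    """Hypothetical error that aborts service startup."""


def select_db_role(app_role_ok: bool, allow_owner_db_fallback: bool) -> str:
    """Pick the role the pool connects as, refusing to boot when unsafe."""
    if app_role_ok:
        return "atlas_app"
    if allow_owner_db_fallback:
        # Emergency recovery only: the owner role is not subject to the same
        # restrictions, so the fallback is announced loudly.
        logger.warning("ALLOW_OWNER_DB_FALLBACK is set: connecting as owner role")
        return "owner"
    raise BootError("atlas_app credentials failed and owner fallback is disabled")
```

The default outcome of a credential failure is therefore a refused boot; reaching the owner role requires both the failure and the explicit flag.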

Per-request tenant binding

A 60-second TenantCache avoids a tenant lookup on every request. Exempt paths (/health, /tenants/resolve, docs) skip auth. Rate limiting keys on {tenant_id}:{ip} (falling back to anon:{ip}), backed by Redis, returning 429 with Retry-After.
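A minimal sketch of the two helpers described above follows. The class and function shapes are assumptions: only the 60-second lifetime and the key formats come from the text, and the injectable clock exists purely to make expiry testable.

```python
import time
from typing import Callable, Optional


class TenantCache:
    """Tiny TTL cache; entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, dict]] = {}

    def put(self, key: str, tenant: dict) -> None:
        self._entries[key] = (self._clock() + self._ttl, tenant)

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, tenant = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller performs a fresh lookup.
            del self._entries[key]
            return None
        return tenant


def rate_limit_key(tenant_id: Optional[str], ip: str) -> str:
    """Key on tenant and IP, falling back to an anonymous bucket per IP."""
    return f"{tenant_id}:{ip}" if tenant_id else f"anon:{ip}"
```

For example, rate_limit_key("acme", "10.0.0.1") gives "acme:10.0.0.1", and rate_limit_key(None, "10.0.0.1") gives "anon:10.0.0.1".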

Credential encryption — AES-256-GCM + HKDF per tenant

Each tenant gets its own subkey derived from the master key, so ciphertext written for one tenant cannot be decrypted in another tenant's context:

EncryptedPayload stores nonce || ciphertext+tag plus a key_id (versioned for rotation).
Cross-tenant decrypt uses the wrong subkey → InvalidTag (the decrypt fails); the same happens on tampering. A key_id mismatch raises a distinct ValueError so rotation is unambiguous.
The master key lives in process memory only — never logged, never written to disk.
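The per-tenant derivation can be illustrated with HKDF-SHA256 (RFC 5869) written against the standard library. This is a sketch under stated assumptions: the tenant_subkey helper and the "atlas-credential:" info label are hypothetical, the way EnvKeyEncryptor actually binds the tenant into the derivation is not shown here, and the AES-GCM step that would consume the subkey is omitted.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF extract-then-expand with SHA-256, following RFC 5869."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: an empty salt is replaced by HASH_LEN zero bytes.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, master_key, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until long enough.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_subkey(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte (AES-256) subkey bound to one tenant (hypothetical label)."""
    return hkdf_sha256(master_key, info=b"atlas-credential:" + tenant_id.encode("utf-8"))
```

Since the tenant identifier is part of the HKDF info input, two tenants sharing one master key still end up with unrelated subkeys, which is what makes a cross-tenant decrypt fail authentication.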

Invariants

No tenant context ⇒ no rows. An unset app.current_tenant_id makes RLS-protected queries return empty, never another tenant's data.
atlas_app cannot bypass RLS (FORCE RLS); only the explicit emergency flag changes that, with a loud warning.
Tenant id is required, never defaulted — missing it is an explicit error, not a silent fallback.

To add a tenant-scoped endpoint, see Add a frontend feature → backend.