Backup & disaster recovery

Atlas state lives in five stores, each with different recovery characteristics. A sound DR posture backs up the system of record and treats derived state as rebuildable.

What to back up

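| Store      | Strategy                    | Notes                                                                                 |
|------------|-----------------------------|---------------------------------------------------------------------------------------|
| PostgreSQL | Regular full + WAL/PITR     | The system of record: entities, claims, mutations, investigations, reports, settings |
| MinIO      | Object backup / replication | Generated report documents                                                            |
| Keycloak   | Realm export + its DB       | Tenant identities and role mappings                                                   |
| Neo4j      | Optional                    | Can be rebuilt from PostgreSQL via graph sync                                         |
| Redis      | None required               | Cache and rate-limit tokens only                                                      |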

The key insight: Neo4j is a projection. After restoring PostgreSQL you re-run graph sync to rebuild the property graph, so Neo4j backups are a convenience, not a requirement.
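
As a minimal sketch, the table above can be expressed as data, so that "what must be backed up" is derived rather than remembered. The `Store` class and its field names (`rebuilt_from`, `disposable`) are illustrative assumptions, not part of Atlas configuration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Store:
    name: str
    strategy: str
    # Name of the store this one can be rebuilt from; None if it cannot be.
    rebuilt_from: Optional[str] = None
    # True when the contents are cache-like and safe to lose.
    disposable: bool = False


# Mirrors the "What to back up" table; field names are hypothetical.
STORES = [
    Store("PostgreSQL", "Regular full + WAL/PITR"),
    Store("MinIO", "Object backup / replication"),
    Store("Keycloak", "Realm export + its DB"),
    Store("Neo4j", "Optional", rebuilt_from="PostgreSQL"),
    Store("Redis", "None required", disposable=True),
]


def must_back_up(stores: List[Store]) -> List[str]:
    """Names of stores that are neither rebuildable nor disposable."""
    return [s.name for s in stores if s.rebuilt_from is None and not s.disposable]


if __name__ == "__main__":
    print(must_back_up(STORES))
```

Under these assumptions only PostgreSQL, MinIO, and Keycloak come out as required; Neo4j drops out because it is a projection, and Redis because it holds only cache and rate-limit tokens.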

Recovery order

  1. PostgreSQL first: it is the source of truth. Confirm the schema version with flyway info; the app will refuse to boot against an incompatible schema.
  2. Keycloak: restore realms so tenants can authenticate.
  3. MinIO: restore report documents (or accept that historical PDFs can be regenerated from the graph if necessary, which is only possible once the graph has been rebuilt in step 4).
  4. Rebuild Neo4j: run graph sync to project PostgreSQL into the graph, then run a parity check.
  5. Verify: run a test investigation per tenant and confirm RLS isolation.
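
The ordering above can be sketched as a small runner that executes steps in sequence and stops at the first failure, so that a later step never runs against an unrestored dependency. The step actions here are placeholders (each `lambda` stands in for a real restore procedure); `run_recovery` and `RECOVERY_STEPS` are hypothetical names, not Atlas tooling.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed = []
    for name, action in steps:
        if not action():
            break
        completed.append(name)
    return completed


# Placeholder actions: replace each lambda with the real procedure.
RECOVERY_STEPS: List[Step] = [
    ("restore PostgreSQL and check schema version", lambda: True),
    ("restore Keycloak realms", lambda: True),
    ("restore MinIO report documents", lambda: True),
    ("rebuild Neo4j via graph sync and run parity check", lambda: True),
    ("verify per-tenant investigation and RLS isolation", lambda: True),
]

if __name__ == "__main__":
    for name in run_recovery(RECOVERY_STEPS):
        print("done:", name)
```

Stopping on the first failure is a deliberate choice in this sketch: rebuilding Neo4j from a PostgreSQL restore that did not pass its schema check would only project bad state into the graph.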

Multi-tenancy and DR

Because every authoritative row is tenant-scoped, a full PostgreSQL restore brings back all tenants consistently. Per-tenant restore is possible by filtering on tenant_id, but whole-cluster restore is simpler and avoids partial-state inconsistencies between tenants.
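
To illustrate what "filtering on tenant_id" involves, here is a rough sketch of a per-tenant restore. It uses in-memory `sqlite3` databases purely as a runnable stand-in for PostgreSQL, and the table names (`entities`, `claims`) and column layout are assumptions made for the example.

```python
import sqlite3

# Hypothetical tenant-scoped tables; real Atlas table names may differ.
TABLES = ("entities", "claims")


def make_db(rows_by_table):
    """Create an in-memory database with a tenant_id column on every table."""
    db = sqlite3.connect(":memory:")
    for table in TABLES:
        db.execute(f"CREATE TABLE {table} (id INTEGER, tenant_id TEXT, body TEXT)")
        db.executemany(
            f"INSERT INTO {table} VALUES (?, ?, ?)", rows_by_table.get(table, [])
        )
    db.commit()
    return db


def restore_tenant(backup, target, tenant_id):
    """Replace one tenant's rows in target with that tenant's rows from backup."""
    with target:  # single transaction: commit on success, roll back on error
        for table in TABLES:
            target.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
            rows = backup.execute(
                f"SELECT id, tenant_id, body FROM {table} WHERE tenant_id = ?",
                (tenant_id,),
            ).fetchall()
            target.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    backup = make_db({"entities": [(1, "t1", "old"), (2, "t2", "old")]})
    live = make_db({"entities": [(1, "t1", "corrupt"), (2, "t2", "new")]})
    restore_tenant(backup, live, "t1")
    print(live.execute("SELECT * FROM entities ORDER BY id").fetchall())
```

Even in this toy version, every tenant-scoped table has to be enumerated and restored inside one transaction, and a table left out of the list would stay at its pre-restore state. That is the kind of partial-state risk a whole-cluster restore avoids.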

Retention

REPORT_RETENTION_DAYS governs how long generated reports are kept. Align backup retention with this and with any tenant data-retention commitments (see the privacy/retention runbooks in the main repository).
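
One way to reason about alignment is to work out how long a report remains recoverable once backups are taken into account. The sketch below assumes a report is deleted from the live store exactly when `REPORT_RETENTION_DAYS` elapse, and that the last backup containing it is kept for a further `backup_retention_days`; `backup_retention_days` and `commitment_days` are hypothetical parameters, not Atlas settings.

```python
from datetime import date, timedelta


def last_recoverable_date(created: date, report_retention_days: int,
                          backup_retention_days: int) -> date:
    """Latest date on which a report could still be restored from a backup.

    Assumes deletion happens exactly at report_retention_days and that a
    backup taken on that day is kept for backup_retention_days.
    """
    deleted_on = created + timedelta(days=report_retention_days)
    return deleted_on + timedelta(days=backup_retention_days)


def exceeds_commitment(report_retention_days: int, backup_retention_days: int,
                       commitment_days: int) -> bool:
    """True if backups keep report data longer than a commitment allows."""
    return report_retention_days + backup_retention_days > commitment_days


if __name__ == "__main__":
    # A report created on 2024-01-01 with 90-day report retention and
    # 30-day backup retention.
    print(last_recoverable_date(date(2024, 1, 1), 90, 30))  # 2024-04-30
    print(exceeds_commitment(90, 30, 100))                  # True
```

Under these assumptions the effective retention is the sum of the two windows, which is why backup retention needs to be set with REPORT_RETENTION_DAYS and tenant commitments in mind rather than independently.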

Drills

Treat DR as something you practise, not something you hope for: periodically restore PostgreSQL into a scratch environment, rebuild Neo4j, and run the isolation checklist. The security runbooks in the repository include a DR-drill procedure; this page is the architecture-level summary of why the order above is correct.
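
As an illustration of the parity check that follows a Neo4j rebuild, in a drill or in a real recovery, the sketch below compares per-kind record counts from the two stores. Comparing counts only is an assumption made for brevity; the actual parity check may compare more than counts, and `parity_mismatches` is a hypothetical helper.

```python
from typing import Dict, Tuple


def parity_mismatches(pg_counts: Dict[str, int],
                      graph_counts: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {kind: (postgres_count, graph_count)} for every kind that differs.

    A kind missing from one side is treated as a count of zero.
    """
    kinds = set(pg_counts) | set(graph_counts)
    return {
        kind: (pg_counts.get(kind, 0), graph_counts.get(kind, 0))
        for kind in sorted(kinds)
        if pg_counts.get(kind, 0) != graph_counts.get(kind, 0)
    }


if __name__ == "__main__":
    pg = {"entities": 10, "claims": 4}
    graph = {"entities": 10, "claims": 3}
    print(parity_mismatches(pg, graph))  # {'claims': (4, 3)}
```

An empty result means the projection matches PostgreSQL at the level of counts; any entry points at a kind whose graph sync should be re-run or investigated before the drill is considered passed.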