Backup & disaster recovery
Atlas state lives in five stores, each with a different recovery character. A sound DR posture backs up the system of record and treats derived state as rebuildable.
What to back up
| Store | Strategy | Notes |
|---|---|---|
| PostgreSQL | Regular full + WAL/PITR | The system of record — entities, claims, mutations, investigations, reports, settings |
| MinIO | Object backup / replication | Generated report documents |
| Keycloak | Realm export + its DB | Tenant identities and role mappings |
| Neo4j | Optional | Can be rebuilt from PostgreSQL via graph sync |
| Redis | None required | Cache and rate-limit tokens only |
The key insight: Neo4j is a projection. After restoring PostgreSQL you re-run graph sync to rebuild the property graph, so Neo4j backups are a convenience, not a requirement.
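The table above can also be written down as data, which makes the authoritative/derived split easy to check in tooling. The following is a minimal sketch, not part of Atlas: the `Store` class, the `backup_required` flag, and `must_back_up` are hypothetical names introduced for illustration, and the flag values simply restate the Strategy column.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Store:
    name: str
    strategy: str
    backup_required: bool  # hypothetical flag: False for derived or ephemeral state


# Values copied from the "What to back up" table.
STORES = [
    Store("PostgreSQL", "Regular full + WAL/PITR", True),
    Store("MinIO", "Object backup / replication", True),
    Store("Keycloak", "Realm export + its DB", True),
    Store("Neo4j", "Optional", False),       # projection, rebuilt via graph sync
    Store("Redis", "None required", False),  # cache and rate-limit tokens only
]


def must_back_up(stores: List[Store]) -> List[str]:
    """Return the names of stores whose loss cannot be repaired by a rebuild."""
    return [s.name for s in stores if s.backup_required]
```

Under these assumptions, `must_back_up(STORES)` lists PostgreSQL, MinIO, and Keycloak, and leaves Neo4j and Redis out.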
Recovery order
- PostgreSQL first: it is the source of truth. Confirm the schema version with flyway info; the app will refuse to boot against an incompatible schema.
- Keycloak: restore realms so tenants can authenticate.
- MinIO: restore report documents (or accept that historical PDFs can be regenerated from the graph if necessary).
- Rebuild Neo4j: run graph sync to project PostgreSQL into the graph, then run a parity check.
- Verify: run a test investigation per tenant and confirm RLS isolation.
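The ordering above can be sketched as a small runner that executes the steps in sequence and stops at the first failure, so a later step never runs against a store that has not been restored. This is an illustrative sketch only: `run_recovery` and `parity_report` are hypothetical helpers, the step callables stand in for whatever restore tooling you actually use, and the parity check shown here (comparing sets of entity IDs) is an assumption about what such a check might look like, not the Atlas implementation.

```python
from typing import Callable, Dict, List, Set, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_recovery(steps: List[Step]) -> List[str]:
    """Run steps in the given order and return the names of completed steps.

    Raises RuntimeError at the first step that reports failure, so that
    later steps are never attempted on top of a failed restore.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(
                f"recovery step failed: {name} (completed: {completed})"
            )
        completed.append(name)
    return completed


def parity_report(postgres_ids: Set[int], graph_ids: Set[int]) -> Dict[str, List[int]]:
    """Compare entity IDs in the system of record with those in the graph."""
    return {
        "missing_in_graph": sorted(postgres_ids - graph_ids),
        "orphaned_in_graph": sorted(graph_ids - postgres_ids),
    }
```

A caller would pass the five steps in the order listed above, for example `run_recovery([("postgresql", restore_pg), ("keycloak", restore_realms), ...])`, where each callable is your own wrapper. An empty `parity_report` on both keys is the signal to move on to verification.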
Multi-tenancy and DR
Because every authoritative row is tenant-scoped, a full PostgreSQL restore brings back all tenants
consistently. Per-tenant restore is possible by filtering on tenant_id, but whole-cluster restore
is simpler and avoids partial-state inconsistencies between tenants.
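To make the per-tenant option concrete, here is a minimal sketch of copying one tenant's rows from a restored scratch copy into a live database by filtering on tenant_id. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so that it runs anywhere; the `entities` table layout, `make_db`, and `restore_tenant` are hypothetical and exist only for illustration.

```python
import sqlite3
from typing import Iterable, Tuple


def make_db(rows: Iterable[Tuple[int, str, str]]) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical tenant-scoped table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE entities (id INTEGER, tenant_id TEXT, name TEXT)")
    db.executemany("INSERT INTO entities VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def restore_tenant(scratch: sqlite3.Connection, live: sqlite3.Connection,
                   table: str, tenant_id: str) -> int:
    """Replace one tenant's rows in `live` with the rows found in `scratch`.

    `table` is interpolated into the SQL text, so it must come from a
    trusted, fixed list of table names, never from user input.
    Returns the number of rows restored.
    """
    rows = scratch.execute(
        f"SELECT * FROM {table} WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    if not rows:
        return 0  # nothing to restore; leave the live rows untouched
    placeholders = ", ".join("?" for _ in rows[0])
    with live:  # single transaction: commit on success, roll back on error
        live.execute(f"DELETE FROM {table} WHERE tenant_id = ?", (tenant_id,))
        live.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return len(rows)
```

A real per-tenant restore would have to repeat this for every tenant-scoped table in foreign-key order, inside one transaction, which is a large part of why whole-cluster restore is the simpler path.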
Retention
REPORT_RETENTION_DAYS governs how long generated reports are kept. Align backup retention with
this and with any tenant data-retention commitments (see the privacy/retention runbooks in the main
repository).
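One way to reason about alignment is worst-case arithmetic: a report purged on day REPORT_RETENTION_DAYS can still sit in a backup taken just before the purge, and that copy survives until the backup itself expires. The sketch below assumes tenant commitments are expressed as a maximum number of days that data may persist; that interpretation, along with the function names and example tenant names, is an assumption for illustration and should be checked against the actual retention runbooks.

```python
from typing import Dict, List


def worst_case_report_lifetime(report_retention_days: int,
                               backup_retention_days: int) -> int:
    """Longest time a report can exist anywhere, counting backup copies.

    A backup taken on the report's last day keeps it alive for a further
    backup_retention_days after the live copy is purged.
    """
    return report_retention_days + backup_retention_days


def tenants_exceeding(commitments: Dict[str, int],
                      report_retention_days: int,
                      backup_retention_days: int) -> List[str]:
    """Return tenants whose maximum-lifetime commitment would be exceeded."""
    lifetime = worst_case_report_lifetime(report_retention_days,
                                          backup_retention_days)
    return sorted(t for t, limit in commitments.items() if lifetime > limit)
```

In practice `report_retention_days` would be read from the REPORT_RETENTION_DAYS setting, for example `int(os.environ["REPORT_RETENTION_DAYS"])`. With 60 days of report retention and 35 days of backup retention, the worst case is 95 days, so a tenant with a 90-day limit would be flagged while one with a 120-day limit would not.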
Drills
Treat DR as something you practise, not something you hope for: periodically restore PostgreSQL into a scratch environment, rebuild Neo4j, and run the isolation checklist. The security runbooks in the repository include a DR-drill procedure; this page is the architecture-level summary of why the order above is correct.