Temporal workflows
Investigations are long-running, multi-step, and must survive process restarts — exactly the
problem Temporal solves. Atlas models an investigation as a durable
workflow whose steps are activities: idempotent units of work that Temporal will retry and
whose progress is persisted. The code lives in src/temporal/.
Why Temporal
A single investigation runs seven OSINT modules, each an agentic LLM loop that can take minutes, then reconciles entities, syncs the graph, scores risk, and generates a report. If the worker crashes mid-investigation, the work must resume — not restart. Temporal gives Atlas durable execution, automatic retries, timeouts, and full history for observability.
The investigation workflow
InvestigationWorkflow (src/temporal/workflows.py) is the entry point. It fans out the seven
modules in parallel, then runs reconciliation, persistence, graph sync, risk scoring, and report
generation as activities. Partial module failure is tolerated — the investigation completes with
the data it has and surfaces what failed.
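The fan-out with tolerated partial failure can be sketched with plain asyncio. This is a minimal illustration, not the code in src/temporal/workflows.py: the module names, `run_module`, and `InvestigationOutput` are hypothetical stand-ins, and a real Temporal workflow would schedule activities instead of calling coroutines directly.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical module names; the text only says there are seven modules.
MODULES = [f"module_{i}" for i in range(1, 8)]


@dataclass
class InvestigationOutput:
    results: dict = field(default_factory=dict)  # module name -> module result
    errors: dict = field(default_factory=dict)   # module name -> failure message


async def run_module(name: str) -> dict:
    """Stand-in for one module activity; module_3 fails to show degradation."""
    await asyncio.sleep(0)
    if name == "module_3":
        raise RuntimeError("provider unavailable")
    return {"module": name, "entities": [f"{name}-entity"]}


async def run_investigation() -> InvestigationOutput:
    output = InvestigationOutput()
    # return_exceptions=True collects failures as values, so one failing
    # module does not abort the modules that are still running.
    outcomes = await asyncio.gather(
        *(run_module(name) for name in MODULES), return_exceptions=True
    )
    for name, outcome in zip(MODULES, outcomes):
        if isinstance(outcome, Exception):
            output.errors[name] = str(outcome)
        else:
            output.results[name] = outcome
    return output


output = asyncio.run(run_investigation())
print(len(output.results), "modules succeeded; failed:", sorted(output.errors))
```

The later stages would then operate on `output.results`, while `output.errors` carries what failed.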
Activities
| Activity file | Responsibility |
|---|---|
| activities.py | Core: module execution, persistence, reconciliation, risk |
| enrichment_activities.py | Freshness checks, context building |
| sync_activities.py | Neo4j sync orchestration |
| risk_rules.py | Per-module risk-rule application |
Client, worker, and queues
- Client (client.py): start_investigation(), cancel_investigation(), get_workflow_history() over gRPC.
- Worker (worker.py): polls task queues and executes workflows/activities. In production this is a separate temporal-worker deployment; a second workflow-engine-worker runs the low-code workflow engine.
- Task queues separate concerns (general activities, graph sync, risk scoring) with their own timeouts and heartbeats for long-running steps.
Relationship to the low-code workflow engine
Beyond investigations, Atlas has an experimental workflow engine v2.0 (src/workflows/,
ADR-007) that lets users compose custom task
sequences in a visual builder. It runs on its own worker and is feature-flagged. The investigation
workflow described here is the production path; the workflow engine is the extensibility path.
Observability
Because every step is a recorded activity, an in-flight investigation is fully inspectable — the
Temporal UI (temporal-ui service) shows workflow history, and the frontend's
TemporalActivityPanel surfaces module/activity progress to analysts. See
Operations → Observability.
Deep dive: workflow internals
This reflects src/temporal/ — workflows.py, activities.py, enrichment_activities.py,
constants.py.
The six stages
- Enrichment fetches provider context for the prompts; failure is non-fatal (the investigation proceeds without it).
- Modules run in full parallel: each execute_module is its own activity (not a child workflow), bracketed by open_run/close_run for provenance.
- Reconciliation merges entities across modules and syncs the graph.
- Risk computes overall level/score/recommendation.
- Persist writes the result; a failure here is logged into output.errors, not fatal.
- Graph sync runs as a detached child workflow (ABANDON) so it never blocks investigation completion.
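The stage ordering and the two non-fatal failure points can be sketched as follows. Every function here is a hypothetical stand-in with simulated failures, not the actual activities. Whether a failed enrichment is also recorded in output.errors is not stated above, so this sketch leaves it unrecorded. The detached graph sync is reduced to a call that returns immediately; ABANDON presumably refers to Temporal's parent-close policy, which the sketch does not model.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Output:
    stages: list = field(default_factory=list)  # stages attempted, in order
    errors: list = field(default_factory=list)  # non-fatal failures
    risk: dict = field(default_factory=dict)


async def enrich() -> dict:
    raise TimeoutError("enrichment provider timed out")  # simulated failure


async def run_modules(context: dict) -> list:
    return [{"module": "example", "had_context": bool(context)}]


async def reconcile(results: list) -> list:
    return results


async def score_risk(entities: list) -> dict:
    return {"level": "low", "score": 0}


async def persist(entities: list, risk: dict) -> None:
    raise RuntimeError("database unavailable")  # simulated failure


async def start_graph_sync(entities: list) -> str:
    return "graph-sync-started"  # returns at once; the sync itself runs elsewhere


async def investigate() -> Output:
    out = Output()

    out.stages.append("enrichment")
    try:
        context = await enrich()
    except Exception:
        context = {}  # non-fatal: proceed without provider context

    out.stages.append("modules")
    results = await run_modules(context)

    out.stages.append("reconciliation")
    entities = await reconcile(results)

    out.stages.append("risk")
    out.risk = await score_risk(entities)

    out.stages.append("persist")
    try:
        await persist(entities, out.risk)
    except Exception as exc:
        out.errors.append(f"persist: {exc}")  # logged, not fatal

    out.stages.append("graph_sync")
    await start_graph_sync(entities)
    return out


out = asyncio.run(investigate())
print(out.stages)
print(out.errors)
```

Even with enrichment and persistence both failing, all six stages are attempted and the run returns a result.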
Configuration constants
| Setting | Value |
|---|---|
| Task queue | osint-investigations (constants.py) |
| Module retry | initial 1s, max 1m, 3 attempts, backoff 2.0 |
| Per-module timeout | 20 minutes |
| Reconciliation / persist / enrichment timeouts | 5m / 2m / 2m |
| Tool result truncation (model) | MAX_LLM_TOOL_RESULT_CHARS = 12,000 |
| Evidence retention | MAX_EVIDENCE_TOOL_RESULT_CHARS = 200,000 |
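To make the retry row concrete, the sketch below computes the wait before each retry using the usual exponential-backoff rule (initial interval times coefficient to the power of the retry index, capped at the maximum). `RetrySettings`, `retry_delays`, and `split_tool_result` are hypothetical helpers, and truncating by plain slicing is an assumption; only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrySettings:
    initial_interval_s: float = 1.0   # "initial 1s"
    backoff_coefficient: float = 2.0  # "backoff 2.0"
    maximum_interval_s: float = 60.0  # "max 1m"
    maximum_attempts: int = 3         # "3 attempts"


def retry_delays(settings: RetrySettings) -> list:
    """Seconds waited before each retry; N attempts means N - 1 retries."""
    retries = settings.maximum_attempts - 1
    return [
        min(
            settings.initial_interval_s * settings.backoff_coefficient ** n,
            settings.maximum_interval_s,
        )
        for n in range(retries)
    ]


MAX_LLM_TOOL_RESULT_CHARS = 12_000
MAX_EVIDENCE_TOOL_RESULT_CHARS = 200_000


def split_tool_result(raw: str) -> tuple:
    """Return (text shown to the model, text kept as evidence)."""
    # Assumption: both limits are applied by cutting off the tail.
    return raw[:MAX_LLM_TOOL_RESULT_CHARS], raw[:MAX_EVIDENCE_TOOL_RESULT_CHARS]


delays = retry_delays(RetrySettings())
print(delays)  # [1.0, 2.0]

for_model, for_evidence = split_tool_result("x" * 250_000)
print(len(for_model), len(for_evidence))  # 12000 200000
```

With three attempts, the one-minute cap never applies; it would only matter if the attempt limit were raised to eight or more.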
Partial-failure semantics
The investigation is marked successful if at least one module completes (or all are skipped); a single
failing module degrades the result rather than failing the whole run. Cancellation propagates
asyncio.CancelledError to in-flight activities.
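The success rule can be written as a small predicate. `ModuleStatus` and `investigation_succeeded` are hypothetical names, and treating an empty status list as unsuccessful is an assumption the text does not cover.

```python
from enum import Enum


class ModuleStatus(Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


def investigation_succeeded(statuses) -> bool:
    """True if any module completed, or if every module was skipped."""
    statuses = list(statuses)
    any_completed = any(s is ModuleStatus.COMPLETED for s in statuses)
    # Assumption: an empty list does not count as "all skipped".
    all_skipped = bool(statuses) and all(s is ModuleStatus.SKIPPED for s in statuses)
    return any_completed or all_skipped


C, F, S = ModuleStatus.COMPLETED, ModuleStatus.FAILED, ModuleStatus.SKIPPED
print(investigation_succeeded([C, F, F, F, F, F, F]))  # True: one module completed
print(investigation_succeeded([S] * 7))                # True: all skipped
print(investigation_succeeded([F] * 6 + [S]))          # False: nothing completed
```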
Transient vs permanent errors
Transient exceptions (RemoteProtocolError, ConnectError, TimeoutException, ReadTimeout, …)
auto-retry per the RetryPolicy; permanent module exceptions are recorded in output.errors and
don't cascade. A MissingTenantCredentialsError surfaces as a structural failure (no retry).
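The three outcomes can be sketched as a classifier. The exception classes below are local stand-ins that only borrow the names from the text (they resemble httpx exception names, but that is an assumption), and `classify` is a hypothetical helper, not the project's actual error handling.

```python
# Local stand-ins so the sketch runs without third-party dependencies.
class RemoteProtocolError(Exception): ...
class ConnectError(Exception): ...
class TimeoutException(Exception): ...
class ReadTimeout(TimeoutException): ...
class MissingTenantCredentialsError(Exception): ...


# The "…" in the text means this list is incomplete; only named types appear here.
TRANSIENT_ERRORS = (RemoteProtocolError, ConnectError, TimeoutException, ReadTimeout)


def classify(exc: Exception) -> str:
    """Map an exception to 'structural', 'transient', or 'permanent'."""
    if isinstance(exc, MissingTenantCredentialsError):
        return "structural"  # surfaced as a failure, never retried
    if isinstance(exc, TRANSIENT_ERRORS):
        return "transient"   # retried according to the retry policy
    return "permanent"       # recorded in output.errors, does not cascade


print(classify(ReadTimeout("slow upstream")))
print(classify(ValueError("bad module input")))
print(classify(MissingTenantCredentialsError("no API key for tenant")))
```

Checking the structural case first keeps a credentials error from being retried even if it were later made a subclass of a transient type.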
To add a module, see Add an OSINT module.