
Temporal workflows

Investigations are long-running, multi-step, and must survive process restarts — exactly the problem Temporal solves. Atlas models an investigation as a durable workflow whose steps are activities: idempotent units of work that Temporal will retry and whose progress is persisted. The code lives in src/temporal/.

Why Temporal

A single investigation runs seven OSINT modules, each an agentic LLM loop that can take minutes, then reconciles entities, syncs the graph, scores risk, and generates a report. If the worker crashes mid-investigation, the work must resume — not restart. Temporal gives Atlas durable execution, automatic retries, timeouts, and full history for observability.

The investigation workflow

InvestigationWorkflow (src/temporal/workflows.py) is the entry point. It fans out the seven modules in parallel, then runs reconciliation, persistence, graph sync, risk scoring, and report generation as activities. Partial module failure is tolerated — the investigation completes with the data it has and surfaces what failed.
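The fan-out with tolerated partial failure can be illustrated with a minimal sketch in plain asyncio. This is not the Temporal SDK and not the code in workflows.py: the module names, run_module, and fan_out are hypothetical stand-ins, and one module is made to fail on purpose.

```python
import asyncio

# Hypothetical module names; the source only says there are seven OSINT modules.
MODULES = [f"module_{i}" for i in range(1, 8)]


async def run_module(name: str) -> dict:
    """Stand-in for one module activity; module_3 fails to show partial failure."""
    await asyncio.sleep(0)  # placeholder for the real agentic loop
    if name == "module_3":
        raise RuntimeError(f"{name} failed")
    return {"module": name, "entities": []}


async def fan_out(modules: list[str]) -> tuple[list[dict], dict[str, str]]:
    """Run all modules concurrently, then separate results from failures."""
    outcomes = await asyncio.gather(
        *(run_module(m) for m in modules), return_exceptions=True
    )
    results: list[dict] = []
    errors: dict[str, str] = {}
    for name, outcome in zip(modules, outcomes):
        if isinstance(outcome, BaseException):
            errors[name] = str(outcome)  # surface what failed
        else:
            results.append(outcome)  # keep the data we have
    return results, errors


results, errors = asyncio.run(fan_out(MODULES))
```

With return_exceptions=True, one failing module does not cancel its siblings, which mirrors the "complete with the data it has" behaviour described above.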

Activities

Activity file | Responsibility
activities.py | Core: module execution, persistence, reconciliation, risk
enrichment_activities.py | Freshness checks, context building
sync_activities.py | Neo4j sync orchestration
risk_rules.py | Per-module risk-rule application

Client, worker, and queues

  • Client (client.py) — start_investigation(), cancel_investigation(), get_workflow_history() over gRPC.
  • Worker (worker.py) — polls task queues and executes workflows/activities. In production this is a separate temporal-worker deployment; a second workflow-engine-worker runs the low-code workflow engine.
  • Task queues separate concerns (general activities, graph sync, risk scoring) with their own timeouts and heartbeats for long-running steps.

Relationship to the low-code workflow engine

Beyond investigations, Atlas has an experimental workflow engine v2.0 (src/workflows/, ADR-007) that lets users compose custom task sequences in a visual builder. It runs on its own worker and is feature-flagged. The investigation workflow described here is the production path; the workflow engine is the extensibility path.

Observability

Because every step is a recorded activity, an in-flight investigation is fully inspectable — the Temporal UI (temporal-ui service) shows workflow history, and the frontend's TemporalActivityPanel surfaces module/activity progress to analysts. See Operations → Observability.


Deep dive: workflow internals

This section reflects src/temporal/workflows.py, activities.py, enrichment_activities.py, and constants.py.

The six stages

  1. Enrichment fetches provider context for the prompts; failure is non-fatal (the investigation proceeds without it).
  2. Modules run in full parallel — each execute_module is its own activity (not a child workflow), bracketed by open_run/close_run for provenance.
  3. Reconciliation merges entities across modules and syncs the graph.
  4. Risk computes overall level/score/recommendation.
  5. Persist writes the result; a failure here is logged into output.errors, not fatal.
  6. Graph sync runs as a detached child workflow (ABANDON) so it never blocks investigation completion.
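The stage ordering and its two non-fatal steps can be sketched as follows. This is an illustrative outline in plain asyncio, not the actual workflow: every function name, the simulated failures, and the placeholder risk values are assumptions made for the example.

```python
import asyncio


async def enrich() -> dict:
    # Simulate a provider outage to show that enrichment is non-fatal.
    raise ConnectionError("provider unavailable")


async def run_modules(context: dict) -> list[dict]:
    # Placeholder for the parallel module stage.
    return [{"module": "example_module", "entities": ["ACME Ltd"]}]


async def reconcile(module_results: list[dict]) -> list[str]:
    # Merge and de-duplicate entities across modules.
    return sorted({e for r in module_results for e in r["entities"]})


async def score_risk(entities: list[str]) -> dict:
    # Placeholder values; real scoring is not described in this section.
    return {"level": "low", "score": 0.1, "recommendation": "no action"}


async def persist(output: dict) -> None:
    # Simulate a storage failure to show that persist errors are recorded.
    raise OSError("database unreachable")


async def investigate() -> dict:
    output: dict = {"errors": [], "graph_sync": "not started"}

    try:  # 1. Enrichment: proceed without context on failure.
        context = await enrich()
    except Exception:
        context = {}

    module_results = await run_modules(context)  # 2. Modules
    output["entities"] = await reconcile(module_results)  # 3. Reconciliation
    output["risk"] = await score_risk(output["entities"])  # 4. Risk

    try:  # 5. Persist: record the failure instead of raising.
        await persist(output)
    except Exception as exc:
        output["errors"].append(f"persist: {exc}")

    # 6. Graph sync: in Temporal this would be a child workflow that is
    # started but not awaited; here we only mark that it was kicked off.
    output["graph_sync"] = "started (detached)"
    return output


output = asyncio.run(investigate())
```

Both simulated failures leave the run intact: the enrichment error is swallowed, and the persist error ends up in output["errors"].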

Configuration constants

Setting | Value
Task queue | osint-investigations (constants.py)
Module retry | initial 1s, max 1m, 3 attempts, backoff 2.0
Per-module timeout | 20 minutes
Reconciliation / persist / enrichment timeouts | 5m / 2m / 2m
Tool result truncation (model) | MAX_LLM_TOOL_RESULT_CHARS = 12,000
Evidence retention | MAX_EVIDENCE_TOOL_RESULT_CHARS = 200,000
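To make these numbers concrete, the sketch below works out the waits implied by the module retry settings and applies the two character limits to a tool result. It assumes that "3 attempts" includes the first try (as Temporal's maximum-attempts setting does) and that truncation is a plain prefix cut. The helper names retry_delays and split_tool_result are hypothetical.

```python
from datetime import timedelta

# Values taken from the table above.
INITIAL_INTERVAL = timedelta(seconds=1)
MAXIMUM_INTERVAL = timedelta(minutes=1)
BACKOFF_COEFFICIENT = 2.0
MAXIMUM_ATTEMPTS = 3
MAX_LLM_TOOL_RESULT_CHARS = 12_000
MAX_EVIDENCE_TOOL_RESULT_CHARS = 200_000


def retry_delays() -> list[timedelta]:
    """Waits between attempts: one fewer than the number of attempts."""
    delays = []
    for retry in range(MAXIMUM_ATTEMPTS - 1):
        delay = INITIAL_INTERVAL * (BACKOFF_COEFFICIENT ** retry)
        delays.append(min(delay, MAXIMUM_INTERVAL))  # cap at the maximum
    return delays


def split_tool_result(raw: str) -> tuple[str, str]:
    """Return (text shown to the model, text retained as evidence).

    Assumption: both limits are applied as simple prefix truncation.
    """
    return raw[:MAX_LLM_TOOL_RESULT_CHARS], raw[:MAX_EVIDENCE_TOOL_RESULT_CHARS]


delays = retry_delays()
for_model, for_evidence = split_tool_result("x" * 250_000)
```

Under these assumptions a failing module is retried after 1 second and again after 2 seconds, so the 1-minute cap never applies with only three attempts.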

Partial-failure semantics

The investigation is marked successful if any module completes (or all are skipped) — a single failing module degrades the result rather than failing the whole run. Cancellation propagates asyncio.CancelledError to in-flight activities.
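The success rule can be written as a small predicate. This is a sketch of the rule as stated, not the code in workflows.py; the ModuleStatus enum and the function name are hypothetical.

```python
from enum import Enum


class ModuleStatus(Enum):
    COMPLETED = "completed"
    FAILED = "failed"
    SKIPPED = "skipped"


def investigation_succeeded(statuses: list[ModuleStatus]) -> bool:
    """Successful if any module completed, or if every module was skipped."""
    any_completed = any(s is ModuleStatus.COMPLETED for s in statuses)
    all_skipped = all(s is ModuleStatus.SKIPPED for s in statuses)
    return any_completed or all_skipped


C, F, S = ModuleStatus.COMPLETED, ModuleStatus.FAILED, ModuleStatus.SKIPPED
one_survivor = investigation_succeeded([F, F, C, F, S, F, F])
all_failed = investigation_succeeded([F] * 7)
all_skipped = investigation_succeeded([S] * 7)
failed_and_skipped = investigation_succeeded([F, S, S, F, S, S, S])
```

Under this reading, a mix of failed and skipped modules with no completion does not count as success.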

Transient vs permanent errors

Transient exceptions (RemoteProtocolError, ConnectError, TimeoutException, ReadTimeout, …) auto-retry per the RetryPolicy; permanent module exceptions are recorded in output.errors and don't cascade. A MissingTenantCredentialsError surfaces as a structural failure (no retry).
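One way to picture the three outcomes is a classifier that maps an exception to a handling category. The exception names come from the text above, but the classes here are local stand-ins so the example runs without any HTTP client library. The classify function and the category labels are hypothetical, and the elided members of the transient list are not filled in.

```python
# Local stand-ins for the exception types named in the text.
class RemoteProtocolError(Exception): ...
class ConnectError(Exception): ...
class TimeoutException(Exception): ...
class ReadTimeout(TimeoutException): ...
class MissingTenantCredentialsError(Exception): ...


# Only the names listed in the text; the real list is longer.
TRANSIENT = (RemoteProtocolError, ConnectError, TimeoutException, ReadTimeout)


def classify(exc: Exception) -> str:
    """Map an exception to 'structural', 'transient', or 'permanent'."""
    if isinstance(exc, MissingTenantCredentialsError):
        return "structural"  # surfaced as a failure, never retried
    if isinstance(exc, TRANSIENT):
        return "transient"  # retried per the RetryPolicy
    return "permanent"  # recorded in output.errors, no cascade


kinds = [
    classify(ReadTimeout("slow upstream")),
    classify(ConnectError("refused")),
    classify(MissingTenantCredentialsError("no API key")),
    classify(ValueError("bad module output")),
]
```

Checking the structural case first keeps a credentials problem from being retried even if its class were later added to a broader transient hierarchy.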

To add a module, see Add an OSINT module.