ADR-021: Dynamic Risk Categories & Matrix Integration

| Field   | Value                     |
|---------|---------------------------|
| Status  | Draft                     |
| Date    | 2026-04-06                |
| Relates | ADR-008a, ADR-019, ADR-020 |

1. Problem Statement

The Risk Categories page (formerly Risk Inputs) manages the reference datasets that feed into risk matrix scoring — country risk lists, PEP tiers, industry classifications, sanctions configuration, product risk taxonomy, and UBO thresholds. These are the configurable facts that compliance officers maintain independently from the scoring logic.

Two problems limit the platform's ability to support AMLR 2024/1624 and custom compliance methodologies:

Problem A: Hardcoded Category Types

The six category types are hardcoded in three places:

  1. dataset_types.json — a static config file loaded at seed time, not editable through the UI.
  2. EDITOR_MAP in RiskCategories.tsx — a compile-time mapping from type_id to a dedicated React component. Adding a new category type requires writing a new editor component, adding it to the map, and deploying.
  3. TYPE_ICON_MAP in RiskCategories.tsx — hardcoded icons per type.

A compliance officer who needs a new category — say "delivery channel risk classifications" or "high net worth thresholds" (required by AMLR Art. 30-35) — cannot create one without a code change. This is a deployment bottleneck that breaks the platform's self-service principle.

Problem B: Risk Categories Are Not Universally Available to Matrices

The REFERENCE_LOOKUP scoring method (ADR-019) allows a factor to reference any dataset by list_key. The ReferenceLookupPanel dropdown already loads all active datasets dynamically. However, there is a disconnect:

  1. The scorer's _lookup_single supports two data formats: flat arrays (membership check) and row-based lists of dicts (key/score column lookup). But some existing datasets use a third format, a dict keyed by the lookup value (flat in pep_classification, nested in industry_risk_classification), that _lookup_single does not handle cleanly for scoring.
  2. There is no validation at category creation time that a dataset's data format is compatible with REFERENCE_LOOKUP scoring. A compliance officer could create a dataset with an arbitrary JSON structure, wire it to a factor, and only discover at evaluation time that the scorer cannot look up values in it.
  3. The FACTOR_DATASET_MAP in version_manager.py hardcodes well-known factor-to-dataset relationships (e.g., pep_exposure → pep_tiers). New category types are not automatically included in publish-time snapshot extraction unless they are referenced via scoring_config.reference_dataset in a REFERENCE_LOOKUP factor.

2. Decision

2.1 Make category types user-definable

Compliance officers can create new risk category types from the UI without a code deployment. Each type defines:

  • A name and description
  • A data shape (one of three standardized formats, not arbitrary JSON)
  • An icon (selected from a predefined palette)
  • A display order in the Risk Categories page

2.2 Standardize three data shapes for scoring compatibility

Every risk category dataset must conform to one of three data shapes. The data shape determines how the scorer reads the data and which editor UI is rendered. This replaces the current schema_spec free-form JSON Schema with an opinionated, scoring-aware contract.

| Data Shape | Structure | Scorer Behavior | Use Cases |
|------------|-----------|-----------------|-----------|
| list | string[]: flat array of values | Membership check: value in list → match_score, else default_score | Country lists, sanctions lists, EU high-risk third countries |
| scored_table | Array<{ [key_col]: string, [score_col]: number, ...extra }>: rows with a key and score column | Key-value lookup: find row where row[key_col] == value → row[score_col] | Industry risk classification, product risk taxonomy, PEP tiers, sanctions match types |
| config | object: structured configuration | Not directly scorable via REFERENCE_LOOKUP. Used for platform configuration (e.g., UBO thresholds, sanctions list weights). Cannot be wired to a matrix factor. | UBO thresholds, sanctions screening configuration |

When creating a new category type, the compliance officer selects a data shape. The platform:

  • Renders the appropriate editor (generic table editor for scored_table, list editor for list, JSON editor for config)
  • Validates imported data against the shape
  • Makes the dataset available in the REFERENCE_LOOKUP dropdown only if it uses a scorable shape (list or scored_table)

2.3 Any scorable risk category is automatically available to all matrices

The ReferenceLookupPanel already loads all active datasets. This ADR formalizes the contract:

  1. Every dataset with data shape list or scored_table appears in the REFERENCE_LOOKUP factor configuration dropdown.
  2. The dropdown groups datasets by category type and shows the dataset name, active version, and entry count.
  3. Datasets with data shape config are filtered out of the dropdown — they are not scoring inputs.
  4. At publish time, _extract_reference_list_keys already scans REFERENCE_LOOKUP factors for scoring_config.reference_dataset. No changes needed — new category types are automatically included in the snapshot.

2.4 Migrate existing types to the new model

The six existing category types are migrated to use the standardized data shapes:

| Existing Type | Data Shape | Notes |
|---------------|------------|-------|
| country_risk_list | list | Already a flat array. No data migration. |
| pep_classification | scored_table | Currently a { "tier_name": score } dict. Migrate to [{ "classification": "...", "score": N }]. |
| industry_risk_classification | scored_table | Currently nested dict. Migrate to row-based format. |
| product_risk_taxonomy | scored_table | Currently nested dict. Migrate to row-based format. |
| sanctions_config | config | Not a scoring dataset; stays as structured config. |
| ubo_thresholds | config | Not a scoring dataset; stays as structured config. |

After migration, the six dedicated editor components remain as optimized special cases for their types. New types created by compliance officers use the generic editors.


3. Data Shape Contracts

3.1 list Shape

{
  "data_shape": "list",
  "data": ["AF", "MM", "SY", "KP", "IR"]
}

Scorer contract: entity_value in dataset.data → match_score, otherwise default_score. Supports multi-value strategy when entity value is an array.
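The membership contract can be sketched as follows. This is an illustration only, not the platform's scorer: the function name score_list_membership and its parameters are hypothetical, and of the multi-value strategies only "max" appears in this ADR ("min" is an assumption added to show where other strategies would plug in).

```python
from typing import List, Union


def score_list_membership(
    entity_value: Union[str, List[str]],
    dataset: dict,
    match_score: float,
    default_score: float = 0.0,
    multi_value: str = "max",
) -> float:
    """Return match_score when the value is in dataset["data"], else default_score."""
    members = set(dataset["data"])

    # Single value: plain membership check.
    if isinstance(entity_value, str):
        return match_score if entity_value in members else default_score

    # Array value: score each element, then reduce with the chosen strategy.
    scores = [match_score if v in members else default_score for v in entity_value]
    if not scores:
        return default_score
    if multi_value == "max":
        return max(scores)
    if multi_value == "min":  # assumed strategy, not named in the ADR
        return min(scores)
    raise ValueError(f"unsupported multi-value strategy: {multi_value}")


dataset = {"data_shape": "list", "data": ["AF", "MM", "SY", "KP", "IR"]}
print(score_list_membership(["DE", "SY"], dataset, match_score=100))
```

With the "max" strategy, one listed country among several is enough to produce the match score.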

Generic editor: Tag-based list editor with bulk import (CSV paste, file upload). Search/filter. Visual world map for country codes (detected by ^[A-Z]{2}$ pattern).

3.2 scored_table Shape

{
  "data_shape": "scored_table",
  "columns": {
    "key": "industry_code",
    "score": "risk_score",
    "extra": ["description", "risk_tier"]
  },
  "data": [
    { "industry_code": "64.1", "risk_score": 80, "description": "Banking", "risk_tier": "high" },
    { "industry_code": "66.1", "risk_score": 60, "description": "Insurance", "risk_tier": "medium" }
  ]
}

Scorer contract: _lookup_single finds the row where row[key_col] == entity_value → row[score_col]. The columns.key and columns.score fields map to scoring_config.lookup_key_column and scoring_config.score_column respectively.
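A minimal sketch of the key/score lookup, assuming the dataset payload shown above. The function lookup_scored_table is hypothetical and stands in for the row-based path of _lookup_single, whose real signature is not given in this ADR; returning default_score when no row matches is also an assumption.

```python
def lookup_scored_table(entity_value: str, dataset: dict, default_score: float = 0.0) -> float:
    """Find the row whose key column equals entity_value and return its score."""
    key_col = dataset["columns"]["key"]      # maps to scoring_config.lookup_key_column
    score_col = dataset["columns"]["score"]  # maps to scoring_config.score_column

    for row in dataset["data"]:
        if row.get(key_col) == entity_value:
            return row[score_col]
    return default_score  # assumed fallback when no row matches


dataset = {
    "data_shape": "scored_table",
    "columns": {"key": "industry_code", "score": "risk_score", "extra": ["description"]},
    "data": [
        {"industry_code": "64.1", "risk_score": 80, "description": "Banking"},
        {"industry_code": "66.1", "risk_score": 60, "description": "Insurance"},
    ],
}
print(lookup_scored_table("66.1", dataset))
```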

Generic editor: Table with sortable columns. Inline editing. Add/remove rows. CSV import/export. Score column rendered with a bar visualization showing the range within the factor's max_score.

Column definition at type creation time: When creating a scored_table type, the compliance officer defines the column structure:

{
  "id": "delivery_channel_risk",
  "name": "Delivery Channel Risk Classifications",
  "data_shape": "scored_table",
  "column_definitions": [
    { "name": "channel_type", "label": "Channel Type", "role": "key", "type": "string" },
    { "name": "risk_score", "label": "Risk Score", "role": "score", "type": "number" },
    { "name": "description", "label": "Description", "role": "display", "type": "string" },
    { "name": "eba_reference", "label": "EBA Reference", "role": "display", "type": "string" }
  ]
}

Each column has a role: key (the lookup column — exactly one required), score (the score column — exactly one required), or display (informational, shown in the editor but not used by the scorer).

3.3 config Shape

{
  "data_shape": "config",
  "data": {
    "lists": ["OFAC_SDN", "EU_CONSOLIDATED"],
    "weights": { "exact_name": 100, "fuzzy_name": 70 }
  }
}

Scorer contract: None — config datasets are consumed by platform services (sanctions screening, UBO analysis), not by REFERENCE_LOOKUP factors.

Generic editor: JSON tree editor with expand/collapse. Schema-aware if schema_spec is defined on the type. Not shown in the REFERENCE_LOOKUP dropdown.


4. Database Changes

4.1 Extend reference_data_types Table

ALTER TABLE reference_data_types
  ADD COLUMN data_shape VARCHAR(50) NOT NULL DEFAULT 'config',
  ADD COLUMN column_definitions JSONB,
  ADD COLUMN icon VARCHAR(50) DEFAULT 'th-list',
  ADD COLUMN display_order INTEGER DEFAULT 0,
  ADD COLUMN created_by VARCHAR(255) DEFAULT 'system_seed',
  ADD COLUMN is_system BOOLEAN NOT NULL DEFAULT false;

-- Migrate existing types
UPDATE reference_data_types SET data_shape = 'list', is_system = true
WHERE id = 'country_risk_list';
UPDATE reference_data_types SET data_shape = 'scored_table', is_system = true
WHERE id IN ('industry_risk_classification', 'product_risk_taxonomy', 'pep_classification');
UPDATE reference_data_types SET data_shape = 'config', is_system = true
WHERE id IN ('sanctions_config', 'ubo_thresholds');

is_system distinguishes the 6 built-in types (which keep their dedicated editors) from user-created types (which use generic editors). System types cannot be deleted.

4.2 Extend reference_datasets Table

ALTER TABLE reference_datasets
ADD COLUMN entry_count INTEGER;

entry_count is computed on insert/update for list and scored_table shapes. Used for display in the REFERENCE_LOOKUP dropdown and Risk Categories overview.


5. API Changes

5.1 New Endpoint: Create Category Type

POST /api/reference-data/types
{
  "id": "delivery_channel_risk",
  "name": "Delivery Channel Risk Classifications",
  "description": "Risk classifications for customer delivery channels per EBA GL 2.18-2.19",
  "data_shape": "scored_table",
  "column_definitions": [
    { "name": "channel_type", "label": "Channel Type", "role": "key", "type": "string" },
    { "name": "risk_score", "label": "Risk Score", "role": "score", "type": "number" },
    { "name": "description", "label": "Description", "role": "display", "type": "string" }
  ],
  "icon": "satellite"
}

Validation:

  • id must be unique, lowercase, snake_case
  • data_shape must be one of list, scored_table, config
  • For scored_table: exactly one column with role: "key" and exactly one with role: "score"
  • For list: no column_definitions needed
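The validation rules above could be checked along these lines. This is a sketch under assumptions: validate_type_payload and the existing_ids argument are hypothetical names, the snake_case pattern is one reasonable reading of "lowercase, snake_case", and the error strings are illustrative.

```python
import re
from typing import List, Set

VALID_SHAPES = {"list", "scored_table", "config"}
# Assumed reading of "lowercase, snake_case": starts with a letter,
# then lowercase alphanumeric groups separated by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def validate_type_payload(payload: dict, existing_ids: Set[str]) -> List[str]:
    """Return a list of validation errors; an empty list means the payload is valid."""
    errors: List[str] = []

    type_id = str(payload.get("id", ""))
    if not SNAKE_CASE.fullmatch(type_id):
        errors.append("id must be lowercase snake_case")
    elif type_id in existing_ids:
        errors.append(f"id '{type_id}' already exists")

    shape = payload.get("data_shape")
    if shape not in VALID_SHAPES:
        errors.append("data_shape must be one of list, scored_table, config")

    if shape == "scored_table":
        roles = [col.get("role") for col in payload.get("column_definitions") or []]
        for role in ("key", "score"):
            if roles.count(role) != 1:
                errors.append(f"scored_table needs exactly one column with role '{role}'")

    return errors


payload = {
    "id": "delivery_channel_risk",
    "data_shape": "scored_table",
    "column_definitions": [
        {"name": "channel_type", "role": "key", "type": "string"},
        {"name": "risk_score", "role": "score", "type": "number"},
        {"name": "description", "role": "display", "type": "string"},
    ],
}
print(validate_type_payload(payload, existing_ids={"country_risk_list"}))
```

Collecting every error instead of failing on the first lets the creation dialog highlight all problems in one round trip.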

5.2 Updated Endpoint: List Types

GET /api/reference-data/types?scorable=true

New scorable query parameter filters to types with data_shape in (list, scored_table). Used by the ReferenceLookupPanel to populate the dataset dropdown.
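The filter itself is small. A sketch, assuming types are available as dicts with a data_shape field; list_types is a hypothetical handler name, and the ADR does not say what scorable=false should do, so this sketch treats it the same as an absent parameter.

```python
from typing import Iterable, List, Optional

SCORABLE_SHAPES = {"list", "scored_table"}


def list_types(types: Iterable[dict], scorable: Optional[bool] = None) -> List[dict]:
    """Return all types, or only scorable ones when scorable is true."""
    if scorable:
        return [t for t in types if t["data_shape"] in SCORABLE_SHAPES]
    # Absent or false: no filtering (assumption, see lead-in).
    return list(types)


types = [
    {"id": "country_risk_list", "data_shape": "list"},
    {"id": "pep_classification", "data_shape": "scored_table"},
    {"id": "ubo_thresholds", "data_shape": "config"},
]
print([t["id"] for t in list_types(types, scorable=True)])
```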

5.3 Updated Endpoint: Create Dataset

POST /api/reference-data/datasets

On create, validates data against the type's data_shape and column_definitions:

  • list → data must be an array of strings
  • scored_table → data must be an array of objects, each containing the key and score columns with correct types
  • config → data must be a valid object (optionally validated against schema_spec)

Computes and stores entry_count.
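A sketch of the per-shape validation and the entry_count computation. The function validate_dataset is hypothetical; it assumes the key column holds strings and the score column holds numbers (as in the column definitions shown in this ADR), and it omits the optional schema_spec check for config data.

```python
from typing import Any, List, Optional, Tuple

SCORABLE_SHAPES = ("list", "scored_table")


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_dataset(
    data_shape: str, data: Any, column_definitions: Optional[List[dict]] = None
) -> Tuple[List[str], Optional[int]]:
    """Validate data against its shape; return (errors, entry_count)."""
    errors: List[str] = []

    if data_shape == "list":
        if not isinstance(data, list) or not all(isinstance(v, str) for v in data):
            errors.append("list data must be an array of strings")
    elif data_shape == "scored_table":
        roles = {c["role"]: c["name"] for c in (column_definitions or [])}
        key_col, score_col = roles.get("key"), roles.get("score")
        if key_col is None or score_col is None:
            errors.append("type must define a key and a score column")
        elif not isinstance(data, list):
            errors.append("scored_table data must be an array of objects")
        else:
            for i, row in enumerate(data):
                if not isinstance(row, dict):
                    errors.append(f"row {i}: expected an object")
                elif not isinstance(row.get(key_col), str):
                    errors.append(f"row {i}: '{key_col}' must be a string")
                elif not _is_number(row.get(score_col)):
                    errors.append(f"row {i}: '{score_col}' must be a number")
    elif data_shape == "config":
        if not isinstance(data, dict):
            errors.append("config data must be an object")
    else:
        errors.append(f"unknown data_shape '{data_shape}'")

    # entry_count is only meaningful for valid, scorable shapes.
    entry_count = len(data) if not errors and data_shape in SCORABLE_SHAPES else None
    return errors, entry_count


cols = [
    {"name": "channel_type", "role": "key", "type": "string"},
    {"name": "risk_score", "role": "score", "type": "number"},
]
print(validate_dataset("scored_table", [{"channel_type": "online", "risk_score": 70}], cols))
```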


6. Frontend Changes

6.1 Risk Categories Page

The EDITOR_MAP gains a fallback:

function getEditor(typeId: string, dataShape: string): React.ComponentType<EditorProps> {
  // System types keep their dedicated editors
  if (EDITOR_MAP[typeId]) return EDITOR_MAP[typeId];

  // User-created types use generic editors based on data shape
  switch (dataShape) {
    case 'list': return GenericListEditor;
    case 'scored_table': return GenericTableEditor;
    case 'config': return GenericConfigEditor;
    default: return GenericConfigEditor;
  }
}

6.2 "New Category Type" Dialog

A button at the top of the Risk Categories page opens a creation dialog:

┌─────────────────────────────────────────────────────────────┐
│ Create Risk Category Type │
├─────────────────────────────────────────────────────────────┤
│ │
│ Name: [Delivery Channel Risk Classifications] │
│ ID: delivery_channel_risk (auto-generated from name) │
│ Icon: [satellite ▼] │
│ │
│ Data Shape: ○ List ● Scored Table ○ Configuration │
│ │
│ ┌─ Columns ──────────────────────────────────────────────┐ │
│ │ Name Label Role Type │ │
│ │ channel_type Channel Type [key ▼] [string ▼] │ │
│ │ risk_score Risk Score [score▼] [number ▼] │ │
│ │ description Description [display] [string ▼] │ │
│ │ [+ Add Column] │ │
│ └────────────────────────────────────────────────────────┘ │
│ │
│ Description: │
│ [Risk classifications for customer delivery channels │
│ per EBA GL 2.18-2.19 ] │
│ │
│ [Cancel] [Create Category] │
└─────────────────────────────────────────────────────────────┘

6.3 Generic Table Editor (scored_table)

Renders an HTMLTable with columns derived from the type's column_definitions. Features:

  • Inline cell editing (double-click to edit, Enter to confirm)
  • Add row button at bottom
  • Delete row button per row
  • Sort by any column (click header)
  • CSV import: paste or file upload, columns matched by header name
  • CSV export
  • Score column shows a small inline bar (0 to max, colored by risk tier)
  • Search/filter input above the table

6.4 Generic List Editor (list)

Renders a tag-based list with:

  • Add value input with Enter to add
  • Click tag × to remove
  • Bulk paste (comma or newline separated)
  • CSV import
  • Search/filter
  • Country code detection: if all values match ^[A-Z]{2}$, render a world map visualization below the list
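The bulk-paste and country-code rules above are simple enough to pin down in a few lines. The sketch below expresses the logic in Python for illustration (the editor itself is a React component); the helper names are hypothetical, and trimming whitespace and dropping duplicates are assumptions not stated in this ADR.

```python
import re
from typing import List

COUNTRY_CODE = re.compile(r"[A-Z]{2}")


def parse_bulk_paste(text: str) -> List[str]:
    """Split pasted text on commas or newlines, trimming blanks and duplicates."""
    values: List[str] = []
    for part in re.split(r"[,\n]", text):
        value = part.strip()
        if value and value not in values:  # de-duplication is an assumption
            values.append(value)
    return values


def looks_like_country_codes(values: List[str]) -> bool:
    """True when the list is non-empty and every value is two uppercase letters."""
    return bool(values) and all(COUNTRY_CODE.fullmatch(v) for v in values)


pasted = "AF, MM\nSY,,KP\n"
values = parse_bulk_paste(pasted)
print(values, looks_like_country_codes(values))
```

Using fullmatch gives the same effect as anchoring the pattern with ^ and $.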

6.5 ReferenceLookupPanel Update

Group datasets in the dropdown by category type:

┌─────────────────────────────────────────────────────────┐
│ Reference Dataset: │
│ ┌──────────────────────────────────────────────────────┐│
│ │ Country Risk Lists ││
│ │ EU High-Risk Third Countries (27 entries, v1) ││
│ │ FATF Grey List (22 entries, v1) ││
│ │ FATF Black List (3 entries, v1) ││
│ │ Industry Risk Classifications ││
│ │ Industry Risk Classification (45 entries, v1) ││
│ │ Delivery Channel Risk ← user-created type ││
│ │ Channel Risk Ratings (8 entries, v1) ││
│ └──────────────────────────────────────────────────────┘│
│ │
│ Lookup Key Column: channel_type (auto-filled) │
│ Score Column: risk_score (auto-filled) │
│ Default Score: [0] │
│ Multi-Value: [max ▼] │
└─────────────────────────────────────────────────────────┘

When a dataset is selected, the lookup_key_column and score_column fields are auto-populated from the category type's column_definitions (if scored_table). For list datasets, these fields are hidden and match_score is shown instead. This eliminates a common misconfiguration: column names mistyped when the compliance officer has to enter them manually.
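The auto-population rule amounts to reading the key and score roles off the type definition. A sketch of that derivation, written in Python for illustration although the panel itself is a React component; default_scoring_config is a hypothetical helper, and leaving match_score unset for list datasets is an assumption.

```python
def default_scoring_config(type_def: dict) -> dict:
    """Derive the pre-filled scoring_config fields from a category type definition."""
    shape = type_def["data_shape"]

    if shape == "scored_table":
        by_role = {
            col["role"]: col["name"]
            for col in type_def["column_definitions"]
            if col["role"] in ("key", "score")
        }
        return {
            "lookup_key_column": by_role["key"],
            "score_column": by_role["score"],
        }

    if shape == "list":
        # Column fields are hidden; the officer fills in match_score instead.
        return {"match_score": None}

    raise ValueError(f"data shape '{shape}' is not scorable")


type_def = {
    "id": "delivery_channel_risk",
    "data_shape": "scored_table",
    "column_definitions": [
        {"name": "channel_type", "role": "key", "type": "string"},
        {"name": "risk_score", "role": "score", "type": "number"},
        {"name": "description", "role": "display", "type": "string"},
    ],
}
print(default_scoring_config(type_def))
```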


7. Scorer Compatibility

7.1 No Changes to _score_reference_lookup

The scorer already handles both data formats: flat arrays (membership check) and row-based lists of dicts (key/score lookup). The standardized data shapes (list and scored_table) map directly to these two paths.

7.2 Publish-Time Snapshot

_extract_reference_list_keys in version_manager.py already scans all REFERENCE_LOOKUP factors for scoring_config.reference_dataset and includes them in the snapshot. No changes needed — new user-created datasets are automatically snapshotted if referenced by a factor.

7.3 Publish-Time Validation

_validate_reference_lists in scorer.py already checks that every referenced dataset exists in the snapshot. The only addition: validate that a referenced dataset's data_shape is scorable (list or scored_table). If a factor references a config-shaped dataset, publish validation should fail with a clear error: "Dataset 'ubo_thresholds' has data shape 'config' and cannot be used for REFERENCE_LOOKUP scoring."
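The added check could look roughly like this. It is a sketch, not the existing _validate_reference_lists: the function name, the snapshot layout (a dict from dataset key to a payload carrying data_shape), and the "not found" message are assumptions; only the config-shape error text is taken from this ADR.

```python
from typing import Dict, Iterable, List

SCORABLE_SHAPES = {"list", "scored_table"}


def validate_reference_shapes(referenced_keys: Iterable[str], snapshot: Dict[str, dict]) -> List[str]:
    """Return publish-blocking errors for missing or non-scorable datasets."""
    errors: List[str] = []
    for key in sorted(referenced_keys):
        dataset = snapshot.get(key)
        if dataset is None:
            errors.append(f"Dataset '{key}' not found in snapshot.")
            continue
        shape = dataset.get("data_shape")
        if shape not in SCORABLE_SHAPES:
            errors.append(
                f"Dataset '{key}' has data shape '{shape}' "
                "and cannot be used for REFERENCE_LOOKUP scoring."
            )
    return errors


snapshot = {
    "country_risk_list": {"data_shape": "list", "data": ["AF", "IR"]},
    "ubo_thresholds": {"data_shape": "config", "data": {}},
}
for error in validate_reference_shapes({"country_risk_list", "ubo_thresholds"}, snapshot):
    print(error)
```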


8. Migration of Existing Datasets

8.1 pep_classification → scored_table

Current format:

{ "head_of_state": 30, "government_minister": 25, "senior_military": 20, ... }

Migrated format:

[
  { "classification": "head_of_state", "score": 30 },
  { "classification": "government_minister", "score": 25 },
  { "classification": "senior_military", "score": 20 }
]

Column definitions: classification (key), score (score).
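The flat-dict conversion is a one-line transformation. A sketch, assuming the payload is available as a Python dict; migrate_flat_dict is a hypothetical helper, and passing already-migrated lists through unchanged is an assumption added so the step can be re-run safely.

```python
from typing import Any, List


def migrate_flat_dict(data: Any, key_col: str = "classification", score_col: str = "score") -> List[dict]:
    """Convert {"tier": score} into [{key_col: "tier", score_col: score}]."""
    if isinstance(data, list):
        # Already row-based: leave untouched so the migration is idempotent.
        return data
    return [{key_col: key, score_col: score} for key, score in data.items()]


old = {"head_of_state": 30, "government_minister": 25, "senior_military": 20}
for row in migrate_flat_dict(old):
    print(row)
```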

8.2 industry_risk_classification → scored_table

Current format (nested dict with extra fields):

{
  "64.1": { "risk_tier": "high", "score": 80 },
  "66.1": { "risk_tier": "medium", "score": 60 }
}

Migrated format:

[
  { "industry_code": "64.1", "risk_score": 80, "risk_tier": "high" },
  { "industry_code": "66.1", "risk_score": 60, "risk_tier": "medium" }
]

Column definitions: industry_code (key), risk_score (score), risk_tier (display).
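Flattening the nested format means promoting the dict key to the key column, renaming the inner score field, and carrying the remaining fields over as display columns. A sketch under those assumptions; migrate_nested_dict and its parameter names are hypothetical.

```python
from typing import Dict, List


def migrate_nested_dict(
    data: Dict[str, dict],
    key_col: str,
    score_col: str,
    old_score_field: str = "score",
) -> List[dict]:
    """Convert {"64.1": {"score": 80, ...}} into rows with key and score columns."""
    rows: List[dict] = []
    for key, fields in data.items():
        # Everything except the old score field is kept as a display column.
        extra = {name: value for name, value in fields.items() if name != old_score_field}
        rows.append({key_col: key, score_col: fields[old_score_field], **extra})
    return rows


old = {
    "64.1": {"risk_tier": "high", "score": 80},
    "66.1": {"risk_tier": "medium", "score": 60},
}
for row in migrate_nested_dict(old, key_col="industry_code", score_col="risk_score"):
    print(row)
```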

8.3 product_risk_taxonomy → scored_table

Same pattern as industry risk — flatten from nested dict to array of rows.

8.4 Migration Script

A database migration script converts the data payload in-place for active and draft datasets. The data_shape column is set on the type, and column_definitions are populated. The dedicated editor components are updated to read the new row-based format (this is the main code change for existing editors).


9. "Referenced By" Column

As discussed in the risk categories / matrix integration design, each dataset row in the Risk Categories overview shows which published matrices reference it:

| Dataset | Type | Status | Entries | Referenced By |
|-------------------------------|---------------|--------|---------|---------------------|
| EU High-Risk Third Countries | Country Risk | Active | 27 | EBA Standard v3 |
| FATF Grey List | Country Risk | Active | 22 | -- |
| Industry Risk Classification | Industry Risk | Active | 45 | EBA Standard v3 |
| Channel Risk Ratings | Channel Risk | Active | 8 | Custom Matrix v1 |

Implementation: query all published matrix definitions, scan scoring_config.reference_dataset fields, and build a reverse index from list_key → matrix names. This is a read-time join, not a stored relationship — keeps the data model clean. Cached per page load.
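A sketch of the reverse index, assuming each published matrix definition is a dict with a name and a list of factors that each carry scoring_config. That layout and the function name build_referenced_by are assumptions; the actual matrix schema is defined elsewhere.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_referenced_by(published_matrices: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each referenced dataset key to the names of matrices that use it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for matrix in published_matrices:
        for factor in matrix.get("factors", []):
            key = factor.get("scoring_config", {}).get("reference_dataset")
            # Skip factors without a reference and avoid listing a matrix twice.
            if key and matrix["name"] not in index[key]:
                index[key].append(matrix["name"])
    return dict(index)


matrices = [
    {
        "name": "EBA Standard v3",
        "factors": [
            {"scoring_config": {"reference_dataset": "eu_high_risk_third_countries"}},
            {"scoring_config": {"reference_dataset": "industry_risk_classification"}},
            {"scoring_config": {}},
        ],
    },
    {
        "name": "Custom Matrix v1",
        "factors": [{"scoring_config": {"reference_dataset": "channel_risk_ratings"}}],
    },
]
print(build_referenced_by(matrices))
```

Datasets absent from the index (such as the FATF Grey List row above) render as "--" in the overview.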


10. Dataset Status in Factor Editor

Inside the REFERENCE_LOOKUP config panel, below the dataset dropdown, show a status line:

| State | Display |
|-------|---------|
| Dataset active, healthy | ✅ Active: 27 entries, last updated 3 days ago |
| Dataset active, stale | ⚠️ Active: 27 entries, last updated 8 months ago |
| Dataset not found | ❌ Dataset not found: create in Risk Categories (with link) |
| Dataset in draft only | ⚠️ Draft only: activate before publishing matrix (with link) |

The staleness threshold is configurable (default: 6 months). This catches misconfiguration early — before the compliance officer tries to publish.
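The four states can be derived from the dataset record and its last-updated timestamp. A sketch under assumptions: dataset_status, the status and updated_at field names, and approximating the six-month default as 183 days are all illustrative choices, not part of this ADR.

```python
from datetime import datetime, timedelta
from typing import Optional


def dataset_status(dataset: Optional[dict], now: datetime, stale_after_days: int = 183) -> str:
    """Classify a dataset as not_found, draft_only, stale, or healthy."""
    if dataset is None:
        return "not_found"
    if dataset["status"] == "draft":
        return "draft_only"
    age = now - dataset["updated_at"]
    # 183 days approximates the default six-month threshold (assumption).
    return "stale" if age > timedelta(days=stale_after_days) else "healthy"


now = datetime(2026, 4, 6)
recent = {"status": "active", "updated_at": now - timedelta(days=3)}
old = {"status": "active", "updated_at": now - timedelta(days=240)}
print(dataset_status(recent, now), dataset_status(old, now))
```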


11. Scope & Boundaries

In Scope

  • reference_data_types table extension (data_shape, column_definitions, icon, display_order, created_by, is_system) and reference_datasets.entry_count
  • New API endpoint: POST /api/reference-data/types
  • "Create Category Type" dialog in Risk Categories page
  • Generic list editor and generic table editor components
  • Data shape validation on dataset create/update
  • Auto-population of lookup_key_column / score_column in ReferenceLookupPanel
  • Migration of pep_classification, industry_risk_classification, product_risk_taxonomy to scored_table format
  • "Referenced By" column on Risk Categories overview
  • Dataset status line in factor editor

Out of Scope

  • Editing or deleting system types (the 6 built-in types are immutable)
  • Custom validation rules beyond data shape (e.g., "score must be between 0 and 100")
  • Hierarchical / nested scoring datasets (e.g., country → region → city)
  • Sharing category types across tenants (each tenant defines their own)
  • Type versioning (types are simple metadata; datasets are versioned, types are not)

12. Open Questions

  1. Should system types be migrated to scored_table immediately, or behind a feature flag? The migration changes the data payload format for pep_classification, industry_risk_classification, and product_risk_taxonomy, which means their dedicated editors need to be updated to read the new format. A feature flag would allow gradual rollout but adds temporary code complexity.

  2. Should the generic table editor support computed columns? For example, a "Risk Tier" column that auto-fills based on the score value (0-30 = Low, 31-70 = Medium, 71-100 = High). This would be a convenience feature for compliance officers but adds complexity to the editor.

  3. Maximum dataset size. Country lists are small (~250 rows). Industry classifications can be larger (thousands of NACE codes). Should the generic table editor support pagination, or is client-side rendering sufficient up to a practical limit (e.g., 10,000 rows)?