ADR-018: Evaluation Rework — Live Data Preview & Portfolio Scoring

Status: Draft
Date: 2026-04-02
Author: Atlas Architecture
Supersedes (partially): ADR-010 §4 (Schema Lifecycle), §12 (Evaluations)
Depends on: ADR-008 (EBA Risk Matrix Engine), ADR-010 (Ontology Lifecycle), ADR-016 (Port-and-Wire Interaction Model)
Impacts: Schema Mapping Designer, Evaluations page, Risk Matrix Editor, Compliance Studio sidebar

1. Context & Problem Statement
2. Decision Summary
3. Simplified Schema Lifecycle
4. Live Data Preview in Schema Designer
5. Evaluations Page Rework
6. Matrix Evaluation
7. Matrix Comparison
8. Pre-Publish Workflow
9. Sidebar Navigation Changes
10. Data Model Changes
11. Migration Path
12. Rejected Alternatives
13. Open Questions

1. Context & Problem Statement

ADR-010 introduced a four-state schema lifecycle (draft → evaluated → published → archived) and a schema evaluation framework that validated coverage, conflict rates, and scoring completeness before a schema could be published. While the design was architecturally sound, its implementation revealed several usability and conceptual problems:

1. The "evaluated" status creates unnecessary friction. Once a schema passes evaluation, it becomes semi-locked ("minor edits" only). Any significant change requires re-running evaluation, creating a gatekeeping loop that slows down iterative schema design. Compliance officers reported that the lock-then-fork pattern felt punitive during the exploratory phase of schema design.

2. Schema evaluation conflates design-time validation with runtime scoring. The Evaluations page currently has two tabs: "Schema Runs" (which evaluates a schema against test entities) and "Scoring Runs" (placeholder for portfolio scoring). These are fundamentally different operations with different audiences and different triggers. Combining them under one menu item created confusion about what "evaluation" means.

3. Schema evaluation is disconnected from the design experience. Running a schema evaluation requires navigating away from the Schema Designer to the Evaluations page, selecting the schema version, configuring a test suite, and running the evaluation. The feedback loop is too long. Designers need to see data flowing through their mappings while they are designing, not after.

4. Portfolio scoring was never implemented. The Scoring Runs tab was always a placeholder ("coming soon"). Meanwhile, the most critical evaluation question — "what happens to my portfolio's risk scores if I publish this matrix?" — has no answer in the product.

5. Risk matrix changes have no impact assessment. A compliance officer can publish a new risk matrix version with no visibility into how it changes scores across the portfolio. This is a significant compliance risk — a misconfigured matrix could reclassify hundreds of entities overnight.

2. Decision Summary

1. Simplify the schema lifecycle from four states to three: draft → published → archived. Remove the evaluated state. Schemas remain fully editable while in draft.
2. Replace schema evaluation with live data preview in the Schema Designer. A toggle lets compliance officers see real entity data flowing through the wire connections in real time, providing immediate visual feedback during design.
3. Rework the Evaluations page into a dedicated Matrix Evaluation tool. Remove the Schema Runs tab. The page focuses exclusively on running risk matrices against the entity portfolio.
4. Implement portfolio scoring as the core capability of the reworked Evaluations page. Users select a matrix (draft or published), run it against all entities, and see the resulting risk score distribution.
5. Add matrix comparison to the evaluation page. Users can select two matrix versions and see score deltas side by side, enabling impact assessment before publishing changes.
6. Integrate matrix evaluation into the pre-publish workflow. The risk matrix editor links to the evaluation page with the draft pre-selected, encouraging (but not requiring) impact assessment before publishing.

3. Simplified Schema Lifecycle

Before (ADR-010 §4)

┌──────────┐     ┌───────────┐     ┌───────────┐     ┌──────────┐
│  DRAFT   │────▶│ EVALUATED │────▶│ PUBLISHED │────▶│ ARCHIVED │
│          │     │           │     │           │     │          │
│ Mutable  │     │ Validated │     │ Immutable │     │ Frozen   │
│ No data  │     │ Test data │     │ Live data │     │ Replaced │
│ flow     │     │ preview   │     │ flow      │     │ by newer │
└──────────┘     └───────────┘     └───────────┘     └──────────┘

After (ADR-018)

┌──────────┐                       ┌───────────┐     ┌──────────┐
│  DRAFT   │──────────────────────▶│ PUBLISHED │────▶│ ARCHIVED │
│          │                       │           │     │          │
│ Mutable  │                       │ Immutable │     │ Frozen   │
│ Live     │                       │ Live data │     │ Replaced │
│ preview  │                       │ flow      │     │ by newer │
└──────────┘                       └───────────┘     └──────────┘
     ▲                                    │
     │                                    │
     └────── fork (new version) ──────────┘

State Definitions

| State | Mutable | Data Flow | Scoring | Description |
|---|---|---|---|---|
| `draft` | Yes | Live preview (toggle) | No | Work in progress. Fully editable. Compliance officers can toggle live data preview to validate their design at any time. Not used for production scoring. |
| `published` | No | Live | Yes | Active production schema. At most one published version per `schema_line_id` per tenant. Immutable once published. |
| `archived` | No | None | Historical only | Superseded by a newer published version. Retained for audit trail. |
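
The three-state lifecycle can be read as a small state machine. The following is a minimal sketch, not the product's implementation: the names `SchemaStatus`, `can_transition`, `is_read_only`, and `fork`, and the dict shape of a version, are assumptions made for illustration. It also assumes that only published versions are forked, as the diagram shows.

```python
from enum import Enum


class SchemaStatus(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# Allowed status changes for a single schema version (per the diagram).
ALLOWED_TRANSITIONS = {
    SchemaStatus.DRAFT: {SchemaStatus.PUBLISHED},
    SchemaStatus.PUBLISHED: {SchemaStatus.ARCHIVED},
    SchemaStatus.ARCHIVED: set(),
}


def can_transition(current: SchemaStatus, target: SchemaStatus) -> bool:
    """True if a version may move directly from `current` to `target`."""
    return target in ALLOWED_TRANSITIONS[current]


def is_read_only(status: SchemaStatus) -> bool:
    """Only draft versions are editable; published and archived are not."""
    return status is not SchemaStatus.DRAFT


def fork(version: dict) -> dict:
    """Create a new draft from a published version; the original is untouched."""
    if SchemaStatus(version["status"]) is not SchemaStatus.PUBLISHED:
        raise ValueError("only published versions can be forked in this sketch")
    return {
        **version,
        "version": version["version"] + 1,  # hypothetical version counter
        "status": SchemaStatus.DRAFT.value,
    }
```

Forking returns a new record instead of mutating the published one, which matches the table's statement that published versions are immutable.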

What Changes

The evaluated status is removed from the ontology_schema_versions.status enum.
The read-only callout ("This schema is evaluated. Fork to make changes.") is removed from the Schema Designer. Only published and archived versions show the read-only callout.
The "Run Evaluation" button is removed from the Schema Designer toolbar.
The isDraft guard in the Schema Designer blocks editing only for published and archived versions.
Any existing schema versions with status = 'evaluated' are migrated to draft (see §11).

4. Live Data Preview in Schema Designer

Concept

The Schema Designer (port-and-wire view from ADR-016) gains a toggle that activates a live data preview mode. When enabled, the compliance officer selects an entity from the portfolio and sees the actual data values flowing through the wire connections — from source fields, through the ontology layer, into risk matrix factors.

UX

A toggle switch in the Schema Designer toolbar: "Live Preview" (off by default).
When toggled on, a compact entity selector appears (searchable dropdown).
After selecting an entity, source field ports display the actual values from that entity's data, and the values visually propagate through the wires into ontology fields and matrix factors.
Data values appear as inline annotations on or near the ports and wire connections.
If a source has no data for a field, the port shows a "no data" indicator.
If multiple sources disagree on a field, the disagreement is visually highlighted (consistent with ADR-017 disagreement detection).
The preview updates automatically if the officer changes mappings while preview is active.
Toggling off returns the designer to its normal schematic view.
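
The annotation states described above ("no data", disagreement, normal value) could be derived per ontology field roughly as follows. This is a sketch under assumptions: the tuple shape of a wire, the `provenance` lookup keyed by `(source, source_field)`, and the function name `preview_fields` are all hypothetical, and disagreement is reduced to "more than one distinct value" rather than the ADR-017 detection rules.

```python
from collections import defaultdict

NO_DATA = "no data"


def preview_fields(mappings, provenance):
    """Resolve preview annotations for one selected entity.

    mappings:   list of (source, source_field, ontology_field) wires
    provenance: dict {(source, source_field): value} for that entity
    """
    by_field = defaultdict(dict)
    for source, source_field, ontology_field in mappings:
        # Accessing the key registers the ontology field even when no
        # source has data for it, so it can be shown as "no data".
        contributions = by_field[ontology_field]
        value = provenance.get((source, source_field))
        if value is not None:
            contributions[f"{source}.{source_field}"] = value

    result = {}
    for ontology_field, contributions in by_field.items():
        distinct = set(contributions.values())
        if not contributions:
            status = NO_DATA
        elif len(distinct) > 1:
            status = "disagreement"
        else:
            status = "ok"
        result[ontology_field] = {"values": contributions, "status": status}
    return result


# Example with made-up sources and fields:
wires = [
    ("registry", "legal_name", "name"),
    ("vendor", "company_name", "name"),
    ("registry", "country", "country"),
    ("vendor", "lei", "lei"),
]
data = {
    ("registry", "legal_name"): "Acme Ltd",
    ("vendor", "company_name"): "ACME Limited",
    ("registry", "country"): "DE",
}
annotations = preview_fields(wires, data)
```

Because the function takes the current wires as input, calling it again after a mapping change is enough to refresh the preview.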

Purpose

This replaces the schema evaluation workflow from ADR-010 §12. Instead of navigating away to run a formal evaluation, the officer gets immediate, contextual feedback. The key questions answered by live preview:

"Does my mapping produce data for this ontology field?"
"Which sources contribute to this field for a real entity?"
"Do sources agree or disagree on this value?"
"Does data flow all the way through to the risk matrix factors?"

Technical Notes

Live preview operates on the current draft schema version. It does not require saving or publishing.
Data is fetched on demand from the existing entity data store (field provenance records).
The preview is read-only — it visualizes existing data, it does not trigger new data collection.
Performance target: preview should load within 2 seconds for a single entity.

5. Evaluations Page Rework

Before

The Evaluations page (/studio/evaluations) had two tabs:

Schema Runs — ran a schema against test entities to validate coverage, conflict rates, and scoring completeness. Controlled the schema lifecycle gate (draft → evaluated).
Scoring Runs — placeholder tab showing "Coming soon. Portfolio-level scoring runs will be available in a future release."

After

The Evaluations page becomes a Matrix Evaluation tool. The sidebar item label stays as "Evaluations" and the route stays as /studio/evaluations.

The page has a single, focused workflow:

1. Select a risk matrix (any version, draft or published)
2. Run the matrix against the entity portfolio
3. View results: per-entity risk scores, score distribution, factor breakdowns
4. Optionally compare with a second matrix version (see §7)

The Schema Runs tab is removed entirely. Its functionality is replaced by the live data preview in the Schema Designer (§4).

6. Matrix Evaluation

Purpose

Matrix evaluation answers the question: "If this matrix were applied to our entity portfolio right now, what would the risk scores look like?" This is the critical compliance question that was missing from the product.

Workflow

1. Officer navigates to Evaluations page
2. Selects a risk matrix version (draft or published) from a dropdown
3. Optionally filters the portfolio (by entity type, risk tier, date range, etc.)
4. Clicks "Run Evaluation"
5. System scores all entities in the portfolio using the selected matrix
6. Results display:
   a. Score distribution histogram (how many entities in each risk tier)
   b. Sortable entity table with per-entity scores and factor breakdowns
   c. Summary statistics (mean, median, std dev, count by tier)
   d. Outlier highlighting (entities with extreme scores)
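
The distribution and summary statistics in step 6 can be computed from the per-entity scores in a few lines. The sketch below is illustrative only: the tier boundaries in `TIER_THRESHOLDS` are invented, the use of the population standard deviation is an assumption, and the output keys simply mirror the `score_distribution` and `tier_summary` examples in §10. Outlier highlighting is not covered.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical tier boundaries (lower bound inclusive), highest tier first.
TIER_THRESHOLDS = [("critical", 4.0), ("high", 3.0), ("medium", 2.0), ("low", 0.0)]


def tier_for(score):
    """Map a numeric score to the first tier whose lower bound it meets."""
    for tier, lower in TIER_THRESHOLDS:
        if score >= lower:
            return tier
    raise ValueError(f"score below lowest tier bound: {score}")


def summarize(scores):
    """scores: non-empty dict {entity_id: risk_score}."""
    values = list(scores.values())
    counts = Counter(tier_for(s) for s in values)
    return {
        # Every tier is listed, including tiers with zero entities.
        "score_distribution": {tier: counts.get(tier, 0) for tier, _ in TIER_THRESHOLDS},
        "tier_summary": {
            "mean": round(mean(values), 2),
            "median": round(median(values), 2),
            "std_dev": round(pstdev(values), 2),
            "total": len(values),
        },
    }


summary = summarize({"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0})
```

An empty portfolio (for example, after an overly narrow filter) would need separate handling, since `mean` raises on an empty list.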

Entry Points

| Entry Point | Context | Behavior |
|---|---|---|
| Sidebar → Evaluations | General use | Page opens with no matrix pre-selected |
| Risk Matrix Editor → "Evaluate Portfolio" button | Draft matrix | Page opens with the draft matrix pre-selected |
| Risk Matrix Detail → "Evaluate Portfolio" action | Published matrix | Page opens with the published matrix pre-selected |
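
One way to carry the pre-selection between pages is a query parameter on the existing route. This ADR does not specify the mechanism, so the parameter names `matrix` and `compare` and the helper `evaluation_url` below are purely hypothetical.

```python
from urllib.parse import urlencode

EVALUATIONS_ROUTE = "/studio/evaluations"


def evaluation_url(matrix_version_id=None, compare_with=None):
    """Build a link to the Evaluations page, optionally pre-selecting matrices."""
    params = {}
    if matrix_version_id:
        params["matrix"] = matrix_version_id    # hypothetical parameter name
    if compare_with:
        params["compare"] = compare_with        # hypothetical parameter name
    return EVALUATIONS_ROUTE + ("?" + urlencode(params) if params else "")
```

The sidebar entry point would call `evaluation_url()` with no arguments, while the editor and detail pages would pass the version they are showing.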

Technical Notes

Scoring uses the existing runtime scoring flow (ADR-010 §18) — read accepted entity graph, apply matrix factors, compute scores.
Results are persisted in evaluation_runs for audit trail (reusing the existing table with adjusted semantics — see §10).
For large portfolios, evaluation runs asynchronously via Temporal. A progress indicator shows completion percentage.
Results are cacheable — re-running the same matrix version against an unchanged portfolio returns cached results.
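
For the caching note above, "unchanged portfolio" has to be made concrete before a cache lookup is possible. The sketch below assumes each entity exposes some monotonic `data_version` (a hypothetical field) and derives a deterministic key from the matrix version, the portfolio fingerprint, and any filters. It does not settle the invalidation policy left open in §13.

```python
import hashlib
import json


def portfolio_fingerprint(entities):
    """entities: iterable of (entity_id, data_version) pairs.

    Sorting makes the fingerprint independent of iteration order.
    """
    digest = hashlib.sha256()
    for entity_id, data_version in sorted(entities):
        digest.update(f"{entity_id}:{data_version}\n".encode())
    return digest.hexdigest()


def cache_key(matrix_version_id, entities, filters=None):
    """Same matrix version + same portfolio state + same filters -> same key."""
    payload = {
        "matrix_version_id": matrix_version_id,
        "portfolio": portfolio_fingerprint(entities),
        "filters": filters or {},
    }
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
```

If draft matrix versions can be edited in place, the matrix version ID alone would not identify the matrix content, and the key would also need a revision counter or content hash for the draft.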

7. Matrix Comparison

Purpose

When editing a risk matrix, compliance officers need to understand the impact of their changes before publishing. Matrix comparison runs the portfolio through two matrix versions and shows the deltas.

Workflow

1. On the Evaluations page, officer selects a primary matrix version
2. Clicks "Compare With..." and selects a second matrix version
3. System runs the portfolio through both matrices
4. Results display side by side:
   a. Score delta per entity (new score − old score)
   b. Tier migration summary (e.g., "12 entities moved from Medium to High")
   c. Factor-level breakdown showing which factors contributed to score changes
   d. Entities sorted by absolute delta (biggest movers first)
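
The delta, tier-migration, and sorting steps above reduce to a single pass over the two result sets. A minimal sketch, assuming each run is available as a dict of `entity_id -> (score, tier)` and that entities missing from either run are skipped; the function name `compare_runs` is hypothetical.

```python
from collections import Counter


def compare_runs(old, new):
    """old/new: dict {entity_id: (score, tier)} from the two matrix versions."""
    rows = []
    migrations = Counter()
    # Only entities scored by both matrices can have a delta.
    for entity_id in old.keys() & new.keys():
        old_score, old_tier = old[entity_id]
        new_score, new_tier = new[entity_id]
        rows.append({
            "entity_id": entity_id,
            "old_score": old_score,
            "new_score": new_score,
            "delta": round(new_score - old_score, 2),  # new score minus old score
            "tier_changed": old_tier != new_tier,
        })
        if old_tier != new_tier:
            migrations[(old_tier, new_tier)] += 1
    # Biggest movers first; entity_id breaks ties for a stable order.
    rows.sort(key=lambda r: (-abs(r["delta"]), r["entity_id"]))
    return rows, dict(migrations)


old_run = {"e1": (2.8, "medium"), "e2": (3.1, "high"), "e3": (1.0, "low")}
new_run = {"e1": (3.5, "high"), "e2": (3.0, "high"), "e3": (1.0, "low")}
rows, migrations = compare_runs(old_run, new_run)
```

The `migrations` counter maps `(from_tier, to_tier)` pairs to counts, which is enough to render summaries such as "12 entities moved from Medium to High".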

Use Cases

Pre-publish impact assessment: Compare draft v3 with published v2 to see what changes.
Regulatory scenario modeling: Compare two draft configurations to evaluate different risk appetite approaches.
Audit: Compare the current published matrix with an archived version to document how scoring evolved.

8. Pre-Publish Workflow

Risk Matrix Pre-Publish

The risk matrix editor gains an "Evaluate Before Publishing" link that navigates to the Evaluations page with the draft matrix pre-selected. This encourages (but does not enforce) impact assessment before publishing.

┌──────────────────────┐     ┌──────────────────────────────┐
│ Risk Matrix Editor   │     │ Evaluations (Matrix Eval)    │
│                      │     │                              │
│ Editing draft v3     │     │ Matrix: draft v3 (pre-sel)   │
│                      │     │                              │
│ [Evaluate Portfolio] ─────▶│ [Run Evaluation]             │
│                      │     │ [Compare With v2 published]  │
│ [Publish]            │     │                              │
│                      │     │ Results:                     │
│                      │     │ - Score distribution          │
│                      │     │ - Tier migration from v2     │
│                      │     │ - Entity-level deltas        │
└──────────────────────┘     └──────────────────────────────┘

Portfolio evaluation is a recommended practice, not a hard gate. The Publish button remains independently available. Compliance teams can establish their own governance policy about whether evaluation is required before publishing.

Schema Pre-Publish

Schema publishing does not require formal evaluation. The live data preview (§4) provides continuous validation during design. The compile-time validation from ADR-010 §18 (verifying that all required matrix factors are satisfiable) remains as an automated check at publish time.
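
As an illustration of what such a publish-time check might look like, the sketch below treats a factor as satisfiable when every ontology field it reads has at least one wire. That definition is an assumption made here for the example; the authoritative rule is the one in ADR-010 §18. The function names and input shapes are hypothetical.

```python
def unsatisfiable_factors(required_factors, mapped_ontology_fields):
    """Return required factors whose input fields are not all mapped.

    required_factors:       dict {factor_name: set of ontology fields it reads}
    mapped_ontology_fields: set of ontology fields with at least one wire
    """
    mapped = set(mapped_ontology_fields)
    missing = {}
    for factor, inputs in required_factors.items():
        gap = set(inputs) - mapped
        if gap:
            missing[factor] = sorted(gap)
    return missing


def can_publish(required_factors, mapped_ontology_fields):
    """The automated check passes only when no required factor has a gap."""
    return not unsatisfiable_factors(required_factors, mapped_ontology_fields)
```

Returning the missing fields per factor, not just a boolean, lets the publish dialog tell the officer exactly which mappings to add.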

9. Sidebar Navigation Changes

The Compliance Studio sidebar remains unchanged from the current structure:

Compliance Studio
├── Data Sources          /studio/sources
├── Schema Designer       /studio/mappings
├── Designer Mockup       /studio/mappings-mockup
├── Risk Matrices         /studio/matrices
├── Risk Categories       /studio/risk-categories
├── Evaluations           /studio/evaluations
└── Workflows             /studio/workflows

The Evaluations menu item stays in place. Only its content changes (from Schema Runs + Scoring Runs tabs to the Matrix Evaluation interface).

10. Data Model Changes

Schema Version Status Enum

-- Before (ADR-010)
-- status: 'draft', 'evaluated', 'published', 'archived'

-- After (ADR-018)
-- status: 'draft', 'published', 'archived'
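ALTER TABLE ontology_schema_versions
  DROP CONSTRAINT IF EXISTS ontology_schema_versions_status_check;

-- Move remaining 'evaluated' rows to 'draft' first (see §11, Phase 1);
-- otherwise adding the new constraint below fails on those rows.
UPDATE ontology_schema_versions
  SET status = 'draft'
  WHERE status = 'evaluated';

ALTER TABLE ontology_schema_versions
  ADD CONSTRAINT ontology_schema_versions_status_check
  CHECK (status IN ('draft', 'published', 'archived'));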

Evaluation Runs Table

The existing evaluation_runs table is repurposed for matrix evaluations:

-- Add matrix evaluation fields
ALTER TABLE evaluation_runs
  ADD COLUMN matrix_version_id UUID REFERENCES risk_matrix_versions(id),
  ADD COLUMN comparison_matrix_version_id UUID REFERENCES risk_matrix_versions(id),
  ADD COLUMN evaluation_type TEXT NOT NULL DEFAULT 'matrix_scoring',
  ADD COLUMN entity_count INTEGER,
  ADD COLUMN score_distribution JSONB,
  ADD COLUMN tier_summary JSONB,
  ADD COLUMN comparison_deltas JSONB;

-- evaluation_type: 'matrix_scoring', 'matrix_comparison', or 'legacy_schema' (backfilled for pre-ADR-018 schema runs, see §11 Phase 2)
-- score_distribution: { "critical": 12, "high": 45, "medium": 120, "low": 89 }
-- tier_summary: { "mean": 3.2, "median": 3.0, "std_dev": 1.1, "total": 266 }
-- comparison_deltas: [{ "entity_id": "...", "old_score": 2.8, "new_score": 3.5, "delta": 0.7, ... }]

Per-Entity Evaluation Results

CREATE TABLE evaluation_entity_results (
    id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    evaluation_run_id   UUID NOT NULL REFERENCES evaluation_runs(id),
    entity_id           UUID NOT NULL,
    entity_name         TEXT,
    risk_score          NUMERIC(5,2) NOT NULL,
    risk_tier           TEXT NOT NULL,
    factor_scores       JSONB NOT NULL,
    -- For comparison runs
    comparison_score    NUMERIC(5,2),
    comparison_tier     TEXT,
    score_delta         NUMERIC(5,2),
    tier_changed        BOOLEAN DEFAULT FALSE,

    UNIQUE (evaluation_run_id, entity_id)
);

CREATE INDEX idx_eval_entity_results_run
  ON evaluation_entity_results(evaluation_run_id);
CREATE INDEX idx_eval_entity_results_delta
  ON evaluation_entity_results(evaluation_run_id, score_delta DESC NULLS LAST);

11. Migration Path

Phase 1: Schema Lifecycle Simplification

Migrate any ontology_schema_versions rows with status = 'evaluated' to status = 'draft'.
Update the status enum constraint to remove evaluated.
Remove the read-only callout for evaluated status from the Schema Designer.
Remove the evaluated case from statusIntent() helper.
Remove the "Run Evaluation" button from the Schema Designer toolbar.

Phase 2: Schema Runs Removal

Remove the Schema Runs tab from the Evaluations page.
Archive (do not delete) existing schema evaluation run records for audit trail.
Add evaluation_type column to evaluation_runs and backfill existing rows as 'legacy_schema'.

Phase 3: Live Data Preview

Add the preview toggle and entity selector to the Schema Designer toolbar.
Implement data fetching from field provenance for the selected entity.
Render data values as annotations on port-and-wire connections.
Handle edge cases: missing data, multi-source disagreements, unmapped fields.

Phase 4: Matrix Evaluation

Build the Matrix Evaluation UI on the Evaluations page.
Implement the portfolio scoring backend (Temporal workflow for batch scoring).
Add evaluation entry points from Risk Matrix Editor and Detail pages.
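
The batch scoring in this phase, together with the progress indicator from §6, can be prototyped without any workflow engine. The sketch below deliberately leaves Temporal out: `score_batch` stands in for whatever activity scores a list of entities, and `batch_size=100` is an arbitrary placeholder, not a recommendation.

```python
def batched(entity_ids, batch_size):
    """Yield consecutive slices of at most `batch_size` entity IDs."""
    for start in range(0, len(entity_ids), batch_size):
        yield entity_ids[start:start + batch_size]


def run_evaluation(entity_ids, score_batch, batch_size=100, on_progress=None):
    """Score a portfolio batch by batch.

    score_batch: callable taking a list of entity IDs and returning
                 {entity_id: score} (stand-in for the real scoring call)
    on_progress: optional callable receiving the completion percentage
    """
    results = {}
    total = len(entity_ids)
    for batch in batched(entity_ids, batch_size):
        results.update(score_batch(batch))
        if on_progress:
            on_progress(round(100 * len(results) / total))
    return results


# Example: 250 entities, a dummy scorer, progress collected in a list.
progress = []
ids = [f"e{i}" for i in range(250)]
scores = run_evaluation(ids, lambda batch: {e: 1.0 for e in batch}, 100, progress.append)
```

Processing batches sequentially keeps the request rate bounded by the batch size; any parallelism would be layered on top once the limits from §13 are settled.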

Phase 5: Matrix Comparison

Add the "Compare With..." selector to the Evaluations page.
Implement dual-matrix scoring and delta computation.
Build the comparison results view (tier migration, entity deltas, factor breakdowns).

12. Rejected Alternatives

Keep the evaluated status but make it optional

Considered making the evaluated gate optional (a tenant-level setting). Rejected because the gate existed to ensure quality, and live data preview now provides that assurance with faster feedback. An optional gate adds complexity without clear benefit.

Embed portfolio scoring in the Risk Matrix Editor

Considered building the scoring interface directly into the matrix editor as a tab. Rejected because the evaluation results are substantial (full portfolio breakdown, comparison views) and would clutter the editor. A standalone page also allows evaluation of published matrices, not just drafts being edited.

Hard-gate publish behind portfolio evaluation

Considered requiring a passing portfolio evaluation before a matrix can be published. Rejected because different teams have different governance requirements. Some may want a hard gate, others may not. The product encourages evaluation through UX affordances (the "Evaluate Before Publishing" link) rather than enforcing it. Teams can add their own governance policies on top.

Keep Schema Runs alongside Matrix Evaluation

Considered keeping both Schema Runs and Matrix Evaluation as separate tabs. Rejected because schema evaluation as a standalone operation no longer has a clear use case — live data preview covers the design-time validation, and matrix evaluation covers the scoring validation. Maintaining two evaluation concepts creates confusion.

13. Open Questions

Should matrix evaluation results be shareable? Could be useful for compliance teams to share evaluation reports with auditors or management. Consider adding export (PDF/CSV) capability.
Portfolio filters for evaluation. The exact filtering options (entity type, risk tier, date range, custom segments) need to be defined during implementation.
Caching and invalidation. When should cached evaluation results be invalidated? Options: on any entity data change, on a time-based TTL, or never (always re-run for fresh results).
Live preview performance at scale. For schemas with many sources and complex mappings, live preview may need to lazy-load data per column rather than fetching everything at once.
Temporal workflow configuration. Portfolio scoring for large entity sets will run via Temporal. Need to determine batch sizes, parallelism, and timeout configuration to avoid the 429 issues addressed in the recent worker fixes.
