ADR-018: Evaluation Rework — Live Data Preview & Portfolio Scoring
Status: Draft
Date: 2026-04-02
Author: Atlas Architecture
Supersedes (partially): ADR-010 §4 (Schema Lifecycle), §12 (Evaluations)
Depends on: ADR-008 (EBA Risk Matrix Engine), ADR-010 (Ontology Lifecycle), ADR-016 (Port-and-Wire Interaction Model)
Impacts: Schema Mapping Designer, Evaluations page, Risk Matrix Editor, Compliance Studio sidebar
Table of Contents
- Context & Problem Statement
- Decision Summary
- Simplified Schema Lifecycle
- Live Data Preview in Schema Designer
- Evaluations Page Rework
- Matrix Evaluation
- Matrix Comparison
- Pre-Publish Workflow
- Sidebar Navigation Changes
- Data Model Changes
- Migration Path
- Rejected Alternatives
- Open Questions
1. Context & Problem Statement
ADR-010 introduced a four-state schema lifecycle (draft → evaluated → published → archived) and a schema evaluation framework that validated coverage, conflict rates, and scoring completeness before a schema could be published. While architecturally sound, the implementation revealed several usability and conceptual problems:
1. The "evaluated" status creates unnecessary friction. Once a schema passes evaluation, it becomes semi-locked ("minor edits" only). Any significant change requires re-running evaluation, creating a gatekeeping loop that slows down iterative schema design. Compliance officers reported that the lock-then-fork pattern felt punitive during the exploratory phase of schema design.
2. Schema evaluation conflates design-time validation with runtime scoring. The Evaluations page currently has two tabs: "Schema Runs" (which evaluates a schema against test entities) and "Scoring Runs" (placeholder for portfolio scoring). These are fundamentally different operations with different audiences and different triggers. Combining them under one menu item created confusion about what "evaluation" means.
3. Schema evaluation is disconnected from the design experience. Running a schema evaluation requires navigating away from the Schema Designer to the Evaluations page, selecting the schema version, configuring a test suite, and running the evaluation. The feedback loop is too long. Designers need to see data flowing through their mappings while they are designing, not after.
4. Portfolio scoring was never implemented. The Scoring Runs tab was always a placeholder ("coming soon"). Meanwhile, the most critical evaluation question — "what happens to my portfolio's risk scores if I publish this matrix?" — has no answer in the product.
5. Risk matrix changes have no impact assessment. A compliance officer can publish a new risk matrix version with no visibility into how it changes scores across the portfolio. This is a significant compliance risk — a misconfigured matrix could reclassify hundreds of entities overnight.
2. Decision Summary
- Simplify the schema lifecycle from four states to three: draft → published → archived. Remove the evaluated state. Schemas remain fully editable while in draft.
- Replace schema evaluation with live data preview in the Schema Designer. A toggle lets compliance officers see real entity data flowing through the wire connections in real time, providing immediate visual feedback during design.
- Rework the Evaluations page into a dedicated Matrix Evaluation tool. Remove the Schema Runs tab. The page focuses exclusively on running risk matrices against the entity portfolio.
- Implement portfolio scoring as the core capability of the reworked Evaluations page. Users select a matrix (draft or published), run it against all entities, and see the resulting risk score distribution.
- Add matrix comparison to the evaluation page. Users can select two matrix versions and see score deltas side by side, enabling impact assessment before publishing changes.
- Integrate matrix evaluation into the pre-publish workflow. The risk matrix editor links to the evaluation page with the draft pre-selected, encouraging (but not requiring) impact assessment before publishing.
3. Simplified Schema Lifecycle
Before (ADR-010 §4)
┌──────────┐ ┌───────────┐ ┌───────────┐ ┌──────────┐
│ DRAFT │────▶│ EVALUATED │────▶│ PUBLISHED │────▶│ ARCHIVED │
│ │ │ │ │ │ │ │
│ Mutable │ │ Validated │ │ Immutable │ │ Frozen │
│ No data │ │ Test data │ │ Live data │ │ Replaced │
│ flow │ │ preview │ │ flow │ │ by newer │
└──────────┘ └───────────┘ └───────────┘ └──────────┘
After (ADR-018)
┌──────────┐ ┌───────────┐ ┌──────────┐
│ DRAFT │──────────────────────▶│ PUBLISHED │────▶│ ARCHIVED │
│ │ │ │ │ │
│ Mutable │ │ Immutable │ │ Frozen │
│ Live │ │ Live data │ │ Replaced │
│ preview │ │ flow │ │ by newer │
└──────────┘ └───────────┘ └──────────┘
▲ │
│ │
└────── fork (new version) ──────────┘
State Definitions
| State | Mutable | Data Flow | Scoring | Description |
|---|---|---|---|---|
| draft | Yes | Live preview (toggle) | No | Work in progress. Fully editable. Compliance officers can toggle live data preview to validate their design at any time. Not used for production scoring. |
| published | No | Live | Yes | Active production schema. At most one published version per schema_line_id per tenant. Immutable once published. |
| archived | No | None | Historical only | Superseded by a newer published version. Retained for audit trail. |
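The three-state lifecycle in the table above can be sketched as a small transition map. This is a minimal illustration, not the product's implementation: the names SchemaStatus, ALLOWED_TRANSITIONS, can_transition, is_editable, and parse_status are hypothetical, and folding the legacy evaluated value into draft mirrors the migration described in §11.

```python
from enum import Enum


class SchemaStatus(str, Enum):
    DRAFT = "draft"
    PUBLISHED = "published"
    ARCHIVED = "archived"


# In-place status transitions allowed by the simplified lifecycle.
# Forking is not a transition: it leaves the source version untouched
# and creates a new version that starts in DRAFT.
ALLOWED_TRANSITIONS = {
    SchemaStatus.DRAFT: {SchemaStatus.PUBLISHED},
    SchemaStatus.PUBLISHED: {SchemaStatus.ARCHIVED},
    SchemaStatus.ARCHIVED: set(),
}


def can_transition(current: SchemaStatus, target: SchemaStatus) -> bool:
    """Return True if `current` may move directly to `target`."""
    return target in ALLOWED_TRANSITIONS[current]


def is_editable(status: SchemaStatus) -> bool:
    """Only drafts are mutable; published and archived versions are read-only."""
    return status is SchemaStatus.DRAFT


def parse_status(raw: str) -> SchemaStatus:
    """Map a stored status string to the enum.

    Assumption: a legacy 'evaluated' value is treated as draft,
    matching the data migration in §11.
    """
    if raw == "evaluated":
        return SchemaStatus.DRAFT
    return SchemaStatus(raw)
```

Keeping the fork outside the transition map reflects the diagram: publishing and archiving change a version's status, while a fork produces a separate draft version.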
What Changes
- The evaluated status is removed from the ontology_schema_versions.status enum.
- The read-only callout ("This schema is evaluated. Fork to make changes.") is removed from the Schema Designer. Only published and archived versions show the read-only callout.
- The "Run Evaluation" button is removed from the Schema Designer toolbar.
- The isDraft guard in the Schema Designer applies only to published and archived statuses.
- Any existing schema versions with status = 'evaluated' are migrated to draft (see §11).
4. Live Data Preview in Schema Designer
Concept
The Schema Designer (port-and-wire view from ADR-016) gains a toggle that activates a live data preview mode. When enabled, the compliance officer selects an entity from the portfolio and sees the actual data values flowing through the wire connections — from source fields, through the ontology layer, into risk matrix factors.
UX
- A toggle switch in the Schema Designer toolbar: "Live Preview" (off by default).
- When toggled on, a compact entity selector appears (searchable dropdown).
- After selecting an entity, source field ports display the actual values from that entity's data, and the values visually propagate through the wires into ontology fields and matrix factors.
- Data values appear as inline annotations on or near the ports and wire connections.
- If a source has no data for a field, the port shows a "no data" indicator.
- If multiple sources disagree on a field, the disagreement is visually highlighted (consistent with ADR-017 disagreement detection).
- The preview updates automatically if the officer changes mappings while preview is active.
- Toggling off returns the designer to its normal schematic view.
Purpose
This replaces the schema evaluation workflow from ADR-010 §12. Instead of navigating away to run a formal evaluation, the officer gets immediate, contextual feedback. The key questions answered by live preview:
- "Does my mapping produce data for this ontology field?"
- "Which sources contribute to this field for a real entity?"
- "Do sources agree or disagree on this value?"
- "Does data flow all the way through to the risk matrix factors?"
Technical Notes
- Live preview operates on the current draft schema version. It does not require saving or publishing.
- Data is fetched on demand from the existing entity data store (field provenance records).
- The preview is read-only — it visualizes existing data, it does not trigger new data collection.
- Performance target: preview should load within 2 seconds for a single entity.
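A rough sketch of how the preview could resolve values for one entity, covering the "no data" and disagreement cases from the UX list. Everything here is an assumption for illustration: the Wire and FieldPreview types, the shape of the provenance lookup (a mapping from (source, source_field) to a scalar value), and the example source and field names. The hop from ontology fields into matrix factors is omitted.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Wire:
    source: str          # data source name (hypothetical)
    source_field: str    # field on that source
    ontology_field: str  # ontology field the wire feeds


@dataclass
class FieldPreview:
    # Value contributed by each source for one ontology field.
    values_by_source: dict = field(default_factory=dict)

    @property
    def has_data(self) -> bool:
        return bool(self.values_by_source)

    @property
    def disagrees(self) -> bool:
        # Assumes values are hashable scalars (strings, numbers).
        return len(set(self.values_by_source.values())) > 1


def preview_entity(wires, provenance):
    """Resolve per-ontology-field preview data for a single entity.

    `provenance` maps (source, source_field) -> value for that entity.
    Read-only: nothing is fetched from upstream sources or written back.
    """
    preview = {}
    for wire in wires:
        slot = preview.setdefault(wire.ontology_field, FieldPreview())
        value = provenance.get((wire.source, wire.source_field))
        if value is not None:
            slot.values_by_source[wire.source] = value
    return preview


wires = [
    Wire("registry", "country", "jurisdiction"),
    Wire("kyc_form", "country_code", "jurisdiction"),
    Wire("registry", "inc_date", "incorporation_date"),
]
provenance = {
    ("registry", "country"): "DE",
    ("kyc_form", "country_code"): "FR",
}
preview = preview_entity(wires, provenance)
```

In this example jurisdiction receives two conflicting values and would be highlighted, while incorporation_date has a wire but no data and would show the "no data" indicator. Because the function takes the wires as input, re-running it after a mapping change is enough to refresh the preview.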
5. Evaluations Page Rework
Before
The Evaluations page (/studio/evaluations) had two tabs:
- Schema Runs — ran a schema against test entities to validate coverage, conflict rates, and scoring completeness. Controlled the schema lifecycle gate (draft → evaluated).
- Scoring Runs — placeholder tab showing "Coming soon. Portfolio-level scoring runs will be available in a future release."
After
The Evaluations page becomes a Matrix Evaluation tool. The sidebar item label stays as "Evaluations" and the route stays as /studio/evaluations.
The page has a single, focused workflow:
- Select a risk matrix (any version — draft or published)
- Run the matrix against the entity portfolio
- View results: per-entity risk scores, score distribution, factor breakdowns
- Optionally compare with a second matrix version (see §7)
The Schema Runs tab is removed entirely. Its functionality is replaced by the live data preview in the Schema Designer (§4).
6. Matrix Evaluation
Purpose
Matrix evaluation answers the question: "If this matrix were applied to our entity portfolio right now, what would the risk scores look like?" This is the critical compliance question that was missing from the product.
Workflow
1. Officer navigates to Evaluations page
2. Selects a risk matrix version (draft or published) from a dropdown
3. Optionally filters the portfolio (by entity type, risk tier, date range, etc.)
4. Clicks "Run Evaluation"
5. System scores all entities in the portfolio using the selected matrix
6. Results display:
a. Score distribution histogram (how many entities in each risk tier)
b. Sortable entity table with per-entity scores and factor breakdowns
c. Summary statistics (mean, median, std dev, count by tier)
d. Outlier highlighting (entities with extreme scores)
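The result views in step 6 could be derived from per-entity scores roughly as follows. This is a sketch under stated assumptions: the tier thresholds in TIER_BOUNDS are invented for illustration (real tier boundaries come from the risk matrix), the "more than two standard deviations from the mean" outlier rule is one possible definition, and the function names are hypothetical.

```python
import statistics
from collections import Counter

# Hypothetical lower bounds per tier, checked from highest to lowest.
TIER_BOUNDS = [(4.0, "critical"), (3.0, "high"), (2.0, "medium")]


def tier_for(score: float) -> str:
    """Map a numeric risk score to a tier using the assumed bounds."""
    for lower_bound, tier in TIER_BOUNDS:
        if score >= lower_bound:
            return tier
    return "low"


def summarize(scores: dict) -> dict:
    """Build distribution, summary statistics, and outliers.

    `scores` maps entity_id -> risk score for one evaluation run.
    """
    if not scores:
        return {"score_distribution": {}, "summary": {"total": 0}, "outliers": []}

    values = list(scores.values())
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # population std dev over the portfolio

    # Assumed outlier rule: more than two standard deviations from the mean.
    outliers = sorted(
        entity_id
        for entity_id, score in scores.items()
        if std_dev > 0 and abs(score - mean) > 2 * std_dev
    )

    return {
        "score_distribution": dict(Counter(tier_for(v) for v in values)),
        "summary": {
            "mean": mean,
            "median": statistics.median(values),
            "std_dev": std_dev,
            "total": len(values),
        },
        "outliers": outliers,
    }
```

The score_distribution and summary shapes follow the JSONB examples in §10, so the output of a function like this could be stored on the run record directly.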
Entry Points
| Entry Point | Context | Behavior |
|---|---|---|
| Sidebar → Evaluations | General use | Page opens with no matrix pre-selected |
| Risk Matrix Editor → "Evaluate Portfolio" button | Draft matrix | Page opens with the draft matrix pre-selected |
| Risk Matrix Detail → "Evaluate Portfolio" action | Published matrix | Page opens with the published matrix pre-selected |
Technical Notes
- Scoring uses the existing runtime scoring flow (ADR-010 §18) — read accepted entity graph, apply matrix factors, compute scores.
- Results are persisted in evaluation_runs for audit trail (reusing the existing table with adjusted semantics, see §10).
- For large portfolios, evaluation runs asynchronously via Temporal. A progress indicator shows completion percentage.
- Results are cacheable — re-running the same matrix version against an unchanged portfolio returns cached results.
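One way to decide whether the portfolio is "unchanged" is to derive the cache key from the matrix version plus a fingerprint of the entity data. The sketch below assumes each entity exposes some revision marker (a version counter or last-modified timestamp); that marker, the function name, and the filters argument are hypothetical, and the invalidation policy itself remains an open question (§13).

```python
import hashlib
import json
from typing import Optional


def evaluation_cache_key(
    matrix_version_id: str,
    entity_revisions: dict,
    filters: Optional[dict] = None,
) -> str:
    """Build a deterministic cache key for one evaluation request.

    `entity_revisions` maps entity_id -> revision marker. Any change to
    the matrix version, the entity set, a revision, or the filters
    produces a different key.
    """
    payload = {
        "matrix_version_id": matrix_version_id,
        # Sort so the key does not depend on dict insertion order.
        "entities": sorted(entity_revisions.items()),
        "filters": filters or {},
    }
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

With a key of this kind, a change to any entity's data results in a cache miss without an explicit invalidation step, at the cost of reading every revision marker before each run.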
7. Matrix Comparison
Purpose
When editing a risk matrix, compliance officers need to understand the impact of their changes before publishing. Matrix comparison runs the portfolio through two matrix versions and shows the deltas.
Workflow
1. On the Evaluations page, officer selects a primary matrix version
2. Clicks "Compare With..." and selects a second matrix version
3. System runs the portfolio through both matrices
4. Results display side by side:
a. Score delta per entity (new score − old score)
b. Tier migration summary (e.g., "12 entities moved from Medium to High")
c. Factor-level breakdown showing which factors contributed to score changes
d. Entities sorted by absolute delta (biggest movers first)
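The delta and tier-migration views in step 4 reduce to a join of two scoring results on entity ID. A minimal sketch, assuming each run is available as a mapping from entity_id to a (score, tier) pair; the function name and row shape are hypothetical, and the factor-level breakdown is left out.

```python
from collections import Counter


def compare_runs(old: dict, new: dict):
    """Compare two scoring results for the same portfolio.

    `old` and `new` map entity_id -> (score, tier). Only entities
    present in both runs are compared.
    """
    rows = []
    migrations = Counter()
    for entity_id in old.keys() & new.keys():
        old_score, old_tier = old[entity_id]
        new_score, new_tier = new[entity_id]
        tier_changed = old_tier != new_tier
        rows.append({
            "entity_id": entity_id,
            "old_score": old_score,
            "new_score": new_score,
            # Delta is new minus old, rounded to match NUMERIC(5,2).
            "delta": round(new_score - old_score, 2),
            "tier_changed": tier_changed,
        })
        if tier_changed:
            migrations[(old_tier, new_tier)] += 1

    # Biggest movers first; entity_id breaks ties for a stable order.
    rows.sort(key=lambda row: (-abs(row["delta"]), row["entity_id"]))
    return rows, migrations


old_run = {"e1": (2.8, "medium"), "e2": (3.1, "high"), "e3": (1.0, "low")}
new_run = {"e1": (3.5, "high"), "e2": (3.0, "high"), "e3": (1.0, "low")}
rows, migrations = compare_runs(old_run, new_run)
```

Here e1 moves from medium to high with a delta of 0.7, so it sorts first and contributes one entry to the migration summary. Counting (old_tier, new_tier) pairs is enough to produce statements such as "12 entities moved from Medium to High".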
Use Cases
- Pre-publish impact assessment: Compare draft v3 with published v2 to see what changes.
- Regulatory scenario modeling: Compare two draft configurations to evaluate different risk appetite approaches.
- Audit: Compare the current published matrix with an archived version to document how scoring evolved.
8. Pre-Publish Workflow
Risk Matrix Pre-Publish
The risk matrix editor gains an "Evaluate Before Publishing" link that navigates to the Evaluations page with the draft matrix pre-selected. This encourages (but does not enforce) impact assessment before publishing.
┌──────────────────────┐ ┌──────────────────────────────┐
│ Risk Matrix Editor │ │ Evaluations (Matrix Eval) │
│ │ │ │
│ Editing draft v3 │ │ Matrix: draft v3 (pre-sel) │
│ │ │ │
│ [Evaluate Portfolio] ─────▶│ [Run Evaluation] │
│ │ │ [Compare With v2 published] │
│ [Publish] │ │ │
│ │ │ Results: │
│ │ │ - Score distribution │
│ │ │ - Tier migration from v2 │
│ │ │ - Entity-level deltas │
└──────────────────────┘ └──────────────────────────────┘
Portfolio evaluation is a recommended practice, not a hard gate. The Publish button remains independently available. Compliance teams can establish their own governance policy about whether evaluation is required before publishing.
Schema Pre-Publish
Schema publishing does not require formal evaluation. The live data preview (§4) provides continuous validation during design. The compile-time validation from ADR-010 §18 (verifying that all required matrix factors are satisfiable) remains as an automated check at publish time.
9. Sidebar Navigation Changes
The Compliance Studio sidebar remains unchanged from the current structure:
Compliance Studio
├── Data Sources /studio/sources
├── Schema Designer /studio/mappings
├── Designer Mockup /studio/mappings-mockup
├── Risk Matrices /studio/matrices
├── Risk Categories /studio/risk-categories
├── Evaluations /studio/evaluations
└── Workflows /studio/workflows
The Evaluations menu item stays in place. Only its content changes (from Schema Runs + Scoring Runs tabs to the Matrix Evaluation interface).
10. Data Model Changes
Schema Version Status Enum
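-- Before (ADR-010)
-- status: 'draft', 'evaluated', 'published', 'archived'
-- After (ADR-018)
-- status: 'draft', 'published', 'archived'

-- Migrate legacy rows first (see §11, Phase 1); otherwise adding the
-- new CHECK constraint fails on any remaining 'evaluated' rows.
UPDATE ontology_schema_versions
SET status = 'draft'
WHERE status = 'evaluated';

ALTER TABLE ontology_schema_versions
DROP CONSTRAINT IF EXISTS ontology_schema_versions_status_check;
ALTER TABLE ontology_schema_versions
ADD CONSTRAINT ontology_schema_versions_status_check
CHECK (status IN ('draft', 'published', 'archived'));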
Evaluation Runs Table
The existing evaluation_runs table is repurposed for matrix evaluations:
-- Add matrix evaluation fields
ALTER TABLE evaluation_runs
ADD COLUMN matrix_version_id UUID REFERENCES risk_matrix_versions(id),
ADD COLUMN comparison_matrix_version_id UUID REFERENCES risk_matrix_versions(id),
ADD COLUMN evaluation_type TEXT NOT NULL DEFAULT 'matrix_scoring',
ADD COLUMN entity_count INTEGER,
ADD COLUMN score_distribution JSONB,
ADD COLUMN tier_summary JSONB,
ADD COLUMN comparison_deltas JSONB;
-- evaluation_type: 'matrix_scoring', 'matrix_comparison' (pre-existing rows are backfilled as 'legacy_schema', see §11)
-- score_distribution: { "critical": 12, "high": 45, "medium": 120, "low": 89 }
-- tier_summary: { "mean": 3.2, "median": 3.0, "std_dev": 1.1, "total": 266 }
-- comparison_deltas: [{ "entity_id": "...", "old_score": 2.8, "new_score": 3.5, "delta": 0.7, ... }]
Per-Entity Evaluation Results
CREATE TABLE evaluation_entity_results (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
evaluation_run_id UUID NOT NULL REFERENCES evaluation_runs(id),
entity_id UUID NOT NULL,
entity_name TEXT,
risk_score NUMERIC(5,2) NOT NULL,
risk_tier TEXT NOT NULL,
factor_scores JSONB NOT NULL,
-- For comparison runs
comparison_score NUMERIC(5,2),
comparison_tier TEXT,
score_delta NUMERIC(5,2),
tier_changed BOOLEAN DEFAULT FALSE,
UNIQUE (evaluation_run_id, entity_id)
);
CREATE INDEX idx_eval_entity_results_run
ON evaluation_entity_results(evaluation_run_id);
CREATE INDEX idx_eval_entity_results_delta
ON evaluation_entity_results(evaluation_run_id, score_delta DESC NULLS LAST);
11. Migration Path
Phase 1: Schema Lifecycle Simplification
- Migrate any ontology_schema_versions rows with status = 'evaluated' to status = 'draft'.
- Update the status enum constraint to remove evaluated.
- Remove the read-only callout for evaluated status from the Schema Designer.
- Remove the evaluated case from the statusIntent() helper.
- Remove the "Run Evaluation" button from the Schema Designer toolbar.
Phase 2: Schema Runs Removal
- Remove the Schema Runs tab from the Evaluations page.
- Archive (do not delete) existing schema evaluation run records for audit trail.
- Add the evaluation_type column to evaluation_runs and backfill existing rows as 'legacy_schema'.
Phase 3: Live Data Preview
- Add the preview toggle and entity selector to the Schema Designer toolbar.
- Implement data fetching from field provenance for the selected entity.
- Render data values as annotations on port-and-wire connections.
- Handle edge cases: missing data, multi-source disagreements, unmapped fields.
Phase 4: Matrix Evaluation
- Build the Matrix Evaluation UI on the Evaluations page.
- Implement the portfolio scoring backend (Temporal workflow for batch scoring).
- Add evaluation entry points from Risk Matrix Editor and Detail pages.
Phase 5: Matrix Comparison
- Add the "Compare With..." selector to the Evaluations page.
- Implement dual-matrix scoring and delta computation.
- Build the comparison results view (tier migration, entity deltas, factor breakdowns).
12. Rejected Alternatives
Keep the evaluated status but make it optional
Considered making the evaluated gate optional (a tenant-level setting). Rejected because the gate's value was in ensuring quality, but live data preview provides the same quality assurance with faster feedback. An optional gate adds complexity without clear benefit.
Embed portfolio scoring in the Risk Matrix Editor
Considered building the scoring interface directly into the matrix editor as a tab. Rejected because the evaluation results are substantial (full portfolio breakdown, comparison views) and would clutter the editor. A standalone page also allows evaluation of published matrices, not just drafts being edited.
Hard-gate publish behind portfolio evaluation
Considered requiring a passing portfolio evaluation before a matrix can be published. Rejected because different teams have different governance requirements. Some may want a hard gate, others may not. The product encourages evaluation through UX affordances (the "Evaluate Before Publishing" link) rather than enforcing it. Teams can add their own governance policies on top.
Keep Schema Runs alongside Matrix Evaluation
Considered keeping both Schema Runs and Matrix Evaluation as separate tabs. Rejected because schema evaluation as a standalone operation no longer has a clear use case — live data preview covers the design-time validation, and matrix evaluation covers the scoring validation. Maintaining two evaluation concepts creates confusion.
13. Open Questions
- Should matrix evaluation results be shareable? Could be useful for compliance teams to share evaluation reports with auditors or management. Consider adding export (PDF/CSV) capability.
- Portfolio filters for evaluation. The exact filtering options (entity type, risk tier, date range, custom segments) need to be defined during implementation.
- Caching and invalidation. When should cached evaluation results be invalidated? Options: on any entity data change, on a time-based TTL, or never (always re-run for fresh results).
- Live preview performance at scale. For schemas with many sources and complex mappings, live preview may need to lazy-load data per column rather than fetching everything at once.
- Temporal workflow configuration. Portfolio scoring for large entity sets will run via Temporal. Need to determine batch sizes, parallelism, and timeout configuration to avoid the 429 issues addressed in the recent worker fixes.