{% extends "base.html" %} {% block title %}Evaluation — ICH Screening{% endblock %} {% block content %}

<h1>Model Evaluation</h1>

<p>Calibration metrics, confidence-band analysis, and the calibrated-probability distribution from the inference pipeline.</p>

{% if calib %}

<h2>Calibration Parameters</h2>

<table>
  <tr><th>Method</th><td>{{ calib.get('method', 'N/A') }}</td></tr>
  <tr><th>Temperature</th><td>{{ '%.4f'|format(calib.temperature) }}</td></tr>
  <tr><th>Decision threshold</th><td>{{ '%.4f'|format(calib.calibrated_threshold) }}</td></tr>
  <tr><th>Base threshold</th><td>{{ '%.4f'|format(calib.base_threshold) }}</td></tr>
  <tr><th>High band</th><td>≥ {{ calib.high_threshold }}</td></tr>
  <tr><th>Low band</th><td>&lt; {{ calib.low_threshold }}</td></tr>
</table>

<h2>Calibration Quality</h2>

<table>
  <tr><th>ECE (raw)</th><td>{{ '%.4f'|format(calib.raw_ece) }}</td></tr>
  <tr><th>ECE (calibrated)</th><td>{{ '%.4f'|format(calib.cal_ece) }}</td></tr>
  <tr><th>Brier (raw)</th><td>{{ '%.4f'|format(calib.raw_brier) }}</td></tr>
  <tr><th>Brier (calibrated)</th><td>{{ '%.4f'|format(calib.cal_brier) }}</td></tr>
</table>

<p>Temperature scaling divides the logits by T = {{ '%.4f'|format(calib.temperature) }} before the final sigmoid to produce better-calibrated probabilities; a lower ECE indicates better calibration.</p>
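{# Maintainer note (not rendered): a minimal sketch of how the calibration figures
above are typically computed. The input names `logits` and `labels` are hypothetical;
this is an illustration of temperature scaling and ECE, not the actual pipeline code.

```python
import numpy as np

def temperature_scale(logits, T):
    """Divide logits by temperature T, then apply the sigmoid.

    T > 1 softens overconfident probabilities toward 0.5; T < 1 sharpens them.
    """
    z = np.asarray(logits, dtype=float) / T
    return 1.0 / (1.0 + np.exp(-z))

def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: per-bin |observed frequency - mean confidence|, weighted by bin size."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Last bin is closed on the right so p == 1.0 is counted.
        mask = (probs >= lo) & ((probs <= hi) if hi >= 1.0 else (probs < hi))
        if mask.any():
            ece += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return ece
```
#}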

{% endif %} {% if norm %}

<h2>Normalization Statistics</h2>

<table>
  <tr><th>Mean (per channel)</th><td>{{ norm.mean }}</td></tr>
  <tr><th>Std (per channel)</th><td>{{ norm.std }}</td></tr>
  <tr><th>Computed from</th><td>{{ norm.get('n_images', 'N/A') }} images</td></tr>
</table>
{% endif %}
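{# Maintainer note (not rendered): a sketch of how per-channel normalization
statistics like those above can be computed. The (N, H, W, C) layout and the
function name `channel_stats` are assumptions for illustration only.

```python
import numpy as np

def channel_stats(images):
    """Per-channel mean and std over a stack of images shaped (N, H, W, C).

    Reducing over the batch and spatial axes leaves one value per channel.
    """
    x = np.asarray(images, dtype=float)
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2))
```
#}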

<h2>Confidence Band Analysis</h2>

<p>Distribution of {{ total }} processed cases across the three confidence bands.</p>

{% for bnd in ['HIGH', 'MEDIUM', 'LOW'] %}
{% set d = band_data.get(bnd, {'total': 0, 'positive': 0, 'negative': 0}) %}
<h3>{{ bnd }}: {{ d.total }} cases</h3>
<table>
  <tr><th>Positive</th><td>{{ d.positive }}</td></tr>
  <tr><th>Negative</th><td>{{ d.negative }}</td></tr>
</table>
{% endfor %}
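{# Maintainer note (not rendered): the band assignment implied by the cutoffs shown
in the Calibration Parameters section (HIGH at or above the high threshold, LOW below
the low threshold, MEDIUM between). A sketch; the real pipeline's function may differ.

```python
def confidence_band(p, low, high):
    """Map a calibrated probability to a confidence band.

    HIGH if p >= high, LOW if p < low, otherwise MEDIUM.
    """
    if p >= high:
        return "HIGH"
    if p < low:
        return "LOW"
    return "MEDIUM"
```
#}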

<h2>Calibrated Probability Distribution</h2>

<p>Histogram of calibrated probabilities across all cases (10 bins).</p>

{% set max_bin = bins|max if bins and bins|max > 0 else 1 %}
<table>
{% for count in bins %}
  <tr>
    <th>{{ '%.1f'|format(loop.index0 * 0.1) }}–{{ '%.1f'|format((loop.index0 + 1) * 0.1) }}</th>
    <td><div style="width: {{ (100 * count / max_bin)|round(1) }}%; min-height: 1em; background: currentColor;"></div></td>
    <td>{{ count }}</td>
  </tr>
{% endfor %}
</table>
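{# Maintainer note (not rendered): a sketch of how the `bins` list rendered above can
be produced server-side. Equal-width bins over [0, 1]; the function name is hypothetical.

```python
def probability_bins(probs, n_bins=10):
    """Count probabilities into n_bins equal-width bins over [0, 1].

    A probability of exactly 1.0 is clamped into the last bin.
    """
    bins = [0] * n_bins
    for p in probs:
        i = min(int(p * n_bins), n_bins - 1)
        bins[i] += 1
    return bins
```
#}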

<h2>Summary Statistics</h2>

<table>
  <tr><th>Total processed</th><td>{{ stats.total }}</td></tr>
  <tr><th>Positive (flagged)</th><td>{{ stats.positive }}</td></tr>
  <tr><th>Negative</th><td>{{ stats.negative }}</td></tr>
  <tr><th>Urgent escalations</th><td>{{ stats.urgent }}</td></tr>
  <tr><th>Average calibrated probability</th><td>{{ '%.4f'|format(stats.avg_cal_prob) }}</td></tr>
  <tr><th>Heatmaps generated</th><td>{{ stats.heatmaps }}</td></tr>
</table>
{% endblock %}