Used as teaching material at the Alan Turing Institute Clinical AI Summer School and with the clinical student cohort at the University of Southampton.

DINOv2-Small fine-tuned on MedNIST

Companion checkpoint to the Clinical AI 2026 notebook (Chapter 8 — Foundation Models). Linear-probe then partially fine-tune DINOv2-Small on the 6-class MedNIST dataset (64×64 medical tiles upsampled to 224×224).

Intended use

Educational. Radiology fellows and clinical learners load this in a Colab session to inspect a calibrated baseline and its failure modes alongside the curriculum.

This is NOT a clinical decision tool. Do not use for diagnosis, triage, or any patient-facing decision. MedNIST is 64×64 grayscale — real clinical images are out-of-distribution.

Failure gallery — augmentation-induced

This checkpoint reaches 100% in-distribution validation AUC (8 epochs of linear probe plus partial unfreeze on a strong DINOv2 backbone; see notebook Ch 8 for why that is pedagogically deliberate). The gallery below therefore shows augmentation-induced misclassifications: the model classifies each original validation image correctly, we apply one of eight test-time augmentations (each mapped to a clinical scenario radiologists would recognise), and we harvest the cases where the model flips to a confident wrong prediction.

The lesson: in-distribution accuracy and robustness to mild distribution shift are two different properties. Calibration (Ch 6) and explainability (Ch 7) are how you tell which regime you're in.

Test-time augmentations applied (each preserves the underlying anatomy but distorts the input distribution):

  • rot90 — film hung 90° wrong
  • rot180 — inverted patient orientation
  • inversion — DICOM window inverted (negative ↔ positive)
  • gauss_noise — low-SNR portable acquisition
  • gauss_blur — severe motion blur
  • low_contrast — dramatically narrowed window
  • gamma_high — mis-set DICOM gamma
  • center_crop_zoom — wrong field of view (periphery cropped away)
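
The eight perturbations above can be sketched as simple array transforms. This is not the notebook's exact implementation; kernel sizes, noise levels, and the nearest-neighbour resize are illustrative assumptions (a box blur stands in for the Gaussian):

```python
import numpy as np

rng = np.random.default_rng(0)

def _nn_resize(img, size):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    rows = (np.arange(size) * img.shape[0]) // size
    cols = (np.arange(size) * img.shape[1]) // size
    return img[np.ix_(rows, cols)]

def _box_blur(img, k=5):
    """Crude separable box blur standing in for a Gaussian kernel."""
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")

# Each transform maps a float image in [0, 1] to a perturbed image in [0, 1].
AUGMENTATIONS = {
    "rot90":        lambda x: np.rot90(x, 1),
    "rot180":       lambda x: np.rot90(x, 2),
    "inversion":    lambda x: 1.0 - x,
    "gauss_noise":  lambda x: np.clip(x + rng.normal(0.0, 0.1, x.shape), 0.0, 1.0),
    "gauss_blur":   _box_blur,
    "low_contrast": lambda x: 0.5 + (x - 0.5) * 0.25,
    "gamma_high":   lambda x: np.clip(x, 0.0, 1.0) ** 2.2,
    # Crop the central half of the image, then zoom back to full size.
    "center_crop_zoom": lambda x: _nn_resize(
        x[x.shape[0] // 4 : 3 * x.shape[0] // 4,
          x.shape[1] // 4 : 3 * x.shape[1] // 4],
        x.shape[0]),
}
```

Each transform preserves the 64×64 shape, so the harvesting loop can simply re-run inference on `AUGMENTATIONS[name](img)` and compare predictions against the clean image.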

Cells are ordered by descending augmentation confidence: the cases where the model was most sure and most wrong. A class labelled "(robust to all 8 perturbations)" is one the DINOv2 backbone happens to defend well; the per-class robustness pattern is itself a teaching point.

(Figure: failure gallery)

Per-class reliability diagrams

Calibration matters more than raw accuracy in clinical contexts:

(Figures: per-class reliability diagrams for AbdomenCT, BreastMRI, CXR, ChestCT, Hand, HeadCT)
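
A reliability diagram bins predictions by confidence and compares each bin's mean confidence to its empirical accuracy; ECE is the bin-mass-weighted gap. A minimal sketch, assuming equal-width bins over max-softmax confidence (the notebook's exact binning scheme may differ):

```python
import numpy as np

def reliability_bins(confidence, correct, n_bins=10):
    """Per-bin (mean confidence, accuracy, count) over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            out.append((confidence[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return out

def expected_calibration_error(confidence, correct, n_bins=10):
    """ECE: bin-mass-weighted |accuracy - confidence| summed over occupied bins."""
    n = len(confidence)
    return sum(count / n * abs(acc - conf)
               for conf, acc, count in reliability_bins(confidence, correct, n_bins))
```

A perfectly calibrated bin (95% confidence, 95% accuracy) contributes zero; an overconfident bin (99% confidence, 50% accuracy) contributes its full 0.49 gap weighted by how many predictions land there.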

Metrics (validation split, val_indices.json)

  • Macro AUC: 1.0
  • Accuracy: 1.0
  • Macro ECE: 0.0001
  • val_indices_sha256: 6f241b0ad18f3aee8d91ce1b375b70b15c9c2e99f66d9774aa12c01eccc9aff2

Per class:

  • AbdomenCT — AUC 1.0 | ECE 0.0001
  • BreastMRI — AUC 1.0 | ECE 0.0001
  • CXR — AUC 1.0 | ECE 0.0001
  • ChestCT — AUC 1.0 | ECE 0.0
  • Hand — AUC 1.0 | ECE 0.0
  • HeadCT — AUC 1.0 | ECE 0.0

Training recipe

Mirrors Clinical_AI_2026.ipynb cells 85-88:

  1. timm.create_model("vit_small_patch14_dinov2.lvd142m", pretrained=True, in_chans=1, num_classes=6, img_size=224) — img_size=224 overrides DINOv2's native 518×518 to match the 224×224 ViT loaders the curriculum reuses from Ch 4.
  2. Linear probe (4 epochs, Adam, lr=0.001): freeze every parameter except model.head.
  3. Fine-tune (4 epochs, Adam, lr=1e-05): additionally unfreeze model.blocks[-2:].
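
In the notebook the two phases toggle requires_grad on the timm model; this dependency-free sketch mirrors just the selection logic over parameter names (the naming scheme below mimics timm ViTs and is an assumption):

```python
def trainable_param_names(param_names, phase, n_blocks=12):
    """Return the parameter names left trainable in each training phase.

    phase="probe":    only the classifier head (everything else frozen).
    phase="finetune": the head plus the last two transformer blocks.
    """
    unfrozen_prefixes = ["head."]
    if phase == "finetune":
        unfrozen_prefixes += [f"blocks.{i}." for i in (n_blocks - 2, n_blocks - 1)]
    return [name for name in param_names
            if any(name.startswith(p) for p in unfrozen_prefixes)]

# Toy parameter list mimicking timm's ViT-Small naming (12 blocks).
names = (["patch_embed.proj.weight", "cls_token"]
         + [f"blocks.{i}.attn.qkv.weight" for i in range(12)]
         + ["norm.weight", "head.weight", "head.bias"])

print(trainable_param_names(names, "probe"))  # → ['head.weight', 'head.bias']
```

With a real model the same selection drives the freeze: set `p.requires_grad = name in allowed` while iterating `model.named_parameters()`, and pass only the trainable parameters to the optimizer.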

Data split: stratified train_test_split (seed 42, 80/10/10). Seeded splits can drift across library versions, so val_indices.json is the authoritative record of the validation split.
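
To check that a downloaded copy of the split matches the pinned digest above, hash it with the standard library (a minimal sketch; the filename is the one the card references):

```python
import hashlib

def sha256_of(path):
    """Stream a file through SHA-256 in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

VAL_INDICES_SHA256 = "6f241b0ad18f3aee8d91ce1b375b70b15c9c2e99f66d9774aa12c01eccc9aff2"
# assert sha256_of("val_indices.json") == VAL_INDICES_SHA256, "validation split mismatch"
```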

How to load

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import timm

ckpt = hf_hub_download("t22000t/dinov2-small-mednist", "model.safetensors")
model = timm.create_model("vit_small_patch14_dinov2.lvd142m", pretrained=False, in_chans=1, num_classes=6, img_size=224)
model.load_state_dict(load_file(ckpt))
model.eval()
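
The checkpoint expects a single-channel 224×224 input, while MedNIST tiles are 64×64. A minimal numpy preprocessing sketch (nearest-neighbour upsampling and [0, 1] scaling are assumptions; check the notebook's Ch 4 loaders for the actual interpolation mode and normalisation):

```python
import numpy as np

def preprocess(img64):
    """Map a 64x64 uint8 grayscale tile to a (1, 1, 224, 224) float32 array."""
    idx = (np.arange(224) * img64.shape[0]) // 224   # nearest-neighbour index map
    img224 = img64[np.ix_(idx, idx)].astype(np.float32) / 255.0
    return img224[None, None]                        # add batch and channel dims

x = preprocess(np.zeros((64, 64), dtype=np.uint8))
print(x.shape)  # → (1, 1, 224, 224)
```

Wrap the result with torch.from_numpy(x) before passing it to model(...).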

Related work

The same family of explainability techniques shown here ships inside OsiriXgrpc, a clinical AI plugin used at The Royal Marsden Hospital; see the deployment paper "OsiriXgrpc: Rapid Development and Deployment of State-of-the-Art AI for Clinical Practice" (AAAI 2022, AI2SE Workshop).
