Used as teaching material at the Alan Turing Institute Clinical AI Summer School and with the clinical student cohort at the University of Southampton.

DINOv2-Small fine-tuned on MedNIST

Companion checkpoint to the Clinical AI 2026 notebook (Chapter 8 — Foundation Models). Linear-probe then partially fine-tune DINOv2-Small on the 6-class MedNIST dataset (64×64 medical tiles upsampled to 224×224).

Intended use

Educational. Radiology fellows and clinical learners load this in a Colab session to inspect a calibrated baseline and its failure modes alongside the curriculum.

This is NOT a clinical decision tool. Do not use for diagnosis, triage, or any patient-facing decision. MedNIST is 64×64 grayscale — real clinical images are out-of-distribution.

Failure gallery — augmentation-induced

This checkpoint reaches 100% in-distribution validation AUC (8 epochs of linear probe plus partial unfreeze on a strong DINOv2 backbone; see notebook Ch 8 for why that is pedagogically deliberate). The gallery below therefore shows augmentation-induced misclassifications: the model classifies each original validation image correctly, we apply one of eight test-time augmentations (each mapped to a clinical scenario radiologists would recognise), and we harvest the cases where the model flips to a confident wrong prediction.

The lesson: in-distribution accuracy and robustness to mild distribution shift are two different properties. Calibration (Ch 6) and explainability (Ch 7) are how you tell which regime you're in.

Test-time augmentations applied (each preserves the underlying anatomy but distorts the input distribution):

  • rot90 — film hung 90° wrong
  • rot180 — inverted patient orientation
  • inversion — DICOM window inverted (negative ↔ positive)
  • gauss_noise — low-SNR portable acquisition
  • gauss_blur — severe motion blur
  • low_contrast — dramatically narrowed window
  • gamma_high — mis-set DICOM gamma
  • center_crop_zoom — wrong field of view (periphery cropped away)
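
The eight perturbations above can be sketched as simple array transforms. This is not the notebook's exact implementation; kernel sizes, noise levels, and the nearest-neighbour resize are illustrative assumptions (a box blur stands in for the Gaussian):

```python
import numpy as np

rng = np.random.default_rng(0)

def _nn_resize(img, size):
    """Nearest-neighbour resize of a 2-D array to (size, size)."""
    rows = (np.arange(size) * img.shape[0]) // size
    cols = (np.arange(size) * img.shape[1]) // size
    return img[np.ix_(rows, cols)]

def _box_blur(img, k=5):
    """Crude separable box blur standing in for a Gaussian kernel."""
    kernel = np.ones(k) / k
    blurred = np.apply_along_axis(np.convolve, 0, img, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")

# Each transform maps a float image in [0, 1] to a perturbed image in [0, 1].
AUGMENTATIONS = {
    "rot90":        lambda x: np.rot90(x, 1),
    "rot180":       lambda x: np.rot90(x, 2),
    "inversion":    lambda x: 1.0 - x,
    "gauss_noise":  lambda x: np.clip(x + rng.normal(0.0, 0.1, x.shape), 0.0, 1.0),
    "gauss_blur":   _box_blur,
    "low_contrast": lambda x: 0.5 + (x - 0.5) * 0.25,
    "gamma_high":   lambda x: np.clip(x, 0.0, 1.0) ** 2.2,
    # Crop the central half of the image, then zoom back to full size.
    "center_crop_zoom": lambda x: _nn_resize(
        x[x.shape[0] // 4 : 3 * x.shape[0] // 4,
          x.shape[1] // 4 : 3 * x.shape[1] // 4],
        x.shape[0]),
}
```

Each transform preserves the 64×64 shape, so the harvesting loop can simply re-run inference on `AUGMENTATIONS[name](img)` and compare predictions against the clean image.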

Cells are ordered by descending augmentation confidence: the cases where the model was most sure and most wrong. A class labelled "(robust to all 8 perturbations)" is one the DINOv2 backbone happens to defend well; the per-class robustness pattern is itself a teaching point.

(Figure: failure gallery)

Per-class reliability diagrams

Calibration matters more than raw accuracy in clinical contexts:

(Figures: per-class reliability diagrams for AbdomenCT, BreastMRI, CXR, ChestCT, Hand, HeadCT)
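
A reliability diagram bins predictions by confidence and compares each bin's mean confidence to its empirical accuracy; ECE is the bin-mass-weighted gap. A minimal sketch, assuming equal-width bins over max-softmax confidence (the notebook's exact binning scheme may differ):

```python
import numpy as np

def reliability_bins(confidence, correct, n_bins=10):
    """Per-bin (mean confidence, accuracy, count) over equal-width confidence bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            out.append((confidence[mask].mean(), correct[mask].mean(), int(mask.sum())))
    return out

def expected_calibration_error(confidence, correct, n_bins=10):
    """ECE: bin-mass-weighted |accuracy - confidence| summed over occupied bins."""
    n = len(confidence)
    return sum(count / n * abs(acc - conf)
               for conf, acc, count in reliability_bins(confidence, correct, n_bins))
```

A perfectly calibrated bin (95% confidence, 95% accuracy) contributes zero; an overconfident bin (99% confidence, 50% accuracy) contributes its full 0.49 gap weighted by how many predictions land there.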

Metrics (validation split, val_indices.json)

  • Macro AUC: 1.0
  • Accuracy: 1.0
  • Macro ECE: 0.0001
  • val_indices_sha256: 6f241b0ad18f3aee8d91ce1b375b70b15c9c2e99f66d9774aa12c01eccc9aff2

Per class:

  • AbdomenCT — AUC 1.0 | ECE 0.0001
  • BreastMRI — AUC 1.0 | ECE 0.0001
  • CXR — AUC 1.0 | ECE 0.0001
  • ChestCT — AUC 1.0 | ECE 0.0
  • Hand — AUC 1.0 | ECE 0.0
  • HeadCT — AUC 1.0 | ECE 0.0

Training recipe

Mirrors Clinical_AI_2026.ipynb cells 85-88:

  1. timm.create_model("vit_small_patch14_dinov2.lvd142m", pretrained=True, in_chans=1, num_classes=6, img_size=224) — img_size=224 overrides DINOv2's native 518×518 to match the 224×224 ViT loaders the curriculum reuses from Ch 4.
  2. Linear probe (4 epochs, Adam, lr=0.001): freeze every parameter except model.head.
  3. Fine-tune (4 epochs, Adam, lr=1e-05): additionally unfreeze model.blocks[-2:].
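
In the notebook the two phases toggle requires_grad on the timm model; this dependency-free sketch mirrors just the selection logic over parameter names (the naming scheme below mimics timm ViTs and is an assumption):

```python
def trainable_param_names(param_names, phase, n_blocks=12):
    """Return the parameter names left trainable in each training phase.

    phase="probe":    only the classifier head (everything else frozen).
    phase="finetune": the head plus the last two transformer blocks.
    """
    unfrozen_prefixes = ["head."]
    if phase == "finetune":
        unfrozen_prefixes += [f"blocks.{i}." for i in (n_blocks - 2, n_blocks - 1)]
    return [name for name in param_names
            if any(name.startswith(p) for p in unfrozen_prefixes)]

# Toy parameter list mimicking timm's ViT-Small naming (12 blocks).
names = (["patch_embed.proj.weight", "cls_token"]
         + [f"blocks.{i}.attn.qkv.weight" for i in range(12)]
         + ["norm.weight", "head.weight", "head.bias"])

print(trainable_param_names(names, "probe"))  # → ['head.weight', 'head.bias']
```

With a real model the same selection drives the freeze: set `p.requires_grad = name in allowed` while iterating `model.named_parameters()`, and pass only the trainable parameters to the optimizer.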

Data split: stratified train_test_split (seed 42, 80/10/10). Seeded splits can drift across library versions, so val_indices.json is the authoritative record of the validation split.
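
To check that a downloaded copy of the split matches the pinned digest above, hash it with the standard library (a minimal sketch; the filename is the one the card references):

```python
import hashlib

def sha256_of(path):
    """Stream a file through SHA-256 in chunks so large files stay out of memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()

VAL_INDICES_SHA256 = "6f241b0ad18f3aee8d91ce1b375b70b15c9c2e99f66d9774aa12c01eccc9aff2"
# assert sha256_of("val_indices.json") == VAL_INDICES_SHA256, "validation split mismatch"
```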

How to load

from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
import timm

ckpt = hf_hub_download("t22000t/dinov2-small-mednist", "model.safetensors")
model = timm.create_model("vit_small_patch14_dinov2.lvd142m", pretrained=False, in_chans=1, num_classes=6, img_size=224)
model.load_state_dict(load_file(ckpt))
model.eval()
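
The checkpoint expects a single-channel 224×224 input, while MedNIST tiles are 64×64. A minimal numpy preprocessing sketch (nearest-neighbour upsampling and [0, 1] scaling are assumptions; check the notebook's Ch 4 loaders for the actual interpolation mode and normalisation):

```python
import numpy as np

def preprocess(img64):
    """Map a 64x64 uint8 grayscale tile to a (1, 1, 224, 224) float32 array."""
    idx = (np.arange(224) * img64.shape[0]) // 224   # nearest-neighbour index map
    img224 = img64[np.ix_(idx, idx)].astype(np.float32) / 255.0
    return img224[None, None]                        # add batch and channel dims

x = preprocess(np.zeros((64, 64), dtype=np.uint8))
print(x.shape)  # → (1, 1, 224, 224)
```

Wrap the result with torch.from_numpy(x) before passing it to model(...).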

Related work

The same family of explainability techniques shown here ships inside OsiriXgrpc, a clinical AI plugin used at The Royal Marsden Hospital; see the deployment paper "OsiriXgrpc: Rapid Development and Deployment of State-of-the-Art AI for Clinical Practice" (AAAI 2022, AI2SE Workshop).
