# Eye MedSigLIP Linear Probe

## Summary
This repository provides an eye (conjunctiva) anemia classifier built on a frozen MedSigLIP vision encoder with a lightweight linear probe (logistic regression converted to a torch linear head). The model outputs a triage score and a binary prediction (Anemia vs Non-Anemia).
This work follows Google's recommended data-efficient workflow: freeze the vision encoder, extract embeddings, and train a lightweight linear classifier on top. The linear probe is trained with scikit-learn `LogisticRegression` using the `saga` solver (data-efficient and scalable), then converted into a torch linear head for deployment. The training code also handles image decoding failures (Pillow with an OpenCV fallback), exposes optional preprocessing toggles, and supports threshold calibration for high-recall deployment.
## Model Artifacts

The repository includes the full deployable bundle:

- `artifacts/vision_model/` (MedSigLIP vision encoder weights + config)
- `artifacts/linear_head.pt` (linear probe head)
- `artifacts/scaler.joblib` (mean/std for embedding standardization)
- `artifacts/config.json` (threshold + preprocessing flags)
- `artifacts/optimized/` (color constancy utilities)

Default deployment threshold (from `artifacts/config.json`): 0.31.
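To make the bundle's role concrete, here is a minimal sketch of applying the deployed pieces to a single embedding. The function name, toy dimensions, and in-memory parameters are illustrative assumptions; in practice the head weights, scaler statistics, and threshold come from the artifact files listed above.

```python
import numpy as np
import torch

def triage(embed, mean, std, head, threshold=0.31):
    """Standardize an embedding, apply the linear head, and threshold the
    sigmoid score. Sketch only; artifact loading is elided."""
    x = (np.asarray(embed) - mean) / std          # scaler.joblib statistics
    logit = head(torch.from_numpy(x).float())     # linear_head.pt forward pass
    score = torch.sigmoid(logit).item()           # triage score in [0, 1]
    return score, ("Anemia" if score >= threshold else "Non-Anemia")

# Toy usage with a 4-dim stand-in embedding (the real hidden size differs)
head = torch.nn.Linear(4, 1)
score, label = triage(np.ones(4), np.zeros(4), np.ones(4), head)
```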
## Dataset (Counts Only)

Primary dataset folder used in this project:
`Dataset/dataset anemia/` with class folders `Anemia` and `Non-Anemia`.
### Full Cleaned Dataset (Image-Level)
- Train: 686 images (Anemia: 280, Non-Anemia: 406)
- Test: 172 images (Anemia: 82, Non-Anemia: 90)
- Image decoding: 157 train and 30 test images required the OpenCV fallback (opencv_rescued)
### Full-Dataset Linear Probe (Latest Run)
- Train/Val/Test counts: 9283 / 1160 / 1162
- Class balance:
- Train: Anemia 4730, Non-Anemia 4553
- Val: Anemia 591, Non-Anemia 569
- Test: Anemia 592, Non-Anemia 570
### Segmented-Only Optimized Split (CP-AnemiC + segmented)
- Train/Val/Test: 800 / 100 / 100
- Class balance:
- Train: Anemia 400, Non-Anemia 400
- Val: Anemia 50, Non-Anemia 50
- Test: Anemia 50, Non-Anemia 50
## Preprocessing
- Resize to 448x448 using letterbox padding (no stretching) unless stated otherwise.
- RGB conversion and normalization to [-1, 1].
- OpenCV fallback for decoding certain PNGs.
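The steps above can be sketched as follows, assuming Pillow and NumPy (the OpenCV fallback is imported lazily; the function names are illustrative, not the repository's actual API):

```python
import numpy as np
from PIL import Image

def load_rgb(path: str) -> Image.Image:
    """Decode with Pillow; fall back to OpenCV for PNGs Pillow rejects."""
    try:
        return Image.open(path).convert("RGB")
    except Exception:
        import cv2  # fallback decoder
        bgr = cv2.imread(path, cv2.IMREAD_COLOR)
        if bgr is None:
            raise ValueError(f"could not decode {path}")
        return Image.fromarray(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))

def letterbox(img: Image.Image, size: int = 448) -> Image.Image:
    """Resize the long side to `size` and pad the rest (no stretching)."""
    w, h = img.size
    scale = size / max(w, h)
    nw, nh = max(1, round(w * scale)), max(1, round(h * scale))
    canvas = Image.new("RGB", (size, size))
    canvas.paste(img.resize((nw, nh), Image.BILINEAR),
                 ((size - nw) // 2, (size - nh) // 2))
    return canvas

def to_model_range(img: Image.Image) -> np.ndarray:
    """Map uint8 RGB to float32 in [-1, 1]."""
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0
```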
## Training Code Notes

- Embedding extraction: frozen MedSigLIP vision encoder (no fine-tuning).
- Classifier: scikit-learn `LogisticRegression` with `solver="saga"`.
- Standardization: embedding mean/std saved to `scaler.joblib`.
- Deployment conversion: learned coefficients are converted to a torch `Linear` head and saved as `linear_head.pt`.
Core Training Code (Key Steps)
# 1) Extract frozen MedSigLIP embeddings
vision_model.eval()
with torch.no_grad():
outputs = vision_model(pixel_values=tensor)
embeds = outputs.pooler_output
embeds = embeds / embeds.norm(p=2, dim=-1, keepdim=True)
# 2) Standardize embeddings (mean/std from train set)
x_std = (x - scaler["mean"]) / scaler["std"]
# 3) Train linear probe (logistic regression, saga)
model = LogisticRegression(
solver="saga",
max_iter=5000,
C=best_c,
n_jobs=-1,
)
model.fit(x_train_std, y_train)
# 4) Convert to torch Linear head for deployment
linear_head = torch.nn.Linear(vision_model.config.hidden_size, 1, bias=True)
linear_head.weight.data.copy_(torch.from_numpy(model.coef_))
linear_head.bias.data.copy_(torch.from_numpy(model.intercept_))
torch.save(linear_head.state_dict(), "linear_head.pt")
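The deployment threshold is then calibrated on a validation split against a recall target (see the benchmark sections below). A minimal sketch of that calibration, with an assumed function name:

```python
import numpy as np

def recall_target_threshold(y_val, scores, target=0.90):
    """Return the highest score threshold whose validation recall still
    meets the target, trading some precision for guaranteed sensitivity."""
    y_val, scores = np.asarray(y_val), np.asarray(scores)
    best = 0.0
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_val == 1))
        fn = np.sum(~pred & (y_val == 1))
        if tp / (tp + fn) >= target and t > best:
            best = t
    return float(best)
```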
## Benchmarks and Experiments

All metrics below are reported on held-out test sets.

### Comparison Table (Test Metrics)
| Experiment | Accuracy | Precision | Recall | F1 | ROC-AUC | Confusion Matrix | Threshold |
|---|---|---|---|---|---|---|---|
| Zero-shot baseline (subject-level) | 0.487654 | 0.490446 | 0.962500 | 0.649789 | 0.562348 | [[2, 80], [3, 77]] | n/a |
| Linear probe (before OpenCV fallback) | 0.558333 | 0.500000 | 0.962264 | 0.658065 | 0.692763 | [[16, 51], [2, 51]] | n/a |
| Linear probe (after OpenCV fallback) | 0.598765 | 0.556391 | 0.925000 | 0.694836 | 0.652896 | [[23, 59], [6, 74]] | n/a |
| Linear probe (full cleaned dataset) | 0.715116 | 0.714286 | 0.670732 | 0.691824 | 0.783740 | [[68, 22], [27, 55]] | n/a |
| Linear probe (TF resize + SAGA) | 0.686047 | 0.694444 | 0.609756 | 0.649351 | 0.812195 | [[68, 22], [32, 50]] | 0.1475936 (ROC best) |
| Linear probe (Auto-PIL + Torch) | 0.726744 | 0.733333 | 0.670732 | 0.700637 | 0.784688 | [[70, 20], [27, 55]] | 0.5850275 (ROC best) |
| Optimized segmented split (recall target 0.90) | 0.710000 | 0.652174 | 0.900000 | 0.756303 | 0.836800 | [[26, 24], [5, 45]] | 0.05 |
| Full-dataset linear probe (latest, recall target 0.90) | 0.823580 | 0.775249 | 0.920608 | 0.841699 | 0.912441 | [[412, 158], [47, 545]] | 0.31 |
### Zero-Shot Baseline (Official-Style Prompting, No Training)
- Examples evaluated: 162 (opencv_rescued 42)
- Accuracy: 0.487654
- Precision: 0.490446
- Recall: 0.962500
- F1: 0.649789
- ROC-AUC: 0.562348
- Confusion Matrix: [[2, 80], [3, 77]]
### Linear Probe (Before OpenCV Fallback)
- Train/Test usable: 555 / 120 (skipped 145 / 42)
- Accuracy: 0.558333
- Precision: 0.500000
- Recall: 0.962264
- F1: 0.658065
- ROC-AUC: 0.692763
- Confusion Matrix: [[16, 51], [2, 51]]
### Linear Probe (After OpenCV Fallback)
- Train/Test: 700 / 162 (opencv_rescued 145 / 42, skipped 0)
- Accuracy: 0.598765
- Precision: 0.556391
- Recall: 0.925000
- F1: 0.694836
- ROC-AUC: 0.652896
- Confusion Matrix: [[23, 59], [6, 74]]
### Linear Probe (Full Cleaned Dataset)
- Train/Test: 686 / 172 (opencv_rescued 157 / 30)
- Accuracy: 0.715116
- Precision: 0.714286
- Recall: 0.670732
- F1: 0.691824
- ROC-AUC: 0.783740
- Confusion Matrix: [[68, 22], [27, 55]]
- Parameters: batch_size=16, epochs=400, lr=0.01, weight_decay=0.0001
### Linear Probe (TF Resize + SAGA)
- Train/Test: 686 / 172
- Accuracy: 0.686047
- Precision: 0.694444
- Recall: 0.609756
- F1: 0.649351
- ROC-AUC: 0.812195
- Confusion Matrix: [[68, 22], [32, 50]]
- ROC best threshold (Youden J): 0.1475936 (TPR 0.829268, FPR 0.333333)
### Linear Probe (Auto-PIL + Torch)
- Train/Test: 686 / 172
- Accuracy: 0.726744
- Precision: 0.733333
- Recall: 0.670732
- F1: 0.700637
- ROC-AUC: 0.784688
- Confusion Matrix: [[70, 20], [27, 55]]
- ROC best threshold (Youden J): 0.5850275 (TPR 0.670732, FPR 0.200000)
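The "ROC best threshold (Youden J)" values above pick the operating point maximizing TPR - FPR over candidate thresholds. A small sketch with illustrative names:

```python
import numpy as np

def youden_j_threshold(y_true, scores):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos = y_true == 1
    best_t, best_j = 0.5, -1.0
    for t in np.unique(scores):
        pred = scores >= t
        j = np.mean(pred[pos]) - np.mean(pred[~pos])  # TPR - FPR
        if j > best_j:
            best_t, best_j = t, j
    return float(best_t), float(best_j)
```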
### Optimized Split + Grid Search + Recall Calibration (Segmented-Only)
- Best C (grid): 3
- Val AUC: 0.836400
- Test AUC: 0.836800
- Recall target: 0.90
- Threshold (val-calibrated): 0.05
- Metrics at threshold:
- Accuracy 0.71
- Precision 0.652174
- Recall 0.90
- F1 0.756303
- Confusion Matrix [[26, 24], [5, 45]]
- Best-F1 threshold (val) applied to test:
- Threshold 0.23
- Accuracy 0.76
- Precision 0.716667
- Recall 0.86
- F1 0.781818
- Confusion Matrix [[33, 17], [7, 43]]
- Recall-target sweep best by F1 at target 0.91:
- Threshold 0.04
- Accuracy 0.72
- Precision 0.657143
- Recall 0.92
- F1 0.766667
- Confusion Matrix [[26, 24], [4, 46]]
- ROC best threshold (Youden J): 0.505267 (TPR 0.78, FPR 0.18)
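The "Best C (grid)" values reported in these runs come from a regularization sweep selected by validation AUC. A hedged sketch of such a sweep (the grid values and function name are assumptions, not the repository's exact search):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def select_c(x_tr, y_tr, x_val, y_val, grid=(0.01, 0.1, 1, 3, 10)):
    """Fit a saga logistic regression per C and keep the best val AUC."""
    best_c, best_auc = None, -1.0
    for c in grid:
        clf = LogisticRegression(solver="saga", max_iter=5000, C=c)
        clf.fit(x_tr, y_tr)
        auc = roc_auc_score(y_val, clf.predict_proba(x_val)[:, 1])
        if auc > best_auc:
            best_c, best_auc = c, auc
    return best_c, best_auc
```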
### Full-Dataset Linear Probe (Latest Run)
- Best C (grid): 1
- Val AUC: 0.909260
- Test AUC: 0.912441
- Recall target: 0.90
- Threshold (val-calibrated): 0.31
- Metrics at threshold:
- Accuracy 0.823580
- Precision 0.775249
- Recall 0.920608
- F1 0.841699
- Confusion Matrix [[412, 158], [47, 545]]
- Best-F1 threshold (val) applied to test:
- Threshold 0.44
- Accuracy 0.840792
- Precision 0.823529
- Recall 0.875000
- F1 0.848485
- Confusion Matrix [[459, 111], [74, 518]]
- Recall-target sweep best by F1 at target 0.90:
- Threshold 0.31
- Accuracy 0.812931
- Precision 0.769452
- Recall 0.903553
- F1 0.831128
- Confusion Matrix [[409, 160], [57, 534]]
- ROC best threshold (Youden J): 0.594325 (TPR 0.795262, FPR 0.114236)
## Comparison Summary
- Zero-shot baseline shows very high recall but poor specificity and lower overall accuracy and AUC.
- Linear probe improves accuracy and AUC consistently once OpenCV fallback and cleaned datasets are used.
- Full-dataset training provides the strongest overall performance and better calibrated operating points.
## Final Decision
The deployed eye model uses the full-dataset linear probe with letterbox resize and recall-target calibration. The deployment threshold is 0.31, which balances high recall with improved precision and overall accuracy.
## Limitations
- For research and triage only; not for clinical diagnosis.
- Performance depends on dataset distribution and capture conditions.
- Conjunctiva imaging conditions may vary in real-world settings.
## Contact
Model author: Sidharth (Hugging Face: Sidharth1743).