Official implementation for:
Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction
ConvNeXt Masked-Diffusion (CMD): inference & downstream
Deps: Python 3.10+, torch, torchvision, Pillow, numpy, timm, PyYAML
Weights: Put files under weights/, e.g. weights/CMD-L/pytorch_model.bin, weights/SegHead/best_model.pth, and weights/H0-mini/pytorch_model.bin (with config.json next to it). H0-mini is not redistributed here; download it from bioptimus/H0-mini.
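A quick way to confirm the layout before running anything is a small check script. The sketch below is not part of the repository; the function name `missing_weight_files` and the file list are illustrative and simply mirror the example paths in the Weights note above, so adjust them if your head checkpoint lives elsewhere.

```python
from pathlib import Path

# Example paths taken from the Weights note; edit to match your own layout.
EXPECTED_WEIGHT_FILES = [
    "CMD-L/pytorch_model.bin",
    "SegHead/best_model.pth",
    "H0-mini/pytorch_model.bin",
    "H0-mini/config.json",
]


def missing_weight_files(weights_dir="weights"):
    """Return the expected files that are not present under weights_dir."""
    root = Path(weights_dir)
    return [rel for rel in EXPECTED_WEIGHT_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weight_files()
    if missing:
        print("Missing weight files:")
        for rel in missing:
            print(f"  weights/{rel}")
    else:
        print("All expected weight files found.")
```

Run it from the repository root so that the relative weights/ directory resolves as intended.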
Inference
python infer.py \
--image test.png \
--cmd weights/CMD-L \
--seg-head weights/TNBC_SegHead/best_model.pth \
--pathology weights/H0-mini/pytorch_model.bin \
--output-dir outputs/tnbc
Outputs: outputs/tnbc/test_mask_vis.png, outputs/tnbc/test_overlay.png. --image can also be a folder.
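If you would rather drive inference from Python, the command above can be assembled programmatically. This is a hedged sketch: `build_infer_command` and `expected_outputs` are hypothetical helpers, not functions shipped with the repository, and the output naming (`<stem>_mask_vis.png`, `<stem>_overlay.png`) is an assumption inferred from the single example above.

```python
import sys
from pathlib import Path


def build_infer_command(image, output_dir,
                        cmd="weights/CMD-L",
                        seg_head="weights/TNBC_SegHead/best_model.pth",
                        pathology="weights/H0-mini/pytorch_model.bin"):
    """Assemble the infer.py call using only the flags shown above."""
    return [
        sys.executable, "infer.py",
        "--image", str(image),
        "--cmd", str(cmd),
        "--seg-head", str(seg_head),
        "--pathology", str(pathology),
        "--output-dir", str(output_dir),
    ]


def expected_outputs(image, output_dir):
    """Guess the output files from the image stem (assumed naming pattern)."""
    stem = Path(image).stem
    out = Path(output_dir)
    return [out / f"{stem}_mask_vis.png", out / f"{stem}_overlay.png"]


if __name__ == "__main__":
    command = build_infer_command("test.png", "outputs/tnbc")
    print(" ".join(command))
    # To actually run inference from the repository root:
    # import subprocess; subprocess.run(command, check=True)
```

After a run, comparing `expected_outputs(...)` against the files on disk is a cheap sanity check, provided the naming assumption holds for your inputs.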
Downstream (fine-tune head)
Edit configs/downstream.yaml (data.json → your manifest), then:
python train_downstream.py --config configs/downstream.yaml
Paths in YAML are relative to configs/ (e.g. ../weights/CMD-L).
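To see where a YAML path will end up, the resolution rule can be reproduced in a few lines. This is a minimal sketch rather than the repository's loader: `resolve_config_path` is a hypothetical helper, and passing absolute paths through unchanged is an assumption.

```python
from pathlib import Path


def resolve_config_path(value, config_file="configs/downstream.yaml"):
    """Resolve a YAML path relative to the directory holding the config file."""
    path = Path(value)
    if path.is_absolute():
        # Assumption: absolute paths are used as-is.
        return path
    return (Path(config_file).parent / path).resolve()


if __name__ == "__main__":
    # From the repository root, ../weights/CMD-L points at <repo>/weights/CMD-L.
    print(resolve_config_path("../weights/CMD-L"))
```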
Dataset JSON (short): the top level holds num_classes and data.train / data.val / data.test; each item is { "image_path", "mask_path" }. Paths may be absolute or relative to the JSON file's directory. Masks are single-channel images of class indices, with 255 meaning ignore. If val is empty or omitted, test is used for validation, so test must exist.
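The manifest format described above can be generated with a short script. The sketch below assumes that `data` is a top-level object containing the `train`, `val`, and `test` lists; `build_manifest` and `validation_split` are illustrative names, not part of the repository, and the file names in the usage example are placeholders.

```python
import json


def build_manifest(num_classes, train, test, val=None):
    """Build the manifest dict; each split is a list of (image, mask) pairs."""
    def items(pairs):
        return [{"image_path": str(i), "mask_path": str(m)} for i, m in pairs]

    data = {"train": items(train), "test": items(test)}
    if val:
        data["val"] = items(val)
    return {"num_classes": num_classes, "data": data}


def validation_split(manifest):
    """Mirror the fallback rule: use val if non-empty, otherwise test."""
    data = manifest["data"]
    val = data.get("val") or []
    if val:
        return val
    if not data.get("test"):
        raise ValueError("'val' is empty or omitted, so 'test' must exist")
    return data["test"]


if __name__ == "__main__":
    manifest = build_manifest(
        num_classes=2,
        train=[("images/a.png", "masks/a.png")],
        test=[("images/b.png", "masks/b.png")],
    )
    # Relative paths here are interpreted relative to the JSON file's directory.
    print(json.dumps(manifest, indent=2))
```

Write the printed JSON to a file and point the config at it; with no val split, `validation_split` returns the test items, matching the behaviour described above.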
Details: downstream/README.md.
Citation
@misc{chen2026vittokensmaskeddiffusionpretrained,
title={Beyond ViT Tokens: Masked-Diffusion Pretrained Convolutional Pathology Foundation Model for Cell-Level Dense Prediction},
author={Weiming Chen and Xitong Ling and Zhenyang Cai and Xidong Wang and Jiawen Li and Tian Guan and Benyou Wang and Yonghong He},
year={2026},
eprint={2605.08276},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.08276},
}