---
license: mit
library_name: pytorch
tags:
  - medical-image-segmentation
  - 3d-medical-imaging
  - self-supervised-learning
  - in-context-segmentation
  - pytorch
  - arxiv:2603.13660
pipeline_tag: image-segmentation
---

# MASS Base Checkpoint

This repository hosts `mass_base.pth`, the base checkpoint for **MASS: Learning
Generalizable 3D Medical Image Representations from Mask-Guided
Self-Supervision**.

MASS is a mask-guided self-supervised learning framework for 3D medical images.
The released checkpoint was trained on the data used in our paper, using the
Iris in-context segmentation architecture. Pretraining relies on automatically
generated class-agnostic masks and does **not** use expert ground-truth
annotations.

## What This Checkpoint Is For

`mass_base.pth` can be used with the official MASS codebase for:

- training-free in-context segmentation with reference image-mask examples;
- initialization for downstream segmentation finetuning;
- frozen-encoder or finetuned-encoder classification experiments.

This is a PyTorch checkpoint for the MASS/Iris architecture, not a standalone
Transformers model. Please use it with the code release:

- GitHub: https://github.com/Stanford-AIMI/MASS
- Project page: https://yhygao.github.io/MASS_page/
- Paper: https://arxiv.org/abs/2603.13660

## Download

Using the Hugging Face CLI:

```bash
hf download StanfordAIMI/MASS mass_base.pth --local-dir checkpoints
```

Using Python:

```python
from huggingface_hub import hf_hub_download

checkpoint_path = hf_hub_download("StanfordAIMI/MASS", "mass_base.pth")
```
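Before wiring the file into the MASS code, it can help to look at how the checkpoint is organized. The sketch below is only an illustration: the internal layout of `mass_base.pth` is not documented here, so the code assumes nothing beyond "a dictionary, possibly containing nested state dicts". `group_keys` and `inspect_checkpoint` are hypothetical helper names, not part of the MASS codebase.

```python
from collections import Counter


def group_keys(keys, depth=1):
    """Count keys by their first `depth` dot-separated components."""
    counts = Counter(".".join(str(k).split(".")[:depth]) for k in keys)
    return dict(counts)


def inspect_checkpoint(path):
    """Load a checkpoint on CPU and print a summary of its layout.

    Assumption: the file is a pickled dict. Its exact keys are unknown,
    so this only reports what it finds.
    """
    import torch  # imported lazily so group_keys works without PyTorch

    ckpt = torch.load(path, map_location="cpu")
    if not isinstance(ckpt, dict):
        print(f"checkpoint is a {type(ckpt).__name__}, not a dict")
        return ckpt

    nested = {k: v for k, v in ckpt.items() if isinstance(v, dict)}
    if not nested:
        # Looks like a flat state dict: parameter name -> tensor.
        print(group_keys(ckpt))
    for name, value in nested.items():
        print(f"{name}: {len(value)} entries, groups={group_keys(value)}")
    return ckpt
```

Calling `inspect_checkpoint(checkpoint_path)` with the path returned by `hf_hub_download` prints one summary line per nested dictionary. Recent PyTorch releases default `torch.load` to `weights_only=True`, which may reject checkpoints that store non-tensor objects; only relax that setting for files you trust.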

## Raw NIfTI In-Context Inference

```bash
python inference.py \
  --checkpoint checkpoints/mass_base.pth \
  --test-image /path/to/test_image.nii.gz \
  --reference-image /path/to/reference_image.nii.gz \
  --reference-mask /path/to/reference_mask.nii.gz \
  --output outputs/test_image_seg.nii.gz \
  --gpu 0 \
  --use-ema \
  --modality ct \
  --orientation RAS \
  --target-spacing 1.5 1.5 1.5 \
  --window-size 128 128 128 \
  --overlap 0.5
```
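The `--window-size` and `--overlap` flags indicate that large volumes are processed in overlapping tiles. As a rough illustration of what those numbers mean, the sketch below computes window start positions along one axis under a common convention: the stride is `window * (1 - overlap)`, and the last window is shifted back so it ends at the volume border. This is an assumption for illustration; `inference.py` may tile differently, and `window_starts` is a hypothetical helper.

```python
def window_starts(length, window, overlap):
    """Start indices of sliding windows along one axis.

    Assumed convention (not taken from the MASS code):
    stride = window * (1 - overlap), last window clamped to the border.
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    if length <= window:
        # The volume fits in a single (possibly padded) window.
        return [0]
    stride = max(1, int(round(window * (1 - overlap))))
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] != length - window:
        # Add a final window that ends exactly at the border.
        starts.append(length - window)
    return starts


# An axis of 300 voxels with 128-voxel windows and 50% overlap.
print(window_starts(300, 128, 0.5))  # [0, 64, 128, 172]
```

Under this convention, a higher overlap means more windows per axis and therefore more forward passes, in exchange for smoother predictions at tile borders.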

Please make sure the input NIfTI metadata is complete and reliable, especially
orientation and spacing. `mass_base.pth` was trained after standardizing images
to RAS orientation, so using `--orientation RAS` is recommended.
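A quick way to sanity-check a volume's metadata is to read the orientation and voxel spacing off its 4x4 affine. The sketch below does this with the standard library only. It is a simplified check that assumes the affine maps voxel indices to RAS+ world coordinates (the NIfTI convention) and is not strongly oblique; `describe_affine` is a hypothetical helper, not part of the MASS codebase.

```python
import math

# World-axis labels for the (negative, positive) direction of x, y, z in RAS+.
_AXIS_LABELS = (("L", "R"), ("P", "A"), ("I", "S"))


def describe_affine(affine):
    """Return (orientation_code, spacing) for a 4x4 voxel-to-world affine.

    Simplification: each voxel axis is assigned to the world axis with the
    largest absolute component, which is adequate for near-axis-aligned scans.
    """
    codes, spacing = [], []
    for j in range(3):
        column = [float(affine[i][j]) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in column))
        if norm == 0.0:
            raise ValueError(f"voxel axis {j} has zero length in the affine")
        dominant = max(range(3), key=lambda i: abs(column[i]))
        positive = 1 if column[dominant] > 0 else 0
        codes.append(_AXIS_LABELS[dominant][positive])
        spacing.append(norm)  # voxel size along this axis, in world units
    return "".join(codes), tuple(spacing)


# A 1.5 mm isotropic volume already stored in RAS orientation.
ras_affine = [
    [1.5, 0.0, 0.0, -90.0],
    [0.0, 1.5, 0.0, -126.0],
    [0.0, 0.0, 1.5, -72.0],
    [0.0, 0.0, 0.0, 1.0],
]
print(describe_affine(ras_affine))  # ('RAS', (1.5, 1.5, 1.5))
```

If nibabel is installed, the affine of a loaded image (`nibabel.load(path).affine`) can be passed in directly, and `nibabel.orientations.aff2axcodes` is the more robust way to obtain the orientation code. A spacing of exactly 1.0 on every axis or an identity affine is often a sign that the metadata was lost during conversion.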

## Downstream Segmentation Finetuning

```bash
python train.py \
  --config config/downstream/segmentation_finetune_example.yaml \
  --gpu 0 \
  --name segmentation_finetune_example \
  --override \
    finetuning.pretrained_checkpoint=checkpoints/mass_base.pth \
    data.train.data_root=/path/to/mass_h5 \
    data.val.data_root=/path/to/mass_h5 \
    data.train.datasets='[example_segmentation]' \
    data.val.datasets='[example_segmentation]'
```
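The keys passed to `--override` are dotted paths, which presumably address nested fields of the YAML config. The sketch below illustrates that mapping only; it is not the parser used by `train.py`, `expand_overrides` is a hypothetical helper, and values are kept as plain strings rather than being parsed.

```python
def expand_overrides(pairs):
    """Turn ["a.b=1", "a.c=2"] into {"a": {"b": "1", "c": "2"}}.

    Illustration of dotted-key overrides; values stay unparsed strings.
    """
    config = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Descend into the nested section, creating it if needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


overrides = [
    "finetuning.pretrained_checkpoint=checkpoints/mass_base.pth",
    "data.train.data_root=/path/to/mass_h5",
    "data.val.data_root=/path/to/mass_h5",
]
print(expand_overrides(overrides))
```

Read this way, `finetuning.pretrained_checkpoint` corresponds to a `pretrained_checkpoint` field under a `finetuning` section, and the `data.train.*` and `data.val.*` keys sit in separate `train` and `val` subsections under `data`.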

## Classification Linear Probing

```bash
python train.py \
  --config config/downstream/classification_linear_probe_example.yaml \
  --gpu 0 \
  --name classification_linear_probe_example \
  --override \
    classification.encoder.pretrained_checkpoint=checkpoints/mass_base.pth \
    classification.num_classes=2 \
    data.train.data_root=/path/to/classification_data \
    data.val.data_root=/path/to/classification_data \
    data.train.datasets='[example_classification]' \
    data.val.datasets='[example_classification]'
```

## Training Details

- Architecture: Iris in-context segmentation architecture.
- Pretraining objective: MASS mask-guided self-supervised learning.
- Supervision during pretraining: automatically generated class-agnostic masks.
- Expert annotations during pretraining: none.
- Modalities: 3D CT, MRI, and PET volumes used in the MASS paper.

The MASS objective is compatible with other in-context segmentation
architectures. The official codebase includes preprocessing and pretraining
utilities for training MASS on your own data.

## Limitations

- This checkpoint is intended for research use.
- It is not a medical device and should not be used for clinical decision-making.
- Raw NIfTI inference depends on reliable image metadata and preprocessing
  choices. Cases with missing or incorrect spacing/orientation metadata should be
  inspected carefully.
- Task-specific finetuning or validation is recommended before using the model on
  a new dataset or anatomy.

## Citation

```bibtex
@article{gao2026learning,
  title={Learning Generalizable 3D Medical Image Representations from Mask-Guided Self-Supervision},
  author={Gao, Yunhe and Zhang, Yabin and Wang, Chong and Liu, Jiaming and Varma, Maya and Delbrouck, Jean-Benoit and Chaudhari, Akshay and Langlotz, Curtis},
  journal={arXiv preprint arXiv:2603.13660},
  year={2026}
}
```