---
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: image-segmentation
tags:
  - referring-segmentation
  - image-segmentation
  - video-segmentation
  - vision-language
---

# SetCon-8B

SetCon-8B is the model checkpoint for **SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction**.

[\[📂 GitHub\]](https://github.com/rookiexiong7/SetCon)
[\[📄 Paper\]](https://arxiv.org/abs/2605.20110)

## Usage

Please use this checkpoint together with the official codebase:

```bash
git clone https://github.com/rookiexiong7/SetCon.git
cd SetCon
uv sync --extra latest
source .venv/bin/activate
```

Single-image inference:

```bash
python demo.py \
  --image-path assets/room.jpg \
  --query-text "the target objects" \
  --model-path path/to/SetCon-8B
```
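
To run the same query over several images, the command above can be wrapped in a small script. The following is a minimal sketch, not part of the official codebase: it assumes `demo.py` accepts exactly the three flags shown above and is launched from the repository root. The helper names `build_demo_command` and `run_batch`, and the `*.jpg` filter, are hypothetical and chosen only for illustration.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_demo_command(image_path, query_text, model_path):
    """Return the argv list for one demo.py run, using only the flags shown above."""
    return [
        sys.executable, "demo.py",
        "--image-path", str(image_path),
        "--query-text", query_text,
        "--model-path", str(model_path),
    ]


def run_batch(image_dir, query_text, model_path, dry_run=True):
    """Build (and optionally run) one demo.py command per .jpg file in image_dir."""
    commands = [
        build_demo_command(path, query_text, model_path)
        for path in sorted(Path(image_dir).glob("*.jpg"))
    ]
    for cmd in commands:
        if dry_run:
            # Only print the shell-quoted command; nothing is executed.
            print(shlex.join(cmd))
        else:
            # Assumes the current directory is the SetCon repository root.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_batch("assets", "the target objects", "path/to/SetCon-8B")` prints one command per image without executing anything; pass `dry_run=False` to actually launch each run. Each invocation reloads the checkpoint, so this is convenient rather than efficient.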

## Intended Use

This model is intended for research on open-ended referring image/video segmentation.

## Limitations

The model may produce incomplete or inaccurate masks for ambiguous expressions, small objects, crowded scenes, or out-of-domain visual concepts.

## Citation

If you find our work helpful for your research, please consider giving the repository a star ⭐ and citing the paper 📝:

```bibtex
@article{zhang2026setcon,
  title={SetCon: towards open-ended referring segmentation via set-level concept prediction},
  author={Zhixiong Zhang and Yizhuo Li and Shuangrui Ding and Yuhang Zang and Shengyuan Ding and Long Xing and Yibin Wang and Qiaosheng Zhang and Jiaqi Wang},
  journal={arXiv preprint arXiv:2605.20110},
  year={2026}
}
```