---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: image-segmentation
tags:
- referring-segmentation
- image-segmentation
- video-segmentation
- vision-language
---

# SetCon-8B

SetCon-8B is the model checkpoint for **SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction**.

[\[📂 GitHub\]](https://github.com/rookiexiong7/SetCon) [\[📄 Paper\]](https://arxiv.org/abs/2605.20110)

## Usage

Please use this checkpoint together with the official codebase:

```bash
git clone https://github.com/rookiexiong7/SetCon.git
cd SetCon
uv sync --extra latest
source .venv/bin/activate
```

Single-image inference:

```bash
python demo.py \
  --image-path assets/room.jpg \
  --query-text "the target objects" \
  --model-path path/to/SetCon-8B
```

## Intended Use

This model is intended for research on open-ended referring image/video segmentation.

## Limitations

The model may produce incomplete or inaccurate masks for ambiguous expressions, small objects, crowded scenes, or out-of-domain visual concepts.

## Citation

If you find our work helpful for your research, please consider giving the repository a star ⭐ and citing the paper 📝

```bibtex
@article{zhang2026setcon,
  title={SetCon: towards open-ended referring segmentation via set-level concept prediction},
  author={Zhixiong Zhang and Yizhuo Li and Shuangrui Ding and Yuhang Zang and Shengyuan Ding and Long Xing and Yibin Wang and Qiaosheng Zhang and Jiaqi Wang},
  journal={arXiv preprint arXiv:2605.20110},
  year={2026}
}
```