	---
	license: apache-2.0
	language:
	- en
	library_name: transformers
	pipeline_tag: image-segmentation
	tags:
	- referring-segmentation
	- image-segmentation
	- video-segmentation
	- vision-language
	---

	# SetCon-8B

	SetCon-8B is the model checkpoint for SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction.

	[\[📂 GitHub\]](https://github.com/rookiexiong7/SetCon)
	[\[📄 Paper\]](https://arxiv.org/abs/2605.20110)

	## Usage

	Please use this checkpoint together with the official codebase:

	```bash
	git clone https://github.com/rookiexiong7/SetCon.git
	cd SetCon
	uv sync --extra latest
	source .venv/bin/activate
	```

	Single-image inference:

	```bash
	python demo.py \
	--image-path assets/room.jpg \
	--query-text "the target objects" \
	--model-path path/to/SetCon-8B
	```
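
The demo command above expects a local copy of the checkpoint at `--model-path`. Below is a minimal sketch of one way to fetch the weights and pass that path to the demo. It assumes the weights are hosted in the Hub repository `rookiexiong/SetCon-8B`, that `git-lfs` is installed, and that the commands run from inside the `SetCon` checkout. The names `fetch_checkpoint` and `run_demo` and the `checkpoints/SetCon-8B` directory are hypothetical, chosen for illustration, and are not part of the official codebase.

```bash
# Assumption: Hub repository ID of this checkpoint.
REPO_ID="rookiexiong/SetCon-8B"
# Hypothetical local directory for the weights; override by exporting MODEL_DIR.
MODEL_DIR="${MODEL_DIR:-checkpoints/SetCon-8B}"

# Clone the checkpoint repository (large files are pulled through git-lfs).
fetch_checkpoint() {
  git lfs install
  git clone "https://huggingface.co/${REPO_ID}" "${MODEL_DIR}"
}

# Run the single-image demo: $1 = image path, $2 = referring expression.
run_demo() {
  python demo.py \
    --image-path "$1" \
    --query-text "$2" \
    --model-path "${MODEL_DIR}"
}
```

With these definitions loaded in the shell, running `fetch_checkpoint` once and then `run_demo assets/room.jpg "the target objects"` issues the same single-image command against the downloaded directory.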

	## Intended Use

	This model is intended for research on open-ended referring image/video segmentation.

	## Limitations

	The model may produce incomplete or inaccurate masks for ambiguous expressions, small objects, crowded scenes, or out-of-domain visual concepts.

	## Citation
	If you find our work helpful for your research, please consider starring the repository ⭐ and citing the paper 📝:

	```bibtex
	@article{zhang2026setcon,
	title={SetCon: Towards Open-Ended Referring Segmentation via Set-Level Concept Prediction},
	author={Zhixiong Zhang and Yizhuo Li and Shuangrui Ding and Yuhang Zang and Shengyuan Ding and Long Xing and Yibin Wang and Qiaosheng Zhang and Jiaqi Wang},
	journal={arXiv preprint arXiv:2605.20110},
	year={2026}
	}
	```