# PLUM-13B Pretrained Checkpoint (0-shot)
This README documents the Hugging Face checkpoint `wjdghks950/plum-13b-pretrained` for PLUM, the segmentation-enabled LMM introduced in *PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding* (NeurIPS 2025 Spotlight).

Paper: [arXiv:2505.20759](https://arxiv.org/abs/2505.20759) · Dataset: PARTONOMY-Core
## Overview

PLUM is a segmentation-enabled LMM designed for part-level visual understanding. It avoids `<SEG>`-token distribution shift via span tagging and conditions on its previous predictions through a feedback loop.

This checkpoint corresponds to a pretrained (zero-shot) PLUM-13B model used as the base for later PARTONOMY fine-tuning.
## Setup

From the repo root:

```bash
cd src/models/PLUM
conda env create -f environment.yml
conda activate partonomy
pip install flash-attn --no-build-isolation
```
If you plan to train or re-evaluate on the full datasets, follow the dataset and weight setup in the main `README.md` (PARTONOMY datasets + SAM ViT-H weights).
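For the SAM ViT-H weights, the official Segment Anything checkpoint can be downloaded directly; the `./checkpoints` destination below is an assumption, so check the main `README.md` for the expected location:

```bash
# Download the official SAM ViT-H checkpoint (sam_vit_h_4b8939.pth, ~2.6 GB)
# NOTE: destination directory is a placeholder; place it where the main README expects.
wget -P ./checkpoints https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
```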
## Inference (Chat)

Use `chat.py` with the Hugging Face checkpoint:

```bash
cd src/models/PLUM
CUDA_VISIBLE_DEVICES=0 python chat.py \
  --version wjdghks950/plum-13b-pretrained \
  --precision bf16 \
  --use_bidir_bio \
  --use_feedback_loop
```
You will be prompted for:

- A text prompt (e.g., "What parts do you see in this airplane?")
- An image path
Notes:

- `--use_bidir_bio` and `--use_feedback_loop` match the pretrained configuration used in this codebase.
- You can enable quantized inference with `--load_in_8bit` or `--load_in_4bit` if needed.
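For example, a 4-bit quantized run can be sketched as below; whether `--load_in_4bit` fully overrides the `--precision` setting is an assumption, so consult `chat.py` if the two flags conflict on your setup:

```bash
cd src/models/PLUM
CUDA_VISIBLE_DEVICES=0 python chat.py \
  --version wjdghks950/plum-13b-pretrained \
  --precision bf16 \
  --load_in_4bit \
  --use_bidir_bio \
  --use_feedback_loop
```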
## Evaluation on PARTONOMY

Edit `src/models/PLUM/scripts/run_validate_partonomy.sh` and set `MODEL_CKPT_PATH` to the HF checkpoint or a local merged model path.

Then run:

```bash
cd src/models/PLUM
chmod +x scripts/run_validate_partonomy.sh
./scripts/run_validate_partonomy.sh
```
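For reference, the assignment inside `run_validate_partonomy.sh` might look like the sketch below; the local path is a placeholder, and the exact variable usage should be checked against the script itself:

```bash
# Hugging Face checkpoint id...
MODEL_CKPT_PATH="wjdghks950/plum-13b-pretrained"
# ...or a local merged model directory (placeholder path):
# MODEL_CKPT_PATH="/path/to/merged/plum-13b"
```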
## Training Data (Pretraining)
The pretrained PLUM-13B checkpoint is trained using the same multi-source recipe described in the main repository README, including:
- Semantic segmentation: ADE20K, COCO-Stuff, Mapillary, PACO-LVIS, PASCAL-Part, COCO Images, PartImageNet
- Referring segmentation: refCOCO, refCOCO+, refCOCOg, refCLEF
- VQA: LLaVA-Instruct-150K
- Reasoning segmentation: ReasonSeg
See the main `README.md` for download links and the exact on-disk layout.
## Citation

If you use this checkpoint, please cite:

```bibtex
@inproceedings{blume-kim-2025-partonomy,
  title={{PARTONOMY}: Large Multimodal Models with Part-Level Visual Understanding},
  author={Ansel Blume and Jeonghwan Kim and Hyeonjeong Ha and Elen Chatikyan and Xiaomeng Jin and Khanh Duy Nguyen and Nanyun Peng and Kai-Wei Chang and Derek Hoiem and Heng Ji},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=yjLew3Nd7z}
}
```
## License

Copyright 2025

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.