ProCIR — Multi-View Product-Level Composed Image Retrieval

[Paper (arXiv)] | [Code (GitHub)] | [Dataset]

Model Description

ProCIR (0.8B) is a multi-view composed image retrieval model trained on the FashionMV dataset, based on Qwen3.5-0.8B. It adopts a perception-reasoning decoupled dialogue architecture and leverages image-text alignment to inject product knowledge, enabling effective multi-view product-level CIR.

Performance

| Dataset        | R@5  | R@10 |
|----------------|------|------|
| DeepFashion    | 89.2 | 94.9 |
| Fashion200K    | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| Average        | 80.6 | 88.9 |
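R@K (recall at K) is the fraction of queries whose ground-truth product appears among the top-K retrieved candidates. A minimal, self-contained sketch of the metric (function and variable names are illustrative, not taken from the ProCIR codebase):

```python
def recall_at_k(ranked_ids, target_ids, k):
    """Fraction of queries whose target appears in the top-k ranking.

    ranked_ids: list of ranked candidate-ID lists, one per query.
    target_ids: list of ground-truth IDs, one per query.
    """
    hits = sum(target in ranking[:k]
               for ranking, target in zip(ranked_ids, target_ids))
    return hits / len(target_ids)

# Toy example: 2 of 3 queries hit within the top 2 candidates.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "g"]
print(round(recall_at_k(rankings, targets, 2), 3))  # → 0.667
```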

Usage

See our GitHub repository for evaluation code and data preparation instructions.

```python
import torch
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load the processor (image + text preprocessing) and model weights
# from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR",
    torch_dtype=torch.bfloat16,
)
```

Citation

@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  year={2026}
}

License

Model weights are released under the same license as the base model (Qwen3.5).
