FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data
Paper: arXiv:2604.10297
[Paper (arXiv)] | [Code (GitHub)] | [Dataset]
ProCIR (0.8B) is a multi-view composed image retrieval (CIR) model trained on the FashionMV dataset and built on Qwen3.5-0.8B. It adopts a perception-reasoning decoupled dialogue architecture and uses image-text alignment to inject product knowledge, enabling effective product-level CIR across multiple views.
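At inference time, CIR ranks a gallery of product images by similarity between a composed query (reference image + modification text) and the gallery embeddings. A minimal sketch of that ranking step with toy vectors, assuming L2-normalized embeddings (the averaging fusion here is a simple placeholder, not ProCIR's actual architecture):

```python
import numpy as np

def compose_query(img_emb: np.ndarray, txt_emb: np.ndarray) -> np.ndarray:
    # Placeholder fusion: average the image and text embeddings.
    # ProCIR itself fuses the two modalities inside the language model.
    q = (img_emb + txt_emb) / 2.0
    return q / np.linalg.norm(q)

def rank_gallery(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    # Gallery rows are L2-normalized, so cosine similarity = dot product.
    sims = gallery @ query
    return np.argsort(-sims)  # gallery indices, best match first

# Toy example: 5 gallery items with 8-dim embeddings.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(5, 8))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

img, txt = rng.normal(size=8), rng.normal(size=8)
order = rank_gallery(compose_query(img, txt), gallery)
print(order)
```

Recall@K is then read off this ranking: a query counts as a hit if the ground-truth product index appears in `order[:K]`.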
| Dataset | R@5 (%) | R@10 (%) |
|---|---|---|
| DeepFashion | 89.2 | 94.9 |
| Fashion200K | 77.6 | 86.6 |
| FashionGen-val | 75.0 | 85.3 |
| Average | 80.6 | 88.9 |
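R@K (Recall at K) counts a query as correct when the ground-truth product appears among the top-K retrieved items. A minimal reference implementation with hypothetical toy rankings (not the official evaluation code):

```python
def recall_at_k(ranked_ids, gt_ids, k):
    """Percentage of queries whose ground-truth id is in the top-k ranking."""
    hits = sum(gt in ids[:k] for ids, gt in zip(ranked_ids, gt_ids))
    return 100.0 * hits / len(gt_ids)

# Toy rankings for 4 queries (hypothetical product ids).
ranked = [[3, 7, 1], [2, 5, 9], [8, 4, 6], [1, 0, 2]]
gt = [7, 9, 4, 5]
print(recall_at_k(ranked, gt, 1))  # 0.0
print(recall_at_k(ranked, gt, 2))  # 50.0
```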
See our GitHub repository for evaluation code and data preparation instructions.
```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration

# Load the ProCIR processor and weights from the Hugging Face Hub.
processor = AutoProcessor.from_pretrained("yuandaxia/ProCIR")
model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "yuandaxia/ProCIR", torch_dtype="bfloat16"
)
```
```bibtex
@article{yuan2026fashionmv,
  title={FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data},
  author={Yuan, Peng and Mei, Bingyin and Zhang, Hui},
  journal={arXiv preprint arXiv:2604.10297},
  year={2026}
}
```
Model weights are released under the same license as the base model (Qwen3.5).