Safetensors · qwen3_vl

nielsr (HF Staff) committed on
Commit a2b507e · verified · 1 Parent(s): 32d0ea5

Add pipeline tag, library name, and citation to model card


Hi! I'm Niels, part of the community science team at Hugging Face.

This PR improves the metadata and documentation for Visual-ERM:
- Added `pipeline_tag: image-text-to-text` and `library_name: transformers` to the YAML metadata to improve discoverability and enable the "Use in Transformers" button.
- Updated the **Citation** section with the full BibTeX details from the paper.
- Linked the model to its corresponding [Hugging Face paper page](https://huggingface.co/papers/2603.13224).

These changes help researchers find and credit your work correctly on the Hub.
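As a sanity check, the reordered YAML front matter can be round-tripped through a parser to confirm the new keys are well-formed. A minimal sketch (keys and values taken from the diff in this PR; PyYAML is assumed to be installed):

```python
# Parse the model card's YAML front matter as it reads after this PR
# and confirm the newly added metadata keys are picked up.
import yaml

front_matter = """\
base_model:
- Qwen/Qwen3-VL-8B-Instruct
datasets:
- internlm/VC-RewardBench
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
"""

meta = yaml.safe_load(front_matter)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

The `pipeline_tag` and `library_name` keys are what the Hub reads to place the model under the image-text-to-text task filter and to generate the "Use in Transformers" snippet.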

Files changed (1)
  1. README.md +12 -5
README.md CHANGED

````diff
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- internlm/VC-RewardBench
 base_model:
 - Qwen/Qwen3-VL-8B-Instruct
+datasets:
+- internlm/VC-RewardBench
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 <p align="center">
@@ -12,7 +14,7 @@ base_model:
 
 # Visual-ERM
 
-Visual-ERM is a **multimodal generative reward model** for **vision-to-code** tasks.
+Visual-ERM is a **multimodal generative reward model** for **vision-to-code** tasks, introduced in the paper [Visual-ERM: Reward Modeling for Visual Equivalence](https://huggingface.co/papers/2603.13224).
 It evaluates outputs directly in the **rendered visual space** and produces **fine-grained**, **interpretable**, and **task-agnostic** discrepancy feedback for structured visual reconstruction.
 
 <p align="center">
@@ -147,7 +149,12 @@ Visual-ERM is intended for:
 If you find this model useful, please consider citing:
 
 ```bibtex
-TBD
+@article{liu2026visualerm,
+  title={Visual-ERM: Reward Modeling for Visual Equivalence},
+  author={Liu, Ziyu and Ding, Shengyuan and Fang, Xinyu and Dai, Xuanlang and Yang, Penghui and Liang, Jianze and Wang, Jiaqi and Chen, Kai and Lin, Dahua and Zang, Yuhang},
+  journal={arXiv preprint arXiv:2603.13224},
+  year={2026}
+}
 ```
 
 ## Contact
````