Add pipeline tag, library name, and citation to model card
Hi! I'm Niels, part of the community science team at Hugging Face.
This PR improves the metadata and documentation for Visual-ERM:
- Added `pipeline_tag: image-text-to-text` and `library_name: transformers` to the YAML metadata to improve discoverability and enable the "Use in Transformers" button.
- Updated the **Citation** section with the full BibTeX details from the paper.
- Linked the model to its corresponding [Hugging Face paper page](https://huggingface.co/papers/2603.13224).
These changes help researchers find and credit your work correctly on the Hub.
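For reference, the updated front matter is plain YAML, so the new keys can be sanity-checked before merging. A minimal sketch (assuming PyYAML is installed; the snippet just parses the metadata block from this PR):

```python
import yaml  # PyYAML

# Updated README front matter as proposed in this PR
front_matter = """\
base_model:
- Qwen/Qwen3-VL-8B-Instruct
datasets:
- internlm/VC-RewardBench
license: apache-2.0
library_name: transformers
pipeline_tag: image-text-to-text
"""

meta = yaml.safe_load(front_matter)
print(meta["pipeline_tag"])   # image-text-to-text
print(meta["library_name"])   # transformers
```

The `pipeline_tag` and `library_name` keys are what the Hub reads to place the model under the image-text-to-text filter and show the "Use in Transformers" button.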
README.md
CHANGED

````diff
@@ -1,9 +1,11 @@
 ---
-license: apache-2.0
-datasets:
-- internlm/VC-RewardBench
 base_model:
 - Qwen/Qwen3-VL-8B-Instruct
+datasets:
+- internlm/VC-RewardBench
+license: apache-2.0
+library_name: transformers
+pipeline_tag: image-text-to-text
 ---
 
 <p align="center">
@@ -12,7 +14,7 @@ base_model:
 
 # Visual-ERM
 
-Visual-ERM is a **multimodal generative reward model** for **vision-to-code** tasks.
+Visual-ERM is a **multimodal generative reward model** for **vision-to-code** tasks, introduced in the paper [Visual-ERM: Reward Modeling for Visual Equivalence](https://huggingface.co/papers/2603.13224).
 It evaluates outputs directly in the **rendered visual space** and produces **fine-grained**, **interpretable**, and **task-agnostic** discrepancy feedback for structured visual reconstruction.
 
 <p align="center">
@@ -147,7 +149,12 @@ Visual-ERM is intended for:
 If you find this model useful, please consider citing:
 
 ```bibtex
-
+@article{liu2026visualerm,
+  title={Visual-ERM: Reward Modeling for Visual Equivalence},
+  author={Liu, Ziyu and Ding, Shengyuan and Fang, Xinyu and Dai, Xuanlang and Yang, Penghui and Liang, Jianze and Wang, Jiaqi and Chen, Kai and Lin, Dahua and Zang, Yuhang},
+  journal={arXiv preprint arXiv:2603.13224},
+  year={2026}
+}
 ```
 
 ## Contact
````
|