Lorangan committed (verified)
Commit ae18106 · 1 parent: 358179b

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -34,3 +34,9 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+assets/attention_visualization.png filter=lfs diff=lfs merge=lfs -text
+assets/logo.png filter=lfs diff=lfs merge=lfs -text
+assets/merbench_category_breakdown.png filter=lfs diff=lfs merge=lfs -text
+assets/omnigen2_rl_results.png filter=lfs diff=lfs merge=lfs -text
+assets/performance_table.png filter=lfs diff=lfs merge=lfs -text
+assets/rl_training_curves.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,178 @@
---
license: apache-2.0
base_model: Qwen/Qwen3-VL-8B-Instruct
tags:
- reward-model
- image-editing
- reinforcement-learning
- spatial-reasoning
- vision-language-model
- icml2026
datasets:
- SpatialReward/SpatialReward-Train
pipeline_tag: image-text-to-text
language:
- en
---

<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/logo.png" width="65%">
</p>

<p align="center">
<a href="https://lorangan-ddup.github.io/SpatialReward/"><img src="https://img.shields.io/badge/Project%20Page-SpatialReward-yellow" alt="project page"></a>
<a href="https://arxiv.org/abs/2602.07458"><img src="https://img.shields.io/badge/arXiv-2602.07458-b31b1b.svg" alt="arxiv"></a>
<a href="https://github.com/lorangan-ddup/SpatialReward"><img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="github"></a>
<a href="https://huggingface.co/SpatialReward/SpatialReward-8B"><img src="https://img.shields.io/badge/Model-SpatialReward--8B-🤗-yellow" alt="model"></a>
<a href="https://huggingface.co/datasets/SpatialReward/MER-Bench"><img src="https://img.shields.io/badge/MER--Bench-🤗-yellow" alt="dataset"></a>
<a href="https://huggingface.co/datasets/SpatialReward/SpatialReward-Train"><img src="https://img.shields.io/badge/Training--Data-🤗-yellow" alt="dataset"></a>
</p>

<h4 align="center">
<p>
<a href=#-news>News</a> |
<a href=#-introduction>Introduction</a> |
<a href=#-quick-start>Quick Start</a> |
<a href=#-benchmark-evaluation>Benchmark Evaluation</a> |
<a href=#-citing-us>Citation</a>
</p>
</h4>

**SpatialReward** is a state-of-the-art reward model for instruction-guided image editing that addresses the critical "Attention Collapse" problem through explicit spatial reasoning. By anchoring semantic judgments to predicted edit regions via bounding boxes, SpatialReward serves as both a more accurate evaluator and a more stable RL training signal.

<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/attention_visualization.png" width="95%">
<br>
<em>Visualizing the Attention Collapse problem vs. SpatialReward's spatial grounding.</em>
</p>

## 🔥 News

- **2026-05-05**: 🎉 We have open-sourced the **SpatialReward-8B** model weights, the **[MER-Bench](https://huggingface.co/datasets/SpatialReward/MER-Bench)** benchmark, and **[SpatialReward-Train](https://huggingface.co/datasets/SpatialReward/SpatialReward-Train)** (260k spatially-aware training examples)!
- **2026-05-01**: 🎉 **SpatialReward** has been accepted to **ICML 2026**!
- **2026-02-12**: We have released the **inference code**, **reward server**, and **training configurations**!
- **2026-02-07**: The paper is available on [arXiv](https://arxiv.org/abs/2602.07458).

## 📌 Introduction

Online Reinforcement Learning (RL) holds immense potential for advancing instruction-guided image editing, but its progress has been severely hindered by a critical perception gap we term **"Attention Collapse"**: existing reward models frequently neglect cross-image comparisons and fail to capture fine-grained editing details, leading to inaccurate evaluations and unstable RL training.

To overcome this, we propose **SpatialReward**, which:
- **Introduces MER-Bench**: a new benchmark featuring multi-edit scenarios and expert human annotations for measuring reward-model quality.
- **Enforces spatial reasoning**: predicts bounding boxes for edit regions and anchors semantic judgments to pixel-level evidence.

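To make the second point concrete, a spatially grounded judgment can be reduced to a scalar reward downstream. The sketch below is purely illustrative: the JSON schema and the helper are hypothetical, not SpatialReward's actual output format. It extracts a predicted edit box and a score from a structured response and maps the score to [0, 1].

```python
import json
import re

# Hypothetical structured output from a spatially grounded reward model:
# a predicted edit-region bounding box plus a semantic score.
# The schema is illustrative, not SpatialReward's actual format.
raw = 'The edit region is localized. {"bbox": [120, 80, 360, 300], "score": 4}'

def parse_spatial_reward(text, max_score=5.0):
    """Extract the trailing JSON object and map the score to [0, 1]."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None, 0.0  # unparseable output gets zero reward
    obj = json.loads(match.group(0))
    reward = float(obj["score"]) / max_score
    return obj.get("bbox"), reward

bbox, reward = parse_spatial_reward(raw)
print(bbox, reward)  # [120, 80, 360, 300] 0.8
```

The bounding box can then be used for pixel-level checks while the normalized score feeds the RL trainer.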
<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/performance_table.png" width="95%">
<br>
<em>Comprehensive benchmark results. SpatialReward achieves SOTA performance, outperforming GPT-4.1 and GPT-5 on MER-Bench.</em>
</p>

<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/merbench_category_breakdown.png" width="70%">
<br>
<em>MER-Bench performance breakdown by editing category.</em>
</p>

## 🚀 Quick Start

### Installation

```bash
git clone https://github.com/lorangan-ddup/SpatialReward.git
cd SpatialReward

conda create -n spatialreward python=3.11 -y
conda activate spatialreward

pip install torch==2.8.0 torchvision --extra-index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

### Reward Server

Start the reward server, then query it from Python:

```bash
# Start reward server
cd example/reward/server
bash start_servers.sh
bash start_proxy.sh
```

```python
# Query from client
from example.reward.client.reward_client_edit import RewardClient

client = RewardClient(proxy_host="127.0.0.1", proxy_port=23456)
scores, rewards, reasoning, meta_data = client.evaluate(
    input_images=[input_img],
    output_image=[output_img],
    meta_datas=[{"instruction": "Remove the dog"}]
)
```

## 📊 Benchmark Evaluation

Model and data are loaded directly from Hugging Face by default.

```bash
# MER-Bench
bash eval/MERBench/run.sh

# MMRB2
bash eval/MMRB2/run.sh

# EditReward-Bench
bash eval/EditReward-Bench/run.sh
```
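Reward-model benchmarks of this kind are commonly scored by pairwise preference accuracy: the model "wins" a pair when it scores the human-preferred edit above the rejected one. A minimal sketch of that metric (illustrative, not the official evaluation harness):

```python
# Pairwise preference accuracy: fraction of pairs where the reward
# model scores the human-chosen edit strictly higher than the rejected one.
def preference_accuracy(pairs):
    """pairs: list of (score_chosen, score_rejected) tuples."""
    if not pairs:
        return 0.0
    wins = sum(1 for chosen, rejected in pairs if chosen > rejected)
    return wins / len(pairs)

pairs = [(0.9, 0.4), (0.7, 0.8), (0.6, 0.2), (0.5, 0.5)]
print(preference_accuracy(pairs))  # 2 of 4 pairs ranked correctly -> 0.5
```

Note that ties count as losses here; whether the official scripts break ties differently is not specified in this card.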

## 📚 Datasets

| Dataset | Description | Link |
|---|---|---|
| **SpatialReward-Train** | 260k spatially-aware training examples (SFT + RL) | [🤗 Hub](https://huggingface.co/datasets/SpatialReward/SpatialReward-Train) |
| **MER-Bench** | MultiEditReward-Bench evaluation benchmark | [🤗 Hub](https://huggingface.co/datasets/SpatialReward/MER-Bench) |

## 🎯 Training

### SFT (LLaMA-Factory)

```bash
llamafactory-cli train example/SpatialReward-train/sft/qwen3vl_lora_spatial_reward.yaml
```
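For orientation, a LoRA SFT config in LLaMA-Factory's YAML format typically looks like the sketch below. This is not the shipped `qwen3vl_lora_spatial_reward.yaml` (that file lives in the repo); the dataset name, template, and hyperparameters are placeholders.

```yaml
### model (values are illustrative; see the repo's YAML for the real config)
model_name_or_path: Qwen/Qwen3-VL-8B-Instruct

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 16
lora_target: all

### dataset (names are placeholders)
dataset: spatialreward_sft
template: qwen3_vl
cutoff_len: 4096

### output
output_dir: saves/spatialreward-8b-lora

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 1.0
bf16: true
```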

### RL (ms-swift / GRPO)

```bash
# Replace the ORM (outcome reward model) plugin first
cp example/SpatialReward-train/rl/orm.py <ms-swift>/swift/plugin/orm.py
bash example/SpatialReward-train/rl/run_mater.sh
```
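GRPO turns the raw rewards for a group of candidate edits of the same instruction into group-relative advantages by normalizing against the group's mean and standard deviation. A minimal sketch of that step (illustrative, not the ms-swift implementation):

```python
# Group-relative advantage as used by GRPO-style trainers (sketch):
# each candidate's reward is normalized against its group's statistics.
# eps guards against division by zero when all rewards in a group tie.
def group_relative_advantages(rewards, eps=1e-6):
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidate edits for one instruction, scored by the reward model.
adv = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
print([round(a, 3) for a in adv])
```

Advantages sum to zero within each group, so only the relative quality of candidates drives the policy update.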

### RL Results on OmniGen2

<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/omnigen2_rl_results.png" width="85%">
<br>
<em>SpatialReward delivers +0.90 on GEdit-EN Overall, doubling GPT-4.1's gain (+0.45).</em>
</p>

<p align="center">
<img src="https://huggingface.co/SpatialReward/SpatialReward-8B/resolve/main/assets/rl_training_curves.png" width="95%">
<br>
<em>Stable RL training dynamics with SpatialReward as the reward signal.</em>
</p>

## 🙏 Acknowledgements

We thank [EditScore](https://github.com/VectorSpaceLab/EditScore) and [EditReward](https://github.com/TIGER-AI-Lab/EditReward) for valuable references.

## ❤️ Citing Us

```bibtex
@article{long2026spatialreward,
  title={SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning},
  author={Long, Yancheng and Yang, Yankai and Wei, Hongyang and Chen, Wei and Zhang, Tianke and Fan, Haonan and Liu, Changyi and Jiang, Kaiyu and Chen, Jiankang and Tang, Kaiyu and Wen, Bin and Yang, Fan and Gao, Tingting and Li, Han and Yang, Shuo},
  journal={arXiv preprint arXiv:2602.07458},
  year={2026}
}
```

## 📄 License

Apache 2.0
assets/attention_visualization.png ADDED

Git LFS Details

  • SHA256: 7763dd21848eeffe28edf1108e0ee656547963f59cec80ba2e74d0d6366c8c1c
  • Pointer size: 132 Bytes
  • Size of remote file: 3.07 MB
assets/logo.png ADDED

Git LFS Details

  • SHA256: 13aff2747d0365066ec0aaa96395a81c7b832d7ff7e79b0993117a70bcb15e6d
  • Pointer size: 131 Bytes
  • Size of remote file: 253 kB
assets/merbench_category_breakdown.png ADDED

Git LFS Details

  • SHA256: 5da56563b23c4d4bffb2afb831bd0b70de14bb0799cba5ded6137288b302751f
  • Pointer size: 131 Bytes
  • Size of remote file: 193 kB
assets/omnigen2_rl_results.png ADDED

Git LFS Details

  • SHA256: 9453c010bad88d51cceb13e645668bf8345a320a909755bc206979a4a797e39a
  • Pointer size: 131 Bytes
  • Size of remote file: 210 kB
assets/performance_table.png ADDED

Git LFS Details

  • SHA256: c33752eb639013ecb58a2775d1a8989a4206b2714f86850a4ef5e467a563087f
  • Pointer size: 131 Bytes
  • Size of remote file: 493 kB
assets/rl_training_curves.png ADDED

Git LFS Details

  • SHA256: aedadaf876c7c7f2af59ce5cd553a28f8ad8415993acd7a50478c166d6418f16
  • Pointer size: 131 Bytes
  • Size of remote file: 202 kB