Commit ce0fdd0 by Ferry1231 · Parent: 2e92dce

update model card

Files changed (1): README.md (+31 −23)

README.md (updated):
 
---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---

# ARR-RPO

[Project Page](#) | [Code](https://github.com/OpenEnvision/Vision-Auto-Rubric) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

## Model Description

ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:

- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred and dispreferred candidates receive binary rewards (`1.0` and `0.1`, respectively; see the training table below). The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
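
Concretely, the per-pair reward assignment reduces to a few lines. The sketch below is illustrative, not the released code: `judge_prefers_first` is a hypothetical callable standing in for the frozen, rubric-conditioned VLM judge.

```python
# Hedged sketch of ARR's pairwise reward assignment (not the released code).
POSITIVE_REWARD = 1.0  # preferred candidate
NEGATIVE_REWARD = 0.1  # dispreferred candidate


def arr_pairwise_rewards(prompt, image_a, image_b, judge_prefers_first):
    """Assign binary rewards to two candidates sampled for the same prompt.

    `judge_prefers_first(prompt, image_a, image_b)` is a hypothetical wrapper
    around the frozen VLM judge conditioned on auto-generated rubrics.
    """
    if judge_prefers_first(prompt, image_a, image_b):
        return POSITIVE_REWARD, NEGATIVE_REWARD
    return NEGATIVE_REWARD, POSITIVE_REWARD
```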

## Model Details

| Adapter               | Base model                     | Task          | LoRA rank | LoRA alpha | Framework        |
| --------------------- | ------------------------------ | ------------- | --------- | ---------- | ---------------- |
| `ARR-FLUX.1-dev`      | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16        | 32         | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit`         | Image editing | 32        | 64         | Diffusers + PEFT |
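
Both adapters use the conventional alpha = 2 × rank scaling. A minimal PEFT sketch of the matching configurations follows; the `target_modules` are an assumption (typical attention projections), not something this card specifies.

```python
from peft import LoraConfig

# Assumed target modules; the released adapters may target a different set.
ATTN_PROJECTIONS = ["to_q", "to_k", "to_v", "to_out.0"]

flux_lora = LoraConfig(r=16, lora_alpha=32, target_modules=ATTN_PROJECTIONS)
qwen_edit_lora = LoraConfig(r=32, lora_alpha=64, target_modules=ATTN_PROJECTIONS)
```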
 
### Adapter Files
 
 
If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
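
One version-agnostic option is to let `DiffusionPipeline` resolve the concrete class from the base model's config. This is a sketch under that assumption, not this repository's official loading snippet:

```python
import torch
from diffusers import DiffusionPipeline

# DiffusionPipeline.from_pretrained dispatches to whatever pipeline class the
# base model's model_index.json declares, so it survives class renames across
# Diffusers versions.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

# Same adapter subfolder as described in this card.
pipe.load_lora_weights("OpenEnvisionLab/ARR-RPO", subfolder="ARR-Qwen-Image-Edit")
```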
 
## Qualitative Examples

Qualitative examples for both released adapters are provided in the project materials. If you want to host rendered images directly in this repository, upload them with Hugging Face Xet storage rather than regular Git binary tracking.
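
For example, images can be pushed through `huggingface_hub`, which routes binary payloads through the Hub's storage backend (Xet on migrated repos) instead of the local Git history. The file paths below are placeholders:

```python
from huggingface_hub import upload_file

# Hypothetical local file and repo path, shown only as placeholders.
upload_file(
    path_or_fileobj="examples/arr_flux_sample.png",
    path_in_repo="examples/arr_flux_sample.png",
    repo_id="OpenEnvisionLab/ARR-RPO",
)
```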

## Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.
 
| Hyperparameter                 | FLUX.1-dev          | Qwen-Image-Edit     |
| ------------------------------ | ------------------- | ------------------- |
| Training method                | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt          | 2                   | 2                   |
| Positive reward                | `1.0`               | `1.0`               |
| Negative reward                | `0.1`               | `0.1`               |
| Learning rate                  | `5e-5`              | `1e-5`              |
| PPO clip range                 | `0.2`               | `0.2`               |
| KL coefficient                 | `0.01`              | `0.02`              |
| Sampling steps during training | 8                   | 10                  |
| Optimizer                      | AdamW               | AdamW               |
| Gradient clipping              | `1.0`               | `1.0`               |
| LoRA rank                      | 16                  | 32                  |
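
The clip range and KL coefficient enter a standard PPO-style objective. The following is a hedged sketch of such an update, assuming per-sample log-probabilities of the chosen denoising actions are available and that the KL term is taken against a frozen reference model; this card specifies neither detail.

```python
import torch


def rpo_step_loss(logp_new, logp_old, logp_ref, advantage,
                  clip_range=0.2, kl_coef=0.01):
    """PPO-style clipped surrogate plus a KL penalty toward a frozen
    reference model, using the FLUX.1-dev values from the table above."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    policy_loss = -torch.minimum(unclipped, clipped).mean()
    kl_penalty = kl_coef * (logp_new - logp_ref).mean()  # first-order KL estimate
    return policy_loss + kl_penalty
```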

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
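
To illustrate what rubric conditioning might look like, here is a hypothetical judge prompt; the rubric texts and reply format are invented for illustration and are not taken from the ARR pipeline:

```python
# Invented example rubrics; in ARR these are auto-generated per prompt.
rubrics = [
    "Every object mentioned in the prompt appears in the image.",
    "Spatial relations between objects match the prompt.",
    "The image is free of obvious rendering artifacts.",
]

judge_prompt = (
    "Judge the two candidate images against each rubric:\n"
    + "\n".join(f"- {r}" for r in rubrics)
    + "\nReply with '1' if the first image is better overall, else '2'."
)
```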
 
 
The model card metadata declares `apache-2.0`. Users must also comply with the license terms of the base models.

If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{open2026auto,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
}
```