Ferry1231 committed
Commit bdeda06 · 1 Parent(s): b9546c3

update model card

  1. README.md +251 -3
README.md CHANGED
@@ -1,3 +1,251 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ library_name: diffusers
+ tags:
+ - text-to-image
+ - image-to-image
+ - image-editing
+ - diffusers
+ - lora
+ - peft
+ - reinforcement-learning
+ - rubric-policy-optimization
+ - auto-rubric
+ base_model:
+ - black-forest-labs/FLUX.1-dev
+ - Qwen/Qwen-Image-Edit
+ ---
+
+ # ARR-RPO
+
+ [Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)
+
+ ## Model Description
+
+ ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:
+
+ - **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
+ - **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
+
+ ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
+
+ ## Model Details
+
+ | Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
+ | --- | --- | --- | --- | --- | --- |
+ | `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
+ | `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |
+
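The rank and alpha values in the table determine how strongly each adapter's low-rank update is scaled. As a small worked example, assuming the adapters use the standard LoRA scaling of alpha divided by rank (the card does not say whether a variant such as rank-stabilized scaling is enabled), both adapters end up with the same factor. The dictionary and function names below are illustrative only.

```python
# Rank and alpha values copied from the Model Details table.
ADAPTERS = {
    "ARR-FLUX.1-dev": {"rank": 16, "alpha": 32},
    "ARR-Qwen-Image-Edit": {"rank": 32, "alpha": 64},
}


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling factor (an assumption here): alpha / rank."""
    return alpha / rank


scalings = {name: lora_scaling(**cfg) for name, cfg in ADAPTERS.items()}
for name, scale in scalings.items():
    print(f"{name}: scaling = {scale}")
```

Under that assumption, doubling the rank for Qwen-Image-Edit adds capacity without changing the effective scale of the update.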
+ ### Adapter Files
+
+ ```text
+ ARR-RPO/
+   ARR-FLUX.1-dev/
+     adapter_config.json
+     adapter_model.safetensors
+   ARR-Qwen-Image-Edit/
+     adapter_config.json
+     adapter_model.safetensors
+ ```
+
+ ### FLUX Adapter Targets
+
+ The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:
+
+ ```text
+ attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
+ attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
+ ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
+ ```
+
+ ### Qwen-Image-Edit Adapter Targets
+
+ The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:
+
+ ```text
+ attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
+ attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
+ ```
+
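To see which layers such target lists would select, the sketch below mimics suffix-style matching of module names. It assumes, as we understand PEFT's handling of a list of target names, that a module is selected when its full name equals a target or ends with `.<target>`. The example module names are hypothetical, and because the card says "including", the real `adapter_config.json` may list more targets.

```python
from typing import Sequence

# Target suffixes copied from the two lists above.
ATTENTION_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
]
FEED_FORWARD_TARGETS = [
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]
FLUX_TARGETS = ATTENTION_TARGETS + FEED_FORWARD_TARGETS
QWEN_TARGETS = ATTENTION_TARGETS


def is_targeted(module_name: str, targets: Sequence[str]) -> bool:
    """Return True if the name equals a target or ends with '.<target>'."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Hypothetical module names, used only to exercise the matcher.
example_modules = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.norm1.linear",
]
for name in example_modules:
    print(name, is_targeted(name, FLUX_TARGETS), is_targeted(name, QWEN_TARGETS))
```

In this sketch the feed-forward projection is selected only by the FLUX list, which reflects the difference between the two adapters described above.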
+ ## Intended Use
+
+ These adapters are intended for research and development on:
+
+ - improving text-to-image generation with rubric-guided preference rewards;
+ - improving instruction-guided image editing while preserving source-image content;
+ - studying Auto-Rubric as an interpretable alternative to scalar reward models;
+ - reproducing and extending ARR-RPO experiments.
+
+ They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.
+
+ ## How ARR-RPO Works
+
+ ARR-RPO separates reward construction into explicit criteria and binary preference decisions:
+
+ ```text
+ visual preference examples
+ -> auto-generated rubrics
+ -> verified and structured rubric set
+ -> frozen VLM judge
+ -> pairwise preference decision
+ -> RPO binary reward
+ ```
+
+ For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.
+
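The last two stages of the pipeline above, turning a pairwise decision into binary rewards, can be sketched as follows. This is a minimal illustration, not the project's training code: `toy_judge` is a hypothetical stand-in for the frozen VLM judge, and candidates are represented by text descriptions rather than images so the example runs on its own.

```python
from typing import Callable, Tuple

POSITIVE_REWARD = 1.0   # preferred candidate, as stated above
NEGATIVE_REWARD = -0.1  # dispreferred candidate, as stated above


def pairwise_rewards(first_is_preferred: bool) -> Tuple[float, float]:
    """Map a pairwise decision to (reward_first, reward_second)."""
    if first_is_preferred:
        return POSITIVE_REWARD, NEGATIVE_REWARD
    return NEGATIVE_REWARD, POSITIVE_REWARD


def toy_judge(prompt: str, first: str, second: str) -> bool:
    """Hypothetical judge: prefers the candidate sharing more words with the prompt."""
    words = set(prompt.lower().split())
    score_first = len(words & set(first.lower().split()))
    score_second = len(words & set(second.lower().split()))
    return score_first >= score_second


def score_pair(
    prompt: str,
    first: str,
    second: str,
    judge: Callable[[str, str, str], bool],
) -> Tuple[float, float]:
    """Ask the judge for a preference, then assign the binary rewards."""
    return pairwise_rewards(judge(prompt, first, second))


rewards = score_pair(
    "two blue cups",
    "two blue cups on a table",
    "one red cup",
    toy_judge,
)
print(rewards)
```

Because the reward depends only on which candidate wins, any judge that returns a pairwise decision can be swapped in without changing the reward logic.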
+ ## Using The Models
+
+ Install a recent Diffusers/PEFT environment that supports the corresponding base model.
+
+ ### FLUX.1-dev LoRA
+
+ ```python
+ import torch
+ from diffusers import FluxPipeline
+
+ base_model = "black-forest-labs/FLUX.1-dev"
+ adapter_repo = "OpenEnvisionLab/ARR-RPO"
+
+ # Load the base pipeline in bfloat16.
+ pipe = FluxPipeline.from_pretrained(
+     base_model,
+     torch_dtype=torch.bfloat16,
+ )
+ # Attach the FLUX LoRA adapter from its subfolder in the adapter repo.
+ pipe.load_lora_weights(
+     adapter_repo,
+     subfolder="ARR-FLUX.1-dev",
+ )
+ pipe.to("cuda")
+
+ # Generate one image and save it.
+ image = pipe(
+     "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
+     guidance_scale=3.5,
+     num_inference_steps=30,
+ ).images[0]
+ image.save("arr_flux_example.png")
+ ```
+
+ ### Qwen-Image-Edit LoRA
+
+ ```python
+ import torch
+ from PIL import Image
+ from diffusers import QwenImageEditPipeline
+
+ base_model = "Qwen/Qwen-Image-Edit"
+ adapter_repo = "OpenEnvisionLab/ARR-RPO"
+
+ # Load the base editing pipeline in bfloat16.
+ pipe = QwenImageEditPipeline.from_pretrained(
+     base_model,
+     torch_dtype=torch.bfloat16,
+ )
+ # Attach the Qwen-Image-Edit LoRA adapter from its subfolder in the adapter repo.
+ pipe.load_lora_weights(
+     adapter_repo,
+     subfolder="ARR-Qwen-Image-Edit",
+ )
+ pipe.to("cuda")
+
+ # Edit the source image according to the instruction and save the result.
+ source = Image.open("source.png").convert("RGB")
+ image = pipe(
+     image=source,
+     prompt="Replace the sky with a sunset while preserving the building.",
+     num_inference_steps=30,
+ ).images[0]
+ image.save("arr_qwen_edit_example.png")
+ ```
+
+ If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
+
+ ## Effective Prompting
+
+ ### FLUX Text-to-Image
+
+ The FLUX adapter works best with prompts that clearly specify:
+
+ - required objects and attributes;
+ - object counts;
+ - spatial relationships;
+ - style or medium;
+ - constraints that should not be ignored.
+
+ Example:
+
+ ```text
+ A high-resolution product photo of two matte blue ceramic cups on a wooden table,
+ with the smaller cup to the left of the larger cup, soft window lighting.
+ ```
+
+ ### Qwen-Image-Edit
+
+ The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.
+
+ Example:
+
+ ```text
+ Change the shirt color to dark green while preserving the person's face, pose,
+ background, lighting, and all other clothing details.
+ ```
+
+ ## Training Details
+
+ ARR-RPO was trained with LoRA and pairwise online preference optimization.
+
+ | Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
+ | --- | --- | --- |
+ | Training method | RPO with ARR reward | RPO with ARR reward |
+ | Candidates per prompt | 2 | 2 |
+ | Reward for preferred candidate | `+1.0` | `+1.0` |
+ | Reward for dispreferred candidate | `-0.1` | `-0.1` |
+ | Learning rate | `5e-5` | `1e-5` |
+ | PPO clip range | `0.2` | `0.2` |
+ | KL coefficient | `0.01` | `0.02` |
+ | Sampling steps during training | 8 | 10 |
+ | Optimizer | AdamW | AdamW |
+ | Gradient clipping | `1.0` | `1.0` |
+ | LoRA rank | 16 | 32 |
+
+ The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
+
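The table lists a PPO clip range and a KL coefficient but does not spell out the objective. As a rough illustration of how those two hyperparameters typically enter a PPO-style loss, the sketch below computes a per-sample clipped surrogate with a KL penalty. It is an assumption-laden example, not the ARR-RPO objective: using the binary reward directly as the advantage, the scalar `kl_estimate` input, and the function names are all hypothetical.

```python
import math


def clipped_policy_loss(
    log_prob_new: float,
    log_prob_old: float,
    advantage: float,
    clip_range: float = 0.2,
) -> float:
    """Standard PPO clipped surrogate for one sample, returned as a loss."""
    ratio = math.exp(log_prob_new - log_prob_old)
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    # PPO maximizes the smaller of the two surrogates, so the loss negates it.
    return -min(ratio * advantage, clipped_ratio * advantage)


def ppo_style_loss(
    log_prob_new: float,
    log_prob_old: float,
    reward: float,
    kl_estimate: float,
    clip_range: float = 0.2,
    kl_coef: float = 0.01,
) -> float:
    """Clipped surrogate plus KL penalty; the reward stands in for the advantage."""
    policy_term = clipped_policy_loss(log_prob_new, log_prob_old, reward, clip_range)
    return policy_term + kl_coef * kl_estimate


# Preferred sample (+1.0) whose probability ratio rose to 1.5: clipped at 1.2.
preferred_loss = ppo_style_loss(math.log(1.5), 0.0, 1.0, kl_estimate=0.0)
# Dispreferred sample (-0.1) with ratio 0.5, using the Qwen-Image-Edit KL coefficient.
dispreferred_loss = ppo_style_loss(
    math.log(0.5), 0.0, -0.1, kl_estimate=0.5, kl_coef=0.02
)
print(preferred_loss, dispreferred_loss)
```

The clip limits how far a single update can push the probability ratio, while the KL term penalizes drift from the reference model; both serve to keep the adapter close to the base model's behavior.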
+ ## Evaluation Summary
+
+ ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.
+
+ Recommended evaluation axes include:
+
+ - text-to-image prompt adherence and compositional correctness;
+ - image-edit instruction fulfillment;
+ - source-image preservation for editing;
+ - artifact control and visual coherence;
+ - pairwise human or VLM preference accuracy;
+ - position-bias checks by swapping candidate order.
+
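The position-bias check in the list above can be sketched as follows: each pair is judged twice, once in each order, and the judge counts as consistent when the same underlying candidate wins both times. The judge interface (returning the index of the preferred candidate) and the two toy judges are hypothetical, chosen only to make the example self-contained.

```python
from typing import Callable, Sequence, Tuple

# A judge returns 0 if it prefers the first candidate shown, 1 for the second.
Judge = Callable[[str, str, str], int]


def position_consistency(
    judge: Judge, examples: Sequence[Tuple[str, str, str]]
) -> float:
    """Fraction of pairs where the same candidate wins in both presentation orders."""
    if not examples:
        return 0.0
    consistent = 0
    for prompt, cand_a, cand_b in examples:
        winner_original = (cand_a, cand_b)[judge(prompt, cand_a, cand_b)]
        winner_swapped = (cand_b, cand_a)[judge(prompt, cand_b, cand_a)]
        consistent += winner_original == winner_swapped
    return consistent / len(examples)


def always_first(prompt: str, first: str, second: str) -> int:
    """Toy judge with maximal position bias: always picks the first slot."""
    return 0


def longer_wins(prompt: str, first: str, second: str) -> int:
    """Toy judge with no position bias: picks the longer description."""
    return 0 if len(first) >= len(second) else 1


examples = [
    ("two blue cups", "two blue cups on a table", "one red cup"),
    ("a red car", "a car", "a shiny red car at dusk"),
]
biased_score = position_consistency(always_first, examples)
unbiased_score = position_consistency(longer_wins, examples)
print(biased_score, unbiased_score)
```

A consistency score well below 1.0 suggests the judge's decisions depend on candidate order, in which case averaging over both orders is one possible mitigation.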
+ ## Limitations
+
+ - These are LoRA adapters and require the corresponding base model weights.
+ - Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
+ - The ARR reward signal depends on the chosen VLM judge and rubric quality.
+ - Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
+ - The model card does not guarantee safety filtering; users should apply appropriate content and policy filters for deployment.
+
+ ## License
+
+ The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:
+
+ - `black-forest-labs/FLUX.1-dev`
+ - `Qwen/Qwen-Image-Edit`
+
+ ## Citation
+
+ If you use these adapters, please cite the ARR-RPO project:
+
+ ```bibtex
+ @misc{visionautorubric2026,
+   title = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
+   author = {Anonymous},
+   year = {2026},
+   note = {arXiv coming soon}
+ }
+ ```
+
+ ## Contact
+
+ For questions, issues, or updates, please use the project repository or the Hugging Face community tab.