Commit ce0fdd0 by Ferry1231 · Parent: 2e92dce

update model card

Files changed (1): README.md (+31 −23)

README.md (updated):
 
---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---

# ARR-RPO

[Project Page](#) | [Code](https://github.com/OpenEnvision/Vision-Auto-Rubric) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

## Model Description

ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:

- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred and dispreferred candidates receive binary rewards (`1.0` and `0.1`, respectively; see the training table below). The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
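
Concretely, the per-pair reward assignment reduces to a few lines. The sketch below is illustrative, not the released code: `judge_prefers_first` is a hypothetical callable standing in for the frozen, rubric-conditioned VLM judge.

```python
# Hedged sketch of ARR's pairwise reward assignment (not the released code).
POSITIVE_REWARD = 1.0  # preferred candidate
NEGATIVE_REWARD = 0.1  # dispreferred candidate


def arr_pairwise_rewards(prompt, image_a, image_b, judge_prefers_first):
    """Assign binary rewards to two candidates sampled for the same prompt.

    `judge_prefers_first(prompt, image_a, image_b)` is a hypothetical wrapper
    around the frozen VLM judge conditioned on auto-generated rubrics.
    """
    if judge_prefers_first(prompt, image_a, image_b):
        return POSITIVE_REWARD, NEGATIVE_REWARD
    return NEGATIVE_REWARD, POSITIVE_REWARD
```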

## Model Details

| Adapter               | Base model                     | Task          | LoRA rank | LoRA alpha | Framework        |
| --------------------- | ------------------------------ | ------------- | --------- | ---------- | ---------------- |
| `ARR-FLUX.1-dev`      | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16        | 32         | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit`         | Image editing | 32        | 64         | Diffusers + PEFT |
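
Both adapters use the conventional alpha = 2 × rank scaling. A minimal PEFT sketch of the matching configurations follows; the `target_modules` are an assumption (typical attention projections), not something this card specifies.

```python
from peft import LoraConfig

# Assumed target modules; the released adapters may target a different set.
ATTN_PROJECTIONS = ["to_q", "to_k", "to_v", "to_out.0"]

flux_lora = LoraConfig(r=16, lora_alpha=32, target_modules=ATTN_PROJECTIONS)
qwen_edit_lora = LoraConfig(r=32, lora_alpha=64, target_modules=ATTN_PROJECTIONS)
```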
 
### Adapter Files
 
 
If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
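
One version-agnostic option is to let `DiffusionPipeline` resolve the concrete class from the base model's config. This is a sketch under that assumption, not this repository's official loading snippet:

```python
import torch
from diffusers import DiffusionPipeline

# DiffusionPipeline.from_pretrained dispatches to whatever pipeline class the
# base model's model_index.json declares, so it survives class renames across
# Diffusers versions.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

# Same adapter subfolder as described in this card.
pipe.load_lora_weights("OpenEnvisionLab/ARR-RPO", subfolder="ARR-Qwen-Image-Edit")
```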
 
## Qualitative Examples

Qualitative examples for both released adapters are provided in the project materials. If you want to host rendered images directly in this repository, upload them with Hugging Face Xet storage rather than regular Git binary tracking.
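
For example, images can be pushed through `huggingface_hub`, which routes binary payloads through the Hub's storage backend (Xet on migrated repos) instead of the local Git history. The file paths below are placeholders:

```python
from huggingface_hub import upload_file

# Hypothetical local file and repo path, shown only as placeholders.
upload_file(
    path_or_fileobj="examples/arr_flux_sample.png",
    path_in_repo="examples/arr_flux_sample.png",
    repo_id="OpenEnvisionLab/ARR-RPO",
)
```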

## Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.
 
| Hyperparameter                 | FLUX.1-dev          | Qwen-Image-Edit     |
| ------------------------------ | ------------------- | ------------------- |
| Training method                | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt          | 2                   | 2                   |
| Positive reward                | `1.0`               | `1.0`               |
| Negative reward                | `0.1`               | `0.1`               |
| Learning rate                  | `5e-5`              | `1e-5`              |
| PPO clip range                 | `0.2`               | `0.2`               |
| KL coefficient                 | `0.01`              | `0.02`              |
| Sampling steps during training | 8                   | 10                  |
| Optimizer                      | AdamW               | AdamW               |
| Gradient clipping              | `1.0`               | `1.0`               |
| LoRA rank                      | 16                  | 32                  |
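
The clip range and KL coefficient enter a standard PPO-style objective. The following is a hedged sketch of such an update, assuming per-sample log-probabilities of the chosen denoising actions are available and that the KL term is taken against a frozen reference model; this card specifies neither detail.

```python
import torch


def rpo_step_loss(logp_new, logp_old, logp_ref, advantage,
                  clip_range=0.2, kl_coef=0.01):
    """PPO-style clipped surrogate plus a KL penalty toward a frozen
    reference model, using the FLUX.1-dev values from the table above."""
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantage
    policy_loss = -torch.minimum(unclipped, clipped).mean()
    kl_penalty = kl_coef * (logp_new - logp_ref).mean()  # first-order KL estimate
    return policy_loss + kl_penalty
```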

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
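
To illustrate what rubric conditioning might look like, here is a hypothetical judge prompt; the rubric texts and reply format are invented for illustration and are not taken from the ARR pipeline:

```python
# Invented example rubrics; in ARR these are auto-generated per prompt.
rubrics = [
    "Every object mentioned in the prompt appears in the image.",
    "Spatial relations between objects match the prompt.",
    "The image is free of obvious rendering artifacts.",
]

judge_prompt = (
    "Judge the two candidate images against each rubric:\n"
    + "\n".join(f"- {r}" for r in rubrics)
    + "\nReply with '1' if the first image is better overall, else '2'."
)
```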
 
 
The model card metadata declares `apache-2.0`. Users must also comply with the license terms of the base models.

If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{open2026auto,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
}
```