+---
+license: apache-2.0
+library_name: diffusers
+tags:
+  - text-to-image
+  - image-to-image
+  - image-editing
+  - diffusers
+  - lora
+  - peft
+  - reinforcement-learning
+  - rubric-policy-optimization
+  - auto-rubric
+base_model:
+  - black-forest-labs/FLUX.1-dev
+  - Qwen/Qwen-Image-Edit
+---
+# ARR-RPO
+[Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)
+## Model Description
+ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:
+- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
+- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
+ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
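A rubric-conditioned pairwise judge could be prompted roughly as follows. This is a minimal sketch. The `Rubric` type, the prompt wording, and the single-letter reply format are hypothetical illustrations, not the ARR prompt. The call to the frozen VLM itself is omitted.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Rubric:
    name: str         # short criterion label (hypothetical field)
    description: str  # what the judge should check for this criterion


def build_judge_prompt(request: str, rubrics: List[Rubric]) -> str:
    """Assemble a pairwise judging prompt conditioned on explicit rubrics (hypothetical wording)."""
    lines = [
        "Compare two images, A and B, produced for the same request.",
        f"Request: {request}",
        "Evaluate both images against these criteria:",
    ]
    for index, rubric in enumerate(rubrics, start=1):
        lines.append(f"{index}. {rubric.name}: {rubric.description}")
    lines.append("Reply with a single letter, A or B, naming the preferred image.")
    return "\n".join(lines)


def parse_choice(reply: str) -> int:
    """Map a strict single-letter reply to 0 (A preferred) or 1 (B preferred)."""
    match = re.fullmatch(r"\s*([AB])\s*\.?\s*", reply, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unparseable judge reply: {reply!r}")
    return 0 if match.group(1).upper() == "A" else 1


rubrics = [
    Rubric("Prompt faithfulness", "All requested objects, counts, and attributes are present."),
    Rubric("Visual quality", "No artifacts, distortions, or incoherent regions."),
]
print(build_judge_prompt("Two blue ceramic cups on a wooden table.", rubrics))
```

The parser is deliberately strict. A reply such as "Answer: B" raises an error instead of being silently misread.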
+## Model Details
+| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
+| --- | --- | --- | --- | --- | --- |
+| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
+| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |
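Both adapters set alpha to twice the rank. PEFT's default scaling is `lora_alpha / r`, so with `use_rslora` disabled the low-rank update is multiplied by 2.0 in both cases. Whether rsLoRA is enabled is an assumption to confirm in each `adapter_config.json`. The sketch below computes that factor and the number of trainable parameters LoRA adds to one linear layer. The 3072 x 3072 layer size is hypothetical.

```python
import math


def lora_scaling(rank: int, alpha: float, use_rslora: bool = False) -> float:
    """Factor applied to the low-rank update B @ A: alpha / r, or alpha / sqrt(r) with rsLoRA."""
    return alpha / math.sqrt(rank) if use_rslora else alpha / rank


def lora_extra_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# (rank, alpha) pairs from the Model Details table.
ADAPTERS = {"ARR-FLUX.1-dev": (16, 32), "ARR-Qwen-Image-Edit": (32, 64)}

for name, (rank, alpha) in ADAPTERS.items():
    # 3072 x 3072 is a hypothetical layer size chosen only for illustration.
    print(name, lora_scaling(rank, alpha), lora_extra_params(3072, 3072, rank))
```

The loop prints a scaling of 2.0 for both adapters. The Qwen-Image-Edit adapter adds twice as many parameters per layer because its rank is twice as large.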
+### Adapter Files
+```text
+ARR-RPO/
+  ARR-FLUX.1-dev/
+    adapter_config.json
+    adapter_model.safetensors
+  ARR-Qwen-Image-Edit/
+    adapter_config.json
+    adapter_model.safetensors
+```
+### FLUX Adapter Targets
+The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:
+```text
+attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
+attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
+ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
+```
+### Qwen-Image-Edit Adapter Targets
+The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:
+```text
+attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
+attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
+```
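The target names above are module-name suffixes. The sketch below shows suffix matching similar to how PEFT matches a list of target names: a module matches if its name equals a target or ends with `.` plus the target. The module names are hypothetical examples, not a dump of either transformer. The sketch also shows that, among the modules listed here, only the FLUX adapter covers feed-forward layers.

```python
from typing import List

FLUX_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]
QWEN_EDIT_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
]


def matches_target(module_name: str, targets: List[str]) -> bool:
    """Suffix match: the module name equals a target or ends with '.' + target."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Hypothetical module names, used only to illustrate the matching rule.
example_modules = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff.net.2",
    "transformer_blocks.0.norm1.linear",
]
for name in example_modules:
    print(name, matches_target(name, FLUX_TARGETS), matches_target(name, QWEN_EDIT_TARGETS))

# Listed modules that only the FLUX adapter targets (the feed-forward layers).
print(sorted(set(FLUX_TARGETS) - set(QWEN_EDIT_TARGETS)))
```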
+## Intended Use
+These adapters are intended for research and development on:
+- improving text-to-image generation with rubric-guided preference rewards;
+- improving instruction-guided image editing while preserving source-image content;
+- studying Auto-Rubric as an interpretable alternative to scalar reward models;
+- reproducing and extending ARR-RPO experiments.
+They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.
+## How ARR-RPO Works
+ARR-RPO separates reward construction into explicit criteria and binary preference decisions:
+```text
+visual preference examples
+  -> auto-generated rubrics
+  -> verified and structured rubric set
+  -> frozen VLM judge
+  -> pairwise preference decision
+  -> RPO binary reward
+```
+For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.
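The reward assignment can be sketched as a small function. This is a minimal sketch that assumes a hypothetical `judge(prompt, first, second)` callable returning `0` when the first candidate is preferred and `1` otherwise. The default reward values follow the sentence above.

```python
from typing import Any, Callable, Tuple

# Hypothetical judge interface: returns 0 if `first` is preferred, 1 if `second` is preferred.
Judge = Callable[[str, Any, Any], int]


def rpo_pair_rewards(
    judge: Judge,
    prompt: str,
    cand_a: Any,
    cand_b: Any,
    preferred_reward: float = 1.0,
    dispreferred_reward: float = -0.1,
) -> Tuple[float, float]:
    """Return (reward_a, reward_b) from a single pairwise judge decision."""
    choice = judge(prompt, cand_a, cand_b)
    if choice not in (0, 1):
        raise ValueError(f"judge must return 0 or 1, got {choice!r}")
    if choice == 0:
        return preferred_reward, dispreferred_reward
    return dispreferred_reward, preferred_reward


def toy_judge(prompt: str, first: float, second: float) -> int:
    """Toy stand-in for the VLM judge: prefers the candidate with the higher score."""
    return 0 if first >= second else 1


print(rpo_pair_rewards(toy_judge, "two cups", 0.3, 0.8))  # -> (-0.1, 1.0)
```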
+## Using The Models
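+Install recent versions of Diffusers and PEFT that support the corresponding base model. `black-forest-labs/FLUX.1-dev` is a gated repository, so accept its license on Hugging Face and log in before downloading it.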
+### FLUX.1-dev LoRA
+```python
+import torch
+from diffusers import FluxPipeline
+base_model = "black-forest-labs/FLUX.1-dev"
+adapter_repo = "OpenEnvisionLab/ARR-RPO"
+pipe = FluxPipeline.from_pretrained(
+    base_model,
+    torch_dtype=torch.bfloat16,
+)
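+# Load the ARR LoRA weights from their subfolder in the adapter repository.
+pipe.load_lora_weights(
+    adapter_repo,
+    subfolder="ARR-FLUX.1-dev",
+)
+pipe.to("cuda")
+# A fixed seed makes results repeatable for the same settings and hardware.
+image = pipe(
+    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
+    guidance_scale=3.5,
+    num_inference_steps=30,
+    generator=torch.Generator("cuda").manual_seed(0),
+).images[0]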
+image.save("arr_flux_example.png")
+```
+### Qwen-Image-Edit LoRA
+```python
+import torch
+from PIL import Image
+from diffusers import QwenImageEditPipeline
+base_model = "Qwen/Qwen-Image-Edit"
+adapter_repo = "OpenEnvisionLab/ARR-RPO"
+pipe = QwenImageEditPipeline.from_pretrained(
+    base_model,
+    torch_dtype=torch.bfloat16,
+)
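+# Load the ARR LoRA weights from their subfolder in the adapter repository.
+pipe.load_lora_weights(
+    adapter_repo,
+    subfolder="ARR-Qwen-Image-Edit",
+)
+pipe.to("cuda")
+# The source image to edit; convert to RGB in case it has an alpha channel.
+source = Image.open("source.png").convert("RGB")
+image = pipe(
+    image=source,
+    prompt="Replace the sky with a sunset while preserving the building.",
+    num_inference_steps=30,
+    generator=torch.Generator("cuda").manual_seed(0),
+).images[0]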
+image.save("arr_qwen_edit_example.png")
+```
+If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
+## Effective Prompting
+### FLUX Text-to-Image
+The FLUX adapter works best with prompts that clearly specify:
+- required objects and attributes;
+- object counts;
+- spatial relationships;
+- style or medium;
+- constraints that should not be ignored.
+Example:
+```text
+A high-resolution product photo of two matte blue ceramic cups on a wooden table,
+with the smaller cup to the left of the larger cup, soft window lighting.
+```
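One way to keep these elements explicit is to assemble prompts from named fields. This is a hypothetical helper: `PromptSpec` and `build_flux_prompt` are illustrative names, not part of ARR-RPO or Diffusers. It reproduces the example prompt above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PromptSpec:
    subject: str                 # required objects with counts and attributes
    layout: str = ""             # spatial relationships between objects
    style: str = ""              # style, medium, or lighting
    constraints: List[str] = field(default_factory=list)  # details that must not be dropped


def build_flux_prompt(spec: PromptSpec) -> str:
    """Join the non-empty fields into one comma-separated prompt ending with a period."""
    parts = [spec.subject, spec.layout, spec.style, *spec.constraints]
    text = ", ".join(part.strip() for part in parts if part.strip())
    return text if text.endswith(".") else text + "."


spec = PromptSpec(
    subject="A high-resolution product photo of two matte blue ceramic cups on a wooden table",
    layout="with the smaller cup to the left of the larger cup",
    style="soft window lighting",
)
print(build_flux_prompt(spec))
```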
+### Qwen-Image-Edit
+The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.
+Example:
+```text
+Change the shirt color to dark green while preserving the person's face, pose,
+background, lighting, and all other clothing details.
+```
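Edit instructions in this style can be generated from the requested change plus an explicit preservation list. This is a hypothetical helper (`build_edit_instruction` is an illustrative name) that reproduces the example above.

```python
from typing import List


def build_edit_instruction(change: str, preserve: List[str]) -> str:
    """State the requested change first, then list what must stay unchanged."""
    change = change.strip().rstrip(".")
    if not preserve:
        return change + "."
    if len(preserve) == 1:
        kept = preserve[0]
    elif len(preserve) == 2:
        kept = f"{preserve[0]} and {preserve[1]}"
    else:
        kept = ", ".join(preserve[:-1]) + ", and " + preserve[-1]
    return f"{change} while preserving {kept}."


print(build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting", "all other clothing details"],
))
```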
+## Training Details
+ARR-RPO was trained with LoRA and pairwise online preference optimization.
+| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
+| --- | --- | --- |
+| Training method | RPO with ARR reward | RPO with ARR reward |
+| Candidates per prompt | 2 | 2 |
+| Reward for preferred candidate | `+1.0` | `+1.0` |
+| Reward for dispreferred candidate | `-0.1` | `-0.1` |
+| Learning rate | `5e-5` | `1e-5` |
+| PPO clip range | `0.2` | `0.2` |
+| KL coefficient | `0.01` | `0.02` |
+| Sampling steps during training | 8 | 10 |
+| Optimizer | AdamW | AdamW |
+| Gradient clipping | `1.0` | `1.0` |
+| LoRA rank | 16 | 32 |
+The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
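The table lists a PPO clip range and a KL coefficient, which suggests a clipped surrogate objective with a KL penalty toward a reference policy. This card does not give the exact RPO loss. The following is therefore a sketch of a standard PPO-style clipped loss under that assumption. It works on per-sample log-probabilities as plain floats, uses the non-negative `exp(d) - d - 1` KL estimator, and takes precomputed advantages. How RPO derives advantages from the binary rewards is not specified here.

```python
import math
from typing import Sequence


def clipped_policy_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    logp_ref: Sequence[float],
    advantages: Sequence[float],
    clip_range: float = 0.2,
    kl_coef: float = 0.01,
) -> float:
    """Mean PPO-style clipped surrogate loss plus a KL penalty toward a frozen reference policy."""
    n = len(advantages)
    if n == 0 or not (len(logp_new) == len(logp_old) == len(logp_ref) == n):
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for new, old, ref, adv in zip(logp_new, logp_old, logp_ref, advantages):
        ratio = math.exp(new - old)                   # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
        surrogate = min(ratio * adv, clipped * adv)   # pessimistic (clipped) objective
        d = ref - new
        kl = math.exp(d) - d - 1.0                    # non-negative estimate of KL(new || ref)
        total += -surrogate + kl_coef * kl
    return total / n
```

The defaults match the FLUX.1-dev column. For the Qwen-Image-Edit settings, pass `kl_coef=0.02`.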
+## Evaluation Summary
+ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.
+Recommended evaluation axes include:
+- text-to-image prompt adherence and compositional correctness;
+- image-edit instruction fulfillment;
+- source-image preservation for editing;
+- artifact control and visual coherence;
+- pairwise human or VLM preference accuracy;
+- position-bias checks by swapping candidate order.
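A position-bias check can be automated by judging each pair in both orders and counting disagreements. This is a minimal sketch that assumes a hypothetical `judge(prompt, first, second)` callable returning `0` when the first candidate is preferred and `1` otherwise. A fully order-invariant judge yields a rate of 0.0.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Judge = Callable[[str, Any, Any], int]  # hypothetical: 0 -> first preferred, 1 -> second preferred


def order_consistent_choice(judge: Judge, prompt: str, cand_a: Any, cand_b: Any) -> Optional[int]:
    """Judge (a, b) and (b, a); return 0/1 for a/b if both orders agree, else None."""
    forward = judge(prompt, cand_a, cand_b)
    swapped = judge(prompt, cand_b, cand_a)
    swapped_as_forward = 1 - swapped  # translate the swapped verdict back to (a, b) order
    return forward if forward == swapped_as_forward else None


def position_inconsistency_rate(judge: Judge, pairs: Sequence[Tuple[str, Any, Any]]) -> float:
    """Fraction of (prompt, a, b) pairs whose verdict flips when candidate order is swapped."""
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(order_consistent_choice(judge, p, a, b) is None for p, a, b in pairs)
    return flips / len(pairs)


def first_biased(prompt: str, first: float, second: float) -> int:
    return 0  # toy judge that always prefers whichever candidate is shown first


def score_judge(prompt: str, first: float, second: float) -> int:
    return 0 if first >= second else 1  # toy judge that compares scores, ignoring order


pairs = [("p1", 0.2, 0.7), ("p2", 0.9, 0.1)]
print(position_inconsistency_rate(first_biased, pairs))  # -> 1.0
print(position_inconsistency_rate(score_judge, pairs))   # -> 0.0
```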
+## Limitations
+- These are LoRA adapters and require the corresponding base model weights.
+- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
+- The ARR reward signal depends on the chosen VLM judge and rubric quality.
+- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
+- These adapters do not include safety filtering; apply appropriate content and policy filters before deployment.
+## License
+The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:
+- `black-forest-labs/FLUX.1-dev`
+- `Qwen/Qwen-Image-Edit`
+## Citation
+If you use these adapters, please cite the ARR-RPO project:
+```bibtex
+@misc{visionautorubric2026,
+  title        = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
+  author       = {Anonymous},
+  year         = {2026},
+  note         = {arXiv coming soon}
+}
+```
+## Contact
+For questions, issues, or updates, please use the project repository or Hugging Face community tab.