---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
- image-to-image
- image-editing
- diffusers
- lora
- peft
- reinforcement-learning
- rubric-policy-optimization
- auto-rubric
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---

# ARR-RPO

[Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

## Model Description

ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:

- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.

## Model Details

| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |

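
Assuming both adapters use PEFT's default LoRA scaling (that is, `use_rslora` is not enabled in `adapter_config.json`), the low-rank update $BA$ is multiplied by $\alpha / r$, so the rank/alpha pairs above give both adapters the same effective scale:

$$
\Delta W = \frac{\alpha}{r} B A, \qquad \frac{32}{16} = \frac{64}{32} = 2
$$

Under that assumption, the Qwen-Image-Edit adapter doubles the rank (more capacity) without changing the update scale.
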
### Adapter Files

```text
ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors
```

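
After downloading an adapter folder (for example with `huggingface_hub.snapshot_download`), you can check that its `adapter_config.json` agrees with the rank and alpha in the Model Details table. This is a minimal sketch that assumes the standard PEFT config keys `r` and `lora_alpha`; the helper names and the `EXPECTED` mapping are hypothetical, with values copied from the table.

```python
import json
from pathlib import Path

# Rank/alpha values copied from the Model Details table of this card.
EXPECTED = {
    "ARR-FLUX.1-dev": {"r": 16, "lora_alpha": 32},
    "ARR-Qwen-Image-Edit": {"r": 32, "lora_alpha": 64},
}


def load_adapter_config(adapter_dir: Path) -> dict:
    """Read the PEFT adapter_config.json stored in one adapter folder."""
    return json.loads((adapter_dir / "adapter_config.json").read_text())


def compare_with_table(adapter_name: str, config: dict) -> dict:
    """Return {key: (found, expected)} for every field that differs from the table."""
    expected = EXPECTED[adapter_name]
    return {
        key: (config.get(key), value)
        for key, value in expected.items()
        if config.get(key) != value
    }
```

For a local copy laid out as shown above, `compare_with_table("ARR-FLUX.1-dev", load_adapter_config(Path("ARR-RPO/ARR-FLUX.1-dev")))` returns an empty dict when the file matches the table.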
### FLUX Adapter Targets

The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
```

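
PEFT matches each entry of a list-valued `target_modules` against a module's full name either exactly or as a suffix that begins after a `.`, so a short name like `attn.to_q` applies to every block that contains it. The sketch below reimplements that rule in plain Python using only the targets listed above; the example module names are illustrative, in the style of Diffusers' `FluxTransformer2DModel`, and are not read from the actual model.

```python
from typing import Sequence

# Target-module names listed for the FLUX adapter in this card.
FLUX_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]


def is_targeted(module_name: str, targets: Sequence[str]) -> bool:
    """PEFT-style list matching: exact name, or a suffix that starts after a '.'."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Illustrative module names in the style of Diffusers' FluxTransformer2DModel.
example_names = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff_context.net.2",
    "transformer_blocks.0.norm1.linear",
    "single_transformer_blocks.0.attn.to_k",
    "single_transformer_blocks.0.proj_mlp",
]
matched = [name for name in example_names if is_targeted(name, FLUX_TARGETS)]
```

Under this rule, the attention names would also match the single-stream blocks' `attn.to_q`/`to_k`/`to_v` if they follow the same naming, which matters when counting how many layers the adapter actually modifies.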
### Qwen-Image-Edit Adapter Targets

The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
```

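
Comparing the two target lists makes the difference between the adapters explicit: both adapt the same attention projections, and only the FLUX adapter also adapts feed-forward layers. This check uses only the lists shown in this card; since both are described as partial ("including"), the actual configs may contain more modules.

```python
# Target-module lists as shown in this card.
FLUX_TARGETS = {
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
}
QWEN_TARGETS = {
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
}

shared = sorted(FLUX_TARGETS & QWEN_TARGETS)     # attention projections adapted by both
flux_only = sorted(FLUX_TARGETS - QWEN_TARGETS)  # feed-forward layers adapted only for FLUX
qwen_only = sorted(QWEN_TARGETS - FLUX_TARGETS)  # modules adapted only for Qwen-Image-Edit

print("FLUX only:", flux_only)
print("Qwen only:", qwen_only)
```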
## Intended Use

These adapters are intended for research and development on:

- improving text-to-image generation with rubric-guided preference rewards;
- improving instruction-guided image editing while preserving source-image content;
- studying Auto-Rubric as an interpretable alternative to scalar reward models;
- reproducing and extending ARR-RPO experiments.

They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.

## How ARR-RPO Works

ARR-RPO separates reward construction into explicit criteria and binary preference decisions:

```text
visual preference examples
  -> auto-generated rubrics
  -> verified and structured rubric set
  -> frozen VLM judge
  -> pairwise preference decision
  -> RPO binary reward
```

For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.

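
The last three stages of that pipeline can be sketched as below. Only the `+1.0`/`-0.1` reward values come from this card; the judge prompt template, the rubric wording, and the single-letter verdict format are hypothetical stand-ins, since the actual ARR judge prompt is not documented here, and the call to the frozen VLM itself is omitted.

```python
import re
from typing import Sequence, Tuple

POSITIVE_REWARD = 1.0   # preferred candidate
NEGATIVE_REWARD = -0.1  # dispreferred candidate


def build_judge_prompt(instruction: str, rubrics: Sequence[str]) -> str:
    """Hypothetical pairwise judge prompt that lists the rubric criteria explicitly."""
    criteria = "\n".join(f"{i}. {rubric}" for i, rubric in enumerate(rubrics, start=1))
    return (
        f"Instruction: {instruction}\n"
        f"Rubrics:\n{criteria}\n"
        "Compare Image A and Image B against every rubric, "
        "then answer with a single letter: A or B."
    )


def parse_verdict(judge_output: str) -> str:
    """Extract the first standalone capital 'A' or 'B' from the judge's reply."""
    match = re.search(r"\b([AB])\b", judge_output)
    if match is None:
        raise ValueError(f"Unparseable judge output: {judge_output!r}")
    return match.group(1)


def assign_rewards(verdict: str) -> Tuple[float, float]:
    """Map a pairwise verdict to (reward for A, reward for B)."""
    if verdict == "A":
        return POSITIVE_REWARD, NEGATIVE_REWARD
    if verdict == "B":
        return NEGATIVE_REWARD, POSITIVE_REWARD
    raise ValueError(f"Verdict must be 'A' or 'B', got {verdict!r}")


prompt = build_judge_prompt(
    "Two matte blue ceramic cups on a wooden table.",
    ["Exactly two cups are visible.", "Both cups are matte blue ceramic."],
)
rewards = assign_rewards(parse_verdict("B"))  # (-0.1, 1.0)
```

The regex parser is deliberately simple; a production judge would more likely constrain the output format (for example, JSON) than rely on pattern matching.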
## Using the Models

Install a recent Diffusers/PEFT environment that supports the corresponding base model, for example with `pip install -U diffusers transformers accelerate peft`.

### FLUX.1-dev LoRA

```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# The repo holds two files named adapter_model.safetensors (one per subfolder),
# so name the file explicitly instead of relying on weight-file auto-detection.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")
```

### Qwen-Image-Edit LoRA

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Select the editing adapter by subfolder and file name.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

# Source image to edit; convert to RGB in case it has an alpha channel.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")
```

If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and weight file and follow the base model's official loading example.

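
Because the available pipeline classes depend on the installed Diffusers release, a quick check can tell you which loading path applies. This sketch only verifies that a name is exported: when some backends are missing, Diffusers can export placeholder objects that fail only when used, so `available` here does not guarantee the pipeline will run. The helper name `exposes` is hypothetical.

```python
import importlib


def exposes(module_name: str, attribute: str) -> bool:
    """Return True if `module_name` imports cleanly and exports `attribute`."""
    try:
        module = importlib.import_module(module_name)
        getattr(module, attribute)
    except (ImportError, AttributeError, RuntimeError):
        # Not installed, name not exported, or a lazy submodule import failed.
        return False
    return True


for pipeline_name in ("FluxPipeline", "QwenImageEditPipeline"):
    status = "available" if exposes("diffusers", pipeline_name) else "missing"
    print(f"{pipeline_name}: {status}")
```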
## Effective Prompting

### FLUX Text-to-Image

The FLUX adapter works best with prompts that clearly specify:

- required objects and attributes;
- object counts;
- spatial relationships;
- style or medium;
- constraints that should not be ignored.

Example:

```text
A high-resolution product photo of two matte blue ceramic cups on a wooden table,
with the smaller cup to the left of the larger cup, soft window lighting.
```

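
One way to keep every element of that checklist explicit is to assemble prompts from named fields. The helper below is hypothetical (it is not part of the adapter or of Diffusers) and simply rebuilds the example prompt above from its parts.

```python
from typing import Sequence


def build_flux_prompt(
    medium: str,
    count: str,
    attributes: Sequence[str],
    obj: str,
    setting: str = "",
    spatial: str = "",
    extras: Sequence[str] = (),
) -> str:
    """Assemble a FLUX prompt that states every requirement explicitly."""
    # Count, attributes, and object stay together so the count is unambiguous.
    subject = " ".join([count, *attributes, obj])
    if setting:
        subject = f"{subject} {setting}"
    parts = [f"{medium} of {subject}"]
    if spatial:
        parts.append(spatial)  # spatial relationship between objects
    parts.extend(extras)  # lighting, style, or constraints that must not be ignored
    return ", ".join(parts) + "."


prompt = build_flux_prompt(
    medium="A high-resolution product photo",
    count="two",
    attributes=["matte", "blue", "ceramic"],
    obj="cups",
    setting="on a wooden table",
    spatial="with the smaller cup to the left of the larger cup",
    extras=["soft window lighting"],
)
```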
### Qwen-Image-Edit

The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.

Example:

```text
Change the shirt color to dark green while preserving the person's face, pose,
background, lighting, and all other clothing details.
```

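
The same pattern can be applied programmatically: state the change first, then enumerate what must stay fixed. This hypothetical helper reproduces the example instruction above.

```python
from typing import Sequence


def build_edit_instruction(change: str, preserve: Sequence[str]) -> str:
    """Put the requested change first, then list everything that must stay unchanged."""
    items = list(preserve)
    if not items:
        return f"{change}."
    if len(items) == 1:
        kept = items[0]
    elif len(items) == 2:
        kept = f"{items[0]} and {items[1]}"
    else:
        kept = ", ".join(items[:-1]) + f", and {items[-1]}"
    return f"{change} while preserving {kept}."


instruction = build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting", "all other clothing details"],
)
```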
## Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.

| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Positive reward | `1.0` | `1.0` |
| Negative reward | `0.1` | `0.1` |
| Learning rate | `5e-5` | `1e-5` |
| PPO clip range | `0.2` | `0.2` |
| KL coefficient | `0.01` | `0.02` |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | `1.0` | `1.0` |
| LoRA rank | 16 | 32 |

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.

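
The table lists a PPO clip range and a KL coefficient, which points to a PPO-style clipped objective with a KL penalty toward a reference policy. The sketch below shows the generic per-sample form of such a loss with the FLUX defaults; it is not the ARR-RPO training code. In PPO-style fine-tuning of diffusion models the probability ratio is typically computed per denoising step, and how ARR-RPO turns the binary rewards into advantages or estimates the KL term is not specified in this card.

```python
from typing import Sequence


def clipped_policy_loss(
    ratio: float,
    advantage: float,
    kl: float,
    clip_range: float = 0.2,
    kl_coef: float = 0.01,
) -> float:
    """Per-sample PPO-style loss: negative clipped surrogate plus a KL penalty.

    ratio:     pi_theta(sample) / pi_old(sample) for one sampled step.
    advantage: learning signal derived from the ARR reward (derivation assumed).
    kl:        precomputed estimate of KL(pi_theta || pi_ref) for this sample.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Pessimistic bound: keep the smaller of the unclipped and clipped objectives.
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return -surrogate + kl_coef * kl


def batch_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    kls: Sequence[float],
    **kwargs: float,
) -> float:
    """Mean per-sample loss over a batch."""
    losses = [clipped_policy_loss(r, a, k, **kwargs) for r, a, k in zip(ratios, advantages, kls)]
    return sum(losses) / len(losses)
```

For the Qwen-Image-Edit settings you would pass `kl_coef=0.02`. The clip keeps a single update from moving the ratio far outside `[0.8, 1.2]`, which matters when each prompt contributes only two candidates.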
## Evaluation Summary

ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.

Recommended evaluation axes include:

- text-to-image prompt adherence and compositional correctness;
- image-edit instruction fulfillment;
- source-image preservation for editing;
- artifact control and visual coherence;
- pairwise human or VLM preference accuracy;
- position-bias checks by swapping candidate order.

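
A position-bias check can be sketched by querying the judge twice with the candidates swapped and keeping only verdicts that survive the swap. The `judge(prompt, first, second) -> "first" | "second"` interface below is a hypothetical stand-in for whatever VLM call you use.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge interface: returns "first" or "second" for the preferred image.
Judge = Callable[[str, object, object], str]


def swap_consistent_verdict(judge: Judge, prompt: str, a: object, b: object) -> str:
    """Query the judge in both orders; return 'A', 'B', or 'inconsistent'."""
    forward = judge(prompt, a, b)   # A shown first
    backward = judge(prompt, b, a)  # B shown first
    if forward == "first" and backward == "second":
        return "A"
    if forward == "second" and backward == "first":
        return "B"
    return "inconsistent"  # verdict followed the position, not the content


def consistency_rate(judge: Judge, pairs: Sequence[Tuple[str, object, object]]) -> float:
    """Fraction of pairs whose verdict is unchanged when candidate order is swapped."""
    verdicts = [swap_consistent_verdict(judge, p, a, b) for p, a, b in pairs]
    return sum(v != "inconsistent" for v in verdicts) / len(verdicts)
```

A judge that always picks the first image scores `0.0`, while one that decides purely on content scores `1.0`; in practice the rate falls in between, and inconsistent pairs can be excluded from preference-accuracy statistics.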
## Limitations

- These are LoRA adapters and require the corresponding base model weights.
- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
- The ARR reward signal depends on the chosen VLM judge and rubric quality.
- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
- The adapters do not include safety filtering; apply appropriate content and policy filters before deployment.

## License

The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:

- `black-forest-labs/FLUX.1-dev`
- `Qwen/Qwen-Image-Edit`

## Citation

If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{visionautorubric2026,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
  note   = {arXiv coming soon}
}
```

## Contact

For questions, issues, or updates, please use the project repository or Hugging Face community tab.