---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
- image-to-image
- image-editing
- diffusers
- lora
- peft
- reinforcement-learning
- rubric-policy-optimization
- auto-rubric
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---
# ARR-RPO
Project Page | Code | Paper | Model Weights
## Model Description
ARR-RPO provides two LoRA adapters trained with Auto-Rubric as Reward (ARR) and Rubric Policy Optimization (RPO) for visual generation:
- `ARR-FLUX.1-dev/`: a LoRA adapter for FLUX.1-dev text-to-image generation.
- `ARR-Qwen-Image-Edit/`: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
## Model Details
| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
|---|---|---|---|---|---|
| ARR-FLUX.1-dev | black-forest-labs/FLUX.1-dev | Text-to-image | 16 | 32 | Diffusers + PEFT |
| ARR-Qwen-Image-Edit | Qwen/Qwen-Image-Edit | Image editing | 32 | 64 | Diffusers + PEFT |
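Both adapters set `lora_alpha` to twice the rank. Under PEFT's default LoRA scaling (rsLoRA disabled), the low-rank update `B @ A` is multiplied by `lora_alpha / r`, so both adapters end up with the same effective scale of 2.0. A small worked check, assuming that default scaling rule:

```python
# LoRA rank/alpha values copied from the table above.
ADAPTERS = {
    "ARR-FLUX.1-dev": {"r": 16, "lora_alpha": 32},
    "ARR-Qwen-Image-Edit": {"r": 32, "lora_alpha": 64},
}


def lora_scaling(r: int, lora_alpha: int) -> float:
    """Standard LoRA scaling factor alpha / r (the default when rsLoRA is off)."""
    return lora_alpha / r


for name, cfg in ADAPTERS.items():
    # Effective update: delta_W = (lora_alpha / r) * B @ A
    print(f"{name}: scaling = {lora_scaling(cfg['r'], cfg['lora_alpha'])}")
```

The higher rank of the Qwen-Image-Edit adapter therefore adds capacity without changing the overall update magnitude.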
## Adapter Files
- `ARR-RPO/`
  - `ARR-FLUX.1-dev/`
    - `adapter_config.json`
    - `adapter_model.safetensors`
  - `ARR-Qwen-Image-Edit/`
    - `adapter_config.json`
    - `adapter_model.safetensors`
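If you download or clone the repository locally, a stdlib-only check like the following can confirm that each adapter subfolder contains both files listed above. The helper name `missing_adapter_files` and the local root directory are hypothetical; they are not part of the project.

```python
from pathlib import Path
from typing import List, Union

# Subfolder and file names taken from the layout above.
EXPECTED_SUBFOLDERS = ("ARR-FLUX.1-dev", "ARR-Qwen-Image-Edit")
EXPECTED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(root: Union[str, Path]) -> List[str]:
    """Return the relative paths of expected adapter files absent under `root`."""
    root = Path(root)
    missing = []
    for sub in EXPECTED_SUBFOLDERS:
        for fname in EXPECTED_FILES:
            rel = Path(sub) / fname
            if not (root / rel).is_file():
                missing.append(rel.as_posix())
    return missing
```

For example, `missing_adapter_files("ARR-RPO")` returns an empty list when a local copy is complete.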
### FLUX Adapter Targets
The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:
`attn.to_q`, `attn.to_k`, `attn.to_v`, `attn.to_out.0`,
`attn.add_q_proj`, `attn.add_k_proj`, `attn.add_v_proj`, `attn.to_add_out`,
`ff.net.0.proj`, `ff.net.2`, `ff_context.net.0.proj`, `ff_context.net.2`
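When `target_modules` is a list of strings, PEFT selects a module if its dotted name equals a target or ends with `"." + target`, so `attn.to_q` matches the query projection in every block that has one. The sketch below reimplements only that suffix rule for illustration (PEFT also accepts a regex string, which is not covered here), and the example module names are assumed rather than read from the checkpoint.

```python
from typing import Sequence

# FLUX LoRA target modules as listed above.
FLUX_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]


def is_lora_target(module_name: str, targets: Sequence[str]) -> bool:
    """Suffix rule: exact match, or the name ends with '.' + target."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Hypothetical module names in the style of FluxTransformer2DModel.
for name in (
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff_context.net.2",
    "transformer_blocks.0.norm1.linear",
):
    print(name, is_lora_target(name, FLUX_TARGETS))
```

Matching on a `.`-separated suffix, rather than a plain substring, keeps `attn.to_q` from also selecting a differently named layer such as `attn.to_qk`.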
### Qwen-Image-Edit Adapter Targets
The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:
`attn.to_q`, `attn.to_k`, `attn.to_v`, `attn.to_out.0`,
`attn.add_q_proj`, `attn.add_k_proj`, `attn.add_v_proj`, `attn.to_add_out`
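Both lists above are introduced with "including", so the shipped `adapter_config.json` files may contain more targets. A stdlib-only sketch for reading a PEFT adapter config and reporting rank, alpha, and any targets not documented here; the helper names are hypothetical, while `r`, `lora_alpha`, and `target_modules` are the standard PEFT config keys.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Set, Union

# Targets documented above for the Qwen-Image-Edit adapter.
DOCUMENTED_QWEN_EDIT_TARGETS = {
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
}


def summarize_adapter_config(path: Union[str, Path]) -> Dict[str, object]:
    """Read a PEFT adapter_config.json and return rank, alpha and target modules."""
    cfg = json.loads(Path(path).read_text(encoding="utf-8"))
    targets = cfg.get("target_modules") or []
    if isinstance(targets, str):  # PEFT also allows a single regex string
        targets = [targets]
    return {
        "r": cfg.get("r"),
        "lora_alpha": cfg.get("lora_alpha"),
        "target_modules": sorted(targets),
    }


def undocumented_targets(summary: Dict[str, object],
                         documented: Iterable[str] = DOCUMENTED_QWEN_EDIT_TARGETS) -> Set[str]:
    """Targets present in the config file but not listed in this card."""
    return set(summary["target_modules"]) - set(documented)
```

For example, `undocumented_targets(summarize_adapter_config("ARR-Qwen-Image-Edit/adapter_config.json"))`, with the path pointing at a local copy of the adapter.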
## Intended Use
These adapters are intended for research and development on:
- improving text-to-image generation with rubric-guided preference rewards;
- improving instruction-guided image editing while preserving source-image content;
- studying Auto-Rubric as an interpretable alternative to scalar reward models;
- reproducing and extending ARR-RPO experiments.
They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.
## How ARR-RPO Works
ARR-RPO separates reward construction into explicit criteria and binary preference decisions:
visual preference examples
-> auto-generated rubrics
-> verified and structured rubric set
-> frozen VLM judge
-> pairwise preference decision
-> RPO binary reward
For pairwise RPO, the preferred candidate receives +1.0 and the dispreferred candidate receives -0.1.
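The pairwise reward rule fits in a few lines. In this sketch the judge is a hypothetical callable `judge(prompt, first, second)` that returns 0 if it prefers the first candidate and 1 otherwise; it stands in for the frozen, rubric-conditioned VLM judge and is not an API provided by the project.

```python
from typing import Callable, Sequence, Tuple

PREFERRED_REWARD = 1.0      # value stated above
DISPREFERRED_REWARD = -0.1  # value stated above


def rpo_pair_rewards(
    prompt: str,
    candidates: Sequence[object],
    judge: Callable[[str, object, object], int],
) -> Tuple[float, float]:
    """Assign RPO binary rewards to two candidates sampled for the same prompt."""
    if len(candidates) != 2:
        raise ValueError("pairwise RPO compares exactly two candidates")
    winner = judge(prompt, candidates[0], candidates[1])  # 0 or 1
    if winner not in (0, 1):
        raise ValueError("judge must return 0 or 1")
    rewards = [DISPREFERRED_REWARD, DISPREFERRED_REWARD]
    rewards[winner] = PREFERRED_REWARD
    return rewards[0], rewards[1]
```

With a toy judge that prefers the longer string, `rpo_pair_rewards("p", ["long", "s"], judge)` returns `(1.0, -0.1)`.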
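## Using The Models
Install a recent Diffusers and PEFT environment that supports the corresponding base model, for example with `pip install -U diffusers transformers accelerate peft`. PEFT is required because Diffusers loads LoRA adapters through its PEFT backend.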
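To check which of the relevant packages are installed, and at what version, before running the examples, a stdlib-only check such as the following can help. The package list is an assumption based on the libraries used in this card, which does not pin minimum versions.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("diffusers", "peft", "transformers", "accelerate", "torch"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for pkg, ver in installed_versions().items():
    print(f"{pkg:>12}: {ver or 'not installed'}")
```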
### FLUX.1-dev LoRA
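```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
# Adapter repository id as given in this card. The card itself is hosted as
# OpenEnvisionLab/Auto-Rubric-as-Reward; try that id if this one cannot be resolved.
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base model in bfloat16 to reduce memory use.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Load the FLUX LoRA from its subfolder (requires `peft`). Naming the weight
# file avoids ambiguity, since the repository contains two adapters.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")
```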
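### Qwen-Image-Edit LoRA
```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
# Adapter repository id as given in this card; if it cannot be resolved,
# try OpenEnvisionLab/Auto-Rubric-as-Reward, where this card is hosted.
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base editing pipeline in bfloat16.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Load the Qwen-Image-Edit LoRA from its subfolder (requires `peft`).
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

# Source image to edit; replace "source.png" with your own file.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")
```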
If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
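One way to follow that advice without hard-coding a single class name is to look the pipeline up by name and fail with a clear message if the installed Diffusers does not export it. The helper below is a hypothetical sketch; `QwenImageEditPipeline` is the class used above, and any alternative names you pass in are your own assumption.

```python
import importlib
from typing import Sequence


def resolve_pipeline_class(candidates: Sequence[str], module_name: str = "diffusers"):
    """Return the first class in `candidates` that `module_name` exports."""
    module = importlib.import_module(module_name)
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError(
        f"{module_name} exports none of {list(candidates)}; upgrade it or "
        "follow the base model's official loading example"
    )


# Usage (requires diffusers):
# PipelineCls = resolve_pipeline_class(["QwenImageEditPipeline"])
# pipe = PipelineCls.from_pretrained("Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16)
```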
## Effective Prompting
### FLUX Text-to-Image
The FLUX adapter works best with prompts that clearly specify:
- required objects and attributes;
- object counts;
- spatial relationships;
- style or medium;
- constraints that should not be ignored.
Example:
A high-resolution product photo of two matte blue ceramic cups on a wooden table,
with the smaller cup to the left of the larger cup, soft window lighting.
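If prompts are generated programmatically, one option is to assemble them from the checklist above so that no required element is dropped. `build_t2i_prompt` is a hypothetical helper, not something the adapter requires; its parameters simply mirror the list.

```python
from typing import Optional, Sequence


def build_t2i_prompt(
    subject: str,
    spatial: Optional[str] = None,
    style: Optional[str] = None,
    constraints: Sequence[str] = (),
) -> str:
    """Join subject (objects, counts, attributes), layout, style and constraints."""
    parts = [subject]
    if spatial:
        parts.append(spatial)
    if style:
        parts.append(style)
    parts.extend(constraints)
    # Strip stray trailing periods so the result ends with exactly one.
    return ", ".join(p.strip().rstrip(".") for p in parts) + "."


prompt = build_t2i_prompt(
    "A high-resolution product photo of two matte blue ceramic cups on a wooden table",
    spatial="with the smaller cup to the left of the larger cup",
    style="soft window lighting",
)
print(prompt)
```

This reproduces the example prompt above as a single line.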
### Qwen-Image-Edit
The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.
Example:
Change the shirt color to dark green while preserving the person's face, pose,
background, lighting, and all other clothing details.
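The same separation can be enforced when edit instructions are built in code: state the change first, then enumerate what must stay fixed. `build_edit_instruction` is a hypothetical helper written for illustration.

```python
from typing import Sequence


def build_edit_instruction(change: str, preserve: Sequence[str]) -> str:
    """State the requested change, then list everything that must stay unchanged."""
    change = change.strip().rstrip(".")
    keep = [p.strip() for p in preserve if p.strip()]
    if not keep:
        return change + "."
    if len(keep) == 1:
        kept = keep[0]
    elif len(keep) == 2:
        kept = f"{keep[0]} and {keep[1]}"
    else:
        kept = ", ".join(keep[:-1]) + ", and " + keep[-1]
    return f"{change} while preserving {kept}."


instruction = build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting", "all other clothing details"],
)
print(instruction)
```

This reproduces the example instruction above as a single line.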
## Training Details
ARR-RPO was trained with LoRA and pairwise online preference optimization.
| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
|---|---|---|
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Reward for preferred candidate | +1.0 | +1.0 |
| Reward for dispreferred candidate | -0.1 | -0.1 |
| Learning rate | 5e-5 | 1e-5 |
| PPO clip range | 0.2 | 0.2 |
| KL coefficient | 0.01 | 0.02 |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | 1.0 | 1.0 |
| LoRA rank | 16 | 32 |
The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
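The PPO clip range and KL coefficient in the table suggest a clipped policy-gradient objective with a KL penalty toward a reference policy. The card does not give the exact RPO loss, so the per-sample sketch below uses the generic PPO-clip form as an assumption; the function name, feeding the RPO reward in directly as the advantage, and the scalar KL estimate are all illustrative choices, not the authors' implementation.

```python
def clipped_objective_loss(
    ratio: float,
    advantage: float,
    kl: float,
    clip_range: float = 0.2,  # PPO clip range from the table
    kl_coef: float = 0.01,    # 0.01 for FLUX.1-dev, 0.02 for Qwen-Image-Edit
) -> float:
    """Generic PPO-clip surrogate loss plus a KL penalty (illustrative sketch).

    ratio: pi_theta(x) / pi_old(x) for one sample
    advantage: e.g. the RPO reward (+1.0 preferred, -0.1 dispreferred)
    kl: per-sample KL estimate against the frozen reference policy
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Pessimistic bound: take the smaller of the unclipped and clipped terms.
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return -surrogate + kl_coef * kl
```

For a preferred sample whose likelihood ratio has already risen to 1.5, the clip caps the credited improvement at 1.2, so `clipped_objective_loss(1.5, 1.0, 0.0)` is -1.2 and pushing the ratio further yields no extra gradient.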
## Evaluation Summary
ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.
Recommended evaluation axes include:
- text-to-image prompt adherence and compositional correctness;
- image-edit instruction fulfillment;
- source-image preservation for editing;
- artifact control and visual coherence;
- pairwise human or VLM preference accuracy;
- position-bias checks by swapping candidate order.
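The position-bias check can be run by querying the judge twice with the candidates swapped and measuring how often the same underlying candidate wins. In this sketch `judge(prompt, first, second)` is a hypothetical interface that returns 0 for the first slot and 1 for the second; it is not an API provided by the project.

```python
from typing import Callable, Iterable, Tuple


def position_consistency(
    pairs: Iterable[Tuple[str, object, object]],
    judge: Callable[[str, object, object], int],
) -> float:
    """Fraction of pairs whose winner is the same under both candidate orders."""
    total = consistent = 0
    for prompt, a, b in pairs:
        forward = judge(prompt, a, b)  # 0 -> a preferred, 1 -> b preferred
        swapped = judge(prompt, b, a)  # 0 -> b preferred, 1 -> a preferred
        # The same candidate wins both times exactly when the slot indices differ.
        if forward != swapped:
            consistent += 1
        total += 1
    return consistent / total if total else float("nan")
```

A judge that always picks the first slot scores 0.0, while one that decides purely on content scores 1.0.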
## Limitations
- These are LoRA adapters and require the corresponding base model weights.
- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
- The ARR reward signal depends on the chosen VLM judge and rubric quality.
- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
- These adapters come with no safety-filtering guarantees; users should apply appropriate content and policy filters before deployment.
## License
The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
## Citation
If you use these adapters, please cite the ARR-RPO project:
```bibtex
@misc{visionautorubric2026,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
  note   = {arXiv coming soon}
}
```
## Contact
For questions, issues, or updates, please use the project repository or Hugging Face community tab.