---
license: apache-2.0
library_name: diffusers
tags:
  - text-to-image
  - image-to-image
  - image-editing
  - diffusers
  - lora
  - peft
  - reinforcement-learning
  - rubric-policy-optimization
  - auto-rubric
base_model:
  - black-forest-labs/FLUX.1-dev
  - Qwen/Qwen-Image-Edit
---

# ARR-RPO

[Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

## Model Description

ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:

- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit, auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred one, and each candidate receives a binary reward according to that decision. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.

## Model Details

| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |
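
The rank and alpha columns determine how strongly each low-rank update is applied. In the standard LoRA formulation used by PEFT, the adapted weight is `W + (alpha / r) * B @ A`, so both adapters use a scaling factor of 2, assuming their configs do not enable rsLoRA (`use_rslora`). The pure-Python sketch below illustrates that arithmetic; the toy matrices are made up for illustration and are not taken from the adapters.

```python
def lora_scaling(rank: int, alpha: float) -> float:
    """Standard LoRA scaling factor alpha / r (PEFT default, rsLoRA disabled)."""
    return alpha / rank


def matmul(a, b):
    """Multiply two matrices stored as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merged_weight(w, a, b, rank, alpha):
    """Return W + (alpha / r) * B @ A for a frozen weight W of shape (out, in).

    A has shape (r, in) and B has shape (out, r).
    """
    scale = lora_scaling(rank, alpha)
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Both adapters in the table use alpha = 2 * rank.
print(lora_scaling(16, 32))  # FLUX adapter
print(lora_scaling(32, 64))  # Qwen-Image-Edit adapter

# Toy rank-1 example so the arithmetic is easy to follow by hand.
w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
a = [[0.5, 0.5]]              # A: (r=1, in=2)
b = [[1.0], [0.0]]            # B: (out=2, r=1)
print(lora_merged_weight(w, a, b, rank=1, alpha=2))  # [[2.0, 1.0], [0.0, 1.0]]
```

Because the scaling depends only on the ratio `alpha / r`, doubling both values (as the Qwen-Image-Edit adapter does relative to FLUX) keeps the update magnitude per unit of `B @ A` the same while giving the adapter more capacity.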

### Adapter Files

```text
ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors
```
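
Each subfolder follows the usual PEFT layout, so the LoRA hyperparameters can be checked directly from `adapter_config.json` after downloading. The sketch below assumes the standard PEFT keys (`r`, `lora_alpha`, `target_modules`, `base_model_name_or_path`); the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict


def parse_lora_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the main LoRA hyperparameters from a parsed PEFT adapter config."""
    return {
        "rank": config["r"],
        "alpha": config["lora_alpha"],
        # PEFT allows either a list of module names or a single regex string here.
        "target_modules": config["target_modules"],
        "base_model": config.get("base_model_name_or_path"),
    }


def read_lora_config(adapter_dir: str) -> Dict[str, Any]:
    """Read <adapter_dir>/adapter_config.json and return its LoRA hyperparameters."""
    path = Path(adapter_dir) / "adapter_config.json"
    return parse_lora_config(json.loads(path.read_text(encoding="utf-8")))
```

For example, calling `read_lora_config("ARR-RPO/ARR-FLUX.1-dev")` on a local copy of the repository (the path is hypothetical) should report rank 16 and alpha 32 if the config matches the table above.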

### FLUX Adapter Targets

The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
```
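
PEFT decides which layers receive LoRA weights by comparing each module's full name with `target_modules`: for a list of names, a module matches when its name equals an entry or ends with `.` followed by that entry. The sketch below reimplements that suffix rule in plain Python to show which layers the FLUX list covers. The example module names are illustrative only; the real names for your Diffusers version can be listed with `pipe.transformer.named_modules()`.

```python
FLUX_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]


def is_lora_target(module_name: str, targets) -> bool:
    """Suffix rule: exact match, or the name ends with '.' + target."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


# Illustrative module names; real names come from transformer.named_modules().
example_modules = [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff.net.0.proj",
    "transformer_blocks.0.norm1.linear",
    "single_transformer_blocks.0.attn.to_k",
    "single_transformer_blocks.0.proj_mlp",
]
for name in example_modules:
    print(f"{name}: {is_lora_target(name, FLUX_TARGETS)}")
```

Matching on a leading `.` rather than a bare suffix keeps a target such as `attn.to_q` from accidentally matching a differently named layer that merely ends in the same characters.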

### Qwen-Image-Edit Adapter Targets

The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
```
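
Comparing the two lists makes the difference explicit. Based only on the module names listed in this card (both lists say "including", so the full configs may contain more), the Qwen-Image-Edit adapter shares the eight attention projections of the FLUX adapter and does not target the feed-forward layers:

```python
FLUX_TARGETS = {
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
}
QWEN_EDIT_TARGETS = {
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
}

shared = FLUX_TARGETS & QWEN_EDIT_TARGETS
flux_only = FLUX_TARGETS - QWEN_EDIT_TARGETS

print(len(shared))        # 8 attention projections in common
print(sorted(flux_only))  # feed-forward modules targeted only by the FLUX adapter
```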

## Intended Use

These adapters are intended for research and development on:

- improving text-to-image generation with rubric-guided preference rewards;
- improving instruction-guided image editing while preserving source-image content;
- studying Auto-Rubric as an interpretable alternative to scalar reward models;
- reproducing and extending ARR-RPO experiments.

They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.

## How ARR-RPO Works

ARR-RPO separates reward construction into explicit criteria and binary preference decisions:

```text
visual preference examples
  -> auto-generated rubrics
  -> verified and structured rubric set
  -> frozen VLM judge
  -> pairwise preference decision
  -> RPO binary reward
```
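
The last stages of this flow can be pictured as a small function: rubric items are rendered into a judge prompt, the frozen VLM sees both candidates and returns a choice, and the choice is parsed into a pairwise decision. The sketch below is hypothetical throughout: the `Rubric` structure, the prompt wording, the `judge` callable, and the `A`/`B` answer format are illustrative assumptions rather than the ARR-RPO implementation, and the stand-in judge is a plain function, not a VLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rubric:
    """Hypothetical structured rubric item: a short name plus a checkable criterion."""
    name: str
    criterion: str


def build_judge_prompt(instruction: str, rubrics: List[Rubric]) -> str:
    """Render the rubric set into a pairwise judging prompt (illustrative wording)."""
    lines = [f"Instruction: {instruction}", "Evaluate both images against these rubrics:"]
    lines += [f"{i}. {r.name}: {r.criterion}" for i, r in enumerate(rubrics, start=1)]
    lines.append("Answer with a single letter: A or B.")
    return "\n".join(lines)


def parse_decision(answer: str) -> int:
    """Map the judge's answer to 0 (candidate A preferred) or 1 (candidate B preferred)."""
    letter = answer.strip().upper()[:1]
    if letter not in ("A", "B"):
        raise ValueError(f"Unparseable judge answer: {answer!r}")
    return 0 if letter == "A" else 1


def pairwise_decision(instruction: str, rubrics: List[Rubric], image_a, image_b,
                      judge: Callable[[str, object, object], str]) -> int:
    """Query the frozen judge once and return the index of the preferred candidate."""
    prompt = build_judge_prompt(instruction, rubrics)
    return parse_decision(judge(prompt, image_a, image_b))


rubrics = [
    Rubric("Count", "Exactly two cups are visible."),
    Rubric("Layout", "The smaller cup is to the left of the larger cup."),
]


def stub_judge(prompt, image_a, image_b):
    """Stand-in for a frozen VLM judge; always answers B."""
    return "B"


print(pairwise_decision("Two blue cups on a table", rubrics, "img_a", "img_b", stub_judge))
```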

For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.
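
In code, this reward rule amounts to mapping the judge's decision onto the two candidates. The sketch assumes the decision is encoded as the index (0 or 1) of the preferred candidate; the defaults follow the `+1.0` / `-0.1` values stated above, and the function name is hypothetical.

```python
from typing import Tuple

PREFERRED_REWARD = 1.0
DISPREFERRED_REWARD = -0.1


def pairwise_rewards(preferred_index: int,
                     preferred_reward: float = PREFERRED_REWARD,
                     dispreferred_reward: float = DISPREFERRED_REWARD) -> Tuple[float, float]:
    """Return (reward_for_candidate_0, reward_for_candidate_1)."""
    if preferred_index not in (0, 1):
        raise ValueError("preferred_index must be 0 or 1")
    if preferred_index == 0:
        return preferred_reward, dispreferred_reward
    return dispreferred_reward, preferred_reward


print(pairwise_rewards(0))  # (1.0, -0.1)
print(pairwise_rewards(1))  # (-0.1, 1.0)
```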

## Using the Models

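Install recent versions of `torch`, `diffusers`, `transformers`, and `peft` that support the corresponding base model. `black-forest-labs/FLUX.1-dev` is a gated repository, so accept its license on Hugging Face and authenticate (for example with `huggingface-cli login`) before downloading it.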

### FLUX.1-dev LoRA

```python
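import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the frozen base pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Load the ARR-RPO LoRA weights from the FLUX subfolder of the adapter repo.
# weight_name is given explicitly so Diffusers does not have to guess between
# the two adapter files stored in the repository.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

# A fixed seed makes results repeatable on the same hardware and software setup.
generator = torch.Generator("cuda").manual_seed(0)
image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("arr_flux_example.png")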
```

### Qwen-Image-Edit LoRA

```python
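import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the frozen base editing pipeline in bfloat16.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Load the ARR-RPO LoRA weights from the Qwen-Image-Edit subfolder.
# weight_name is given explicitly so Diffusers does not have to guess between
# the two adapter files stored in the repository.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
    weight_name="adapter_model.safetensors",
)
pipe.to("cuda")

# Source image to edit; the instruction states both the change and what to keep.
source = Image.open("source.png").convert("RGB")
generator = torch.Generator("cuda").manual_seed(0)  # fixed seed for repeatable outputs
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
    generator=generator,
).images[0]
image.save("arr_qwen_edit_example.png")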
```

If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.

## Effective Prompting

### FLUX Text-to-Image

The FLUX adapter works best with prompts that clearly specify:

- required objects and attributes;
- object counts;
- spatial relationships;
- style or medium;
- constraints that should not be ignored.

Example:

```text
A high-resolution product photo of two matte blue ceramic cups on a wooden table,
with the smaller cup to the left of the larger cup, soft window lighting.
```
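
When writing many prompts (for example, for an evaluation set), it can help to assemble them from the components listed above so that none is silently dropped. The helper below is a hypothetical convenience function, not part of Diffusers or ARR-RPO; it simply joins the pieces into one comma-separated prompt and reproduces the example above.

```python
from typing import Iterable, Optional


def build_t2i_prompt(subject: str,
                     spatial: Optional[str] = None,
                     style: Optional[str] = None,
                     constraints: Iterable[str] = ()) -> str:
    """Join subject (objects, attributes, counts), layout, style, and constraints."""
    parts = [subject]
    if spatial:
        parts.append(spatial)
    if style:
        parts.append(style)
    parts.extend(constraints)
    return ", ".join(parts) + "."


prompt = build_t2i_prompt(
    subject="A high-resolution product photo of two matte blue ceramic cups on a wooden table",
    spatial="with the smaller cup to the left of the larger cup",
    style="soft window lighting",
)
print(prompt)
```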

### Qwen-Image-Edit

The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.

Example:

```text
Change the shirt color to dark green while preserving the person's face, pose,
background, lighting, and all other clothing details.
```
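
The same idea applies to edit instructions: keeping the requested change and the preserved content as separate inputs makes it harder to forget the preservation clause. The helper below is hypothetical and only formats text; it reproduces the example above.

```python
from typing import Sequence


def build_edit_instruction(change: str, preserve: Sequence[str]) -> str:
    """Combine the requested change with an explicit list of content to keep."""
    if not preserve:
        return change + "."
    if len(preserve) == 1:
        kept = preserve[0]
    elif len(preserve) == 2:
        kept = f"{preserve[0]} and {preserve[1]}"
    else:
        kept = ", ".join(preserve[:-1]) + ", and " + preserve[-1]
    return f"{change} while preserving {kept}."


instruction = build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting", "all other clothing details"],
)
print(instruction)
```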

## Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.

| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Preferred-candidate reward | `1.0` | `1.0` |
| Dispreferred-candidate reward | `-0.1` | `-0.1` |
| Learning rate | `5e-5` | `1e-5` |
| PPO clip range | `0.2` | `0.2` |
| KL coefficient | `0.01` | `0.02` |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | `1.0` | `1.0` |
| LoRA rank | 16 | 32 |

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.

## Evaluation Summary

ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding base models without ARR-RPO fine-tuning on text-to-image and image-editing benchmarks; the authors attribute the gains to explicit, rubric-conditioned reward signals rather than to an opaque learned scalar reward.

Recommended evaluation axes include:

- text-to-image prompt adherence and compositional correctness;
- image-edit instruction fulfillment;
- source-image preservation for editing;
- artifact control and visual coherence;
- pairwise human or VLM preference accuracy;
- position-bias checks by swapping candidate order.
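
The last two axes can be measured together by querying the judge on each pair in both orders. The sketch below assumes a hypothetical `judge(a, b)` callable that returns 0 when it prefers its first argument and 1 otherwise; a judge without position bias should pick the same underlying image in both orders. The toy judges and numeric "images" are stand-ins for illustration.

```python
from typing import Callable, Sequence, Tuple


def evaluate_pairs(pairs: Sequence[Tuple[object, object]],
                   labels: Sequence[int],
                   judge: Callable[[object, object], int]) -> dict:
    """Return preference accuracy (original order) and swap consistency.

    labels[i] is the reference choice (0 or 1) for pairs[i], e.g. a human vote.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    correct = consistent = 0
    for (a, b), label in zip(pairs, labels):
        forward = judge(a, b)       # index in the original order
        backward = 1 - judge(b, a)  # swapped answer mapped back to the original order
        correct += int(forward == label)
        consistent += int(forward == backward)
    n = len(pairs)
    return {"accuracy": correct / n, "swap_consistency": consistent / n}


def first_slot_judge(a, b):
    """Maximally position-biased stand-in: always prefers the first image shown."""
    return 0


def score_judge(a, b):
    """Order-independent stand-in: prefers the larger toy quality score."""
    return 0 if a >= b else 1


pairs = [(0.9, 0.2), (0.1, 0.7), (0.6, 0.4)]
labels = [0, 1, 0]
print(evaluate_pairs(pairs, labels, first_slot_judge))
print(evaluate_pairs(pairs, labels, score_judge))
```

The position-biased judge still reaches 2/3 accuracy on this toy set but has zero swap consistency, which is why accuracy alone can hide order effects.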

## Limitations

- These are LoRA adapters and require the corresponding base model weights.
- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
- The ARR reward signal depends on the chosen VLM judge and rubric quality.
- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
- The adapters do not include any safety filtering; users should apply appropriate content and policy filters before deployment.

## License

The adapters are released under `apache-2.0`, as declared in the model card metadata. Users must also comply with the licenses and terms of the base models:

- `black-forest-labs/FLUX.1-dev`
- `Qwen/Qwen-Image-Edit`

## Citation

If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{visionautorubric2026,
  title        = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author       = {Anonymous},
  year         = {2026},
  note         = {arXiv coming soon}
}
```

## Contact

For questions, issues, or updates, please use the project repository or Hugging Face community tab.