---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
- image-to-image
- image-editing
- diffusers
- lora
- peft
- reinforcement-learning
- rubric-policy-optimization
- auto-rubric
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---
# ARR-RPO
[Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)
## Model Description
ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:
- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
## Model Details
| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |
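The rank and alpha values in the table can be compared through the LoRA scaling factor. The sketch below assumes the standard `alpha / rank` scaling (not the rank-stabilized variant), which this card does not state explicitly; `AdapterSpec` is a hypothetical helper used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdapterSpec:
    """Hypothetical record mirroring one row of the Model Details table."""
    name: str
    base_model: str
    rank: int
    alpha: int

    @property
    def scaling(self) -> float:
        # Assumption: standard LoRA scaling, i.e. the low-rank update is
        # multiplied by alpha / rank before being added to the base weight.
        return self.alpha / self.rank


ADAPTERS = [
    AdapterSpec("ARR-FLUX.1-dev", "black-forest-labs/FLUX.1-dev", rank=16, alpha=32),
    AdapterSpec("ARR-Qwen-Image-Edit", "Qwen/Qwen-Image-Edit", rank=32, alpha=64),
]

for spec in ADAPTERS:
    print(f"{spec.name}: rank={spec.rank}, alpha={spec.alpha}, scaling={spec.scaling}")
```

Under this assumption both adapters share the same scaling factor even though their ranks differ.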
### Adapter Files
```text
ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors
```
### FLUX Adapter Targets
The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:
```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
```
### Qwen-Image-Edit Adapter Targets
The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:
```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
```
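To confirm which modules an adapter actually targets, you can read its `adapter_config.json` directly. The following is a minimal sketch that assumes the file follows the usual PEFT LoRA layout with `r`, `lora_alpha`, and `target_modules` keys; `summarize_adapter_config` is a hypothetical helper, not part of this repository.

```python
import json
from pathlib import Path


def summarize_adapter_config(path):
    """Return rank, alpha, and sorted target modules from an adapter_config.json.

    Assumes the PEFT LoRA key names `r`, `lora_alpha`, and `target_modules`.
    """
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    targets = config.get("target_modules") or []
    # PEFT also allows a single string (a regex) instead of a list of names.
    if isinstance(targets, str):
        targets = [targets]
    return {
        "rank": config.get("r"),
        "alpha": config.get("lora_alpha"),
        "target_modules": sorted(targets),
    }
```

After downloading the repository locally, a call such as `summarize_adapter_config("ARR-RPO/ARR-FLUX.1-dev/adapter_config.json")` should return values that can be checked against the Model Details table and the target lists above.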
## Intended Use
These adapters are intended for research and development on:
- improving text-to-image generation with rubric-guided preference rewards;
- improving instruction-guided image editing while preserving source-image content;
- studying Auto-Rubric as an interpretable alternative to scalar reward models;
- reproducing and extending ARR-RPO experiments.
They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.
## How ARR-RPO Works
ARR-RPO separates reward construction into explicit criteria and binary preference decisions:
```text
visual preference examples
-> auto-generated rubrics
-> verified and structured rubric set
-> frozen VLM judge
-> pairwise preference decision
-> RPO binary reward
```
For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.
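The reward assignment described above can be sketched as follows. The function name and the convention that the judge returns the index of the preferred candidate are assumptions made for illustration; only the two reward values come from this card.

```python
from typing import Tuple

POSITIVE_REWARD = 1.0   # preferred candidate, as stated in this card
NEGATIVE_REWARD = -0.1  # dispreferred candidate, as stated in this card


def pairwise_rewards(preferred_index: int) -> Tuple[float, float]:
    """Map a judge decision (0 or 1) to rewards for candidates 0 and 1.

    Hypothetical helper: the judge is assumed to return the index of the
    candidate it prefers.
    """
    if preferred_index not in (0, 1):
        raise ValueError("preferred_index must be 0 or 1")
    rewards = [NEGATIVE_REWARD, NEGATIVE_REWARD]
    rewards[preferred_index] = POSITIVE_REWARD
    return rewards[0], rewards[1]


print(pairwise_rewards(0))
print(pairwise_rewards(1))
```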
## Using The Models
Install recent versions of `diffusers` and `peft` that support the corresponding base model.
### FLUX.1-dev LoRA
```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base FLUX.1-dev pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Attach the ARR-RPO LoRA adapter stored in the repo's FLUX subfolder.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
)
pipe.to("cuda")

# Generate one image and save it to disk.
image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")
```
### Qwen-Image-Edit LoRA
```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base Qwen-Image-Edit pipeline in bfloat16.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)

# Attach the ARR-RPO LoRA adapter stored in the repo's Qwen subfolder.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
)
pipe.to("cuda")

# Edit a local source image according to the instruction and save the result.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")
```
If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
## Effective Prompting
### FLUX Text-to-Image
The FLUX adapter works best with prompts that clearly specify:
- required objects and attributes;
- object counts;
- spatial relationships;
- style or medium;
- constraints that should not be ignored.
Example:
```text
A high-resolution product photo of two matte blue ceramic cups on a wooden table,
with the smaller cup to the left of the larger cup, soft window lighting.
```
### Qwen-Image-Edit
The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.
Example:
```text
Change the shirt color to dark green while preserving the person's face, pose,
background, lighting, and all other clothing details.
```
## Training Details
ARR-RPO was trained with LoRA and pairwise online preference optimization.
| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Positive reward | `1.0` | `1.0` |
| Negative reward | `-0.1` | `-0.1` |
| Learning rate | `5e-5` | `1e-5` |
| PPO clip range | `0.2` | `0.2` |
| KL coefficient | `0.01` | `0.02` |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | `1.0` | `1.0` |
| LoRA rank | 16 | 32 |
The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
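The table lists a PPO clip range and a KL coefficient but this card does not spell out the exact objective. As a rough illustration of how those two hyperparameters typically interact, the sketch below uses the standard PPO clipped surrogate with a KL penalty and treats the binary reward directly as the advantage. All of this is an assumption, not the confirmed ARR-RPO loss; `clipped_policy_loss` and its arguments are hypothetical.

```python
import math


def clipped_policy_loss(logp_new, logp_old, reward,
                        clip_range=0.2, kl_coef=0.01, kl_estimate=0.0):
    """PPO-style clipped loss for one sample (illustrative assumption only).

    logp_new / logp_old: log-probabilities of the sampled output under the
    current and the sampling policy. reward: binary RPO reward, used here
    as the advantage. kl_estimate: externally computed KL to the reference.
    """
    ratio = math.exp(logp_new - logp_old)
    # Keep the probability ratio inside [1 - clip_range, 1 + clip_range].
    clipped_ratio = min(max(ratio, 1.0 - clip_range), 1.0 + clip_range)
    # Pessimistic (minimum) surrogate, as in standard PPO.
    surrogate = min(ratio * reward, clipped_ratio * reward)
    return -surrogate + kl_coef * kl_estimate


# FLUX.1-dev column of the table: clip range 0.2, KL coefficient 0.01.
print(clipped_policy_loss(math.log(2.0), 0.0, reward=1.0))
print(clipped_policy_loss(math.log(2.0), 0.0, reward=-0.1))
```

With a positive reward the clip caps how much a single sample can push the policy, while a negative reward is left unclipped when the ratio has already grown, so the penalty still applies in full.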
## Evaluation Summary
ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.
Recommended evaluation axes include:
- text-to-image prompt adherence and compositional correctness;
- image-edit instruction fulfillment;
- source-image preservation for editing;
- artifact control and visual coherence;
- pairwise human or VLM preference accuracy;
- position-bias checks by swapping candidate order.
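The position-bias check in the last item can be sketched as below. The `judge` callable is a hypothetical stand-in for the VLM judge and is assumed to return `0` when it prefers the first candidate shown and `1` when it prefers the second; this interface is not taken from the ARR-RPO code.

```python
def position_consistency(judge, pairs):
    """Fraction of pairs whose winner is unchanged when candidate order is swapped.

    judge(prompt, first, second) -> 0 or 1 (index of the preferred candidate).
    pairs: iterable of (prompt, candidate_a, candidate_b) tuples.
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    consistent = 0
    for prompt, cand_a, cand_b in pairs:
        original = judge(prompt, cand_a, cand_b)
        swapped = judge(prompt, cand_b, cand_a)
        # The same candidate wins both times exactly when the returned
        # position flips along with the presentation order.
        if original != swapped:
            consistent += 1
    return consistent / len(pairs)


# Toy judges for illustration only: one is fully position-biased,
# the other decides on content (here, string length).
def always_first(prompt, first, second):
    return 0


def prefers_longer(prompt, first, second):
    return 0 if len(first) >= len(second) else 1


toy_pairs = [("p", "short", "much longer"), ("q", "aaaa", "b")]
print(position_consistency(always_first, toy_pairs))
print(position_consistency(prefers_longer, toy_pairs))
```

A score near 1.0 suggests the judge's decisions depend on content rather than on which candidate is presented first.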
## Limitations
- These are LoRA adapters and require the corresponding base model weights.
- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
- The ARR reward signal depends on the chosen VLM judge and rubric quality.
- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
- The model card does not guarantee safety filtering; users should apply appropriate content and policy filters for deployment.
## License
The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:
- `black-forest-labs/FLUX.1-dev`
- `Qwen/Qwen-Image-Edit`
## Citation
If you use these adapters, please cite the ARR-RPO project:
```bibtex
@misc{visionautorubric2026,
title = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
author = {Anonymous},
year = {2026},
note = {arXiv coming soon}
}
```
## Contact
For questions, issues, or updates, please use the project repository or the Hugging Face community tab.