---
license: apache-2.0
library_name: diffusers
tags:
- text-to-image
- image-to-image
- image-editing
- diffusers
- lora
- peft
- reinforcement-learning
- rubric-policy-optimization
- auto-rubric
base_model:
- black-forest-labs/FLUX.1-dev
- Qwen/Qwen-Image-Edit
---

# ARR-RPO

[Project Page](#) | [Code](#) | [Paper](#) | [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

## Model Description

ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:

- **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
- **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.

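The pairwise reward step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's training code: `judge_prefers_first` is a hypothetical stand-in for the frozen ARR judge, and the two constants are the reward values stated in the "How ARR-RPO Works" section (`+1.0` for the preferred candidate, `-0.1` for the dispreferred one).

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Reward values as stated in the "How ARR-RPO Works" section of this card.
PREFERRED_REWARD = 1.0
DISPREFERRED_REWARD = -0.1


def pairwise_rewards(
    candidate_a: T,
    candidate_b: T,
    judge_prefers_first: Callable[[T, T], bool],
) -> Tuple[float, float]:
    """Return (reward_a, reward_b) for two candidates from the same prompt.

    `judge_prefers_first` is a hypothetical callable standing in for the
    rubric-conditioned VLM judge: it returns True when the first argument
    is the preferred output.
    """
    if judge_prefers_first(candidate_a, candidate_b):
        return PREFERRED_REWARD, DISPREFERRED_REWARD
    return DISPREFERRED_REWARD, PREFERRED_REWARD


# Toy judge for demonstration only: prefers the longer string.
toy_judge = lambda a, b: len(a) > len(b)
print(pairwise_rewards("detailed output", "short", toy_judge))
```

In a real setup the judge call would wrap a VLM prompted with the rubric set; the toy judge here only exercises the control flow.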
## Model Details

| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |

### Adapter Files

```text
ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors
```

### FLUX Adapter Targets

The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
ff.net.0.proj, ff.net.2,
ff_context.net.0.proj, ff_context.net.2
```

### Qwen-Image-Edit Adapter Targets

The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:

```text
attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
```

## Intended Use

These adapters are intended for research and development on:

- improving text-to-image generation with rubric-guided preference rewards;
- improving instruction-guided image editing while preserving source-image content;
- studying Auto-Rubric as an interpretable alternative to scalar reward models;
- reproducing and extending ARR-RPO experiments.

They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.

## How ARR-RPO Works

ARR-RPO separates reward construction into explicit criteria and binary preference decisions:

```text
visual preference examples
  -> auto-generated rubrics
  -> verified and structured rubric set
  -> frozen VLM judge
  -> pairwise preference decision
  -> RPO binary reward
```

For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.

## Using The Models

Install a recent Diffusers/PEFT environment that supports the corresponding base model.

### FLUX.1-dev LoRA

```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the ARR-RPO LoRA adapter from its subfolder in the adapter repo.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
)
pipe.to("cuda")

image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")
```

### Qwen-Image-Edit LoRA

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base editing pipeline in bfloat16.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the ARR-RPO LoRA adapter from its subfolder in the adapter repo.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
)
pipe.to("cuda")

# The source image to edit, converted to RGB.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")
```

If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.

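Before running the editing example, it can help to check whether the installed package actually exposes the pipeline class. The helper below is a small sketch for that check; `find_first_class` is a hypothetical name introduced here for illustration, not part of Diffusers, and it only relies on the standard library.

```python
import importlib
from typing import Optional, Sequence


def find_first_class(module_name: str, class_names: Sequence[str]) -> Optional[object]:
    """Return the first attribute in `class_names` that `module_name` exposes.

    Returns None when the module is not installed or none of the names exist.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the package is not installed
    for name in class_names:
        try:
            return getattr(module, name)
        except (AttributeError, ImportError, RuntimeError):
            # Lazily loaded packages may raise on attribute access when a
            # name is missing or an optional dependency is unavailable.
            continue
    return None
```

For example, `find_first_class("diffusers", ["QwenImageEditPipeline"])` returns `None` when the name is missing from the installed release, in which case upgrading Diffusers or following the base model card is the safer route.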
## Effective Prompting

### FLUX Text-to-Image

The FLUX adapter works best with prompts that clearly specify:

- required objects and attributes;
- object counts;
- spatial relationships;
- style or medium;
- constraints that should not be ignored.

Example:

```text
A high-resolution product photo of two matte blue ceramic cups on a wooden table, with the smaller cup to the left of the larger cup, soft window lighting.
```

### Qwen-Image-Edit

The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.

Example:

```text
Change the shirt color to dark green while preserving the person's face, pose, background, lighting, and all other clothing details.
```

## Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.

| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Reward for preferred candidate | `+1.0` | `+1.0` |
| Reward for dispreferred candidate | `-0.1` | `-0.1` |
| Learning rate | `5e-5` | `1e-5` |
| PPO clip range | `0.2` | `0.2` |
| KL coefficient | `0.01` | `0.02` |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | `1.0` | `1.0` |
| LoRA rank | 16 | 32 |

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.

## Evaluation Summary

ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.

Recommended evaluation axes include:

- text-to-image prompt adherence and compositional correctness;
- image-edit instruction fulfillment;
- source-image preservation for editing;
- artifact control and visual coherence;
- pairwise human or VLM preference accuracy;
- position-bias checks by swapping candidate order.

## Limitations

- These are LoRA adapters and require the corresponding base model weights.
- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
- The ARR reward signal depends on the chosen VLM judge and rubric quality.
- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
- The model card does not guarantee safety filtering; users should apply appropriate content and policy filters for deployment.

## License

The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:

- `black-forest-labs/FLUX.1-dev`
- `Qwen/Qwen-Image-Edit`

## Citation

If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{visionautorubric2026,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
  note   = {arXiv coming soon}
}
```

## Contact

For questions, issues, or updates, please use the project repository or Hugging Face community tab.