metadata
license: apache-2.0
library_name: diffusers
tags:
  - text-to-image
  - image-to-image
  - image-editing
  - diffusers
  - lora
  - peft
  - reinforcement-learning
  - rubric-policy-optimization
  - auto-rubric
base_model:
  - black-forest-labs/FLUX.1-dev
  - Qwen/Qwen-Image-Edit

ARR-RPO

Project Page | Code | Paper | Model Weights

Model Description

ARR-RPO provides two LoRA adapters trained with Auto-Rubric as Reward (ARR) and Rubric Policy Optimization (RPO) for visual generation:

  • ARR-FLUX.1-dev/: a LoRA adapter for FLUX.1-dev text-to-image generation.
  • ARR-Qwen-Image-Edit/: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.

Model Details

| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| ARR-FLUX.1-dev | black-forest-labs/FLUX.1-dev | Text-to-image | 16 | 32 | Diffusers + PEFT |
| ARR-Qwen-Image-Edit | Qwen/Qwen-Image-Edit | Image editing | 32 | 64 | Diffusers + PEFT |

Adapter Files

ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors

FLUX Adapter Targets

The FLUX LoRA adapter is configured for FluxTransformer2DModel and targets attention and feed-forward modules, including:

attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2

Qwen-Image-Edit Adapter Targets

The Qwen-Image-Edit LoRA adapter is configured for QwenImageTransformer2DModel and targets attention projection modules, including:

attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
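
As a rough illustration of how the ranks, alphas, and target lists above fit together, the sketch below mirrors them in plain Python dictionaries shaped like PEFT's adapter_config.json fields (r, lora_alpha, target_modules). The suffix-matching helper and the example module paths are assumptions for illustration only; the shipped adapter_config.json files remain the source of truth and may list more modules or fields.

```python
# Illustrative only: values are copied from this model card, not read from the
# shipped adapter_config.json files, which may contain additional entries.
ATTN_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
]
FF_TARGETS = [
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]

ADAPTERS = {
    "ARR-FLUX.1-dev": {
        "r": 16,
        "lora_alpha": 32,
        "target_modules": ATTN_TARGETS + FF_TARGETS,
    },
    "ARR-Qwen-Image-Edit": {
        "r": 32,
        "lora_alpha": 64,
        "target_modules": ATTN_TARGETS,
    },
}


def lora_scaling(config):
    """Standard LoRA scaling applied to the low-rank update: alpha / r."""
    return config["lora_alpha"] / config["r"]


def is_targeted(module_name, config):
    """True if a fully qualified module name ends with a target suffix."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in config["target_modules"]
    )


for name, config in ADAPTERS.items():
    print(name, "scaling =", lora_scaling(config))

# Hypothetical module path, used only to show the suffix match.
print(is_targeted("transformer_blocks.0.attn.to_q", ADAPTERS["ARR-FLUX.1-dev"]))
```

Both adapters use alpha = 2 × rank, so under the standard alpha / r convention the low-rank update is scaled by 2.0 in each case.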

Intended Use

These adapters are intended for research and development on:

  • improving text-to-image generation with rubric-guided preference rewards;
  • improving instruction-guided image editing while preserving source-image content;
  • studying Auto-Rubric as an interpretable alternative to scalar reward models;
  • reproducing and extending ARR-RPO experiments.

They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.

How ARR-RPO Works

ARR-RPO separates reward construction into explicit criteria and binary preference decisions:

visual preference examples
  -> auto-generated rubrics
  -> verified and structured rubric set
  -> frozen VLM judge
  -> pairwise preference decision
  -> RPO binary reward

For pairwise RPO, the preferred candidate receives +1.0 and the dispreferred candidate receives -0.1.
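
The reward assignment described above can be sketched in a few lines of Python. The +1.0 and -0.1 values come from this card; `toy_judge` is a hypothetical keyword-matching stand-in for the frozen VLM judge, included only so the example runs without a model, and it operates on text descriptions rather than images.

```python
POSITIVE_REWARD = 1.0   # preferred candidate
NEGATIVE_REWARD = -0.1  # dispreferred candidate


def pairwise_rewards(preferred_index):
    """Return (reward_a, reward_b) given the judge's choice (0 or 1)."""
    if preferred_index not in (0, 1):
        raise ValueError("preferred_index must be 0 or 1")
    rewards = [NEGATIVE_REWARD, NEGATIVE_REWARD]
    rewards[preferred_index] = POSITIVE_REWARD
    return tuple(rewards)


def toy_judge(candidate_a, candidate_b, rubric):
    """Hypothetical judge: prefers the candidate matching more rubric items."""
    def score(candidate):
        return sum(1 for item in rubric if item in candidate)
    return 0 if score(candidate_a) >= score(candidate_b) else 1


rubric = ["two", "blue"]  # toy rubric items, not the real ARR rubrics
choice = toy_judge("two blue cups", "one red cup", rubric)
print(pairwise_rewards(choice))  # (1.0, -0.1)
```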

Using The Models

Install recent versions of Diffusers and PEFT that support the corresponding base model.

FLUX.1-dev LoRA

import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base pipeline in bfloat16 to reduce memory use.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the text-to-image LoRA from its subfolder in the adapter repo.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
)
pipe.to("cuda")

# Generate one image and save it to disk.
image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")

Qwen-Image-Edit LoRA

import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base editing pipeline in bfloat16 to reduce memory use.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the image-editing LoRA from its subfolder in the adapter repo.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
)
pipe.to("cuda")

# Edit the source image according to the instruction and save the result.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")

If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.

Effective Prompting

FLUX Text-to-Image

The FLUX adapter works best with prompts that clearly specify:

  • required objects and attributes;
  • object counts;
  • spatial relationships;
  • style or medium;
  • constraints that should not be ignored.

Example:

A high-resolution product photo of two matte blue ceramic cups on a wooden table,
with the smaller cup to the left of the larger cup, soft window lighting.
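
One way to keep these elements explicit is to assemble the prompt from named fields. The helper below is a hypothetical convenience, not part of the adapter or of Diffusers; it simply joins the pieces in the order used by the example above.

```python
def build_t2i_prompt(style, count, attributes, subject, relations=(), constraints=()):
    """Join structured fields into a single text-to-image prompt."""
    head = f"{style} of {count} {' '.join(attributes)} {subject}"
    parts = [head, *relations, *constraints]
    # Drop empty fields, then join with commas and end with a period.
    return ", ".join(p.strip() for p in parts if p.strip()) + "."


prompt = build_t2i_prompt(
    style="A high-resolution product photo",
    count="two",
    attributes=["matte", "blue", "ceramic"],
    subject="cups on a wooden table",
    relations=["with the smaller cup to the left of the larger cup"],
    constraints=["soft window lighting"],
)
print(prompt)
```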

Qwen-Image-Edit

The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.

Example:

Change the shirt color to dark green while preserving the person's face, pose,
background, lighting, and all other clothing details.
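
The separation between the requested change and the preserved content can be made explicit when instructions are built programmatically. The function below is a hypothetical helper for illustration, not an API of the pipeline; it reproduces the example instruction above from its two parts.

```python
def build_edit_instruction(change, preserve=()):
    """Combine one requested change with a list of things to keep unchanged."""
    change = change.strip().rstrip(".")
    preserve = [item.strip() for item in preserve if item.strip()]
    if not preserve:
        return change + "."
    if len(preserve) <= 2:
        kept = " and ".join(preserve)
    else:
        kept = ", ".join(preserve[:-1]) + ", and " + preserve[-1]
    return f"{change} while preserving {kept}."


instruction = build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting",
     "all other clothing details"],
)
print(instruction)
```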

Training Details

ARR-RPO was trained with LoRA and pairwise online preference optimization.

| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Positive reward | 1.0 | 1.0 |
| Negative reward | -0.1 | -0.1 |
| Learning rate | 5e-5 | 1e-5 |
| PPO clip range | 0.2 | 0.2 |
| KL coefficient | 0.01 | 0.02 |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | 1.0 | 1.0 |
| LoRA rank | 16 | 32 |

The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
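
This card lists a PPO clip range and a KL coefficient but does not spell out the loss. As a minimal sketch, assuming RPO uses the standard PPO clipped surrogate with a KL penalty and feeds the binary reward in directly as the advantage, the per-sample objective would look like the following. The actual training code may normalize rewards or differ in other ways; the `ratio` and `kl` inputs here are made-up numbers.

```python
def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


def penalized_objective(ratio, advantage, kl, clip_range=0.2, kl_coef=0.01):
    """Clipped surrogate minus a KL penalty toward the reference policy."""
    return clipped_surrogate(ratio, advantage, clip_range) - kl_coef * kl


# Preferred candidate (+1.0): gains are capped once the ratio exceeds 1.2.
print(clipped_surrogate(ratio=1.5, advantage=1.0))
# Dispreferred candidate (-0.1): the penalty is floored once the ratio drops below 0.8.
print(clipped_surrogate(ratio=0.5, advantage=-0.1))
# FLUX setting from the table above (KL coefficient 0.01) with an illustrative KL value.
print(penalized_objective(ratio=1.5, advantage=1.0, kl=0.5, kl_coef=0.01))
```

Under this assumption, the clip range bounds how far a single update can push the policy toward the preferred sample or away from the dispreferred one, while the KL term discourages drift from the base model.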

Evaluation Summary

ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.

Recommended evaluation axes include:

  • text-to-image prompt adherence and compositional correctness;
  • image-edit instruction fulfillment;
  • source-image preservation for editing;
  • artifact control and visual coherence;
  • pairwise human or VLM preference accuracy;
  • position-bias checks by swapping candidate order.
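
The position-bias check in the last item can be sketched as follows: each pair is judged twice, once in each order, and the judge counts as consistent only if it picks the same underlying candidate both times. The judge callables here are hypothetical toy functions standing in for a real VLM judge.

```python
def position_consistency(pairs, judge):
    """Fraction of pairs where the judge picks the same candidate in both orders.

    `judge(x, y)` must return 0 if it prefers x and 1 if it prefers y.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    consistent = 0
    for a, b in pairs:
        original = judge(a, b)  # index into (a, b)
        swapped = judge(b, a)   # index into (b, a)
        # The same candidate was chosen exactly when the two indices differ.
        if original != swapped:
            consistent += 1
    return consistent / len(pairs)


def always_first(x, y):
    """Fully position-biased toy judge: always prefers the first slot."""
    return 0


def prefers_longer(x, y):
    """Content-based toy judge: prefers the longer string."""
    return 0 if len(x) >= len(y) else 1


pairs = [("aaa", "b"), ("c", "dd")]
print(position_consistency(pairs, always_first))    # 0.0
print(position_consistency(pairs, prefers_longer))  # 1.0
```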

Limitations

  • These are LoRA adapters and require the corresponding base model weights.
  • Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
  • The ARR reward signal depends on the chosen VLM judge and rubric quality.
  • Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
  • These adapters come with no safety-filtering guarantees; users should apply appropriate content and policy filters before deployment.

License

The model card metadata declares apache-2.0. Users must also comply with the licenses and terms of the base models:

  • black-forest-labs/FLUX.1-dev
  • Qwen/Qwen-Image-Edit

Citation

If you use these adapters, please cite the ARR-RPO project:

@misc{visionautorubric2026,
  title        = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author       = {Anonymous},
  year         = {2026},
  note         = {arXiv coming soon}
}

Contact

For questions, issues, or updates, please use the project repository or Hugging Face community tab.