	---
	license: apache-2.0
	library_name: diffusers
	tags:
	- text-to-image
	- image-to-image
	- image-editing
	- diffusers
	- lora
	- peft
	- reinforcement-learning
	- rubric-policy-optimization
	- auto-rubric
	base_model:
	- black-forest-labs/FLUX.1-dev
	- Qwen/Qwen-Image-Edit
	---

	# ARR-RPO

	[Project Page](#) \| [Code](#) \| [Paper](#) \| [Model Weights](https://huggingface.co/OpenEnvisionLab/ARR-RPO)

	## Model Description

	ARR-RPO provides two LoRA adapters trained with Auto-Rubric as Reward (ARR) and Rubric Policy Optimization (RPO) for visual generation:

	- `ARR-FLUX.1-dev/`: a LoRA adapter for FLUX.1-dev text-to-image generation.
	- `ARR-Qwen-Image-Edit/`: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.

ARR-RPO uses a frozen vision-language model (VLM) judge conditioned on explicit, auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred one, and the preferred and dispreferred candidates receive fixed binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.

	## Model Details

| Adapter | Base model | Task | LoRA rank | LoRA alpha | Framework |
| --- | --- | --- | --- | --- | --- |
| `ARR-FLUX.1-dev` | `black-forest-labs/FLUX.1-dev` | Text-to-image | 16 | 32 | Diffusers + PEFT |
| `ARR-Qwen-Image-Edit` | `Qwen/Qwen-Image-Edit` | Image editing | 32 | 64 | Diffusers + PEFT |

	### Adapter Files

```text
ARR-RPO/
  ARR-FLUX.1-dev/
    adapter_config.json
    adapter_model.safetensors
  ARR-Qwen-Image-Edit/
    adapter_config.json
    adapter_model.safetensors
```
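To sanity-check a downloaded adapter, it can help to read its `adapter_config.json` and confirm the rank and alpha. The sketch below assumes the files follow the usual PEFT layout with `r`, `lora_alpha`, and `target_modules` keys. The JSON excerpt is hypothetical: only the rank and alpha values are taken from the Model Details table. The scaling factor shown is the standard LoRA ratio `lora_alpha / r`.

```python
import json

# Hypothetical excerpt of ARR-FLUX.1-dev/adapter_config.json. The values of
# "r" and "lora_alpha" come from the Model Details table; the rest is illustrative.
EXAMPLE_CONFIG = """
{
  "peft_type": "LORA",
  "r": 16,
  "lora_alpha": 32,
  "target_modules": ["attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0"]
}
"""


def summarize_adapter_config(config: dict) -> dict:
    """Return rank, alpha, and the standard LoRA scaling factor alpha / r."""
    rank = config["r"]
    alpha = config["lora_alpha"]
    return {
        "rank": rank,
        "alpha": alpha,
        "scaling": alpha / rank,
        "num_targets": len(config.get("target_modules", [])),
    }


summary = summarize_adapter_config(json.loads(EXAMPLE_CONFIG))
print(summary)
```

With the values in the table, both adapters would have the same scaling factor, since 32 / 16 and 64 / 32 are both 2.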

	### FLUX Adapter Targets

	The FLUX LoRA adapter is configured for `FluxTransformer2DModel` and targets attention and feed-forward modules, including:

	```text
	attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
	attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out,
	ff.net.0.proj, ff.net.2, ff_context.net.0.proj, ff_context.net.2
	```
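As a rough illustration of how such a target list selects layers, the sketch below applies suffix matching to module names, which is how PEFT commonly resolves a list of `target_modules`. The helper `is_targeted` and the example module paths are hypothetical and are not read from the actual adapter or model.

```python
# Target suffixes copied from the list above.
FLUX_TARGETS = [
    "attn.to_q", "attn.to_k", "attn.to_v", "attn.to_out.0",
    "attn.add_q_proj", "attn.add_k_proj", "attn.add_v_proj", "attn.to_add_out",
    "ff.net.0.proj", "ff.net.2", "ff_context.net.0.proj", "ff_context.net.2",
]


def is_targeted(module_name: str, targets=FLUX_TARGETS) -> bool:
    """True if the module name equals a target or ends with '.<target>'."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Illustrative module paths (assumed, not taken from the checkpoint).
for name in [
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.ff.net.2",
    "transformer_blocks.0.norm1.linear",
]:
    print(name, "->", is_targeted(name))
```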

	### Qwen-Image-Edit Adapter Targets

	The Qwen-Image-Edit LoRA adapter is configured for `QwenImageTransformer2DModel` and targets attention projection modules, including:

	```text
	attn.to_q, attn.to_k, attn.to_v, attn.to_out.0,
	attn.add_q_proj, attn.add_k_proj, attn.add_v_proj, attn.to_add_out
	```

	## Intended Use

	These adapters are intended for research and development on:

	- improving text-to-image generation with rubric-guided preference rewards;
	- improving instruction-guided image editing while preserving source-image content;
	- studying Auto-Rubric as an interpretable alternative to scalar reward models;
	- reproducing and extending ARR-RPO experiments.

	They are not intended for safety-critical, medical, legal, or identity-sensitive decision-making. Generated or edited images should be reviewed before use in downstream products.

	## How ARR-RPO Works

	ARR-RPO separates reward construction into explicit criteria and binary preference decisions:

	```text
	visual preference examples
	-> auto-generated rubrics
	-> verified and structured rubric set
	-> frozen VLM judge
	-> pairwise preference decision
	-> RPO binary reward
	```

	For pairwise RPO, the preferred candidate receives `+1.0` and the dispreferred candidate receives `-0.1`.
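The reward assignment for one pair can be sketched as follows. This is a minimal illustration, not the project's training code: `judge_prefers_first` is a hypothetical callable standing in for the frozen VLM judge, and the toy judge below exists only to make the example runnable.

```python
from typing import Callable, Tuple

POSITIVE_REWARD = 1.0   # reward for the preferred candidate
NEGATIVE_REWARD = -0.1  # reward for the dispreferred candidate


def pairwise_rewards(
    candidate_a,
    candidate_b,
    judge_prefers_first: Callable[[object, object], bool],
) -> Tuple[float, float]:
    """Return (reward_a, reward_b) from a single pairwise judge decision."""
    if judge_prefers_first(candidate_a, candidate_b):
        return POSITIVE_REWARD, NEGATIVE_REWARD
    return NEGATIVE_REWARD, POSITIVE_REWARD


def toy_judge(first, second) -> bool:
    """Placeholder judge: prefers the longer string. Not the ARR judge."""
    return len(first) >= len(second)


print(pairwise_rewards("detailed candidate", "short", toy_judge))
```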

	## Using The Models

Install recent versions of Diffusers and PEFT that support the corresponding base model.
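Before loading an adapter, a quick check of the installed package versions can save a failed download. The sketch below uses only the standard library. The package list is an assumption about what a typical Diffusers + PEFT setup needs, and the model card does not state minimum versions.

```python
from importlib import metadata


def installed_versions(packages=("diffusers", "peft", "torch", "transformers")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```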

	### FLUX.1-dev LoRA

```python
import torch
from diffusers import FluxPipeline

base_model = "black-forest-labs/FLUX.1-dev"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base FLUX.1-dev pipeline in bfloat16.
pipe = FluxPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the ARR-RPO LoRA adapter stored in the repository subfolder.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-FLUX.1-dev",
)
pipe.to("cuda")

# Generate one image and save it to disk.
image = pipe(
    "A cinematic portrait of a ceramic robot chef in a warm kitchen.",
    guidance_scale=3.5,
    num_inference_steps=30,
).images[0]
image.save("arr_flux_example.png")
```

	### Qwen-Image-Edit LoRA

```python
import torch
from PIL import Image
from diffusers import QwenImageEditPipeline

base_model = "Qwen/Qwen-Image-Edit"
adapter_repo = "OpenEnvisionLab/ARR-RPO"

# Load the base Qwen-Image-Edit pipeline in bfloat16.
pipe = QwenImageEditPipeline.from_pretrained(
    base_model,
    torch_dtype=torch.bfloat16,
)
# Attach the ARR-RPO LoRA adapter stored in the repository subfolder.
pipe.load_lora_weights(
    adapter_repo,
    subfolder="ARR-Qwen-Image-Edit",
)
pipe.to("cuda")

# Edit the source image according to the instruction and save the result.
source = Image.open("source.png").convert("RGB")
image = pipe(
    image=source,
    prompt="Replace the sky with a sunset while preserving the building.",
    num_inference_steps=30,
).images[0]
image.save("arr_qwen_edit_example.png")
```

	If your Diffusers version uses a different Qwen-Image-Edit pipeline class or call signature, keep the same adapter subfolder and follow the base model's official loading example.
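One way to cope with class-name differences across versions is to look the pipeline class up by name and fall back to a generic loader. The helper `resolve_class` below is a hypothetical convenience, not part of Diffusers. The commented usage assumes that `DiffusionPipeline.from_pretrained` can resolve the concrete pipeline from the base model repository.

```python
import importlib


def resolve_class(candidates, module_name="diffusers"):
    """Return the first attribute named in `candidates` that the module exposes."""
    module = importlib.import_module(module_name)
    for name in candidates:
        cls = getattr(module, name, None)
        if cls is not None:
            return cls
    raise ImportError(f"none of {list(candidates)} found in {module_name!r}")


# Possible usage (requires diffusers to be installed):
# pipeline_cls = resolve_class(["QwenImageEditPipeline", "DiffusionPipeline"])
# pipe = pipeline_cls.from_pretrained("Qwen/Qwen-Image-Edit")
# pipe.load_lora_weights("OpenEnvisionLab/ARR-RPO", subfolder="ARR-Qwen-Image-Edit")
```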

	## Effective Prompting

	### FLUX Text-to-Image

	The FLUX adapter works best with prompts that clearly specify:

	- required objects and attributes;
	- object counts;
	- spatial relationships;
	- style or medium;
	- constraints that should not be ignored.

	Example:

	```text
	A high-resolution product photo of two matte blue ceramic cups on a wooden table,
	with the smaller cup to the left of the larger cup, soft window lighting.
	```
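For batch experiments, prompts of this shape can be assembled from structured fields. The helper below is a hypothetical convenience for illustration only. It simply joins the pieces with commas and is not required by the adapter.

```python
def build_t2i_prompt(style, subject, relations=(), constraints=()):
    """Compose a prompt from style/medium, subject, spatial relations, and constraints."""
    parts = [f"{style} of {subject}"]
    parts.extend(relations)
    parts.extend(constraints)
    # Drop empty pieces, join with commas, and end with a period.
    return ", ".join(p.strip() for p in parts if p.strip()) + "."


prompt = build_t2i_prompt(
    style="A high-resolution product photo",
    subject="two matte blue ceramic cups on a wooden table",
    relations=["with the smaller cup to the left of the larger cup"],
    constraints=["soft window lighting"],
)
print(prompt)
```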

	### Qwen-Image-Edit

	The Qwen-Image-Edit adapter works best with edit instructions that clearly separate the requested change from content that should remain unchanged.

	Example:

	```text
	Change the shirt color to dark green while preserving the person's face, pose,
	background, lighting, and all other clothing details.
	```
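The same separation between the requested change and the preserved content can be made explicit in code. The helper below is a hypothetical sketch that builds an instruction from one change and a list of elements to keep. The phrasing template is an assumption, not a format the adapter requires.

```python
def build_edit_instruction(change, preserve=()):
    """Combine a requested change with the content that must stay unchanged."""
    change = change.strip().rstrip(".")
    kept = [item.strip() for item in preserve if item.strip()]
    if not kept:
        return change + "."
    if len(kept) <= 2:
        kept_text = " and ".join(kept)
    else:
        kept_text = ", ".join(kept[:-1]) + ", and " + kept[-1]
    return f"{change} while preserving {kept_text}."


instruction = build_edit_instruction(
    "Change the shirt color to dark green",
    ["the person's face", "pose", "background", "lighting", "all other clothing details"],
)
print(instruction)
```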

	## Training Details

	ARR-RPO was trained with LoRA and pairwise online preference optimization.

| Hyperparameter | FLUX.1-dev | Qwen-Image-Edit |
| --- | --- | --- |
| Training method | RPO with ARR reward | RPO with ARR reward |
| Candidates per prompt | 2 | 2 |
| Positive reward | `1.0` | `1.0` |
| Negative reward | `-0.1` | `-0.1` |
| Learning rate | `5e-5` | `1e-5` |
| PPO clip range | `0.2` | `0.2` |
| KL coefficient | `0.01` | `0.02` |
| Sampling steps during training | 8 | 10 |
| Optimizer | AdamW | AdamW |
| Gradient clipping | `1.0` | `1.0` |
| LoRA rank | 16 | 32 |

	The reward judge is a frozen VLM conditioned on auto-generated visual rubrics. No trainable scalar reward model is required.
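To show how the clip range and KL coefficient in the table might enter an update, the sketch below computes a standard PPO-style clipped surrogate with a KL penalty for a single sample. This is an assumption about the general form of the objective, not the project's actual RPO loss. In particular, using the binary reward directly as the advantage and passing in a precomputed `kl_estimate` are simplifications made for illustration.

```python
import math

CLIP_RANGE = 0.2  # "PPO clip range" from the table


def clipped_objective(logp_new, logp_old, reward, clip_range=CLIP_RANGE):
    """PPO-style clipped surrogate for one sample, with the reward used as advantage."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * reward, clipped_ratio * reward)


def penalized_loss(logp_new, logp_old, reward, kl_estimate, kl_coef=0.01):
    """Negative clipped surrogate plus a KL penalty (quantity to minimize)."""
    return -clipped_objective(logp_new, logp_old, reward) + kl_coef * kl_estimate


# A large policy shift on the preferred sample is capped by the clip range.
print(clipped_objective(logp_new=0.5, logp_old=0.0, reward=1.0))
```

In this form, the clip limits how far a single preferred sample can pull the policy, while the KL term keeps the adapted model close to a reference.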

	## Evaluation Summary

	ARR-RPO is designed to improve alignment with multi-dimensional visual preferences. In the associated experiments, ARR-RPO improves over the corresponding unaligned base models on text-to-image and image-editing benchmarks, with gains attributed to explicit rubric-conditioned reward signals rather than opaque scalar regression.

	Recommended evaluation axes include:

	- text-to-image prompt adherence and compositional correctness;
	- image-edit instruction fulfillment;
	- source-image preservation for editing;
	- artifact control and visual coherence;
	- pairwise human or VLM preference accuracy;
	- position-bias checks by swapping candidate order.
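The position-bias check in the last item can be sketched as a consistency rate: each pair is judged in both orders, and a verdict counts as consistent only if the same candidate wins both times. The callable `judge_prefers_first` is a hypothetical stand-in for a human or VLM judge, and the two toy judges exist only to make the example runnable.

```python
def position_consistency(pairs, judge_prefers_first):
    """Fraction of pairs whose winner is unchanged when candidate order is swapped."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    consistent = 0
    for a, b in pairs:
        forward = judge_prefers_first(a, b)   # True means `a` wins
        backward = judge_prefers_first(b, a)  # True means `b` wins
        # Consistent when exactly one of the two calls picks its first argument.
        if forward != backward:
            consistent += 1
    return consistent / len(pairs)


def always_first(first, second):
    """Fully position-biased toy judge."""
    return True


def prefers_longer(first, second):
    """Content-based toy judge: picks the longer string."""
    return len(first) > len(second)


pairs = [("a detailed render", "blurry"), ("ok", "a sharp, faithful image")]
print(position_consistency(pairs, always_first))
print(position_consistency(pairs, prefers_longer))
```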

	## Limitations

	- These are LoRA adapters and require the corresponding base model weights.
	- Output quality still depends on the base model, prompt quality, scheduler, seed, and inference settings.
	- The ARR reward signal depends on the chosen VLM judge and rubric quality.
	- Image editing may still alter unrelated source-image regions, especially under ambiguous instructions.
	- The model card does not guarantee safety filtering; users should apply appropriate content and policy filters for deployment.

	## License

	The model card metadata declares `apache-2.0`. Users must also comply with the licenses and terms of the base models:

	- `black-forest-labs/FLUX.1-dev`
	- `Qwen/Qwen-Image-Edit`

	## Citation

	If you use these adapters, please cite the ARR-RPO project:

```bibtex
@misc{visionautorubric2026,
  title  = {Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria},
  author = {Anonymous},
  year   = {2026},
  note   = {arXiv coming soon}
}
```

	## Contact

	For questions, issues, or updates, please use the project repository or Hugging Face community tab.