Ferry1231 committed on
Commit
80410e0
·
1 Parent(s): ce0fdd0

update model card

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -1,6 +1,5 @@
1
  ---
2
-
3
- ## license: apache-2.0
4
  library_name: diffusers
5
  tags:
6
  - text-to-image
@@ -15,6 +14,7 @@ tags:
15
  base_model:
16
  - black-forest-labs/FLUX.1-dev
17
  - Qwen/Qwen-Image-Edit
 
18
 
19
  # ARR-RPO
20
 
@@ -24,8 +24,8 @@ base_model:
24
 
25
  ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:
26
 
27
- - `**ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
28
- - `**ARR-Qwen-Image-Edit/**`: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
29
 
30
  ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
31
 
 
1
  ---
2
+ license: apache-2.0
 
3
  library_name: diffusers
4
  tags:
5
  - text-to-image
 
14
  base_model:
15
  - black-forest-labs/FLUX.1-dev
16
  - Qwen/Qwen-Image-Edit
17
+ ---
18
 
19
  # ARR-RPO
20
 
 
24
 
25
  ARR-RPO provides two LoRA adapters trained with **Auto-Rubric as Reward (ARR)** and **Rubric Policy Optimization (RPO)** for visual generation:
26
 
27
+ - **`ARR-FLUX.1-dev/`**: a LoRA adapter for FLUX.1-dev text-to-image generation.
28
+ - **`ARR-Qwen-Image-Edit/`**: a LoRA adapter for Qwen-Image-Edit instruction-guided image editing.
29
 
30
  ARR-RPO uses a frozen VLM judge conditioned on explicit auto-generated rubrics. During RPO training, two candidate outputs are sampled for the same prompt or edit instruction, the ARR judge selects the preferred output, and the preferred/dispreferred candidates receive binary rewards. The goal is to improve prompt faithfulness, visual quality, compositional alignment, and edit fidelity without training a separate scalar reward model.
31
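
The pairwise reward step described in the README paragraph above can be sketched roughly as follows. This is a minimal illustration, not the ARR-RPO training code: the names `pairwise_binary_rewards` and `toy_judge` are hypothetical, candidates are represented as text descriptions rather than images, the judge is a keyword-counting stand-in for the frozen VLM, and the reward values 1.0 and 0.0 are an assumed encoding of "binary rewards".

```python
from typing import Callable, Sequence, Tuple

# Hypothetical judge signature: returns 0 if the first candidate is
# preferred, 1 if the second is preferred.
Judge = Callable[[str, str, str, Sequence[str]], int]


def pairwise_binary_rewards(
    prompt: str,
    cand_a: str,
    cand_b: str,
    rubric: Sequence[str],
    judge: Judge,
) -> Tuple[float, float]:
    """Return (reward_a, reward_b) for two candidates of the same prompt.

    The preferred candidate gets 1.0 and the other gets 0.0
    (assumed encoding; the README only says the rewards are binary).
    """
    preferred = judge(prompt, cand_a, cand_b, rubric)
    if preferred not in (0, 1):
        raise ValueError("judge must return 0 or 1")
    return (1.0, 0.0) if preferred == 0 else (0.0, 1.0)


def toy_judge(prompt: str, cand_a: str, cand_b: str, rubric: Sequence[str]) -> int:
    """Stand-in for the frozen VLM judge: counts rubric keywords per candidate."""
    def score(candidate: str) -> int:
        return sum(keyword in candidate for keyword in rubric)

    # Ties go to the first candidate in this toy version.
    return 0 if score(cand_a) >= score(cand_b) else 1


rubric = ["red", "cube"]
rewards = pairwise_binary_rewards(
    "a red cube", "red cube on a table", "blue sphere", rubric, toy_judge
)
print(rewards)
```

In an actual training loop the two candidates would be sampled images or edits from the policy, and the returned pair would feed the policy update; no separate scalar reward model is involved, only the judge's preference.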