Improve model card metadata and usage

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +25 -78
README.md CHANGED
@@ -1,11 +1,19 @@
1
  ---
2
- license: apache-2.0
3
- language:
4
- - en
5
  base_model:
6
  - stabilityai/stable-diffusion-3.5-medium
7
  pipeline_tag: text-to-image
8
  ---
 
9
  <div align="center">
10
 
11
  <img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
@@ -14,7 +22,7 @@ pipeline_tag: text-to-image
14
 
15
  <p><b>Reward-Tilted DMD &nbsp;·&nbsp; Ambient-Consistent Distillation &nbsp;·&nbsp; Hybrid Policy Gradient</b></p>
16
 
17
- [![Paper](https://img.shields.io/badge/paper-arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.26108)
18
  [![Github](https://img.shields.io/badge/Harahan%2FRTDMD-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harahan/RTDMD)
19
  [![Hugging Face Collection](https://img.shields.io/badge/RTDMD_Collection-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/Harahan/rtdmd)
20
 
@@ -39,63 +47,25 @@ pipeline_tag: text-to-image
39
 
40
  ## 📖 Abstract
41
 
42
- We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a
43
- two-stage framework that unifies distribution-matching distillation with
44
- reward-guided RL for few-step flow generators. Minimizing the KL divergence to
45
- a *reward-tilted teacher distribution* decomposes naturally into a
46
- **distribution-matching** term and a **reward-maximization** term, instantiated
47
- as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy
48
- gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
49
- With **4 NFE** RTDMD reaches new SOTA on SD3-M / SD3.5-M / FLUX.2 4B; the
50
- distilled FLUX.2 4B even beats the full FLUX.2 9B teacher (50 NFE) on most
51
- rewards.
52
-
53
- <table align="center">
54
- <tr>
55
- <td align="center" width="50%">
56
- <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/MLr2YHfmAvKYQsfA50uOf.png" alt="RTDMD teaser" width="100%">
57
- <br/>
58
- <em>4-step samples from RTDMD-distilled FLUX.2 4B (no classifier-free guidance).</em>
59
- </td>
60
- <td align="center" width="50%">
61
- <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/UJnH2QqNCw4aJDFgwtfjP.png" alt="RTDMD comparison" width="100%">
62
- <br/>
63
- <em>Qualitative comparison for few-step diffusion models (4 NFE).</em>
64
- </td>
65
- </tr>
66
- </table>
67
 
68
  ---
69
 
70
  ## 🍭 Method Overview
71
 
72
-
73
  <div align="center">
74
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
75
  <br/>
76
- <em>RTDMD overview. <b>Det.</b> = deterministic final step, <b>Stoc.</b> = stochastic intermediate steps. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
77
  </div>
78
 
79
- For the generator $G_\theta$, the reward-tilted KL objective decomposes as
80
-
81
- $$
82
- \nabla_\theta D_{\text{KL}}(p_\theta \| \tilde{p}_\psi) =
83
- \underbrace{\nabla_\theta D_{\text{KL}}(p_\theta \| p_\psi)}_{\text{distribution matching}} - \beta\underbrace{\nabla_\theta \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)]}_{\text{reward maximization}}.
84
- $$
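The decomposition above follows in one line if the reward-tilted teacher is taken to be an exponential tilt of the teacher distribution. That form is an assumption here, since this page does not define $\tilde{p}_\psi$ explicitly, and the normalizer $Z$ is introduced only for illustration:

$$
\tilde{p}_\psi(\mathbf{x}) = \frac{p_\psi(\mathbf{x})\,\exp\big(\beta\, r(\mathbf{x})\big)}{Z},
\qquad
Z = \mathbb{E}_{\mathbf{x} \sim p_\psi}\big[\exp\big(\beta\, r(\mathbf{x})\big)\big]
$$

$$
D_{\text{KL}}(p_\theta \| \tilde{p}_\psi)
= \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}\big[\log p_\theta(\hat{\mathbf{x}}_0) - \log p_\psi(\hat{\mathbf{x}}_0) - \beta\, r(\hat{\mathbf{x}}_0)\big] + \log Z
= D_{\text{KL}}(p_\theta \| p_\psi) - \beta\, \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)] + \log Z
$$

Since $\log Z$ does not depend on $\theta$, applying $\nabla_\theta$ drops it and leaves the distribution-matching and reward-maximization terms.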
85
-
86
- The two terms map directly to the two trainers exposed by the CLI:
87
-
88
- | Stage | Trainer | Key knobs |
89
- | --- | --- | --- |
90
- | 1. AC-DMD cold start | `ACDMDTrainer` (`--trainer ac_dmd`) | sub-interval renoising, consistency weight `γ`, CPS sampler `η = 0.9` |
91
- | 2. RTDMD RL fine-tune | `RTDMDTrainer` (`--trainer rtdmd`) | SubGRPO + final-step BP + AC-DMD |
92
-
93
  ---
94
 
95
  ## 📦 Contents
96
 
97
- This repository hosts the 4-NFE LoRA checkpoints distilled from
98
- **Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
99
 
100
  ```
101
  .
@@ -105,11 +75,7 @@ This repository hosts the 4-NFE LoRA checkpoints distilled from
105
  └── generator_ema.pt # Stage-2 RTDMD LoRA (stacked on top of cold_start)
106
  ```
107
 
108
- Each `generator_ema.pt` is a `torch.save`-d `state_dict` containing only LoRA
109
- adapter keys (`lora_A` / `lora_B`, rank **32**, alpha **64**). The two adapters
110
- are designed to be **stacked**: the cold-start LoRA distills SD3.5-M down to
111
- 4 NFE, and the RTDMD LoRA further fine-tunes that distilled model with
112
- reward-tilted RL.
113
 
114
  ---
115
 
@@ -117,25 +83,12 @@ reward-tilted RL.
117
 
118
  ### Option 1: RTDMD inference CLI (recommended)
119
 
120
- The simplest path is to clone the RTDMD repo and let it stack both LoRAs and
121
- run the CPS sampler for you:
122
-
123
- ```bash
124
- git clone https://github.com/Harahan/RTDMD.git && cd RTDMD
125
- pip install -r requirements.txt && pip install -e .
126
-
127
- # Download this repo
128
- huggingface-cli download Harahan/SD35M-RTDMD --local-dir ./ckpts/sd35m
129
-
130
- # Run 4-NFE inference (single GPU)
131
- python inference.py configs/inference/sd35m.yaml \
132
- --override lora_paths='["./ckpts/sd35m/cold_start/generator_ema.pt","./ckpts/sd35m/rtdmd/generator_ema.pt"]' \
133
- --override eval_reward=false \
134
- --prompt "a cute cat sitting on a windowsill"
135
- ```
136
 
137
  ### Option 2: Plain diffusers
138
 
139
  ```python
140
  import torch
141
  from diffusers import StableDiffusion3Pipeline
@@ -160,16 +113,13 @@ for ckpt in ["cold_start/generator_ema.pt", "rtdmd/generator_ema.pt"]:
160
  state = torch.load(path, map_location="cpu", weights_only=False)
161
  pipe.transformer.load_state_dict(state, strict=False)
162
 
163
- # 4-step CPS sampling
164
  pipe(prompt="a cute cat sitting on a windowsill",
165
  num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
166
  ```
167
 
168
- > **Note:** RTDMD is trained on the CPS (Coefficients-Preserving Sampling)
169
- > scheduler with `η = 0.9`. Using the default Flow-Matching Euler scheduler
170
- > will still produce reasonable samples at 4 NFE, but the RTDMD inference CLI
171
- > is the only entry point that reproduces the paper numbers exactly.
172
-
173
  ---
174
 
175
  ## 📄 Citation
@@ -190,7 +140,4 @@ pipe(prompt="a cute cat sitting on a windowsill",
190
 
191
  ## ⚖️ License
192
 
193
- Apache 2.0, same as the upstream
194
- [RTDMD](https://github.com/Harahan/RTDMD) repo. The base model
195
- [`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
196
- is governed by its own license; please review and comply with it separately.
 
1
  ---
2
  base_model:
3
  - stabilityai/stable-diffusion-3.5-medium
4
+ language:
5
+ - en
6
+ license: apache-2.0
7
  pipeline_tag: text-to-image
8
+ library_name: diffusers
9
+ tags:
10
+ - lora
11
+ - flow-matching
12
+ - distillation
13
+ - stable-diffusion
14
+ - rtdmd
15
  ---
16
+
17
  <div align="center">
18
 
19
  <img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
 
22
 
23
  <p><b>Reward-Tilted DMD &nbsp;·&nbsp; Ambient-Consistent Distillation &nbsp;·&nbsp; Hybrid Policy Gradient</b></p>
24
 
25
+ [![Paper](https://img.shields.io/badge/paper-arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://huggingface.co/papers/2605.26108)
26
  [![Github](https://img.shields.io/badge/Harahan%2FRTDMD-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harahan/RTDMD)
27
  [![Hugging Face Collection](https://img.shields.io/badge/RTDMD_Collection-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/Harahan/rtdmd)
28
 
 
47
 
48
  ## 📖 Abstract
49
 
50
+ This repository contains the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** using the framework proposed in the paper [Reinforcing Few-step Generators via Reward-Tilted Distribution Matching](https://huggingface.co/papers/2605.26108).
51
+
52
+ We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a two-stage framework that unifies distribution-matching distillation with reward-guided RL for few-step flow generators. Minimizing the KL divergence to a *reward-tilted teacher distribution* decomposes naturally into a **distribution-matching** term and a **reward-maximization** term, instantiated as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
53
 
54
  ---
55
 
56
  ## 🍭 Method Overview
57
 
 
58
  <div align="center">
59
  <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
60
  <br/>
61
+ <em>RTDMD overview. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
62
  </div>
63
 
64
  ---
65
 
66
  ## 📦 Contents
67
 
68
+ This repository hosts the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
 
69
 
70
  ```
71
  .
 
75
  └── generator_ema.pt # Stage-2 RTDMD LoRA (stacked on top of cold_start)
76
  ```
77
 
78
+ Each `generator_ema.pt` is a `state_dict` containing LoRA adapter keys (rank **32**, alpha **64**). The two adapters are designed to be **stacked**: the cold-start LoRA distills the model down to 4 NFE, and the RTDMD LoRA further fine-tunes it with reward-tilted RL.
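To check what a downloaded `generator_ema.pt` actually contains, one option is to summarize its tensor shapes by key. The sketch below is not taken from the RTDMD repo: `summarize_lora_shapes` is a hypothetical helper, and it assumes a PEFT-style layout in which `lora_A` weights have shape `(rank, in_features)` and `lora_B` weights have shape `(out_features, rank)`. The toy key names and feature sizes are invented for illustration.

```python
from collections import Counter


def summarize_lora_shapes(shapes, alpha=64):
    """Summarize LoRA adapter tensors from a {key: shape} mapping.

    Assumes PEFT-style shapes: lora_A is (rank, in_features),
    lora_B is (out_features, rank). This layout is an assumption.
    """
    ranks = Counter()
    non_lora_keys = []
    for key, shape in shapes.items():
        if "lora_A" in key:
            ranks[shape[0]] += 1
        elif "lora_B" in key:
            ranks[shape[1]] += 1
        else:
            non_lora_keys.append(key)
    # Standard LoRA scaling factor applied to the low-rank update.
    scaling = {rank: alpha / rank for rank in ranks}
    return {"ranks": dict(ranks), "scaling": scaling, "non_lora_keys": non_lora_keys}


# Toy stand-in for a real checkpoint (key names are hypothetical).
toy_shapes = {
    "blocks.0.attn.to_q.lora_A.weight": (32, 1536),
    "blocks.0.attn.to_q.lora_B.weight": (1536, 32),
}
summary = summarize_lora_shapes(toy_shapes)
print(summary)
```

With a real checkpoint, the mapping could be built as `{k: tuple(v.shape) for k, v in state.items()}` after `torch.load`. The `alpha / rank` ratio is the usual LoRA scaling factor, which would be 2.0 for the rank and alpha stated above.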
79
 
80
  ---
81
 
 
83
 
84
  ### Option 1: RTDMD inference CLI (recommended)
85
 
86
+ For exact reproduction of the paper numbers, please use the [official RTDMD repository](https://github.com/Harahan/RTDMD).
87
 
88
  ### Option 2: Plain diffusers
89
 
90
+ You can use these LoRAs with the `diffusers` library as follows:
91
+
92
  ```python
93
  import torch
94
  from diffusers import StableDiffusion3Pipeline
 
113
  state = torch.load(path, map_location="cpu", weights_only=False)
114
  pipe.transformer.load_state_dict(state, strict=False)
115
 
116
+ # 4-step sampling
117
+ # Note: RTDMD is trained on the CPS scheduler with η = 0.9.
118
+ # Default Flow-Matching Euler will still produce reasonable samples.
119
  pipe(prompt="a cute cat sitting on a windowsill",
120
  num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
121
  ```
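Because `load_state_dict(..., strict=False)` does not raise on mismatched keys, a checkpoint whose key names do not line up with `pipe.transformer` would be silently ignored. PyTorch's `load_state_dict` returns an object exposing `missing_keys` and `unexpected_keys`, which can be inspected after the call. The sketch below shows the same idea in plain Python on key names alone. `find_unmatched_keys` and the toy key names are hypothetical, and whether the real checkpoints match depends on how the adapters are attached in the part of the snippet not shown in this diff.

```python
def find_unmatched_keys(model_keys, ckpt_keys):
    """Return checkpoint keys that have no counterpart in the model."""
    model_keys = set(model_keys)
    return sorted(k for k in ckpt_keys if k not in model_keys)


# Toy example: one adapter key matches the model, one does not.
model_keys = ["proj.weight", "proj.lora_A.weight"]
ckpt_keys = ["proj.lora_A.weight", "proj.lora_B.weight"]

unmatched = find_unmatched_keys(model_keys, ckpt_keys)
if unmatched:
    print(f"{len(unmatched)} checkpoint key(s) would be ignored: {unmatched}")
```

In the snippet above, the inputs would be `pipe.transformer.state_dict().keys()` and `state.keys()`.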
122
 
123
  ---
124
 
125
  ## 📄 Citation
 
140
 
141
  ## ⚖️ License
142
 
143
+ Apache 2.0. The base model [`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) is governed by its own license; please review and comply with it separately.