Improve model card metadata and usage
#1
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,11 +1,19 @@
|
|
| 1 |
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
language:
|
| 4 |
-
- en
|
| 5 |
base_model:
|
| 6 |
- stabilityai/stable-diffusion-3.5-medium
|
| 7 |
pipeline_tag: text-to-image
|
| 8 |
---
|
|
|
|
| 9 |
<div align="center">
|
| 10 |
|
| 11 |
<img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
|
|
@@ -14,7 +22,7 @@ pipeline_tag: text-to-image
|
|
| 14 |
|
| 15 |
<p><b>Reward-Tilted DMD · Ambient-Consistent Distillation · Hybrid Policy Gradient</b></p>
|
| 16 |
|
| 17 |
-
[](https://
|
| 18 |
[](https://github.com/Harahan/RTDMD)
|
| 19 |
[](https://huggingface.co/collections/Harahan/rtdmd)
|
| 20 |
|
|
@@ -39,63 +47,25 @@ pipeline_tag: text-to-image
|
|
| 39 |
|
| 40 |
## Abstract
|
| 41 |
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
reward-guided RL for few-step flow generators. Minimizing the KL divergence to
|
| 45 |
-
a *reward-tilted teacher distribution* decomposes naturally into a
|
| 46 |
-
**distribution-matching** term and a **reward-maximization** term, instantiated
|
| 47 |
-
as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy
|
| 48 |
-
gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
|
| 49 |
-
With **4 NFE** RTDMD reaches new SOTA on SD3-M / SD3.5-M / FLUX.2 4B; the
|
| 50 |
-
distilled FLUX.2 4B even beats the full FLUX.2 9B teacher (50 NFE) on most
|
| 51 |
-
rewards.
|
| 52 |
-
|
| 53 |
-
<table align="center">
|
| 54 |
-
<tr>
|
| 55 |
-
<td align="center" width="50%">
|
| 56 |
-
<img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/MLr2YHfmAvKYQsfA50uOf.png" alt="RTDMD teaser" width="100%">
|
| 57 |
-
<br/>
|
| 58 |
-
<em>4-step samples from RTDMD-distilled FLUX.2 4B (no classifier-free guidance).</em>
|
| 59 |
-
</td>
|
| 60 |
-
<td align="center" width="50%">
|
| 61 |
-
<img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/UJnH2QqNCw4aJDFgwtfjP.png" alt="RTDMD comparison" width="100%">
|
| 62 |
-
<br/>
|
| 63 |
-
<em>Qualitative comparison for few-step diffusion models (4 NFE).</em>
|
| 64 |
-
</td>
|
| 65 |
-
</tr>
|
| 66 |
-
</table>
|
| 67 |
|
| 68 |
---
|
| 69 |
|
| 70 |
## Method Overview
|
| 71 |
|
| 72 |
-
|
| 73 |
<div align="center">
|
| 74 |
<img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
|
| 75 |
<br/>
|
| 76 |
-
<em>RTDMD overview.
|
| 77 |
</div>
|
| 78 |
|
| 79 |
-
For the generator $G_\theta$, the reward-tilted KL objective decomposes as
|
| 80 |
-
|
| 81 |
-
$$
|
| 82 |
-
\nabla_\theta D_{\text{KL}}(p_\theta \| \tilde{p}_\psi) =
|
| 83 |
-
\underbrace{\nabla_\theta D_{\text{KL}}(p_\theta \| p_\psi)}_{\text{distribution matching}} - \beta\underbrace{\nabla_\theta \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)]}_{\text{reward maximization}}.
|
| 84 |
-
$$
|
| 85 |
-
|
| 86 |
-
The two terms map directly to the two trainers exposed by the CLI:
|
| 87 |
-
|
| 88 |
-
| Stage | Trainer | Key knobs |
|
| 89 |
-
| --- | --- | --- |
|
| 90 |
-
| 1. AC-DMD cold start | `ACDMDTrainer` (`--trainer ac_dmd`) | sub-interval renoising, consistency weight `γ`, CPS sampler `η = 0.9` |
|
| 91 |
-
| 2. RTDMD RL fine-tune | `RTDMDTrainer` (`--trainer rtdmd`) | SubGRPO + final-step BP + AC-DMD |
|
| 92 |
-
|
| 93 |
---
|
| 94 |
|
| 95 |
## 📦 Contents
|
| 96 |
|
| 97 |
-
This repository hosts the 4-NFE LoRA checkpoints distilled from
|
| 98 |
-
**Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
|
| 99 |
|
| 100 |
```
|
| 101 |
.
|
|
@@ -105,11 +75,7 @@ This repository hosts the 4-NFE LoRA checkpoints distilled from
|
|
| 105 |
└── generator_ema.pt # Stage-2 RTDMD LoRA (stacked on top of cold_start)
|
| 106 |
```
|
| 107 |
|
| 108 |
-
Each `generator_ema.pt` is a `
|
| 109 |
-
adapter keys (`lora_A` / `lora_B`, rank **32**, alpha **64**). The two adapters
|
| 110 |
-
are designed to be **stacked**: the cold-start LoRA distills SD3.5-M down to
|
| 111 |
-
4 NFE, and the RTDMD LoRA further fine-tunes that distilled model with
|
| 112 |
-
reward-tilted RL.
|
| 113 |
|
| 114 |
---
|
| 115 |
|
|
@@ -117,25 +83,12 @@ reward-tilted RL.
|
|
| 117 |
|
| 118 |
### Option 1: RTDMD inference CLI (recommended)
|
| 119 |
|
| 120 |
-
|
| 121 |
-
run the CPS sampler for you:
|
| 122 |
-
|
| 123 |
-
```bash
|
| 124 |
-
git clone https://github.com/Harahan/RTDMD.git && cd RTDMD
|
| 125 |
-
pip install -r requirements.txt && pip install -e .
|
| 126 |
-
|
| 127 |
-
# Download this repo
|
| 128 |
-
huggingface-cli download Harahan/SD35M-RTDMD --local-dir ./ckpts/sd35m
|
| 129 |
-
|
| 130 |
-
# Run 4-NFE inference (single GPU)
|
| 131 |
-
python inference.py configs/inference/sd35m.yaml \
|
| 132 |
-
--override lora_paths='["./ckpts/sd35m/cold_start/generator_ema.pt","./ckpts/sd35m/rtdmd/generator_ema.pt"]' \
|
| 133 |
-
--override eval_reward=false \
|
| 134 |
-
--prompt "a cute cat sitting on a windowsill"
|
| 135 |
-
```
|
| 136 |
|
| 137 |
### Option 2: Plain diffusers
|
| 138 |
|
|
|
|
|
|
|
| 139 |
```python
|
| 140 |
import torch
|
| 141 |
from diffusers import StableDiffusion3Pipeline
|
|
@@ -160,16 +113,13 @@ for ckpt in ["cold_start/generator_ema.pt", "rtdmd/generator_ema.pt"]:
|
|
| 160 |
    state = torch.load(path, map_location="cpu", weights_only=False)
|
| 161 |
    pipe.transformer.load_state_dict(state, strict=False)
|
| 162 |
|
| 163 |
-
# 4-step
|
|
|
|
|
|
|
| 164 |
pipe(prompt="a cute cat sitting on a windowsill",
|
| 165 |
num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
|
| 166 |
```
|
| 167 |
|
| 168 |
-
> **Note:** RTDMD is trained on the CPS (Coefficients-Preserving Sampling)
|
| 169 |
-
> scheduler with `η = 0.9`. Using the default Flow-Matching Euler scheduler
|
| 170 |
-
> will still produce reasonable samples at 4 NFE, but the RTDMD inference CLI
|
| 171 |
-
> is the only entry point that reproduces the paper numbers exactly.
|
| 172 |
-
|
| 173 |
---
|
| 174 |
|
| 175 |
## Citation
|
|
@@ -190,7 +140,4 @@ pipe(prompt="a cute cat sitting on a windowsill",
|
|
| 190 |
|
| 191 |
## ⚖️ License
|
| 192 |
|
| 193 |
-
Apache 2.0
|
| 194 |
-
[RTDMD](https://github.com/Harahan/RTDMD) repo. The base model
|
| 195 |
-
[`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
|
| 196 |
-
is governed by its own license; please review and comply with it separately.
|
|
|
|
| 1 |
---
|
| 2 |
base_model:
|
| 3 |
- stabilityai/stable-diffusion-3.5-medium
|
| 4 |
+
language:
|
| 5 |
+
- en
|
| 6 |
+
license: apache-2.0
|
| 7 |
pipeline_tag: text-to-image
|
| 8 |
+
library_name: diffusers
|
| 9 |
+
tags:
|
| 10 |
+
- lora
|
| 11 |
+
- flow-matching
|
| 12 |
+
- distillation
|
| 13 |
+
- stable-diffusion
|
| 14 |
+
- rtdmd
|
| 15 |
---
|
| 16 |
+
|
| 17 |
<div align="center">
|
| 18 |
|
| 19 |
<img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
|
|
|
|
| 22 |
|
| 23 |
<p><b>Reward-Tilted DMD · Ambient-Consistent Distillation · Hybrid Policy Gradient</b></p>
|
| 24 |
|
| 25 |
+
[](https://huggingface.co/papers/2605.26108)
|
| 26 |
[](https://github.com/Harahan/RTDMD)
|
| 27 |
[](https://huggingface.co/collections/Harahan/rtdmd)
|
| 28 |
|
|
|
|
| 47 |
|
| 48 |
## Abstract
|
| 49 |
|
| 50 |
+
This repository contains the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** using the framework proposed in the paper [Reinforcing Few-step Generators via Reward-Tilted Distribution Matching](https://huggingface.co/papers/2605.26108).
|
| 51 |
+
|
| 52 |
+
We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a two-stage framework that unifies distribution-matching distillation with reward-guided RL for few-step flow generators. Minimizing the KL divergence to a *reward-tilted teacher distribution* decomposes naturally into a **distribution-matching** term and a **reward-maximization** term, instantiated as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
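
The decomposition can be checked in a few lines. The derivation below is a sketch that assumes the reward-tilted teacher has the standard exponential-tilting form; the normalizer $Z$ and the exact form of $\tilde{p}_\psi$ are assumptions made here for illustration. $p_\theta$ denotes the generator distribution, $p_\psi$ the teacher, $r$ the reward, and $\beta$ the reward weight.

$$
\tilde{p}_\psi(\mathbf{x}) = \frac{p_\psi(\mathbf{x})\,\exp\big(\beta\, r(\mathbf{x})\big)}{Z},
\qquad
Z = \mathbb{E}_{\mathbf{x} \sim p_\psi}\big[\exp\big(\beta\, r(\mathbf{x})\big)\big]
$$

$$
D_{\text{KL}}(p_\theta \| \tilde{p}_\psi)
= \mathbb{E}_{\mathbf{x} \sim p_\theta}\big[\log p_\theta(\mathbf{x}) - \log p_\psi(\mathbf{x}) - \beta\, r(\mathbf{x})\big] + \log Z
= D_{\text{KL}}(p_\theta \| p_\psi) - \beta\, \mathbb{E}_{\mathbf{x} \sim p_\theta}\big[r(\mathbf{x})\big] + \log Z
$$

Because $\log Z$ does not depend on $\theta$, taking $\nabla_\theta$ leaves exactly two terms: the distribution-matching gradient $\nabla_\theta D_{\text{KL}}(p_\theta \| p_\psi)$ and the reward-maximization gradient $-\beta\, \nabla_\theta \mathbb{E}_{\mathbf{x} \sim p_\theta}[r(\mathbf{x})]$.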
|
| 53 |
|
| 54 |
---
|
| 55 |
|
| 56 |
## Method Overview
|
| 57 |
|
|
|
|
| 58 |
<div align="center">
|
| 59 |
<img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
|
| 60 |
<br/>
|
| 61 |
+
<em>RTDMD overview. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
|
| 62 |
</div>
|
| 63 |
|
| 64 |
---
|
| 65 |
|
| 66 |
## 📦 Contents
|
| 67 |
|
| 68 |
+
This repository hosts the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
|
|
|
|
| 69 |
|
| 70 |
```
|
| 71 |
.
|
|
|
|
| 75 |
└── generator_ema.pt # Stage-2 RTDMD LoRA (stacked on top of cold_start)
|
| 76 |
```
|
| 77 |
|
| 78 |
+
Each `generator_ema.pt` is a `state_dict` containing LoRA adapter keys (rank **32**, alpha **64**). The two adapters are designed to be **stacked**: the cold-start LoRA distills the model down to 4 NFE, and the RTDMD LoRA further fine-tunes it with reward-tilted RL.
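
To make "stacked" concrete, here is a minimal sketch of how two LoRA updates could combine on one weight matrix. It assumes the common LoRA convention `W + (alpha / rank) * B @ A`, which this card does not spell out, and it uses tiny pure-Python matrices in place of real tensors. The helper names (`matmul`, `add_scaled`, `apply_lora`) and the toy values are hypothetical.

```python
# Toy illustration only: small nested lists stand in for real weight tensors.

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(w, delta, scale):
    """Element-wise w + scale * delta."""
    return [[wv + scale * dv for wv, dv in zip(wr, dr)] for wr, dr in zip(w, delta)]


def apply_lora(weight, lora_a, lora_b, rank, alpha):
    """Return weight + (alpha / rank) * (lora_b @ lora_a) (assumed convention)."""
    return add_scaled(weight, matmul(lora_b, lora_a), alpha / rank)


RANK, ALPHA = 32, 64      # values stated in the model card
SCALE = ALPHA / RANK      # 2.0 under the assumed alpha / rank scaling

# 2x2 base weight with rank-1 toy adapters (the real adapters are rank 32).
base = [[1.0, 0.0], [0.0, 1.0]]
cold_a, cold_b = [[1.0, 2.0]], [[0.5], [0.0]]    # A is 1x2, B is 2x1
rl_a, rl_b = [[0.0, 1.0]], [[0.0], [0.25]]

stage1 = apply_lora(base, cold_a, cold_b, RANK, ALPHA)    # cold-start adapter
stage2 = apply_lora(stage1, rl_a, rl_b, RANK, ALPHA)      # RTDMD adapter on top
print(stage2)
```

Under this additive view the two updates commute, so the order of application only matters for how the adapters were trained, not for the final merged weight.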
|
| 79 |
|
| 80 |
---
|
| 81 |
|
|
|
|
| 83 |
|
| 84 |
### Option 1: RTDMD inference CLI (recommended)
|
| 85 |
|
| 86 |
+
For exact reproduction of the paper numbers, please use the [official RTDMD repository](https://github.com/Harahan/RTDMD).
|
| 87 |
|
| 88 |
### Option 2: Plain diffusers
|
| 89 |
|
| 90 |
+
You can use these LoRAs with the `diffusers` library as follows:
|
| 91 |
+
|
| 92 |
```python
|
| 93 |
import torch
|
| 94 |
from diffusers import StableDiffusion3Pipeline
|
|
|
|
| 113 |
    state = torch.load(path, map_location="cpu", weights_only=False)
|
| 114 |
    pipe.transformer.load_state_dict(state, strict=False)
|
| 115 |
|
| 116 |
+
# 4-step sampling
|
| 117 |
+
# Note: RTDMD is trained on the CPS scheduler with η = 0.9.
|
| 118 |
+
# Default Flow-Matching Euler will still produce reasonable samples.
|
| 119 |
pipe(prompt="a cute cat sitting on a windowsill",
|
| 120 |
num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
|
| 121 |
```
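
For intuition about what "4 NFE" means with the default Flow-Matching Euler sampler, the sketch below runs four Euler steps on a toy one-dimensional problem. It is a simplified stand-in, not the diffusers scheduler and not the CPS sampler: the `shift=3.0` value, the evenly spaced schedule, and the names `shifted_sigmas`, `euler_sample`, and `toy_velocity` are all assumptions made for illustration.

```python
def shifted_sigmas(num_steps, shift=3.0):
    """Noise levels from 1.0 (pure noise) to 0.0 (clean), warped by a time shift."""
    raw = [1.0 - i / num_steps for i in range(num_steps + 1)]
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in raw]


def euler_sample(x, velocity_fn, sigmas):
    """Take one Euler step x += (sigma_next - sigma) * v per sigma interval."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        x = x + (sigma_next - sigma) * velocity_fn(x, sigma)
    return x


TARGET = 3.0   # hypothetical "clean sample" the toy model points at
calls = []     # records every model evaluation so the NFE can be counted


def toy_velocity(x, sigma):
    # On the straight path x = (1 - sigma) * x0 + sigma * noise,
    # the velocity is noise - x0, which equals (x - x0) / sigma.
    calls.append(sigma)
    return (x - TARGET) / sigma


sigmas = shifted_sigmas(num_steps=4)
sample = euler_sample(-1.0, toy_velocity, sigmas)   # start from "noise" = -1.0
nfe = len(calls)
print(sigmas, sample, nfe)
```

Each step costs one model evaluation, so four steps give four function evaluations. This matches the snippet above only because `guidance_scale=1.0` is used, which to my understanding disables classifier-free guidance in the diffusers SD3 pipeline and so avoids a second forward pass per step.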
|
| 122 |
|
| 123 |
---
|
| 124 |
|
| 125 |
## Citation
|
|
|
|
| 140 |
|
| 141 |
## ⚖️ License
|
| 142 |
|
| 143 |
+
Apache 2.0. The base model [`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) is governed by its own license; please review and comply with it separately.
|