---
base_model:
- stabilityai/stable-diffusion-3.5-medium
language:
- en
license: apache-2.0
pipeline_tag: text-to-image
library_name: diffusers
tags:
- lora
- flow-matching
- distillation
- stable-diffusion
- rtdmd
---
<div align="center">
<img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
<h2> Reinforcing Few-step Generators via Reward-Tilted Distribution Matching </h2>
<p><b>Reward-Tilted DMD · Ambient-Consistent Distillation · Hybrid Policy Gradient</b></p>
[Paper](https://huggingface.co/papers/2605.26108)
[Code](https://github.com/Harahan/RTDMD)
[Collection](https://huggingface.co/collections/Harahan/rtdmd)
[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
[Python](https://www.python.org/)
</div>
<div align="center">
[Yushi Huang](https://harahan.github.io/)<sup>1, 2,</sup>\*<sup>†</sup>, [Xiangxin Zhou](https://zhouxiangxin1998.github.io/)<sup>2,</sup>\*, Ruoyu Wang<sup>2, 3,</sup>\*<sup>†</sup>, [Chi Zhang](https://icoz69.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>, [Tianyu Pang](https://p2333.github.io/)<sup>2,</sup>‡
<sup>1</sup>The Hong Kong University of Science and Technology
<sup>2</sup>Tencent Hunyuan
<sup>3</sup>Westlake University
\* Equal contribution · † Work done during internship at Tencent Hunyuan · ‡ Corresponding author
</div>
---
## Abstract
This repository contains the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** using the framework proposed in the paper [Reinforcing Few-step Generators via Reward-Tilted Distribution Matching](https://huggingface.co/papers/2605.26108).
We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a two-stage framework that unifies distribution-matching distillation with reward-guided RL for few-step flow generators. Minimizing the KL divergence to a *reward-tilted teacher distribution* decomposes naturally into a **distribution-matching** term and a **reward-maximization** term, instantiated as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
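
The decomposition mentioned above can be sketched with the standard reward-tilting identity. The notation here is an assumption for illustration and may differ from the paper: $p_\theta$ is the few-step generator, $p_{\text{teacher}}$ the teacher distribution, $r$ a reward, $\beta > 0$ a temperature, and $Z$ a normalizing constant that does not depend on $\theta$. The paper's actual objective may also condition on prompts or act on noised marginals.

```latex
% Reward-tilted teacher distribution (assumed form)
p^{*}(x) = \frac{1}{Z}\, p_{\text{teacher}}(x)\, \exp\!\big(r(x)/\beta\big),
\qquad
Z = \mathbb{E}_{x \sim p_{\text{teacher}}}\!\big[\exp\!\big(r(x)/\beta\big)\big]

% Expanding the KL divergence to the tilted target
\mathrm{KL}\big(p_\theta \,\|\, p^{*}\big)
= \mathbb{E}_{x \sim p_\theta}\!\Big[\log p_\theta(x) - \log p_{\text{teacher}}(x) - \tfrac{1}{\beta}\, r(x)\Big] + \log Z
= \underbrace{\mathrm{KL}\big(p_\theta \,\|\, p_{\text{teacher}}\big)}_{\text{distribution matching}}
\;-\; \underbrace{\tfrac{1}{\beta}\,\mathbb{E}_{x \sim p_\theta}\big[r(x)\big]}_{\text{reward maximization}}
\;+\; \log Z
```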
---
## Method Overview
<div align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
<br/>
<em>RTDMD overview. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
</div>
---
## Contents
This repository hosts the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
```
.
├── cold_start/
│   └── generator_ema.pt   # Stage-1 AC-DMD LoRA (4 NFE base)
└── rtdmd/
    └── generator_ema.pt   # Stage-2 RTDMD LoRA (stacked on top of cold_start)
```
Each `generator_ema.pt` is a `state_dict` containing LoRA adapter keys (rank **32**, alpha **64**). The two adapters are designed to be **stacked**: the cold-start LoRA distills the model down to 4 NFE, and the RTDMD LoRA further fine-tunes it with reward-tilted RL.
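
As a rough illustration of what rank 32 and alpha 64 imply, the sketch below infers LoRA ranks from parameter shapes and computes the standard LoRA scaling factor `alpha / r`. It assumes PEFT-style key names containing `lora_A` and `lora_B`; the actual key names in `generator_ema.pt` are not confirmed here. The helpers `summarize_lora_ranks` and `lora_scaling`, and the toy shapes, are hypothetical. To try it on a real checkpoint, one could pass `{k: tuple(v.shape) for k, v in state.items()}` for a loaded `state_dict`.

```python
from collections import Counter


def summarize_lora_ranks(shapes):
    """Count LoRA ranks from a mapping of parameter name -> shape tuple.

    Assumption: PEFT-style naming, where `lora_A` weights have shape
    (r, in_features) and `lora_B` weights have shape (out_features, r).
    """
    ranks = Counter()
    for name, shape in shapes.items():
        if "lora_A" in name:
            ranks[shape[0]] += 1
        elif "lora_B" in name:
            ranks[shape[1]] += 1
    return dict(ranks)


def lora_scaling(r, alpha):
    """Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r."""
    return alpha / r


# Toy stand-in for checkpoint shapes (hypothetical names and dimensions)
toy_shapes = {
    "blocks.0.attn.to_q.lora_A.default.weight": (32, 128),
    "blocks.0.attn.to_q.lora_B.default.weight": (128, 32),
}

rank_counts = summarize_lora_ranks(toy_shapes)
scale = lora_scaling(r=32, alpha=64)
print(rank_counts, scale)
```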
---
## Usage
### Option 1: RTDMD inference CLI (recommended)
For exact reproduction of the paper numbers, please use the [official RTDMD repository](https://github.com/Harahan/RTDMD).
### Option 2: Plain diffusers
You can use these LoRAs with the `diffusers` library as follows:
```python
import torch
from diffusers import StableDiffusion3Pipeline
from peft import LoraConfig
from huggingface_hub import hf_hub_download

base = "stabilityai/stable-diffusion-3.5-medium"
pipe = StableDiffusion3Pipeline.from_pretrained(base, torch_dtype=torch.bfloat16).to("cuda")

# Inject LoRA adapters with the rank/alpha used during training
TARGETS = [
    "to_q", "to_k", "to_v", "to_out.0",
    "add_q_proj", "add_k_proj", "add_v_proj", "to_add_out",
]
pipe.transformer.add_adapter(
    LoraConfig(r=32, lora_alpha=64, target_modules=TARGETS, init_lora_weights="gaussian")
)

# Sequentially load cold-start then RTDMD weights into the same adapter
for ckpt in ["cold_start/generator_ema.pt", "rtdmd/generator_ema.pt"]:
    path = hf_hub_download("Harahan/SD35M-RTDMD", ckpt)
    state = torch.load(path, map_location="cpu", weights_only=False)
    pipe.transformer.load_state_dict(state, strict=False)

# 4-step sampling
# Note: RTDMD is trained on the CPS scheduler with η = 0.9.
# Default Flow-Matching Euler will still produce reasonable samples.
pipe(prompt="a cute cat sitting on a windowsill",
     num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
```
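
To make "4 NFE" concrete, here is a toy sketch of four Euler steps along a flow from noise (`t = 1`) to data (`t = 0`), where each step costs one evaluation of the velocity function. It is not the CPS scheduler and not the diffusers scheduler implementation. The names `euler_sample`, `toy_velocity`, and `TARGET` are hypothetical, and the scalar velocity field stands in for the transformer. Because the toy path is a straight line, four steps land on the target; a real generator only approximates this.

```python
def euler_sample(velocity, x, num_steps=4):
    """Integrate dx/dt = velocity(x, t) from t=1 (noise) to t=0 (data).

    Returns the final sample and the number of function evaluations (NFE).
    """
    ts = [1.0 - i / num_steps for i in range(num_steps + 1)]
    nfe = 0
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = velocity(x, t)          # one network call in a real pipeline
        nfe += 1
        x = x + (t_next - t) * v    # Euler update (t_next < t, so the step is negative)
    return x, nfe


TARGET = 3.0  # toy "data" point


def toy_velocity(x, t):
    # On the straight path x_t = (1 - t) * TARGET + t * noise,
    # the velocity is noise - TARGET = (x_t - TARGET) / t.
    return (x - TARGET) / t


sample, nfe = euler_sample(toy_velocity, x=-1.5, num_steps=4)
print(sample, nfe)
```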
---
## Citation
```bibtex
@misc{huang2026reinforcingfewstepgeneratorsrewardtilted,
title={Reinforcing Few-step Generators via Reward-Tilted Distribution Matching},
author={Yushi Huang and Xiangxin Zhou and Ruoyu Wang and Chi Zhang and Jun Zhang and Tianyu Pang},
year={2026},
eprint={2605.26108},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2605.26108},
}
```
---
## License
Apache 2.0. The base model [`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) is governed by its own license; please review and comply with it separately.