Harahan/SD35M-RTDMD

Text-to-Image

English

Harahan committed on 1 day ago

Commit

63f8615

1 Parent(s): 9987140

Add model card

README.md +179 -0

	@@ -0,0 +1,179 @@

+<div align="center">
+<img width="70%" height="70%" alt="logo" src="https://github.com/user-attachments/assets/4d534e80-f8ec-4c0b-948f-730cc0311961" />
+<h2> Reinforcing Few-step Generators via Reward-Tilted Distribution Matching </h2>
+<p><b>Reward-Tilted DMD &nbsp;·&nbsp; Ambient-Consistent Distillation &nbsp;·&nbsp; Hybrid Policy Gradient</b></p>
+[![Paper](https://img.shields.io/badge/paper-arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](<TODO: arxiv link>)
+[![Github](https://img.shields.io/badge/Harahan%2FRTDMD-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harahan/RTDMD)
+[![Hugging Face Collection](https://img.shields.io/badge/RTDMD_Collection-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/Harahan/rtdmd)
+[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
+[![Python](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/)
+</div>
+<div align="center">
+[Yushi Huang](https://harahan.github.io/)<sup>1, 2,</sup>\*<sup>†</sup>, [Xiangxin Zhou](https://zhouxiangxin1998.github.io/)<sup>2,</sup>\*, Ruoyu Wang<sup>2, 3,</sup>\*<sup>†</sup>, [Chi Zhang](https://icoz69.github.io/)<sup>3</sup>, [Jun Zhang](https://eejzhang.people.ust.hk/)<sup>1</sup>, [Tianyu Pang](https://p2333.github.io/)<sup>2,</sup>‡
+<sup>1</sup>The Hong Kong University of Science and Technology &nbsp;&nbsp;
+<sup>2</sup>Tencent Hunyuan &nbsp;&nbsp;
+<sup>3</sup>Westlake University
+\* Equal contribution &nbsp;·&nbsp; † Work done during internship at Tencent Hunyuan &nbsp;·&nbsp; ‡ Corresponding author
+</div>
+---
+## 📖 Abstract
+We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a
+two-stage framework that unifies distribution-matching distillation with
+reward-guided RL for few-step flow generators. The KL divergence to a
+*reward-tilted teacher distribution* decomposes naturally into a
+**distribution-matching** term and a **reward-maximization** term. These are
+instantiated as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a
+**hybrid policy gradient** (SubGRPO + final-step reward back-propagation) for
+the RL stage. With **4 NFE** (number of function evaluations), RTDMD reaches
+new SOTA on SD3-M / SD3.5-M / FLUX.2 4B; the distilled FLUX.2 4B even beats
+the full FLUX.2 9B teacher (50 NFE) on most rewards.
+<table align="center">
+  <tr>
+    <td align="center" width="50%">
+      <img src="https://github.com/user-attachments/assets/cb1fb0da-d388-4846-9017-66bccebd0749" alt="RTDMD teaser" width="100%">
+      <br/>
+      <em>4-step samples from RTDMD-distilled FLUX.2 4B (no classifier-free guidance).</em>
+    </td>
+    <td align="center" width="50%">
+      <img src="https://github.com/user-attachments/assets/3d76d99a-6fe1-4059-8e68-9461ba067b01" alt="RTDMD comparison" width="100%">
+      <br/>
+      <em>Qualitative comparison for few-step diffusion models (4 NFE).</em>
+    </td>
+  </tr>
+</table>
+---
+## 🍭 Method Overview
+<div align="center">
+  <img src="https://github.com/user-attachments/assets/61a64fca-a143-40ae-9e36-79c6fcb5b696" alt="RTDMD method overview" width="70%">
+  <br/>
+  <em>RTDMD overview. <b>Det.</b> = deterministic final step, <b>Stoc.</b> = stochastic intermediate steps. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
+</div>
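+For the generator $G_\theta$ with sample distribution $p_\theta$, teacher distribution $p_\psi$, and reward-tilted teacher $\tilde{p}_\psi$, the gradient of the reward-tilted KL objective decomposes as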
+$$
+\nabla_\theta D_{\text{KL}}(p_\theta \| \tilde{p}_\psi) =
+\underbrace{\nabla_\theta D_{\text{KL}}(p_\theta \| p_\psi)}_{\text{distribution matching}} - \beta\underbrace{\nabla_\theta \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)]}_{\text{reward maximization}}.
+$$
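The decomposition follows in one line if the reward-tilted teacher has the usual exponential-tilting form. That form is an assumption here, since the README does not define $\tilde{p}_\psi$ explicitly; $Z$ denotes the normalizing constant, which does not depend on $\theta$:

$$
\tilde{p}_\psi(\mathbf{x}) = \frac{p_\psi(\mathbf{x})\,\exp\big(\beta\, r(\mathbf{x})\big)}{Z}
\quad\Longrightarrow\quad
D_{\text{KL}}(p_\theta \| \tilde{p}_\psi)
= \mathbb{E}_{\mathbf{x} \sim p_\theta}\big[\log p_\theta(\mathbf{x}) - \log p_\psi(\mathbf{x}) - \beta\, r(\mathbf{x})\big] + \log Z
= D_{\text{KL}}(p_\theta \| p_\psi) - \beta\, \mathbb{E}_{\mathbf{x} \sim p_\theta}[r(\mathbf{x})] + \log Z.
$$

Taking $\nabla_\theta$ removes the constant $\log Z$ and leaves exactly the two terms of the equation above, with the same sign and the same weight $\beta$ on the reward term.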
+The two terms map directly to the two trainers exposed by the CLI:
+| Stage | Trainer | Key knobs |
+| --- | --- | --- |
+| 1. AC-DMD cold start | `ACDMDTrainer` (`--trainer ac_dmd`) | sub-interval renoising, consistency weight `γ`, CPS sampler `η = 0.9` |
+| 2. RTDMD RL fine-tune | `RTDMDTrainer` (`--trainer rtdmd`)  | SubGRPO + final-step BP + AC-DMD |
+---
+## 📦 Contents
+This repository hosts the 4-NFE LoRA checkpoints distilled from
+**Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
+```
+.
+├── cold_start/
+│   └── generator_ema.pt    # Stage-1 AC-DMD LoRA (4 NFE base)
+└── rtdmd/
+    └── generator_ema.pt    # Stage-2 RTDMD LoRA (stacked on top of cold_start)
+```
+Each `generator_ema.pt` is a `torch.save`-d `state_dict` containing only LoRA
+adapter keys (`lora_A` / `lora_B`, rank **32**, alpha **64**). The two adapters
+are designed to be **stacked**: the cold-start LoRA distills SD3.5-M down to
+4 NFE, and the RTDMD LoRA further fine-tunes that distilled model with
+reward-tilted RL.
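To sanity-check a downloaded checkpoint against the stated rank and alpha, one can inspect tensor shapes. The sketch below is a minimal illustration, assuming PEFT-style shapes (`lora_A` weights are `(r, in_features)`, `lora_B` weights are `(out_features, r)`). The helper `lora_ranks`, the key names, and the feature size in the example are hypothetical and not taken from the actual files.

```python
from typing import Dict, Tuple


def lora_ranks(shapes: Dict[str, Tuple[int, ...]]) -> Dict[str, int]:
    """Map each LoRA module prefix to its rank, inferred from tensor shapes."""
    ranks: Dict[str, int] = {}
    for key, shape in shapes.items():
        if "lora_A" in key:
            # lora_A projects down: weight shape is (r, in_features)
            prefix, rank = key.split("lora_A")[0], shape[0]
        elif "lora_B" in key:
            # lora_B projects up: weight shape is (out_features, r)
            prefix, rank = key.split("lora_B")[0], shape[1]
        else:
            continue
        # The A and B halves of one module must agree on the rank.
        if ranks.setdefault(prefix, rank) != rank:
            raise ValueError(f"inconsistent rank for {prefix!r}")
    return ranks


# Hypothetical key names and sizes, for illustration only.
example_shapes = {
    "blocks.0.attn.to_q.lora_A.weight": (32, 1536),
    "blocks.0.attn.to_q.lora_B.weight": (1536, 32),
}
ranks = lora_ranks(example_shapes)
scale = 64 / 32  # lora_alpha / r, the standard LoRA scaling factor
print(ranks, scale)
```

With a real file, the shapes could be collected as `{k: tuple(v.shape) for k, v in state.items()}` after loading the `state_dict` with `torch.load`. A rank other than 32 would suggest the file does not match the description above.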
+---
+## 🚀 Usage
+### Option 1 — RTDMD inference CLI (recommended)
+The simplest path is to clone the RTDMD repo and let it stack both LoRAs and
+run the CPS sampler for you:
+```bash
+git clone https://github.com/Harahan/RTDMD.git && cd RTDMD
+pip install -r requirements.txt && pip install -e .
+# Download this repo
+huggingface-cli download Harahan/SD35M-RTDMD --local-dir ./ckpts/sd35m
+# Run 4-NFE inference (single GPU)
+python inference.py configs/inference/sd35m.yaml \
+    --override lora_paths='["./ckpts/sd35m/cold_start/generator_ema.pt","./ckpts/sd35m/rtdmd/generator_ema.pt"]' \
+    --override eval_reward=false \
+    --prompt "a cute cat sitting on a windowsill"
+```
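The `lora_paths` override is a JSON list wrapped in shell quotes, which is easy to get wrong by hand. As a minimal sketch, the same command can be assembled in Python. The flags and paths are those shown above; the helper name `build_inference_cmd` is hypothetical and not part of the RTDMD repo.

```python
import json
import shlex
from typing import List


def build_inference_cmd(config: str, lora_paths: List[str], prompt: str) -> List[str]:
    """Build the argv list for the inference command shown above."""
    # Compact separators give JSON without spaces, matching the form in the README.
    paths_json = json.dumps(lora_paths, separators=(",", ":"))
    return [
        "python", "inference.py", config,
        "--override", f"lora_paths={paths_json}",
        "--override", "eval_reward=false",
        "--prompt", prompt,
    ]


cmd = build_inference_cmd(
    "configs/inference/sd35m.yaml",
    ["./ckpts/sd35m/cold_start/generator_ema.pt", "./ckpts/sd35m/rtdmd/generator_ema.pt"],
    "a cute cat sitting on a windowsill",
)
# shlex.join produces a shell-safe string; `cmd` itself can go to subprocess.run.
print(shlex.join(cmd))
```

Passing the list form to `subprocess.run(cmd, check=True)` avoids shell quoting altogether, which matters once prompts contain quotes or other shell metacharacters.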
+### Option 2 — Plain diffusers
+```python
+import torch
+from diffusers import StableDiffusion3Pipeline
+from peft import LoraConfig
+from huggingface_hub import hf_hub_download
+base = "stabilityai/stable-diffusion-3.5-medium"
+pipe = StableDiffusion3Pipeline.from_pretrained(base, torch_dtype=torch.bfloat16).to("cuda")
+# Inject LoRA adapters with the rank/alpha used during training
+TARGETS = [
+    "to_q", "to_k", "to_v", "to_out.0",
+    "add_q_proj", "add_k_proj", "add_v_proj", "to_add_out",
+]
+pipe.transformer.add_adapter(
+    LoraConfig(r=32, lora_alpha=64, target_modules=TARGETS, init_lora_weights="gaussian")
+)
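+# Load cold-start, then RTDMD weights into the same adapter.
+# Caveat: load_state_dict overwrites tensors that share a key, so if both files
+# use the same key names only the last one loaded remains in effect.
+for ckpt in ["cold_start/generator_ema.pt", "rtdmd/generator_ema.pt"]:
+    path = hf_hub_download("Harahan/SD35M-RTDMD", ckpt)
+    # weights_only=False unpickles arbitrary objects; use it only on trusted files.
+    state = torch.load(path, map_location="cpu", weights_only=False)
+    result = pipe.transformer.load_state_dict(state, strict=False)
+    # strict=False hides name mismatches, so surface any keys that were not loaded.
+    if result.unexpected_keys:
+        print(f"{ckpt}: {len(result.unexpected_keys)} keys did not match the model")
+# 4-step sampling without CFG, using the pipeline's default scheduler (not CPS)
+pipe(prompt="a cute cat sitting on a windowsill",
+     num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")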
+```
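Because `load_state_dict(..., strict=False)` skips non-matching keys without raising, and a later load overwrites any key an earlier one set, it can help to compare key sets before loading. The following is a minimal sketch in plain Python. The helper `key_report` and the key names are hypothetical, and whether the two published checkpoints actually share key names is not stated in this README.

```python
from typing import Dict, Iterable, List


def key_report(model_keys: Iterable[str], *ckpt_keys: Iterable[str]) -> Dict[str, List[str]]:
    """Report checkpoint keys the model lacks and keys set by more than one checkpoint."""
    model = set(model_keys)
    seen: set = set()
    unexpected: set = set()
    overwritten: set = set()
    for keys in ckpt_keys:
        current = set(keys)
        unexpected |= current - model          # ignored under strict=False
        overwritten |= current & seen & model  # a later load replaces these values
        seen |= current
    return {"unexpected": sorted(unexpected), "overwritten": sorted(overwritten)}


# Hypothetical key names, for illustration only.
model_keys = ["to_q.lora_A.weight", "to_q.lora_B.weight", "to_q.weight"]
cold_start = ["to_q.lora_A.weight", "to_q.lora_B.weight"]
rtdmd = ["to_q.lora_A.weight", "to_q.lora_B.weight", "typo.lora_A.weight"]
report = key_report(model_keys, cold_start, rtdmd)
print(report)
```

With real objects, the inputs would be `pipe.transformer.state_dict().keys()` and the `.keys()` of each loaded checkpoint. A non-empty `overwritten` list means the second file replaces the first for those tensors rather than adding to them.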
+> **Note:** RTDMD is trained with the CPS (Coefficients-Preserving Sampling)
+> scheduler at `η = 0.9`. The default Flow-Matching Euler scheduler, which
+> Option 2 uses, still produces reasonable samples at 4 NFE, but only the
+> RTDMD inference CLI reproduces the paper numbers exactly.
+---
+## 📄 Citation
+```bibtex
+<TODO: add bibtex after arXiv release>
+```
+---
+## ⚖️ License
+Apache 2.0 — same as the upstream
+[RTDMD](https://github.com/Harahan/RTDMD) repo. The base model
+[`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
+is governed by its own license; please review and comply with it separately.