Harahan/SD35M-RTDMD


Improve model card metadata and usage

by nielsr HF Staff - opened about 6 hours ago

base: refs/heads/main ← from: refs/pr/1


README.md +25 -78


@@ -1,11 +1,19 @@
 ---
-license: apache-2.0
-language:
-- en
 base_model:
 - stabilityai/stable-diffusion-3.5-medium
 pipeline_tag: text-to-image
 ---
 <div align="center">
 <img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
@@ -14,7 +22,7 @@ pipeline_tag: text-to-image
 <p><b>Reward-Tilted DMD &nbsp;·&nbsp; Ambient-Consistent Distillation &nbsp;·&nbsp; Hybrid Policy Gradient</b></p>
-[![Paper](https://img.shields.io/badge/paper-arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/2605.26108)
 [![Github](https://img.shields.io/badge/Harahan%2FRTDMD-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harahan/RTDMD)
 [![Hugging Face Collection](https://img.shields.io/badge/RTDMD_Collection-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/Harahan/rtdmd)
@@ -39,63 +47,25 @@ pipeline_tag: text-to-image
 ## 📖 Abstract
-We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a
-two-stage framework that unifies distribution-matching distillation with
-reward-guided RL for few-step flow generators. Minimizing the KL divergence to
-a *reward-tilted teacher distribution* decomposes naturally into a
-**distribution-matching** term and a **reward-maximization** term — instantiated
-as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy
-gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
-With **4 NFE** RTDMD reaches new SOTA on SD3-M / SD3.5-M / FLUX.2 4B; the
-distilled FLUX.2 4B even beats the full FLUX.2 9B teacher (50 NFE) on most
-rewards.
-<table align="center">
-  <tr>
-    <td align="center" width="50%">
-      <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/MLr2YHfmAvKYQsfA50uOf.png" alt="RTDMD teaser" width="100%">
-      <br/>
-      <em>4-step samples from RTDMD-distilled FLUX.2 4B (no classifier-free guidance).</em>
-    </td>
-    <td align="center" width="50%">
-      <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/UJnH2QqNCw4aJDFgwtfjP.png" alt="RTDMD comparison" width="100%">
-      <br/>
-      <em>Qualitative comparison for few-step diffusion models (4 NFE).</em>
-    </td>
-  </tr>
-</table>
 ---
 ## 🍭 Method Overview
 <div align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
   <br/>
-  <em>RTDMD overview. <b>Det.</b> = deterministic final step, <b>Stoc.</b> = stochastic intermediate steps. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
 </div>
-For the generator $G_\theta$, the reward-tilted KL objective decomposes as
-$$
-\nabla_\theta D_{\text{KL}}(p_\theta \| \tilde{p}_\psi) =
-\underbrace{\nabla_\theta D_{\text{KL}}(p_\theta \| p_\psi)}_{\text{distribution matching}} - \beta\underbrace{\nabla_\theta \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)]}_{\text{reward maximization}}.
-$$
-The two terms map directly to the two trainers exposed by the CLI:
-| Stage | Trainer | Key knobs |
-| --- | --- | --- |
-| 1. AC-DMD cold start | `ACDMDTrainer` (`--trainer ac_dmd`) | sub-interval renoising, consistency weight `γ`, CPS sampler `η = 0.9` |
-| 2. RTDMD RL fine-tune | `RTDMDTrainer` (`--trainer rtdmd`)  | SubGRPO + final-step BP + AC-DMD |
 ---
 ## 📦 Contents
-This repository hosts the 4-NFE LoRA checkpoints distilled from
-**Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
 ```
 .
@@ -105,11 +75,7 @@ This repository hosts the 4-NFE LoRA checkpoints distilled from
     └── generator_ema.pt    # Stage-2 RTDMD LoRA (stacked on top of cold_start)
 ```
-Each `generator_ema.pt` is a `torch.save`-d `state_dict` containing only LoRA
-adapter keys (`lora_A` / `lora_B`, rank **32**, alpha **64**). The two adapters
-are designed to be **stacked**: the cold-start LoRA distills SD3.5-M down to
-4 NFE, and the RTDMD LoRA further fine-tunes that distilled model with
-reward-tilted RL.
 ---
@@ -117,25 +83,12 @@ reward-tilted RL.
 ### Option 1 — RTDMD inference CLI (recommended)
-The simplest path is to clone the RTDMD repo and let it stack both LoRAs and
-run the CPS sampler for you:
-```bash
-git clone https://github.com/Harahan/RTDMD.git && cd RTDMD
-pip install -r requirements.txt && pip install -e .
-# Download this repo
-huggingface-cli download Harahan/SD35M-RTDMD --local-dir ./ckpts/sd35m
-# Run 4-NFE inference (single GPU)
-python inference.py configs/inference/sd35m.yaml \
-    --override lora_paths='["./ckpts/sd35m/cold_start/generator_ema.pt","./ckpts/sd35m/rtdmd/generator_ema.pt"]' \
-    --override eval_reward=false \
-    --prompt "a cute cat sitting on a windowsill"
-```
 ### Option 2 — Plain diffusers
 ```python
 import torch
 from diffusers import StableDiffusion3Pipeline
@@ -160,16 +113,13 @@ for ckpt in ["cold_start/generator_ema.pt", "rtdmd/generator_ema.pt"]:
     state = torch.load(path, map_location="cpu", weights_only=False)
     pipe.transformer.load_state_dict(state, strict=False)
-# 4-step CPS sampling
 pipe(prompt="a cute cat sitting on a windowsill",
      num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
 ```
-> **Note:** RTDMD is trained on the CPS (Coefficients-Preserving Sampling)
-> scheduler with `η = 0.9`. Using the default Flow-Matching Euler scheduler
-> will still produce reasonable samples at 4 NFE, but the RTDMD inference CLI
-> is the only entry point that reproduces the paper numbers exactly.
 ---
 ## 📄 Citation
@@ -190,7 +140,4 @@ pipe(prompt="a cute cat sitting on a windowsill",
 ## ⚖️ License
-Apache 2.0 — same as the upstream
-[RTDMD](https://github.com/Harahan/RTDMD) repo. The base model
-[`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium)
-is governed by its own license; please review and comply with it separately.
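
The decomposition quoted in the removed Method Overview lines follows in one step if the reward-tilted teacher takes the usual exponential-tilting form. That form is an assumption here, since the diff names the distribution without defining it. Under that assumption, with $Z$ the normalizing constant:

$$
\tilde{p}_\psi(\mathbf{x}) = \frac{1}{Z}\, p_\psi(\mathbf{x}) \exp\big(\beta\, r(\mathbf{x})\big)
$$

$$
D_{\text{KL}}(p_\theta \| \tilde{p}_\psi)
= \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}\big[\log p_\theta(\hat{\mathbf{x}}_0) - \log p_\psi(\hat{\mathbf{x}}_0) - \beta\, r(\hat{\mathbf{x}}_0)\big] + \log Z
= D_{\text{KL}}(p_\theta \| p_\psi) - \beta\, \mathbb{E}_{\hat{\mathbf{x}}_0 \sim p_\theta}[r(\hat{\mathbf{x}}_0)] + \log Z
$$

Because $\log Z$ does not depend on $\theta$, taking $\nabla_\theta$ leaves exactly the distribution-matching term minus $\beta$ times the reward-maximization term.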

 ---
 base_model:
 - stabilityai/stable-diffusion-3.5-medium
+language:
+- en
+license: apache-2.0
 pipeline_tag: text-to-image
+library_name: diffusers
+tags:
+- lora
+- flow-matching
+- distillation
+- stable-diffusion
+- rtdmd
 ---
 <div align="center">
 <img width="70%" height="70%" alt="logo" src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/l1JM1Si5PDCgvJR5SSiqf.png" />
 <p><b>Reward-Tilted DMD &nbsp;·&nbsp; Ambient-Consistent Distillation &nbsp;·&nbsp; Hybrid Policy Gradient</b></p>
+[![Paper](https://img.shields.io/badge/paper-arXiv-A42C25?style=for-the-badge&logo=arxiv&logoColor=white)](https://huggingface.co/papers/2605.26108)
 [![Github](https://img.shields.io/badge/Harahan%2FRTDMD-000000?style=for-the-badge&logo=github&logoColor=white)](https://github.com/Harahan/RTDMD)
 [![Hugging Face Collection](https://img.shields.io/badge/RTDMD_Collection-fcd022?style=for-the-badge&logo=huggingface&logoColor=000)](https://huggingface.co/collections/Harahan/rtdmd)
 ## 📖 Abstract
+This repository contains the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** using the framework proposed in the paper [Reinforcing Few-step Generators via Reward-Tilted Distribution Matching](https://huggingface.co/papers/2605.26108).
+We propose **Reward-Tilted Distribution Matching Distillation (RTDMD)**, a two-stage framework that unifies distribution-matching distillation with reward-guided RL for few-step flow generators. Minimizing the KL divergence to a *reward-tilted teacher distribution* decomposes naturally into a **distribution-matching** term and a **reward-maximization** term — instantiated as **Ambient-Consistent DMD (AC-DMD)** for the cold start and a **hybrid policy gradient** (SubGRPO + final-step reward back-propagation) for the RL stage.
 ---
 ## 🍭 Method Overview
 <div align="center">
   <img src="https://cdn-uploads.huggingface.co/production/uploads/64b500fdf460afaefc5c64b3/GSQ5Q9bF6SAiUqyFR4ZKs.png" alt="RTDMD method overview" width="70%">
   <br/>
+  <em>RTDMD overview. Trajectories: teacher (blue), few-step generator (green), fake score (yellow).</em>
 </div>
 ---
 ## 📦 Contents
+This repository hosts the 4-NFE LoRA checkpoints distilled from **Stable Diffusion 3.5 Medium** with [RTDMD](https://github.com/Harahan/RTDMD).
 ```
 .
     └── generator_ema.pt    # Stage-2 RTDMD LoRA (stacked on top of cold_start)
 ```
+Each `generator_ema.pt` is a `state_dict` containing LoRA adapter keys (rank **32**, alpha **64**). The two adapters are designed to be **stacked**: the cold-start LoRA distills the model down to 4 NFE, and the RTDMD LoRA further fine-tunes it with reward-tilted RL.
 ---
 ### Option 1 — RTDMD inference CLI (recommended)
+For exact reproduction of the paper numbers, please use the [official RTDMD repository](https://github.com/Harahan/RTDMD).
 ### Option 2 — Plain diffusers
+You can use these LoRAs with the `diffusers` library as follows:
 ```python
 import torch
 from diffusers import StableDiffusion3Pipeline
     state = torch.load(path, map_location="cpu", weights_only=False)
     pipe.transformer.load_state_dict(state, strict=False)
+# 4-step sampling
+# Note: RTDMD is trained on the CPS scheduler with η = 0.9.
+# Default Flow-Matching Euler will still produce reasonable samples.
 pipe(prompt="a cute cat sitting on a windowsill",
      num_inference_steps=4, guidance_scale=1.0).images[0].save("out.png")
 ```
 ---
 ## 📄 Citation
 ## ⚖️ License
+Apache 2.0. The base model [`stabilityai/stable-diffusion-3.5-medium`](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) is governed by its own license; please review and comply with it separately.
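
Option 2 loads each checkpoint with `strict=False`, so keys that do not match the module are skipped without raising an error. A quick look at the key names and shapes before loading can catch that. The sketch below is a hypothetical helper, not part of the RTDMD repo or `diffusers`. It assumes the PEFT-style layout in which `lora_A` weights have shape `(rank, in_features)` and `lora_B` weights have shape `(out_features, rank)`. The example keys and the width 1536 are toy values, not read from the actual checkpoint.

```python
from collections import Counter


def summarize_lora_shapes(shapes):
    """Summarize a LoRA checkpoint given a {key: shape} mapping.

    Assumption: PEFT-style layout, i.e. `lora_A` is (rank, in_features)
    and `lora_B` is (out_features, rank).
    """
    ranks = Counter()
    non_lora_keys = []
    for key, shape in shapes.items():
        if "lora_A" in key:
            ranks[shape[0]] += 1   # rank is the first dim of A
        elif "lora_B" in key:
            ranks[shape[1]] += 1   # rank is the second dim of B
        else:
            non_lora_keys.append(key)
    return {"ranks": dict(ranks), "non_lora_keys": non_lora_keys}


# Toy shapes standing in for {k: tuple(v.shape) for k, v in state.items()}.
example = {
    "blocks.0.attn.to_q.lora_A.weight": (32, 1536),
    "blocks.0.attn.to_q.lora_B.weight": (1536, 32),
}
summary = summarize_lora_shapes(example)
print(summary)

# With the stated rank 32 and alpha 64, the usual LoRA scale alpha / rank is 2.0.
lora_scale = 64 / 32
print(lora_scale)
```

If `ranks` reports anything other than 32, or `non_lora_keys` is non-empty, the checkpoint does not match the description in the Contents section. `load_state_dict` also returns `missing_keys` and `unexpected_keys`, which are worth printing when `strict=False` is used.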