byliutao committed on
Commit 0762e6e · verified · 1 Parent(s): 3a73176

Update README.md

Files changed (1)
  1. README.md +161 -3
README.md CHANGED
@@ -1,3 +1,161 @@
- ---
- license: apache-2.0
- ---
+ <h1 align="center">
+ Continuous-Time Distribution Matching for Few-Step Diffusion Distillation
+ </h1>
+
+ <div align="center">
+
+ <a href="https://byliutao.github.io/cdm_page/">
+ <img src="https://img.shields.io/badge/Project_Page-0055b3?logo=githubpages&logoColor=white" alt="Project Page">
+ </a>
+ <a href="https://huggingface.co/byliutao/stable-diffusion-3-medium-turbo">
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-SD3.5_Medium-ffc107" alt="SD3.5-Medium Model">
+ </a>
+ <a href="https://huggingface.co/byliutao/Longcat-Image-Turbo">
+ <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Model-LongCat-ffc107" alt="LongCat Model">
+ </a>
+ <a href="https://github.com/byliutao/cdm">
+ <img src="https://img.shields.io/badge/GitHub-byliutao%2Fcdm-black?logo=github&logoColor=white" alt="GitHub">
+ </a>
+ <a href="https://arxiv.org/abs/">
+ <img src="https://img.shields.io/badge/Paper-2509.161-b31b1b?logo=arxiv&logoColor=white" alt="arXiv Paper">
+ </a>
+
+ </div>
+
+ <p align="center">
+ <a href="#algorithm-overview">Algorithm Overview</a> •
+ <a href="#4-nfe-generation-results">Results</a> •
+ <a href="#inference">Inference</a> •
+ <a href="#training">Training</a> •
+ <a href="#evaluation">Evaluation</a> •
+ <a href="#citation">Citation</a>
+ </p>
+
+ <p align="center">
+ <img src="assets/teaser.png" width="95%" alt="Teaser: High-quality images generated with only 4 NFE">
+ </p>
+
+ ## Algorithm Overview
+
+ <p align="center">
+ <img src="assets/pipe.png" width="90%" alt="Pipeline overview of Continuous-Time Distribution Matching">
+ </p>
+
+ **Overview of Continuous-Time Distribution Matching (CDM).** **Top:** Our approach uses a dynamic continuous-time schedule during backward simulation, sampling intermediate anchors uniformly from (0, 1]. **Bottom Left:** CFG augmentation (CA) and distribution matching (DM) operate on this dynamic schedule to align text-image conditions and data distributions at on-trajectory anchors. **Bottom Right:** To address inter-anchor inconsistency, the proposed CDM objective explicitly extrapolates off-trajectory latents using the predicted velocity.
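The two mechanics named in the overview, uniform anchor sampling on (0, 1] and velocity-based extrapolation, can be illustrated with a toy sketch. This is not the repository's implementation: the function names are hypothetical, latents are plain Python lists, and it assumes the rectified-flow convention `x_t = (1 - t) * x0 + t * noise`, under which the velocity is `noise - x0`.

```python
import random


def sample_anchor_times(num_anchors, rng):
    """Draw anchor times uniformly from (0, 1], sorted from noisy to clean."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    times = [1.0 - rng.random() for _ in range(num_anchors)]
    return sorted(times, reverse=True)


def extrapolate(x_t, v_t, t, s):
    """First-order step from time t to time s along the velocity v_t.

    Assumes x_t = (1 - t) * x0 + t * noise, so that v = noise - x0.
    """
    return [x + (s - t) * v for x, v in zip(x_t, v_t)]


# Toy check with an exact velocity: extrapolating to s = 0 recovers x0.
rng = random.Random(0)
anchors = sample_anchor_times(3, rng)
x0, noise = [1.0, -2.0], [0.5, 0.25]
t = anchors[0]
x_t = [(1 - t) * a + t * n for a, n in zip(x0, noise)]
v_t = [n - a for a, n in zip(x0, noise)]
x_clean = extrapolate(x_t, v_t, t, 0.0)
print(anchors, x_clean)
```

With a learned velocity the extrapolated latent is only approximate, which is presumably why the objective treats such points as off-trajectory rather than as exact samples.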
+
+ ## 4-NFE Generation Results
+
+ ### SD3.5-Medium
+
+ <p align="center">
+ <img src="assets/sd3.png" width="90%" alt="SD3.5-Medium 4-NFE generation samples">
+ </p>
+
+ ### LongCat
+
+ <p align="center">
+ <img src="assets/longcat.png" width="90%" alt="LongCat 4-NFE generation samples">
+ </p>
+
+ ---
+
+ ## Inference
+
+ ```bash
+ # Clone this repository
+ git clone https://github.com/byliutao/cdm.git
+ cd cdm
+
+ # [Optional] Use a Hugging Face mirror if huggingface.co is not accessible
+ export HF_ENDPOINT="https://hf-mirror.com"
+ # Hugging Face access token (replace the hf_xxx placeholder with your own)
+ export HF_TOKEN="hf_xxx"
+
+ # Create and activate the inference environment
+ conda create -n cdm_infer python=3.10
+ conda activate cdm_infer
+ pip install -r config/requirements_infer.txt
+
+ # Run inference
+ python scripts/infer/sd3_m.py # SD3.5-Medium
+ python scripts/infer/longcat.py # LongCat
+ ```
+
+ ## Training
+
+ ```bash
+ # Clone this repository
+ git clone https://github.com/byliutao/cdm.git
+ cd cdm
+
+ # Create and activate the training environment
+ conda create -n cdm_train python=3.10
+ conda activate cdm_train
+ pip install -r config/requirements_train.txt
+ pip install flash-attn==2.7.4.post1 --no-build-isolation # May take 1-2 hours
+
+ # Launch training with FSDP2 (set --num_processes to the number of processes, typically one per GPU)
+ accelerate launch --config_file config/accelerate_fsdp2.yaml \
+ --num_processes 1 -m scripts.train \
+ --config config/config.py:sd3 # SD3.5-Medium
+
+ accelerate launch --config_file config/accelerate_fsdp2.yaml \
+ --num_processes 1 -m scripts.train \
+ --config config/config.py:longcat # LongCat
+ ```
+
+ ## Evaluation
+
+ Evaluation runs in two phases, **image generation** and **metric computation**, after the trained checkpoint has been exported to a pipeline.
+
+ ### Step 1: Export a checkpoint to a pipeline
+
+ ```bash
+ conda activate cdm_train
+
+ python -m scripts.save \
+ --experiment_dir "logs/experiments/sd3/test" \
+ --output_dir "logs/pipelines/test" \
+ --checkpoint_steps "10"
+ ```
+
+ ### Step 2: Generate images
+
+ ```bash
+ # --model_path must point to the pipeline exported in Step 1 (step 10 in this example)
+ accelerate launch --num_processes 1 -m scripts.eval \
+ --phase generate \
+ --model_path "logs/pipelines/test/checkpoint-10" \
+ --eval_metrics imagereward clipscore pickscore hpsv2 hpsv3 aesthetic ocr dpgbench fid \
+ --output_dir "logs/evaluations/test" \
+ --base_model sd3 \
+ --save_images
+ ```
+
+ ### Step 3: Compute metrics
+
+ ```bash
+ # Create a separate environment for evaluation dependencies
+ conda create -n cdm_eval python=3.10
+ conda activate cdm_eval
+ pip install -r config/requirements_eval.txt
+ pip install image-reward --no-deps
+ pip install fairseq --no-deps
+
+ # NOTE: Before running on multiple GPUs, run once on a single GPU so the metric checkpoints are downloaded first.
+ # For FID evaluation, place COCO 2014 val images under: dataset/coco2014val_10k/images
+
+ accelerate launch --num_processes 1 -m scripts.eval \
+ --phase evaluate \
+ --eval_metrics imagereward clipscore pickscore hpsv2 hpsv3 aesthetic ocr dpgbench fid \
+ --output_dir "logs/evaluations/test"
+ ```
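The three steps share paths that have to agree: the pipeline exported in Step 1 is the `--model_path` of Step 2, and Steps 2 and 3 use the same `--output_dir`. The sketch below builds the three commands from one set of inputs so they cannot drift apart. `build_eval_commands` is a hypothetical helper, not part of the repository, and it assumes that `scripts.save` writes each export to `<output_dir>/checkpoint-<step>`, as the `--model_path` in Step 2 suggests.

```python
from pathlib import PurePosixPath

METRICS = ["imagereward", "clipscore", "pickscore", "hpsv2", "hpsv3",
           "aesthetic", "ocr", "dpgbench", "fid"]


def build_eval_commands(experiment_dir, pipeline_dir, eval_dir, step, base_model="sd3"):
    """Return the export, generate, and evaluate commands as argument lists."""
    # Assumption: the export lands in <pipeline_dir>/checkpoint-<step>.
    model_path = str(PurePosixPath(pipeline_dir) / f"checkpoint-{step}")
    launch = ["accelerate", "launch", "--num_processes", "1", "-m", "scripts.eval"]
    export = ["python", "-m", "scripts.save",
              "--experiment_dir", experiment_dir,
              "--output_dir", pipeline_dir,
              "--checkpoint_steps", str(step)]
    generate = launch + ["--phase", "generate",
                         "--model_path", model_path,
                         "--eval_metrics", *METRICS,
                         "--output_dir", eval_dir,
                         "--base_model", base_model,
                         "--save_images"]
    evaluate = launch + ["--phase", "evaluate",
                         "--eval_metrics", *METRICS,
                         "--output_dir", eval_dir]
    return export, generate, evaluate


export, generate, evaluate = build_eval_commands(
    "logs/experiments/sd3/test", "logs/pipelines/test", "logs/evaluations/test", step=10)
for cmd in (export, generate, evaluate):
    print(" ".join(cmd))
```

The helper only prints the commands instead of running them, because the first two are meant for the `cdm_train` environment and the last one for `cdm_eval`.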
+
+ ## License
+
+ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+
+ ## Citation
+
+ If our work helps your research, please consider giving us a star ⭐ or citing us:
+
+ ```bibtex
+ ```