N2048M committed (verified) · Commit dc6a38a · 1 Parent(s): 55cb8ab

Polish org card README (markdown rendering, absolute logo URL, shields badges)

Files changed (1)
  1. README.md +23 -21
README.md CHANGED
@@ -4,23 +4,25 @@ emoji: 🌊
  colorFrom: blue
  colorTo: indigo
  sdk: static
- pinned: false
+ pinned: true
  ---
 
- <center>
- <img src="logo.gif" width="320" />
- </center>
+ <div align="center">
+ <img src="https://huggingface.co/spaces/TIDE-dllm/README/resolve/main/logo.gif" alt="TIDE logo" width="320" />
+ </div>
 
- <h1 align="center">Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models</h1>
+ <h1 align="center">Turning the TIDE</h1>
 
- <p align="center">
- 🌊 The first cross-architecture distillation framework for diffusion LLMs — distilling 8B dense and 16B MoE teachers into a 0.6B student 🌊
- </p>
+ <p align="center"><em>Cross-Architecture Distillation for Diffusion Large Language Models</em></p>
+
+ <p align="center">🌊 The first cross-architecture distillation framework for diffusion LLMs — distilling 8B dense and 16B MoE teachers into a 0.6B student 🌊</p>
 
  <p align="center">
- <a href="https://arxiv.org/abs/2604.26951">📄 arXiv 2604.26951</a> &nbsp;·&nbsp;
- <a href="https://github.com/PKU-YuanGroup/TIDE">💻 Code (PKU-YuanGroup/TIDE)</a> &nbsp;·&nbsp;
- <a href="https://pku-yuangroup.github.io/TIDE-Page/">🌐 Project page</a>
+ <a href="https://arxiv.org/abs/2604.26951"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2604.26951-b31b1b.svg?logo=arxiv" /></a>
+ <a href="https://huggingface.co/papers/2604.26951"><img alt="HF Paper" src="https://img.shields.io/badge/%F0%9F%A4%97-Paper-blue" /></a>
+ <a href="https://github.com/PKU-YuanGroup/TIDE"><img alt="Code" src="https://img.shields.io/badge/Code-PKU--YuanGroup%2FTIDE-181717.svg?logo=github" /></a>
+ <a href="https://pku-yuangroup.github.io/TIDE-Page/"><img alt="Project Page" src="https://img.shields.io/badge/Project-Page-2ea44f" /></a>
+ <a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" /></a>
  </p>
 
  ---
@@ -31,7 +33,7 @@ This organization hosts the **distilled student checkpoints** and **pre-tokenize
 
  - **+1.53 average gain** over the non-distilled BD3LM baseline across 8 benchmarks (34.20 vs. 32.67).
  - **+16.48 on HumanEval** over the equivalent-size AR baseline (48.78 vs. 32.30) — distilled dLLMs especially excel at code generation.
- - **22× peak-memory reduction** vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and **5.2× faster inference** (6.25 s vs. 32.55 s for 256 tokens on H100), enabling commodity-hardware deployment.
+ - **22× peak-memory reduction** vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and **5.2× faster inference** (6.25 s vs. 32.55 s for 256 tokens on H100).
 
  ## 🤖 Released models
 
@@ -39,21 +41,21 @@ Six 0.6B distilled student checkpoints (3 per pipeline). Each is initialized fro
 
  | Pipeline | Variant | Repo |
  |---|---|---|
- | A — Cross-Tokenizer (LLaDA2 teacher) | **TIDE-Cross** (native, paper-best) | [distill-LLaDA2-TIDE_Cross](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross) |
- | A — Cross-Tokenizer (LLaDA2 teacher) | TIDE-Shared variant | [distill-LLaDA2-TIDE_Shared](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared) |
- | A — Cross-Tokenizer (LLaDA2 teacher) | CALM baseline | [distill-LLaDA2-CALM](https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM) |
- | B — Shared-Tokenizer (WeDLM teacher) | **TIDE-Shared** (native, paper-best) | [distill-WeDLM-TIDE_Shared](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared) |
- | B — Shared-Tokenizer (WeDLM teacher) | TIDE-Cross variant | [distill-WeDLM-TIDE_Cross](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross) |
- | B — Shared-Tokenizer (WeDLM teacher) | KL baseline | [distill-WeDLM-KL](https://huggingface.co/TIDE-dllm/distill-WeDLM-KL) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | **TIDE-Cross** *(native, paper-best)* | [`distill-LLaDA2-TIDE_Cross`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | TIDE-Shared variant | [`distill-LLaDA2-TIDE_Shared`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | CALM baseline | [`distill-LLaDA2-CALM`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM) |
+ | B — Shared-Tokenizer (WeDLM teacher) | **TIDE-Shared** *(native, paper-best)* | [`distill-WeDLM-TIDE_Shared`](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared) |
+ | B — Shared-Tokenizer (WeDLM teacher) | TIDE-Cross variant | [`distill-WeDLM-TIDE_Cross`](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross) |
+ | B — Shared-Tokenizer (WeDLM teacher) | KL baseline | [`distill-WeDLM-KL`](https://huggingface.co/TIDE-dllm/distill-WeDLM-KL) |
 
  ## 📚 Released datasets
 
- Pre-tokenized SFT mixtures (`tulu-3-sft-mixture` + `smoltalk` + `opc-sft-stage1` + `opc-sft-stage2`) prepared for each teacher, so distillation jobs never have to re-tokenize at startup.
+ Pre-tokenized SFT mixtures (`tulu-3-sft-mixture` + `smoltalk` + `opc-sft-stage1` + `opc-sft-stage2`) prepared for each teacher, so distillation jobs never re-tokenize at startup.
 
  | Pipeline | Repo |
  |---|---|
- | A — for the LLaDA2 teacher | [distill_llada2_sft](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft) |
- | B — for the WeDLM teacher | [distill_wedlm_sft](https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft) |
+ | A — for the LLaDA2 teacher | [`distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft) |
+ | B — for the WeDLM teacher | [`distill_wedlm_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft) |
 
  ## 🚀 Quick start
 
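The body of the README's "Quick start" section lies outside this diff, so the commit itself does not show how the released artifacts are loaded. As a rough, non-authoritative sketch only: the checkpoints and datasets listed above are ordinary Hugging Face Hub repos, so something along the following lines should work, assuming the 0.6B students ship custom dLLM modeling code loadable with `trust_remote_code=True` and that the pre-tokenized SFT mixtures expose a `train` split; neither assumption is confirmed by this commit.

```python
# Hypothetical usage sketch -- NOT the official quick start from the TIDE README.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

# Paper-best pipeline-A student; repo ID taken verbatim from the table above.
repo = "TIDE-dllm/distill-LLaDA2-TIDE_Cross"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)  # assumes custom modeling code on the Hub

# Matching pre-tokenized SFT mixture for the LLaDA2 teacher (pipeline A).
# Streaming avoids downloading the whole mixture just to inspect one record;
# the "train" split name and the column layout are assumptions, not documented here.
sft = load_dataset("TIDE-dllm/distill_llada2_sft", split="train", streaming=True)
example = next(iter(sft))

print(type(model).__name__)     # which architecture class the checkpoint resolves to
print(list(example.keys()))     # which pre-tokenized columns the dataset actually stores
```

Only the repo IDs come from the tables above; everything else in the snippet should be checked against the actual Quick start section and the PKU-YuanGroup/TIDE code.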