N2048M committed (verified) · Commit dc6a38a · 1 Parent(s): 55cb8ab

Polish org card README (markdown rendering, absolute logo URL, shields badges)

Files changed (1)
  1. README.md +23 -21
README.md CHANGED
@@ -4,23 +4,25 @@ emoji: 🌊
  colorFrom: blue
  colorTo: indigo
  sdk: static
- pinned: false
+ pinned: true
  ---
 
- <center>
- <img src="logo.gif" width="320" />
- </center>
+ <div align="center">
+ <img src="https://huggingface.co/spaces/TIDE-dllm/README/resolve/main/logo.gif" alt="TIDE logo" width="320" />
+ </div>
 
- <h1 align="center">Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models</h1>
+ <h1 align="center">Turning the TIDE</h1>
 
- <p align="center">
- 🌊 The first cross-architecture distillation framework for diffusion LLMs — distilling 8B dense and 16B MoE teachers into a 0.6B student 🌊
- </p>
+ <p align="center"><em>Cross-Architecture Distillation for Diffusion Large Language Models</em></p>
+
+ <p align="center">🌊 The first cross-architecture distillation framework for diffusion LLMs — distilling 8B dense and 16B MoE teachers into a 0.6B student 🌊</p>
 
  <p align="center">
- <a href="https://arxiv.org/abs/2604.26951">📄 arXiv 2604.26951</a> &nbsp;·&nbsp;
- <a href="https://github.com/PKU-YuanGroup/TIDE">💻 Code (PKU-YuanGroup/TIDE)</a> &nbsp;·&nbsp;
- <a href="https://pku-yuangroup.github.io/TIDE-Page/">🌐 Project page</a>
+ <a href="https://arxiv.org/abs/2604.26951"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2604.26951-b31b1b.svg?logo=arxiv" /></a>
+ <a href="https://huggingface.co/papers/2604.26951"><img alt="HF Paper" src="https://img.shields.io/badge/%F0%9F%A4%97-Paper-blue" /></a>
+ <a href="https://github.com/PKU-YuanGroup/TIDE"><img alt="Code" src="https://img.shields.io/badge/Code-PKU--YuanGroup%2FTIDE-181717.svg?logo=github" /></a>
+ <a href="https://pku-yuangroup.github.io/TIDE-Page/"><img alt="Project Page" src="https://img.shields.io/badge/Project-Page-2ea44f" /></a>
+ <a href="https://opensource.org/licenses/Apache-2.0"><img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" /></a>
  </p>
 
  ---
@@ -31,7 +33,7 @@ This organization hosts the **distilled student checkpoints** and **pre-tokenize
 
  - **+1.53 average gain** over the non-distilled BD3LM baseline across 8 benchmarks (34.20 vs. 32.67).
  - **+16.48 on HumanEval** over the equivalent-size AR baseline (48.78 vs. 32.30) — distilled dLLMs especially excel at code generation.
- - **22× peak-memory reduction** vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and **5.2× faster inference** (6.25 s vs. 32.55 s for 256 tokens on H100), enabling commodity-hardware deployment.
+ - **22× peak-memory reduction** vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and **5.2× faster inference** (6.25 s vs. 32.55 s for 256 tokens on H100).
 
  ## 🤖 Released models
 
@@ -39,21 +41,21 @@ Six 0.6B distilled student checkpoints (3 per pipeline). Each is initialized fro
 
  | Pipeline | Variant | Repo |
  |---|---|---|
- | A — Cross-Tokenizer (LLaDA2 teacher) | **TIDE-Cross** (native, paper-best) | [distill-LLaDA2-TIDE_Cross](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross) |
- | A — Cross-Tokenizer (LLaDA2 teacher) | TIDE-Shared variant | [distill-LLaDA2-TIDE_Shared](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared) |
- | A — Cross-Tokenizer (LLaDA2 teacher) | CALM baseline | [distill-LLaDA2-CALM](https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM) |
- | B — Shared-Tokenizer (WeDLM teacher) | **TIDE-Shared** (native, paper-best) | [distill-WeDLM-TIDE_Shared](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared) |
- | B — Shared-Tokenizer (WeDLM teacher) | TIDE-Cross variant | [distill-WeDLM-TIDE_Cross](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross) |
- | B — Shared-Tokenizer (WeDLM teacher) | KL baseline | [distill-WeDLM-KL](https://huggingface.co/TIDE-dllm/distill-WeDLM-KL) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | **TIDE-Cross** *(native, paper-best)* | [`distill-LLaDA2-TIDE_Cross`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | TIDE-Shared variant | [`distill-LLaDA2-TIDE_Shared`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared) |
+ | A — Cross-Tokenizer (LLaDA2 teacher) | CALM baseline | [`distill-LLaDA2-CALM`](https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM) |
+ | B — Shared-Tokenizer (WeDLM teacher) | **TIDE-Shared** *(native, paper-best)* | [`distill-WeDLM-TIDE_Shared`](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared) |
+ | B — Shared-Tokenizer (WeDLM teacher) | TIDE-Cross variant | [`distill-WeDLM-TIDE_Cross`](https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross) |
+ | B — Shared-Tokenizer (WeDLM teacher) | KL baseline | [`distill-WeDLM-KL`](https://huggingface.co/TIDE-dllm/distill-WeDLM-KL) |
 
  ## 📚 Released datasets
 
- Pre-tokenized SFT mixtures (`tulu-3-sft-mixture` + `smoltalk` + `opc-sft-stage1` + `opc-sft-stage2`) prepared for each teacher, so distillation jobs never have to re-tokenize at startup.
+ Pre-tokenized SFT mixtures (`tulu-3-sft-mixture` + `smoltalk` + `opc-sft-stage1` + `opc-sft-stage2`) prepared for each teacher, so distillation jobs never re-tokenize at startup.
 
  | Pipeline | Repo |
  |---|---|
- | A — for the LLaDA2 teacher | [distill_llada2_sft](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft) |
- | B — for the WeDLM teacher | [distill_wedlm_sft](https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft) |
+ | A — for the LLaDA2 teacher | [`distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft) |
+ | B — for the WeDLM teacher | [`distill_wedlm_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft) |
 
  ## 🚀 Quick start
 
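The body of the README's "Quick start" section lies outside this diff, so the commit itself does not show how the released artifacts are loaded. As a rough, non-authoritative sketch only: the checkpoints and datasets listed above are ordinary Hugging Face Hub repos, so something along the following lines should work, assuming the 0.6B students ship custom dLLM modeling code loadable with `trust_remote_code=True` and that the pre-tokenized SFT mixtures expose a `train` split; neither assumption is confirmed by this commit.

```python
# Hypothetical usage sketch -- NOT the official quick start from the TIDE README.
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer

# Paper-best pipeline-A student; repo ID taken verbatim from the table above.
repo = "TIDE-dllm/distill-LLaDA2-TIDE_Cross"
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
model = AutoModel.from_pretrained(repo, trust_remote_code=True)  # assumes custom modeling code on the Hub

# Matching pre-tokenized SFT mixture for the LLaDA2 teacher (pipeline A).
# Streaming avoids downloading the whole mixture just to inspect one record;
# the "train" split name and the column layout are assumptions, not documented here.
sft = load_dataset("TIDE-dllm/distill_llada2_sft", split="train", streaming=True)
example = next(iter(sft))

print(type(model).__name__)     # which architecture class the checkpoint resolves to
print(list(example.keys()))     # which pre-tokenized columns the dataset actually stores
```

Only the repo IDs come from the tables above; everything else in the snippet should be checked against the actual Quick start section and the PKU-YuanGroup/TIDE code.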