N2048M committed on
Commit a3c5430 · verified · 1 Parent(s): dc6a38a

Remove index.html so org card renders README.md (matches Qwen/meta-llama/google convention)

Files changed (1)
  1. index.html +0 -86
index.html DELETED
@@ -1,86 +0,0 @@
- <div align="center">
-
- <img src="https://huggingface.co/spaces/TIDE-dllm/README/resolve/main/logo.gif" alt="TIDE logo" width="320" />
-
- <h1>Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models</h1>
-
- <p>🌊 The first cross-architecture distillation framework for diffusion LLMs, distilling 8B dense and 16B MoE teachers into a 0.6B student 🌊</p>
-
- <p>
- <a href="https://arxiv.org/abs/2604.26951"><img alt="arXiv" src="https://img.shields.io/badge/arXiv-2604.26951-b31b1b.svg?logo=arxiv" /></a>
- <a href="https://github.com/PKU-YuanGroup/TIDE"><img alt="Code" src="https://img.shields.io/badge/Code-PKU--YuanGroup%2FTIDE-181717.svg?logo=github" /></a>
- <a href="https://pku-yuangroup.github.io/TIDE-Page/"><img alt="Project Page" src="https://img.shields.io/badge/Project-Page-2ea44f" /></a>
- <a href="https://huggingface.co/papers/2604.26951"><img alt="HF Paper" src="https://img.shields.io/badge/%F0%9F%A4%97-Paper-blue" /></a>
- </p>
-
- </div>
-
- <p>This organization hosts the <strong>distilled student checkpoints</strong> and <strong>pre-tokenized SFT datasets</strong> released with TIDE. The framework consists of three modular components – <strong>TIDAL</strong> (dual-axis interpolation), <strong>CompDemo</strong> (complementary mask-split teacher inference), and <strong>Reverse CALM</strong> (cross-tokenizer chunk-level matching) – and is evaluated across two heterogeneous distillation pipelines.</p>
-
- <h2>✨ Highlights</h2>
- <ul>
- <li><strong>+1.53 average gain</strong> over the non-distilled BD3LM baseline across 8 benchmarks (34.20 vs. 32.67).</li>
- <li><strong>+16.48 on HumanEval</strong> over the equivalent-size AR baseline (48.78 vs. 32.30); distilled dLLMs especially excel at code generation.</li>
- <li><strong>22× peak-memory reduction</strong> vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and <strong>5.2× faster inference</strong> (6.25 s vs. 32.55 s for 256 tokens on H100).</li>
- </ul>
-
- <h2>🤖 Released models</h2>
-
- <p>Six 0.6B distilled student checkpoints (3 per pipeline). Each is initialized from <a href="https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"><code>dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1</code></a> and distilled from a larger dLLM teacher.</p>
-
- <table>
- <thead>
- <tr><th>Pipeline</th><th>Variant</th><th>Repo</th></tr>
- </thead>
- <tbody>
- <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td><strong>TIDE-Cross</strong> (native, paper-best)</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross">distill-LLaDA2-TIDE_Cross</a></td></tr>
- <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td>TIDE-Shared variant</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared">distill-LLaDA2-TIDE_Shared</a></td></tr>
- <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td>CALM baseline</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM">distill-LLaDA2-CALM</a></td></tr>
- <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td><strong>TIDE-Shared</strong> (native, paper-best)</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared">distill-WeDLM-TIDE_Shared</a></td></tr>
- <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td>TIDE-Cross variant</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross">distill-WeDLM-TIDE_Cross</a></td></tr>
- <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td>KL baseline</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-KL">distill-WeDLM-KL</a></td></tr>
- </tbody>
- </table>
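-
- <p>For scripted use, the table above can be mirrored as a plain mapping from (pipeline, variant) to repo ID; the entries below simply restate the table and feed directly into the loader shown in the Quick start section:</p>
-
- <pre><code># Repo IDs of the six released 0.6B students, copied from the table above.
- TIDE_CHECKPOINTS = {
-     ("A", "TIDE-Cross"):  "TIDE-dllm/distill-LLaDA2-TIDE_Cross",   # Pipeline-A paper-best
-     ("A", "TIDE-Shared"): "TIDE-dllm/distill-LLaDA2-TIDE_Shared",
-     ("A", "CALM"):        "TIDE-dllm/distill-LLaDA2-CALM",
-     ("B", "TIDE-Shared"): "TIDE-dllm/distill-WeDLM-TIDE_Shared",   # Pipeline-B paper-best
-     ("B", "TIDE-Cross"):  "TIDE-dllm/distill-WeDLM-TIDE_Cross",
-     ("B", "KL"):          "TIDE-dllm/distill-WeDLM-KL",
- }
- </code></pre>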
-
- <h2>📚 Released datasets</h2>
-
- <p>Pre-tokenized SFT mixtures (<code>tulu-3-sft-mixture</code> + <code>smoltalk</code> + <code>opc-sft-stage1</code> + <code>opc-sft-stage2</code>) prepared for each teacher, so distillation jobs never re-tokenize at startup.</p>
-
- <table>
- <thead>
- <tr><th>Pipeline</th><th>Repo</th></tr>
- </thead>
- <tbody>
- <tr><td>A – for the LLaDA2 teacher</td><td><a href="https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft">distill_llada2_sft</a></td></tr>
- <tr><td>B – for the WeDLM teacher</td><td><a href="https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft">distill_wedlm_sft</a></td></tr>
- </tbody>
- </table>
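-
- <p>A minimal loading sketch for the Pipeline-A mixture (the split names and pre-tokenized column layout are not documented on this card, so inspect the returned object; swap in <code>distill_wedlm_sft</code> for Pipeline B):</p>
-
- <pre><code>from datasets import load_dataset
-
- # Pre-tokenized SFT mixture prepared for the LLaDA2 teacher (Pipeline A).
- ds = load_dataset("TIDE-dllm/distill_llada2_sft")
- print(ds)  # available splits and pre-tokenized columns follow the released schema
- </code></pre>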
-
- <h2>🚀 Quick start</h2>
-
- <pre><code>import torch
- from transformers import AutoModelForMaskedLM, AutoTokenizer
-
- repo = "TIDE-dllm/distill-LLaDA2-TIDE_Cross"  # paper-best Pipeline-A checkpoint
- device = "cuda" if torch.cuda.is_available() else "cpu"
-
- model = AutoModelForMaskedLM.from_pretrained(
-     repo, dtype=torch.bfloat16, trust_remote_code=True,
- ).to(device).eval()
- tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
- </code></pre>
-
- <p>The same <code>generate()</code> routine published with <a href="https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"><code>dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1</code></a> works on every TIDE checkpoint; just swap the model name.</p>
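-
- <p>For example, a minimal smoke test after such a swap. This is only an assumption about the checkpoint's interface, not the published <code>generate()</code> routine: it presumes the remote-code model accepts standard tokenizer inputs and returns masked-LM logits.</p>
-
- <pre><code>import torch
- from transformers import AutoModelForMaskedLM, AutoTokenizer
-
- # Same loader pattern as above, pointed at the paper-best Pipeline-B checkpoint.
- repo = "TIDE-dllm/distill-WeDLM-TIDE_Shared"
- device = "cuda" if torch.cuda.is_available() else "cpu"
-
- model = AutoModelForMaskedLM.from_pretrained(
-     repo, dtype=torch.bfloat16, trust_remote_code=True,
- ).to(device).eval()
- tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
-
- # Single forward pass; expected logits shape is (batch, seq_len, vocab_size).
- inputs = tokenizer("Turning the TIDE", return_tensors="pt").to(device)
- with torch.no_grad():
-     logits = model(**inputs).logits
- print(logits.shape)
- </code></pre>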
-
- <h2>📝 Citation</h2>
-
- <pre><code>@misc{zhang2026turningtidecrossarchitecturedistillation,
-   title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
-   author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
-   year={2026},
-   eprint={2604.26951},
-   archivePrefix={arXiv},
-   primaryClass={cs.CL},
-   url={https://arxiv.org/abs/2604.26951},
- }
- </code></pre>