Replace placeholder index.html with TIDE org card content
index.html  (CHANGED, +145 -17)

@@ -1,19 +1,147 @@
 <!doctype html>
-<html>
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+<html lang="en">
+<head>
+  <meta charset="utf-8" />
+  <meta name="viewport" content="width=device-width,initial-scale=1" />
+  <title>TIDE-dllm – Turning the TIDE</title>
+  <style>
+    :root {
+      --tide-navy: #003D5B;
+      --tide-cyan: #00B4D8;
+      --bg: #ffffff;
+      --fg: #1f2328;
+      --muted: #6e7781;
+      --border: #d0d7de;
+      --row-stripe: #f6f8fa;
+      --code-bg: #f6f8fa;
+    }
+    @media (prefers-color-scheme: dark) {
+      :root {
+        --bg: #0d1117;
+        --fg: #e6edf3;
+        --muted: #8b949e;
+        --border: #30363d;
+        --row-stripe: #161b22;
+        --code-bg: #161b22;
+      }
+    }
+    html, body { margin: 0; padding: 0; background: var(--bg); color: var(--fg); }
+    body {
+      font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", "Helvetica Neue", Arial, "PingFang SC", sans-serif;
+      line-height: 1.55;
+      max-width: 880px;
+      margin: 0 auto;
+      padding: 2rem 1.25rem 3rem;
+    }
+    .logo { text-align: center; margin: 0.5rem 0 1rem; }
+    .logo img { max-width: 320px; width: 60%; height: auto; }
+    h1 { text-align: center; font-size: 1.6rem; line-height: 1.3; color: var(--tide-navy); margin: 0.25rem 0 0.5rem; }
+    @media (prefers-color-scheme: dark) { h1 { color: var(--tide-cyan); } }
+    .tagline { text-align: center; color: var(--muted); margin: 0 0 1rem; font-size: 0.95rem; }
+    .badges { text-align: center; margin: 0.5rem 0 1.5rem; }
+    .badges a {
+      display: inline-block;
+      padding: 0.35em 0.9em;
+      margin: 0.2em;
+      border-radius: 999px;
+      background: var(--tide-navy);
+      color: #fff;
+      text-decoration: none;
+      font-size: 0.88rem;
+      transition: background 0.15s;
+    }
+    .badges a:hover { background: var(--tide-cyan); }
+    h2 { color: var(--tide-navy); border-bottom: 1px solid var(--border); padding-bottom: 0.3em; margin-top: 2rem; font-size: 1.25rem; }
+    @media (prefers-color-scheme: dark) { h2 { color: var(--tide-cyan); } }
+    a { color: var(--tide-navy); }
+    @media (prefers-color-scheme: dark) { a { color: var(--tide-cyan); } }
+    table { border-collapse: collapse; width: 100%; font-size: 0.92rem; margin: 0.5rem 0; }
+    th, td { padding: 0.45em 0.6em; text-align: left; border-bottom: 1px solid var(--border); }
+    th { background: var(--row-stripe); font-weight: 600; }
+    tr:nth-child(even) td { background: var(--row-stripe); }
+    code, pre { font-family: ui-monospace, SFMono-Regular, "SF Mono", Menlo, Consolas, monospace; }
+    code { background: var(--code-bg); padding: 0.1em 0.35em; border-radius: 4px; font-size: 0.88em; }
+    pre { background: var(--code-bg); padding: 0.85em 1em; border-radius: 6px; overflow-x: auto; font-size: 0.85rem; }
+    pre code { background: none; padding: 0; }
+    ul.highlights { margin: 0.5em 0; padding-left: 1.25rem; }
+    ul.highlights li { margin-bottom: 0.4em; }
+    hr { border: none; border-top: 1px solid var(--border); margin: 1.75rem 0; }
+  </style>
+</head>
+<body>
+
+  <div class="logo"><img src="logo.gif" alt="TIDE logo" /></div>
+
+  <h1>Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models</h1>
+
+  <p class="tagline">The first cross-architecture distillation framework for diffusion LLMs – distilling 8B dense and 16B MoE teachers into a 0.6B student</p>
+
+  <div class="badges">
+    <a href="https://arxiv.org/abs/2604.26951" target="_blank">arXiv 2604.26951</a>
+    <a href="https://github.com/PKU-YuanGroup/TIDE" target="_blank">Code</a>
+    <a href="https://pku-yuangroup.github.io/TIDE-Page/" target="_blank">Project page</a>
+  </div>
+
+  <p>This organization hosts the <strong>distilled student checkpoints</strong> and <strong>pre-tokenized SFT datasets</strong> released with TIDE. The framework consists of three modular components – <strong>TIDAL</strong> (dual-axis interpolation), <strong>CompDemo</strong> (complementary mask-split teacher inference), and <strong>Reverse CALM</strong> (cross-tokenizer chunk-level matching) – and is evaluated across two heterogeneous distillation pipelines.</p>
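+
+  <p>As a rough illustration of the CompDemo idea, the sketch below partitions the masked positions into two complementary subsets, so that two teacher passes jointly cover every masked token. This is a minimal reconstruction from the component's name, not the released implementation; all function and variable names are hypothetical, so see the code repository for the actual logic.</p>
+  <pre><code>import torch
+
+# Hypothetical sketch: split the masked positions into two complementary
+# subsets; running the teacher once per subset yields a prediction for
+# every masked token. Illustrative only, not the released CompDemo API.
+def complementary_mask_split(masked_positions, generator=None):
+    coin = torch.bernoulli(torch.full(masked_positions.shape, 0.5), generator=generator).bool()
+    split_a = torch.logical_and(masked_positions, coin)                # teacher pass 1
+    split_b = torch.logical_and(masked_positions, coin.logical_not())  # teacher pass 2
+    assert torch.equal(torch.logical_or(split_a, split_b), masked_positions)
+    return split_a, split_b
+</code></pre>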
+
+  <h2>Highlights</h2>
+  <ul class="highlights">
+    <li><strong>+1.53 average gain</strong> over the non-distilled BD3LM baseline across 8 benchmarks (34.20 vs. 32.67).</li>
+    <li><strong>+16.48 on HumanEval</strong> over the equivalent-size AR baseline (48.78 vs. 32.30); distilled dLLMs especially excel at code generation.</li>
+    <li><strong>22× peak-memory reduction</strong> vs. the 16B MoE LLaDA2 teacher (1.4 GB vs. 31.3 GB) and <strong>5.2× faster inference</strong> (6.25 s vs. 32.55 s for 256 tokens on an H100).</li>
+  </ul>
+
+  <h2>Released models</h2>
+  <p>Six 0.6B distilled student checkpoints (3 per pipeline). Each is initialized from <a href="https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"><code>dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1</code></a> and distilled from a larger dLLM teacher.</p>
+
+  <table>
+    <thead><tr><th>Pipeline</th><th>Variant</th><th>Repo</th></tr></thead>
+    <tbody>
+      <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td><strong>TIDE-Cross</strong> (native, paper-best)</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Cross">distill-LLaDA2-TIDE_Cross</a></td></tr>
+      <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td>TIDE-Shared variant</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-TIDE_Shared">distill-LLaDA2-TIDE_Shared</a></td></tr>
+      <tr><td>A – Cross-Tokenizer (LLaDA2 teacher)</td><td>CALM baseline</td><td><a href="https://huggingface.co/TIDE-dllm/distill-LLaDA2-CALM">distill-LLaDA2-CALM</a></td></tr>
+      <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td><strong>TIDE-Shared</strong> (native, paper-best)</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Shared">distill-WeDLM-TIDE_Shared</a></td></tr>
+      <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td>TIDE-Cross variant</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-TIDE_Cross">distill-WeDLM-TIDE_Cross</a></td></tr>
+      <tr><td>B – Shared-Tokenizer (WeDLM teacher)</td><td>KL baseline</td><td><a href="https://huggingface.co/TIDE-dllm/distill-WeDLM-KL">distill-WeDLM-KL</a></td></tr>
+    </tbody>
+  </table>
+
+  <h2>Released datasets</h2>
+  <p>Pre-tokenized SFT mixtures (<code>tulu-3-sft-mixture</code> + <code>smoltalk</code> + <code>opc-sft-stage1</code> + <code>opc-sft-stage2</code>) prepared for each teacher, so distillation jobs never re-tokenize at startup.</p>
+
+  <table>
+    <thead><tr><th>Pipeline</th><th>Repo</th></tr></thead>
+    <tbody>
+      <tr><td>A – for the LLaDA2 teacher</td><td><a href="https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft">distill_llada2_sft</a></td></tr>
+      <tr><td>B – for the WeDLM teacher</td><td><a href="https://huggingface.co/datasets/TIDE-dllm/distill_wedlm_sft">distill_wedlm_sft</a></td></tr>
+    </tbody>
+  </table>
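+
+  <p>Either mixture loads directly with the <code>datasets</code> library. A minimal sketch; the split name and column layout below are assumptions, so inspect the repo before wiring it into a training loop:</p>
+  <pre><code>from datasets import load_dataset
+
+# Pipeline-A mixture, already tokenized for the LLaDA2 teacher.
+ds = load_dataset("TIDE-dllm/distill_llada2_sft", split="train")
+print(ds)  # check the actual features/columns before training
+</code></pre>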
+
+  <h2>Quick start</h2>
+  <pre><code>import torch
+from transformers import AutoModelForMaskedLM, AutoTokenizer
+
+repo = "TIDE-dllm/distill-LLaDA2-TIDE_Cross"  # paper-best Pipeline-A checkpoint
+device = "cuda" if torch.cuda.is_available() else "cpu"
+
+model = AutoModelForMaskedLM.from_pretrained(
+    repo, dtype=torch.bfloat16, trust_remote_code=True,
+).to(device).eval()
+tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
+</code></pre>
+
+  <p>The same <code>generate()</code> routine published with <a href="https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"><code>dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1</code></a> works on every TIDE checkpoint – just swap the model name.</p>
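+
+  <p>In outline, continuing from the snippet above (a sketch that assumes the remote code exposes an HF-style <code>generate()</code>; the exact sampling arguments are defined by the base checkpoint's model card):</p>
+  <pre><code>prompt = "Write a Python function that reverses a string."
+inputs = tokenizer(prompt, return_tensors="pt").to(device)
+
+with torch.no_grad():
+    out = model.generate(**inputs, max_new_tokens=128)  # args may differ per release
+print(tokenizer.decode(out[0], skip_special_tokens=True))
+</code></pre>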
+
+  <h2>Citation</h2>
+  <pre><code>@misc{zhang2026turningtidecrossarchitecturedistillation,
+  title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
+  author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
+  year={2026},
+  eprint={2604.26951},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL},
+  url={https://arxiv.org/abs/2604.26951},
+}</code></pre>
+
+</body>
 </html>