---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: dllm-collection/Qwen3-0.6B-diffusion-bd3lm-v0.1
tags:
- diffusion
- dllm
- bd3lm
- distillation
- arxiv:2604.26951
---
<div align="center">
  <img src="logo.gif" width="300" />
</div>
# distill-LLaDA2-TIDE_Shared
This model was introduced in the paper [Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models](https://huggingface.co/papers/2604.26951).
`distill-LLaDA2-TIDE_Shared` is a 0.6B diffusion language model distilled from LLaDA2.0-mini (a 16B MoE teacher) into the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) student under the **Cross-Tokenizer (Pipeline A)** setting of the TIDE framework. This checkpoint applies TIDAL + CompDemo within the cross-tokenizer pipeline (a non-native ablation).
## Model Overview
- **Method**: TIDE — [Reverse CALM / TIDAL / CompDemo](https://arxiv.org/abs/2604.26951) (cross-architecture distillation for diffusion LMs)
- **Framework**: [TIDE / dLLM](https://github.com/PKU-YuanGroup/TIDE)
- **Student (initialization)**: [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) (BD3LM, block_size=32)
- **Teacher**: [`inclusionAI/LLaDA2.0-mini`](https://huggingface.co/inclusionAI/LLaDA2.0-mini)
- **Distillation mode**: `--distill_mode alm_taid --use_comp_demo True` (a conceptual loss sketch follows this list)
- **Datasets**: [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), [opc-sft-stage1](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1) and [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2) — same composition as the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) base. Pre-tokenized for this teacher in [`TIDE-dllm/distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft).
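To make the `--distill_mode alm_taid` flag above a little more concrete, the snippet below is a deliberately simplified, conceptual sketch of a TAID-style interpolated distillation loss applied once student and teacher logits live in a shared vocabulary space. It is **not** the TIDE implementation: the actual objective, the TIDAL cross-tokenizer alignment, and CompDemo are defined in the repo, and the function name, `lam` schedule, and `temperature` here are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

def taid_style_distill_loss(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            lam: float = 0.5,
                            temperature: float = 1.0) -> torch.Tensor:
    """Conceptual TAID-style interpolated KL (placeholder, not the TIDE code).

    Both logit tensors are assumed to already be aligned to a shared
    vocabulary (the cross-tokenizer step TIDE handles separately);
    shapes are (batch, seq_len, vocab).
    """
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # Intermediate target: mix the (detached) student distribution with the teacher.
    # In TAID-style objectives the mixing weight is typically annealed toward the
    # teacher over training; the exact schedule used here is defined in the repo.
    target = (1.0 - lam) * s_log_probs.exp().detach() + lam * t_probs

    # KL(target || student); F.kl_div expects log-probs as input and probs as target.
    return F.kl_div(s_log_probs, target, reduction="batchmean")
```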
## Installation
```shell
pip install torch transformers accelerate
```
## Quick Start
> [!NOTE]
> This checkpoint is fully compatible with the BD3LM `generate(...)` routine published with [`dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) — only the model name changes.
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "TIDE-dllm/distill-LLaDA2-TIDE_Shared"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True loads the custom diffusion model code shipped with the repo.
model = AutoModelForMaskedLM.from_pretrained(
    repo, dtype=torch.bfloat16, trust_remote_code=True,
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

prompts = [
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Implement a DFS traversal in Python with clear inline comments."},
    ],
]
encoded = [
    tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=True, enable_thinking=False)
    for m in prompts
]
# ... use the same `generate()` function as in dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1.
```
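The authoritative decoding routine is the `generate()` published with the base repo. For orientation only, the sketch below shows the general shape of a block-diffusion decoding loop with low-confidence remasking; every detail (the mask-token lookup, committing one token per step, the per-block step budget) is an illustrative assumption rather than the actual implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def simple_block_diffusion_generate(model, tokenizer, prompt_ids,
                                    max_new_tokens=256, block_size=32):
    """Illustrative block-diffusion decoding sketch (not the official generate())."""
    device = next(model.parameters()).device
    mask_id = tokenizer.mask_token_id  # assumption: the tokenizer exposes a mask-token id
    x = torch.tensor(prompt_ids, device=device, dtype=torch.long).unsqueeze(0)

    for _ in range(max_new_tokens // block_size):
        # Append one fully masked block after the current context.
        block = torch.full((1, block_size), mask_id, device=device, dtype=torch.long)
        x = torch.cat([x, block], dim=1)

        for _ in range(block_size):
            masked = x[0, -block_size:] == mask_id
            if not masked.any():
                break
            logits = model(input_ids=x).logits[0, -block_size:]
            probs = F.softmax(logits.float(), dim=-1)
            conf, pred = probs.max(dim=-1)
            conf[~masked] = -1.0               # never overwrite already-committed tokens
            idx = conf.argmax()                # commit the single most confident token,
            pos = x.shape[1] - block_size + idx
            x[0, pos] = pred[idx]              # leaving the rest masked for the next step
    return x[0]
```

For example, `tokenizer.decode(simple_block_diffusion_generate(model, tokenizer, encoded[0]))` would decode one sampled continuation, but for real use prefer the base repo's `generate()`.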
## Command-Line Interface
For an interactive demo (visualised iterative denoising), use the script in the [TIDE / dLLM repo](https://github.com/PKU-YuanGroup/TIDE):
```shell
python -u examples/a2d/bd3lm/chat.py \
--model_name_or_path TIDE-dllm/distill-LLaDA2-TIDE_Shared \
--chat_template True --block_size 32 --remasking low_confidence \
--steps 256 --max_new_tokens 256
```
## Reproducing This Checkpoint
```shell
git clone https://github.com/PKU-YuanGroup/TIDE && cd TIDE
pip install -e . && git submodule update --init --recursive
pip install -e "lm-evaluation-harness[ifeval,math]" && pip install -e "tokenkit[full]"
# Download the pre-tokenized SFT mixture for this teacher
huggingface-cli download TIDE-dllm/distill_llada2_sft --repo-type dataset \
--local-dir data/distill_llada2_sft
bash scripts/distill_llada2.sh \
--data_path data/distill_llada2_sft \
--distill_mode alm_taid --use_comp_demo True \
--num_gpus 8
```
## Citation
```bibtex
@misc{zhang2026turningtidecrossarchitecturedistillation,
title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
year={2026},
eprint={2604.26951},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.26951},
}
```