---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1
tags:
- diffusion
- dllm
- bd3lm
- distillation
- arxiv:2604.26951
---

<div style="text-align: center;">
  <img src="logo.gif" width="300" />
</div>

# distill-LLaDA2-CALM

This model was introduced in the paper [Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models](https://huggingface.co/papers/2604.26951).

`distill-LLaDA2-CALM` is a 0.6B diffusion language model distilled from LLaDA2.0-mini (a 16B MoE teacher) into the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) student under the **Cross-Tokenizer (Pipeline A)** setting of the TIDE framework. It is the forward CALM (chunk-level approximate likelihood matching) baseline.

## Model Overview

- **Method**: [TIDE](https://arxiv.org/abs/2604.26951) forward CALM, cross-architecture distillation for diffusion LMs via chunk-level approximate likelihood matching (a conceptual sketch of the objective follows this list)
- **Framework**: [TIDE / dLLM](https://github.com/PKU-YuanGroup/TIDE)
- **Student (initialization)**: [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) (BD3LM, block_size=32)
- **Teacher**: [`inclusionAI/LLaDA2.0-mini`](https://huggingface.co/inclusionAI/LLaDA2.0-mini)
- **Distillation mode**: `--distill_mode alm`
- **Datasets**: [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), [opc-sft-stage1](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1), and [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2), the same composition as the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) base. Pre-tokenized for this teacher in [`TIDE-dllm/distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft).
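
The `alm` mode matches teacher and student likelihoods at the chunk level rather than per token, which is what makes the cross-tokenizer setting workable: both models score the same text spans even though they segment them into different tokens. The snippet below is a conceptual sketch only, not the TIDE implementation; the function names, the chunk-membership mask, and the squared-error form of the loss are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def chunk_log_likelihood(logits, labels, chunk_mask):
    """Sum per-token log-probs into one scalar per chunk.

    logits:     [B, T, V] model outputs over the text
    labels:     [B, T]    token ids under that model's own tokenizer
    chunk_mask: [B, C, T] 0/1 membership of each token in each of C chunks
    """
    logp = F.log_softmax(logits.float(), dim=-1)
    tok_lp = logp.gather(-1, labels.unsqueeze(-1)).squeeze(-1)     # [B, T]
    return torch.einsum("bct,bt->bc", chunk_mask.float(), tok_lp)  # [B, C]

def alm_loss(student_chunk_ll, teacher_chunk_ll):
    # Regress the student's chunk log-likelihoods toward the frozen
    # teacher's; detach() keeps gradients away from the teacher.
    return F.mse_loss(student_chunk_ll, teacher_chunk_ll.detach())
```

Because the comparison happens on per-chunk sums, the two vocabularies never need to align; only the chunk boundaries in the underlying text must agree.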

## Installation

```shell
pip install torch transformers accelerate
```

## Quick Start

> [!NOTE]
> This checkpoint is fully compatible with the BD3LM `generate(...)` routine published with [`dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1); only the model name changes.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "TIDE-dllm/distill-LLaDA2-CALM"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForMaskedLM.from_pretrained(
    repo, dtype=torch.bfloat16, trust_remote_code=True,
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

prompts = [
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Implement a DFS traversal in Python with clear inline comments."},
    ],
]
encoded = [
    tokenizer.apply_chat_template(
        m, add_generation_prompt=True, tokenize=True, enable_thinking=False
    )
    for m in prompts
]
# ... use the same `generate()` function as in dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1.
```
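
To see the denoising loop itself without pulling in the upstream helper, here is a minimal, illustrative sketch of block-wise diffusion decoding with low-confidence remasking, building on the `model`, `tokenizer`, and `encoded` objects created above. It assumes the tokenizer exposes `tokenizer.mask_token_id` and that the model returns ordinary `.logits`; it is not the published `generate()`, which should be preferred for faithful results.

```python
@torch.no_grad()
def toy_block_generate(model, tokenizer, prompt_ids, max_new_tokens=64,
                       block_size=32, steps_per_block=32):
    mask_id = tokenizer.mask_token_id  # assumption: a mask token is defined
    x = torch.tensor([prompt_ids], device=model.device)
    for _ in range(max_new_tokens // block_size):
        # Append one fully masked block, then iteratively denoise it.
        block = torch.full((1, block_size), mask_id, device=x.device)
        x = torch.cat([x, block], dim=1)
        for step in range(steps_per_block):
            masked = x[:, -block_size:] == mask_id
            if not masked.any():
                break  # the whole block was committed early
            probs = model(x).logits[:, -block_size:].softmax(-1)
            conf, pred = probs.max(-1)              # [1, block_size]
            conf = conf.masked_fill(~masked, -1.0)  # ignore already-filled slots
            # Commit the most confident masked positions; low-confidence
            # slots stay masked for later denoising steps.
            k = max(1, int(masked.sum()) // (steps_per_block - step))
            idx = conf.topk(k, dim=-1).indices
            x[:, -block_size:] = x[:, -block_size:].scatter(1, idx, pred.gather(1, idx))
    return tokenizer.decode(x[0, len(prompt_ids):], skip_special_tokens=True)

# e.g. print(toy_block_generate(model, tokenizer, encoded[0], max_new_tokens=128))
```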

## Command-Line Interface

For an interactive demo (visualised iterative denoising), use the script in the [TIDE / dLLM repo](https://github.com/PKU-YuanGroup/TIDE):

```shell
python -u examples/a2d/bd3lm/chat.py \
    --model_name_or_path TIDE-dllm/distill-LLaDA2-CALM \
    --chat_template True --block_size 32 --remasking low_confidence \
    --steps 256 --max_new_tokens 256
```
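
Read loosely, `--steps 256` with `--max_new_tokens 256` budgets about one denoising step per generated token, and `--remasking low_confidence` keeps the highest-confidence predictions at each step while re-masking the rest, the same schedule sketched in the Quick Start above.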

## Reproducing This Checkpoint

```shell
git clone https://github.com/PKU-YuanGroup/TIDE && cd TIDE
pip install -e . && git submodule update --init --recursive
pip install -e "lm-evaluation-harness[ifeval,math]" && pip install -e "tokenkit[full]"

# Download the pre-tokenized SFT mixture for this teacher
huggingface-cli download TIDE-dllm/distill_llada2_sft --repo-type dataset \
    --local-dir data/distill_llada2_sft

bash scripts/distill_llada2.sh \
    --data_path data/distill_llada2_sft \
    --distill_mode alm \
    --num_gpus 8
```

## Citation

```bibtex
@misc{zhang2026turningtidecrossarchitecturedistillation,
  title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
  author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
  year={2026},
  eprint={2604.26951},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.26951},
}
```