---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: dllm-collection/Qwen3-0.6B-diffusion-bd3lm-v0.1
tags:
- diffusion
- dllm
- bd3lm
- distillation
- arxiv:2604.26951
---
<div style="text-align: center;"> <img src="logo.gif" width="300" /> </div>
# distill-LLaDA2-CALM
This model was introduced in the paper [Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models](https://huggingface.co/papers/2604.26951).
`distill-LLaDA2-CALM` is a 0.6B diffusion language model distilled from LLaDA2.0-mini (a 16B MoE teacher) into the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) student under the **Cross-Tokenizer (Pipeline A)** setting of the TIDE framework. It serves as the forward CALM (chunk-level approximate likelihood matching) baseline.
## Model Overview
- **Method**: [TIDE](https://arxiv.org/abs/2604.26951), cross-architecture distillation for diffusion LMs; this checkpoint is the forward CALM baseline
- **Framework**: [TIDE / dLLM](https://github.com/PKU-YuanGroup/TIDE)
- **Student (initialization)**: [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) (BD3LM, block_size=32)
- **Teacher**: [`inclusionAI/LLaDA2.0-mini`](https://huggingface.co/inclusionAI/LLaDA2.0-mini)
- **Distillation mode**: `--distill_mode alm`
- **Datasets**: [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), [opc-sft-stage1](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1) and [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2) — same composition as the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) base. Pre-tokenized for this teacher in [`TIDE-dllm/distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft).
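The exact forward CALM objective is defined in the paper; as an illustration only, the idea of matching teacher and student likelihoods at the chunk level (rather than per token) can be sketched as below. All function names are hypothetical and the squared-error match is an assumption, not the paper's actual loss.

```python
import math

def chunk_logprobs(token_logprobs, chunk_size):
    # Sum per-token log-probs inside each chunk -> one log-likelihood per chunk.
    return [sum(token_logprobs[i:i + chunk_size])
            for i in range(0, len(token_logprobs), chunk_size)]

def calm_loss_sketch(student_logprobs, teacher_logprobs, chunk_size):
    # Toy stand-in for a chunk-level likelihood-matching loss: penalize the
    # squared gap between student and teacher chunk log-likelihoods.
    s = chunk_logprobs(student_logprobs, chunk_size)
    t = chunk_logprobs(teacher_logprobs, chunk_size)
    return sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)

# If the student already matches the teacher exactly, the loss is zero.
uniform = [math.log(0.5)] * 4
print(calm_loss_sketch(uniform, uniform, chunk_size=2))  # -> 0.0
```

Matching at the chunk level sidesteps token-by-token alignment, which is what makes a cross-tokenizer teacher/student pair feasible in the first place.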
## Installation
```shell
pip install torch transformers accelerate
```
## Quick Start
> [!NOTE]
> This checkpoint is fully compatible with the BD3LM `generate(...)` routine published with [`dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) — only the model name changes.
```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer
repo = "TIDE-dllm/distill-LLaDA2-CALM"
device = "cuda" if torch.cuda.is_available() else "cpu"

# The checkpoint ships custom BD3LM modeling code, so trust_remote_code is required.
model = AutoModelForMaskedLM.from_pretrained(
    repo, dtype=torch.bfloat16, trust_remote_code=True,
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

prompts = [
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Implement a DFS traversal in Python with clear inline comments."},
    ],
]
# Render each conversation with the chat template and tokenize to input IDs.
encoded = [
    tokenizer.apply_chat_template(
        m, add_generation_prompt=True, tokenize=True, enable_thinking=False
    )
    for m in prompts
]
# ... use the same `generate()` function as in dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1.
```
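For intuition about what that `generate()` routine does, here is a toy sketch of low-confidence remasking (the `--remasking low_confidence` strategy used below): at each step the model proposes tokens for all masked positions, only the most confident proposals are committed, and the rest stay masked for later steps. The `toy_denoiser` table is a made-up stand-in for the model's logits, not the real BD3LM sampler.

```python
import math

MASK = "<mask>"

def toy_denoiser(tokens):
    # Stand-in for the model: propose (token, confidence) for each masked slot.
    # A real BD3LM derives these from the LM's per-position logits.
    table = {0: ("hello", 0.9), 1: (",", 0.6), 2: ("world", 0.95), 3: ("!", 0.5)}
    return {i: table[i] for i, t in enumerate(tokens) if t == MASK}

def low_confidence_remask_decode(length, steps):
    tokens = [MASK] * length
    per_step = max(1, math.ceil(length / steps))
    while MASK in tokens:
        proposals = toy_denoiser(tokens)
        # Commit only the highest-confidence proposals this step;
        # everything else remains masked ("remasked") for later refinement.
        keep = sorted(proposals, key=lambda i: proposals[i][1], reverse=True)[:per_step]
        for i in keep:
            tokens[i] = proposals[i][0]
    return tokens

print(low_confidence_remask_decode(length=4, steps=4))
# -> ['hello', ',', 'world', '!'] (committed in confidence order: world, hello, ",", "!")
```

More steps means fewer tokens committed per iteration, trading latency for the chance to condition each decision on more already-decoded context.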
## Command-Line Interface
For an interactive demo (visualised iterative denoising), use the script in the [TIDE / dLLM repo](https://github.com/PKU-YuanGroup/TIDE):
```shell
python -u examples/a2d/bd3lm/chat.py \
--model_name_or_path TIDE-dllm/distill-LLaDA2-CALM \
--chat_template True --block_size 32 --remasking low_confidence \
--steps 256 --max_new_tokens 256
```
## Reproducing this checkpoint
```shell
git clone https://github.com/PKU-YuanGroup/TIDE && cd TIDE
pip install -e . && git submodule update --init --recursive
pip install -e "lm-evaluation-harness[ifeval,math]" && pip install -e "tokenkit[full]"
# Download the pre-tokenized SFT mixture for this teacher
huggingface-cli download TIDE-dllm/distill_llada2_sft --repo-type dataset \
--local-dir data/distill_llada2_sft
bash scripts/distill_llada2.sh \
--data_path data/distill_llada2_sft \
--distill_mode alm \
--num_gpus 8
```
## Citation
```bibtex
@misc{zhang2026turningtidecrossarchitecturedistillation,
title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
year={2026},
eprint={2604.26951},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2604.26951},
}
```