---
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
base_model: dllm-collection/Qwen3-0.6B-diffusion-bd3lm-v0.1
tags:
- diffusion
- dllm
- bd3lm
- distillation
- arxiv:2604.26951
---

<div style="text-align: center;">
  <img src="logo.gif" width="300" />
</div>

# distill-LLaDA2-TIDE_Shared

This model was introduced in the paper [Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models](https://huggingface.co/papers/2604.26951).

`distill-LLaDA2-TIDE_Shared` is a 0.6B diffusion language model distilled from the LLaDA2.0-mini teacher (16B MoE) into the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) student under the **Cross-Tokenizer (Pipeline A)** setting of the TIDE framework, with TIDAL + CompDemo applied within the cross-tokenizer pipeline (a non-native ablation).

## Model Overview

- **Method**: TIDE — [Reverse CALM / TIDAL / CompDemo](https://arxiv.org/abs/2604.26951) (cross-architecture distillation for diffusion LMs)
- **Framework**: [TIDE / dLLM](https://github.com/PKU-YuanGroup/TIDE)
- **Student (initialization)**: [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) (BD3LM, block_size=32)
- **Teacher**: [`inclusionAI/LLaDA2.0-mini`](https://huggingface.co/inclusionAI/LLaDA2.0-mini)
- **Distillation mode**: `--distill_mode alm_taid --use_comp_demo True`
- **Datasets**: [tulu-3-sft-mixture](https://huggingface.co/datasets/allenai/tulu-3-sft-mixture), [smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), [opc-sft-stage1](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage1) and [opc-sft-stage2](https://huggingface.co/datasets/OpenCoder-LLM/opc-sft-stage2) — same composition as the [`Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) base. Pre-tokenized for this teacher in [`TIDE-dllm/distill_llada2_sft`](https://huggingface.co/datasets/TIDE-dllm/distill_llada2_sft).

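The `--distill_mode alm_taid` flag above suggests a TAID-style objective, in which the distillation target is a teacher-student interpolation whose mixing weight is annealed over training. Purely as an illustration (a sketch assuming the standard TAID formulation; TIDE's actual `alm_taid` loss may differ, and `taid_target` is our name, not an API in the repo):

```python
import math

def softmax(logits):
    # numerically stable softmax over a list of logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def taid_target(student_logits, teacher_logits, alpha):
    """TAID-style target: interpolate student and teacher logits, with
    `alpha` annealed from 0 (pure student) to 1 (pure teacher) over training."""
    mixed = [(1 - alpha) * s + alpha * t
             for s, t in zip(student_logits, teacher_logits)]
    return softmax(mixed)

def kl(p, q):
    # KL(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy single-token logits; the real loss averages over masked positions.
student = [2.0, 0.5, -1.0]
teacher = [0.1, 2.5, 0.3]
loss = kl(taid_target(student, teacher, alpha=0.5), softmax(student))
```

At `alpha = 0` the target collapses to the student's own distribution and the loss vanishes; as `alpha` grows, the target moves toward the teacher, so the supervision signal strengthens gradually rather than forcing the small student to match the 16B teacher from step one.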
## Installation

```shell
pip install torch transformers accelerate
```

## Quick Start

> [!NOTE]
> This checkpoint is fully compatible with the BD3LM `generate(...)` routine published with [`dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1`](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) — only the model name changes.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

repo = "TIDE-dllm/distill-LLaDA2-TIDE_Shared"
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForMaskedLM.from_pretrained(
    repo, dtype=torch.bfloat16, trust_remote_code=True,
).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

prompts = [
    [
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Implement a DFS traversal in Python with clear inline comments."},
    ],
]
encoded = [tokenizer.apply_chat_template(m, add_generation_prompt=True, tokenize=True, enable_thinking=False) for m in prompts]
# ... use the same `generate()` function as in dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1.
```
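The toy loop below illustrates the general shape of low-confidence remasking in BD3LM-style decoding (the `--remasking low_confidence` strategy used by the CLI below): each step the model proposes tokens for every masked slot, only the most confident proposals are committed, and the rest are re-masked for the next step. This is a hypothetical simplification for intuition only, not the published `generate()` routine; `toy_denoise_block` and `predict` are names we made up.

```python
MASK = "<mask>"

def toy_denoise_block(block_len, steps, predict):
    """Iteratively unmask one block: commit the most confident proposals
    each step, re-mask the rest (toy low-confidence remasking)."""
    tokens = [MASK] * block_len
    per_step = max(1, block_len // steps)  # tokens committed per step
    for _ in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        # `predict` returns a (token, confidence) proposal per masked slot
        proposals = {i: predict(i, tokens) for i in masked}
        # keep only the highest-confidence proposals; the rest stay masked
        keep = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:per_step]
        for i in keep:
            tokens[i] = proposals[i][0]
    return tokens

# deterministic mock predictor standing in for the denoiser network
mock = lambda i, toks: (f"t{i}", float(i))
print(toy_denoise_block(8, 4, mock))
```

In the real model the block is a window of `block_size=32` token positions, the predictor is a forward pass over the prompt plus previously decoded blocks, and `steps` trades decoding speed against quality.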

## Command-Line Interface

For an interactive demo (visualised iterative denoising), use the script in the [TIDE / dLLM repo](https://github.com/PKU-YuanGroup/TIDE):

```shell
python -u examples/a2d/bd3lm/chat.py \
    --model_name_or_path TIDE-dllm/distill-LLaDA2-TIDE_Shared \
    --chat_template True --block_size 32 --remasking low_confidence \
    --steps 256 --max_new_tokens 256
```

## Reproducing this checkpoint

```shell
git clone https://github.com/PKU-YuanGroup/TIDE && cd TIDE
pip install -e . && git submodule update --init --recursive
pip install -e "lm-evaluation-harness[ifeval,math]" && pip install -e "tokenkit[full]"

# Download the pre-tokenized SFT mixture for this teacher
huggingface-cli download TIDE-dllm/distill_llada2_sft --repo-type dataset \
    --local-dir data/distill_llada2_sft

bash scripts/distill_llada2.sh \
    --data_path data/distill_llada2_sft \
    --distill_mode alm_taid --use_comp_demo True \
    --num_gpus 8
```

## Citation

```bibtex
@misc{zhang2026turningtidecrossarchitecturedistillation,
      title={Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models},
      author={Gongbo Zhang and Wen Wang and Ye Tian and Li Yuan},
      year={2026},
      eprint={2604.26951},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2604.26951},
}
```