Introspective Diffusion Language Models (I-DLM)
Model checkpoints for I-DLM. Paper: https://arxiv.org/abs/2604.11035
Introspective Diffusion Language Model (8B) is a diffusion language model converted from Qwen3-8B that matches autoregressive (AR) quality while enabling parallel token generation.
| Benchmark | I-DLM-8B | Qwen3-8B (AR) | LLaDA-2.1-mini (16B) | SDAR (8B) |
|---|---|---|---|---|
| ARC-C | 95.8 | 95.8 | 90.2 | 91.9 |
| MMLU | 82.4 | 83.5 | 74.5 | 78.6 |
| MMLU-Pro | 73.1 | 75.1 | 64.8 | 56.9 |
| GPQA-D | 55.6 | 58.9 | 46.0 | 40.2 |
| GPQA | 54.9 | 55.4 | 53.3 | --- |
| GSM8K | 95.0 | 96.0 | 89.0 | 91.7 |
| MATH-500 | 96.8 | 95.8 | 85.0 | 78.6 |
| MathBench | 89.1 | 93.1 | 84.2 | 76.9 |
| AIME-24 | 69.6 | 73.1 | 43.3 | 10.0 |
| AIME-25 | 60.8 | 65.4 | 43.3 | 10.0 |
| HumanEval | 93.3 | 95.1 | 86.0 | 78.7 |
| MBPP | 92.2 | 93.4 | 82.1 | 72.0 |
| LiveCodeBench-v6 | 45.7 | 50.3 | 30.4 | 16.6 |
| IFEval | 84.7 | 84.7 | 83.2 | 61.4 |
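As a rough sanity check on the "matches AR quality" claim, an unweighted macro-average over the 14 benchmarks above puts I-DLM-8B within about two points of its AR source model. The short script below is illustrative only (not part of the release); it simply averages the table's columns:

```python
# Unweighted macro-average over the 14 benchmarks in the table above,
# comparing I-DLM-8B against its AR source model Qwen3-8B.
idlm = [95.8, 82.4, 73.1, 55.6, 54.9, 95.0, 96.8, 89.1,
        69.6, 60.8, 93.3, 92.2, 45.7, 84.7]
qwen = [95.8, 83.5, 75.1, 58.9, 55.4, 96.0, 95.8, 93.1,
        73.1, 65.4, 95.1, 93.4, 50.3, 84.7]

mean = lambda xs: sum(xs) / len(xs)
gap = mean(qwen) - mean(idlm)  # average AR-vs-diffusion gap, in points
print(f"I-DLM-8B avg: {mean(idlm):.1f}, Qwen3-8B avg: {mean(qwen):.1f}, gap: {gap:.1f}")
```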
Note: This checkpoint is hosted on Hugging Face for weight distribution. For inference, please use our SGLang-based ISD pipeline, which implements the Introspective Strided Decoding algorithm described in the paper. Direct loading via `transformers` is not currently supported for reproducing paper results.
```bash
# Install
git clone https://github.com/Introspective-Diffusion/I-DLM.git
cd I-DLM/inference && bash install.sh

# Launch server
python -m sglang.launch_server \
  --model-path yifanyu/I-DLM-8B \
  --trust-remote-code --tp-size 1 --dtype bfloat16 \
  --mem-fraction-static 0.85 --max-running-requests 32 \
  --attention-backend flashinfer --dllm-algorithm IDLMBlockN \
  --dllm-algorithm-config inference/configs/idlm_blockN4_config.yaml \
  --port 30000

# Generate
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"default","messages":[{"role":"user","content":"Prove sqrt(2) is irrational."}],"max_tokens":4096}'
```
See the inference README for detailed setup, evaluation, and benchmarking.
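For programmatic access, the same request can be issued from Python against the launched server. The helper names below (`build_chat_request`, `extract_reply`, `chat`) are illustrative, not part of the I-DLM codebase; the payload and response shapes mirror the curl example and the OpenAI-compatible endpoint SGLang exposes:

```python
import json
import urllib.request

def build_chat_request(prompt: str, max_tokens: int = 4096) -> dict:
    """Build the JSON payload for the /v1/chat/completions endpoint
    (same request body as the curl example above)."""
    return {
        "model": "default",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style response body."""
    return response["choices"][0]["message"]["content"]

def chat(prompt: str, base_url: str = "http://localhost:30000") -> str:
    """POST a chat request to a running I-DLM SGLang server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))
```

With the server running, `print(chat("Prove sqrt(2) is irrational."))` reproduces the curl call.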
I-DLM recovers introspective consistency (AR models' inherent self-agreement); see the paper for the full method. Available checkpoints:
| Model | HuggingFace | Description |
|---|---|---|
| I-DLM-8B | yifanyu/I-DLM-8B | Converted from Qwen3-8B |
| I-DLM-32B | yifanyu/I-DLM-32B | Converted from Qwen3-32B |
| I-DLM-8B-LoRA | yifanyu/I-DLM-8B-lora-r128 | Gated LoRA adapter (rank=128) for lossless R-ISD |
```bibtex
@article{yu2026introspective,
  title={Introspective Diffusion Language Models},
  author={Yu, Yifan and Jian, Yuqing and Wang, Junxiong and Zhou, Zhongzhu
          and Zhuang, Donglin and Fang, Xinyu and Yanamandra, Sri
          and Wu, Xiaoxia and Wu, Qingyang and Song, Shuaiwen Leon
          and Dao, Tri and Athiwaratkun, Ben and Zou, James
          and Lai, Fan and Xu, Chenfeng},
  journal={arXiv preprint arXiv:2604.11035},
  year={2026}
}
```