- Fast-dLLM v2: Efficient Block-Diffusion LLM
  Paper • 2509.26328 • Published • 58
- Attention Is All You Need for KV Cache in Diffusion LLMs
  Paper • 2510.14973 • Published • 42
- Attention Sinks in Diffusion Language Models
  Paper • 2510.15731 • Published • 50
- Diffusion Language Models are Super Data Learners
  Paper • 2511.03276 • Published • 132
Collections including paper arxiv:2509.26328
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs
  Paper • 2510.07499 • Published • 49
- Improving Context Fidelity via Native Retrieval-Augmented Reasoning
  Paper • 2509.13683 • Published • 8
- Multimodal Iterative RAG for Knowledge-Intensive Visual Question Answering
  Paper • 2509.00798 • Published • 1

- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 127
- Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
  Paper • 2503.09573 • Published • 77
- MMaDA: Multimodal Large Diffusion Language Models
  Paper • 2505.15809 • Published • 98
- Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective
  Paper • 2505.15045 • Published • 56

- Structured Denoising Diffusion Models in Discrete State-Spaces
  Paper • 2107.03006 • Published • 1
- Simplified and Generalized Masked Diffusion for Discrete Data
  Paper • 2406.04329 • Published • 8
- Simple and Effective Masked Diffusion Language Models
  Paper • 2406.07524 • Published • 12
- Large Language Diffusion Models
  Paper • 2502.09992 • Published • 127

- HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
  Paper • 2510.05560 • Published • 8
- TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
  Paper • 2510.06217 • Published • 67
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 513
- Fast-dLLM v2: Efficient Block-Diffusion LLM
  Paper • 2509.26328 • Published • 58

- Efficient-Large-Model/Fast_dLLM_v2_1.5B
  2B • Updated • 16.7k • 11
- Efficient-Large-Model/Fast_dLLM_v2_7B
  333k • Updated • 8.74k • 28
- Fast-dLLM v2: Efficient Block-Diffusion LLM
  Paper • 2509.26328 • Published • 58
- Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding
  Paper • 2505.22618 • Published • 45