- Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed. Paper 2512.14067, published Dec 16, 2025.
- SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning. Paper 2602.13515, published Feb 13.