Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2505.22618

WTF GENIUS PAPERS

Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are abt improving tiny models.

Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published Aug 27, 2025 • 27
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Paper • 2601.06431 • Published Jan 10 • 12
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published Jan 14 • 63

Efficient Diffusion LLM

Efficient-Large-Model/Fast_dLLM_v2_1.5B

2B • Updated Jan 22 • 16.9k • 11
Efficient-Large-Model/Fast_dLLM_v2_7B

333k • Updated Jan 22 • 8.83k • 28
Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30, 2025 • 58
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 90
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21, 2025 • 68
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 170
MoBA: Mixture of Block Attention for Long-Context LLMs

Paper • 2502.13189 • Published Feb 18, 2025 • 17

Diffusion Language Models

Structured Denoising Diffusion Models in Discrete State-Spaces

Paper • 2107.03006 • Published Jul 7, 2021 • 1
Simplified and Generalized Masked Diffusion for Discrete Data

Paper • 2406.04329 • Published Jun 6, 2024 • 8
Simple and Effective Masked Diffusion Language Models

Paper • 2406.07524 • Published Jun 11, 2024 • 12
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 127

(M)LLMs based on Discrete Diffusion Model and relevant techniques

diffusionfamily/diffullama

Text Generation • 7B • Updated Oct 25, 2024 • 173 • 13
GSAI-ML/LLaDA-8B-Base

Text Generation • Updated Oct 21, 2025 • 180k • 96
Dream-org/Dream-v0-Instruct-7B

Text Generation • 8B • Updated Jul 15, 2025 • 128k • 154
GSAI-ML/LLaDA-V

Image-Text-to-Text • 8B • Updated 28 days ago • 5.14k • 27

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45
DINGO: Constrained Inference for Diffusion LLMs

Paper • 2505.23061 • Published May 29, 2025 • 31
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16, 2025 • 42
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Paper • 2506.14429 • Published Jun 17, 2025 • 44

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13, 2025 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13, 2025 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10, 2025 • 88

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

WTF GENIUS PAPERS

Papers that made me appreciate my major and my life a little more. obs=Observation, innov=Innovation. Most papers are abt improving tiny models.

Diffusion Language Models Know the Answer Before Decoding

Paper • 2508.19982 • Published Aug 27, 2025 • 27
ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Paper • 2512.13586 • Published Dec 15, 2025 • 93
LSRIF: Logic-Structured Reinforcement Learning for Instruction Following

Paper • 2601.06431 • Published Jan 10 • 12
Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning

Paper • 2601.09088 • Published Jan 14 • 63

Diffusion Language Models

Structured Denoising Diffusion Models in Discrete State-Spaces

Paper • 2107.03006 • Published Jul 7, 2021 • 1
Simplified and Generalized Masked Diffusion for Discrete Data

Paper • 2406.04329 • Published Jun 6, 2024 • 8
Simple and Effective Masked Diffusion Language Models

Paper • 2406.07524 • Published Jun 11, 2024 • 12
Large Language Diffusion Models

Paper • 2502.09992 • Published Feb 14, 2025 • 127

Efficient Diffusion LLM

Efficient-Large-Model/Fast_dLLM_v2_1.5B

2B • Updated Jan 22 • 16.9k • 11
Efficient-Large-Model/Fast_dLLM_v2_7B

333k • Updated Jan 22 • 8.83k • 28
Fast-dLLM v2: Efficient Block-Diffusion LLM

Paper • 2509.26328 • Published Sep 30, 2025 • 58
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45

(M)LLMs based on Discrete Diffusion Model and relevant techniques

diffusionfamily/diffullama

Text Generation • 7B • Updated Oct 25, 2024 • 173 • 13
GSAI-ML/LLaDA-8B-Base

Text Generation • Updated Oct 21, 2025 • 180k • 96
Dream-org/Dream-v0-Instruct-7B

Text Generation • 8B • Updated Jul 15, 2025 • 128k • 154
GSAI-ML/LLaDA-V

Image-Text-to-Text • 8B • Updated 28 days ago • 5.14k • 27

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45

Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding

Paper • 2505.22618 • Published May 28, 2025 • 45
DINGO: Constrained Inference for Diffusion LLMs

Paper • 2505.23061 • Published May 29, 2025 • 31
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Paper • 2506.13759 • Published Jun 16, 2025 • 42
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

Paper • 2506.14429 • Published Jun 17, 2025 • 44

CoRAG: Collaborative Retrieval-Augmented Generation

Paper • 2504.01883 • Published Apr 2, 2025 • 9
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning

Paper • 2504.08837 • Published Apr 10, 2025 • 44
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model

Paper • 2504.10068 • Published Apr 14, 2025 • 30
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations

Paper • 2504.10481 • Published Apr 14, 2025 • 85

R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

Paper • 2503.10615 • Published Mar 13, 2025 • 17
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation

Paper • 2503.10630 • Published Mar 13, 2025 • 6
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12, 2025 • 39
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL

Paper • 2503.07536 • Published Mar 10, 2025 • 88

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11, 2025 • 90
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21, 2025 • 68
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16, 2025 • 170
MoBA: Mixture of Block Attention for Long-Context LLMs

Paper • 2502.13189 • Published Feb 18, 2025 • 17

MLLM-as-a-Judge for Image Safety without Human Labeling

Paper • 2501.00192 • Published Dec 31, 2024 • 32
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1, 2025 • 110
Xmodel-2 Technical Report

Paper • 2412.19638 • Published Dec 27, 2024 • 27
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs

Paper • 2412.18925 • Published Dec 25, 2024 • 107

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs