Collections
Discover the best community collections!
Collections including paper arxiv:2501.15383
- Attention Is All You Need
  Paper • 1706.03762 • Published • 121
- Language Models are Few-Shot Learners
  Paper • 2005.14165 • Published • 20
- LLaMA: Open and Efficient Foundation Language Models
  Paper • 2302.13971 • Published • 23
- Llama 2: Open Foundation and Fine-Tuned Chat Models
  Paper • 2307.09288 • Published • 251

- REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
  Paper • 2501.03262 • Published • 104
- MiniMax-01: Scaling Foundation Models with Lightning Attention
  Paper • 2501.08313 • Published • 302
- Towards Best Practices for Open Datasets for LLM Training
  Paper • 2501.08365 • Published • 62
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72

- EuroBERT: Scaling Multilingual Encoders for European Languages
  Paper • 2503.05500 • Published • 81
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
  Paper • 2501.12948 • Published • 447
- Qwen2.5-1M Technical Report
  Paper • 2501.15383 • Published • 72
- Baichuan-Omni-1.5 Technical Report
  Paper • 2501.15368 • Published • 60

- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
  Paper • 2511.22699 • Published • 245
- A Survey on Diffusion Language Models
  Paper • 2508.10875 • Published • 34
- Scalable Diffusion Models with Transformers
  Paper • 2212.09748 • Published • 17
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  Paper • 2403.03206 • Published • 71

- Qwen Technical Report
  Paper • 2309.16609 • Published • 38
- Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
  Paper • 2311.07919 • Published • 9
- Qwen2 Technical Report
  Paper • 2407.10671 • Published • 171
- Qwen2-Audio Technical Report
  Paper • 2407.10759 • Published • 64

- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 110
- Are Vision-Language Models Truly Understanding Multi-vision Sensor?
  Paper • 2412.20750 • Published • 20
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
  Paper • 2412.21187 • Published • 40
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 107