-
Attention Is All You Need
Paper • 1706.03762 • Published • 121 -
Scaling Laws for Neural Language Models
Paper • 2001.08361 • Published • 10 -
Training Compute-Optimal Large Language Models
Paper • 2203.15556 • Published • 11 -
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
Paper • 2210.04186 • Published
Collections
Collections including paper arxiv:2502.13923
-
Attention Is All You Need
Paper • 1706.03762 • Published • 121 -
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
LLaMA: Open and Efficient Foundation Language Models
Paper • 2302.13971 • Published • 23 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251
-
Qwen2.5 VL 32B Instruct Demo
🏃 166 • Chat with a multimodal AI using text, images, or video
-
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 217 -
Qwen/Qwen2.5-VL-32B-Instruct
Image-Text-to-Text • 33B • Updated • 71.9k • 481 -
Qwen/Qwen2.5-VL-72B-Instruct
Image-Text-to-Text • 73B • Updated • 103k • 609
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447 -
Qwen2.5-VL Technical Report
Paper • 2502.13923 • Published • 217 -
Qwen3 Technical Report
Paper • 2505.09388 • Published • 339 -
Qwen-Image Technical Report
Paper • 2508.02324 • Published • 274
-
LNS-Madam: Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update
Paper • 2106.13914 • Published • 1 -
HeurAgenix: Leveraging LLMs for Solving Complex Combinatorial Optimization Challenges
Paper • 2506.15196 • Published • 3 -
Ascend HiFloat8 Format for Deep Learning
Paper • 2409.16626 • Published • 1 -
Recipes for Pre-training LLMs with MXFP8
Paper • 2506.08027 • Published • 1
-
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
Paper • 2503.10615 • Published • 17 -
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Paper • 2503.10630 • Published • 6 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Paper • 2503.07536 • Published • 88