-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
Collections including paper arxiv:2503.19786
-
Gemma 3 Technical Report
Paper • 2503.19786 • Published • 55 -
Kimi-VL Technical Report
Paper • 2504.07491 • Published • 139 -
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Paper • 2504.10479 • Published • 308 -
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Paper • 2504.09925 • Published • 39
-
Language Models are Few-Shot Learners
Paper • 2005.14165 • Published • 20 -
Evaluating Large Language Models Trained on Code
Paper • 2107.03374 • Published • 10 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
GPT-4 Technical Report
Paper • 2303.08774 • Published • 7
-
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Paper • 2508.06471 • Published • 211 -
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model
Paper • 2508.14444 • Published • 47 -
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Paper • 2507.06261 • Published • 67 -
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention
Paper • 2506.13585 • Published • 274
-
Reinforcement Learning: An Overview
Paper • 2412.05265 • Published • 8 -
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Paper • 2411.01156 • Published • 13 -
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
Paper • 2503.21755 • Published • 33 -
Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published • 172