Collections
Discover the best community collections!
Collections including paper arxiv:2507.10532
-
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper • 2505.14146 • Published • 20 -
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper • 2505.19443 • Published • 15 -
ARM: Adaptive Reasoning Model
Paper • 2505.20258 • Published • 45 -
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Paper • 2505.19914 • Published • 46
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 33 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 20
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 24 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 57 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25
-
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Paper • 2505.14231 • Published • 53 -
Skywork-R1V3 Technical Report
Paper • 2507.06167 • Published • 74 -
Scaling Laws for Optimal Data Mixtures
Paper • 2507.09404 • Published • 38 -
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper • 2507.10532 • Published • 90
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 51 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 40 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 82 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
-
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper • 2505.14146 • Published • 20 -
Vibe Coding vs. Agentic Coding: Fundamentals and Practical Implications of Agentic AI
Paper • 2505.19443 • Published • 15 -
ARM: Adaptive Reasoning Model
Paper • 2505.20258 • Published • 45 -
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Paper • 2505.19914 • Published • 46
-
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning
Paper • 2505.14231 • Published • 53 -
Skywork-R1V3 Technical Report
Paper • 2507.06167 • Published • 74 -
Scaling Laws for Optimal Data Mixtures
Paper • 2507.09404 • Published • 38 -
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination
Paper • 2507.10532 • Published • 90
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146 -
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 26 -
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning
Paper • 2504.08600 • Published • 33 -
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce
Paper • 2504.11343 • Published • 20
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 51 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 24 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Chain-of-Verification Reduces Hallucination in Large Language Models
Paper • 2309.11495 • Published • 40 -
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 82 -
CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
Paper • 2309.09400 • Published • 87 -
Language Modeling Is Compression
Paper • 2309.10668 • Published • 85
-
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 40 -
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 57 -
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 37 -
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 25