-
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Paper • 2508.14460 • Published • 85 -
MEML-GRPO: Heterogeneous Multi-Expert Mutual Learning for RLVR Advancement
Paper • 2508.09670 • Published -
URPO: A Unified Reward & Policy Optimization Framework for Large Language Models
Paper • 2507.17515 • Published • 2
Collections including paper arxiv:2508.14460
-
Seed X
💻 30 • A powerful multilingual translation language model
-
Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters
Paper • 2507.13618 • Published • 16 -
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Paper • 2508.14460 • Published • 85 -
ByteDance-Seed/Seed-X-PPO-7B
Translation • Updated • 2.61k • 301
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 196 • 99 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 39 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs
Paper • 2501.18585 • Published • 61 -
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper • 2503.14456 • Published • 154 -
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Paper • 2503.15265 • Published • 46 -
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning
Paper • 2503.15558 • Published • 50
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper • 2508.03680 • Published • 140 -
CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search
Paper • 2508.02091 • Published • 13 -
DINOv3
Paper • 2508.10104 • Published • 305 -
SSRL: Self-Search Reinforcement Learning
Paper • 2508.10874 • Published • 97
-
Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 63 -
AgentsNet: Coordination and Collaborative Reasoning in Multi-Agent LLMs
Paper • 2507.08616 • Published • 15 -
ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge
Paper • 2507.21990 • Published • 27 -
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
Paper • 2508.14460 • Published • 85
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published • 1 -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment
Paper • 2410.13785 • Published • 19 -
Aligning Large Language Models via Self-Steering Optimization
Paper • 2410.17131 • Published • 24 -
Baichuan Alignment Technical Report
Paper • 2410.14940 • Published • 51 -
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation
Paper • 2410.14745 • Published • 47
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 7 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 24 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 15 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69