-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 2
Collections
Discover the best community collections!
Collections including paper arxiv:2305.18290
-
High-Resolution Image Synthesis with Latent Diffusion Models
Paper • 2112.10752 • Published • 17 -
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 58 -
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251 -
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 118
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 60 -
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper • 2101.03961 • Published • 13 -
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 2 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper • 2509.22576 • Published • 137 -
AgentBench: Evaluating LLMs as Agents
Paper • 2308.03688 • Published • 26 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published • 2
-
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training
Paper • 2602.10693 • Published • 220 -
Reinforced Attention Learning
Paper • 2602.04884 • Published • 29 -
Learning to Reason in 13 Parameters
Paper • 2602.04118 • Published • 6 -
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters
Paper • 2405.17604 • Published • 3
-
High-Resolution Image Synthesis with Latent Diffusion Models
Paper • 2112.10752 • Published • 17 -
Adding Conditional Control to Text-to-Image Diffusion Models
Paper • 2302.05543 • Published • 58 -
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
-
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64 -
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
-
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper • 2503.14476 • Published • 146 -
Training language models to follow instructions with human feedback
Paper • 2203.02155 • Published • 24 -
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper • 2307.09288 • Published • 251 -
The Llama 3 Herd of Models
Paper • 2407.21783 • Published • 118
-
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Paper • 2509.22576 • Published • 137 -
AgentBench: Evaluating LLMs as Agents
Paper • 2308.03688 • Published • 26 -
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Paper • 1910.01108 • Published • 22 -
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
-
Attention Is All You Need
Paper • 1706.03762 • Published • 120 -
LoRA: Low-Rank Adaptation of Large Language Models
Paper • 2106.09685 • Published • 60 -
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
Paper • 2101.03961 • Published • 13 -
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11
-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published • 1 -
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 26 -
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published • 2 -
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 2