Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2305.18290

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018 • 2

High-Resolution Image Synthesis with Latent Diffusion Models

Paper • 2112.10752 • Published Dec 20, 2021 • 17
Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 58
Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 118

Collection of useful papers.

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 60
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Paper • 2101.03961 • Published Jan 11, 2021 • 13
Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 145
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447

article collection

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137
AgentBench: Evaluating LLMs as Agents

Paper • 2308.03688 • Published Aug 7, 2023 • 26
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 22
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

Aligning Instruction Tuning with Pre-training

Paper • 2501.09368 • Published Jan 16, 2025
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper • 2403.14608 • Published Mar 21, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

a collection of algorithmic agents for user interfaces/interactions, program synthesis, and robotics

End-to-End Goal-Driven Web Navigation

Paper • 1602.02261 • Published Feb 6, 2016
Learning Language Games through Interaction

Paper • 1606.02447 • Published Jun 8, 2016
Naturalizing a Programming Language via Interactive Learning

Paper • 1704.06956 • Published Apr 23, 2017
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration

Paper • 1802.08802 • Published Feb 24, 2018 • 2

Papers reimplemented

List of research papers, architectures, and techniques reimplemented in LLM-quest or Hugging Face's TRL. Missing: Qwen3.5, Qwen3-Next, GPT-2

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220
Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 29
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published Feb 4 • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3

High-Resolution Image Synthesis with Latent Diffusion Models

Paper • 2112.10752 • Published Dec 20, 2021 • 17
Adding Conditional Control to Text-to-Image Diffusion Models

Paper • 2302.05543 • Published Feb 10, 2023 • 58
Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 145
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 447

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18, 2025 • 146
Training language models to follow instructions with human feedback

Paper • 2203.02155 • Published Mar 4, 2022 • 24
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 251
The Llama 3 Herd of Models

Paper • 2407.21783 • Published Jul 31, 2024 • 118

article collection

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137
AgentBench: Evaluating LLMs as Agents

Paper • 2308.03688 • Published Aug 7, 2023 • 26
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

Paper • 1910.01108 • Published Oct 2, 2019 • 22
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

Collection of useful papers.

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 120
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 60
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Paper • 2101.03961 • Published Jan 11, 2021 • 13
Proximal Policy Optimization Algorithms

Paper • 1707.06347 • Published Jul 20, 2017 • 11

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022 • 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 26
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021 • 2
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 2

Aligning Instruction Tuning with Pre-training

Paper • 2501.09368 • Published Jan 16, 2025
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey

Paper • 2403.14608 • Published Mar 21, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Paper • 2305.18290 • Published May 29, 2023 • 64

Previous
1
2
3
...
6
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs