- Monitored Markov Decision Processes
  Paper • 2402.06819 • Published
- Generalization in Monitored Markov Decision Processes (Mon-MDPs)
  Paper • 2505.08988 • Published
- Bayesian Risk Markov Decision Processes
  Paper • 2106.02558 • Published
- Sotopia-RL: Reward Design for Social Intelligence
  Paper • 2508.03905 • Published • 23
Collections
Collections including paper arxiv:1706.03741
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
  Paper • 2305.18290 • Published • 64
- ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
  Paper • 2402.09320 • Published • 6
- sDPO: Don't Use Your Data All at Once
  Paper • 2403.19270 • Published • 41
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2
- Deep reinforcement learning from human preferences
  Paper • 1706.03741 • Published • 4
- Training language models to follow instructions with human feedback
  Paper • 2203.02155 • Published • 24
- Direct Preference-based Policy Optimization without Reward Modeling
  Paper • 2301.12842 • Published
- Woodpecker: Hallucination Correction for Multimodal Large Language Models
  Paper • 2310.16045 • Published • 17
- Dueling RL: Reinforcement Learning with Trajectory Preferences
  Paper • 2111.04850 • Published • 2
- Learning Trajectory Preferences for Manipulators via Iterative Improvement
  Paper • 1306.6294 • Published • 3
- Deep reinforcement learning from human preferences
  Paper • 1706.03741 • Published • 4
- Learning Dynamic Robot-to-Human Object Handover from Human Feedback
  Paper • 1603.06390 • Published • 2
- Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
  Paper • 2310.20587 • Published • 18
- SELF: Language-Driven Self-Evolution for Large Language Model
  Paper • 2310.00533 • Published • 2
- QLoRA: Efficient Finetuning of Quantized LLMs
  Paper • 2305.14314 • Published • 61
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
  Paper • 2309.14717 • Published • 46