rlhf/finetune
A Critical Evaluation of AI Feedback for Aligning Large Language Models
Paper • 2402.12366 • Published • 3
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Paper • 2401.08417 • Published • 37
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Paper • 2404.14723 • Published • 10
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback
Paper • 2406.00888 • Published • 33
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level
Paper • 2406.11817 • Published • 13
Following Length Constraints in Instructions
Paper • 2406.17744 • Published • 1
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 18
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 35
Contrastive Preference Learning: Learning from Human Feedback without RL
Paper • 2310.13639 • Published • 25
Paper • 2408.02666 • Published • 29
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 69
The Differences Between Direct Alignment Algorithms are a Blur
Paper • 2502.01237 • Published • 113
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Paper • 2501.17703 • Published • 59
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 125
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training
Paper • 2501.11425 • Published • 109
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 447
START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 45
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Paper • 2505.03335 • Published • 191