RefAlign: RL with Similarity-based Rewards
Datasets and models in: Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data.
GitHub repository: https://github.com/mzhaoshuai/RefAlign
RefAlign training uses the data at https://huggingface.co/datasets/shuchangtao/CONQORD_dataset/tree/main/conqord_step3_data.
Base model: mistralai/Mistral-7B-v0.1