Based on the paper [Self-Rewarding Language Models](https://arxiv.org/abs/2401.10020).
An SFT (supervised fine-tuned) version of Qwen1.5-1.8B.
Qwen1.5 is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include, among others, no longer requiring `trust_remote_code`. For more details, please refer to the blog post and GitHub repo.
We performed SFT on a subset of the Open Assistant dataset, following the self-rewarding (`self_reward`) procedure.
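For illustration, a minimal sketch of how an Open Assistant-style conversation can be rendered into the ChatML prompt layout used by Qwen1.5 chat models. The `<|im_start|>`/`<|im_end|>` layout below is an assumption about the template; the tokenizer's built-in chat template (`tokenizer.apply_chat_template`) is the authoritative source.

```python
# Sketch: ChatML-style formatting for SFT data (assumed layout; verify
# against the model tokenizer's chat template before training).

def format_chatml(messages, add_generation_prompt=True):
    """messages: list of {"role": "system"|"user"|"assistant", "content": str}."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model completes it.
        parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is SFT?"},
])
print(prompt)
```

In practice you would apply this formatting to each Open Assistant conversation and train on the assistant completions only.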