Qwen1.5 - 0.5B reward model trained on beyond/rlhf-reward-single-round-trans_chinese
Chat template
Files info