Efficient Training of Robust Traditional Chinese LLaMA-1B on a Single Consumer GPU: Continual Pre-training, SFT, and DPO
Paper • 2510.01616 • Published
Developer: 遲佑成 · Contributors: 段明濤, 侯詠皓
Techniques: continual pre-training / instruction tuning / QA tuning / DPO / RLHF / vector applied
Contact: s111003816@m111.nthu.edu.tw