A DPO-finetuned model based on Qwen/Qwen2.5-3B, trained to reduce over-normalized and false-equivalence responses.
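
A minimal usage sketch with the `transformers` library, assuming the checkpoint is published on the Hugging Face Hub with its chat template included; the repo id below is a hypothetical placeholder, not the model's actual name.

```python
# Sketch: load the DPO-finetuned chat model and generate a response.
# "your-org/qwen2.5-3b-dpo" is a placeholder repo id (assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/qwen2.5-3b-dpo"  # hypothetical; replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # place weights on available devices
)

# Build a prompt using the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Is it ever okay to lie?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```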