Qwen2.5-3B DPO – Over-Normalization Mitigation

DPO-finetuned model based on Qwen/Qwen2.5-3B, trained to reduce over-normalized and false-equivalence responses.
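For readers unfamiliar with DPO (Direct Preference Optimization), the sketch below shows the standard per-example DPO loss on a (chosen, rejected) response pair. It is an illustration of the general objective only, not this model's training code: the function names, the example log-probabilities, and `beta=0.1` are assumptions, since the card does not state the training setup or hyperparameters.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; not stated on the card
) -> float:
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss falls as the chosen response gains margin over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Hypothetical log-probabilities, for illustration only.
untrained = dpo_loss(-10.0, -10.0, -10.0, -10.0)  # policy equals reference
improved = dpo_loss(-8.0, -12.0, -10.0, -10.0)    # policy prefers chosen
print(untrained, improved)
```

When the policy equals the reference model the margin is zero and the loss is log 2; it decreases as the policy assigns relatively more probability to the preferred response. For this model, the rejected responses would presumably be the over-normalized or false-equivalence ones.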

Format: Safetensors
Model size: 3B params
Tensor type: F16

Model tree for Phantomcloak19/qwen2.5-dpo-overnorm-full

Base model: Qwen/Qwen2.5-3B
Finetuned: this model
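
A minimal loading sketch with the Hugging Face `transformers` library, using the repository id from the model tree above. It assumes `transformers` and `torch` are installed and that the repository ships a tokenizer alongside the weights; the helper name `generate`, the prompt, and the generation settings are hypothetical.

```python
REPO_ID = "Phantomcloak19/qwen2.5-dpo-overnorm-full"


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Load the model in float16 and return a greedy completion for `prompt`."""
    # Imported lazily so the module can be imported without the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
    model = AutoModelForCausalLM.from_pretrained(
        REPO_ID,
        torch_dtype=torch.float16,  # matches the F16 tensor type listed on the card
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding for reproducible output
        )
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate("Some prompt")` downloads the weights on first use. The base model is Qwen/Qwen2.5-3B rather than an instruct variant, so this sketch passes plain text instead of applying a chat template; adjust if the repository defines one.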