A DPO-finetuned model based on Qwen/Qwen2.5-3B, trained to reduce over-normalized and false-equivalence responses.
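
A minimal usage sketch with the `transformers` library, assuming the checkpoint is published on the Hugging Face Hub with its chat template included; the repo id below is a hypothetical placeholder, not the model's actual name.

```python
# Sketch: load the DPO-finetuned chat model and generate a response.
# "your-org/qwen2.5-3b-dpo" is a placeholder repo id (assumption).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/qwen2.5-3b-dpo"  # hypothetical; replace with the real repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the checkpoint's native dtype
    device_map="auto",    # place weights on available devices
)

# Build a prompt using the tokenizer's built-in chat template.
messages = [{"role": "user", "content": "Is it ever okay to lie?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```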