Final DPO: TOML + YAML + XML + codeblock fix

  • TOML: 750 pairs (section vs inline)
  • YAML: 300 pairs (correct vs broken indent)
  • XML: 300 pairs (valid vs broken tags)
  • JSON: 225 pairs (clean vs codeblock-wrapped)
  • CSV: 150 pairs
  • Hyperparameters: learning rate 1e-05, beta 0.1, 1 epoch
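
As a rough illustration of the mix above, the sketch below totals the pair counts, builds one hypothetical JSON pair (clean output as chosen, the same output wrapped in a code block as rejected), and evaluates the standard DPO loss for a single pair at the listed beta. The function names and the `prompt`/`chosen`/`rejected` field layout are assumptions for illustration, not taken from this model's actual training code.

```python
import json
import math

# Pair counts per format, copied from the list above.
PAIR_COUNTS = {"TOML": 750, "YAML": 300, "XML": 300, "JSON": 225, "CSV": 150}
BETA = 0.1  # DPO beta from the list above


def mix_fractions(counts):
    """Return each format's share of the total number of pairs."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def make_json_pair(prompt, payload):
    """Hypothetical pair builder: bare JSON is chosen, fenced JSON is rejected."""
    clean = json.dumps(payload)
    fence = "`" * 3  # built at runtime to avoid nesting a literal fence here
    return {
        "prompt": prompt,
        "chosen": clean,
        "rejected": f"{fence}json\n{clean}\n{fence}",
    }


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=BETA):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = (policy_chosen_lp - policy_rejected_lp) - (ref_chosen_lp - ref_rejected_lp)
    # -log(sigmoid(x)) equals log(1 + exp(-x))
    return math.log1p(math.exp(-beta * margin))


if __name__ == "__main__":
    print("total pairs:", sum(PAIR_COUNTS.values()))
    for name, frac in mix_fractions(PAIR_COUNTS).items():
        print(f"{name}: {frac:.1%}")
    print(make_json_pair("Return the user as JSON.", {"name": "Ada"}))
```

With a small beta such as 0.1, the loss changes slowly with the log-probability margin, which keeps the policy close to the reference model over a single epoch.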
Safetensors · Model size: 4B params · Tensor type: BF16

Model tree for Rakushaking/Qwen4b-SFT-d9-merged-after-dpo-toml-xml-yaml-dpo-d2