Qwen3-4B D4 SFT + Generic DPO + TOML-focused DPO

Pipeline

  1. SFT on structured data (V4+V5)
  2. Generic DPO on u-10bei/dpo-dataset-qwen-cot (checkpoint with the best eval score)
  3. TOML-focused DPO (this model)
    • chosen: `[section]`-style TOML
    • rejected: inline-table-style TOML
    • 800 TOML pairs + 400 other-format pairs

Config

  • LR: 1e-05
  • Beta: 0.1
  • Epochs: 1
  • LoRA: r=8, alpha=16 (merged)
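The hyperparameters above would map onto TRL's `DPOTrainer` roughly as follows. This is a hedged sketch, not the card's actual training script: only the numeric values come from the Config section, while the model ID, dataset variable, and output path are placeholders.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

# Values from the Config section above; everything else is illustrative.
training_args = DPOConfig(
    output_dir="qwen3-4b-toml-dpo",  # placeholder
    learning_rate=1e-5,
    beta=0.1,
    num_train_epochs=1,
)

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
)

# trainer = DPOTrainer(
#     model=sft_checkpoint,      # the stage-2 (generic DPO) checkpoint
#     args=training_args,
#     train_dataset=toml_pairs,  # 800 TOML + 400 other-format preference pairs
#     peft_config=peft_config,
# )
# trainer.train()
# Per the "(merged)" note above, the LoRA adapter is then folded back into
# the base weights (e.g. PeftModel.merge_and_unload) to produce this model.
```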
Safetensors · 4B params · BF16

Model tree for Rakushaking/Qwen4b-SFT-d9-merged-after-dpo-toml-dpo