Qwen3-4B D4 SFT + Generic DPO + TOML-focused DPO
Pipeline
- SFT on structured data (V4+V5)
- Generic DPO (u-10bei/dpo-dataset-qwen-cot) → best eval score
- TOML-focused DPO (this model)
  - chosen: [section] style TOML
  - rejected: inline table style TOML
  - 800 TOML pairs + 400 other format pairs
Config
- LR: 1e-05
- Beta: 0.1
- Epochs: 1
- LoRA: r=8, alpha=16 (merged)
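
As a rough illustration of what the Config values control, the sketch below computes the standard LoRA scaling factor (alpha / r) and the per-pair DPO loss with the listed beta. It is plain Python for clarity and assumes summed sequence log-probabilities as inputs; the function names are hypothetical, and this is not the training code used for this model.

```python
import math

# Values from the Config list.
LEARNING_RATE = 1e-5
BETA = 0.1
EPOCHS = 1
LORA_R = 8
LORA_ALPHA = 16


def lora_scale(alpha: float, r: int) -> float:
    """Standard LoRA scaling applied to the low-rank update (alpha / r)."""
    return alpha / r


def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = BETA) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    Inputs are summed log-probabilities of each answer under the
    policy (pol_*) and the frozen reference model (ref_*).
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    x = -beta * margin
    # Numerically stable softplus(x) = log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


scale = lora_scale(LORA_ALPHA, LORA_R)       # 2.0
loss_at_start = dpo_loss(0.0, 0.0, 0.0, 0.0)  # log(2): policy equals reference
```

With r=8 and alpha=16 the adapter update is scaled by 2. A small beta such as 0.1 makes the loss change slowly with the margin, which keeps the policy closer to the reference model during the single epoch.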
Model tree for Rakushaking/Qwen4b-SFT-d9-merged-after-dpo-toml-dpo
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Adapter: Rakushaking/Qwen4b-SFT-d9