# exp028b-downproj-dpo-lr1e6-ep2-merged
SFT + DPO model with the LoRA adapter merged into the base weights. Full 16-bit weights; no adapter loading required.
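Since the adapter is already folded into the checkpoint, there is nothing to attach at load time. As background, a LoRA merge adds the scaled low-rank product back into each base weight, W' = W + (alpha/r) * B @ A. A minimal sketch with made-up tiny matrices (the card's actual config uses r=64, alpha=128; the dimensions and values below are purely illustrative):

```python
# Hypothetical tiny LoRA merge: fold the low-rank update into the base weight.
r, alpha = 2, 4          # illustrative rank and alpha; scaling = alpha / r
scaling = alpha / r

# Base weight W (2x3), LoRA factors B (2 x r) and A (r x 3), all made-up numbers.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0, 0.0],
     [0.0, 1.0]]
A = [[0.5, 0.0, 0.0],
     [0.0, 0.5, 0.0]]

# Merged weight: W' = W + (alpha/r) * B @ A
merged = [
    [W[i][j] + scaling * sum(B[i][k] * A[k][j] for k in range(r))
     for j in range(len(W[0]))]
    for i in range(len(W))
]
print(merged)  # [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
```

After this merge the adapter matrices can be discarded, which is why the released weights load like an ordinary full-precision checkpoint.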
## Training Pipeline
- SFT: tomofusa/exp021b-blend-h-lora
- DPO: u-10bei/dpo-dataset-qwen-cot (2 epochs, lr=1e-06, beta=0.1)
## DPO Configuration
- Learning rate: 1e-06
- Beta: 0.1
- Loss type: ipo
- LoRA: r=64, alpha=128
- Max length: 1024
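With `loss_type: ipo`, training regresses the preference log-ratio margin toward a fixed target of 1/(2*beta) instead of applying DPO's sigmoid objective. A minimal sketch of the per-pair IPO loss (the function name and the toy log-probabilities are illustrative, not from this repo):

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair IPO loss: (margin - 1/(2*beta))**2, computed in log-prob space."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = chosen_logratio - rejected_logratio
    # IPO pulls the margin toward the fixed target 1/(2*beta); 5.0 at beta=0.1
    return (margin - 1.0 / (2.0 * beta)) ** 2

# A margin of exactly 5.0 gives zero loss at beta=0.1
print(ipo_loss(4.0, -1.0, 0.0, 0.0))  # 0.0
```

The squared-error form bounds how far the policy is pushed from the reference, which is why IPO is often paired with a small beta like the 0.1 used here.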
## Model tree for tomofusa/exp028b-downproj-dpo-lr1e6-ep2-merged
- Base model: Qwen/Qwen3-4B-Instruct-2507