DPO LoRA Adapter (ExpK) for Qwen3-4B (StructEval)

SFT (Exp G) followed by DPO, with LoRA r=8 and alpha=8

  • SFT adapter merged from Exp G
  • DPO LoRA: r=8, alpha=8
  • LR=5e-7, epochs=1, beta=0.1
  • max_length=2048
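
The settings above can be sketched in plain Python to show what the numbers imply. This is an illustrative sketch only, not the training script used for this adapter: the class and field names (`DpoLoraSettings`, `lora_scaling`, `dpo_loss`) are hypothetical, the LoRA scaling assumes the standard alpha / r convention (not rank-stabilized LoRA), and the loss assumes the standard sigmoid DPO objective, since the card does not state which variant was used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DpoLoraSettings:
    """Hyperparameters listed on this card; the field names are illustrative."""
    lora_r: int = 8
    lora_alpha: int = 8
    learning_rate: float = 5e-7
    num_epochs: int = 1
    beta: float = 0.1
    max_length: int = 2048

    @property
    def lora_scaling(self) -> float:
        # Assumption: standard LoRA, where the low-rank update is scaled by alpha / r.
        return self.lora_alpha / self.lora_r


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float) -> float:
    """Sigmoid DPO loss for one preference pair of sequence log-probabilities.

    Assumption: the standard DPO objective, -log(sigmoid(beta * margin)).
    """
    # How much more the policy prefers chosen over rejected than the reference does.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # Numerically stable form of -log(sigmoid(x)).
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


settings = DpoLoraSettings()
print("LoRA scaling (alpha / r):", settings.lora_scaling)
print("Loss when policy equals reference:", dpo_loss(0.0, 0.0, 0.0, 0.0, settings.beta))
```

With r=8 and alpha=8 the scaling factor is 1, so under this assumption the low-rank update is added to the base weights unscaled. With beta=0.1, the log-probability margin is multiplied by a small factor before the sigmoid, so a given margin moves the loss less than it would with a larger beta.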

Repository: Umezaki/dpo-qwen-expG-adapter (adapter)