dpo-qwen-y-v35

A DPO fine-tuned version of Qwen/Qwen3-4B-Instruct-2507, released as fully merged 16-bit weights; no adapter loading is required.
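Because the checkpoint is fully merged, it can be loaded with plain `transformers` and no PEFT step. A minimal sketch (the `transformers` import is deferred into the function so the snippet itself stays dependency-free until called):

```python
def load_model(model_id: str = "yamaTK/dpo-qwen-y-v35"):
    """Load the merged 16-bit checkpoint directly; no adapter merge needed."""
    # Import here so the sketch can be defined without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # torch_dtype="auto" picks up the 16-bit dtype stored in the checkpoint.
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return model, tokenizer
```

Calling `load_model()` downloads the weights from the Hub on first use.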

Training Configuration

  • Method: DPO
  • Epochs: 1
  • Learning rate: 1e-07
  • Beta: 0.1
  • Max sequence length: 1024
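To make the role of the beta hyperparameter above concrete, the per-pair DPO objective can be sketched in plain Python. The log-probability values in the example are made up for illustration:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin)),
    where the margin is the gap between the policy-vs-reference
    log-ratios of the chosen and rejected responses."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy matches the reference, the margin is 0 and the loss is log 2.
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0, beta=0.1)

# If the policy has shifted toward the chosen response, the loss drops below log 2.
improved = dpo_loss(-9.0, -13.0, -10.0, -12.0, beta=0.1)
```

A small beta such as 0.1 scales the margin down, keeping the policy close to the reference model during training.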

Model Details

  • Format: Safetensors
  • Model size: 3B params
  • Tensor types: F32 · BF16 · U8
