# qwen3-4b-advanced-dpo-v20-merged
This model is a DPO fine-tuned version of `deepkick/qwen3-4b-advanced-sft-v13-merged`.
## Method
- Base: Qwen/Qwen3-4B-Instruct-2507
- SFT Base: deepkick/qwen3-4b-advanced-sft-v13-merged (ALF 27/50, score 4.0543)
- DPO: v0 chosen / v13 rejected pairs (7 samples, target indices: [9, 11, 13, 21, 23, 27, 45])
- Beta: 0.1
- LR: 5e-07
- Epochs: 3
- LoRA: r=32, alpha=128
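The hyperparameters above can be assembled into a training sketch, e.g. with `trl`'s `DPOTrainer` and a `peft` LoRA adapter. This is a minimal illustration, not the exact training script: the construction of the 7 chosen/rejected pairs is not specified in this card, so the `pairs` dataset below is a hypothetical placeholder.

```python
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer
from transformers import AutoModelForCausalLM, AutoTokenizer

# LoRA adapter matching the card: r=32, alpha=128
peft_config = LoraConfig(r=32, lora_alpha=128, task_type="CAUSAL_LM")

# DPO hyperparameters from the card: beta=0.1, lr=5e-7, 3 epochs
training_args = DPOConfig(
    output_dir="qwen3-4b-advanced-dpo-v20",
    beta=0.1,
    learning_rate=5e-7,
    num_train_epochs=3,
)

# Start from the SFT base listed above
model = AutoModelForCausalLM.from_pretrained("deepkick/qwen3-4b-advanced-sft-v13-merged")
tokenizer = AutoTokenizer.from_pretrained("deepkick/qwen3-4b-advanced-sft-v13-merged")

# `pairs`: a dataset with "prompt", "chosen" (v0), and "rejected" (v13) columns,
# built from the 7 target indices -- hypothetical, not defined in this card
trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=pairs,
    processing_class=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```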
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged model in bfloat16, sharding across available devices
model = AutoModelForCausalLM.from_pretrained(
    "deepkick/qwen3-4b-advanced-dpo-v20-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepkick/qwen3-4b-advanced-dpo-v20-merged")
```
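Continuing from the snippet above, a minimal generation call using the tokenizer's chat template; the prompt and decoding settings are illustrative:

```python
# Build a chat-formatted prompt (prompt text is illustrative)
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Generate and decode only the newly produced tokens
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```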