qwen3-4b-advanced-dpo-v23-merged

A DPO fine-tuned version of deepkick/qwen3-4b-advanced-sft-v13-merged.

Method

  • Base: Qwen/Qwen3-4B-Instruct-2507
  • SFT Base: deepkick/qwen3-4b-advanced-sft-v13-merged (ALF 27/50, score 4.0543)
  • DPO Dataset: deepkick/sft_alfworld_v5_action_format (THOUGHT/ACTION format)
  • DPO Pairs: 311 samples (covering all failure patterns)
  • Beta: 0.1
  • LR: 5e-07
  • Epochs: 1
  • LoRA: r=32, alpha=128
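The DPO objective behind the hyperparameters above can be sketched in plain Python: for each preference pair, the loss is the negative log-sigmoid of beta times the difference between the policy and reference log-probability ratios for the chosen vs. rejected completion (here beta=0.1, matching the config; the function names are illustrative, not from any library).

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l)))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    # Numerically stable -log(sigmoid(margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# At initialization the policy equals the reference, so margin = 0
# and the loss is log 2 ~= 0.6931 regardless of the raw log-probs.
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
```

With only 311 pairs, one epoch, and a small learning rate (5e-7), the policy stays close to the reference, which is the usual regime for repairing specific failure patterns without degrading the SFT base.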

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "deepkick/qwen3-4b-advanced-dpo-v23-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("deepkick/qwen3-4b-advanced-dpo-v23-merged")
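Since the model was tuned on THOUGHT/ACTION-formatted data, completions can be split into their two parts before executing the action. The exact tag strings below (`THOUGHT:` / `ACTION:`) are an assumption based on the dataset name; adjust them to match the actual dataset formatting.

```python
import re

def parse_response(text):
    """Split a THOUGHT/ACTION-formatted completion into its two parts.
    Tag names are assumed from the dataset description, not verified."""
    m = re.search(r"THOUGHT:\s*(.*?)\s*ACTION:\s*(.*)", text, re.DOTALL)
    if m is None:
        return None  # completion did not follow the expected format
    return {"thought": m.group(1).strip(), "action": m.group(2).strip()}

out = parse_response(
    "THOUGHT: The mug is likely in the cabinet.\nACTION: go to cabinet 1"
)
print(out["action"])  # go to cabinet 1
```

Returning `None` on a format miss lets the agent loop fall back to a retry or a default action instead of sending malformed text to the environment.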
