# Qwen3-4B SFT + DPO SLERP Merge

## Overview
This model is a SLERP (spherical linear interpolation) merge of two Qwen3-4B fine-tunes:
- a structured-output SFT model
- a reasoning-enhanced DPO model
Goals:
- Preserve strict structured output formatting
- Improve reasoning depth
## Merge Details

- Method: SLERP (spherical linear interpolation)
- Ratio: 70% SFT / 30% DPO

### Source Models

- SFT: `kochan13/qwen-qwen3-4b-structured-output-lora-26020401`
- DPO: `kochan13/qwen3-4b-dpo-qwen-cot-merged-26020701`
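For intuition, SLERP interpolates along the great-circle arc between two weight tensors instead of the straight line used by plain averaging, which better preserves each tensor's norm. The following is a minimal per-tensor sketch (not the exact implementation used for this merge), assuming the tensors are flattened to vectors; the function name `slerp` and the linear-interpolation fallback threshold are illustrative choices:

```python
import numpy as np

def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two flattened weight tensors.

    t=0 returns v0, t=1 returns v1. Falls back to linear interpolation
    when the two tensors are nearly colinear (the angle is ill-conditioned).
    """
    v0_n = v0 / (np.linalg.norm(v0) + eps)
    v1_n = v1 / (np.linalg.norm(v1) + eps)
    dot = float(np.clip(np.dot(v0_n, v1_n), -1.0, 1.0))
    if abs(dot) > 0.9995:
        # Nearly parallel vectors: plain lerp is numerically safer
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)        # angle between the two tensors
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1
```

Under this convention, a 70% SFT / 30% DPO ratio corresponds to `t = 0.3` when interpolating from the SFT weights toward the DPO weights.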
## Usage

Example:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "kochan13/qwen3-4b-sft-dpo-slerp-merged-260208"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,   # requires a recent transformers; older versions use torch_dtype
    device_map="auto",
)

# Chat-style generation
messages = [{"role": "user", "content": "Return the answer as JSON."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Intended Use

- Structured output generation
- Reasoning tasks
- Evaluation benchmarks
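When using the model for structured output, it is still worth validating the response before consuming it downstream. A minimal sketch (the helper `parse_json_response` is a hypothetical utility, not part of this model or transformers):

```python
import json

def parse_json_response(text):
    """Extract and parse a JSON object from raw model output.

    Strips an optional markdown code fence (e.g. ```json ... ```)
    before handing the remainder to json.loads, which raises
    json.JSONDecodeError if the output is not valid JSON.
    """
    cleaned = text.strip()
    if cleaned.startswith("```"):
        cleaned = cleaned.strip("`")
        # Drop an optional language tag such as "json" after the fence
        if cleaned.startswith("json"):
            cleaned = cleaned[len("json"):]
    return json.loads(cleaned)
```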
## Limitations

The model may inherit biases from the base model and from the SFT/DPO training datasets.
## Base Model License

Apache 2.0
## Base Model

Qwen/Qwen3-4B-Instruct-2507