Qwen3-4B SFT + DPO SLERP Merge

Overview

This model is a SLERP merge between:

  • Structured Output SFT model
  • Reasoning-enhanced DPO model

Goal:

  • Preserve strict structured output formatting
  • Improve reasoning depth

Merge Details

Method: SLERP
Ratio: 70% SFT / 30% DPO

Source Models

SFT: kochan13/qwen-qwen3-4b-structured-output-lora-26020401

DPO: kochan13/qwen3-4b-dpo-qwen-cot-merged-26020701
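SLERP interpolates along the arc between two weight tensors instead of averaging them linearly, which better preserves the geometry of each checkpoint. A minimal per-tensor sketch, assuming the 70/30 ratio above maps to an interpolation factor t = 0.3 toward the DPO model (the `slerp` helper below is illustrative; tools such as mergekit implement the production version):

```python
import torch

def slerp(w_a, w_b, t=0.3, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.
    t=0 returns w_a (SFT weights), t=1 returns w_b (DPO weights)."""
    a = w_a.flatten().float()
    b = w_b.flatten().float()
    # Angle between the two flattened weight vectors
    dot = torch.clamp(
        (a / (a.norm() + eps)) @ (b / (b.norm() + eps)), -1.0, 1.0
    )
    theta = torch.arccos(dot)
    if theta.abs() < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        merged = (1 - t) * a + t * b
    else:
        merged = (
            torch.sin((1 - t) * theta) * a + torch.sin(t * theta) * b
        ) / torch.sin(theta)
    return merged.reshape(w_a.shape).to(w_a.dtype)
```

Applied per parameter tensor across both checkpoints, this yields the merged state dict.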


Usage

Example:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "kochan13/qwen3-4b-sft-dpo-slerp-merged-260208"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="auto"
)

# Generate with the tokenizer's chat template
messages = [{"role": "user", "content": "Return a JSON object describing this model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Intended Use

  • Structured output generation
  • Reasoning tasks
  • Evaluation benchmarks


Limitations

The merged model may inherit biases from the base model and from both training datasets; structured-output compliance and reasoning quality have not been independently benchmarked.


Base Model License

Apache 2.0
