Qwen3-4B SFT + DPO SLERP Merge

Overview

This model is a SLERP merge between:

  • Structured Output SFT model
  • Reasoning-enhanced DPO model

Goal:

  • Preserve strict structured output formatting
  • Improve reasoning depth

Merge Details

Method: SLERP
Ratio: 70% SFT / 30% DPO

Source Models

SFT: kochan13/qwen-qwen3-4b-structured-output-lora-26020401

DPO: kochan13/qwen3-4b-dpo-qwen-cot-merged-26020701
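SLERP interpolates along the arc between two weight tensors instead of averaging them linearly, which better preserves the geometry of each checkpoint. A minimal per-tensor sketch, assuming the 70/30 ratio above maps to an interpolation factor t = 0.3 toward the DPO model (the `slerp` helper below is illustrative; tools such as mergekit implement the production version):

```python
import torch

def slerp(w_a, w_b, t=0.3, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.
    t=0 returns w_a (SFT weights), t=1 returns w_b (DPO weights)."""
    a = w_a.flatten().float()
    b = w_b.flatten().float()
    # Angle between the two flattened weight vectors
    dot = torch.clamp(
        (a / (a.norm() + eps)) @ (b / (b.norm() + eps)), -1.0, 1.0
    )
    theta = torch.arccos(dot)
    if theta.abs() < eps:
        # Nearly parallel tensors: fall back to linear interpolation
        merged = (1 - t) * a + t * b
    else:
        merged = (
            torch.sin((1 - t) * theta) * a + torch.sin(t * theta) * b
        ) / torch.sin(theta)
    return merged.reshape(w_a.shape).to(w_a.dtype)
```

Applied per parameter tensor across both checkpoints, this yields the merged state dict.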


Usage

Example:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "kochan13/qwen3-4b-sft-dpo-slerp-merged-260208"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.float16,
    device_map="auto"
)

# Generate with the tokenizer's chat template
messages = [{"role": "user", "content": "Return a JSON object describing this model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

Intended Use

  • Structured output generation
  • Reasoning tasks
  • Evaluation benchmarks


Limitations

The merged model may inherit biases from the base model and from both training datasets; structured-output compliance and reasoning quality have not been independently benchmarked.


Base Model License

Apache 2.0
