qwen3-4b-seq2k-dpo-merged
This model is a fine-tuned version of shinich001/qwen3-4b-lr5e5-ep1-seq2k, trained with Direct Preference Optimization (DPO) via the Unsloth library. This repository contains the fully merged 16-bit weights, so no adapter loading is required.
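Since the repository ships fully merged weights, the model can be loaded directly with Hugging Face transformers, with no PEFT or adapter step. The sketch below is illustrative, not from the card: the dtype and device settings are assumptions, and `build_chat`/`generate_reply` are hypothetical helper names.

```python
# Sketch: using the merged 16-bit weights directly with transformers.
# No adapter loading is needed because the repo contains merged weights.

MODEL_ID = "shinich001/qwen3-4b-seq2k-dpo-merged"

def build_chat(user_prompt: str) -> list:
    """Build a chat-format message list for the tokenizer's chat template."""
    return [{"role": "user", "content": user_prompt}]

def generate_reply(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the merged model and generate a reply.

    Note: downloads roughly 8 GB of weights (4B params at 16-bit).
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # assumption: load the 16-bit weights as bf16
        device_map="auto",
    )
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```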
Training Configuration
- Base model: shinich001/qwen3-4b-lr5e5-ep1-seq2k
- Method: DPO
- Epochs: 1
- Learning rate: 5e-06
- Beta: 0.1
- Max sequence length: 2048
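The configuration above could be reproduced with Unsloth plus TRL roughly as follows. This is a sketch under stated assumptions, not the authors' actual script: the LoRA settings, target modules, 4-bit loading, and the toy preference dataset are all placeholders; only the hyperparameters in the list above come from the card.

```python
# Sketch (assumed, not the authors' script): DPO training with Unsloth + TRL
# using the hyperparameters listed on this model card.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import DPOConfig, DPOTrainer

max_seq_length = 2048  # Max sequence length: 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    "shinich001/qwen3-4b-lr5e5-ep1-seq2k",  # base model from the card
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # assumption: QLoRA-style fine-tuning
)
# Assumed LoRA configuration; the card does not state r/alpha/target modules.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Toy placeholder dataset: DPO expects (prompt, chosen, rejected) triples.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

args = DPOConfig(
    num_train_epochs=1,        # Epochs: 1
    learning_rate=5e-6,        # Learning rate: 5e-06
    beta=0.1,                  # Beta: 0.1
    max_length=max_seq_length,
    output_dir="qwen3-4b-seq2k-dpo",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapter into 16-bit weights before pushing, so users can
# load the repo without any adapter step (Unsloth helper):
model.save_pretrained_merged(
    "qwen3-4b-seq2k-dpo-merged", tokenizer, save_method="merged_16bit"
)
```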
Model tree for shinich001/qwen3-4b-seq2k-dpo-merged
- Base model: Qwen/Qwen3-4B-Instruct-2507