qwen3-4b-seq2k-dpo-merged

This model is a fine-tuned version of shinich001/qwen3-4b-lr5e5-ep1-seq2k, trained with Direct Preference Optimization (DPO) via the Unsloth library. This repository contains the fully merged 16-bit (BF16) weights, so no adapter loading is required.
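Since the weights are already merged, the checkpoint should load directly with Hugging Face transformers, without any PEFT step. The following is a minimal sketch, assuming a standard transformers install; the `load_model` helper name is illustrative, not part of the repository.

```python
REPO_ID = "shinich001/qwen3-4b-seq2k-dpo-merged"

def load_model(repo_id: str = REPO_ID):
    """Load tokenizer and model in bfloat16 (the stored tensor type).

    Imports are kept inside the function so the sketch can be read and
    checked without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="bfloat16",  # matches the BF16 checkpoint
        device_map="auto",
    )
    return tokenizer, model

# Usage (downloads ~8 GB of weights):
#   tokenizer, model = load_model()
#   inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
#   out = model.generate(**inputs, max_new_tokens=32)
#   print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Because the LoRA weights were merged before upload, inference cost and latency are the same as for the base model.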

Training Configuration

  • Base model: shinich001/qwen3-4b-lr5e5-ep1-seq2k
  • Method: DPO
  • Epochs: 1
  • Learning rate: 5e-06
  • Beta: 0.1
  • Max sequence length: 2048
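The beta value above controls how strongly DPO penalizes the policy for drifting from the reference model. As a pure-Python illustration (not the actual Unsloth training code), the per-example DPO loss with beta = 0.1 is:

```python
import math

def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss:
    -log sigmoid(beta * ((chosen margin) - (rejected margin))),
    where each margin is the policy log-prob minus the reference log-prob
    of the same completion. Lower loss = the policy prefers the chosen
    completion more strongly than the reference does.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits))
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# When the policy matches the reference exactly, both margins are 0 and
# the loss is -log(0.5) = log(2) ~= 0.693; raising the chosen completion's
# log-prob relative to the reference drives the loss below that baseline.
```

With a small beta such as 0.1, the implied KL constraint is loose, so the policy is allowed to move relatively far from the reference checkpoint during the single training epoch.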
Model Details

  • Format: Safetensors
  • Model size: 4B params
  • Tensor type: BF16
