qwen3-4b-seq2k-dpo-merged
This model is a fine-tuned version of shinich001/qwen3-4b-lr5e5-ep1-seq2k, trained with Direct Preference Optimization (DPO) via the Unsloth library. This repository contains the fully merged 16-bit weights, so no adapter loading is required.
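Since the repository ships fully merged weights, the model can be loaded directly with Hugging Face transformers, with no PEFT or adapter step. The sketch below is illustrative, not from the card: the dtype and device settings are assumptions, and `build_chat`/`generate_reply` are hypothetical helper names.

```python
# Sketch: using the merged 16-bit weights directly with transformers.
# No adapter loading is needed because the repo contains merged weights.

MODEL_ID = "shinich001/qwen3-4b-seq2k-dpo-merged"

def build_chat(user_prompt: str) -> list:
    """Build a chat-format message list for the tokenizer's chat template."""
    return [{"role": "user", "content": user_prompt}]

def generate_reply(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the merged model and generate a reply.

    Note: downloads roughly 8 GB of weights (4B params at 16-bit).
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="bfloat16",  # assumption: load the 16-bit weights as bf16
        device_map="auto",
    )
    inputs = tokenizer.apply_chat_template(
        build_chat(prompt), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```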
Training Configuration
- Base model: shinich001/qwen3-4b-lr5e5-ep1-seq2k
- Method: DPO
- Epochs: 1
- Learning rate: 5e-06
- Beta: 0.1
- Max sequence length: 2048
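The configuration above could be reproduced with Unsloth plus TRL roughly as follows. This is a sketch under stated assumptions, not the authors' actual script: the LoRA settings, target modules, 4-bit loading, and the toy preference dataset are all placeholders; only the hyperparameters in the list above come from the card.

```python
# Sketch (assumed, not the authors' script): DPO training with Unsloth + TRL
# using the hyperparameters listed on this model card.
from unsloth import FastLanguageModel
from datasets import Dataset
from trl import DPOConfig, DPOTrainer

max_seq_length = 2048  # Max sequence length: 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    "shinich001/qwen3-4b-lr5e5-ep1-seq2k",  # base model from the card
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # assumption: QLoRA-style fine-tuning
)
# Assumed LoRA configuration; the card does not state r/alpha/target modules.
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# Toy placeholder dataset: DPO expects (prompt, chosen, rejected) triples.
preference_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

args = DPOConfig(
    num_train_epochs=1,        # Epochs: 1
    learning_rate=5e-6,        # Learning rate: 5e-06
    beta=0.1,                  # Beta: 0.1
    max_length=max_seq_length,
    output_dir="qwen3-4b-seq2k-dpo",
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=preference_dataset,
    processing_class=tokenizer,
)
trainer.train()

# Merge the LoRA adapter into 16-bit weights before pushing, so users can
# load the repo without any adapter step (Unsloth helper):
model.save_pretrained_merged(
    "qwen3-4b-seq2k-dpo-merged", tokenizer, save_method="merged_16bit"
)
```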
Model tree for shinich001/qwen3-4b-seq2k-dpo-merged
- Base model: Qwen/Qwen3-4B-Instruct-2507