qwen3-4b-sft-dpo-v1-20260207-0850
A LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 with a two-stage SFT + DPO pipeline.
Training Pipeline
- SFT (QLoRA 4-bit): r=64, alpha=128, LR=2e-4, 3 epochs, seq_len=1024
- DPO: r=32, alpha=64, LR=5e-7, beta=0.1, 1 epoch
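The SFT stage trains low-rank adapter matrices rather than the full weights. A minimal pure-Python sketch of the LoRA update rule ΔW = (alpha/r)·B·A is shown below; the function and matrix shapes are illustrative, not taken from the training code. Note that with the SFT settings above (r=64, alpha=128) the scaling factor alpha/r works out to 2.0.

```python
def lora_delta(B, A, r, alpha):
    """Compute the effective LoRA weight update ΔW = (alpha/r) * B @ A.

    B is (d_out x r), A is (r x d_in); the adapter only stores these two
    small matrices, and ΔW is added to the frozen base weight at merge time.
    """
    scale = alpha / r  # with r=64, alpha=128 this is 2.0
    rows, cols = len(B), len(A[0])
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(r)) for j in range(cols)]
        for i in range(rows)
    ]

# Tiny illustrative example with rank r=2 and alpha=4 (scale = 2.0):
delta = lora_delta(B=[[1, 0], [0, 1]], A=[[1, 2], [3, 4]], r=2, alpha=4)
```

Because only B and A are trained (and, in QLoRA, the base weights stay in 4-bit), memory use is far lower than full fine-tuning.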
SFT Datasets
- u-10bei/structured_data_with_cot_dataset_512_v2
- daichira/structured-5k-mix-sft
- daichira/structured-hard-sft-4k
DPO Dataset
- u-10bei/dpo-dataset-qwen-cot
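The DPO stage optimizes the standard DPO objective, -log sigma(beta * [(log pi(y_w|x) - log pi_ref(y_w|x)) - (log pi(y_l|x) - log pi_ref(y_l|x))]), over chosen/rejected pairs from the dataset above. A minimal sketch of that loss for a single preference pair, assuming per-sequence log-probabilities are already computed (the function name and inputs are illustrative, not from the training code):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of sequence log-probs.

    beta (0.1 here, matching the pipeline above) controls how strongly the
    policy is pushed away from the frozen reference model.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(logits)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# If the policy has not yet moved from the reference, both log-ratios are
# zero, so the loss starts at -log(0.5) = ln 2 ≈ 0.693.
loss = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

Training then lowers this loss by raising the chosen completion's likelihood relative to the rejected one.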
Sources & License
Dataset Licenses: CC-BY-4.0, MIT
License Compliance: Users must comply with the dataset licenses and the base model's original terms of use.
Model tree for yusei926/qwen3-4b-sft-dpo-v1-20260207-0850
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Finetuned from: unsloth/Qwen3-4B-Instruct-2507