Uploaded finetuned model

Developed by: Dario213
License: apache-2.0
Finetuned from model: unsloth/qwen3-4b-unsloth-bnb-4bit

This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Training setup

The model was trained with rank-8 LoRA adapters on all modules. The dataset used for fine-tuning is [FreedomIntelligence/medical-o1-reasoning-SFT](https://huggingface.co/datasets/FreedomIntelligence/medical-o1-reasoning-SFT).
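To give a sense of scale for rank 8: a LoRA adapter on a linear layer with input size d_in and output size d_out adds two low-rank matrices, A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) trainable parameters. The sketch below assumes that "all modules" means the seven attention and MLP projections of each decoder layer (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) and hard-codes layer sizes from the public Qwen3-4B configuration (hidden size 2560, MLP size 9728, 32 query heads and 8 key/value heads of dimension 128, 36 layers). These sizes are assumptions to check against the model's `config.json`; they are not stated in this card.

```python
# Sketch: count LoRA trainable parameters for a given rank.
# ASSUMPTION: "all modules" = the seven linear projections listed below,
# and the dimensions come from the public Qwen3-4B config (verify them
# against config.json before relying on the result).

RANK = 8

HIDDEN = 2560          # assumed hidden_size
INTERMEDIATE = 9728    # assumed intermediate_size
N_HEADS = 32           # assumed num_attention_heads
N_KV_HEADS = 8         # assumed num_key_value_heads
HEAD_DIM = 128         # assumed head_dim
N_LAYERS = 36          # assumed num_hidden_layers


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# (d_in, d_out) of each adapted projection in one decoder layer
PROJECTIONS = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(r: int) -> int:
    """Sum of adapter parameters over the projections of one layer."""
    return sum(lora_params(d_in, d_out, r) for d_in, d_out in PROJECTIONS.values())


def lora_params_total(r: int) -> int:
    """Adapter parameters over all decoder layers."""
    return N_LAYERS * lora_params_per_layer(r)


if __name__ == "__main__":
    print(f"rank {RANK}: {lora_params_total(RANK):,} trainable LoRA parameters")
```

Under these assumed dimensions the script reports 16,515,072 trainable parameters (458,752 per layer), well under 1% of the 4B base weights. The count grows linearly with r, so doubling the rank doubles the adapter size.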
SFTConfig arguments:
- warmup_steps=5
- learning_rate=2e-4
- optim="adamw_8bit"
- weight_decay=0.001
- lr_scheduler_type="linear"
- seed=5127
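With `lr_scheduler_type="linear"`, the Transformers trainer uses `get_linear_schedule_with_warmup`: the learning rate ramps from 0 to `learning_rate` over `warmup_steps`, then decays linearly to 0 at the last training step. The pure-Python sketch below reproduces that curve for `learning_rate=2e-4` and `warmup_steps=5`. The card does not state `max_steps` or `num_train_epochs`, so `total_steps` here is a hypothetical value chosen only for illustration.

```python
# Sketch of the learning-rate curve implied by warmup_steps=5,
# learning_rate=2e-4 and lr_scheduler_type="linear".
# ASSUMPTION: total_steps is hypothetical; the card does not give the
# run length.

PEAK_LR = 2e-4
WARMUP_STEPS = 5


def linear_warmup_decay(step: int, total_steps: int,
                        peak_lr: float = PEAK_LR,
                        warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at optimizer step `step` (0-based)."""
    if step < warmup_steps:
        # ramp linearly from 0 up to peak_lr during warmup
        return peak_lr * step / max(1, warmup_steps)
    # afterwards decay linearly, reaching 0 at total_steps (never negative)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    total_steps = 60  # hypothetical run length
    for step in (0, 1, 5, 30, 60):
        print(step, linear_warmup_decay(step, total_steps))
```

With the hypothetical 60-step run, the rate climbs to 2e-4 by step 5 and falls back to 0 at step 60. The other listed options (`optim="adamw_8bit"`, `weight_decay=0.001`, `seed=5127`) do not change this curve.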

Citations

@misc{chen2024huatuogpto1medicalcomplexreasoning,
      title={HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs}, 
      author={Junying Chen and Zhenyang Cai and Ke Ji and Xidong Wang and Wanlong Liu and Rongsheng Wang and Jianye Hou and Benyou Wang},
      year={2024},
      eprint={2412.18925},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.18925}, 
}

@misc{qwen3technicalreport,
      title={Qwen3 Technical Report}, 
      author={Qwen Team},
      year={2025},
      eprint={2505.09388},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.09388}, 
}


Model details

- Format: Safetensors
- Model size: 4B params
- Tensor type: BF16
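The size and tensor type give a quick estimate of how much memory the weights need. BF16 stores each parameter in 2 bytes, so about 4 x 10^9 x 2 bytes, roughly 8 GB; the bnb-4bit base checkpoint this model was fine-tuned from stores a nominal 4 bits (0.5 bytes) per weight, roughly 2 GB. The sketch assumes exactly 4e9 parameters (the card rounds to "4B") and ignores activations, KV cache and quantization scales, so treat the results as lower bounds.

```python
# Back-of-the-envelope weight memory for a ~4B-parameter model.
# ASSUMPTION: exactly 4e9 parameters; activations, KV cache and
# quantization metadata (scales, unquantized layers) are ignored.

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16-bit brain floating point, as in the uploaded weights
    "4bit": 0.5,  # nominal bitsandbytes 4-bit storage, excluding scales
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone, in gigabytes (1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    n_params = 4e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(n_params, dtype):.1f} GB")
```

This prints `bf16: ~8.0 GB` and `4bit: ~2.0 GB`; actual GPU usage at inference time is higher once the KV cache and runtime buffers are included.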

Dataset used to train Dario213/Qwen3-4B-medical-reasoning: FreedomIntelligence/medical-o1-reasoning-SFT

Papers for Dario213/Qwen3-4B-medical-reasoning

- Qwen3 Technical Report (arXiv 2505.09388, published May 14, 2025)
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs (arXiv 2412.18925, published Dec 25, 2024)