Evaluation

```shell
lm_eval --model hf \
    --model_args pretrained=jaeyong2/Qwen3-0.6B-DPO-Ja-Peft \
    --tasks mmlu,japanese_leaderboard,gsm8k \
    --device cuda:0 \
    --batch_size 1 \
    --num_fewshot 5
```
| Task | Qwen3-0.6B-DPO-Ja | Qwen3-0.6B |
|---|---|---|
| MMLU | 0.41 | 0.40 |
| ja_leaderboard_jaqket_v2 | 0.30 | 0.28 |
| ja_leaderboard_jcommonsenseqa | 0.45 | 0.44 |
| ja_leaderboard_jnli | 0.24 | 0.26 |
| ja_leaderboard_jsquad | 0.49 | 0.48 |
| ja_leaderboard_marc_ja | 0.87 | 0.86 |
| ja_leaderboard_mgsm | 0.12 | 0.11 |
| ja_leaderboard_xlsum | 0.09 | 0.08 |
| ja_leaderboard_xwinograd | 0.55 | 0.55 |
| GSM8K | 0.44 | 0.42 |
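For a single-number comparison, the unweighted mean over the eight Japanese leaderboard tasks can be computed from the per-task scores above (a minimal sketch; the plain average is an assumption, not an official leaderboard aggregate):

```python
# Per-task Japanese leaderboard scores, copied from the table above
# (jaqket_v2, jcommonsenseqa, jnli, jsquad, marc_ja, mgsm, xlsum, xwinograd).
dpo_ja = [0.30, 0.45, 0.24, 0.49, 0.87, 0.12, 0.09, 0.55]
base   = [0.28, 0.44, 0.26, 0.48, 0.86, 0.11, 0.08, 0.55]

avg_dpo = sum(dpo_ja) / len(dpo_ja)
avg_base = sum(base) / len(base)
print(f"DPO-Ja average: {avg_dpo:.3f}")
print(f"Base average:   {avg_base:.3f}")
```

On this simple average, the DPO-tuned model scores about 0.389 versus 0.383 for the base model, a small but consistent gain.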

License

Acknowledgement

This research was supported by the TPU Research Cloud program.

Model size: 0.6B params · Tensor type: BF16 (Safetensors)