
Qwen3-0.6B SFT+RL GSM8K Model

This directory contains a Qwen3-0.6B model trained with supervised fine-tuning (SFT) followed by reinforcement learning (RL), optimized for the GSM8K mathematical reasoning task.

Model Information

  • Base Model: Qwen3-0.6B
  • Training Method: SFT + RL (GRPO)
  • Dataset: GSM8K (Grade School Math 8K)
  • Test Set Accuracy: 0.7938 (79.38%)

Directory Structure

Qwen3-0.6B_sft+rl_merged_model/
├── README.md                    # This file
├── config.json                  # Model configuration file
├── generation_config.json       # Generation configuration file
├── tokenizer_config.json        # Tokenizer configuration
├── tokenizer.json               # Tokenizer file
├── vocab.json                   # Vocabulary file
├── merges.txt                   # BPE merges file
├── gsm8k_test_outputs.jsonl     # Test set output results
├── gsm8k_train_outputs.jsonl    # Training set output results
├── evaluate_accuracy.py         # Accuracy evaluation script
├── collect_model_outputs.py     # Model output collection script
└── utils.py                     # Utility functions

Usage

Loading the Model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./Qwen3-0.6B_sft+rl_merged_model"
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

Evaluating Model Accuracy

Use the evaluation script in this directory:

python evaluate_accuracy.py \
    --file gsm8k_test_outputs.jsonl \
    --name "Qwen3-0.6B SFT+RL"
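The script compares the model's predicted final answers against the gold answers. GSM8K marks the final answer with a `#### <number>` line, so the core of the accuracy check might look like the sketch below; the JSONL field names `prediction` and `answer` are assumptions here, so check evaluate_accuracy.py for the actual schema:

```python
import re

ANSWER_RE = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")

def extract_final_answer(text):
    """Pull the last '#### <number>' answer out of a GSM8K-style solution."""
    matches = ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")

def accuracy(records):
    """Fraction of records whose predicted final answer matches the gold one.

    Each record is a dict; the 'prediction'/'answer' keys are assumed.
    """
    correct = sum(
        1 for r in records
        if extract_final_answer(r["prediction"]) == extract_final_answer(r["answer"])
    )
    return correct / len(records) if records else 0.0

# Tiny in-memory example in place of reading the JSONL file:
records = [
    {"prediction": "3 + 4 = 7\n#### 7", "answer": "#### 7"},
    {"prediction": "The total is 12.\n#### 12", "answer": "#### 13"},
]
print(accuracy(records))  # 0.5
```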

Performance Metrics

  • Test Set Accuracy: 79.38% (0.7938)
  • Training Set Accuracy: per-sample results are available in gsm8k_train_outputs.jsonl

Related Files

  • Training Scripts: ../train_sft_distillation.py (SFT), ../../train_grpo_gsm8k.py (RL)
  • Merge Script: ../merge_lora_model.py
  • Project README: ../README.md

Notes

  1. The model uses BF16 precision; running it on a GPU that supports BF16 is recommended
  2. The LoRA weights have already been merged into the model, so it can be used directly without loading additional adapters
  3. Test set outputs are saved in gsm8k_test_outputs.jsonl, including the detailed reasoning process for each sample

Citation

If you use this model, please cite the related training methods and datasets:

  • GSM8K Dataset: Cobbe et al., 2021. Training Verifiers to Solve Math Word Problems. arXiv preprint arXiv:2110.14168
  • GRPO (Group Relative Policy Optimization): Shao et al., 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300
  • Qwen3: Yang et al., 2025. Qwen3 Technical Report. arXiv preprint arXiv:2505.09388