# Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8
This model is a mid-trained version of Qwen/Qwen2.5-1.5B, continued pre-trained on the 4plus subset of NVIDIA's Nemotron-CC-Math-v1 dataset (~52B tokens).
## Training Details
- Base Model: Qwen/Qwen2.5-1.5B
- Training Data: Nemotron-CC-Math-v1 (4plus subset, ~52B tokens)
- Training Type: Continued pre-training (causal language modeling)
- Training Stage: Train 8 (final checkpoint of multi-stage mid-training)
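Causal language modeling here means standard next-token prediction: the labels are the input ids shifted left by one position, so the model at position *i* is trained to predict the token at position *i + 1*. A minimal sketch of that label alignment, using toy token ids rather than the real tokenizer:

```python
# Toy token ids standing in for one tokenized math document
token_ids = [101, 7, 42, 9, 102]

# Causal LM objective: inputs drop the last token, labels drop the first,
# so each input position is paired with the token that follows it.
inputs = token_ids[:-1]  # [101, 7, 42, 9]
labels = token_ids[1:]   # [7, 42, 9, 102]

print(inputs, labels)
```

In practice the training framework builds these shifted labels internally from the raw token stream.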
## Training Hyperparameters
- Learning rate: 2e-05
- Train batch size: 8 per device
- Gradient accumulation steps: 4
- Total train batch size: 128
- Distributed: 4 GPUs
- Optimizer: AdamW
- LR scheduler: Cosine with 5% warmup
- Epochs: 1.0
- Precision: bf16
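The hyperparameters above are mutually consistent: 8 per-device batch × 4 gradient-accumulation steps × 4 GPUs gives the total train batch size of 128. A pure-Python sketch of that arithmetic plus the cosine-with-5%-warmup schedule (illustrative only; the actual schedule is produced by the training framework, and `lr_at` is a hypothetical helper):

```python
import math

# Effective batch size = per-device batch x grad accumulation x GPU count
per_device, grad_accum, num_gpus = 8, 4, 4
effective_batch = per_device * grad_accum * num_gpus  # 128

def lr_at(step, total_steps, peak_lr=2e-5, warmup_frac=0.05):
    """Linear warmup over the first 5% of steps, then cosine decay to 0."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))

print(effective_batch)    # 128
print(lr_at(0, 1000))     # 0.0 (start of warmup)
print(lr_at(50, 1000))    # 2e-05 (peak, end of warmup)
print(lr_at(1000, 1000))  # ~0.0 (end of cosine decay)
```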
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Base-model checkpoint: prompt with plain text, not a chat template
inputs = tokenizer("Solve for x: 2x + 3 = 7.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Framework Versions
- Transformers: 4.57.1
- PyTorch: 2.6.0+cu124
- Datasets: 4.0.0
- Training framework: LLaMA-Factory