Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8

This model is a mid-trained version of Qwen/Qwen2.5-1.5B, produced by continued pre-training on the NVIDIA Nemotron-CC-Math-v1 dataset (4plus subset, ~52B tokens).

Training Details

  • Base Model: Qwen/Qwen2.5-1.5B
  • Training Data: Nemotron-CC-Math-v1 (4plus subset, ~52B tokens)
  • Training Type: Continued pre-training (causal language modeling)
  • Training Stage: Train 8 (final checkpoint of multi-stage mid-training)

Training Hyperparameters

  • Learning rate: 2e-05
  • Train batch size: 8 per device
  • Gradient accumulation steps: 4
  • Total train batch size: 128 (8 per device × 4 GPUs × 4 accumulation steps)
  • Distributed: 4 GPUs
  • Optimizer: AdamW
  • LR scheduler: Cosine with 5% warmup
  • Epochs: 1.0
  • Precision: bf16
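The schedule above can be sketched in a few lines: linear warmup over the first 5% of steps to the peak learning rate of 2e-5, followed by cosine decay. This is a minimal illustration using the card's hyperparameters; the total step count below is a placeholder, since the card does not report it for the ~52B-token run.

```python
import math

PEAK_LR = 2e-5       # learning rate from the card
WARMUP_FRAC = 0.05   # 5% warmup

def cosine_lr(step, total_steps, peak_lr=PEAK_LR, warmup_frac=WARMUP_FRAC):
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = max(1, int(warmup_frac * total_steps))
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Effective batch size: 8 per device x 4 GPUs x 4 accumulation steps
effective_batch = 8 * 4 * 4
print(effective_batch)        # 128
print(cosine_lr(0, 1000))     # 0.0 (start of warmup)
print(cosine_lr(50, 1000))    # 2e-05 (peak, at the end of 5% warmup)
```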

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "salmannyu/Qwen2.5-1.5B-Nemotron-Math-52B-Mid-Train-8"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")  # bf16 weights

# Mid-trained base model: use plain completion, not a chat template
inputs = tokenizer("Problem: Compute 12 * 13.\nSolution:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Framework Versions

  • Transformers: 4.57.1
  • PyTorch: 2.6.0+cu124
  • Datasets: 4.0.0
  • Training framework: LLaMA-Factory