normal-smollm-1p7b-500B-30n-2048sl-960gbsz

This is the base (pretraining) checkpoint for a SmolLM2-style 1.7B model, converted to Hugging Face LlamaForCausalLM format from a Megatron-LM distributed checkpoint.

Details

  • Parameters: ~1.7B
  • Context length: 2048
  • Vocab size: 49152
  • Architecture: Llama (RMSNorm, SwiGLU, RoPE)
  • Training: 500B tokens (pretraining)
  • Precision: BF16 (safetensors)

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Raghav-Singhal/normal-smollm-1p7b-500B-30n-2048sl-960gbsz"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
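Since this is a base checkpoint, prompts are completed as plain text continuations rather than chat turns. A minimal generation sketch (the prompt is an arbitrary example; greedy decoding and the bfloat16 dtype, matching the published tensor type, are illustrative choices, not requirements):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Raghav-Singhal/normal-smollm-1p7b-500B-30n-2048sl-960gbsz"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in bfloat16 to match the checkpoint's stored tensor type.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Base model: no chat template, just continue the raw text.
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that `do_sample=False` gives deterministic greedy output; for more varied continuations, enable sampling with `do_sample=True` and a temperature.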

Notes

This is a base model (not instruction-tuned). For chat use, apply SFT/DPO on top of this checkpoint.

