# nanosarvam - step 115500
A small GPT-2-style language model trained on OpenWebText.
## Architecture
- Parameters: 48.28M
- Layers: 8
- d_model: 512
- Heads: 8 (GQA with 2 KV heads)
- Context: 512 tokens
- FFN: SwiGLU (ff_mult=4)
- Positional encoding: RoPE
- Training step: 115500
- Dataset: Skylion007/openwebtext
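
The attention is grouped-query attention: 8 query heads share 2 key/value heads, so each KV head serves 4 query heads. Below is a minimal sketch of that sharing pattern in plain PyTorch using the sizes listed above; the tensor names and layout are illustrative only and do not correspond to the checkpoint's actual module names.

```python
# Illustrative GQA sketch (not the model's real modules): 8 query heads, 2 KV heads.
import torch
import torch.nn.functional as F

d_model, n_q_heads, n_kv_heads, head_dim = 512, 8, 2, 64
x = torch.randn(1, 16, d_model)  # (batch, seq, d_model)

q_proj = torch.nn.Linear(d_model, n_q_heads * head_dim, bias=False)
k_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)
v_proj = torch.nn.Linear(d_model, n_kv_heads * head_dim, bias=False)

q = q_proj(x).view(1, 16, n_q_heads, head_dim).transpose(1, 2)   # (1, 8, 16, 64)
k = k_proj(x).view(1, 16, n_kv_heads, head_dim).transpose(1, 2)  # (1, 2, 16, 64)
v = v_proj(x).view(1, 16, n_kv_heads, head_dim).transpose(1, 2)

# Each of the 2 KV heads is shared by 8 / 2 = 4 query heads.
k = k.repeat_interleave(n_q_heads // n_kv_heads, dim=1)          # (1, 8, 16, 64)
v = v.repeat_interleave(n_q_heads // n_kv_heads, dim=1)

out = F.scaled_dot_product_attention(q, k, v, is_causal=True)    # (1, 8, 16, 64)
```

Sharing KV heads this way shrinks the KV cache by a factor of 4 at these sizes while keeping 8-way query parallelism.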
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lilob/nanosarvam-owt-run")
model = AutoModelForCausalLM.from_pretrained("lilob/nanosarvam-owt-run")

inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
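
Because the context window is 512 tokens, long prompts should be truncated so that the prompt plus the generated tokens fit within the limit. A small sketch, continuing from the snippet above; the 412-token cap is simply 512 minus the 100 new tokens requested and is an illustrative choice, not a value from this card.

```python
# Keep prompt + new tokens within the 512-token context by truncating the prompt.
max_ctx, max_new = 512, 100
long_prompt = "Once upon a time " * 200  # deliberately longer than the context window
inputs = tokenizer(long_prompt, return_tensors="pt",
                   truncation=True, max_length=max_ctx - max_new)
out = model.generate(**inputs, max_new_tokens=max_new, temperature=0.8, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```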