nanosarvam (step 115500)

A small GPT-2-style language model trained on OpenWebText.

Architecture

  • Parameters: 48.28M
  • Layers: 8
  • d_model: 512
  • Heads: 8 (GQA with 2 KV heads; sketched below)
  • Context: 512 tokens
  • FFN: SwiGLU (ff_mult=4)
  • Positional encoding: RoPE
  • Training step: 115500
  • Dataset: Skylion007/openwebtext
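
As a rough orientation, here is a minimal PyTorch sketch of the two pieces that differ from a vanilla GPT-2 block: grouped-query attention with 2 KV heads and a SwiGLU feed-forward with ff_mult=4. The module and parameter names are illustrative only and do not correspond to the checkpoint's actual weight names; the exact SwiGLU hidden width (and whether a 2/3 scaling is applied) depends on the training code.

import torch
import torch.nn as nn
import torch.nn.functional as F

D_MODEL, N_HEADS, N_KV_HEADS, FF_MULT = 512, 8, 2, 4   # values from the table above
HEAD_DIM = D_MODEL // N_HEADS                           # 64

class GroupedQueryAttention(nn.Module):
    # 8 query heads share 2 key/value heads; each KV head serves 4 query heads.
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(D_MODEL, N_HEADS * HEAD_DIM, bias=False)
        self.k_proj = nn.Linear(D_MODEL, N_KV_HEADS * HEAD_DIM, bias=False)
        self.v_proj = nn.Linear(D_MODEL, N_KV_HEADS * HEAD_DIM, bias=False)
        self.o_proj = nn.Linear(N_HEADS * HEAD_DIM, D_MODEL, bias=False)

    def forward(self, x):
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, N_HEADS, HEAD_DIM).transpose(1, 2)
        k = self.k_proj(x).view(B, T, N_KV_HEADS, HEAD_DIM).transpose(1, 2)
        v = self.v_proj(x).view(B, T, N_KV_HEADS, HEAD_DIM).transpose(1, 2)
        # RoPE would be applied to q and k here (omitted for brevity).
        k = k.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)  # expand 2 KV heads to 8
        v = v.repeat_interleave(N_HEADS // N_KV_HEADS, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(B, T, D_MODEL))

class SwiGLU(nn.Module):
    # Gated feed-forward: silu(gate(x)) * up(x), projected back to d_model.
    def __init__(self):
        super().__init__()
        hidden = FF_MULT * D_MODEL  # assumed 2048; actual width may differ
        self.gate = nn.Linear(D_MODEL, hidden, bias=False)
        self.up = nn.Linear(D_MODEL, hidden, bias=False)
        self.down = nn.Linear(hidden, D_MODEL, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))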

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hub
tokenizer = AutoTokenizer.from_pretrained("lilob/nanosarvam-owt-run")
model = AutoModelForCausalLM.from_pretrained("lilob/nanosarvam-owt-run")

# Sample 100 new tokens from a short prompt
inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100, temperature=0.8, do_sample=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))
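
Prompt plus generated text should stay within the 512-token context window. As a convenience, the same generation can also be run through the text-generation pipeline; this assumes the Hub config supports it, so treat it as an optional alternative to the explicit calls above.

from transformers import pipeline

# Wraps tokenization, generation, and decoding in one call
pipe = pipeline("text-generation", model="lilob/nanosarvam-owt-run")
print(pipe("Once upon a time", max_new_tokens=100, do_sample=True, temperature=0.8)[0]["generated_text"])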