librarian-base-130m

A 125.6M-parameter causal language model trained from scratch on WikiText-103 and TinyStories. No pretrained weights; a custom 16,000-entry BPE vocabulary.


Model Specs

| Property | Value |
|---|---|
| Parameters | 125,553,408 |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 1024 tokens |
| Vocabulary | 16,000 (custom BPE) |
| Training steps | 92,000 |
| Validation perplexity | 6.19 |

Usage

Option 1 β€” pip package:

pip install librarian-lm

import librarian
model = librarian.load("130m")
print(model.generate("The history of Rome began", temperature=0.7, top_k=40))

Option 2 β€” run locally:

pip install torch tokenizers
python generate.py --prompt "The history of Rome began" --temperature 0.7 --top_k 40

Generation Tips

Base models work better with longer, document-style prompts:

# Better
model.generate("In the beginning , the Roman Empire was founded by")
model.generate("Once upon a time in a land far away , there lived a")

# Too short β€” likely to drift
model.generate("rome")

Lower temperature (0.7) and top_k (40) give more coherent output than the defaults.
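The effect of these two knobs is easy to sketch. Below is a minimal top-k + temperature sampler over raw logits in plain Python β€” an illustration of the general technique, not the model's actual sampling code:

```python
import math
import random

def sample_top_k(logits, k=40, temperature=0.7, rng=random):
    """Sample a token index from `logits` after top-k filtering and
    temperature scaling. Smaller k discards unlikely tokens entirely;
    lower temperature sharpens the remaining distribution."""
    # Keep only the k highest-scoring candidate indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled, numerically stable softmax over the survivors.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # Draw one index proportionally to its probability.
    r, acc = rng.random(), 0.0
    for idx, e in zip(top, exps):
        acc += e / total
        if r < acc:
            return idx
    return top[-1]

logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_top_k(logits, k=2, temperature=0.7))  # only index 3 or 0 can appear
```

With k=1 this degenerates to greedy decoding (always the argmax); raising the temperature toward 1.0 and beyond flattens the distribution and increases drift.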


What It Can and Can't Do

Can:

  • Continue text fluently in English
  • Serve as a base for fine-tuning

Cannot:

  • Follow instructions
  • Answer questions
  • Stay on topic reliably at high temperatures

Training Data

| Dataset | Role |
|---|---|
| WikiText-103 | Structured prose, encyclopedic text |
| TinyStories | Short narrative text |
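The card does not state how the two corpora were combined during training. One common scheme β€” purely an assumption here, with hypothetical names β€” is weighted interleaving, where each example's source is drawn in proportion to a mixing ratio:

```python
import random

def interleave(sources, weights, n, rng=random):
    """Yield n examples, choosing each one's source dataset with
    probability proportional to `weights`. An assumed mixing scheme,
    not necessarily how librarian-base-130m was actually trained."""
    names = list(sources)
    iters = {name: iter(sources[name]) for name in names}
    w = [weights[name] for name in names]
    return [(name, next(iters[name]))
            for name in rng.choices(names, weights=w, k=n)]

# Stand-in generators for the two corpora.
wiki = (f"wiki-{i}" for i in range(100))
stories = (f"story-{i}" for i in range(100))
batch = interleave({"wikitext": wiki, "tinystories": stories},
                   {"wikitext": 0.7, "tinystories": 0.3}, 8)
```

Skewing the weights toward one corpus biases the model's default register toward that corpus's style.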

Files

| File | Description |
|---|---|
| librarian-base-130m.pt | PyTorch checkpoint |
| librarian-base-130m-tokenizer.json | Custom BPE tokenizer |
| librarian-base-130m-tokenizer-config.json | Tokenizer config |
| generate.py | Self-contained inference script |

Model Series

| Model | Status |
|---|---|
| librarian-base-130m | βœ… Released |
| librarian-base-390m | πŸ”œ Coming soon |
| librarian-instruct-130m | πŸ”œ Planned |

License

MIT
