librarian-base-130m

A 125.6M-parameter causal language model trained from scratch on WikiText-103 and TinyStories. No pretrained weights; a custom 16,000-entry BPE vocabulary.


Model Specs

| Property | Value |
|---|---|
| Parameters | 125,553,408 |
| Layers | 12 |
| Attention heads | 12 |
| Embedding dim | 768 |
| Context length | 1024 tokens |
| Vocabulary | 16,000 (custom BPE) |
| Training steps | 92,000 |
| Validation perplexity | 6.19 |

Usage

Option 1 β€” pip package:

pip install librarian-lm

import librarian
model = librarian.load("130m")
print(model.generate("The history of Rome began", temperature=0.7, top_k=40))

Option 2 β€” run locally:

pip install torch tokenizers
python generate.py --prompt "The history of Rome began" --temperature 0.7 --top_k 40

Generation Tips

Base models work better with longer, document-style prompts:

# Better
model.generate("In the beginning , the Roman Empire was founded by")
model.generate("Once upon a time in a land far away , there lived a")

# Too short β€” likely to drift
model.generate("rome")

Lower temperature (0.7) and top_k (40) give more coherent output than the defaults.
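The effect of these two knobs is easy to sketch. Below is a minimal top-k + temperature sampler over raw logits in plain Python β€” an illustration of the general technique, not the model's actual sampling code:

```python
import math
import random

def sample_top_k(logits, k=40, temperature=0.7, rng=random):
    """Sample a token index from `logits` after top-k filtering and
    temperature scaling. Smaller k discards unlikely tokens entirely;
    lower temperature sharpens the remaining distribution."""
    # Keep only the k highest-scoring candidate indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled, numerically stable softmax over the survivors.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    # Draw one index proportionally to its probability.
    r, acc = rng.random(), 0.0
    for idx, e in zip(top, exps):
        acc += e / total
        if r < acc:
            return idx
    return top[-1]

logits = [2.0, 0.5, -1.0, 3.0, 0.0]
print(sample_top_k(logits, k=2, temperature=0.7))  # only index 3 or 0 can appear
```

With k=1 this degenerates to greedy decoding (always the argmax); raising the temperature toward 1.0 and beyond flattens the distribution and increases drift.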


What It Can and Can't Do

Can:

  • Continue text fluently in English
  • Serve as a base for fine-tuning

Cannot:

  • Follow instructions
  • Answer questions
  • Stay on topic reliably at high temperatures

Training Data

| Dataset | Role |
|---|---|
| WikiText-103 | Structured prose, encyclopedic text |
| TinyStories | Short narrative text |
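The card does not state how the two corpora were combined during training. One common scheme β€” purely an assumption here, with hypothetical names β€” is weighted interleaving, where each example's source is drawn in proportion to a mixing ratio:

```python
import random

def interleave(sources, weights, n, rng=random):
    """Yield n examples, choosing each one's source dataset with
    probability proportional to `weights`. An assumed mixing scheme,
    not necessarily how librarian-base-130m was actually trained."""
    names = list(sources)
    iters = {name: iter(sources[name]) for name in names}
    w = [weights[name] for name in names]
    return [(name, next(iters[name]))
            for name in rng.choices(names, weights=w, k=n)]

# Stand-in generators for the two corpora.
wiki = (f"wiki-{i}" for i in range(100))
stories = (f"story-{i}" for i in range(100))
batch = interleave({"wikitext": wiki, "tinystories": stories},
                   {"wikitext": 0.7, "tinystories": 0.3}, 8)
```

Skewing the weights toward one corpus biases the model's default register toward that corpus's style.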

Files

| File | Description |
|---|---|
| librarian-base-130m.pt | PyTorch checkpoint |
| librarian-base-130m-tokenizer.json | Custom BPE tokenizer |
| librarian-base-130m-tokenizer-config.json | Tokenizer config |
| generate.py | Self-contained inference script |

Model Series

| Model | Status |
|---|---|
| librarian-base-130m | βœ… Released |
| librarian-base-390m | πŸ”œ Coming soon |
| librarian-instruct-130m | πŸ”œ Planned |

License

MIT
