# librarian-base-130m
A 125.6M-parameter causal language model trained from scratch on WikiText-103 and TinyStories. It uses no pretrained weights and a custom BPE tokenizer.
## Model Specs
| Property | Value |
|---|---|
| Parameters | 125,553,408 |
| Layers | 12 |
| Heads | 12 |
| Embedding dim | 768 |
| Context length | 1024 tokens |
| Vocabulary | 16,000 (custom BPE) |
| Training steps | 92,000 |
| Validation perplexity | 6.19 |
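The 16,000-entry vocabulary comes from a custom BPE tokenizer. As background, here is a minimal pure-Python sketch of the core BPE training loop (repeatedly merge the most frequent adjacent symbol pair). The function name and toy corpus are illustrative, not the actual tokenizer used for this model:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merges from a list of words (toy illustration).

    Each word starts as a tuple of characters; at every step the most
    frequent adjacent pair across the corpus is merged into one symbol.
    """
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the merge to every word in the vocabulary.
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        vocab = merged
    return merges, vocab

merges, vocab = bpe_merges(["low", "lower", "lowest", "low"], num_merges=3)
```

A real tokenizer also handles pre-tokenization, byte fallback, and special tokens; this sketch only shows the merge statistics that determine the learned vocabulary.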
## Usage

Option 1: install the pip package:

```bash
pip install librarian-lm
```

```python
import librarian

model = librarian.load("130m")
print(model.generate("The history of Rome began", temperature=0.7, top_k=40))
```

Option 2: run locally:

```bash
pip install torch tokenizers
python generate.py --prompt "The history of Rome began" --temperature 0.7 --top_k 40
```
## Generation Tips

Base models work better with longer, document-style prompts:

```python
# Better
model.generate("In the beginning , the Roman Empire was founded by")
model.generate("Once upon a time in a land far away , there lived a")

# Too short: likely to drift
model.generate("rome")
```

Lower temperature (0.7) and top_k (40) give more coherent output than the defaults.
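For intuition on what those two knobs do: logits are divided by the temperature, all but the `top_k` highest are discarded, and the survivors are renormalized before sampling. A minimal stdlib sketch (a hypothetical helper, not the package's internals):

```python
import math
import random

def sample_top_k(logits, temperature=0.7, top_k=40, rng=random):
    """Sample a token index using temperature scaling and top-k filtering."""
    # Keep only the indices of the top_k highest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature-scale, then softmax over the surviving logits.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index from the truncated, renormalized distribution.
    return rng.choices(top, weights=probs, k=1)[0]

logits = [2.0, 0.5, -1.0, 1.5, 0.0]
token = sample_top_k(logits, temperature=0.7, top_k=2)
```

With `top_k=2` here, only indices 0 and 3 can ever be sampled; lowering the temperature further sharpens the distribution toward index 0.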
## What It Can and Can't Do
Can:
- Continue text fluently in English
- Serve as a base for fine-tuning
Cannot:
- Follow instructions
- Answer questions
- Stay on topic reliably at high temperatures
## Training Data
| Dataset | Role |
|---|---|
| WikiText-103 | Structured prose, encyclopedic text |
| TinyStories | Short narrative text |
## Files

| File | Description |
|---|---|
| `librarian-base-130m.pt` | PyTorch checkpoint |
| `librarian-base-130m-tokenizer.json` | Custom BPE tokenizer |
| `librarian-base-130m-tokenizer-config.json` | Tokenizer config |
| `generate.py` | Self-contained inference script |
## Model Series

| Model | Status |
|---|---|
| librarian-base-130m | Released |
| librarian-base-390m | Coming soon |
| librarian-instruct-130m | Planned |
## License
MIT