Tokenizer Study (LLaMA 130M)
Collection: Correlating tokenizer properties of pre-trained LLMs with their downstream performance.
LLaMA 130M (Implementation: https://github.com/lmsdss/LayerNorm-Scaling)
Pre-training: C4 (~2.054B tokens with the BPE tokenizer; ~2.00B tokens with SentencePiece)
Tokenizer: BPE (the Llama 2 7B tokenizer, meta-llama/Llama-2-7b-hf)
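The token counts above depend on which tokenizer processes the corpus. A minimal sketch of counting tokens and bytes for a text sample with the Hugging Face transformers library; the sample string is hypothetical, not from the study:

```python
from transformers import AutoTokenizer

# Llama 2 tokenizer (gated repo; requires accepting Meta's license on the Hub).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "The quick brown fox jumps over the lazy dog."  # hypothetical sample

# Encode without special tokens so the count reflects raw corpus statistics.
ids = tokenizer.encode(text, add_special_tokens=False)
n_bytes = len(text.encode("utf-8"))
print(f"tokens: {len(ids)}  bytes: {n_bytes}  bytes/token: {n_bytes / len(ids):.2f}")
```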
Eval metrics: Perplexity, Bits-per-byte
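Perplexity is measured per token, so it is not directly comparable across tokenizers with different vocabularies; bits-per-byte normalizes the same negative log-likelihood by the UTF-8 byte count of the evaluated text instead. A minimal sketch of the standard conversion, assuming the usual definitions (the study's exact normalization is not stated here):

```python
import math

def perplexity(total_nll_nats: float, n_tokens: int) -> float:
    """Per-token perplexity from a summed negative log-likelihood (in nats)."""
    return math.exp(total_nll_nats / n_tokens)

def bits_per_byte(total_nll_nats: float, n_bytes: int) -> float:
    """Same NLL expressed in bits, normalized by UTF-8 bytes instead of tokens."""
    return total_nll_nats / (math.log(2) * n_bytes)

# Equivalently: bpb = (n_tokens / n_bytes) * log2(ppl), so the tokenizer's
# bytes-per-token ratio is what links the two metrics.
```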
Checkpoints:
Path: /model_10000
Evals:
Perplexity: 25.6822
Bits-per-byte: 0.4409
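For reference, a minimal sketch of recomputing both numbers for the /model_10000 checkpoint with Hugging Face transformers; the evaluation texts are a placeholder, since the exact eval split is not specified above:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("/model_10000")  # checkpoint path from above
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model.eval()

texts = ["..."]  # placeholder: e.g. documents from a held-out C4 shard

total_nll, total_tokens, total_bytes = 0.0, 0, 0
with torch.no_grad():
    for text in texts:
        ids = tokenizer(text, return_tensors="pt").input_ids
        n_pred = ids.size(1) - 1  # number of next-token predictions
        if n_pred <= 0:
            continue
        # The model's cross-entropy loss is averaged over predicted tokens;
        # rescale it back to a sum so documents of different lengths aggregate.
        loss = model(input_ids=ids, labels=ids).loss
        total_nll += loss.item() * n_pred
        total_tokens += n_pred
        total_bytes += len(text.encode("utf-8"))

ppl = math.exp(total_nll / total_tokens)
bpb = total_nll / (math.log(2) * total_bytes)
print(f"perplexity: {ppl:.4f}  bits-per-byte: {bpb:.4f}")
```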