Correlating tokenizer properties of pre-trained LLMs with their downstream performance.
shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_bpe_unscaled_8192
0.3B
shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_unigram_unscaled_8192
0.3B
shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_unigram_unscaled_65536
0.4B
shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_bpe_unscaled_65536
0.4B
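The repository names above appear to encode the tokenizer configuration of each model. As a rough sketch, the snippet below pulls those fields out of a name so the models can be grouped when comparing tokenizer properties. It assumes (this is not stated in the listing) that the last three underscore-separated fields are the tokenizer algorithm, a scaling flag, and the vocabulary size; `parse_model_id` and `NAME_PATTERN` are hypothetical names introduced only for illustration.

```python
import re

# Model IDs copied from the listing above.
MODEL_IDS = [
    "shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_bpe_unscaled_8192",
    "shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_unigram_unscaled_8192",
    "shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_unigram_unscaled_65536",
    "shikhar-srivastava/mono_350m_pre_lr1e-3_fineweb_eng_bpe_unscaled_65536",
]

# Assumption: each name ends in <algorithm>_<scaling>_<vocab size>.
# This reading of the naming scheme is a guess, not documented by the source.
NAME_PATTERN = re.compile(
    r"_(?P<algorithm>bpe|unigram)_(?P<scaling>[a-z]+)_(?P<vocab_size>\d+)$"
)


def parse_model_id(model_id):
    """Return the assumed tokenizer fields encoded in a model ID."""
    match = NAME_PATTERN.search(model_id)
    if match is None:
        raise ValueError(f"unrecognized model ID: {model_id}")
    return {
        "algorithm": match.group("algorithm"),
        "scaling": match.group("scaling"),
        "vocab_size": int(match.group("vocab_size")),
    }


if __name__ == "__main__":
    for model_id in MODEL_IDS:
        print(model_id, parse_model_id(model_id))
```

Under that assumption, the four models form a 2 x 2 grid: two tokenizer algorithms (bpe, unigram) crossed with two vocabulary sizes (8192, 65536).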