AINovice2005 posted an update Mar 17
I recently created my first Hugging Face storage bucket to hold the experiment data from my performance analysis of 15 tokenizers across 20 languages.

The setup is simple for a new product, and it can scale depending on the use case 🤗.

Bucket: https://huggingface.co/buckets/AINovice2005/tokenizer-benchmark

GitHub gist: https://gist.github.com/ParagEkbote/b3877f667f84cbb9a27bdaca94ba662a

Article: https://medium.com/@paragekbote23/one-sentence-fifteen-tokenizers-a-tokenizer-benchmarking-pipeline-with-hf-storage-buckets-2e59790276fd