Upload the tokenizer and corresponding files
#1
by DrewG - opened
This PR uploads the tokenizer (vocab size == 50k) to the repo, along with a JSON specifying the 3 special tokens we use and the tokenizer configuration we used to train it.
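For context, a special-tokens JSON for a tokenizer upload like this typically looks something like the sketch below. The three token strings shown are illustrative assumptions (common placeholder choices), not necessarily the ones in this PR:

```python
import json

# Hypothetical special-tokens map; the actual token strings in the
# uploaded JSON are not shown in this PR description.
special_tokens = {
    "bos_token": "<s>",
    "eos_token": "</s>",
    "unk_token": "<unk>",
}

# Serialize in the style of a typical special_tokens_map.json file.
serialized = json.dumps(special_tokens, indent=2)

# Round-trip to confirm the file parses back to the same mapping.
restored = json.loads(serialized)
assert restored == special_tokens
print(sorted(restored))  # the 3 special-token keys, alphabetically
```

The tokenizer configuration itself (e.g. a `tokenizer_config.json`) would sit alongside this file in the repo.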
DrewG changed pull request status to merged