CoreX Tokenizer Information
==========================
Vocabulary Size: 32000
Model Type: unigram
Special Tokens:
  PAD: 0 -> '<pad>'
  UNK: 1 -> '<unk>'
  BOS: 2 -> '<s>'
  EOS: 3 -> '</s>'
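
Example: reading this file programmatically
-------------------------------------------
The fields above are regular enough to be read by a short script. The following is a minimal sketch, not part of CoreX: the helper name parse_tok_info and the regular expression are assumptions made for illustration, and the text is embedded as a string so the example runs without the file on disk. It parses the fields and checks that the special-token IDs are unique and fall inside the vocabulary.

```python
import re

# Text copied from the tokenizer info above; embedding it keeps the sketch self-contained.
INFO_TEXT = """\
CoreX Tokenizer Information
==========================
Vocabulary Size: 32000
Model Type: unigram
Special Tokens:
PAD: 0 -> '<pad>'
UNK: 1 -> '<unk>'
BOS: 2 -> '<s>'
EOS: 3 -> '</s>'
"""

# Matches lines such as "PAD: 0 -> '<pad>'" (leading whitespace is stripped first).
SPECIAL_RE = re.compile(r"^(\w+):\s*(\d+)\s*->\s*'(.*)'$")


def parse_tok_info(text):
    """Parse the info text into a dict (hypothetical helper, not part of CoreX)."""
    info = {"special_tokens": {}}
    for raw in text.splitlines():
        line = raw.strip()
        match = SPECIAL_RE.match(line)
        if match:
            name, token_id, piece = match.groups()
            info["special_tokens"][name] = (int(token_id), piece)
        elif line.startswith("Vocabulary Size:"):
            info["vocab_size"] = int(line.split(":", 1)[1])
        elif line.startswith("Model Type:"):
            info["model_type"] = line.split(":", 1)[1].strip()
    return info


info = parse_tok_info(INFO_TEXT)
ids = [token_id for token_id, _ in info["special_tokens"].values()]
# Special-token IDs should be unique and fall inside the vocabulary.
assert len(set(ids)) == len(ids)
assert all(0 <= i < info["vocab_size"] for i in ids)
```

To read the real file instead, pass its contents to parse_tok_info in place of INFO_TEXT.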