Add tokenizer.json for Kimi K2.5
#108
by hoangquan456 - opened
Add tokenizer.json (HuggingFace Tokenizers format)
The repo currently ships only the tiktoken-format tokenizer. This PR adds a tokenizer.json in the HuggingFace Tokenizers format, converted from the original tiktoken vocabulary.
Details:
- BPE tokenizer with 163,584 base vocab + 256 added tokens (163,840 total)
- 296,461 merges
- All special tokens preserved: [BOS], [EOS], <|im_end|>, <|im_user|>, <|im_assistant|>, <think>/</think>, tool call tokens, media tokens, etc.
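With this file, the tokenizer can be loaded directly by tools that read the HuggingFace Tokenizers format, without going through the tiktoken vocabulary.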
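For reviewers who want to sanity-check the numbers above, here is a minimal sketch using only the Python standard library. It assumes the file follows the usual tokenizer.json layout (`model.vocab`, `model.merges`, and a top-level `added_tokens` list of objects with a `content` field) and that it sits at the repo root; the helper names `summarize_tokenizer` and `check_special_tokens` are hypothetical and not part of this PR.

```python
import json
from pathlib import Path


def summarize_tokenizer(data: dict) -> dict:
    """Count vocab entries, merges, and added tokens in a parsed tokenizer.json."""
    model = data.get("model", {})
    vocab = model.get("vocab", {})
    merges = model.get("merges", [])
    added = [t["content"] for t in data.get("added_tokens", [])]
    # Added tokens may or may not also appear in model.vocab,
    # so the total is the size of the union, not a plain sum.
    total = len(set(vocab) | set(added))
    return {
        "model_type": model.get("type"),
        "model_vocab": len(vocab),
        "merges": len(merges),
        "added_tokens": len(added),
        "total": total,
    }


def check_special_tokens(data: dict, expected: list[str]) -> list[str]:
    """Return the expected special tokens that are missing from added_tokens."""
    present = {t["content"] for t in data.get("added_tokens", [])}
    return [tok for tok in expected if tok not in present]


if __name__ == "__main__":
    path = Path("tokenizer.json")  # assumed location: repo root
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        print(summarize_tokenizer(data))
        # Only a subset of the special tokens listed in this PR is checked here.
        print(check_special_tokens(data, ["[BOS]", "[EOS]", "<|im_end|>"]))
```

If the counts match the details listed above, the file should report a BPE model with 296,461 merges and 163,840 tokens in total. For actual encoding, the `tokenizers` package can load the same file with `Tokenizer.from_file("tokenizer.json")`.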