Add tokenizer.json for Kimi K2.5

#108

Add tokenizer.json (HuggingFace Tokenizers format)

The repo currently only ships the tiktoken-format tokenizer. This PR adds a tokenizer.json in the HuggingFace Tokenizers format, converted from the original tiktoken vocabulary.
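The conversion described above can be sketched as follows. This is a minimal illustration, not the converter used for this PR. It assumes the tiktoken vocabulary is available as a `dict[bytes, int]` of mergeable ranks, which is the shape returned by tiktoken's `load_tiktoken_bpe`. Raw bytes are mapped to strings with the GPT-2 byte-to-unicode table used by byte-level BPE. One merge is recovered per multi-byte token by replaying BPE with only lower-ranked merges allowed. The function names `bytes_to_unicode`, `bpe_split` and `tiktoken_to_hf` are hypothetical.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style mapping from each byte value to a printable unicode character."""
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            # Non-printable bytes (e.g. space, control chars) are shifted above 255.
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def bpe_split(ranks: dict[bytes, int], token: bytes, max_rank: int) -> list[bytes]:
    """Run BPE on `token`, using only merges whose rank is strictly below max_rank."""
    parts = [bytes([b]) for b in token]
    while True:
        best_idx, best_rank = None, None
        for i in range(len(parts) - 1):
            r = ranks.get(parts[i] + parts[i + 1])
            if r is not None and r < max_rank and (best_rank is None or r < best_rank):
                best_idx, best_rank = i, r
        if best_idx is None:
            break
        merged = parts[best_idx] + parts[best_idx + 1]
        parts = parts[:best_idx] + [merged] + parts[best_idx + 2:]
    return parts


def tiktoken_to_hf(ranks: dict[bytes, int]) -> tuple[dict[str, int], list[tuple[str, str]]]:
    """Build a byte-level BPE vocab and an ordered merges list from tiktoken ranks."""
    byte_encoder = bytes_to_unicode()

    def to_str(b: bytes) -> str:
        return "".join(byte_encoder[x] for x in b)

    vocab = {to_str(tok): rank for tok, rank in ranks.items()}
    merges = []
    # Merge priority follows the rank of the token the merge produces.
    for tok, rank in sorted(ranks.items(), key=lambda kv: kv[1]):
        if len(tok) == 1:
            continue
        parts = bpe_split(ranks, tok, max_rank=rank)
        if len(parts) != 2:
            raise ValueError(f"{tok!r} does not split into exactly two lower-rank pieces")
        merges.append((to_str(parts[0]), to_str(parts[1])))
    return vocab, merges
```

Special tokens are not part of tiktoken's mergeable ranks, so they would be appended separately as added tokens. Because this sketch emits exactly one merge per multi-byte token, it will not reproduce the 296,461 merges listed below. Converters that emit every valid (left, right) split of each token produce more merges than vocabulary entries. As far as we can tell, the tiktoken converter in `transformers` works this way.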

Details:

  • BPE tokenizer with a 163,584-token base vocabulary plus 256 added tokens (163,840 total)
  • 296,461 merges
  • All special tokens preserved: [BOS], [EOS], <|im_end|>, <|im_user|>, <|im_assistant|>, <think>/</think>, tool call tokens, media tokens, etc.
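The figures in the list above can be checked against the file itself. The sketch below assumes the standard tokenizer.json layout: `model.vocab`, `model.merges`, and an `added_tokens` list of objects with `id` and `content`. It checks for the special-token strings exactly as the PR spells them. Tool call and media tokens are left out because their names are not given. `load`, `summarize` and `missing_special` are hypothetical helper names.

```python
import json

# Special-token strings as listed in this PR's description.
REQUIRED_SPECIAL = ("[BOS]", "[EOS]", "<|im_end|>", "<|im_user|>", "<|im_assistant|>", "<think>", "</think>")


def load(path: str) -> dict:
    """Parse a tokenizer.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(data: dict) -> dict:
    """Count base vocab entries, added tokens, merges and distinct token ids."""
    model = data["model"]
    added = data.get("added_tokens", [])
    base_ids = set(model["vocab"].values())
    added_ids = {t["id"] for t in added}
    return {
        "base_vocab": len(model["vocab"]),
        "added_tokens": len(added),
        "total_ids": len(base_ids | added_ids),
        "merges": len(model["merges"]),
        # Non-zero means some added tokens reuse ids from the base vocabulary.
        "overlapping_ids": len(base_ids & added_ids),
    }


def missing_special(data: dict, required: tuple = REQUIRED_SPECIAL) -> list[str]:
    """Return the required special tokens that are absent from added_tokens."""
    present = {t["content"] for t in data.get("added_tokens", [])}
    return [tok for tok in required if tok not in present]
```

The PR description gives 163,584 base entries, 256 added tokens and 296,461 merges. If those figures are accurate, `summarize(load("tokenizer.json"))` should report them. `total_ids` comes to 163,840 only if the added tokens use ids outside the base vocabulary, which is what `overlapping_ids` checks.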

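This allows the tokenizer to be loaded directly from tokenizer.json by any library that reads the HuggingFace Tokenizers format, for example with Tokenizer.from_file in the tokenizers library.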
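In practice, the file is loaded with the `tokenizers` library (`Tokenizer.from_file("tokenizer.json")`). The standard-library sketch below only illustrates how the `model.merges` list drives encoding. Each merge's priority is its position in the list, and the lowest-priority pair present in the word is merged first. The sketch assumes the input word has already been pre-tokenized and byte-level mapped, and it skips normalization and added-token matching. The names `merge_ranks` and `encode_word` are hypothetical.

```python
def merge_ranks(merges: list) -> dict[tuple[str, str], int]:
    """Map each merge pair to its priority, i.e. its index in the merges list."""
    ranks = {}
    for i, m in enumerate(merges):
        # Depending on the tokenizers version that wrote the file, merges are
        # stored either as "left right" strings or as [left, right] pairs.
        left, right = m.split(" ", 1) if isinstance(m, str) else m
        ranks[(left, right)] = i
    return ranks


def encode_word(word: str, vocab: dict[str, int], ranks: dict[tuple[str, str], int]) -> list[int]:
    """Apply merges to one pre-tokenized, byte-level-mapped word and return token ids."""
    parts = list(word)
    while len(parts) > 1:
        candidates = [
            (ranks[(parts[i], parts[i + 1])], i)
            for i in range(len(parts) - 1)
            if (parts[i], parts[i + 1]) in ranks
        ]
        if not candidates:
            break
        _, i = min(candidates)  # lowest rank first, leftmost position on ties
        parts[i : i + 2] = [parts[i] + parts[i + 1]]
    return [vocab[p] for p in parts]
```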
