Add tokenizer.json for Kimi K2.6

#18

Add tokenizer.json (HuggingFace Tokenizers format)

The repo currently ships only the tiktoken-format tokenizer. This PR adds a tokenizer.json in the HuggingFace Tokenizers format, converted from the original tiktoken vocabulary.
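The PR does not say how the conversion was done. A common way to derive a BPE merge list from tiktoken's byte-level ranks is to record every split of a multi-byte token whose two halves are also in the vocabulary, then order all merges by the rank of the token they produce. The sketch below assumes a plain `dict[bytes, int]` of mergeable ranks (as tiktoken exposes internally); the function name `ranks_to_merges` is hypothetical, and the GPT-2-style byte-to-unicode mapping that tokenizer.json uses for its string vocabulary is deliberately left out.

```python
def ranks_to_merges(mergeable_ranks: dict[bytes, int]) -> list[tuple[bytes, bytes]]:
    """Derive an ordered BPE merge list from tiktoken-style byte ranks (sketch)."""
    merges: list[tuple[bytes, bytes, int]] = []
    for token, rank in mergeable_ranks.items():
        if len(token) == 1:
            continue  # single bytes are base symbols, not produced by any merge
        # Every split whose two halves are themselves vocabulary entries
        # is recorded as a candidate merge producing `token`.
        local = [
            (token[:i], token[i:], rank)
            for i in range(1, len(token))
            if token[:i] in mergeable_ranks and token[i:] in mergeable_ranks
        ]
        # Within one token, prefer splits whose pieces have lower ranks.
        local.sort(key=lambda m: (mergeable_ranks[m[0]], mergeable_ranks[m[1]]))
        merges.extend(local)
    # Global order follows the rank of the resulting token (sort is stable).
    merges.sort(key=lambda m: m[2])
    return [(left, right) for left, right, _ in merges]


toy_ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"bc": 4, b"abc": 5}
print(ranks_to_merges(toy_ranks))
# [(b'a', b'b'), (b'b', b'c'), (b'a', b'bc'), (b'ab', b'c')]
```

Because one token (here `abc`) can contribute more than one merge, a conversion along these lines may yield more merges than vocabulary entries; that would be one plausible reason the merge count exceeds the base vocabulary size.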

Details:

  • BPE tokenizer with a base vocabulary of 163,584 tokens plus 256 added tokens (163,840 total)
  • 296,461 merges
  • All special tokens preserved: [BOS], [EOS], <|im_end|>, <|im_user|>, <|im_assistant|>, <think>/</think>, tool call tokens, media tokens, etc.
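Reviewers who want to check these figures against the file could parse tokenizer.json with the standard library. The sketch assumes the usual tokenizer.json layout (a top-level `added_tokens` list of objects with `id` and `content`, plus `model.vocab` and `model.merges`); the helper names and the short list of expected tokens are illustrative, not part of the PR.

```python
import json
import os


def summarize_tokenizer_json(data: dict) -> dict[str, int]:
    """Count base vocab, added tokens, distinct ids and merges in a parsed tokenizer.json."""
    model = data["model"]
    added = data.get("added_tokens", [])
    vocab_ids = set(model["vocab"].values())
    added_ids = {tok["id"] for tok in added}
    return {
        "base_vocab": len(model["vocab"]),
        "added_tokens": len(added),
        # Union of ids, so an added token that also sits in the base vocab is counted once.
        "total_ids": len(vocab_ids | added_ids),
        "merges": len(model["merges"]),
    }


def missing_added_tokens(data: dict, expected: list[str]) -> list[str]:
    """Return the expected token strings that are not registered as added tokens."""
    present = {tok["content"] for tok in data.get("added_tokens", [])}
    return [tok for tok in expected if tok not in present]


if __name__ == "__main__":
    path = "tokenizer.json"  # assumed local copy of the file added by this PR
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        print(summarize_tokenizer_json(data))
        # Hypothetical spot check of a few special tokens named in the list above.
        print(missing_added_tokens(data, ["[BOS]", "[EOS]", "<|im_end|>", "<think>", "</think>"]))
```

If the file matches the description, the summary should report 163,584 base entries, 256 added tokens, 163,840 distinct ids and 296,461 merges, and the spot check should return an empty list.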

This enables direct use with Transformers' AutoTokenizer / PreTrainedTokenizerFast without requiring the tiktoken dependency.

