The token_type of <|im_start|> and <|im_end|> should be set to CONTROL

#10

by CodeHz - opened Jan 15

When running llama.cpp, I saw the model emit <|im_start|> as a regular token several times. Fortunately, it is easy to fix: I just needed to edit the token_type field (tokenizer.ggml.token_type) in the GGUF file so that both tokens are marked as CONTROL:
https://ciscai-gguf-editor.hf.space/download/TheBloke/CausalLM-14B-GGUF/causallm_14b.Q5_1.gguf?add=%5B%22tokenizer.ggml.token_type%22,5,%7B%22151644%22:3,%22151645%22:3%7D%5D
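For anyone wondering what that link actually changes, here is a rough sketch that decodes its `add` query parameter and applies the same overrides to an in-memory list. It only uses the Python standard library and does not touch a real GGUF file. A few things are my assumptions rather than anything stated above: that the `5` is the GGUF value-type id for INT32, that `3` is the value llama.cpp uses for CONTROL tokens, that IDs 151644 and 151645 are <|im_start|> and <|im_end|> in this model's vocabulary, and that the editor treats the object as per-index overrides. The helper names `decode_add_param` and `apply_overrides` are made up for illustration.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Assumption: llama.cpp uses 3 as the token type value for CONTROL tokens.
TOKEN_TYPE_CONTROL = 3

URL = (
    "https://ciscai-gguf-editor.hf.space/download/TheBloke/CausalLM-14B-GGUF/"
    "causallm_14b.Q5_1.gguf?add=%5B%22tokenizer.ggml.token_type%22,5,"
    "%7B%22151644%22:3,%22151645%22:3%7D%5D"
)


def decode_add_param(url):
    """Return (metadata key, value type id, {token index: new type})."""
    query = parse_qs(urlsplit(url).query)  # parse_qs also percent-decodes
    key, value_type, overrides = json.loads(query["add"][0])
    # JSON object keys are strings; convert them to integer token IDs.
    return key, value_type, {int(i): v for i, v in overrides.items()}


def apply_overrides(token_types, overrides):
    """Return a copy of token_types with the given indices replaced."""
    patched = list(token_types)
    for index, value in overrides.items():
        patched[index] = value
    return patched


key, value_type, overrides = decode_add_param(URL)
print(key, value_type, overrides)
```

Calling `apply_overrides` on a plain list of token types shows the effect: only the two listed indices change and every other entry keeps its original value.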
