The token_type of <|im_start|> and <|im_end|> should be set to CONTROL
#10
by CodeHz - opened
While running llama.cpp, I saw the model output <|im_start|> as a regular token several times. Fortunately, it is easy to fix: I just needed to edit the token_type array (tokenizer.ggml.token_type) in the GGUF file so that tokens 151644 and 151645 have type 3 (CONTROL).
https://ciscai-gguf-editor.hf.space/download/TheBloke/CausalLM-14B-GGUF/causallm_14b.Q5_1.gguf?add=%5B%22tokenizer.ggml.token_type%22,5,%7B%22151644%22:3,%22151645%22:3%7D%5D
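For anyone who wants to see what that link actually asks the editor to do, here is a minimal sketch that decodes and rebuilds its `add` query parameter using only the Python standard library. The helper names (`build_add_param`, `decode_add_param`) are made up for illustration, and the middle value `5` is copied verbatim from the link; I am assuming it is a value-type code the editor expects, since the post does not say.

```python
import json
from urllib.parse import quote, unquote

# 3 is the token type the link assigns; the thread title calls it CONTROL.
TOKEN_TYPE_CONTROL = 3


def build_add_param(token_ids, token_type=TOKEN_TYPE_CONTROL, value_type=5):
    """Build the URL-encoded `add` parameter (hypothetical helper).

    value_type=5 is copied from the original link; its exact meaning
    is an assumption here, not something confirmed in the post.
    """
    overrides = {str(token_id): token_type for token_id in token_ids}
    payload = ["tokenizer.ggml.token_type", value_type, overrides]
    # Compact JSON, then percent-encode everything except ',' and ':'
    # so the result has the same shape as the link above.
    compact = json.dumps(payload, separators=(",", ":"))
    return quote(compact, safe=",:")


def decode_add_param(param):
    """Turn the URL-encoded `add` parameter back into a Python list."""
    return json.loads(unquote(param))


if __name__ == "__main__":
    # 151644 and 151645 are the token IDs used in the link.
    param = build_add_param([151644, 151645])
    print(param)
    print(decode_add_param(param))
```

The decoded payload is a three-item list: the metadata key, the value `5`, and a map from token ID to the new token type, so extending the fix to another special token should only require adding its ID to the list passed in.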