Outdated GGUF

#17
by iyanello - opened

Considering there were llama.cpp changes to fix EOS handling and support Qwen3 Embedding, this model should be re-converted and re-uploaded.
I ran tests and confirmed that it works, but I don't want to upload one more copy of the model. Let's save some space!

before fix: embed("test<|endoftext|>")
after fix:  embed("test")
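On builds that don't have the fix yet, the workaround is to append the EOS token yourself before calling the embedding API. A minimal sketch; the `embed` callable here is a stand-in for whatever client you use (e.g. llama-cpp-python's `Llama.embed`), not part of any real library:

```python
EOS = "<|endoftext|>"

def embed_with_eos(embed, text, eos=EOS):
    """Workaround for GGUFs converted before the EOS fix:
    append the EOS token manually so the old conversion
    produces the same embedding as the fixed one."""
    if not text.endswith(eos):
        text = text + eos
    return embed(text)

# stand-in embedder just to show the call shape
seen = []
dummy_embed = lambda t: seen.append(t) or [0.0]
embed_with_eos(dummy_embed, "test")
print(seen[0])  # → test<|endoftext|>
```

The `endswith` guard avoids doubling the token when the caller already appended it.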

Yeah, it would be great if this could be fixed. Without appending <|endoftext|> to the end of every prompt, the quality of matching here in llama.cpp is terrible.
When using ollama with the qwen3-embedding-0.6b from their model registry, the quality is pretty good. I compared some output vectors between ollama and llama.cpp: when I append <|endoftext|> to the end of the documents in llama.cpp, they match the output from ollama; otherwise they don't match and the quality is pretty poor. I tried to read through the ollama source code to see whether they were passing different parameters to this model. I don't think they are; I think the model in their registry just has this fix applied.
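The vector comparison described above can be made concrete with cosine similarity. A self-contained sketch with NumPy; the vectors below are placeholders standing in for real model output from the two runtimes:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# placeholder vectors standing in for embed("doc") from each runtime
v_ollama   = [0.12, -0.48, 0.31]
v_llamacpp = [0.12, -0.48, 0.31]  # with "<|endoftext|>" appended, these should match

print(round(cosine_sim(v_ollama, v_llamacpp), 6))  # → 1.0
```

A similarity near 1.0 means the two builds agree; a noticeably lower value is the mismatch described above.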


Can you tell me how? I tried quantizing it myself, but the results are identical to the old GGUF in this repo.
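If the fix lives in the conversion step, re-quantizing an existing GGUF won't pick it up; the model likely needs to be re-converted from the original Hugging Face weights with an up-to-date llama.cpp checkout. A hedged sketch of that workflow; the directory layout, output file names, and quant type are assumptions, not instructions from this repo:

```shell
# Assumes a llama.cpp checkout recent enough to include the EOS /
# Qwen3-Embedding fixes, and the original HF weights downloaded locally.
git -C llama.cpp pull                      # update past the fix
python llama.cpp/convert_hf_to_gguf.py ./Qwen3-Embedding-0.6B \
    --outfile qwen3-embedding-0.6b-f16.gguf --outtype f16
# optional: quantize the fresh f16 conversion
llama.cpp/build/bin/llama-quantize \
    qwen3-embedding-0.6b-f16.gguf qwen3-embedding-0.6b-Q8_0.gguf Q8_0
```

The key point is the first step: running `llama-quantize` on the old GGUF reuses its baked-in conversion metadata, which would explain getting identical results.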
