Outdated GGUF
Considering there were llama.cpp changes to fix the EOS handling and to support Qwen3 embedding, this model should be re-converted and re-uploaded.
I ran tests and confirmed that it works, but I don't want to upload yet another model. Let's save some space!
before fix: embed("test<|endoftext|>")
after fix:  embed("test")
Yeah, it would be great if this could be fixed. Without appending <|endoftext|> to the end of every prompt, the matching quality here in llama.cpp is terrible.
When using ollama with the qwen3-embedding-0.6b from their model registry, the quality is pretty good. I compared some output vectors between ollama and llama.cpp: when I append <|endoftext|> to the end of the documents in llama.cpp, they match the output from ollama; otherwise they don't match and have pretty poor quality. I tried to read through the ollama source code to see if they were passing different parameters to this model. I don't think they are; I think the model in their registry just has this fix applied.
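Until a re-converted GGUF lands, the comparison above suggests a simple client-side workaround: append the EOS token to each document yourself before embedding. A minimal sketch (the helper names `with_eos` and `cosine` are my own, not part of any llama.cpp API; `cosine` is just there to compare vectors the way I did when checking against ollama):

```python
# Qwen3's end-of-text token, which the fixed conversion appends automatically.
EOS = "<|endoftext|>"

def with_eos(text: str) -> str:
    """Append the EOS token unless the text already ends with it."""
    return text if text.endswith(EOS) else text + EOS

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Usage: call your embedding function as embed(with_eos(doc)) instead of
# embed(doc); the resulting vectors should then match the fixed model's output.
print(with_eos("test"))  # test<|endoftext|>
```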
Can you tell me how? I tried quantizing it myself, but the results are identical to the old GGUF in this repo.
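Re-quantizing the old GGUF won't help, since the EOS metadata is baked in at conversion time; the fix only takes effect if you re-convert from the original Hugging Face weights with an up-to-date llama.cpp checkout. A rough sketch of that workflow (paths and quant type are placeholders; exact script and binary names may differ across llama.cpp versions):

```shell
# Get a llama.cpp checkout that includes the EOS / Qwen3 embedding fixes
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp

# Re-convert from the original HF model, not from the old GGUF
python convert_hf_to_gguf.py /path/to/Qwen3-Embedding-0.6B \
    --outfile qwen3-embedding-0.6b-f16.gguf

# Then quantize the freshly converted file (Q8_0 as an example)
./llama-quantize qwen3-embedding-0.6b-f16.gguf \
    qwen3-embedding-0.6b-q8_0.gguf Q8_0
```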