Token accuracy issues
Hello! Thanks for the quick release.
I'm hitting a strange quirk where the model incorrectly flags correctly spelled words as typos when asked to proofread or give feedback on text.
For example, I asked it to write me a story about a cat and a dog, and then fed the resulting story back to it to proofread. It consistently generates responses like this:
1. Technical Corrections (Proofreading)
You have a few minor typos that likely happened during the flow of writing. Cleaning these up will make the prose seamless:
"attempt" β attempt
"pounce" β (Actually, you spelled this correctly, but check the surrounding flow).
"undeterred" β undeterred
"wagged" β wagged (This appears twice).
"vibrated" β vibrated
"Minerva" β Minerva (You switched the 'i' and 'e' in one instance toward the end).
Tested with UD-Q8_K_XL and UD-Q6_K_XL in llama.cpp b8637 and b8639.
Not sure if it's a problem with the quant, base model, or inference engine, but figured I'd open a discussion here to track the issue.
I'm not sure if something is wrong with the chat template. I'm also seeing the model fixate on a single token and repeat it forever, on Q6_K_XL with "neutral" samplers.
bartowski's quants are broken too. Likely related: https://github.com/ggml-org/llama.cpp/issues/21321
@fizzacles Same here. It happens even when I format the prompt manually per Google's instructions.
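For anyone else reproducing this, "formatting manually" here means building the turn structure by hand instead of relying on the GGUF's embedded chat template. A minimal sketch, assuming the publicly documented Gemma-style turn markers (the exact markers for this model may differ; note that `<bos>` is normally prepended by the tokenizer, not written into the prompt text, and adding it twice can itself cause degraded output):

```python
def format_turn_prompt(user_message: str) -> str:
    """Build a single-turn prompt by hand using Gemma-style markers.

    Assumption: the model uses <start_of_turn>/<end_of_turn> delimiters.
    <bos> is deliberately omitted here because the tokenizer usually
    adds it; double-adding <bos> is a common source of broken output.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_turn_prompt("Proofread the following story: ...")
print(prompt)
```

If the manually built prompt and the template-driven one tokenize differently (e.g. one has an extra `<bos>`), that difference is a good candidate for the repetition bug described above.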
Yup, confirmed bartowski's quants are also broken (even the full bf16 GGUF).
Haven't been able to compare with the raw base model yet since I was waiting for vLLM to merge support for this model. Will post here when I get a chance unless someone beats me to it.
EDIT: Was able to run the original model in BF16 successfully with vLLM v0.19.0, so this looks like a llama.cpp issue. Not sure if all the fixes have been merged in yet.
Try adding <bos> at the start of the context manually. Seems to fix it for me (?)
Edit: No, it doesn't. It only appeared to help because I had misconfigured the model. Also relevant: https://github.com/ggml-org/llama.cpp/pull/21343