Token accuracy issues
Hello! Thanks for the quick release.
I'm hitting a strange quirk where the model incorrectly flags correctly spelled words as typos when asked to proofread or give feedback on text.
For example, I asked it to write me a story about a cat and a dog, and then fed the resulting story back to it to proofread. It consistently generates responses like this:
1. Technical Corrections (Proofreading)
You have a few minor typos that likely happened during the flow of writing. Cleaning these up will make the prose seamless:
"attempt" β attempt
"pounce" β (Actually, you spelled this correctly, but check the surrounding flow).
"undeterred" β undeterred
"wagged" β wagged (This appears twice).
"vibrated" β vibrated
"Minerva" β Minerva (You switched the 'i' and 'e' in one instance toward the end).
Tested with UD-Q8_K_XL and UD-Q6_K_XL in llama.cpp b8637 and b8639.
Not sure if it's a problem with the quant, base model, or inference engine, but figured I'd open a discussion here to track the issue.
I'm not sure if something is wrong with the chat template. I'm also seeing the model fixate on a single token and repeat it forever, on Q6_K_XL with "neutral" samplers.
bartowski's quants are broken too. Likely related: https://github.com/ggml-org/llama.cpp/issues/21321
@fizzacles Same here. It happens even when I format the prompt manually per Google's instructions.
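For anyone else reproducing this, "formatting manually" here means building the turn structure by hand instead of relying on the GGUF's embedded chat template. A minimal sketch, assuming the publicly documented Gemma-style turn markers (the exact markers for this model may differ; note that `<bos>` is normally prepended by the tokenizer, not written into the prompt text, and adding it twice can itself cause degraded output):

```python
def format_turn_prompt(user_message: str) -> str:
    """Build a single-turn prompt by hand using Gemma-style markers.

    Assumption: the model uses <start_of_turn>/<end_of_turn> delimiters.
    <bos> is deliberately omitted here because the tokenizer usually
    adds it; double-adding <bos> is a common source of broken output.
    """
    return (
        "<start_of_turn>user\n"
        f"{user_message}<end_of_turn>\n"
        "<start_of_turn>model\n"
    )

prompt = format_turn_prompt("Proofread the following story: ...")
print(prompt)
```

If the manually built prompt and the template-driven one tokenize differently (e.g. one has an extra `<bos>`), that difference is a good candidate for the repetition bug described above.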
Yup, confirmed bartowski's quants are also broken (even the full bf16 GGUF).
Haven't been able to compare with the raw base model yet since I was waiting for vLLM to merge support for this model. Will post here when I get a chance unless someone beats me to it.
EDIT: Was able to run the original model in BF16 successfully with vLLM v0.19.0, so this looks like a llama.cpp issue. Not sure if all the fixes have been merged in yet.
Try adding <bos> at the start of the context manually. Seems to fix it for me (?)
Edit: No, it doesn't. It only appeared to help because I had misconfigured the model. Also relevant: https://github.com/ggml-org/llama.cpp/pull/21343