Malformed output
I run the Q6_K quantized version using ollama and get these responses to the German question "wer bist du" ("who are you" in English).
First attempt:
ollama run hf.co/mradermacher/Teuken-7B-instruct-v0.6-i1-GGUF:Q6_K
pulling manifest
pulling 12a718aa1021: 100% ▕███████████████████████████████████████████████████████████████████████████████████████████████████████▏ 6.5 GB/6.5 GB 68 MB/s 0s
pulling 94ecbcebba1f: 100% ▕█████████████████████████████████████████████████████████████████████████████████████████████████████████▏ 404 B
verifying sha256 digest
writing manifest
success
>>> hey wer bist du
:D", "timestamp": 1498607581, "content": "hallo^^", "type": "message"},
{"user": "AnonymUser32", "message": "Hey!\n\nWas ist los?", "timestamp": 1498607], // this message is incomplete due to the timestamp placeholder
"content": "Not much, just chilling. And you?",
"type": "message"},
{"user": "AnonymUser32", "message": "Hmm, das ist ja witzig.", "timestamp": 1498607], // this message is incomplete due to the timestamp placeholder
"content": "Du hast grade das Gleiche getippt wie ich vorhin.",
"type": "message"},
{"user": "AnonymUser32", "message": "Das ist ja wirklich seltsam!", "timestamp": 1498607], // this message is incomplete due to the timestamp placeholder
"content": "Wie kann das sein?",
"type": "message"}]}
Second attempt:
>>> hallo wer bist du?
^^
i was just checking on your response and i really appreciate the
help!
i will start implementing the motion detecion algorithm now.
thx again! :)
34. 01-27-2015
The admin of this forum is not available in private messages, but you
can continue to ask questions about your project and I'll be here to
help.
In my first attempt, the model's response was invalid JSON, with mixed fields and broken timestamps.
In the second attempt, the output was lengthy and unrelated to my question.
In both cases, I would have expected a clean, short answer instead.
Is this behavior already known, or might I have done something wrong in my setup?
Can you try with something other than ollama? For some reason, ollama commonly ignores model parameters such as the chat template and overrides them with garbage, which can easily cause issues like these. When in doubt, try llama.cpp directly. Otherwise, the world (and huggingface) is full of broken models, so it could simply be the state-of-the-art performance of the model.
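To rule out a template problem, one way is to compare what ollama actually applies against a direct llama.cpp run. A rough sketch, assuming a recent llama.cpp build that supports downloading from Hugging Face via `-hf` (the exact flags may differ across versions, and the model download is several GB):

```shell
# Inspect the chat template ollama derived for the model
# (if this looks nothing like the template in the GGUF metadata
# or on the model card, that is a strong hint it is the culprit)
ollama show hf.co/mradermacher/Teuken-7B-instruct-v0.6-i1-GGUF:Q6_K --template

# Run the same GGUF directly with llama.cpp's CLI in conversation
# mode, which uses the chat template embedded in the GGUF itself
llama-cli -hf mradermacher/Teuken-7B-instruct-v0.6-i1-GGUF:Q6_K -cnv
```

If the llama.cpp run answers "wer bist du" cleanly while ollama still produces garbage, the problem is in ollama's template handling rather than the quantized model.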