Repetition loops with llama.cpp defaults
#2
by todaymare - opened
Environment
- Model: gemma-4-19b-reap-Q4_K_M.gguf (converted from bf16 via convert_hf_to_gguf.py)
- llama.cpp build: b8660-d00685831
- Hardware: Apple M2 Max
- Command:
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv
- Chat template: the model's provided chat_template.jinja (embedded in the GGUF during conversion), loaded via --jinja
- All sampling settings at llama.cpp defaults (temp 0.8, top_p 0.95, top_k 40, repeat_penalty 1.1)
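For anyone hitting the same thing, one workaround to try is rerunning with stronger anti-repetition sampling. This is only a sketch: the flag values below are guesses to experiment with, not settings verified against this model.

```shell
# Same invocation as above, but with the repeat penalty raised, a longer
# penalty window, and a presence penalty added. Values are untested guesses.
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv \
  --repeat-penalty 1.15 \
  --repeat-last-n 256 \
  --presence-penalty 0.5
```

This does not address the control-token leak on the first prompt, but it may break the "about about about" style loops.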
Issue
The model enters repetition loops, spamming the same token or phrase until the context limit is hit.
Two observed loop types:
- On the first prompt of a session: the model leaks <|channel>0<|channel>thought, then spams 000000...
- General repetition: the model starts repeating a word or phrase indefinitely (e.g. about about about about...)
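The second loop type is easy to catch mechanically. A minimal sketch of an n-gram tail check (the function name and thresholds are mine, not anything from llama.cpp):

```python
def is_looping(text: str, n: int = 1, min_repeats: int = 4) -> bool:
    """Return True if the last min_repeats word n-grams of text are identical.

    Hypothetical helper for illustration only: it flags a run of repeated
    n-grams at the tail of the output, like "about about about about".
    """
    words = text.split()
    if len(words) < n * min_repeats:
        return False
    tail = words[-n * min_repeats:]
    ngrams = [tuple(tail[i:i + n]) for i in range(0, len(tail), n)]
    return all(g == ngrams[0] for g in ngrams)
```

A check like this could be used to abort generation early instead of letting the loop run to the context limit.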
Reproduction
- Start a fresh session
- Send any prompt (e.g. "give me a fun fact about the roman empire")
- Model outputs [Start thinking], then immediately spams <|channel>0<|channel>thought000000...
- Send a follow-up → the model responds correctly
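Until the template issue is fixed, the leaked markers can also be filtered out of the output before display. A rough sketch (the regex and function are hypothetical, not llama.cpp code; the marker strings are the literal ones from this report):

```python
import re

# Matches the leaked markers seen above, e.g. "<|channel>0" and
# "<|channel>thought000000" (the trailing \w* also eats the digit spam).
CHANNEL_MARKER = re.compile(r"<\|channel>\w*")

def strip_leaked_markers(text: str) -> str:
    """Drop leaked channel markers from a chunk of model output."""
    return CHANNEL_MARKER.sub("", text)
```

This is cosmetic only; the model is still wasting tokens on the loop underneath.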
todaymare changed discussion status to closed
0xSero changed discussion status to open


