Repetition loops with llama.cpp defaults

#2
by todaymare - opened

Environment

  • Model: gemma-4-19b-reap-Q4_K_M.gguf (converted from bf16 via convert_hf_to_gguf.py)
  • llama.cpp build: b8660-d00685831
  • Hardware: Apple M2 Max
  • Command:
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv
  • Chat template: the model's provided chat_template.jinja, loaded via --jinja (embedded in GGUF during conversion)
  • All sampling settings at llama.cpp defaults (temp 0.8, top_p 0.95, top_k 40, repeat_penalty 1.1)
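Since the defaults are implicated, one way to narrow it down is to pin the samplers explicitly and tighten the repeat penalty to see whether the loops persist. A sketch using the same invocation; the adjusted values are illustrative, not recommendations:

```shell
# Same invocation with the sampling settings made explicit; repeat-penalty is
# bumped slightly and the lookback window widened. Values are illustrative only.
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv \
  --temp 0.8 --top-p 0.95 --top-k 40 \
  --repeat-penalty 1.15 --repeat-last-n 256
```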

Issue
The model enters repetition loops, spamming the same token or phrase until the context limit is hit.

Two observed loop types:

  1. On the first prompt of a session: leaks <|channel>0<|channel>thought then spams 000000...
  2. General repetition: model starts repeating a word or phrase indefinitely (e.g. about about about about...)
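For triage, a loop like the second case can be spotted mechanically in a captured transcript. A rough sketch (the transcript file name is a stand-in, not something llama-cli produces itself):

```shell
# Write a stand-in transcript, then print the most-repeated token near its tail.
# transcript.log stands in for captured llama-cli output.
printf 'fun fact: about about about about about\n' > transcript.log
tail -n 20 transcript.log | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn | head -1
```

A count far above the rest for a single token is a quick sign the tail has degenerated into a loop.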

Reproduction

  1. Start a fresh session
  2. Send any prompt (e.g. "give me a fun fact about the roman empire")
  3. Model outputs [Start thinking] then immediately spams <|channel>0<|channel>thought000000...
  4. Send a follow-up: the model then responds correctly
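The interactive steps above can be approximated non-interactively, which makes the first-prompt case easier to capture in a log. A sketch (the `-n` cap and log file name are arbitrary choices here):

```shell
# One-shot run of the first-prompt case; tee the output for inspection.
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja \
  -p "give me a fun fact about the roman empire" -n 512 2>&1 | tee repro.log
```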

todaymare changed discussion status to closed
0xSero changed discussion status to open