Repetition loops with llama.cpp defaults

#2
by todaymare - opened

Environment

  • Model: gemma-4-19b-reap-Q4_K_M.gguf (converted from bf16 via convert_hf_to_gguf.py)
  • llama.cpp build: b8660-d00685831
  • Hardware: Apple M2 Max
  • Command:
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv
  • Chat template: the model's provided chat_template.jinja, loaded via --jinja (embedded in GGUF during conversion)
  • All sampling settings at llama.cpp defaults (temp 0.8, top_p 0.95, top_k 40, repeat_penalty 1.1)
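Since the defaults are implicated, one way to narrow it down is to pin the samplers explicitly and tighten the repeat penalty to see whether the loops persist. A sketch using the same invocation; the adjusted values are illustrative, not recommendations:

```shell
# Same invocation with the sampling settings made explicit; repeat-penalty is
# bumped slightly and the lookback window widened. Values are illustrative only.
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja -cnv \
  --temp 0.8 --top-p 0.95 --top-k 40 \
  --repeat-penalty 1.15 --repeat-last-n 256
```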

Issue
The model enters repetition loops, spamming the same token or phrase until the context limit is hit.

Two observed loop types:

  1. On the first prompt of a session: leaks <|channel>0<|channel>thought then spams 000000...
  2. General repetition: model starts repeating a word or phrase indefinitely (e.g. about about about about...)
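For triage, a loop like the second case can be spotted mechanically in a captured transcript. A rough sketch (the transcript file name is a stand-in, not something llama-cli produces itself):

```shell
# Write a stand-in transcript, then print the most-repeated token near its tail.
# transcript.log stands in for captured llama-cli output.
printf 'fun fact: about about about about about\n' > transcript.log
tail -n 20 transcript.log | tr -s '[:space:]' '\n' | sort | uniq -c | sort -rn | head -1
```

A count far above the rest for a single token is a quick sign the tail has degenerated into a loop.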

Reproduction

  1. Start a fresh session
  2. Send any prompt (e.g. "give me a fun fact about the roman empire")
  3. Model outputs [Start thinking] then immediately spams <|channel>0<|channel>thought000000...
  4. Send a follow-up: the model then responds correctly
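The interactive steps above can be approximated non-interactively, which makes the first-prompt case easier to capture in a log. A sketch (the `-n` cap and log file name are arbitrary choices here):

```shell
# One-shot run of the first-prompt case; tee the output for inspection.
llama-cli -m ~/models/gemma-4-19b-reap-Q4_K_M.gguf --jinja \
  -p "give me a fun fact about the roman empire" -n 512 2>&1 | tee repro.log
```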

todaymare changed discussion status to closed
0xSero changed discussion status to open