Issue with IQ3_M and IQ4_XS damaged template/stop (Broken)

#1
by zekromVale - opened

I have had issues using the models as-is: they keep looping and appear to be brain-damaged. I found that a custom Modelfile can fix this issue. I hope this helps some people out.

Fix for "Brain Dead" Looping (Custom Modelfile)

1. Find the Blob Path (or .gguf file)

If downloaded with Ollama, the .gguf is stored as a "blob". We need its real path to reference it in a Modelfile.
Run this command in your terminal to see the actual file path for your model:

ollama show hf.co/mradermacher/gemma-3-27b-it-qat-abliterated-i1-GGUF:IQ4_XS --modelfile | grep "FROM"

It will return something like: FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123...
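If you want just the path without the `FROM` keyword, a small shell sketch (the sample line below stands in for the real `ollama show ... | grep "FROM"` output, and the hash is a placeholder):

```shell
# Stand-in for the output of: ollama show <model> --modelfile | grep "FROM"
FROM_LINE='FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123'

# Strip the leading "FROM " to leave only the blob path
BLOB_PATH="${FROM_LINE#FROM }"
echo "$BLOB_PATH"
```

You can then paste `$BLOB_PATH` straight into the Modelfile's FROM line.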

2. Create the Modelfile

Create a file named GemmaFix.Modelfile and use that sha256 path in the FROM line.

# Replace the path below with the result from the 'ollama show' command above
FROM /usr/share/ollama/.ollama/models/blobs/sha256:YOUR_ACTUAL_HASH_HERE

# Manually force the correct Stop Tokens (Fixes the loop / brain damage)
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
PARAMETER stop "<eos>"

# Optional memory management: set context to 8K to use remaining VRAM headroom
PARAMETER num_ctx 8192

# Gemma 3 requires Temperature 1.0; it breaks/loops at low temps
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER min_p 0.05
PARAMETER repeat_penalty 1.1

# Set the Template manually to ensure the model sees the turn boundaries
TEMPLATE """<bos>{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>"""

SYSTEM "REPLACE WITH SYSTEM STARTING MESSAGE"
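To see the turn boundaries the model actually receives, here is a minimal Python sketch that mimics the Go template above. This is illustrative only; Ollama itself renders the Go template, and the function name is mine:

```python
def render_gemma_prompt(system: str, prompt: str, response: str = "") -> str:
    """Mimic the TEMPLATE block above so you can inspect the exact
    prompt layout (a sketch, not Ollama's actual renderer)."""
    out = "<bos>"
    if system:
        out += f"<start_of_turn>system\n{system}<end_of_turn>\n"
    if prompt:
        out += f"<start_of_turn>user\n{prompt}<end_of_turn>\n"
    out += f"<start_of_turn>model\n{response}<end_of_turn>"
    return out

print(render_gemma_prompt("You are helpful.", "Hi"))
```

If the model keeps generating past `<end_of_turn>`, the stop parameters above are what cut it off.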

3. Build and Run

Run these commands to create the model:

ollama create gemma-3-fixed -f GemmaFix.Modelfile
ollama run gemma-3-fixed
zekromVale changed discussion title from Issue with IQ3_M and IQ4_XS damaged template/stop (Solution) to Issue with IQ3_M and IQ4_XS damaged template/stop (Broken)

Or maybe that was a fluke; it's not working anymore.

Tried adding PARAMETER stop "</end_of_turn>"

That made it more stable on some prompts, but more complex prompts still drove it insane. I think the quantization is broken.
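For context on why adding stop strings can mask (but not cure) this: `PARAMETER stop` tells the server to cut generation the moment one of these strings appears. A minimal sketch of that behavior (illustrative; not Ollama's actual code):

```python
def apply_stops(text: str, stops: list[str]) -> str:
    """Truncate generated text at the earliest stop string, if any."""
    cut = len(text)
    for s in stops:
        i = text.find(s)
        if i != -1:
            cut = min(cut, i)
    return text[:cut]

print(apply_stops("Hello!<end_of_turn><start_of_turn>model...",
                  ["<end_of_turn>", "<eos>"]))
# → Hello!
```

So stop strings hide runaway output after the first turn, but if the quant emits garbage before any stop token, nothing gets trimmed.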

Tried the non-i1 version of the IQ4_XS model and hit the same issue.

Found that bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF works just fine. I highly recommend that one over mradermacher's quants, as it stays sane. And if you want to try a different model, try bartowski/TheDrummer_Cydonia-24B-v4.3-GGUF; it's good for stories and can push n_ctx very high with 16 GB of VRAM (32K context with Q8 KV cache all [possible] in VRAM).

Thanks a lot for your feedback. We could try to requant it using the latest llama.cpp version; maybe there were some compatibility-breaking changes in the 8 months since we quantized it. Please keep in mind that if we do so, this discussion will unfortunately be deleted.
