Issue with IQ3_M and IQ4_XS: damaged template/stop tokens (Broken)
I have had issues with just using the models as is. It keeps looping and appears to be brain damaged. I found out that a Modelfile can fix this issue. I hope this helps some people out.
Fix for "Brain Dead" Looping (Custom Modelfile)
1. Find the Blob Path (or .gguf file)
If downloaded with Ollama, the .gguf is stored as a "blob". We need to find its real path to use it in a Modelfile.
Run this command in your terminal to see the actual file path for your model:
ollama show hf.co/mradermacher/gemma-3-27b-it-qat-abliterated-i1-GGUF:IQ4_XS --modelfile | grep "FROM"
It will return something like: FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123...
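If you want just the path on its own (e.g. to paste straight into the Modelfile), a small awk sketch; the sample line below stands in for the real `ollama show ... --modelfile` output so it runs standalone, and the hash is a placeholder:

```shell
# Extract only the path from the FROM line. In real use, pipe the output of
# `ollama show <model> --modelfile` in instead of the sample string.
sample='FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123'
blob_path=$(printf '%s\n' "$sample" | awk '/^FROM / {print $2; exit}')
echo "$blob_path"
```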
2. Create the Modelfile
Create a file named GemmaFix.Modelfile and use that sha256 path in the FROM line.
# Replace the path below with the result from the 'ollama show' command above
FROM /usr/share/ollama/.ollama/models/blobs/sha256:YOUR_ACTUAL_HASH_HERE
# Manually force the correct Stop Tokens (Fixes the loop / brain damage)
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
PARAMETER stop "<eos>"
# Memory Management: Setting context to 8k to use remaining VRAM headroom
# Optional
PARAMETER num_ctx 8192
# Gemma 3 requires Temperature 1.0; it breaks/loops at low temps
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER min_p 0.05
PARAMETER repeat_penalty 1.1
# Set the Template manually to ensure the model sees the turn boundaries
TEMPLATE """<bos>{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>"""
SYSTEM "REPLACE WITH SYSTEM STARTING MESSAGE"
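If you'd rather experiment with these settings before baking a new model, the same parameters can be passed per request through Ollama's HTTP API `options` field. A sketch (it only builds and prints the JSON; the curl line to actually send it is in the comment, and it assumes a server on the default localhost:11434 plus the `gemma-3-fixed` name from step 3):

```shell
# Request body for /api/generate carrying the same options as the Modelfile.
# Send it with:
#   curl -s http://localhost:11434/api/generate -d "$payload"
payload='{
  "model": "gemma-3-fixed",
  "prompt": "Hello",
  "stream": false,
  "options": {
    "temperature": 1.0,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
    "num_ctx": 8192,
    "stop": ["<start_of_turn>", "<end_of_turn>", "<eos>"]
  }
}'
printf '%s\n' "$payload"
```

Note the template itself can't be overridden per request this way; only the sampling parameters and stop tokens can.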
3. Build and Run
Run these commands to create the model:
ollama create gemma-3-fixed -f GemmaFix.Modelfile
ollama run gemma-3-fixed
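After building, it's worth confirming the stop tokens and template actually took effect before testing prompts (requires the ollama CLI; these are standard `ollama show` flags):

```shell
# Should list the three stop tokens plus the sampling parameters from the Modelfile
ollama show gemma-3-fixed --parameters
# Should print the TEMPLATE block verbatim, <start_of_turn>/<end_of_turn> included
ollama show gemma-3-fixed --template
```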
Or maybe that was a fluke; it's not working anymore.
Tried adding PARAMETER stop "</end_of_turn>"
That made it more stable on some prompts, but more complex prompts still drove it insane. I think the quantization is broken.
Tried the non-i1 version of the IQ4_XS model and had the same issue.
Found that bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF works just fine. I highly recommend that one over mradermacher's models, as it stays sane. And if you want to try a different model, try bartowski/TheDrummer_Cydonia-24B-v4.3-GGUF; it's good for stories and can push n_ctx very high with 16GB of VRAM (32K KV cache with Q8, all [possible] in VRAM).
Thanks a lot for your feedback. We could try to requant it using the latest llama.cpp version; maybe there were some compatibility-breaking changes in the 8 months since we quantized it. Please keep in mind that if we do so, this discussion will unfortunately be deleted.