Issue with IQ3_M and IQ4_XS: damaged template/stop tokens (Broken)
I have had issues with just using the models as is. It keeps looping and appears to be brain damaged. I found out that a Modelfile can fix this issue. I hope this helps some people out.
Fix for "Brain Dead" Looping (Custom Modelfile)
1. Find the Blob Path (or .gguf file)
If downloaded with Ollama, the .gguf is stored as a "blob". We need to find its real path to use it in a Modelfile.
Run this command in your terminal to see the actual file path for your model:
ollama show hf.co/mradermacher/gemma-3-27b-it-qat-abliterated-i1-GGUF:IQ4_XS --modelfile | grep "FROM"
It will return something like: FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123...
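If you want just the path on its own (e.g. to paste straight into the Modelfile), a small awk sketch; the sample line below stands in for the real `ollama show ... --modelfile` output so it runs standalone, and the hash is a placeholder:

```shell
# Extract only the path from the FROM line. In real use, pipe the output of
# `ollama show <model> --modelfile` in instead of the sample string.
sample='FROM /usr/share/ollama/.ollama/models/blobs/sha256:abc123'
blob_path=$(printf '%s\n' "$sample" | awk '/^FROM / {print $2; exit}')
echo "$blob_path"
```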
2. Create the Modelfile
Create a file named GemmaFix.Modelfile and use that sha256 path in the FROM line.
# Replace the path below with the result from the 'ollama show' command above
FROM /usr/share/ollama/.ollama/models/blobs/sha256:YOUR_ACTUAL_HASH_HERE
# Manually force the correct Stop Tokens (Fixes the loop / brain damage)
PARAMETER stop "<start_of_turn>"
PARAMETER stop "<end_of_turn>"
PARAMETER stop "<eos>"
# Memory Management: Setting context to 8k to use remaining VRAM headroom
# Optional
PARAMETER num_ctx 8192
# Gemma 3 requires Temperature 1.0; it breaks/loops at low temps
PARAMETER temperature 1.0
PARAMETER top_p 0.95
PARAMETER min_p 0.05
PARAMETER repeat_penalty 1.1
# Set the Template manually to ensure the model sees the turn boundaries
TEMPLATE """<bos>{{ if .System }}<start_of_turn>system
{{ .System }}<end_of_turn>
{{ end }}{{ if .Prompt }}<start_of_turn>user
{{ .Prompt }}<end_of_turn>
{{ end }}<start_of_turn>model
{{ .Response }}<end_of_turn>"""
SYSTEM "REPLACE WITH SYSTEM STARTING MESSAGE"
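If you'd rather experiment with these settings before baking a new model, the same parameters can be passed per request through Ollama's HTTP API `options` field. A sketch (it only builds and prints the JSON; the curl line to actually send it is in the comment, and it assumes a server on the default localhost:11434 plus the `gemma-3-fixed` name from step 3):

```shell
# Request body for /api/generate carrying the same options as the Modelfile.
# Send it with:
#   curl -s http://localhost:11434/api/generate -d "$payload"
payload='{
  "model": "gemma-3-fixed",
  "prompt": "Hello",
  "stream": false,
  "options": {
    "temperature": 1.0,
    "top_p": 0.95,
    "min_p": 0.05,
    "repeat_penalty": 1.1,
    "num_ctx": 8192,
    "stop": ["<start_of_turn>", "<end_of_turn>", "<eos>"]
  }
}'
printf '%s\n' "$payload"
```

Note the template itself can't be overridden per request this way; only the sampling parameters and stop tokens can.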
3. Build and Run
Run these commands to create the model:
ollama create gemma-3-fixed -f GemmaFix.Modelfile
ollama run gemma-3-fixed
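After building, it's worth confirming the stop tokens and template actually took effect before testing prompts (requires the ollama CLI; these are standard `ollama show` flags):

```shell
# Should list the three stop tokens plus the sampling parameters from the Modelfile
ollama show gemma-3-fixed --parameters
# Should print the TEMPLATE block verbatim, <start_of_turn>/<end_of_turn> included
ollama show gemma-3-fixed --template
```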
Or maybe that was a fluke; it's not working anymore.
Tried adding PARAMETER stop "</end_of_turn>"
That made it more stable on some prompts, but more complex prompts still drove it insane. I think the quantization is broken.
Tried the non-i1 version of the IQ4_XS model and had the same issue.
Found that bartowski/mlabonne_gemma-3-27b-it-abliterated-GGUF works just fine. I highly recommend that one over mradermacher's models, as it stays sane. And if you want to try a different model, try bartowski/TheDrummer_Cydonia-24B-v4.3-GGUF; it's good for stories and can push n_ctx very high with 16GB of VRAM (32K KV cache with Q8, all [possible] in VRAM).
Thanks a lot for your feedback. We could try to requant it using the latest llama.cpp version; maybe there were some compatibility-breaking changes in the 8 months since we quantized it. Please keep in mind that if we do so, this discussion will unfortunately be deleted.