This is great but

#1
by AizenYPB - opened

This is great, but the models fail to load in Ollama. They download OK, but launching them produces: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.

I have a 1 TB HDD that is 80% free, 16 GB of RAM, and 22 GB of GPU memory, so resources are all fine. My other, larger models load OK. I've tried F16, Q4, and the other quantizations below F16.
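For what it's worth, a rough back-of-envelope size check supports that: the bits-per-weight figures below are approximate averages for llama.cpp quant types, and the function is purely illustrative, but even the largest quant of a model this size should fit comfortably in 22 GB of GPU memory.

```python
# Approximate GGUF weight size for a given parameter count and quant type.
# Bits-per-weight values are rough averages (block scales included) and
# ignore metadata and runtime overhead such as the KV cache.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Estimated weight size in GB for n_params parameters at the given quant."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# e.g. a 4B-parameter model at Q8_0:
print(approx_size_gb(4e9, "Q8_0"))  # 4.25
```

If the estimate is far below available VRAM and the model still fails to load, the cause is usually a malformed file or metadata rather than resources.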

I've tried all the models following the Ollama instructions: downloading, creating the Modelfile, and using the Modelfile to import into Ollama. None of them work. I can run native Gemma4 with no problem at all, so I'm guessing the issue is with these models, or perhaps with my method.

My method is as follows:

In the terminal I run: ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF:Q8_0

(It fails as soon as the download completes, with: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.)
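The error message itself points at the server logs, which usually name the real cause (e.g. a tensor or metadata problem rather than memory). Where those logs live depends on the install; the paths below are typical locations, not guaranteed ones:

```shell
# Look for the Ollama server log in common locations (paths are typical,
# not guaranteed; adjust for your install).
log_hint="no log file found; on Linux with systemd try: journalctl -u ollama --no-pager -n 50"
found=""
for f in "$HOME/.ollama/logs/server.log" /var/log/ollama.log; do
  if [ -f "$f" ]; then
    found="$f"
    # The lines just before "model failed to load" usually name the real cause.
    tail -n 50 "$f"
  fi
done
[ -n "$found" ] || echo "$log_hint"
```

If the log shows a tensor shape or metadata error rather than an out-of-memory message, the GGUF file itself is the problem.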

Once the download is complete, I create the Modelfile: nano Modelfile

I paste the following into the Modelfile:

FROM hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF:Q8_0

SYSTEM "You are a helpful assistant with tool-use capabilities. Think through problems step by step using tags."

PARAMETER temperature 0.7
PARAMETER num_ctx 8192

I save the file, then in terminal I run: ollama create gemma4-agentic -f Modelfile

I get the following from terminal in response:

gathering model components
using existing layer sha256:30386dd47b868af9e1e4d5c2d5bc6438d63eab7308401baea7bc3eae74ed54c0
using existing layer sha256:f56e8459650d8354cf701fa5b0ddaea9a7986a271d7f55677152d1355ab5afb6
using existing layer sha256:0602dd81234824ad2c0c3b025127da3fd57c11253ba2aed2ffca41e9d399b6e4
using existing layer sha256:92e2bdcbc010ba1b7b262dbdaaad2534b96c8846dd06e380abcf31359aa76afb
writing manifest
success

I then attempt to run the model, which now appears in ollama list as gemma4-agentic:

ollama run gemma4-agentic

I get the following: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.

I follow the same process for all the models and get the same result. Did I miss something?

Hey, thanks for the report, and glad you like the model!

There was a bug in the initial GGUF conversion where the embedding layer dimensions were incorrect (tensor shape mismatch). This has been fixed and re-uploaded; all quantizations (Q4_K_M, Q5_K_M, Q8_0, F16) should now load correctly in Ollama.

Please re-pull the model to get the corrected files:

ollama rm hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
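After the re-download, a quick sanity check that the local file at least parses as GGUF can be sketched in Python. This follows the public GGUF header layout (magic, version, tensor count, key/value count); the file paths and the demo file are illustrative, not part of Ollama:

```python
import struct

# Minimal GGUF header check, per the public GGUF spec:
# 4-byte magic "GGUF", then uint32 version, uint64 tensor count,
# uint64 metadata key/value count, all little-endian.
def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return version, n_tensors, n_kv

# Demo with a synthetic header (version 3, 2 tensors, 5 kv pairs):
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<IQQ", 3, 2, 5))
print(read_gguf_header("demo.gguf"))  # (3, 2, 5)
```

A truncated or corrupted download would fail the magic check or report implausible counts; a shape mismatch like the one fixed here would only surface deeper in the tensor metadata, but this catches the most common transfer problems.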

Let me know if it works after the re-download!

Thanks so much for all your work. I will re-download and test now.
