This is great but

#1
by AizenYPB - opened

This is great, but the models fail to load in Ollama. They download OK, but launching them produces: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.

I have a 1 TB HDD that is 80% free, 16 GB of RAM, and 22 GB of GPU memory, so resources are all fine. My other, larger models load OK. I've tried F16, Q4, and the other quantizations below F16.
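For what it's worth, a rough back-of-envelope size check supports that: the bits-per-weight figures below are approximate averages for llama.cpp quant types, and the function is purely illustrative, but even the largest quant of a model this size should fit comfortably in 22 GB of GPU memory.

```python
# Approximate GGUF weight size for a given parameter count and quant type.
# Bits-per-weight values are rough averages (block scales included) and
# ignore metadata and runtime overhead such as the KV cache.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.9}

def approx_size_gb(n_params: float, quant: str) -> float:
    """Estimated weight size in GB for n_params parameters at the given quant."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9

# e.g. a 4B-parameter model at Q8_0:
print(approx_size_gb(4e9, "Q8_0"))  # 4.25
```

If the estimate is far below available VRAM and the model still fails to load, the cause is usually a malformed file or metadata rather than resources.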

I've tried all the models following the Ollama instructions: downloading, creating the Modelfile, and using the Modelfile to import into Ollama. None of them work. I can run native Gemma4 with no problem at all, so I'm guessing the issue is with these models, or perhaps with my method.

My method is as follows:

In the terminal I run: ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF:Q8_0

(It fails as soon as the download completes, with: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.)
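The error message itself points at the server logs, which usually name the real cause (e.g. a tensor or metadata problem rather than memory). Where those logs live depends on the install; the paths below are typical locations, not guaranteed ones:

```shell
# Look for the Ollama server log in common locations (paths are typical,
# not guaranteed; adjust for your install).
log_hint="no log file found; on Linux with systemd try: journalctl -u ollama --no-pager -n 50"
found=""
for f in "$HOME/.ollama/logs/server.log" /var/log/ollama.log; do
  if [ -f "$f" ]; then
    found="$f"
    # The lines just before "model failed to load" usually name the real cause.
    tail -n 50 "$f"
  fi
done
[ -n "$found" ] || echo "$log_hint"
```

If the log shows a tensor shape or metadata error rather than an out-of-memory message, the GGUF file itself is the problem.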

Once the download is complete, I create the Modelfile: nano Modelfile

I paste the following into the Modelfile:

FROM hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF:Q8_0

SYSTEM "You are a helpful assistant with tool-use capabilities. Think through problems step by step using tags."

PARAMETER temperature 0.7
PARAMETER num_ctx 8192

I save the file, then in terminal I run: ollama create gemma4-agentic -f Modelfile

I get the following from terminal in response:

gathering model components
using existing layer sha256:30386dd47b868af9e1e4d5c2d5bc6438d63eab7308401baea7bc3eae74ed54c0
using existing layer sha256:f56e8459650d8354cf701fa5b0ddaea9a7986a271d7f55677152d1355ab5afb6
using existing layer sha256:0602dd81234824ad2c0c3b025127da3fd57c11253ba2aed2ffca41e9d399b6e4
using existing layer sha256:92e2bdcbc010ba1b7b262dbdaaad2534b96c8846dd06e380abcf31359aa76afb
writing manifest
success

I then attempt to run the model, which now appears in ollama list as gemma4-agentic:

ollama run gemma4-agentic

I get the following: Error: 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details.

I follow the same process for all the models and get the same result. Did I miss something?

Hey, thanks for the report, and glad you like the model!

There was a bug in the initial GGUF conversion where the embedding layer dimensions were incorrect (tensor shape mismatch). This has been fixed and re-uploaded; all quantizations (Q4_K_M, Q5_K_M, Q8_0, F16) should now load correctly in Ollama.

Please re-pull the model to get the corrected files:

ollama rm hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
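After the re-download, a quick sanity check that the local file at least parses as GGUF can be sketched in Python. This follows the public GGUF header layout (magic, version, tensor count, key/value count); the file paths and the demo file are illustrative, not part of Ollama:

```python
import struct

# Minimal GGUF header check, per the public GGUF spec:
# 4-byte magic "GGUF", then uint32 version, uint64 tensor count,
# uint64 metadata key/value count, all little-endian.
def read_gguf_header(path):
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, = struct.unpack("<Q", f.read(8))
        n_kv, = struct.unpack("<Q", f.read(8))
    return version, n_tensors, n_kv

# Demo with a synthetic header (version 3, 2 tensors, 5 kv pairs):
with open("demo.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<IQQ", 3, 2, 5))
print(read_gguf_header("demo.gguf"))  # (3, 2, 5)
```

A truncated or corrupted download would fail the magic check or report implausible counts; a shape mismatch like the one fixed here would only surface deeper in the tensor metadata, but this catches the most common transfer problems.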

Let me know if it works after the re-download!

Thanks so much for all your work. I will re-download and test now.
