Feature Request: Include Audio Encoder in mmproj GGUFs for E2B / E4B

#1
by aslon1213 - opened

Summary

Gemma 3n E2B and E4B officially support audio input (ASR, speech understanding) as a native modality, per Google DeepMind's model card and the Unsloth docs, which state:

"E2B and E4B also support image and audio."

However, the current mmproj-BF16.gguf / mmproj-F16.gguf / mmproj-F32.gguf files in this repo appear to only export the vision encoder. When loading with llama.cpp, the audio encoder is either missing or not wired up, making audio input non-functional despite the underlying model weights supporting it.
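One way to check this locally is to list the tensor names in the mmproj GGUF and look for audio-encoder weights. The sketch below is a minimal illustration of that check; the `v.` / `a.` prefixes are assumptions based on how llama.cpp's multimodal conversion names vision and audio tensors, and the sample tensor names are hypothetical (in practice you would feed it the names read from the file, e.g. via the `gguf` Python package's `GGUFReader`).

```python
# Hedged sketch: given the tensor names from an mmproj GGUF, report which
# encoder modalities appear to be present. Prefixes "v." (vision) and "a."
# (audio) are assumed from llama.cpp's multimodal naming conventions.

def classify_mmproj(tensor_names):
    """Return which modalities a tensor-name list appears to cover."""
    has_vision = any(n.startswith("v.") or "vision" in n for n in tensor_names)
    has_audio = any(n.startswith("a.") or "audio" in n for n in tensor_names)
    return {"vision": has_vision, "audio": has_audio}

# Hypothetical tensor names for a vision-only export like the one described:
vision_only = ["v.blk.0.attn_q.weight", "v.patch_embd.weight", "mm.0.weight"]
print(classify_mmproj(vision_only))  # -> {'vision': True, 'audio': False}
```

If the audio encoder were included, names under the audio prefix (e.g. conformer blocks) would show up as well and the check would report both modalities.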

This is confirmed by the open llama.cpp issue #21325, where the mmproj loads and the logs report `has audio encoder`, but the audio pipeline still doesn't complete end-to-end.

What I'd like to see

  • mmproj-BF16.gguf updated to include the audio encoder weights
  • (Optionally) a separate mmproj-audio-BF16.gguf if the combined file becomes too large
  • Confirmation of which llama.cpp build version is required for audio to work once the mmproj is ready