Feature Request: Include Audio Encoder in mmproj GGUFs for E2B / E4B
#1
by aslon1213 - opened
Summary
Gemma 3n E2B and E4B officially support audio input (ASR, speech understanding) as a native modality, per Google DeepMind's model card and the Unsloth docs, which state:
"E2B and E4B also support image and audio."
However, the current mmproj-BF16.gguf / mmproj-F16.gguf / mmproj-F32.gguf files in this repo appear to only export the vision encoder. When loading with llama.cpp, the audio encoder is either missing or not wired up, making audio input non-functional despite the underlying model weights supporting it.
This is confirmed by the open llama.cpp issue #21325, where the mmproj loads (the log even reports `has audio encoder`) but the audio pipeline doesn't complete end-to-end.
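One way to check which encoders a given mmproj file actually contains is to inspect its tensor names with the `gguf` Python package. The sketch below assumes llama.cpp's mtmd naming convention (vision tensors prefixed `v.`, audio tensors `a.`); the prefix convention and the file path are assumptions, not something confirmed in this repo:

```python
# Sketch: classify mmproj tensors by modality prefix.
# Assumes llama.cpp's mtmd convention: "v." = vision, "a." = audio.

def classify_encoders(tensor_names):
    """Return the set of modalities present, based on tensor-name prefixes."""
    modalities = set()
    for name in tensor_names:
        if name.startswith("v."):
            modalities.add("vision")
        elif name.startswith("a."):
            modalities.add("audio")
    return modalities

def report(path):
    # Requires `pip install gguf`; only imported when actually called.
    from gguf import GGUFReader
    reader = GGUFReader(path)
    return classify_encoders(t.name for t in reader.tensors)

# Usage (hypothetical local file):
#   report("mmproj-BF16.gguf")
# If the result lacks "audio", the audio encoder was not exported.
```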
What I'd like to see
- `mmproj-BF16.gguf` updated to include the audio encoder weights
- (Optionally) a separate `mmproj-audio-BF16.gguf` if the combined file becomes too large
- Confirmation of which llama.cpp build version is required for audio to work once the mmproj is ready
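Once an audio-capable mmproj ships, an end-to-end sanity check might look like the following. This is a sketch: `llama-mtmd-cli` and its `--audio` flag exist in recent llama.cpp builds, but the file names here are placeholders and the minimum required build is exactly what this request asks to have confirmed:

```shell
# Sketch: feed an audio prompt to Gemma 3n E2B via llama.cpp's multimodal CLI.
# File names are placeholders; requires a build with mtmd audio support.
./llama-mtmd-cli \
  -m gemma-3n-E2B-it-Q4_K_M.gguf \
  --mmproj mmproj-BF16.gguf \
  --audio sample.wav \
  -p "Transcribe the audio."
```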