Feature Request: Include Audio Encoder in mmproj GGUFs for E2B / E4B

#1
by aslon1213 - opened

Summary

Gemma 3n E2B and E4B officially support audio input (ASR, speech understanding) as a native modality, per Google DeepMind's model card and the Unsloth docs, which state:

"E2B and E4B also support image and audio."

However, the current mmproj-BF16.gguf / mmproj-F16.gguf / mmproj-F32.gguf files in this repo appear to only export the vision encoder. When loading with llama.cpp, the audio encoder is either missing or not wired up, making audio input non-functional despite the underlying model weights supporting it.
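One way to check this locally is to list the tensor names in the mmproj GGUF and look for audio-encoder weights. The sketch below is a minimal illustration of that check; the `v.` / `a.` prefixes are assumptions based on how llama.cpp's multimodal conversion names vision and audio tensors, and the sample tensor names are hypothetical (in practice you would feed it the names read from the file, e.g. via the `gguf` Python package's `GGUFReader`).

```python
# Hedged sketch: given the tensor names from an mmproj GGUF, report which
# encoder modalities appear to be present. Prefixes "v." (vision) and "a."
# (audio) are assumed from llama.cpp's multimodal naming conventions.

def classify_mmproj(tensor_names):
    """Return which modalities a tensor-name list appears to cover."""
    has_vision = any(n.startswith("v.") or "vision" in n for n in tensor_names)
    has_audio = any(n.startswith("a.") or "audio" in n for n in tensor_names)
    return {"vision": has_vision, "audio": has_audio}

# Hypothetical tensor names for a vision-only export like the one described:
vision_only = ["v.blk.0.attn_q.weight", "v.patch_embd.weight", "mm.0.weight"]
print(classify_mmproj(vision_only))  # -> {'vision': True, 'audio': False}
```

If the audio encoder were included, names under the audio prefix (e.g. conformer blocks) would show up as well and the check would report both modalities.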

This is confirmed by the open llama.cpp issue #21325, where the mmproj loads and the logs report `has audio encoder`, but the audio pipeline still doesn't complete end-to-end.

What I'd like to see

  • mmproj-BF16.gguf updated to include the audio encoder weights
  • (Optionally) a separate mmproj-audio-BF16.gguf if the combined file becomes too large
  • Confirmation of which llama.cpp build version is required for audio to work once the mmproj is ready