GGUF/llama.cpp support
Hi, I'm really excited to try a MOSS-Audio model because Gemma, Qwen Omni, and most other audio-text-to-text models do not support word-level timestamps and seem limited.
I'm curious if the MOSS team would ever integrate support for MOSS-Audio into llama.cpp. Currently Qwen audio models, Mistral's Voxtral, Gemma, LFM2-Audio, and Ultravox have support in llama.cpp.
Thanks for your interest in MOSS-Audio, we really appreciate it. At the moment, we do not have official plans to support llama.cpp. That said, if someone from the community would like to help adapt MOSS-Audio to llama.cpp, we would be very grateful.
Thanks. I tried out the 8B models and was disappointed by them. They are not very good at following instructions, even though the output was sometimes useful. The thinking one was marginally better, but after many attempts I could not get either to output in a consistent format, even after providing a complete example. They are not good at parsing individual music notes or changes in pitch, which is what I was trying to do with them. I don't know of any models that can do this well, but I was hoping MOSS-Audio-8B-Instruct/MOSS-Audio 8B Thinking would.