GGUF/llama.cpp support
Hi, I'm really excited to try a MOSS-Audio model because Gemma, Qwen Omni, and most other audio-text-to-text models do not support word-level timestamps and seem limited.
I'm curious if the MOSS team would ever integrate support for MOSS-Audio into llama.cpp. Currently Qwen audio models, Mistral's Voxtral, Gemma, LFM2-Audio, and Ultravox have support in llama.cpp.
Thanks for your interest in MOSS-Audio, we really appreciate it. At the moment, we do not have official plans to support llama.cpp. That said, if someone from the community would like to help adapt MOSS-Audio to llama.cpp, we would be very grateful.
Thanks. I tried out the 8B models and was disappointed by them. They are not very good at following instructions, even though the output was sometimes useful. The thinking one was marginally better, but after many attempts I could not get either to output in a consistent format, even after providing a complete example. They are not good at parsing individual music notes or changes in pitch, which is what I was trying to do with them. I don't know of any models that can do this well, but I was hoping MOSS-Audio-8B-Instruct/MOSS-Audio 8B Thinking would.