Image-Text-to-Text
GGUF
minicpm-v
multimodal
conversational