HuggingFaceTB/SmolVLM2-256M-Video-Instruct Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 137k • 100
mlx-community/SmolVLM2-500M-Video-Instruct-mlx-8bit-skip-vision Video-Text-to-Text • Updated Feb 20, 2025 • 120 • 5
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 329k • 1.58k
meta-llama/Llama-4-Maverick-17B-128E-Instruct Image-Text-to-Text • 402B • Updated May 22, 2025 • 32.7k • 478