HuggingFaceTB/SmolVLM2-256M-Video-Instruct Image-Text-to-Text • 0.3B • Updated Apr 8, 2025 • 124k • 101
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 75