SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20, 2025 • 164
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs Paper • 2507.07990 • Published Jul 10, 2025 • 46
view article Article Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub Jun 27, 2025 • 31
V-JEPA 2 Collection A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13, 2025 • 206
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21, 2025 • 255
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated Mar 12 • 479
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement Paper • 2410.13842 • Published Oct 17, 2024 • 6