lmms-lab/RefCOCO
Feeling and building the multimodal intelligence.
A Simple Baseline for Streaming Video Understanding
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence