- meituan-longcat/LongCat-Flash-Omni
  Any-to-Any • 561B • Updated • 34 • 109
- LongCat-Flash-Omni Technical Report
  Paper • 2511.00279 • Published • 26
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 92
- nvidia/omnivinci
  Feature Extraction • Updated • 1.09k • 177
Collections including paper arxiv:2510.15870
- allenai/MolmoAct-Pretraining-Mixture
  Viewer • Updated • 24.2M • 2.53k • 10
- nvidia/Llama-Nemotron-VLM-Dataset-v1
  Viewer • Updated • 2.86M • 1.21k • 159
- zai-org/GLM-4.1V-9B-Thinking
  Image-Text-to-Text • 10B • Updated • 367k • 774
- Chat with Kimi-VL-A3B-Thinking-2506
  🤔 196 • Chat with Kimi-VL: respond to text, images, video, PDFs
- microsoft/bitnet-b1.58-2B-4T
  Text Generation • 0.8B • Updated • 15.5k • 1.43k
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
  Paper • 2504.10449 • Published • 15
- nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct
  Text Generation • 8B • Updated • 96 • 17
- ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
  Paper • 2504.11536 • Published • 63
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context
  Paper • 2510.23763 • Published • 62
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 92
- Qwen3-Omni Technical Report
  Paper • 2509.17765 • Published • 153
- InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
  Paper • 2510.13747 • Published • 31
- StreamingVLM: Real-Time Understanding for Infinite Video Streams
  Paper • 2510.09608 • Published • 53
- ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning
  Paper • 2510.12693 • Published • 28
- Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence
  Paper • 2510.20579 • Published • 56
- OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
  Paper • 2510.15870 • Published • 92
- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 30
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23