- MASS: Motion-Aware Spatial-Temporal Grounding for Physics Reasoning and Comprehension in Vision-Language Models
  Paper • 2511.18373 • Published • 7
- Multi-Agent Deep Research: Training Multi-Agent Systems with M-GRPO
  Paper • 2511.13288 • Published • 19
- Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
  Paper • 2511.19418 • Published • 29
- SAM 3: Segment Anything with Concepts
  Paper • 2511.16719 • Published • 134

Collections
Collections including paper arxiv:2410.17799

- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
  Paper • 2410.17799 • Published • 12
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
  Paper • 2012.15840 • Published • 3
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 13

- Step-Audio-R1 Technical Report
  Paper • 2511.15848 • Published • 58
- OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
  Paper • 2410.17799 • Published • 12
- Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents
  Paper • 2507.04009 • Published • 54