Collections

Discover the best community collections!

Collections including paper arxiv:2603.12255

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Paper • 2603.12255 • Published Mar 12 • 91
Qwen/Qwen3-Omni-30B-A3B-Instruct

Any-to-Any • 35B • Updated Sep 22, 2025 • 383k • 909

about 9 hours ago

AgentConductor: Topology Evolution for Multi-Agent Competition-Level Code Generation

Paper • 2602.17100 • Published Feb 19 • 4
GroupGPT: A Token-efficient and Privacy-preserving Agentic Framework for Multi-User Chat Assistant

Paper • 2603.01059 • Published Mar 1 • 1
Multi-Domain Riemannian Graph Gluing for Building Graph Foundation Models

Paper • 2603.00618 • Published Feb 28
Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 194

MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds

Paper • 2508.14879 • Published Aug 20, 2025 • 69
VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space

Paper • 2508.19247 • Published Aug 26, 2025 • 43
Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels

Paper • 2508.17437 • Published Aug 20, 2025 • 37
Multi-View 3D Point Tracking

Paper • 2508.21060 • Published Aug 28, 2025 • 23

UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models

Paper • 2410.14059 • Published Oct 17, 2024 • 63
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching

Paper • 2503.05179 • Published Mar 7, 2025 • 46
Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 96
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13, 2025 • 53

PACED: Distillation at the Frontier of Student Competence

Paper • 2603.11178 • Published Mar 11 • 4
Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Paper • 2603.12255 • Published Mar 12 • 91
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Paper • 2603.12201 • Published Mar 12 • 53
TIP: Token Importance in On-Policy Distillation

Paper • 2604.14084 • Published 4 days ago • 11

My notification

OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation

Paper • 2601.15369 • Published Jan 21 • 21
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model

Paper • 2601.15892 • Published Jan 22 • 53
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders

Paper • 2601.16208 • Published Jan 22 • 55
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems

Paper • 2601.11004 • Published Jan 16 • 30

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10, 2025 • 35
Test-Time Scaling with Reflective Generative Model

Paper • 2507.01951 • Published Jul 2, 2025 • 108
Does More Inference-Time Compute Really Help Robustness?

Paper • 2507.15974 • Published Jul 21, 2025 • 7
TTCS: Test-Time Curriculum Synthesis for Self-Evolving

Paper • 2601.22628 • Published Jan 30 • 35

Image-Video MultiModal Understanding

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Paper • 2412.10360 • Published Dec 13, 2024 • 147
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

Paper • 2501.01245 • Published Jan 2, 2025 • 5
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 46
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks

Paper • 2501.08326 • Published Jan 14, 2025 • 34
