- PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
  Paper • 2512.16793 • Published • 76
- TwinBrainVLA: Unleashing the Potential of Generalist VLMs for Embodied Tasks via Asymmetric Mixture-of-Transformers
  Paper • 2601.14133 • Published • 61
- BayesianVLA: Bayesian Decomposition of Vision Language Action Models via Latent Action Queries
  Paper • 2601.15197 • Published • 55
Collections including paper arxiv:2512.16793

- QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
  Paper • 2512.19526 • Published • 12
- PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
  Paper • 2512.16793 • Published • 76
- HeqianQiu/EgoMe
  Updated • 108 • 7
- LiamLian0727/Euclid30K
  Viewer • Updated • 29.7k • 58 • 1

- MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
  Paper • 2508.14879 • Published • 69
- VoxHammer: Training-Free Precise and Coherent 3D Editing in Native 3D Space
  Paper • 2508.19247 • Published • 43
- Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
  Paper • 2508.17437 • Published • 37
- Multi-View 3D Point Tracking
  Paper • 2508.21060 • Published • 23

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 30
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 15
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
  Paper • 2512.02835 • Published • 10
- Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
  Paper • 2512.05044 • Published • 17
- Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
  Paper • 2512.05591 • Published • 17
- SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
  Paper • 2512.05343 • Published • 25

- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
  Paper • 2407.09413 • Published • 11
- MAVIS: Mathematical Visual Instruction Tuning
  Paper • 2407.08739 • Published • 32
- Kvasir-VQA: A Text-Image Pair GI Tract Dataset
  Paper • 2409.01437 • Published • 71
- MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct
  Paper • 2409.05840 • Published • 49