Zhongang Cai

caizhongang

http://caizhongang.com/

AI & ML interests

Multimodal, Video Reasoning, Spatial Intelligence, Virtual Humans.

Recent Activity

liked a dataset 7 days ago

Video-Reason/VBVR-Bench-Data

liked a model 8 days ago

sensenova/SenseNova-SI-1.5-InternVL3-8B

upvoted a paper 10 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Organizations

upvoted a paper 10 days ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published 12 days ago • 233

upvoted a paper 11 days ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published 12 days ago • 40

upvoted 2 papers 28 days ago

Bridging Semantic and Kinematic Conditions with Diffusion-based Discrete Motion Tokenizer

Paper • 2603.19227 • Published 29 days ago • 42

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Paper • 2603.19231 • Published 29 days ago • 36

upvoted 5 papers about 1 month ago

Kinema4D: Kinematic 4D World Modeling for Spatiotemporal Embodied Simulation

Paper • 2603.16669 • Published Mar 17 • 70

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 369

HSImul3R: Physics-in-the-Loop Reconstruction of Simulation-Ready Human-Scene Interactions

Paper • 2603.15612 • Published Mar 16 • 152

ArtHOI: Articulated Human-Object Interaction Synthesis by 4D Reconstruction from Video Priors

Paper • 2603.04338 • Published Mar 4 • 24

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

upvoted a paper about 2 months ago

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 519

upvoted a paper 2 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

upvoted a paper 3 months ago

DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

Paper • 2601.22153 • Published Jan 29 • 74

upvoted 2 papers 4 months ago

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 67

LongVie 2: Multimodal Controllable Ultra-Long Video World Model

Paper • 2512.13604 • Published Dec 15, 2025 • 76

upvoted a collection 4 months ago

SenseNova-SI

Collection

Scaling Spatial Intelligence with Multimodal Foundation Models • 15 items • Updated 1 day ago • 16

upvoted 5 papers 5 months ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96
