VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models Paper • 2603.22003 • Published 23 days ago • 11
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9 • 52
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance Paper • 2512.08765 • Published Dec 9, 2025 • 134
Cambrian-S: Towards Spatial Supersensing in Video Paper • 2511.04670 • Published Nov 6, 2025 • 39
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published Oct 27, 2025 • 181
DreamOmni2: Multimodal Instruction-based Editing and Generation Paper • 2510.06679 • Published Oct 8, 2025 • 74
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech Paper • 2509.25131 • Published Sep 29, 2025 • 16
LongLive: Real-time Interactive Long Video Generation Paper • 2509.22622 • Published Sep 26, 2025 • 189
MGM-Omni Collection MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech • 13 items • Updated Mar 2 • 11
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 79
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition Paper • 2412.09501 • Published Dec 12, 2024 • 48
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 118
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Dec 6, 2024 • 710
MGM-Data Collection Official data collection for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models" • 2 items • Updated Apr 21, 2024 • 7