- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
  Paper • 2311.05332 • Published • 11
- SoundCam: A Dataset for Finding Humans Using Room Acoustics
  Paper • 2311.03517 • Published • 14
Chaolei Tan
Chaolei
AI & ML interests
Computer Vision, Multimodal Learning, Video Understanding
Recent Activity
liked a dataset 29 days ago
Video-Reason/VBVR-Dataset
liked a Space about 1 month ago
Qwen/Qwen3-VL-Demo
liked a model about 1 year ago
microsoft/Phi-4-multimodal-instruct
Organizations
None yet