One Vision-Language-Action Model for GUI Agent
Qinghong (Kevin) Lin
KevinQHLin
AI & ML interests
Vision-Language Model, Video Understanding, Agent
Recent Activity
upvoted a paper 1 day ago
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios
upvoted a paper 4 days ago
MolmoWeb: Open Visual Web Agent and Open Data for the Open Web
upvoted a paper 6 days ago
RAGEN-2: Reasoning Collapse in Agentic RL