OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks Paper • 2604.08539 • Published 7 days ago • 48
MARS: Enabling Autoregressive Models Multi-Token Generation Paper • 2604.07023 • Published 8 days ago • 38
Experience Transfer for Multimodal LLM Agents in Minecraft Game Paper • 2604.05533 • Published 9 days ago • 15
360Anything: Geometry-Free Lifting of Images and Videos to 360° Paper • 2601.16192 • Published Jan 22 • 9
Image Upscaler And Restoring GFPGAN Algorithm • Running • 41 • Enhance and upscale images using GFPGAN