Collections including paper arxiv:2602.20739

PyVision-RL: Forging Open Agentic Vision Models via RL

Paper • 2602.20739 • Published Feb 24 • 31

Towards Pixel-Level VLM Perception via Simple Points Prediction

Paper • 2601.19228 • Published Jan 27 • 18
Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper • 2601.19895 • Published Jan 27 • 27
Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

Paper • 2601.19798 • Published Jan 27 • 43
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models

Paper • 2601.21639 • Published Jan 29 • 51

Agents-X/PyVision-Image-7B-SFT

Image-Text-to-Text • 8B • Updated Feb 25 • 12 • 1
Agents-X/PyVision-Image-7B-RL

Image-Text-to-Text • 8B • Updated Feb 26 • 60 • 1
Agents-X/PyVision-Image-SFT-Data

Viewer • Updated Feb 26 • 6.88k • 36
Agents-X/PyVision-Image-RL-Data

Viewer • Updated Feb 26 • 44.6k • 42 • 1

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 135
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 142
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 65
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 162

ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems

Paper • 2503.20756 • Published Mar 26, 2025 • 7
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14, 2025 • 99
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency

Paper • 2508.18265 • Published Aug 25, 2025 • 217
Qwen3-Omni Technical Report

Paper • 2509.17765 • Published Sep 22, 2025 • 153

PyVision-RL: Forging Open Agentic Vision Models via RL

Paper • 2602.20739 • Published Feb 24 • 31

Agents-X/PyVision-Video-7B-RL

Video-Text-to-Text • 8B • Updated Feb 26 • 9
Agents-X/PyVision-Video-7B-SFT

Video-Text-to-Text • 8B • Updated Feb 26 • 9
Agents-X/PyVision-Video-SFT-Data

Updated Feb 25 • 12
Agents-X/PyVision-Video-RL-Data

Viewer • Updated Feb 26 • 15k • 44

Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks

Paper • 2310.19909 • Published Oct 30, 2023 • 21
Memory Augmented Language Models through Mixture of Word Experts

Paper • 2311.10768 • Published Nov 15, 2023 • 19
FlashDecoding++: Faster Large Language Model Inference on GPUs

Paper • 2311.01282 • Published Nov 2, 2023 • 38
Prompt Cache: Modular Attention Reuse for Low-Latency Inference

Paper • 2311.04934 • Published Nov 7, 2023 • 32

MiCo: Multi-image Contrast for Reinforcement Visual Reasoning

Paper • 2506.22434 • Published Jun 27, 2025 • 10
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning

Paper • 2507.13348 • Published Jul 17, 2025 • 79
RewardDance: Reward Scaling in Visual Generation

Paper • 2509.08826 • Published Sep 10, 2025 • 73
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs

Paper • 2510.18876 • Published Oct 21, 2025 • 37

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Paper • 2411.03562 • Published Nov 5, 2024 • 69
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning

Paper • 2502.06060 • Published Feb 9, 2025 • 38
MLGym: A New Framework and Benchmark for Advancing AI Research Agents

Paper • 2502.14499 • Published Feb 20, 2025 • 195
SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20, 2025 • 100
