- Attention Is All You Need • Paper • 1706.03762 • Published • 121
- Scaling Laws for Neural Language Models • Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models • Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT • Paper • 2210.04186 • Published

Collections including paper arxiv:2508.18265
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published • 308
- Qwen3 Technical Report • Paper • 2505.09388 • Published • 339
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective • Paper • 2509.18905 • Published • 30

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published • 207
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models • Paper • 2504.15271 • Published • 68

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent • Paper • 2508.05748 • Published • 142
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • Paper • 2508.16153 • Published • 162
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • Paper • 2508.13167 • Published • 129

- Apriel-1.5-15b-Thinker • Paper • 2510.01141 • Published • 123
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources • Paper • 2509.21268 • Published • 104
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper • 2509.00676 • Published • 85
- Visual Representation Alignment for Multimodal Large Language Models • Paper • 2509.07979 • Published • 84

- rStar2-Agent: Agentic Reasoning Technical Report • Paper • 2508.20722 • Published • 118
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • Paper • 2508.16153 • Published • 162
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published • 119
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- OpenGVLab/InternVL3_5-241B-A28B-HF • Image-Text-to-Text • 241B • Updated • 44 • 11
- OpenGVLab/InternVL3_5-38B-HF • Image-Text-to-Text • 38B • Updated • 2.12k • 6
- OpenGVLab/InternVL3_5-30B-A3B-HF • Image-Text-to-Text • 31B • Updated • 346 • 6
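The three -HF checkpoints above are transformers-native InternVL3.5 ports listed under the Image-Text-to-Text task. As a minimal sketch (not part of the collection itself), such a checkpoint can typically be loaded through the generic transformers image-text-to-text interface; the chosen repo id, example image URL, and chat-message layout below are assumptions, and each model card remains the authoritative usage reference.

```python
# Minimal sketch: loading one of the InternVL3.5 -HF checkpoints listed above with the
# generic transformers image-text-to-text interface. Assumes a recent transformers
# release that ships AutoModelForImageTextToText and the InternVL integration;
# the image URL and prompt are placeholders.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "OpenGVLab/InternVL3_5-30B-A3B-HF"  # smallest variant in the list above

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One image plus one question, phrased as a single chat turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=64)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]  # keep only the generated part
print(processor.decode(new_tokens, skip_special_tokens=True))
```
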
- A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • Published • 263
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding • Paper • 2507.15846 • Published • 135
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents • Paper • 2507.22827 • Published • 101
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218