- Attention Is All You Need • Paper • 1706.03762 • Published • 121
- Scaling Laws for Neural Language Models • Paper • 2001.08361 • Published • 10
- Training Compute-Optimal Large Language Models • Paper • 2203.15556 • Published • 11
- Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT • Paper • 2210.04186 • Published

Collections including paper arxiv:2508.18265
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models • Paper • 2504.10479 • Published • 308
- Qwen3 Technical Report • Paper • 2505.09388 • Published • 339
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective • Paper • 2509.18905 • Published • 30

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- SmolVLM: Redefining small and efficient multimodal models • Paper • 2504.05299 • Published • 207
- Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models • Paper • 2504.15271 • Published • 68

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent • Paper • 2508.05748 • Published • 142
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • Paper • 2508.16153 • Published • 162
- Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • Paper • 2508.13167 • Published • 129

- Apriel-1.5-15b-Thinker • Paper • 2510.01141 • Published • 123
- MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources • Paper • 2509.21268 • Published • 104
- LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model • Paper • 2509.00676 • Published • 85
- Visual Representation Alignment for Multimodal Large Language Models • Paper • 2509.07979 • Published • 84

- rStar2-Agent: Agentic Reasoning Technical Report • Paper • 2508.20722 • Published • 118
- AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs • Paper • 2508.16153 • Published • 162
- Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR • Paper • 2508.14029 • Published • 119
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218

- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218
- OpenGVLab/InternVL3_5-241B-A28B-HF • Image-Text-to-Text • 241B • Updated • 44 • 11
- OpenGVLab/InternVL3_5-38B-HF • Image-Text-to-Text • 38B • Updated • 2.12k • 6
- OpenGVLab/InternVL3_5-30B-A3B-HF • Image-Text-to-Text • 31B • Updated • 346 • 6
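The three -HF checkpoints above are transformers-native InternVL3.5 ports listed under the Image-Text-to-Text task. As a minimal sketch (not part of the collection itself), such a checkpoint can typically be loaded through the generic transformers image-text-to-text interface; the chosen repo id, example image URL, and chat-message layout below are assumptions, and each model card remains the authoritative usage reference.

```python
# Minimal sketch: loading one of the InternVL3.5 -HF checkpoints listed above with the
# generic transformers image-text-to-text interface. Assumes a recent transformers
# release that ships AutoModelForImageTextToText and the InternVL integration;
# the image URL and prompt are placeholders.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "OpenGVLab/InternVL3_5-30B-A3B-HF"  # smallest variant in the list above

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# One image plus one question, phrased as a single chat turn.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/cat.jpg"},  # placeholder image
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated = model.generate(**inputs, max_new_tokens=64)
new_tokens = generated[0][inputs["input_ids"].shape[-1]:]  # keep only the generated part
print(processor.decode(new_tokens, skip_special_tokens=True))
```
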
- A Survey of Context Engineering for Large Language Models • Paper • 2507.13334 • Published • 263
- GUI-G^2: Gaussian Reward Modeling for GUI Grounding • Paper • 2507.15846 • Published • 135
- ScreenCoder: Advancing Visual-to-Code Generation for Front-End Automation via Modular Multimodal Agents • Paper • 2507.22827 • Published • 101
- InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published • 218