OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
Abstract
A simple supervised fine-tuning approach achieves state-of-the-art performance in deep search capabilities using minimal data, outperforming complex industrial pipelines and demonstrating the effectiveness of academic-led development in large language model agents.
Deep search capabilities have become an indispensable competency for frontier Large Language Model (LLM) agents, yet their development remains dominated by industrial giants. The typical industry recipe involves a highly resource-intensive pipeline spanning pre-training, continual pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL). In this report, we show that when fueled with informative and high-difficulty trajectories, a simple SFT approach can be surprisingly powerful for training frontier search agents. By introducing three simple data-synthesis modifications (scaling knowledge graph size for richer exploration, expanding the tool set for broader functionality, and enforcing strict low-step filtering), we establish a stronger baseline. Trained on merely 10.6k data points, our OpenSeeker-v2 achieves state-of-the-art performance across 4 benchmarks (among 30B-sized agents using the ReAct paradigm): 46.0% on BrowseComp, 58.1% on BrowseComp-ZH, 34.6% on Humanity's Last Exam, and 78.0% on xbench, surpassing even Tongyi DeepResearch, trained with a heavy CPT+SFT+RL pipeline, which achieves 43.4%, 46.7%, 32.9%, and 75.0%, respectively. Notably, OpenSeeker-v2 represents the first state-of-the-art search agent within its model scale and paradigm to be developed by a purely academic team using only SFT. We are excited to open-source the OpenSeeker-v2 model weights and share our simple yet effective findings to make frontier search agent research more accessible to the community.
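The "strict low-step filtering" modification can be illustrated as a selection pass over synthesized trajectories: drop any question a reference solver finishes in too few tool calls, so that only multi-hop, high-difficulty examples reach SFT. The sketch below is a minimal illustration under assumed names and thresholds (`Trajectory`, `min_steps=5`); it is not the paper's actual implementation.

```python
# Hypothetical sketch of strict low-step filtering: keep only trajectories
# that a reference solver completed successfully AND needed many tool calls
# to finish. All field names and the threshold are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Trajectory:
    question: str
    steps: int    # number of tool calls the reference solver used
    solved: bool  # whether the reference solver reached the right answer

def strict_low_step_filter(trajs, min_steps=5):
    """Drop trajectories solvable in fewer than `min_steps` tool calls."""
    return [t for t in trajs if t.solved and t.steps >= min_steps]

trajs = [
    Trajectory("easy single lookup", steps=2, solved=True),
    Trajectory("multi-hop reasoning chain", steps=9, solved=True),
    Trajectory("unsolved question", steps=20, solved=False),
]
kept = strict_low_step_filter(trajs)
print([t.question for t in kept])  # ['multi-hop reasoning chain']
```

Unsolved trajectories are dropped here as well, on the assumption that only verified-correct traces are usable as SFT targets; the real pipeline may treat failures differently.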
Community
the real clever twist here is the data curriculum itself: with informative, high-difficulty trajectories, a plain sft-only 30b model can rival heavy cpt+sft+rl stacks. they do it by enlarging the evidence subgraph (bumping k), expanding the tool set, and enforcing strict low-step filtering, all while keeping training purely sft. this combination seems to act as a curriculum that forces multi-hop reasoning under long contexts, which the 256k window and up to 200 tool calls per trajectory enable. the arxivlens breakdown helped me parse the method details—worth a read for folks trying to audit the data flow: https://arxivlens.com/PaperView/Details/openseeker-v2-pushing-the-limits-of-search-agents-with-informative-and-high-difficulty-trajectories-5035-ac8e2c31. one q: how sensitive are the gains to the exact k and tool-set size when you move to different domains?
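The "bumping k" idea mentioned above (enlarging the evidence subgraph around a seed entity) can be pictured as a k-hop neighborhood expansion on a knowledge graph: a larger k pulls in more entities and relations, which forces longer multi-hop chains in the synthesized questions. The toy adjacency map and BFS below are illustrative assumptions, not the paper's actual KG construction.

```python
# Illustrative sketch of evidence-subgraph expansion: collect all nodes
# within k hops of a seed entity via breadth-first search. Raising k yields
# a richer subgraph and, in turn, harder multi-hop questions.
from collections import deque

def k_hop_subgraph(adj, seed, k):
    """Return the set of nodes within k hops of `seed` in adjacency map `adj`."""
    seen, frontier = {seed}, deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand past the hop radius
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

# Toy chain graph: A -> B -> C -> D
adj = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(sorted(k_hop_subgraph(adj, "A", 1)))  # ['A', 'B']
print(sorted(k_hop_subgraph(adj, "A", 3)))  # ['A', 'B', 'C', 'D']
```

How sensitive the downstream gains are to the exact k (the commenter's question) is not something this sketch can answer; it only makes the knob itself concrete.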
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data (2026)
- Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design (2026)
- DR-Venus: Towards Frontier Edge-Scale Deep Research Agents with Only 10K Open Data (2026)
- SynPlanResearch-R1: Encouraging Tool Exploration for Deep Research with Synthetic Plans (2026)
- GraphWalker: Agentic Knowledge Graph Question Answering via Synthetic Trajectory Curriculum (2026)
- LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent (2026)
- OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis (2026)