# LiteResearcher-4B

Project Page • Code • Paper (Coming Soon)
LiteResearcher-4B is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches Claude-4.5-Sonnet on GAIA and outperforms open-source models up to 8× larger.
## Key Results
| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | = Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |
All with only 4B parameters, 8–32× smaller than comparable models.
## Model Details
- Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- Parameters: 4B
- Max Context: 262,144 tokens
- Training: Two-stage difficulty-aware curriculum RL with virtual world environment
- Agent Mode: ReAct-style with `search` and `visit` tools
## How It Works
LiteResearcher operates as a ReAct agent that iteratively:
- Thinks about what information is needed
- Searches the web via Google
- Visits webpages to extract evidence
- Answers when sufficient information is gathered
The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
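The tag-structured turns described above can be parsed with a small helper. The tag names come from this card; the parser itself is an illustrative sketch, not part of the official inference harness:

```python
import re

def parse_step(output: str) -> dict:
    """Split one model turn into its <think> / <tool_call> / <answer> segments.

    Illustrative sketch only; the real harness may parse turns differently.
    """
    step = {}
    for tag in ("think", "tool_call", "answer"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", output, re.DOTALL)
        if m:
            step[tag] = m.group(1).strip()
    return step

turn = (
    "<think>I need the 2024 laureates.</think>\n"
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Prize Physics 2024"}}</tool_call>'
)
print(parse_step(turn))
```

A turn with no `<answer>` tag signals that the agent intends to keep searching; the loop terminates once the answer segment appears.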
## Quick Start

### With the Inference Framework
```bash
git clone https://github.com/Wanli-Lee/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
    --model-path wanlilll/LiteResearcher-4B \
    --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
    --model wanlilll/LiteResearcher-4B \
    --dataset data/example.jsonl
```
### Direct Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wanlilll/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
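A single `generate` call produces only one turn; for multi-step research the model must be driven in a loop that executes its tool calls. Below is a minimal sketch of such a driver. `generate_fn` (history in, assistant text out) and the `tools` dict (name to callable) are placeholders for your inference stack and your `search`/`visit` implementations; this is not the project's official runner:

```python
import json
import re

def react_loop(generate_fn, tools, question, max_turns=8):
    """Drive a ReAct-style loop until the model emits an <answer> tag.

    generate_fn and tools are injected placeholders (sketch only):
    the real system uses SGLang plus Serper/Scrape.do-backed tools.
    """
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        output = generate_fn(history)
        history.append({"role": "assistant", "content": output})
        ans = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
        if ans:
            return ans.group(1).strip()
        call = re.search(r"<tool_call>(.*?)</tool_call>", output, re.DOTALL)
        if call:
            req = json.loads(call.group(1))
            result = tools[req["name"]](**req.get("arguments", {}))
            history.append({"role": "tool", "content": str(result)})
    return None
```

The `max_turns` cap bounds the trajectory length; a production runner would also truncate the history to fit the context window.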
## Training
LiteResearcher is trained with a three-component framework:
- Co-constructed Training Data & Corpus: 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
- Stable Local Tool Environment: Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost
- Difficulty-Aware Curriculum RL: Multi-stage training that progressively increases task difficulty and context length
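The curriculum idea in the third component can be sketched as a sampler whose difficulty window shifts per training stage. The windowing schedule below is an illustrative assumption; the paper's exact curriculum is not described in this card:

```python
import random

def curriculum_batches(tasks, stages, batch_size=4, seed=0):
    """Difficulty-aware curriculum sampling (sketch).

    tasks:  list of (task, difficulty) pairs
    stages: list of (lo, hi) difficulty windows, one per stage
    The window schedule is hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    for lo, hi in stages:
        pool = [task for task, d in tasks if lo <= d <= hi]
        yield [rng.choice(pool) for _ in range(batch_size)]

tasks = [("easy-q", 1), ("medium-q", 2), ("hard-q", 3)]
for batch in curriculum_batches(tasks, stages=[(1, 2), (2, 3)]):
    print(batch)
```

Later stages draw from harder windows, mirroring the card's description of progressively increasing task difficulty (context length would be scheduled analogously).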
## Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.
| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| LiteResearcher | 4B | **71.3** | 27.5* | 32.5* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |
Best open-source results in bold. Results marked with * use a 64k context window with a memory mechanism.
## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  year={2026}
}
```
## License
This model is released under the Apache 2.0 License.