# LiteResearcher-4B

Project Page • Code • Paper (Coming Soon)
LiteResearcher-4B is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches Claude-4.5-Sonnet on GAIA and outperforms open-source models up to 8× larger.
## Key Results
| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | = Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |
All with only 4B parameters, 8–32× smaller than comparable models.
## Model Details
- Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
- Parameters: 4B
- Max Context: 262,144 tokens
- Training: Two-stage difficulty-aware curriculum RL with virtual world environment
- Agent Mode: ReAct-style with `search` and `visit` tools
## How It Works
LiteResearcher operates as a ReAct agent that iteratively:
- Thinks about what information is needed
- Searches the web via Google
- Visits webpages to extract evidence
- Answers when sufficient information is gathered
The model uses `<think>`, `<tool_call>`, and `<answer>` tags to structure its reasoning.
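The tag-structured turns described above can be parsed with a small helper. The tag names come from this card; the parser itself is an illustrative sketch, not part of the official inference harness:

```python
import re

def parse_step(output: str) -> dict:
    """Split one model turn into its <think> / <tool_call> / <answer> segments.

    Illustrative sketch only; the real harness may parse turns differently.
    """
    step = {}
    for tag in ("think", "tool_call", "answer"):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", output, re.DOTALL)
        if m:
            step[tag] = m.group(1).strip()
    return step

turn = (
    "<think>I need the 2024 laureates.</think>\n"
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Prize Physics 2024"}}</tool_call>'
)
print(parse_step(turn))
```

A turn with no `<answer>` tag signals that the agent intends to keep searching; the loop terminates once the answer segment appears.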
## Quick Start

### With the Inference Framework
```bash
git clone https://github.com/Wanli-Lee/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
    --model-path wanlilll/LiteResearcher-4B \
    --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
    --model wanlilll/LiteResearcher-4B \
    --dataset data/example.jsonl
```
### Direct Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wanlilll/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
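A single `generate` call produces only one turn; for multi-step research the model must be driven in a loop that executes its tool calls. Below is a minimal sketch of such a driver. `generate_fn` (history in, assistant text out) and the `tools` dict (name to callable) are placeholders for your inference stack and your `search`/`visit` implementations; this is not the project's official runner:

```python
import json
import re

def react_loop(generate_fn, tools, question, max_turns=8):
    """Drive a ReAct-style loop until the model emits an <answer> tag.

    generate_fn and tools are injected placeholders (sketch only):
    the real system uses SGLang plus Serper/Scrape.do-backed tools.
    """
    history = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        output = generate_fn(history)
        history.append({"role": "assistant", "content": output})
        ans = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)
        if ans:
            return ans.group(1).strip()
        call = re.search(r"<tool_call>(.*?)</tool_call>", output, re.DOTALL)
        if call:
            req = json.loads(call.group(1))
            result = tools[req["name"]](**req.get("arguments", {}))
            history.append({"role": "tool", "content": str(result)})
    return None
```

The `max_turns` cap bounds the trajectory length; a production runner would also truncate the history to fit the context window.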
## Training
LiteResearcher is trained with a three-component framework:
- Co-constructed Training Data & Corpus: 32M+ webpages, 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
- Stable Local Tool Environment: Local search engine (BGE-M3 + Milvus) and local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost
- Difficulty-Aware Curriculum RL: Multi-stage training that progressively increases task difficulty and context length
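The curriculum idea in the third component can be sketched as a sampler whose difficulty window shifts per training stage. The windowing schedule below is an illustrative assumption; the paper's exact curriculum is not described in this card:

```python
import random

def curriculum_batches(tasks, stages, batch_size=4, seed=0):
    """Difficulty-aware curriculum sampling (sketch).

    tasks:  list of (task, difficulty) pairs
    stages: list of (lo, hi) difficulty windows, one per stage
    The window schedule is hypothetical, for illustration only.
    """
    rng = random.Random(seed)
    for lo, hi in stages:
        pool = [task for task, d in tasks if lo <= d <= hi]
        yield [rng.choice(pool) for _ in range(batch_size)]

tasks = [("easy-q", 1), ("medium-q", 2), ("hard-q", 3)]
for batch in curriculum_batches(tasks, stages=[(1, 2), (2, 3)]):
    print(batch)
```

Later stages draw from harder windows, mirroring the card's description of progressively increasing task difficulty (context length would be scheduled analogously).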
## Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.
| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| **Commercial Models** | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| **Open-Source Models** | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| LiteResearcher | 4B | **71.3** | 27.5* | 32.5* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |
Best open-source results in bold. Results marked with * use a 64k context window with a memory mechanism.
## Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  year={2026}
}
```
## License
This model is released under the Apache 2.0 License.