LiteResearcher-4B

๐ŸŒ Project Page โ€ข ๐Ÿ’ป Code โ€ข ๐Ÿ“„ Paper (Coming Soon)

LiteResearcher-4B is a 4B-parameter deep research agent trained via scalable agentic reinforcement learning. Despite its small size, it matches Claude-4.5-Sonnet on GAIA and outperforms open-source models up to 8× larger.

Key Results

| Benchmark | LiteResearcher-4B | Notable Comparison |
|---|---|---|
| GAIA-Text | 71.3% | = Claude-4.5-Sonnet (71.2%) |
| Xbench-DS | 78.0% | > Tongyi DeepSearch 30B (75.0%) |
| Frames | 83.1% | > Claude-4-Sonnet (80.7%) |
| WebWalkerQA | 72.7% | > Tongyi DeepSearch 30B (72.2%) |

All with only 4B parameters: 8–32× smaller than comparable models.

Model Details

  • Architecture: Qwen3ForCausalLM (Qwen3-4B-Thinking base)
  • Parameters: 4B
  • Precision: BF16 (Safetensors)
  • Max Context: 262,144 tokens
  • Training: Two-stage difficulty-aware curriculum RL with virtual world environment
  • Agent Mode: ReAct-style with search and visit tools

How It Works

LiteResearcher operates as a ReAct agent that iteratively:

  1. Thinks about what information is needed
  2. Searches the web via Google
  3. Visits webpages to extract evidence
  4. Answers when sufficient information is gathered

The model uses <think>, <tool_call>, and <answer> tags to structure its reasoning.
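As a rough illustration, a tagged turn can be classified with a small parser. The tag names follow the card's description; the JSON payload shape inside `<tool_call>` (a `name` plus `arguments`) is an assumption for illustration, not the documented format:

```python
import json
import re

def parse_step(model_output):
    """Classify one model turn as a final answer, a tool call, or a bare thought.

    Tag names (<think>, <tool_call>, <answer>) follow this card; the JSON
    payload shape inside <tool_call> is assumed for illustration.
    """
    answer = re.search(r"<answer>(.*?)</answer>", model_output, re.DOTALL)
    if answer:
        return ("answer", answer.group(1).strip())
    call = re.search(r"<tool_call>(.*?)</tool_call>", model_output, re.DOTALL)
    if call:
        return ("tool_call", json.loads(call.group(1)))
    return ("thought", None)

turn = (
    '<think>I need the 2024 laureates.</think>\n'
    '<tool_call>{"name": "search", "arguments": {"query": "Nobel Prize Physics 2024"}}</tool_call>'
)
kind, payload = parse_step(turn)
```

A driver loop would dispatch `payload["name"]` to the search or visit tool and feed the result back until an `<answer>` turn arrives.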

Quick Start

With the Inference Framework

```bash
git clone https://github.com/Wanli-Lee/LiteResearcher.git
cd LiteResearcher
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your SERPER_KEY_ID and SCRAPEDO_API_KEY

# Start SGLang server
python -m sglang.launch_server \
    --model-path wanlilll/LiteResearcher-4B \
    --port 6001 --tp 2

# Run inference
bash scripts/run_all.sh \
    --model wanlilll/LiteResearcher-4B \
    --dataset data/example.jsonl
```
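You can also query the running SGLang server directly over its OpenAI-compatible chat API. A minimal standard-library sketch; the `/v1/chat/completions` path is SGLang's usual endpoint, and the sampling values are borrowed from the Transformers example in this card rather than documented server defaults:

```python
import json
import urllib.request

def build_request(question, base_url="http://localhost:6001/v1"):
    """Build an OpenAI-compatible chat request for the local SGLang server.

    Port 6001 matches the launch command above; sampling settings mirror
    the Transformers example in this card.
    """
    payload = {
        "model": "wanlilll/LiteResearcher-4B",
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 4096,
    }
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = build_request("Who won the Nobel Prize in Physics in 2024?")
# With the server running: response = urllib.request.urlopen(req).read()
```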

Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "wanlilll/LiteResearcher-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": "You are a deep research assistant..."},
    {"role": "user", "content": "Who won the Nobel Prize in Physics in 2024?"}
]

# Apply the chat template, sample a response, and decode only the new tokens
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=4096, do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
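Because the base model is a thinking variant, decoded output may carry a `<think>` trace before the visible reply. A convenience sketch for stripping it, assuming the tag format described in this card:

```python
import re

def strip_thinking(decoded):
    """Drop <think>...</think> traces and return the visible reply.

    If an <answer> tag is present, return its contents; otherwise return
    whatever text remains. Tag names follow this card's description.
    """
    visible = re.sub(r"<think>.*?</think>", "", decoded, flags=re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", visible, re.DOTALL)
    return (answer.group(1) if answer else visible).strip()

sample = "<think>recall the laureates</think><answer>John Hopfield and Geoffrey Hinton</answer>"
```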

Training

LiteResearcher is trained with a three-component framework:

  1. Co-constructed Training Data & Corpus: 32M+ webpages across 1M+ domains, covering five atomic search capabilities (direct retrieval, aggregation, enumeration, cross-verification, statistics)
  2. Stable Local Tool Environment: a local search engine (BGE-M3 + Milvus) and a local browse tool (PostgreSQL), enabling 73.2M tool calls during training at zero marginal cost
  3. Difficulty-Aware Curriculum RL: multi-stage training that progressively increases task difficulty and context length
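The curriculum idea can be pictured with a toy sampler. This is purely illustrative, assuming each task carries a scalar difficulty score in [0, 1]; it does not reflect the paper's actual staging or scoring:

```python
import random

def sample_batch(tasks, stage, num_stages=3, batch_size=4, seed=0):
    """Stage s draws only tasks with difficulty <= (s + 1) / num_stages.

    Illustrative sketch of difficulty-aware curriculum sampling: later
    stages admit a progressively larger (harder) slice of the task pool.
    """
    ceiling = (stage + 1) / num_stages
    pool = [t for t in tasks if t["difficulty"] <= ceiling]
    return random.Random(seed).sample(pool, min(batch_size, len(pool)))

# Toy pool with difficulty scores spread evenly over [0.0, 0.9]
tasks = [{"id": i, "difficulty": i / 10} for i in range(10)]
early = sample_batch(tasks, stage=0)  # easy tasks only
late = sample_batch(tasks, stage=2)   # full difficulty range
```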

Benchmark Results

LiteResearcher-4B consistently outperforms open-source models up to 8× larger and matches or exceeds proprietary systems across eight benchmarks.

| Model | Size | GAIA | BrowseComp (en) | BrowseComp (zh) | Humanity | Frames | WebWalkerQA | MAIA | Xbench-DS |
|---|---|---|---|---|---|---|---|---|---|
| *Commercial Models* | | | | | | | | | |
| Claude-4-Sonnet | - | 68.3 | 12.2 | 29.1 | 20.3 | 80.7 | 61.7 | - | 64.6 |
| Claude-4.5-Sonnet | - | 71.2 | 19.6 | 40.8 | 24.5 | 85.0 | - | 53.4 | 66.0 |
| DeepSeek-V3.2 | - | 63.5 | 67.6 | 65.0 | 40.8 | 80.2 | - | 38.5 | 71.0 |
| DeepSeek-V3.1 | - | 63.1 | 30.0 | 49.2 | 29.8 | 83.7 | 61.2 | - | 71.0 |
| Minimax-M2 | - | 75.7 | 44.0 | 48.5 | 31.8 | - | - | - | 72.0 |
| OpenAI-GPT-5-high | - | 76.4 | 54.9 | 65.0 | 35.2 | - | - | 51.4 | 77.8 |
| GLM-4.6 | - | 71.9 | 45.1 | 49.5 | 30.4 | - | - | - | 70.0 |
| Kimi-Researcher | - | - | - | - | 26.9 | 78.8 | - | 36.0 | 69.0 |
| Kimi-K2-0905 | - | 60.2 | 7.4 | 22.2 | 21.7 | 58.1 | - | 25.2 | 61.0 |
| *Open-Source Models* | | | | | | | | | |
| Mirothinker | 8B | 66.4 | 31.1 | 40.2 | 21.5 | 80.6 | 60.6 | 40.4 | 60.6 |
| Tongyi DeepSearch | 30B | 70.9 | **43.4** | **46.7** | **32.9** | **90.6** | 72.2 | - | 75.0 |
| ASearcher QWQ v2 | 32B | 58.7 | - | - | - | 74.5 | - | - | 51.1 |
| WebSailor | 30B | 53.2 | - | - | - | - | - | - | 53.3 |
| WebDancer (QwQ) | 32B | 51.5 | 3.8 | 18.0 | - | - | 47.9 | - | 38.3 |
| WebExplorer | 8B | 50.0 | 15.7 | 32.0 | 17.3 | 75.7 | 62.7 | - | 53.7 |
| DeepMiner | 32B | 58.7 | 33.5 | 40.1 | - | - | - | - | 62.0 |
| AFM-RL | 32B | 55.3 | 11.1 | - | 18.0 | - | 63.0 | - | - |
| SFR-DeepResearch | 20B | 66.0 | - | - | 28.7 | 82.8 | - | - | - |
| AgentCPM-Explore | 4B | 63.9 | 24.1 | 29.1 | 19.1 | 82.7 | 68.1 | 40.5 | 70.0 |
| LiteResearcher | 4B | **71.3** | 27.5* | 32.5* | 22.0 | 83.1 | **72.7** | **41.8** | **78.0** |

Best open-source results in bold. Results with * use a 64k context window with a memory mechanism.

Citation

```bibtex
@article{li2026literesearcher,
  title={LiteResearcher: A Scalable Agentic RL Training Framework for Deep Research Agent},
  author={Wanli Li and Bince Qu and Bo Pan and Jianyu Zhang and Zheng Liu and Pan Zhang and Wei Chen and Bo Zhang},
  year={2026}
}
```

License

This model is released under the Apache 2.0 License.
