# S1-DeepResearch: End-to-End Models for Long-Horizon Deep Research

[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=for-the-badge)](./LICENSE)
[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-S1--DeepResearch--15k-0040A1?style=for-the-badge)](https://huggingface.co/datasets/ScienceOne-AI/S1-DeepResearch-15k)
[![HuggingFace](https://img.shields.io/badge/🤗%20HuggingFace-S1--DeepResearch--32B-ffd21e?style=for-the-badge)](https://huggingface.co/ScienceOne-AI/S1-DeepResearch-32B)
[![ModelScope](https://img.shields.io/badge/🤖%20ModelScope-S1--DeepResearch--32B-mediumpurple?style=for-the-badge)](https://modelscope.cn/models/ScienceOne-AI/S1-DeepResearch-32B)

English | [中文](./README_zh.md)

## 🔥 News & Updates

- **[2026/04/04]** 🎉 We release [**S1-DeepResearch-32B**](https://huggingface.co/ScienceOne-AI/S1-DeepResearch-32B), an end-to-end agentic model for long-horizon deep research with a stronger emphasis on **real-world deployment**. Beyond **long-chain complex reasoning**, it focuses on **deep-research instruction following**, **deep-research report writing**, **file understanding and generation**, and **skill use**. On **20 agentic capability benchmarks**, it **outperforms the base model Qwen3-32B by a clear margin across the board**, and its overall performance is close to that of mainstream closed-source flagship models (**GPT 5.2**, **Claude 4.6**, **GLM-5**). The inference code and the [**15K agent training trajectory dataset**](https://huggingface.co/datasets/ScienceOne-AI/S1-DeepResearch-15k) (a subset of the full training data) are released together.
- **[2025/12/31]** We open-sourced [**S1-DeepResearch-8B-Preview**](https://huggingface.co/ScienceOne-AI/S1-DeepResearch-8B-Preview), focusing on **general long-chain complex reasoning** and exploring what is feasible in deep research at a smaller parameter scale.

## 📝 Overview

**S1-DeepResearch-32B** is an end-to-end model developed by the ScienceOne AI team for **long-horizon deep research**. Its core capabilities span **five dimensions**:

- **Long-chain complex reasoning**: Supports sustained reasoning and action across multi-stage, multi-hop tasks, going beyond single-step Q&A. Through cross-document retrieval, evidence aggregation, state memory, and policy iteration, it plans paths, integrates information, and converges on results in complex settings, keeping the reasoning process stable and the conclusions reliable.
- **Deep research instruction following**: Parses multi-constraint instructions in deep research scenarios and builds an instruction-understanding paradigm along the full research chain—**task definition → mechanisms → tool execution → result presentation**—with coordinated constraints across cognition, artifacts, execution, and environment, so that complex tasks stay controllable, processes predictable, and outputs aligned with intent.
- **Deep research report writing**: Produces arguable, citable report-style outputs on top of information integration; organizes multi-source material and evidence checks while balancing structure, readability, and traceability—suited for scientific writing and decision support.
- **File understanding and generation**: Covers PDFs, tables, web pages, and other modalities for input understanding, plus structured, deliverable outputs. In multi-turn tool-augmented interaction, it keeps semantics and execution aligned, closing the **parse → process → generate** loop and reducing repetitive manual work in research and data-heavy workflows.
- **Skill use**: Organizes literature search, data analysis, experiment design, computational modeling, visualization, report generation, and more as callable modules, dynamically assembled and progressively loaded toward the task goal, supporting continuous workflows from data acquisition to presentation.

### ✨ Key Features

- **Ultra-long context modeling**: A **128K** context window lets a single session hold longer evidence chains and multi-turn interaction history, suited to long-horizon research tasks.
- **Long-horizon tool calling**: Stably runs **150+** consecutive tool-call rounds, building reasoning-driven tool orchestration into a closed decision loop—enabling continuous planning, execution, and self-correction across multi-stage tasks.
- **Native tool ecosystem**: **9** built-in common tools (e.g., search, web browsing, code execution, command line), ready to use out of the box.

## 🚀 Model Download
| Model | Parameters | Context length | Download |
| :---: | :---: | :---: | :---: |
| **S1-DeepResearch-32B** | 32B | 128K | [🤗 HuggingFace](https://huggingface.co/ScienceOne-AI/S1-DeepResearch-32B) \| [🤖 ModelScope](https://modelscope.cn/models/ScienceOne-AI/S1-DeepResearch-32B) |
| **S1-DeepResearch-8B-Preview** | 8B | 128K | [🤗 HuggingFace](https://huggingface.co/ScienceOne-AI/S1-DeepResearch-8B-Preview) \| [🤖 ModelScope](https://modelscope.cn/models/ScienceOne-AI/S1-DeepResearch-8B-Preview) |
## 📊 Evaluation

We systematically evaluated **S1-DeepResearch-32B** on **20 agentic capability benchmarks** grouped into **5 dimensions** aligned with the five capability areas:

- **Long-chain complex reasoning**: Text—GAIA (text), BrowseComp, BrowseComp-ZH, XBench-DeepSearch, HLE (text); vision-language—LiveVQA, MM-Search, BrowseComp-VL, RealX-Bench, HLE-VL, MM-BrowseComp.
- **Deep research instruction following**: ComplexBench, DeepResearchIF (in-house).
- **Deep research report writing**: DeepResearch Bench, DeepResearch Bench II, Research Rubrics.
- **File understanding and generation**: GAIA (file), GTA, FileSys (in-house).
- **Skill use**: SkillsUse (in-house).
S1-DeepResearch-32B vs. base and closed-source flagships on 20 agentic benchmarks
**S1-DeepResearch-32B** holds a **clear advantage** over the base **Qwen3-32B** and the larger **Qwen3-235B** on all listed benchmarks; on in-house leaderboards for deep-research instruction following, file understanding and generation, and skill use, it also **surpasses Qwen3.5-397B**. Overall performance is close to mainstream closed-source flagships (**GPT 5.2**, **Claude 4.6**, **GLM-5**, **Kimi-K2.5**). Results on public benchmarks and internal tasks are mutually consistent, indicating that S1-DeepResearch-32B is **ready for real business deployment**.

## 📂 Example Cases

Below is an example of **S1-DeepResearch-32B** using skills: during materials modeling, the model first invokes the scientific skill `scientific-skills/pymatgen` for domain knowledge, then follows the skill's guidance to run the modeling with `pymatgen` and output a CIF file.
English scientific skills collaboration example
More cases will be added under the `cases/` directory.

## 🚀 Quick Start

### Environment setup

1. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

2. **Docker setup**

   The project provides official pre-built Docker images for fast deployment. There are two core images:

   - **toolkits-api**: Main tool-service container (exposes the API capabilities)
   - **code-sandbox**: Code-execution sandbox image (created on demand by the service for isolated runs)

   Execution-oriented tools (`execute_code`, `bash`) use **Docker-outside-of-Docker (DooD)**: by mounting the host Docker socket, the tool container talks to the host Docker daemon and creates isolated sandbox containers as needed.

   **Image tags:**

   ```text
   ghcr.io/wenge-research/toolkits-api:v2.0.260403
   ghcr.io/wenge-research/code-sandbox:v1.0.260403
   ```

   **Pull images:**

   ```bash
   docker pull ghcr.io/wenge-research/toolkits-api:v2.0.260403
   docker pull ghcr.io/wenge-research/code-sandbox:v1.0.260403
   ```

   **Run the container**

   Mount `src/config.yaml`, the Docker socket (for sandbox execution), and optionally the log and cache directories:

   ```bash
   docker run -d \
     --name toolkits-api \
     --network host \
     -e API_PORT=8080 \
     -e API_WORKERS=4 \
     -e HOST_LOG_DIR=$(pwd)/logs \
     -e SANDBOX_MODE=docker \
     -e HTTP_PROXY=http://your-proxy:port \
     -e HTTPS_PROXY=http://your-proxy:port \
     -e PROXY_URL=http://your-proxy:port \
     -v /etc/localtime:/etc/localtime:ro \
     -v /etc/timezone:/etc/timezone:ro \
     -v /var/run/docker.sock:/var/run/docker.sock \
     -v $(pwd)/src/config.yaml:/app/src/config.yaml \
     -v $(pwd)/logs:/app/logs \
     -v $(pwd)/cache:/app/cache \
     ghcr.io/wenge-research/toolkits-api:v2.0.260403
   ```

   **Parameter reference**

   | Flag / env | Description |
   | :--- | :--- |
   | `-e API_PORT` | Listen port; default 8080 |
   | `-e API_WORKERS` | Number of worker processes; tune for concurrency; default 1 |
   | `-e SANDBOX_MODE=docker` | Enable Docker sandbox mode (otherwise subprocess) |
   | `-e HOST_LOG_DIR` | Host log directory for sandbox mounts when the Docker sandbox is enabled |
   | `-e HTTP_PROXY / HTTPS_PROXY / PROXY_URL` | Proxy settings (optional) |
   | `--network host` | Use if you rely on a proxy bound on the host (optional) |
   | `-v /etc/localtime:/etc/localtime:ro` | Sync host timezone (read-only) |
   | `-v /etc/timezone:/etc/timezone:ro` | Sync host timezone file (read-only) |
   | `-v /var/run/docker.sock` | Required in Docker sandbox mode to schedule sandbox containers |
   | `-v config.yaml` | Mount the config (API keys, model and sandbox settings) |
   | `-v logs` | Mount the log directory (optional) |
   | `-v cache` | Mount the cache directory; its structure mirrors `/app/cache` inside the container (optional) |

3. **Configure the tool service URL**

   Prefer JSON config or environment variables to override the defaults; avoid editing `utils/configs.py` directly.

   **Option A (recommended): local JSON**

   Copy the example and edit it locally:

   ```bash
   cp utils/config/config.example.json utils/config/config.local.json
   ```

   Set the tool service base URL in `utils/config/config.local.json`, for example:

   ```json
   {
     "TOOLS_SERVER_BASE_ENDPOINT_URL": [
       "http://127.0.0.1:8080"
     ]
   }
   ```

   **Option B: environment variables**

   Point to a config file or override individual keys:

   ```bash
   export S1_DR_CONFIG_JSON="utils/config/config.local.json"
   # or override TOOLS_SERVER_BASE_ENDPOINT_URL only
   export TOOLS_SERVER_BASE_ENDPOINT_URL='["http://127.0.0.1:8080"]'
   ```

4. **API keys**

   Prefer `utils/config/config.local.json` for provider keys, or mirror the same names with environment variables:

   ```json
   {
     "AIHUBMIX_KEY": "",
     "AZURE_KEY": "",
     "VOLCANO_KEY": "",
     "ALIYUN_KEY": ""
   }
   ```

   Environment variables:

   ```bash
   export AIHUBMIX_KEY=""
   export AZURE_KEY=""
   export VOLCANO_KEY=""
   export ALIYUN_KEY=""
   ```

### Single-query inference

```python
import asyncio

from server.llm_api import LLMClient
from server.tool_api import return_all_tools
from inference.run_single_inference import run_one_query
from utils.prompts import DEEPRESEARCH_SYSTEM_PROMPT


async def main():
    llm_client_urls = ["http://127.0.0.1:10777/v1/chat/completions"]
    llm_client_models = ["S1-DeepResearch-32B"]
    llm_client = LLMClient(llm_client_urls, llm_client_models)
    all_tools = return_all_tools()
    result = await run_one_query(
        llm=llm_client,
        user_query="When Alibaba was founded, what was the average age of the founders surnamed Ma, Cai, and Zhang among the 18 founding team members? Round to one decimal place.",
        file_path=[],
        system=DEEPRESEARCH_SYSTEM_PROMPT,
        max_rounds=15,
        temperature=0.4,
        top_p=0.95,
        extra_payload={},
        debug=True,
        all_tools=all_tools,
        system_format="deep_research",
        log_label="quick_start_single",
    )
    final_answer = result[-1]["final_answer"] if result else ""
    print(final_answer)


if __name__ == "__main__":
    asyncio.run(main())
```

Notes:

- `file_path` must be a `list` in the current implementation (e.g., `[]` or `['/path/a.pdf']`).
- `system_format` options: `deep_research`, `azure`, `aihubmix`, `aihubmix_claude`, `aihubmix_glm`, `volcano`, `aliyun`.

### Batch inference

Local / vLLM:

```bash
cd inference
cp run_batch_inference_demo.sh run_batch_local.sh
# Edit run_batch_local.sh (LLM_CLIENT_URLS, LLM_CLIENT_MODELS, TEST_DATA_FILE, etc.)
bash run_batch_local.sh
```

Hosted APIs:

```bash
cd inference
cp run_batch_inference_online_demo.sh run_batch_online.sh
# Edit run_batch_online.sh (LLM_CLIENT_URLS, LLM_CLIENT_MODELS, SYSTEM_FORMAT, etc.)
bash run_batch_online.sh
```

Logs:

```bash
tail -f run_logs/*.log
```

📖 **[Advanced usage](./inference/README.md)**
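Batch runs fan queries out concurrently while capping in-flight requests. The dispatch pattern can be sketched with the standard library alone; `fake_run_one_query` below is a stub standing in for the real `run_one_query` call, and the concurrency cap is an assumed illustrative value.

```python
import asyncio


# Hypothetical stand-in for run_one_query; see inference/run_single_inference.py
# for the real call and its full argument list.
async def fake_run_one_query(query: str) -> str:
    await asyncio.sleep(0.01)  # simulate LLM and tool round-trips
    return f"answer to: {query}"


async def run_batch(queries, max_concurrency: int = 4):
    """Dispatch all queries concurrently, capped by a semaphore."""
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(q):
        async with sem:
            return await fake_run_one_query(q)

    # gather() preserves input order regardless of completion order.
    return await asyncio.gather(*(bounded(q) for q in queries))


results = asyncio.run(run_batch([f"q{i}" for i in range(8)]))
print(results[0])  # prints "answer to: q0"
```

The semaphore keeps the number of simultaneous model/tool sessions bounded, which matters when each query can itself spawn many tool-call rounds.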
## 🔭 Future Work

- **S1-DeepResearch paper**: We expect to release the paper within about two weeks, covering data synthesis for the five capability areas, training and inference design, test-time scaling, and key evaluation takeaways.
- **S1-DeepResearch-VL**: In the first half of 2026, we plan to release **S1-DeepResearch-VL**, adding vision understanding and cross-modal reasoning for richer research-style tasks.

## 📜 License

This project is licensed under the **[Apache License 2.0](./LICENSE)**.

## Citation

If S1-DeepResearch is useful to your work, please consider citing:

```bibtex
@software{s1deepresearch2026,
  title={S1-DeepResearch: End-to-End Deep Research Models},
  author={ScienceOne Team},
  year={2026},
  url={https://github.com/ScienceOne-AI/S1-DeepResearch},
}
```