中文 | English
S1-DeepResearch Inference Framework
Key Features
- Multiple LLM clients: Supports vLLM, Azure OpenAI, AIHubMix, and other LLM services
- Rich toolset: Nine tools covering search, web browsing, file parsing, code execution, multimodal Q&A, bash, and more
- Batch inference: Concurrent batch inference with resume-from-checkpoint and periodic result saving
- Single-query inference: Detailed debugging and testing for individual queries
- Load balancing: Multi-node LLM load balancing and consistent scheduling
- Detailed logging: Per-query log files for easier troubleshooting and analysis
Project Layout (current)
./
├── run_batch_inference_demo.sh          # Local / vLLM script template
├── run_batch_inference_online_demo.sh   # Online platform script template
├── inference/
│   ├── run_batch_inference.py
│   └── run_single_inference.py
├── server/
├── tool_kits/
├── utils/
│   └── config/
│       ├── config.example.json
│       └── README.md
├── models/tokenizer/
└── test_all_tools.py
Quick Start
1. Install dependencies
pip install -r requirements.txt
2. Configuration (JSON or environment variables recommended)
Precedence: custom JSON > environment variables > defaults in utils/config.py.
Typical workflow:
cp utils/config/config.example.json utils/config/config.local.json
Edit `config.local.json` as needed, for example:
- `TOOLS_SERVER_BASE_ENDPOINT_URL`
- `AIHUBMIX_KEY` / `AZURE_KEY` / `VOLCANO_KEY` / `ALIYUN_KEY`
- `CLIENT_TIMEOUT`
You can also override via environment variables, for example:
export S1_DR_CONFIG_JSON="utils/config/config.local.json"
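The precedence above (custom JSON > environment variables > defaults) can be sketched as a small loader. This is an illustrative sketch, not the actual `utils/config.py` API; the default keys and the `load_config` helper are assumptions:

```python
import json
import os

# Hypothetical defaults mirroring utils/config.py; the real keys may differ.
DEFAULTS = {"CLIENT_TIMEOUT": 600, "TOOLS_SERVER_BASE_ENDPOINT_URL": ""}

def load_config(env=None):
    """Resolve settings as: custom JSON > environment variables > defaults."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    # Environment variables override the built-in defaults.
    for key in DEFAULTS:
        if key in env:
            config[key] = env[key]
    # The JSON file named by S1_DR_CONFIG_JSON overrides everything else.
    json_path = env.get("S1_DR_CONFIG_JSON")
    if json_path and os.path.exists(json_path):
        with open(json_path, "r", encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

The point of the ordering is that a checked-in defaults table stays usable out of the box, while a local JSON file always wins for reproducible runs.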
3. Prepare input JSONL
Each line is one JSON object. At minimum include `question`; usually also `id` and `file_path`.
3.1 JSONL example (file inputs)
{"id":"query_001","question":"When Alibaba was founded, what was the average age of the founders whose surnames are Ma, Cai, or Zhang among the 18 co-founders? Round to one decimal place.","file_path":[]}
{"id":"query_002","question":"According to the manual, for DJI's heaviest AIR-series drone by takeoff weight, how many mAh of battery energy remain after flying half a marathon? (Note 1: assume calm air; minimum energy use is flying at 60% of max speed. Note 2: power draw can be converted from max flight time.)","file_path":["/path/to/file.pdf"]}
3.2 JSONL example (using Skills)
{"id":"query_003","question":"Use pymatgen to build a simple TiO2 surface slab. Please generate a common low-index surface, report the Miller index, slab thickness, and vacuum size, and briefly describe the resulting surface structure.","skills":[{"name": "skill_name1", "description": "description1", "skill_path": "skill_path1"}, {"name": "skill_name2", "description": "description2", "skill_path": "skill_path2"}]}
Recommended workflow: copy a script, then run
A. Local / vLLM (run_batch_inference_demo.sh)
cp run_batch_inference_demo.sh run_batch_local.sh
mkdir -p run_logs
# Edit parameters inside run_batch_local.sh
bash run_batch_local.sh
Notes:
- The script starts Python with `nohup ... &` and prints the background PID.
- Tail logs: `tail -f run_logs/run.log`
B. Online platform (run_batch_inference_online_demo.sh)
cp run_batch_inference_online_demo.sh run_batch_online.sh
mkdir -p run_logs
# Edit parameters inside run_batch_online.sh
bash run_batch_online.sh
Notes:
- Focus on: `LLM_CLIENT_URLS`, `LLM_CLIENT_MODELS`, `SYSTEM_FORMAT`
- Tail logs: `tail -f run_logs/run_batch_*.log`
Script parameters
Basic
- `LLM_CLIENT_URLS`: Model service URLs, space-separated (paired with the model list)
- `LLM_CLIENT_MODELS`: Model names, space-separated
- `TEST_DATA_FILE`: Input JSONL path
- `OUTPUT_FILE`: Output file when `ROLLOUT_NUM=1`
- `OUTPUT_DIR`: Output directory when `ROLLOUT_NUM>1` (e.g. `rollout_01.jsonl`, …)
- `ROLLOUT_NUM`: Number of rollouts per sample
- `RESUME_FROM_FILE`: Resume checkpoint file (may be empty)
- `AVAILABLE_TOOLS`: Enabled tools, space-separated
- `TASK_TYPE`: Whether to treat input as text-only; default `input_only`
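Since `LLM_CLIENT_URLS` and `LLM_CLIENT_MODELS` are paired positionally, the two space-separated lists must have the same length. A quick check along these lines (the helper is illustrative, not part of the repo):

```python
def pair_endpoints(urls: str, models: str):
    """Zip space-separated LLM_CLIENT_URLS / LLM_CLIENT_MODELS values."""
    url_list, model_list = urls.split(), models.split()
    if len(url_list) != len(model_list):
        raise ValueError("LLM_CLIENT_URLS and LLM_CLIENT_MODELS must pair up")
    return list(zip(url_list, model_list))
```

For example, two vLLM nodes serving the same model would pair as `pair_endpoints("http://node1:8000/v1 http://node2:8000/v1", "my-model my-model")` (placeholder hostnames and model name).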
Inference control
- `MAX_ROUNDS`: Max rounds per query
- `CONCURRENCY_WORKERS`: Number of concurrent workers
- `SAVE_BATCH_SIZE`: Flush results to disk every N samples
- `TEMPERATURE`: Sampling temperature
- `TOP_P`: Top-p (included in `run_batch_inference_demo.sh`)
- `EXTRA_PAYLOAD`: Extra model payload (JSON string; included in `run_batch_inference_demo.sh`)
- `TIMEOUT_FOR_ONE_QUERY`: Per-query timeout (seconds)
- `LLM_API_RETRY_TIMES`: Retries after an LLM failure (not counting the first attempt)
- `SYSTEM_PROMPT`: Custom system prompt; empty uses the built-in default
- `SYSTEM_FORMAT`: Platform format (mainly in `run_batch_inference_online_demo.sh`)
Context truncation
- `DISCARD_ALL_MODE`: Enable discard-all (`true`/`false`)
- `MODEL_MAX_CONTEXT_TOKENS`: Model max context length
- `DISCARD_RATIO`: Threshold ratio that triggers discard
- `TOKENIZER_PATH`: Path to the tokenizer used for token counting
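The interaction of these parameters can be sketched as a threshold check. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the real truncation logic lives in the inference code (token counting itself would use the tokenizer at `TOKENIZER_PATH`):

```python
def should_discard(num_tokens: int, max_context: int, discard_ratio: float) -> bool:
    """Trigger discard-all once the running token count crosses the threshold.

    `max_context` plays the role of MODEL_MAX_CONTEXT_TOKENS and
    `discard_ratio` the role of DISCARD_RATIO from the script parameters.
    """
    return num_tokens >= int(max_context * discard_ratio)
```

So with a 10k-token context and a ratio of 0.9, truncation would kick in at 9,000 counted tokens.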
Logging
- `LOG_LABEL`: Log label; directories are shaped `logs/YYYY_MM_DD_<LOG_LABEL>/`
- `LOG_FILE`: Script log file under `run_logs/*.log`
- `LOGGING_ROOT`: Log root (set in `run_batch_inference_demo.sh`; may be empty)
SYSTEM_FORMAT values
SYSTEM_FORMAT selects platform-specific handling via keyword branches.
- `deep_research`: Local deep-research format (vLLM deployment)
- `azure`: Azure OpenAI
- `aihubmix`: AIHubMix (OpenAI-compatible)
- `aihubmix_claude`: AIHubMix Claude format
- `aihubmix_glm`: AIHubMix GLM format
- `volcano`: Volcano Engine
- `aliyun`: Alibaba Cloud Bailian format
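A minimal sketch of what keyword-branch dispatch on `SYSTEM_FORMAT` might look like. The function name and return labels are illustrative, not the repo's actual handlers; note that more specific keywords must be matched first:

```python
def resolve_format(system_format: str) -> str:
    """Map a SYSTEM_FORMAT value to a platform family by keyword."""
    fmt = system_format.lower()
    # Check the more specific aihubmix variants first, so aihubmix_claude
    # does not fall through to the generic aihubmix branch.
    if "aihubmix_claude" in fmt:
        return "claude"
    if "aihubmix_glm" in fmt:
        return "glm"
    if "aihubmix" in fmt:
        return "openai_compatible"
    if "azure" in fmt:
        return "azure"
    if "volcano" in fmt:
        return "volcano"
    if "aliyun" in fmt:
        return "bailian"
    return "deep_research"  # local vLLM default
```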
Currently available tools (9)
- `wide_search`: General web search via Serp; multiple queries in one round
- `scholar_search`: Google Scholar academic search (plus web results)
- `image_search`: Image search; multiple queries supported
- `wide_visit`: Visit pages and summarize toward a goal
- `file_wide_parse`: Parse local/remote files (PDF, DOCX, MD, CSV, etc.)
- `execute_code`: Run Python code
- `ask_question_about_image`: Image understanding and Q&A
- `ask_question_about_video`: Video understanding and Q&A
- `bash`: Run shell commands
Tool schemas are defined in `DEEPRESEARCH_SYSTEM_PROMPT` in `utils/prompts.py`.
Outputs and logs
Output JSONL fields
Each line written by `run_batch_inference.py` contains:
- `time_stamp`: Write time for that row (`YYYY-MM-DD HH:MM:SS`)
- `query_id`: Batch-level query id (hash of `question`)
- `query`: This row's `question` text
- `result`: Detailed result object for one segment (from `run_single_inference.py`)
- `status`: `success` / `timeout` / `error`
- `discard_segments`: Segments truncated by discard-all and summarized (excluding the final segment)
- `elapsed_sec`: Total seconds for this rollout of the query
- `rollout_idx`: Rollout index (1-based)
- `src`: Full original input line (often includes `id`, `question`, `file_path`, `skills`, etc.)
- `segment_idx`: Current segment index (1-based)
- `segment_total`: Total segments for this query; `0` if there is no valid `result`
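With those fields, a short post-processing pass can tally outcomes across a run. The reader below is illustrative (only the `status` field name is taken from the list above):

```python
import json
from collections import Counter

def summarize_results(path: str) -> Counter:
    """Count rows by `status` in a run_batch_inference.py output JSONL."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            counts[json.loads(line).get("status", "unknown")] += 1
    return counts
```

Running it after a batch gives a quick success/timeout/error breakdown without opening the file by hand.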
Common fields inside `result` (from `run_single_inference.py`):
- `query_id`: Single-run instance id (includes a time suffix)
- `tools`: Enabled tool schemas (string form)
- `messages`: Messages for model reasoning and tool interaction
- `final_answer`: Answer text for this segment
- `transcript`: Fuller trajectory (including tool returns)
- `rounds`: Rounds executed in this segment
- `stopped_reason`: Why it stopped (e.g. `no_tool_calls`, `discard_all_01`, `discard_all_final`, `max_rounds_exceeded`)
- `error`: Present only on failure
Log directories
Default layout when `LOGGING_ROOT` is empty:
logs/
└── YYYY_MM_DD_<LOG_LABEL>/
    ├── collect.log
    └── <query_id>/
        ├── run.log
        └── result.json
Tool tests
Run the tool test script:
python test_all_tools.py
This exercises all registered tools and checks that basic behavior works.