

S1-DeepResearch Inference Framework

Key Features

  • Multiple LLM clients: Supports vLLM, Azure OpenAI, AIHubMix, and other LLM services
  • Rich toolset: Nine tools covering search, web browsing, file parsing, code execution, multimodal Q&A, bash, and more
  • Batch inference: Concurrent batch inference with resume-from-checkpoint and periodic result saving
  • Single-query inference: Detailed debugging and testing for individual queries
  • Load balancing: Multi-node LLM load balancing and consistent scheduling
  • Detailed logging: Per-query log files for easier troubleshooting and analysis

Project Layout (current)

./
β”œβ”€β”€ run_batch_inference_demo.sh          # Local / vLLM script template
β”œβ”€β”€ run_batch_inference_online_demo.sh   # Online platform script template
β”œβ”€β”€ inference/
β”‚   β”œβ”€β”€ run_batch_inference.py
β”‚   └── run_single_inference.py
β”œβ”€β”€ server/
β”œβ”€β”€ tool_kits/
β”œβ”€β”€ utils/
β”‚   └── config/
β”‚       β”œβ”€β”€ config.example.json
β”‚       └── README.md
β”œβ”€β”€ models/tokenizer/
└── test_all_tools.py

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Configuration (JSON or environment variables recommended)

Precedence: custom JSON > environment variables > defaults in utils/config.py.
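The lookup order can be sketched as follows (an illustrative sketch only; the actual logic lives in utils/config.py, and the key names here are examples from config.example.json):

```python
import json
import os

# Illustrative sketch of the precedence: custom JSON > environment variables
# > built-in defaults. The real implementation lives in utils/config.py.
DEFAULTS = {"CLIENT_TIMEOUT": 600, "TOOLS_SERVER_BASE_ENDPOINT_URL": ""}

def resolve_config(json_path=None):
    config = dict(DEFAULTS)                      # 3. lowest priority: defaults
    for key in config:                           # 2. environment variables
        if key in os.environ:
            config[key] = os.environ[key]
    if json_path and os.path.exists(json_path):  # 1. highest priority: custom JSON
        with open(json_path, "r", encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```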

Typical workflow:

cp utils/config/config.example.json utils/config/config.local.json

Edit config.local.json as needed, for example:

  • TOOLS_SERVER_BASE_ENDPOINT_URL
  • AIHUBMIX_KEY / AZURE_KEY / VOLCANO_KEY / ALIYUN_KEY
  • CLIENT_TIMEOUT

You can also override via environment variables, for example:

export S1_DR_CONFIG_JSON="utils/config/config.local.json"

3. Prepare input JSONL

Each line is one JSON object. At minimum include question; usually also id and file_path.

3.1 JSONL example (file inputs)

{"id":"query_001","question":"When Alibaba was founded, what was the average age of the founders whose surnames are Ma, Cai, or Zhang among the 18 co-founders? Round to one decimal place.","file_path":[]}
{"id":"query_002","question":"According to the manual, for DJI's heaviest AIR-series drone by takeoff weight, how many mAh of battery energy remain after flying half a marathon? (Note 1: assume calm air; minimum energy use is flying at 60% of max speed. Note 2: power draw can be converted from max flight time.)","file_path":["/path/to/file.pdf"]}
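A small script like the following can generate an input file in this shape (illustrative; the output path and question are just examples):

```python
import json

# Build a minimal input JSONL for run_batch_inference.py.
# "question" is required; "id" and "file_path" are usually included as well.
rows = [
    {"id": "query_001", "question": "What is the capital of France?", "file_path": []},
]

with open("test_data.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```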

3.2 JSONL example (using Skills)

{"id":"query_003","question":"Use pymatgen to build a simple TiO2 surface slab. Please generate a common low-index surface, report the Miller index, slab thickness, and vacuum size, and briefly describe the resulting surface structure.","skills":[{"name": "skill_name1", "description": "description1", "skill_path": "skill_path1"}, {"name": "skill_name2", "description": "description2", "skill_path": "skill_path2"}]}

Recommended workflow: copy a script, then run

A. Local / vLLM (run_batch_inference_demo.sh)

cp run_batch_inference_demo.sh run_batch_local.sh
mkdir -p run_logs
# Edit parameters inside run_batch_local.sh
bash run_batch_local.sh

Notes:

  • The script starts Python with nohup ... & and prints the background PID.
  • Tail logs: tail -f run_logs/run.log

B. Online platform (run_batch_inference_online_demo.sh)

cp run_batch_inference_online_demo.sh run_batch_online.sh
mkdir -p run_logs
# Edit parameters inside run_batch_online.sh
bash run_batch_online.sh

Notes:

  • Focus on: LLM_CLIENT_URLS, LLM_CLIENT_MODELS, SYSTEM_FORMAT
  • Tail logs: tail -f run_logs/run_batch_*.log

Script parameters

Basic

  • LLM_CLIENT_URLS: Model service URLs, space-separated (paired with the model list)
  • LLM_CLIENT_MODELS: Model names, space-separated
  • TEST_DATA_FILE: Input JSONL path
  • OUTPUT_FILE: Output file when ROLLOUT_NUM=1
  • OUTPUT_DIR: Output directory when ROLLOUT_NUM>1 (e.g. rollout_01.jsonl, …)
  • ROLLOUT_NUM: Number of rollouts per sample
  • RESUME_FROM_FILE: Resume checkpoint file (may be empty)
  • AVAILABLE_TOOLS: Enabled tools, space-separated
  • TASK_TYPE: Whether to treat input as text-only; default input_only

Inference control

  • MAX_ROUNDS: Max rounds per query
  • CONCURRENCY_WORKERS: Number of concurrent workers
  • SAVE_BATCH_SIZE: Flush results to disk every N samples
  • TEMPERATURE: Sampling temperature
  • TOP_P: Top-p (included in run_batch_inference_demo.sh)
  • EXTRA_PAYLOAD: Extra model payload (JSON string; included in run_batch_inference_demo.sh)
  • TIMEOUT_FOR_ONE_QUERY: Per-query timeout (seconds)
  • LLM_API_RETRY_TIMES: Retries after LLM failure (not counting the first attempt)
  • SYSTEM_PROMPT: Custom system prompt; empty uses the built-in default
  • SYSTEM_FORMAT: Platform format (mainly in run_batch_inference_online_demo.sh)

Context truncation

  • DISCARD_ALL_MODE: Enable discard-all (true/false)
  • MODEL_MAX_CONTEXT_TOKENS: Model max context length
  • DISCARD_RATIO: Threshold ratio to trigger discard
  • TOKENIZER_PATH: Path to tokenizer used for token counting

Logging

  • LOG_LABEL: Log label; directory shape logs/YYYY_MM_DD_<LOG_LABEL>/
  • LOG_FILE: Script log file under run_logs/*.log
  • LOGGING_ROOT: Log root (set in run_batch_inference_demo.sh; may be empty)

SYSTEM_FORMAT values

SYSTEM_FORMAT selects platform-specific handling via keyword branches.

  • deep_research: Local deep-research format (vLLM deployment)
  • azure: Azure OpenAI
  • aihubmix: AIHubMix (OpenAI-compatible)
  • aihubmix_claude: AIHubMix Claude format
  • aihubmix_glm: AIHubMix GLM format
  • volcano: Volcano Engine
  • aliyun: Alibaba Cloud Bailian format

Currently available tools (9)

  • wide_search: General web search via Serp; multiple queries in one round
  • scholar_search: Google Scholar academic search (+ web results)
  • image_search: Image search; multiple queries supported
  • wide_visit: Visit pages and summarize toward a goal
  • file_wide_parse: Parse local/remote files (PDF, DOCX, MD, CSV, etc.)
  • execute_code: Run Python code
  • ask_question_about_image: Image understanding and Q&A
  • ask_question_about_video: Video understanding and Q&A
  • bash: Run shell commands

Tool schemas are defined in DEEPRESEARCH_SYSTEM_PROMPT in utils/prompts.py.

Outputs and logs

Output JSONL fields

Each line written by run_batch_inference.py contains:

  • time_stamp: Write time for that row (YYYY-MM-DD HH:MM:SS).
  • query_id: Batch-level query id (hash of question).
  • query: This row’s question text.
  • result: Detailed result object for one segment (from run_single_inference.py).
  • status: success / timeout / error.
  • discard_segments: Segments truncated by discard-all and summarized (excluding the final segment).
  • elapsed_sec: Total seconds for this rollout of the query.
  • rollout_idx: Rollout index (1-based).
  • src: Full original input line (often includes id, question, file_path, skills, etc.).
  • segment_idx: Current segment index (1-based).
  • segment_total: Total segments for this query; 0 if there is no valid result.

Common fields inside result (run_single_inference.py):

  • query_id: Single-run instance id (includes a time suffix).
  • tools: Enabled tool schemas (string form).
  • messages: Messages for model reasoning and tool interaction.
  • final_answer: Answer text for this segment.
  • transcript: Fuller trajectory (including tool returns).
  • rounds: Rounds executed in this segment.
  • stopped_reason: Why it stopped (e.g. no_tool_calls, discard_all_01, discard_all_final, max_rounds_exceeded).
  • error: Present only on failure.

Log directories

Default layout when LOGGING_ROOT is empty:

logs/
└── YYYY_MM_DD_<LOG_LABEL>/
    β”œβ”€β”€ collect.log
    └── <query_id>/
        β”œβ”€β”€ run.log
        └── result.json

Tool tests

Run the tool test script:

python test_all_tools.py

This exercises all registered tools and checks that basic behavior works.