中文 | English
S1-DeepResearch Inference Framework
Key Features
- Multiple LLM clients: Supports vLLM, Azure OpenAI, AIHubMix, and other LLM services
- Rich toolset: Nine tools covering search, web browsing, file parsing, code execution, multimodal Q&A, bash, and more
- Batch inference: Concurrent batch inference with resume-from-checkpoint and periodic result saving
- Single-query inference: Detailed debugging and testing for individual queries
- Load balancing: Multi-node LLM load balancing and consistent scheduling
- Detailed logging: Per-query log files for easier troubleshooting and analysis
Project Layout (current)
./
├── run_batch_inference_demo.sh          # Local / vLLM script template
├── run_batch_inference_online_demo.sh   # Online platform script template
├── inference/
│   ├── run_batch_inference.py
│   └── run_single_inference.py
├── server/
├── tool_kits/
├── utils/
│   └── config/
│       ├── config.example.json
│       └── README.md
├── models/tokenizer/
└── test_all_tools.py
Quick Start
1. Install dependencies
pip install -r requirements.txt
2. Configuration (JSON or environment variables recommended)
Precedence: custom JSON > environment variables > defaults in utils/config.py.
Typical workflow:
cp utils/config/config.example.json utils/config/config.local.json
Edit `config.local.json` as needed, for example:
- `TOOLS_SERVER_BASE_ENDPOINT_URL`
- `AIHUBMIX_KEY` / `AZURE_KEY` / `VOLCANO_KEY` / `ALIYUN_KEY`
- `CLIENT_TIMEOUT`
You can also override via environment variables, for example:
export S1_DR_CONFIG_JSON="utils/config/config.local.json"
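The precedence above (custom JSON > environment variables > defaults) can be sketched as a small loader. This is an illustrative sketch, not the actual `utils/config.py` API; the default keys and the `load_config` helper are assumptions:

```python
import json
import os

# Hypothetical defaults mirroring utils/config.py; the real keys may differ.
DEFAULTS = {"CLIENT_TIMEOUT": 600, "TOOLS_SERVER_BASE_ENDPOINT_URL": ""}

def load_config(env=None):
    """Resolve settings as: custom JSON > environment variables > defaults."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    # Environment variables override the built-in defaults.
    for key in DEFAULTS:
        if key in env:
            config[key] = env[key]
    # The JSON file named by S1_DR_CONFIG_JSON overrides everything else.
    json_path = env.get("S1_DR_CONFIG_JSON")
    if json_path and os.path.exists(json_path):
        with open(json_path, "r", encoding="utf-8") as f:
            config.update(json.load(f))
    return config
```

The point of the ordering is that a checked-in defaults table stays usable out of the box, while a local JSON file always wins for reproducible runs.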
3. Prepare input JSONL
Each line is one JSON object. At minimum include `question`; usually also `id` and `file_path`.
3.1 JSONL example (file inputs)
{"id":"query_001","question":"When Alibaba was founded, what was the average age of the founders whose surnames are Ma, Cai, or Zhang among the 18 co-founders? Round to one decimal place.","file_path":[]}
{"id":"query_002","question":"According to the manual, for DJI's heaviest AIR-series drone by takeoff weight, how many mAh of battery energy remain after flying half a marathon? (Note 1: assume calm air; minimum energy use is flying at 60% of max speed. Note 2: power draw can be converted from max flight time.)","file_path":["/path/to/file.pdf"]}
3.2 JSONL example (using Skills)
{"id":"query_003","question":"Use pymatgen to build a simple TiO2 surface slab. Please generate a common low-index surface, report the Miller index, slab thickness, and vacuum size, and briefly describe the resulting surface structure.","skills":[{"name": "skill_name1", "description": "description1", "skill_path": "skill_path1"}, {"name": "skill_name2", "description": "description2", "skill_path": "skill_path2"}]}
Recommended workflow: copy a script, then run
A. Local / vLLM (run_batch_inference_demo.sh)
cp run_batch_inference_demo.sh run_batch_local.sh
mkdir -p run_logs
# Edit parameters inside run_batch_local.sh
bash run_batch_local.sh
Notes:
- The script starts Python with `nohup ... &` and prints the background PID.
- Tail logs: `tail -f run_logs/run.log`
B. Online platform (run_batch_inference_online_demo.sh)
cp run_batch_inference_online_demo.sh run_batch_online.sh
mkdir -p run_logs
# Edit parameters inside run_batch_online.sh
bash run_batch_online.sh
Notes:
- Focus on: `LLM_CLIENT_URLS`, `LLM_CLIENT_MODELS`, `SYSTEM_FORMAT`
- Tail logs: `tail -f run_logs/run_batch_*.log`
Script parameters
Basic
- `LLM_CLIENT_URLS`: Model service URLs, space-separated (paired with the model list)
- `LLM_CLIENT_MODELS`: Model names, space-separated
- `TEST_DATA_FILE`: Input JSONL path
- `OUTPUT_FILE`: Output file when `ROLLOUT_NUM=1`
- `OUTPUT_DIR`: Output directory when `ROLLOUT_NUM>1` (e.g. `rollout_01.jsonl`, …)
- `ROLLOUT_NUM`: Number of rollouts per sample
- `RESUME_FROM_FILE`: Resume checkpoint file (may be empty)
- `AVAILABLE_TOOLS`: Enabled tools, space-separated
- `TASK_TYPE`: Whether to treat input as text-only; default `input_only`
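Since `LLM_CLIENT_URLS` and `LLM_CLIENT_MODELS` are paired positionally, the two space-separated lists must have the same length. A quick check along these lines (the helper is illustrative, not part of the repo):

```python
def pair_endpoints(urls: str, models: str):
    """Zip space-separated LLM_CLIENT_URLS / LLM_CLIENT_MODELS values."""
    url_list, model_list = urls.split(), models.split()
    if len(url_list) != len(model_list):
        raise ValueError("LLM_CLIENT_URLS and LLM_CLIENT_MODELS must pair up")
    return list(zip(url_list, model_list))
```

For example, two vLLM nodes serving the same model would pair as `pair_endpoints("http://node1:8000/v1 http://node2:8000/v1", "my-model my-model")` (placeholder hostnames and model name).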
Inference control
- `MAX_ROUNDS`: Max rounds per query
- `CONCURRENCY_WORKERS`: Number of concurrent workers
- `SAVE_BATCH_SIZE`: Flush results to disk every N samples
- `TEMPERATURE`: Sampling temperature
- `TOP_P`: Top-p (included in `run_batch_inference_demo.sh`)
- `EXTRA_PAYLOAD`: Extra model payload (JSON string; included in `run_batch_inference_demo.sh`)
- `TIMEOUT_FOR_ONE_QUERY`: Per-query timeout (seconds)
- `LLM_API_RETRY_TIMES`: Retries after an LLM failure (not counting the first attempt)
- `SYSTEM_PROMPT`: Custom system prompt; empty uses the built-in default
- `SYSTEM_FORMAT`: Platform format (mainly in `run_batch_inference_online_demo.sh`)
Context truncation
- `DISCARD_ALL_MODE`: Enable discard-all (`true`/`false`)
- `MODEL_MAX_CONTEXT_TOKENS`: Model max context length
- `DISCARD_RATIO`: Threshold ratio that triggers discard
- `TOKENIZER_PATH`: Path to the tokenizer used for token counting
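The interaction of these parameters can be sketched as a threshold check. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the real truncation logic lives in the inference code (token counting itself would use the tokenizer at `TOKENIZER_PATH`):

```python
def should_discard(num_tokens: int, max_context: int, discard_ratio: float) -> bool:
    """Trigger discard-all once the running token count crosses the threshold.

    `max_context` plays the role of MODEL_MAX_CONTEXT_TOKENS and
    `discard_ratio` the role of DISCARD_RATIO from the script parameters.
    """
    return num_tokens >= int(max_context * discard_ratio)
```

So with a 10k-token context and a ratio of 0.9, truncation would kick in at 9,000 counted tokens.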
Logging
- `LOG_LABEL`: Log label; directories are shaped `logs/YYYY_MM_DD_<LOG_LABEL>/`
- `LOG_FILE`: Script log file under `run_logs/*.log`
- `LOGGING_ROOT`: Log root (set in `run_batch_inference_demo.sh`; may be empty)
SYSTEM_FORMAT values
SYSTEM_FORMAT selects platform-specific handling via keyword branches.
- `deep_research`: Local deep-research format (vLLM deployment)
- `azure`: Azure OpenAI
- `aihubmix`: AIHubMix (OpenAI-compatible)
- `aihubmix_claude`: AIHubMix Claude format
- `aihubmix_glm`: AIHubMix GLM format
- `volcano`: Volcano Engine
- `aliyun`: Alibaba Cloud Bailian format
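A minimal sketch of what keyword-branch dispatch on `SYSTEM_FORMAT` might look like. The function name and return labels are illustrative, not the repo's actual handlers; note that more specific keywords must be matched first:

```python
def resolve_format(system_format: str) -> str:
    """Map a SYSTEM_FORMAT value to a platform family by keyword."""
    fmt = system_format.lower()
    # Check the more specific aihubmix variants first, so aihubmix_claude
    # does not fall through to the generic aihubmix branch.
    if "aihubmix_claude" in fmt:
        return "claude"
    if "aihubmix_glm" in fmt:
        return "glm"
    if "aihubmix" in fmt:
        return "openai_compatible"
    if "azure" in fmt:
        return "azure"
    if "volcano" in fmt:
        return "volcano"
    if "aliyun" in fmt:
        return "bailian"
    return "deep_research"  # local vLLM default
```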
Currently available tools (9)
- `wide_search`: General web search via Serp; multiple queries in one round
- `scholar_search`: Google Scholar academic search (plus web results)
- `image_search`: Image search; multiple queries supported
- `wide_visit`: Visit pages and summarize toward a goal
- `file_wide_parse`: Parse local/remote files (PDF, DOCX, MD, CSV, etc.)
- `execute_code`: Run Python code
- `ask_question_about_image`: Image understanding and Q&A
- `ask_question_about_video`: Video understanding and Q&A
- `bash`: Run shell commands
Tool schemas are defined in `DEEPRESEARCH_SYSTEM_PROMPT` in `utils/prompts.py`.
Outputs and logs
Output JSONL fields
Each line written by `run_batch_inference.py` contains:
- `time_stamp`: Write time for that row (`YYYY-MM-DD HH:MM:SS`)
- `query_id`: Batch-level query id (hash of `question`)
- `query`: This row's `question` text
- `result`: Detailed result object for one segment (from `run_single_inference.py`)
- `status`: `success` / `timeout` / `error`
- `discard_segments`: Segments truncated by discard-all and summarized (excluding the final segment)
- `elapsed_sec`: Total seconds for this rollout of the query
- `rollout_idx`: Rollout index (1-based)
- `src`: Full original input line (often includes `id`, `question`, `file_path`, `skills`, etc.)
- `segment_idx`: Current segment index (1-based)
- `segment_total`: Total segments for this query; `0` if there is no valid `result`
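With those fields, a short post-processing pass can tally outcomes across a run. The reader below is illustrative (only the `status` field name is taken from the list above):

```python
import json
from collections import Counter

def summarize_results(path: str) -> Counter:
    """Count rows by `status` in a run_batch_inference.py output JSONL."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            counts[json.loads(line).get("status", "unknown")] += 1
    return counts
```

Running it after a batch gives a quick success/timeout/error breakdown without opening the file by hand.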
Common fields inside `result` (from `run_single_inference.py`):
- `query_id`: Single-run instance id (includes a time suffix)
- `tools`: Enabled tool schemas (string form)
- `messages`: Messages for model reasoning and tool interaction
- `final_answer`: Answer text for this segment
- `transcript`: Fuller trajectory (including tool returns)
- `rounds`: Rounds executed in this segment
- `stopped_reason`: Why it stopped (e.g. `no_tool_calls`, `discard_all_01`, `discard_all_final`, `max_rounds_exceeded`)
- `error`: Present only on failure
Log directories
Default layout when `LOGGING_ROOT` is empty:
logs/
└── YYYY_MM_DD_<LOG_LABEL>/
    ├── collect.log
    └── <query_id>/
        ├── run.log
        └── result.json
Tool tests
Run the tool test script:
python test_all_tools.py
This exercises all registered tools and checks that basic behavior works.