---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B-Base
---
ToolOrchestra (agentic/ToolOrchestra/)
Our code is hosted at: https://github.com/LMIS-ORG/slime-agentic/tree/main
This project reproduces the core idea of ToolOrchestra: an Orchestrator-Expert multi-agent framework trained with reinforcement learning (RL). A central Orchestrator LLM learns, through multi-turn tool calls, to route each task to the most suitable specialized expert model and the corresponding tools. GRPO (Group Relative Policy Optimization) is applied to the Orchestrator's decision trajectory, so the Orchestrator improves its tool-use and routing abilities without manually annotated intermediate steps.
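As a rough illustration of the training signal, the sketch below computes GRPO-style group-relative advantages for several rollouts of the same question, then a PPO-style clipped loss averaged only over tokens with loss_mask=1 (the Orchestrator's own output, following the loss-mask convention in the architecture diagram). The function names, the 0.2 clip range, the correctness-based reward, and the omission of GRPO's KL penalty are simplifying assumptions for illustration, not details taken from the slime-agentic code.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each rollout's reward normalized within its group.

    All rewards are assumed to come from rollouts of the same question.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


def masked_clipped_loss(new_logps, old_logps, loss_mask, advantage, clip_eps=0.2):
    """PPO-style clipped surrogate, averaged over tokens with loss_mask == 1 only.

    The same trajectory-level advantage is applied to every Orchestrator token;
    tool and expert outputs (loss_mask == 0) are skipped. No KL penalty here.
    """
    total, count = 0.0, 0
    for new_lp, old_lp, m in zip(new_logps, old_logps, loss_mask):
        if m == 0:
            continue  # tokens the policy did not generate carry no gradient
        ratio = math.exp(new_lp - old_lp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += -min(ratio * advantage, clipped * advantage)
        count += 1
    return total / max(count, 1)


# Four rollouts of one question, rewarded 1.0 if the final answer is correct (assumed reward).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])  # approximately [1, -1, 1, -1]
```

Normalizing within the group means no learned value model is needed: a rollout is reinforced only to the extent that it beats its siblings on the same question.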
Architecture
Input question
  │
  ▼
Orchestrator LLM → Decide which tool to call (loss_mask=1)
  │
  ├──► for turn in range(max_turns):
  │      │
  │      ├─ parse_tool_call() → Parse <tool_call> from model output
  │      │
  │      ├─ tool call ──────► Call retrieval / external tool (loss_mask=0)
  │      │    └─ FAISS retrieval service (port 8000)
  │      │
  │      ├─ call_expert ────► Expert LLM routing (loss_mask=0)
  │      │    └─ specialist models on separate ports
  │      │
  │      └─ answer ─────────► Final answer → stop loop
  │
  ▼
GenerationOutput
  - token_ids + log_probs (all turns concatenated)
  - loss_mask: Orchestrator output = 1 / tool result = 0
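The rollout loop in the diagram can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `generate`, `tokenize`, `rollout`, the `"search"` and `"answer"` tool names, the `<tool_response>` wrapper, and the JSON-in-`<tool_call>` format (modeled on Qwen3's chat template) are hypothetical stand-ins. Only `parse_tool_call`, `max_turns`, `GenerationOutput`, and the loss-mask convention come from the diagram; a `call_expert` entry would be registered in `tools` the same way as `"search"`.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed Qwen3-style markup: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[dict]:
    """Return the first tool call in `text` as a dict, or None if absent or malformed."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return call if isinstance(call, dict) and "name" in call else None


@dataclass
class GenerationOutput:
    token_ids: list = field(default_factory=list)
    log_probs: list = field(default_factory=list)
    loss_mask: list = field(default_factory=list)
    answer: Optional[str] = None


def rollout(question, generate, tools, tokenize, max_turns=4):
    """Hypothetical multi-turn Orchestrator rollout.

    generate(context) -> (text, token_ids, log_probs) for one Orchestrator turn.
    tools: tool name -> function(arguments) -> str (e.g. retrieval, call_expert).
    tokenize(text) -> token ids, used for tool results the policy did not sample.
    """
    out = GenerationOutput()
    context = question
    for _ in range(max_turns):
        text, ids, lps = generate(context)
        out.token_ids += ids
        out.log_probs += lps
        out.loss_mask += [1] * len(ids)  # Orchestrator tokens: trained
        context += text

        call = parse_tool_call(text)
        if call is None:
            break  # assumption: a turn without a tool call ends the episode
        if call["name"] == "answer":  # assumed name for the final-answer action
            out.answer = call.get("arguments", {}).get("content")
            break

        tool = tools.get(call["name"])
        result = tool(call.get("arguments", {})) if tool else f"unknown tool {call['name']}"
        result_text = f"<tool_response>{result}</tool_response>"
        result_ids = tokenize(result_text)
        out.token_ids += result_ids
        out.log_probs += [0.0] * len(result_ids)  # placeholder; not sampled by the policy
        out.loss_mask += [0] * len(result_ids)    # tool / expert output: masked out
        context += result_text
    return out


# Usage with stubs: the scripted "model" searches once, then answers.
scripted = iter([
    '<tool_call>{"name": "search", "arguments": {"query": "capital of France"}}</tool_call>',
    '<tool_call>{"name": "answer", "arguments": {"content": "Paris"}}</tool_call>',
])


def fake_tokenize(text):
    return list(range(len(text.split())))  # one fake id per whitespace-separated word


def fake_generate(context):
    text = next(scripted)
    ids = fake_tokenize(text)
    return text, ids, [-0.5] * len(ids)


result = rollout("Q: What is the capital of France?", fake_generate,
                 {"search": lambda args: "Paris is the capital of France."},
                 fake_tokenize)
print(result.answer)  # Paris
```

Because every turn is appended to one flat sequence, `token_ids`, `log_probs`, and `loss_mask` stay aligned position by position, which is what lets the trainer apply the policy-gradient loss to Orchestrator tokens only.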
Results
| Model | Dataset | Baseline (Qwen3-8B) | ToolOrchestra (Ours) | Improvement (absolute) |
|---|---|---|---|---|
| Qwen3-8B | τ²-Bench | 0.278 | 0.388 | +0.110 |
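The Improvement column is the absolute gain over the baseline on τ²-Bench; expressed relative to the baseline score, the same result corresponds to roughly a 40% increase:

```latex
\Delta_{\text{abs}} = 0.388 - 0.278 = 0.110, \qquad
\Delta_{\text{rel}} = \frac{0.110}{0.278} \approx 0.396 \;(\approx 39.6\%)
```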