Multi-Step Workflow Architecture
For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.
Why Multi-Step?
Single-prompt tool calling fails when:
- There are 5+ tools and the LLM gets confused about which to use
- The task requires sequential operations (discover → configure → validate)
- You need to enforce that certain tools are called before others
- Validation must happen before the LLM returns a final answer
Architecture Overview
┌─────────────────┐     ┌─────────────────────┐     ┌──────────────────┐
│ Step 1:         │     │ Step 2:             │     │ Step 3:          │
│ Discovery       │────>│ Configuration       │────>│ Assembly         │
│                 │     │                     │     │                  │
│ Tools:          │     │ Tools:              │     │ Tools:           │
│ - search        │     │ - get_details       │     │ - assemble       │
│ - list          │     │ - validate_minimal  │     │ - validate_full  │
│ - get_info      │     │ - validate_full     │     │ - deploy         │
│                 │     │                     │     │                  │
│ Output:         │     │ Output:             │     │ Output:          │
│ What to use     │     │ How to configure    │     │ Final result     │
└─────────────────┘     └─────────────────────┘     └──────────────────┘
Key Patterns
1. Isolated Tool Sets
Each step only sees relevant tools:
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
Why: Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
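A minimal sketch of how step-scoped registration might work. The `ToolRegistry` class below is a hypothetical stand-in, not the implementation in examples/multi_step_orchestrator.py; only the `register(name=..., function=..., steps=[...])` call shape comes from the example above, and `tools_for_step` and `execute` are assumed helper names.

```python
from typing import Any, Callable, Dict, List


class ToolRegistry:
    """Hypothetical registry that exposes each tool only to selected steps."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._steps: Dict[str, List[int]] = {}

    def register(self, name: str, function: Callable[..., Any], steps: List[int]) -> None:
        self._functions[name] = function
        self._steps[name] = list(steps)

    def tools_for_step(self, step: int) -> List[str]:
        # Only these names would be advertised to the LLM for this step.
        return sorted(n for n, s in self._steps.items() if step in s)

    def execute(self, step: int, name: str, **arguments: Any) -> Any:
        # Refuse calls to tools that are not visible in the current step.
        if step not in self._steps.get(name, []):
            raise KeyError(f"tool {name!r} is not available in step {step}")
        return self._functions[name](**arguments)


registry = ToolRegistry()
registry.register(name="search", function=lambda q: [f"hit for {q}"], steps=[1])
registry.register(name="get_details", function=lambda item: {"item": item}, steps=[1, 2])
registry.register(name="validate", function=lambda config: {"valid": True}, steps=[2])

step_1_tools = registry.tools_for_step(1)  # names visible during discovery
step_2_tools = registry.tools_for_step(2)  # names visible during configuration
```

Rejecting out-of-step calls in `execute` is a design choice in this sketch: it keeps the isolation intact even if the LLM names a tool it saw in an earlier step.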
2. Pydantic Schema Validation
Every LLM response is validated against a Pydantic schema:
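import json
from typing import Dict, List, Optional
from pydantic import BaseModel

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    tool_calls: Optional[List[ToolCall]] = None  # ToolCall is defined elsewhere

# Validate structure; raises pydantic.ValidationError if the structure does not match
schema_class(**json.loads(llm_output))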
Why: The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.
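One way to wrap this check so that both malformed JSON and structural mismatches turn into feedback text rather than exceptions. This is a sketch under assumptions: the `ToolCall` fields (`name`, `arguments`) are inferred from the tool call format used in this guide, and `parse_step_response` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

from pydantic import BaseModel, ValidationError


class ToolCall(BaseModel):
    # Assumed shape: a tool name plus keyword arguments.
    name: str
    arguments: Dict[str, Any] = {}


class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict[str, Any]] = None
    tool_calls: Optional[List[ToolCall]] = None


def parse_step_response(llm_output: str) -> Tuple[Optional[StepResponse], Optional[str]]:
    """Return (response, None) on success or (None, error_text) on failure."""
    try:
        data = json.loads(llm_output)
    except json.JSONDecodeError as exc:
        return None, f"Output is not valid JSON: {exc}"
    if not isinstance(data, dict):
        return None, "Output must be a JSON object"
    try:
        return StepResponse(**data), None
    except ValidationError as exc:
        # Syntactically valid JSON with wrong types or missing fields lands here.
        return None, f"Output does not match the schema: {exc}"


ok, ok_error = parse_step_response(
    '{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'
)
bad, bad_error = parse_step_response('{"tool_calls": [{"arguments": {"q": "test"}}]}')
```

Returning the error as text lets the orchestrator send it straight back to the LLM on the next turn.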
3. Dual-Purpose Response Schema
The same schema handles both tool call requests and final responses:
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}
# Final response
{"success": true, "result": {...}, "reasoning": "..."}
Why: The LLM doesn't need to learn two different output formats. The orchestrator checks for tool_calls first, and treats anything else as a final response.
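The dispatch rule described above can be sketched in a few lines. `classify_response` is a hypothetical name, and the sketch works on plain dictionaries rather than the Pydantic schema to stay self-contained.

```python
import json
from typing import Any, Dict, List, Tuple


def classify_response(llm_output: str) -> Tuple[str, Any]:
    """Return ("tool_calls", calls) or ("final", payload) for one LLM turn."""
    payload: Dict[str, Any] = json.loads(llm_output)
    calls: List[Dict[str, Any]] = payload.get("tool_calls") or []
    if calls:
        # Tool requests take priority over any other fields in the payload.
        return "tool_calls", calls
    return "final", payload


kind_a, body_a = classify_response(
    '{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}'
)
kind_b, body_b = classify_response(
    '{"success": true, "result": {"id": 1}, "reasoning": "done"}'
)
```

Treating an empty or missing `tool_calls` as a final response is an assumption of this sketch; a stricter orchestrator could also require `success` to be present.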
4. Validation Enforcement
The orchestrator requires certain tools to be called and pass before accepting a final response:
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
If the LLM returns a final response with "success" before every validation tool has passed, the orchestrator rejects it and replies with a message like:
You returned a final response but validations have not all passed.
Validation Errors Found:
1. Property: channel
Message: Required field 'channel' is missing
Please fix the errors and call the validation tools again.
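A sketch of the bookkeeping this enforcement needs, assuming each validation tool returns a dictionary with `valid` and `errors` keys. `ValidationTracker` and its methods are hypothetical names, not part of the referenced example code.

```python
from typing import Dict, List, Optional, Set


class ValidationTracker:
    """Hypothetical helper: remembers the latest outcome of each required validation tool."""

    def __init__(self, validation_tools: List[str]) -> None:
        self.required = list(validation_tools)
        self.passed: Set[str] = set()
        self.errors: Dict[str, List[Dict[str, str]]] = {}

    def record(self, tool_name: str, result: Dict) -> None:
        if tool_name not in self.required:
            return
        if result.get("valid"):
            self.passed.add(tool_name)
            self.errors.pop(tool_name, None)
        else:
            self.passed.discard(tool_name)
            self.errors[tool_name] = result.get("errors", [])

    def feedback(self) -> Optional[str]:
        """Return None if a final response may be accepted, else a correction message."""
        if all(name in self.passed for name in self.required):
            return None
        lines = [
            "You returned a final response but validations have not all passed.",
            "Validation Errors Found:",
        ]
        all_errors = [e for errs in self.errors.values() for e in errs]
        for i, err in enumerate(all_errors, start=1):
            lines.append(f"{i}. Property: {err.get('property')}")
            lines.append(f"   Message: {err.get('message')}")
        lines.append("Please fix the errors and call the validation tools again.")
        return "\n".join(lines)


tracker = ValidationTracker(["validate_minimal", "validate_full"])
tracker.record("validate_minimal", {
    "valid": False,
    "errors": [{"property": "channel", "message": "Required field 'channel' is missing"}],
})
rejected = tracker.feedback()   # correction message, since nothing has passed yet
tracker.record("validate_minimal", {"valid": True})
tracker.record("validate_full", {"valid": True})
accepted = tracker.feedback()   # None, so the final response can be accepted
```

A validation tool that was never called counts as not passed here, so the LLM cannot skip validation entirely.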
5. Structured Error Feedback
When a tool call fails, the error is formatted with enough detail for the LLM to fix it:
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
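A formatter for this feedback block might look like the following. `format_tool_response` is a hypothetical helper, and the `SUCCESS` status value is an assumption, since only `ERROR` appears in the example above.

```python
import json
from typing import Any


def format_tool_response(tool_name: str, result: Any, ok: bool) -> str:
    """Wrap a tool result in a tagged block; append a retry hint on failure."""
    status = "SUCCESS" if ok else "ERROR"  # "SUCCESS" is an assumed label
    lines = [
        "<tool_response>",
        f"<tool_name>{tool_name}</tool_name>",
        f"<status>{status}</status>",
        f"<result>{json.dumps(result)}</result>",
        "</tool_response>",
    ]
    if not ok:
        lines.append(
            "IMPORTANT: This tool call failed. Read the error, "
            "understand the issue, fix and retry."
        )
    return "\n".join(lines)


failure = format_tool_response(
    "validate",
    {"valid": False, "errors": [{"property": "name", "message": "required"}]},
    ok=False,
)
success = format_tool_response("validate", {"valid": True}, ok=True)
```

Serializing the result with `json.dumps` keeps the property names and messages intact, so the LLM sees the same fields the validator produced.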
6. Workflow Order Enforcement
Without explicit instructions, the LLM tends to restart from scratch when validation fails, repeating information-gathering calls it has already made. The prompt must enforce the order explicitly:
Follow this workflow in order. Do NOT skip steps or go back.
1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
Iteration Budget
Each step needs multiple LLM turns:
Turn 1: Call get_details for component A (tool call)
Turn 2: Call get_details for component B (tool call)
Turn 3: Configure both components (tool call to validate)
Turn 4: Validation fails → fix errors (tool call to re-validate)
Turn 5: Validation passes → return result (final response)
Minimum: 5 iterations per step, which covers the sequence above with a single validation failure. Recommended: 10. For complex steps: 15.
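The effect of the budget can be illustrated with a small turn loop. This is a sketch only: `run_turns` is a hypothetical name, and the scripted function stands in for a real LLM call.

```python
from typing import Callable, Dict, Optional


def run_turns(next_response: Callable[[int], Dict], max_iterations: int = 10) -> Optional[Dict]:
    """Stop at the first final response, or return None when the budget runs out."""
    for turn in range(1, max_iterations + 1):
        response = next_response(turn)
        if response.get("tool_calls"):
            # A real orchestrator would execute the tools and feed results back.
            continue
        return response
    return None  # budget exhausted without a final response


def scripted(turn: int) -> Dict:
    """Stand-in for the LLM: four tool-call turns, then a final response."""
    if turn <= 4:
        return {"tool_calls": [{"name": "get_details", "arguments": {}}]}
    return {"success": True, "result": {"turns_used": turn}}


enough = run_turns(scripted, max_iterations=5)   # final response arrives on turn 5
too_few = run_turns(scripted, max_iterations=4)  # budget ends before the final response
```

With a budget of 4 the step fails even though the LLM behaved correctly on every turn, which is why the budget should leave headroom for at least one fix-and-revalidate cycle.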
Retry Logic
Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries (the loop ended without a break)
    raise RuntimeError("step failed after all retries")
Recommended: 3 retries per step.
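A self-contained sketch of a workflow runner built on this retry pattern. The signature of `run_workflow` and the convention that each step receives the previous step's result are assumptions for illustration, not the API of the referenced example file.

```python
from typing import Callable, Dict, List, Optional

StepFn = Callable[[Dict], Optional[Dict]]


def run_workflow(steps: List[StepFn], max_step_retries: int = 3) -> Dict:
    """Run steps in order; each step gets the previous step's result as context."""
    context: Dict = {}
    for index, step in enumerate(steps, start=1):
        for _attempt in range(max_step_retries):
            result = step(context)
            if result:
                context = result
                break
        else:
            # The for loop finished without a break: every attempt failed.
            raise RuntimeError(f"step {index} failed after {max_step_retries} attempts")
    return context


attempts = {"count": 0}


def flaky_discovery(context: Dict) -> Optional[Dict]:
    """Simulated step that fails twice before succeeding."""
    attempts["count"] += 1
    return {"components": ["A", "B"]} if attempts["count"] >= 3 else None


def configure(context: Dict) -> Optional[Dict]:
    return {**context, "configured": True}


final = run_workflow([flaky_discovery, configure], max_step_retries=3)
```

Raising on exhaustion stops later steps from running on missing input; a caller that prefers partial results could catch the exception and inspect what completed.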
Implementation
See examples/multi_step_orchestrator.py for complete working code with:
- VLLMClient: Simple vLLM API client
- ToolRegistry: Step-based tool registration and execution
- run_step(): Single step execution with validation enforcement
- run_workflow(): Multi-step orchestration with retry logic
When to Use Multi-Step
| Scenario | Single Prompt | Multi-Step |
|---|---|---|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |