vllm-tool-calling-guide/guides/MULTI_STEP_WORKFLOWS.md
Joshua Odmark

Multi-Step Workflow Architecture

For complex tasks, single-prompt tool calling is unreliable. This guide explains the multi-step architecture that makes tool calling work at production quality.

Why Multi-Step?

Single-prompt tool calling fails when:

  • There are 5+ tools and the LLM gets confused about which to use
  • The task requires sequential operations (discover → configure → validate)
  • You need to enforce that certain tools are called before others
  • Validation must happen before the LLM returns a final answer

Architecture Overview

```
┌──────────────────┐     ┌──────────────────────┐     ┌───────────────────┐
│ Step 1:          │     │ Step 2:              │     │ Step 3:           │
│ Discovery        │────>│ Configuration        │────>│ Assembly          │
│                  │     │                      │     │                   │
│ Tools:           │     │ Tools:               │     │ Tools:            │
│ - search         │     │ - get_details        │     │ - assemble        │
│ - list           │     │ - validate_minimal   │     │ - validate_full   │
│ - get_info       │     │ - validate_full      │     │ - deploy          │
│                  │     │                      │     │                   │
│ Output:          │     │ Output:              │     │ Output:           │
│ What to use      │     │ How to configure     │     │ Final result      │
└──────────────────┘     └──────────────────────┘     └───────────────────┘
```
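The three-step flow above can be sketched as a plain driver loop. The `run_workflow` and toy step functions below are illustrative only, not the `run_workflow()` API from examples/multi_step_orchestrator.py:

```python
def run_workflow(steps, context=None):
    """Chain steps in order; each step reads the accumulated context
    and contributes its own output to it."""
    context = dict(context or {})
    for name, step_fn in steps:
        output = step_fn(context)
        if output is None:
            raise RuntimeError(f"Step '{name}' failed")
        context[name] = output
    return context

# Toy three-step pipeline mirroring the diagram
steps = [
    ("discovery",     lambda ctx: {"component": "search"}),
    ("configuration", lambda ctx: {"config": ctx["discovery"]["component"]}),
    ("assembly",      lambda ctx: {"deployed": True}),
]
result = run_workflow(steps)   # context now holds all three step outputs
```

The key property is that later steps see earlier outputs through the shared context, which is what makes the discover-then-configure ordering meaningful.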

Key Patterns

1. Isolated Tool Sets

Each step only sees relevant tools:

```python
registry.register(name="search", function=search_fn, steps=[1])
registry.register(name="get_details", function=details_fn, steps=[1, 2])
registry.register(name="validate", function=validate_fn, steps=[2])
```

Why: Reducing tool count per step dramatically improves LLM accuracy. With 15 tools, the LLM often calls the wrong one. With 3-4 tools per step, it's reliable.
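A minimal sketch of such a step-aware registry (class and method names here are assumptions, not necessarily those in examples/multi_step_orchestrator.py):

```python
class ToolRegistry:
    """Maps tool names to functions and the workflow steps they belong to."""

    def __init__(self):
        self._tools = {}

    def register(self, name, function, steps):
        self._tools[name] = (function, set(steps))

    def tools_for_step(self, step):
        """Return only the tools exposed to the given step."""
        return {name: fn for name, (fn, steps) in self._tools.items()
                if step in steps}

    def execute(self, name, arguments):
        fn, _ = self._tools[name]
        return fn(**arguments)

registry = ToolRegistry()
registry.register("search", lambda q: f"results for {q}", steps=[1])
registry.register("get_details", lambda item: {"item": item}, steps=[1, 2])
registry.register("validate", lambda cfg: {"valid": True}, steps=[2])

sorted(registry.tools_for_step(1))  # -> ["get_details", "search"]
```

`tools_for_step()` is what keeps each prompt small: only the returned subset is serialized into the step's tool definitions.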

2. Pydantic Schema Validation

Every LLM response is validated against a Pydantic schema:

```python
import json
from typing import Dict, List, Optional
from pydantic import BaseModel

class ToolCall(BaseModel):
    name: str
    arguments: Dict

class StepResponse(BaseModel):
    success: Optional[bool] = None
    result: Optional[Dict] = None
    reasoning: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None

# Validate structure; raises pydantic.ValidationError on a mismatch
StepResponse(**json.loads(llm_output))
```

Why: The LLM may return syntactically valid JSON that's structurally wrong (missing fields, wrong types). Pydantic catches this before it reaches your application.

3. Dual-Purpose Response Schema

The same schema handles both tool call requests and final responses:

```
# Tool call request
{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}

# Final response
{"success": true, "result": {...}, "reasoning": "..."}
```

Why: The LLM doesn't need to learn two different output formats. The orchestrator checks for tool_calls first, and treats anything else as a final response.
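That check order can be sketched as follows; `dispatch` is a hypothetical helper, simplified to skip the Pydantic validation from pattern 2:

```python
import json

def dispatch(llm_output):
    """Classify one LLM turn: tool calls take priority over a final answer."""
    try:
        response = json.loads(llm_output)
    except json.JSONDecodeError:
        return ("retry", None)            # malformed output -> re-prompt the LLM
    if isinstance(response, dict) and response.get("tool_calls"):
        return ("tool_calls", response["tool_calls"])
    return ("final", response)

dispatch('{"tool_calls": [{"name": "search", "arguments": {"q": "test"}}]}')
# -> ("tool_calls", [...])
dispatch('{"success": true, "result": {}}')
# -> ("final", {...})
```

Because `tool_calls` is checked first, a response that contains both tool calls and a partial result is still treated as a tool-call turn.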

4. Validation Enforcement

The orchestrator requires certain tools to be called and pass before accepting a final response:

```python
result = run_step(
    ...,
    validation_tools=["validate_minimal", "validate_full"]
)
```

If the LLM tries to return "success" without all validations passing:

```
You returned a final response but validations have not all passed.

Validation Errors Found:
1. Property: channel
   Message: Required field 'channel' is missing

Please fix the errors and call the validation tools again.
```
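One way to implement this gate is simple set bookkeeping in the orchestrator; `validations_pending` is an illustrative helper, not part of the example code:

```python
def validations_pending(passed, required):
    """Return the required validation tools that have not yet passed."""
    return [tool for tool in required if tool not in passed]

required = ["validate_minimal", "validate_full"]
passed = set()

# Recorded by the orchestrator after a validation tool returns success:
passed.add("validate_minimal")

validations_pending(passed, required)   # -> ["validate_full"]
```

A final response is accepted only when `validations_pending()` is empty; otherwise the orchestrator sends the rejection message above back to the LLM.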

5. Structured Error Feedback

When a tool call fails, the error is formatted with enough detail for the LLM to fix it:

```
<tool_response>
<tool_name>validate</tool_name>
<status>ERROR</status>
<result>{"valid": false, "errors": [{"property": "name", "message": "required"}]}</result>
</tool_response>
IMPORTANT: This tool call failed. Read the error, understand the issue, fix and retry.
```
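A formatter that produces this feedback might look like the following sketch (a hypothetical helper; the actual example code may differ):

```python
import json

def format_tool_response(tool_name, status, result):
    """Render a tool result in the tagged format shown above."""
    block = (
        "<tool_response>\n"
        f"<tool_name>{tool_name}</tool_name>\n"
        f"<status>{status}</status>\n"
        f"<result>{json.dumps(result)}</result>\n"
        "</tool_response>"
    )
    if status == "ERROR":
        block += ("\nIMPORTANT: This tool call failed. "
                  "Read the error, understand the issue, fix and retry.")
    return block
```

Keeping the raw error payload inside `<result>` matters: the LLM needs the property names and messages verbatim to produce a targeted fix rather than a blind retry.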

6. Workflow Order Enforcement

Without explicit instructions, the LLM restarts from scratch when validation fails. The prompt must enforce:

```
Follow this workflow in order. Do NOT skip steps or go back.

1. Get information (ONCE)
2. Configure
3. Validate
4. If validation fails: FIX and re-validate (do NOT go back to step 1)
```

Iteration Budget

Each step needs multiple LLM turns:

```
Turn 1: Call get_details for component A     (tool call)
Turn 2: Call get_details for component B     (tool call)
Turn 3: Configure both components            (tool call to validate)
Turn 4: Validation fails - fix errors        (tool call to re-validate)
Turn 5: Validation passes - return result    (final response)
```

Minimum: 5 iterations per step. Recommended: 10. Complex: 15.
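The budget can be made concrete with a toy loop, where a scripted `trace` list stands in for real LLM outputs (`run_step_toy` is not the `run_step()` from examples/multi_step_orchestrator.py):

```python
def run_step_toy(turns, max_iterations=10):
    """Consume LLM turns until a final response or the budget runs out."""
    it = iter(turns)
    for _ in range(max_iterations):
        kind, payload = next(it)
        if kind == "final":
            return payload
        # "tool_call": execute the tool, feed the result back, continue
    return None   # budget exhausted -> step fails, outer retry takes over

# The five-turn trace above, in miniature
trace = [("tool_call", "get_details A"),
         ("tool_call", "get_details B"),
         ("tool_call", "configure + validate"),
         ("tool_call", "fix + re-validate"),
         ("final", {"success": True})]

run_step_toy(trace)                     # -> {"success": True}
run_step_toy(trace, max_iterations=3)   # -> None
```

The second call shows why an undersized budget fails: three iterations are spent on tool calls and the step never reaches its final response.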

Retry Logic

Steps can fail (LLM returns text instead of JSON, runs out of iterations, etc.). The workflow runner retries each step:

```python
for retry in range(max_step_retries):
    result = run_step(...)
    if result:
        break
else:
    # Step failed after all retries
    raise RuntimeError("Step failed after all retries")
```

Recommended: 3 retries per step.

Implementation

See examples/multi_step_orchestrator.py for complete working code with:

  • VLLMClient - Simple VLLM API client
  • ToolRegistry - Step-based tool registration and execution
  • run_step() - Single-step execution with validation enforcement
  • run_workflow() - Multi-step orchestration with retry logic

When to Use Multi-Step

| Scenario | Single Prompt | Multi-Step |
|----------|---------------|------------|
| 1-2 simple tools | Yes | Overkill |
| 3-5 tools, all independent | Yes | Optional |
| 5+ tools with dependencies | No | Yes |
| Sequential operations | No | Yes |
| Validation required | No | Yes |
| Production reliability needed | No | Yes |