---
library_name: transformers
tags:
- tool-use
- agentic-rl
- environment-synthesis
- EnvFactory
license: apache-2.0
datasets:
- LARK-Lab/EnvFactory-RL
- LARK-Lab/EnvFactory-SFT-FILTERED
language:
- en
base_model:
- Qwen/Qwen3-4B
---

<h2 align="center">
  EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
</h2>

<p align="center">
  <a href="https://arxiv.org/abs/2605.18703">
    <img
      src="https://img.shields.io/badge/Paper-Arxiv-red?logo=arxiv&logoColor=red"
      alt="EnvFactory Paper on arXiv"
    />
  </a>
  <a href="https://github.com/LARK-AI-Lab/EnvFactory">
    <img 
        src="https://img.shields.io/badge/GitHub-Code-181717?logo=github&logoColor=white" 
        alt="GitHub Code"
    />
  </a>
  <a href="https://lark-ai-lab.github.io/envfactory.github.io/">
    <img 
        src="https://img.shields.io/badge/GitHub-Page-4078c0?logo=github&logoColor=white" 
        alt="GitHub Page"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/envfactory">
    <img 
        src="https://img.shields.io/badge/Datasets-Hugging%20Face%20Data-orange?logo=huggingface&logoColor=yellow" 
        alt="Datasets on Hugging Face"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/envfactory">
    <img 
        src="https://img.shields.io/badge/EnvFactory-Hugging%20Face%20Model-FFCC00?logo=huggingface&logoColor=yellow" 
        alt="EnvFactory on Hugging Face"
    />
  </a>
</p>

## Overview

We propose **EnvFactory**, a fully automated framework that addresses the challenges of equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL). EnvFactory autonomously explores and verifies stateful, executable tool environments from authentic resources, and synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents.

This model is the official **EnvFactory-4B** checkpoint, trained from [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) with SFT followed by RL on synthesized tool-use trajectories.

## Key Features

- **Executable Environment Synthesis**: Automatically discovers, validates, and deploys MCP-based tool environments from real-world APIs
- **Topology-Aware Trajectory Sampling**: Generates natural multi-turn tool-use trajectories that capture implicit human reasoning
- **Robust RL Training**: Uses verified environments and calibrated refinement for stable reinforcement learning
- **Scalable Architecture**: Achieves superior performance with significantly fewer environments (85 environments across 7 domains)

## Training Details

### Training Data

- **SFT Data**: [LARK-Lab/EnvFactory-SFT-FILTERED](https://huggingface.co/datasets/LARK-Lab/EnvFactory-SFT-FILTERED) - 53.4k filtered trajectories
- **RL Data**: [LARK-Lab/EnvFactory-RL](https://huggingface.co/datasets/LARK-Lab/EnvFactory-RL) - 3.09k trajectories

### Training Procedure

- **SFT Stage**: Full fine-tuning using LlamaFactory with DeepSpeed ZeRO-3
- **RL Stage**: Reinforcement learning using a forked VeRL framework
- **Base Model**: Qwen/Qwen3-4B
- **Training Epochs**: 1 epoch for SFT
- **Learning Rate**: 1.0e-6 with cosine scheduler
- **Batch Size**: 1 per device with gradient accumulation of 32
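The batch-size setting above only fixes the per-device and accumulation factors; the effective batch size also depends on the number of devices, which this card does not state. The sketch below shows the arithmetic, assuming these settings apply to the SFT stage. The `num_devices=8` value and both helper names are hypothetical, and the step count rounds a final partial batch up, which a given trainer may or may not do.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device * grad_accum * num_devices


def optimizer_steps(num_examples: int, per_device: int, grad_accum: int,
                    num_devices: int, epochs: int = 1) -> int:
    """Optimizer steps per run, counting a final partial batch as one step."""
    batch = effective_batch_size(per_device, grad_accum, num_devices)
    return epochs * math.ceil(num_examples / batch)


# From the card: batch size 1 per device, gradient accumulation 32, 1 epoch,
# roughly 53.4k SFT trajectories. The device count (8) is an assumption.
sft_batch = effective_batch_size(per_device=1, grad_accum=32, num_devices=8)
sft_steps = optimizer_steps(53_400, per_device=1, grad_accum=32, num_devices=8)
print(f"effective batch size: {sft_batch}, optimizer steps: {sft_steps}")
```

With a different device count, only `num_devices` needs to change; the per-device and accumulation values come from the list above.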

## Performance

Results on tool-use benchmarks compared to the base model:

| Model | BFCL Single Turn | BFCL Multi Turn | MCP-Atlas Pass Rate | MCP-Atlas Mean Cov. | τ²-Bench Avg. | VitaBench Avg. | Overall Avg. |
|-------|------------------|-----------------|---------------------|---------------------|---------------|----------------|--------------|
| Qwen3-4B (Base) | 85.15 | 33.50 | 4.12 | 12.86 | 25.25 | 7.67 | 24.09 |
| **EnvFactory-4B** | 85.46 | 48.50 | 9.97 | 21.89 | 30.13 | 16.00 | 30.77 |
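To read the table as improvements rather than raw scores, the following sketch computes the absolute and relative gain of EnvFactory-4B over the base model for each column. The scores are copied from the table; the `gains` helper is only an illustration and not part of the EnvFactory codebase.

```python
# Scores copied from the table above (base model vs. EnvFactory-4B).
BASE = {
    "BFCL Single Turn": 85.15,
    "BFCL Multi Turn": 33.50,
    "MCP-Atlas Pass Rate": 4.12,
    "MCP-Atlas Mean Cov.": 12.86,
    "τ²-Bench Avg.": 25.25,
    "VitaBench Avg.": 7.67,
    "Overall Avg.": 24.09,
}
ENVFACTORY = {
    "BFCL Single Turn": 85.46,
    "BFCL Multi Turn": 48.50,
    "MCP-Atlas Pass Rate": 9.97,
    "MCP-Atlas Mean Cov.": 21.89,
    "τ²-Bench Avg.": 30.13,
    "VitaBench Avg.": 16.00,
    "Overall Avg.": 30.77,
}


def gains(base: dict, tuned: dict) -> dict:
    """Map each benchmark to (absolute gain in points, relative gain in percent)."""
    result = {}
    for name, base_score in base.items():
        delta = tuned[name] - base_score
        result[name] = (round(delta, 2), round(100 * delta / base_score, 1))
    return result


deltas = gains(BASE, ENVFACTORY)
for name, (absolute, relative) in deltas.items():
    print(f"{name:22s} {absolute:+6.2f} pts  ({relative:+.1f}%)")
```

Relative gains are large where the base score is low (for example MCP-Atlas Pass Rate), so the absolute column is usually the safer one to compare across benchmarks.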

## Usage

### Tool-Use Agent

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "LARK-Lab/EnvFactory-4B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

# Example tool-use conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to various tools."},
    {"role": "user", "content": "Search for recent papers about tool-use agents on arxiv."}
]

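# add_generation_prompt=True appends the assistant turn header so the model
# starts a new reply instead of continuing the user message.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="pt"
).to(model.device)
# do_sample=True makes the temperature and top_p settings take effect.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)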
print(response)
```
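Once a response has been decoded, any tool calls in it need to be extracted before they can be executed. The sketch below assumes the model keeps the Qwen3-style convention of wrapping each call in `<tool_call>...</tool_call>` tags around a JSON object with `name` and `arguments` keys; this card does not confirm the exact output format, so treat it as an assumption. The `parse_tool_calls` helper and the `search_arxiv` tool name are hypothetical.

```python
import json
import re

# Assumed Qwen3-style wrapper around each JSON tool call.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> list:
    """Return every well-formed tool call found in a decoded model response."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole turn
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls


# Hypothetical response text, for illustration only.
sample = (
    "I will look this up.\n"
    "<tool_call>\n"
    '{"name": "search_arxiv", "arguments": {"query": "tool-use agents"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample))
```

Each parsed call can then be dispatched to the matching tool, and the result appended to `messages` as a `tool` role turn before generating again.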

### With MCP Tools

```python
# Load MCP tool configuration
import json

with open("configs/mcp_server.json", "r") as f:
    mcp_config = json.load(f)

# Use with your preferred MCP client
# See https://github.com/LARK-AI-Lab/EnvFactory for integration details
```
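An MCP server describes each tool with a `name`, a `description`, and a JSON Schema under `inputSchema`. One way to expose such tools to the model is to convert them into the function-schema dictionaries that `tokenizer.apply_chat_template` accepts through its `tools` argument. The sketch below shows that conversion only; it is not the integration used in the EnvFactory repository, and the `search_papers` tool is a made-up example.

```python
def mcp_tool_to_function_schema(tool: dict) -> dict:
    """Convert one MCP tool description into a function-calling schema entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP stores the argument JSON Schema under "inputSchema".
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }


# Hypothetical tool, shaped like an entry returned by an MCP server's tool listing.
example_tool = {
    "name": "search_papers",
    "description": "Search a paper index by keyword.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

tools = [mcp_tool_to_function_schema(example_tool)]
print(tools[0]["function"]["name"])
```

The resulting `tools` list can be passed as `tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, ...)` so that the chat template renders the tool definitions into the prompt.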

## Citation

If you find our work helpful, please consider citing:

```bibtex
@misc{xu2026envfactoryscalingtooluseagents,
      title={EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL}, 
      author={Minrui Xu and Zilin Wang and Mengyi DENG and Zhiwei Li and Zhicheng Yang and Xiao Zhu and Yinhong Liu and Boyu Zhu and Baiyu Huang and Chao Chen and Heyuan Deng and Fei Mi and Lifeng Shang and Xingshan Zeng and Zhijiang Guo},
      year={2026},
      eprint={2605.18703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.18703}, 
}
```

## License

This model is released under the Apache 2.0 License.