RolandXMR committed on
Commit
d297288
·
verified ·
1 Parent(s): b40b5ec

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +148 -0
README.md ADDED
@@ -0,0 +1,148 @@
---
library_name: transformers
tags:
- tool-use
- agentic-rl
- environment-synthesis
- EnvFactory
license: apache-2.0
datasets:
- LARK-Lab/EnvFactory-RL
- LARK-Lab/EnvFactory-SFT-FILTERED
language:
- en
base_model:
- Qwen/Qwen3-8B
---

<h2 align="center">
  EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL
</h2>

<p align="center">
  <a href="https://arxiv.org/abs/2605.18703">
    <img
      src="https://img.shields.io/badge/Paper-Arxiv-red?logo=arxiv&logoColor=red"
      alt="EnvFactory Paper on arXiv"
    />
  </a>
  <a href="https://github.com/LARK-AI-Lab/EnvFactory">
    <img
      src="https://img.shields.io/badge/GitHub-Code-181717?logo=github&logoColor=white"
      alt="GitHub Code"
    />
  </a>
  <a href="https://lark-ai-lab.github.io/envfactory.github.io/">
    <img
      src="https://img.shields.io/badge/GitHub-Page-4078c0?logo=github&logoColor=white"
      alt="GitHub Page"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/envfactory">
    <img
      src="https://img.shields.io/badge/Datasets-Hugging%20Face%20Data-orange?logo=huggingface&logoColor=yellow"
      alt="Datasets on Hugging Face"
    />
  </a>
  <a href="https://huggingface.co/collections/LARK-Lab/envfactory">
    <img
      src="https://img.shields.io/badge/EnvFactory-Hugging%20Face%20Model-FFCC00?logo=huggingface&logoColor=yellow"
      alt="EnvFactory on Hugging Face"
    />
  </a>
</p>

## Overview

We propose **EnvFactory**, a fully automated framework for equipping LLMs with tool-use capabilities via Agentic Reinforcement Learning (Agentic RL). EnvFactory autonomously explores and verifies stateful, executable tool environments built from authentic resources, then synthesizes natural multi-turn trajectories through topology-aware sampling and calibrated refinement, producing grounded queries with implicit intents.

This model is the official **EnvFactory-8B**, trained from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) using SFT and RL on synthesized tool-use trajectories.

## Key Features

- **Executable Environment Synthesis**: Automatically discovers, validates, and deploys MCP-based tool environments from real-world APIs
- **Topology-Aware Trajectory Sampling**: Generates natural multi-turn tool-use trajectories that capture implicit human reasoning
- **Robust RL Training**: Uses verified environments and calibrated refinement for stable reinforcement learning
- **Scalable Architecture**: Outperforms the base model on every benchmark reported below while relying on a comparatively small set of environments (85 environments across 7 domains)

## Training Details

### Training Data

- **SFT Data**: [LARK-Lab/EnvFactory-SFT-FILTERED](https://huggingface.co/datasets/LARK-Lab/EnvFactory-SFT-FILTERED) - 53.4k filtered trajectories
- **RL Data**: [LARK-Lab/EnvFactory-RL](https://huggingface.co/datasets/LARK-Lab/EnvFactory-RL) - 3.09k trajectories

### Training Procedure

- **SFT Stage**: Full fine-tuning using LlamaFactory with DeepSpeed ZeRO-3
- **RL Stage**: Reinforcement learning using a forked VeRL framework
- **Base Model**: Qwen/Qwen3-8B
- **Training Epochs**: 1 epoch for SFT
- **Learning Rate**: 1.0e-6 with a cosine scheduler
- **Batch Size**: 1 per device with gradient accumulation of 32

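The batch-size and learning-rate settings above can be made more concrete with a little arithmetic. The sketch below is illustrative only: the GPU count (`num_gpus = 8`), the absence of warmup, the decay-to-zero cosine shape, and the idea that these settings apply to the SFT stage are all assumptions not stated in this README, and the helper names are hypothetical.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return per_device * grad_accum * num_gpus

def cosine_lr(step: int, total_steps: int, peak_lr: float = 1.0e-6) -> float:
    """Cosine decay from peak_lr to 0 with no warmup (assumed schedule shape)."""
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

num_gpus = 8  # assumption: the README does not state the GPU count
batch = effective_batch_size(per_device=1, grad_accum=32, num_gpus=num_gpus)

# One epoch over roughly 53.4k SFT trajectories (figure from the Training Data list)
total_steps = math.ceil(53_400 / batch)

print(f"effective batch size: {batch}")
print(f"optimizer steps per epoch: {total_steps}")
print(f"lr at start: {cosine_lr(0, total_steps):.2e}")
print(f"lr at midpoint: {cosine_lr(total_steps // 2, total_steps):.2e}")
```

With a different GPU count the effective batch size scales linearly, so the step count per epoch changes accordingly.
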
## Performance

Results on tool-use benchmarks compared to the base model:

| Model | BFCL Single Turn | BFCL Multi Turn | MCP-Atlas Pass Rate | MCP-Atlas Mean Cov. | τ²-Bench Avg. | VitaBench Avg. | Overall Avg. |
|-------|------------------|-----------------|---------------------|---------------------|---------------|----------------|--------------|
| Qwen3-8B (Base) | 84.31 | 41.25 | 5.15 | 14.86 | 32.30 | 16.70 | 29.23 |
| **EnvFactory-8B** | 86.02 | 49.00 | 13.75 | 25.98 | 33.67 | 18.67 | 33.40 |

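The Overall Avg. column is not a plain mean of the six preceding columns. One aggregation that reproduces both reported values is to average the two BFCL columns first, then take the mean of that BFCL score, MCP-Atlas Pass Rate, τ²-Bench Avg., and VitaBench Avg. This is an inference from the numbers in the table, not a definition given in this README; the sketch below checks it and computes the per-benchmark gains. The dictionary keys and the `overall_avg` helper are made up for illustration.

```python
# Scores copied from the table above.
scores = {
    "Qwen3-8B (Base)": {
        "bfcl_single": 84.31, "bfcl_multi": 41.25, "atlas_pass": 5.15,
        "atlas_cov": 14.86, "tau2": 32.30, "vita": 16.70,
    },
    "EnvFactory-8B": {
        "bfcl_single": 86.02, "bfcl_multi": 49.00, "atlas_pass": 13.75,
        "atlas_cov": 25.98, "tau2": 33.67, "vita": 18.67,
    },
}

def overall_avg(row: dict) -> float:
    """Assumed aggregation: mean of BFCL average, MCP-Atlas pass rate, tau2, VitaBench."""
    bfcl = (row["bfcl_single"] + row["bfcl_multi"]) / 2
    return (bfcl + row["atlas_pass"] + row["tau2"] + row["vita"]) / 4

for name, row in scores.items():
    print(f"{name}: {overall_avg(row):.2f}")
# Qwen3-8B (Base): 29.23
# EnvFactory-8B: 33.40

# Absolute gain of EnvFactory-8B over the base model, per column
base, tuned = scores["Qwen3-8B (Base)"], scores["EnvFactory-8B"]
gains = {key: round(tuned[key] - base[key], 2) for key in base}
print(gains)
```

Under this reading, MCP-Atlas Mean Cov. does not enter the overall average, and the largest absolute gains are on the two MCP-Atlas columns and BFCL Multi Turn.
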
## Usage

### Tool-Use Agent

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_path = "LARK-Lab/EnvFactory-8B"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, device_map="auto")

# Example tool-use conversation
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to various tools."},
    {"role": "user", "content": "Search for recent papers about tool-use agents on arXiv."},
]

# add_generation_prompt=True appends the assistant turn header so the model
# answers instead of continuing the user turn; return_dict=True also returns
# the attention mask alongside input_ids.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True is required for temperature and top_p to take effect
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)

# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```

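The conversation above mentions tools but never declares any. In `transformers`, tool schemas can be passed to `apply_chat_template` through its `tools` argument, and Qwen3's chat template wraps each call in `<tool_call>...</tool_call>` tags containing a JSON object. The sketch below assumes EnvFactory-8B keeps that Qwen3 format, which this README does not confirm. The `search_arxiv` tool, the `parse_tool_calls` helper, and the sample string are hypothetical and exist only for illustration.

```python
import json
import re

# Hypothetical tool schema in the JSON-schema style accepted by
# tokenizer.apply_chat_template(messages, tools=tools, ...).
tools = [{
    "type": "function",
    "function": {
        "name": "search_arxiv",
        "description": "Search arXiv for papers matching a query.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

# Matches one JSON object between <tool_call> tags (assumed Qwen3-style format).
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool calls from generated text, skipping malformed ones."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue
    return calls

# Hand-written sample of what a response might contain (not real model output).
sample = (
    "<tool_call>\n"
    '{"name": "search_arxiv", "arguments": {"query": "tool-use agents"}}\n'
    "</tool_call>"
)
print(parse_tool_calls(sample))
```

With a loaded tokenizer, the same `tools` list would be passed as `tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, ...)`, and `parse_tool_calls(response)` would then be applied to the decoded output.
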
### With MCP Tools

```python
# Load MCP tool configuration
import json

with open("configs/mcp_server.json", "r") as f:
    mcp_config = json.load(f)

# Use with your preferred MCP client
# See https://github.com/LARK-AI-Lab/EnvFactory for integration details
```

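An MCP client typically exposes each server tool as a descriptor with `name`, `description`, and `inputSchema` fields, which is the shape returned by the MCP `tools/list` request. A minimal sketch of converting such descriptors into the function-style schemas used by chat templates follows. The layout of `configs/mcp_server.json` is not documented in this README, so the example works on a hand-written descriptor instead of that file; `mcp_tool_to_schema` and the `get_weather` tool are hypothetical.

```python
def mcp_tool_to_schema(tool: dict) -> dict:
    """Map an MCP tool descriptor to a function-style tool schema."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            # MCP calls the argument schema `inputSchema`; fall back to an
            # empty object schema when a tool takes no arguments.
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }

# Hand-written descriptor standing in for an entry of a tools/list response.
mcp_tools = [{
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "inputSchema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

schemas = [mcp_tool_to_schema(tool) for tool in mcp_tools]
print(schemas[0]["function"]["name"])
print(schemas[0]["function"]["parameters"]["required"])
```

The resulting list can be handed to a chat template as its tool definitions, while the MCP client remains responsible for actually executing the calls the model emits.
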
## Citation

If you find our work helpful, please consider citing:

```bibtex
@misc{xu2026envfactoryscalingtooluseagents,
      title={EnvFactory: Scaling Tool-Use Agents via Executable Environments Synthesis and Robust RL},
      author={Minrui Xu and Zilin Wang and Mengyi DENG and Zhiwei Li and Zhicheng Yang and Xiao Zhu and Yinhong Liu and Boyu Zhu and Baiyu Huang and Chao Chen and Heyuan Deng and Fei Mi and Lifeng Shang and Xingshan Zeng and Zhijiang Guo},
      year={2026},
      eprint={2605.18703},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.18703},
}
```

## License

This model is released under the Apache 2.0 License.