| --- |
| license: mit |
| --- |
| |
| # STT-Agent-SFT |
|
|
| This repository contains the **STT-Agent-SFT** model fine-tuned for spatio‑temporal tool use, based on the refined trajectories. |
|
|
| ## 📊 Performance on STT-Arena |
|
|
| Below is the overall Pass@1 performance of STT-Agent compared to other frontier models: |
|
|
|
|
|  |
|
|
|
|
| ### Ablation: Effect of Iterative Trajectory Refinement |
|
|
| | Model | Easy | Medium | Hard | Impossible | Overall | Avg. Calls | |
| |-------|------|--------|------|------------|---------|-------------| |
| | Qwen-3-4B (baseline) | 18.31 | 9.46 | 2.82 | 10.00 | 10.57 | 7.63 | |
| | STT-Agent (w/o refine) | 28.17 | 16.92 | 11.86 | 47.01 | 23.10 | 32.70 | |
| | **{model_name} (with refine)** | **26.76** | **17.41** | **13.56** | **61.11** | **25.11** | **15.30** | |
| |
| Trajectory refinement significantly improves both accuracy and efficiency (reduces average API calls). |
| |
| ## 🚀 Usage |
| |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| model_name = "{model_name}" |
| tokenizer = AutoTokenizer.from_pretrained(model_name) |
| model = AutoModelForCausalLM.from_pretrained(model_name) |
| |
| # Example tool-use prompt |
| prompt = "User: Book the cheapest flight from PVG to CDG.\n" |
| inputs = tokenizer(prompt, return_tensors="pt") |
| outputs = model.generate(**inputs) |
| print(tokenizer.decode(outputs[0])) |
| ``` |
| |
| ## 🧪 Training Details |
| |
| Base model: Qwen-3-4B-Base |
| SFT: 2,212 refined trajectories |
| RL strategy: REINFORCE++ |
| Compute: 4× NVIDIA H200 GPUs |
| |
| ## 📄 Citation |
| |
| ```bibtex |
| @misc{hui2026sttarenarealisticenvironmenttoolusing, |
| title={STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics}, |
| author={Tingfeng Hui and Hao Xu and Pengyu Zhu and Hongsheng Xin and Kun Zhan and Sen Su and Chunxiao Liu and Ning Miao}, |
| year={2026}, |
| eprint={2605.18548}, |
| archivePrefix={arXiv}, |
| primaryClass={cs.CL}, |
| url={https://arxiv.org/abs/2605.18548}, |
| } |
| ``` |