File size: 2,037 Bytes

---
license: mit
---

# STT-Agent-SFT

This repository contains the **STT-Agent-RL** model throught online RL training based on **STT-Agent-SFT**.

## 📊 Performance on STT-Arena

Below is the overall Pass@1 performance of STT-Agent compared to other frontier models:


![image](https://cdn-uploads.huggingface.co/production/uploads/66fa30dee6210a5175235a3c/jEVVEMz_uIFeGpNirY2vh.png)

![STT-Arena Results](images/stt_arena_results.png)


### Ablation: Effect of Iterative Trajectory Refinement

| Model | Easy | Medium | Hard | Impossible | Overall | Avg. Calls |
|-------|------|--------|------|------------|---------|-------------|
| Qwen-3-4B (baseline) | 18.31 | 9.46 | 2.82 | 10.00 | 10.57 | 7.63 |
| STT-Agent (w/o refine) | 28.17 | 16.92 | 11.86 | 47.01 | 23.10 | 32.70 |
| **{model_name} (with refine)** | **26.76** | **17.41** | **13.56** | **61.11** | **25.11** | **15.30** |

Trajectory refinement significantly improves both accuracy and efficiency (reduces average API calls).

## 🚀 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "{model_name}"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Example tool-use prompt
prompt = "User: Book the cheapest flight from PVG to CDG.\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0]))
```

## 🧪 Training Details

Base model: Qwen-3-4B-Base
SFT: 2,212 refined trajectories
RL strategy: REINFORCE++
Compute: 4× NVIDIA H200 GPUs

## 📄 Citation

```bibtex
@misc{hui2026sttarenarealisticenvironmenttoolusing,
      title={STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics}, 
      author={Tingfeng Hui and Hao Xu and Pengyu Zhu and Hongsheng Xin and Kun Zhan and Sen Su and Chunxiao Liu and Ning Miao},
      year={2026},
      eprint={2605.18548},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.18548}, 
}
```