Miaow-Lab/STT-Agent-RL


Chaox72 committed 2 days ago

Commit e5e8b76 · verified · 1 Parent(s): 8c6574b

Update README.md

Files changed (1): README.md +53 -0

@@ -1,3 +1,56 @@
 ---
 license: mit
 ---

+# STT-Agent-RL
+This repository contains the **STT-Agent-RL** model, obtained through online RL training on top of **STT-Agent-SFT**.
+## 📊 Performance on STT-Arena
+Below is the overall Pass@1 performance of STT-Agent compared to other frontier models:
+![image](https://cdn-uploads.huggingface.co/production/uploads/66fa30dee6210a5175235a3c/jEVVEMz_uIFeGpNirY2vh.png)
+![STT-Arena Results](images/stt_arena_results.png)
+### Ablation: Effect of Iterative Trajectory Refinement
+| Model | Easy | Medium | Hard | Impossible | Overall | Avg. Calls |
+|-------|------|--------|------|------------|---------|-------------|
+| Qwen-3-4B (baseline) | 18.31 | 9.46 | 2.82 | 10.00 | 10.57 | 7.63 |
+| STT-Agent (w/o refine) | 28.17 | 16.92 | 11.86 | 47.01 | 23.10 | 32.70 |
+| **STT-Agent (with refine)** | **26.76** | **17.41** | **13.56** | **61.11** | **25.11** | **15.30** |
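+
+Trajectory refinement raises the overall score from 23.10 to 25.11 and cuts average API calls from 32.70 to 15.30, with the largest gain on Impossible tasks (47.01 to 61.11) and a slight drop on Easy tasks (28.17 to 26.76).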
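The Pass@1 figures above can be read as the percentage of tasks solved on the first attempt. The following is a minimal sketch of how such per-difficulty and overall percentages might be tallied. It assumes one attempt per task and a hypothetical record format: the field names `difficulty` and `passed` are illustrative and not the STT-Arena schema, and weighting the overall score by task count is likewise an assumption.

```python
from collections import defaultdict


def pass_at_1(records):
    """Return (per-difficulty, overall) Pass@1 percentages.

    Each record describes a single first attempt at one task.
    The record layout is hypothetical, not the STT-Arena format.
    """
    solved = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["difficulty"]] += 1
        solved[rec["difficulty"]] += int(rec["passed"])
    per_level = {lvl: 100.0 * solved[lvl] / total[lvl] for lvl in total}
    # Assumption: the overall score weights every task equally
    overall = 100.0 * sum(solved.values()) / sum(total.values())
    return per_level, overall


# Made-up records, for illustration only
records = [
    {"difficulty": "easy", "passed": True},
    {"difficulty": "easy", "passed": False},
    {"difficulty": "hard", "passed": False},
    {"difficulty": "hard", "passed": False},
]
per_level, overall = pass_at_1(records)
print(per_level, overall)
```

With task-count weighting, the overall score need not equal the plain mean of the per-difficulty columns, since difficulty levels may contain different numbers of tasks.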
+## 🚀 Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+# Hub repository id of this model
+model_name = "Miaow-Lab/STT-Agent-RL"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name)
+
+# Example tool-use prompt
+prompt = "User: Book the cheapest flight from PVG to CDG.\n"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Bound the response length explicitly (256 is an arbitrary example value)
+outputs = model.generate(**inputs, max_new_tokens=256)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+## 🧪 Training Details
+- Base model: Qwen-3-4B-Base
+- SFT: 2,212 refined trajectories
+- RL strategy: REINFORCE++
+- Compute: 4× NVIDIA H200 GPUs
+## 📄 Citation
+```bibtex
+xxx
+```