# Qwen3.5-27B Agent SFT v2 — Multi-Turn Tool-Calling LoRA
A LoRA adapter fine-tuned on 220K multi-turn tool-calling trajectories for grounded image generation agent tasks.
## Model Details
| Parameter | Value |
|---|---|
| Base model | Qwen3.5-27B (26.9B params) |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Trainable params | 159.4M |
| Training steps | 13,799 (batch 8, 3x H200 GPUs) |
| Final loss | 0.36 |
| Token accuracy | 87.6% |
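For reference, a minimal `peft` configuration matching the hyperparameters above might look like the sketch below. The `target_modules` list, dropout, and bias setting are assumptions (the card does not state them); targeting the attention and MLP projections is a common choice for adapters of this size.

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config from the table above.
# target_modules, lora_dropout, and bias are assumptions, not stated in this card.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    lora_dropout=0.05,  # assumed default
    bias="none",
    task_type="CAUSAL_LM",
)
```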
## Training Data (220K Multi-Turn Examples)
| Source | Examples | Key Strength |
|---|---|---|
| ToolMind | 203K | Multi-agent synthesis, reasoning traces, 20K+ tools |
| FunReason-MT | 13K | Complex sequential tool dependencies |
| APIGen-MT-5k | 5K | Gold standard, 99% human-verified trajectories |
Examples average 9.2 turns each, with sequential tool calls chained across turns.
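For illustration, each trajectory is a multi-turn chat in which the assistant interleaves tool calls and tool results before answering. The sketch below uses OpenAI-style message fields; the tool names are hypothetical, and the exact schemas of the three source datasets may differ.

```python
# Hypothetical multi-turn trajectory (OpenAI-style chat format; field names,
# tool names, and values are illustrative, not taken from the source datasets).
example = {
    "messages": [
        {"role": "user",
         "content": "Generate an image of the NTU Hive building at sunset."},
        {"role": "assistant",
         "tool_calls": [{"function": {
             "name": "search_location",  # hypothetical tool
             "arguments": '{"query": "NTU Hive, Singapore"}'}}]},
        {"role": "tool",
         "content": '{"name": "The Hive, NTU", "lat": 1.345, "lon": 103.683}'},
        {"role": "assistant",
         "tool_calls": [{"function": {
             "name": "generate_image",  # hypothetical tool
             "arguments": '{"prompt": "The Hive at NTU at sunset"}'}}]},
        {"role": "tool", "content": '{"image_id": "img_0001"}'},
        {"role": "assistant",
         "content": "Here is the generated image of the NTU Hive at sunset."},
    ]
}
```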
## Key Insight
Single-turn tool-calling data (e.g., raw Toucan-1.5M, xLAM-60K) actively degrades multi-step reasoning: our v1 SFT, trained on 70K single-turn examples, scored 5% below the base model, while this v2, trained on 220K multi-turn examples, scores 12% above it (0.81 → 0.91 average; see Evaluation below).
## Evaluation (vs. Base Qwen3.5-27B)
| Test | Base | SFT v2 | Delta |
|---|---|---|---|
| Real location (NTU Hive) | 0.70 | 0.85 | +21% |
| Multi-step (Marina Bay Sands) | 0.70 | 1.00 | +43% |
| Database search | 1.00 | 1.00 | 0% |
| Fictional scene | 1.00 | 1.00 | 0% |
| No tools needed | 0.75 | 0.75 | 0% |
| Full pipeline | 0.70 | 0.85 | +21% |
| Average | 0.81 | 0.91 | +12% |
## Usage
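A minimal inference sketch, assuming the adapter is applied with `peft` on top of the base model. The repo ids are from this card; dtype, device placement, and generation settings are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-27B", torch_dtype="auto", device_map="auto"
)
model = PeftModel.from_pretrained(base, "Rabornkraken/qwen3.5-27b-agent-sft-v2")

messages = [
    {"role": "user",
     "content": "Generate an image of Marina Bay Sands at night."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```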
## Infrastructure
- 3× NVIDIA H200 (141 GB VRAM each), data-parallel training
- Checkpoints auto-pushed to Hugging Face every 500 steps (see the sketch below)
- Part of the SC4062 Grounded Image Generation Agent project
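A sketch of the checkpoint-push setup described above, assuming the `transformers` `Trainer`. Only `save_steps` and the batch size come from this card; `hub_model_id`, `output_dir`, and the per-device interpretation of "batch 8" are assumptions.

```python
from transformers import TrainingArguments

# Save a checkpoint every 500 steps and push each save to the Hub,
# matching the auto-push behavior described above.
args = TrainingArguments(
    output_dir="qwen3.5-27b-agent-sft-v2",
    per_device_train_batch_size=8,  # "batch 8" from this card (assumed per-device)
    save_steps=500,
    push_to_hub=True,
    hub_strategy="every_save",  # push every saved checkpoint
    hub_model_id="Rabornkraken/qwen3.5-27b-agent-sft-v2",
)
```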