Qwen3.5-27B Agent SFT v2 — Multi-Turn Tool-Calling LoRA

A LoRA adapter fine-tuned on 220K multi-turn tool-calling trajectories for grounded image generation agent tasks.

Model Details

| Parameter        | Value                          |
|------------------|--------------------------------|
| Base model       | Qwen3.5-27B (26.9B params)     |
| LoRA rank        | 32                             |
| LoRA alpha       | 32                             |
| Trainable params | 159.4M                         |
| Training steps   | 13,799 (batch 8, 3x H200 GPUs) |
| Final loss       | 0.36                           |
| Token accuracy   | 87.6%                          |

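For reference, a minimal `peft` config sketch matching the hyperparameters above; the target modules and dropout are assumptions (typical attention projections), since the card does not list them:

```python
from peft import LoraConfig

# Hypothetical reconstruction of the adapter config from the table above.
lora_config = LoraConfig(
    r=32,                # LoRA rank (from the table)
    lora_alpha=32,       # scaling alpha (from the table)
    lora_dropout=0.05,   # ASSUMPTION: dropout is not stated in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # ASSUMPTION
    task_type="CAUSAL_LM",
)
```
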
Training Data (220K Multi-Turn Examples)

| Source       | Examples | Key Strength                                         |
|--------------|----------|------------------------------------------------------|
| ToolMind     | 203K     | Multi-agent synthesis, reasoning traces, 20K+ tools  |
| FunReason-MT | 13K      | Complex sequential tool dependencies                 |
| APIGen-MT-5k | 5K       | Gold standard, 99% human-verified trajectories       |

Examples average 9.2 turns each, with sequential tool calls across turns.

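To illustrate what a multi-turn trajectory looks like, here is a hedged sketch in the common chat-messages format. The schema and tool names (`search_location`, `generate_image`) are assumptions for illustration; the actual serialization used for training is not specified in this card.

```python
# ASSUMPTION: OpenAI-style messages with a "tool_calls" field; tool names
# are hypothetical examples in the spirit of the grounded image-generation task.
trajectory = [
    {"role": "user", "content": "Generate an image of the NTU Hive building at dusk."},
    {"role": "assistant", "tool_calls": [
        {"name": "search_location", "arguments": {"query": "NTU Hive"}},
    ]},
    {"role": "tool", "name": "search_location",
     "content": '{"lat": 1.3426, "lon": 103.6831}'},
    {"role": "assistant", "tool_calls": [
        {"name": "generate_image",
         "arguments": {"prompt": "The Hive, NTU Singapore, dusk"}},
    ]},
    {"role": "tool", "name": "generate_image", "content": '{"status": "ok"}'},
    {"role": "assistant", "content": "Here is the generated image of the NTU Hive at dusk."},
]
```
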
Key Insight

Single-turn tool-calling data (e.g., raw Toucan-1.5M, xLAM-60K) actively degrades multi-step reasoning. Our v1 SFT, trained on 70K single-turn examples, scored -5% vs. base; this v2, trained on 220K multi-turn examples, scores +10% vs. base.

Evaluation (vs Base Qwen3.5-27B)

| Test                          | Base | SFT v2 | Delta |
|-------------------------------|------|--------|-------|
| Real location (NTU Hive)      | 0.70 | 0.85   | +21%  |
| Multi-step (Marina Bay Sands) | 0.70 | 1.00   | +43%  |
| Database search               | 1.00 | 1.00   | 0%    |
| Fictional scene               | 1.00 | 1.00   | 0%    |
| No tools needed               | 0.75 | 0.75   | 0%    |
| Full pipeline                 | 0.70 | 0.85   | +21%  |
| Average                       | 0.81 | 0.91   | +10%  |

Usage

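A minimal inference sketch with `transformers` and `peft`, assuming the adapter follows the standard PEFT layout (the repository IDs below are taken from this card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-27B", device_map="auto")
model = PeftModel.from_pretrained(base, "Rabornkraken/qwen3.5-27b-agent-sft-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-27B")

messages = [{"role": "user", "content": "Generate an image of Marina Bay Sands at night."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
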
Infrastructure

  • 3x NVIDIA H200 (141GB VRAM each), data-parallel training
  • Checkpoints auto-pushed to HuggingFace every 500 steps (see the sketch after this list)
  • Part of the SC4062 Grounded Image Generation Agent project
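
One way to reproduce the checkpoint pushing described above is the built-in `transformers` Hub integration. This is a sketch of the relevant `TrainingArguments`, not the actual training script; all other hyperparameters are unstated in this card.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen3.5-27b-agent-sft-v2",
    per_device_train_batch_size=8,  # "batch 8" per the card; per-device vs. global is an assumption
    save_strategy="steps",
    save_steps=500,                 # checkpoint every 500 steps
    push_to_hub=True,               # auto-push checkpoints to the Hub
    hub_model_id="Rabornkraken/qwen3.5-27b-agent-sft-v2",
    hub_strategy="every_save",      # push each saved checkpoint
)
```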