# SWE-Gym Qwen2.5-Coder-32B-Instruct Full SFT (64K Context)
A full-parameter supervised fine-tune of Qwen2.5-Coder-32B-Instruct, trained on SWE-agent trajectories distilled from Qwen3-Coder-480B-A35B-Instruct on the SWE-Gym dataset.
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-Coder-32B-Instruct |
| Fine-tuning Method | Full-parameter SFT |
| Parameters | ~32.8B |
| Architecture | Qwen2ForCausalLM |
| Hidden Size | 5120 |
| Num Layers | 64 |
| Num Attention Heads | 40 (8 KV heads, GQA) |
| Intermediate Size | 27648 |
| Max Context Length | 64K tokens |
| Precision | bfloat16 |
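The GQA configuration above (8 KV heads against 40 query heads) has a direct memory payoff at the full 64K context. A back-of-the-envelope sizing sketch, assuming a bfloat16 KV cache and a head dimension derived as hidden size ÷ attention heads:

```python
# KV-cache sizing sketch from the config table above (estimates, not measurements).
hidden_size = 5120
num_heads = 40
num_kv_heads = 8   # GQA: 5 query heads share each KV head
num_layers = 64
bytes_per_value = 2  # bfloat16

head_dim = hidden_size // num_heads  # 128

# K and V per token, across all layers, in bytes.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value

# At the full 64K context window.
kv_bytes_full = kv_bytes_per_token * 64 * 1024

print(kv_bytes_per_token)          # 256 KiB per token
print(kv_bytes_full / 2**30)       # 16 GiB at 64K tokens
```

With full multi-head attention (40 KV heads) the same cache would be 5× larger, i.e. roughly 80 GiB at 64K tokens, which is why GQA matters for long-context serving of this model.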
## Training Details
| Property | Value |
|---|---|
| Training Data | 634 resolved SWE-Gym instances |
| Teacher Model | Qwen3-Coder-480B-A35B-Instruct |
| Agent Framework | OpenHands CodeActAgent |
| Epochs | 3 |
| Total Steps | 60 |
| Batch Size | 1 per device × 8 GPUs × 4 grad accum = 32 effective |
| Learning Rate | 1e-5 (cosine schedule, 10% warmup) |
| Optimizer | AdamW (β1=0.9, β2=0.999, ε=1e-8) |
| Final Training Loss | 0.269 |
| Training Runtime | ~6.0 hours |
| Framework | LLaMA-Factory + DeepSpeed |
| Transformers | 5.2.0 |
| PyTorch | 2.6.0 |
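The step count in the table is consistent with the data size and batch configuration: 634 examples at an effective batch of 32 gives 20 optimizer steps per epoch, and 3 epochs gives 60 total steps (matching the epoch boundaries at steps 20, 40, and 60 in the training curve below). A quick arithmetic check:

```python
import math

# Effective batch size: per-device batch × GPUs × gradient accumulation.
per_device_bs, num_gpus, grad_accum = 1, 8, 4
effective_bs = per_device_bs * num_gpus * grad_accum  # 32

# Steps per epoch and total steps over the 634-example training set.
num_examples, epochs = 634, 3
steps_per_epoch = math.ceil(num_examples / effective_bs)  # 20
total_steps = steps_per_epoch * epochs                    # 60
print(effective_bs, steps_per_epoch, total_steps)
```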
## Training Data
The training data consists of 634 resolved instances from the SWE-Gym training set. Trajectories were generated by running Qwen3-Coder-480B-A35B-Instruct (via OpenHands CodeActAgent with maxiter=100) on SWE-Gym tasks, then filtering to only resolved (successful) trajectories. Function-calling messages were converted to non-function-calling format for SFT, and trajectories exceeding 64K tokens were excluded.
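The filtering pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the actual preprocessing code: the field names (`resolved`, `messages`, `tool_call`), the flattening format, and the character-based token estimate are all assumptions.

```python
MAX_TOKENS = 64 * 1024  # trajectories longer than this were excluded

def count_tokens(messages, tokens_per_char=0.3):
    # Stand-in for a real tokenizer; assumes ~0.3 tokens per character.
    return int(sum(len(m["content"]) for m in messages) * tokens_per_char)

def to_sft_messages(messages):
    # Flatten function-calling turns into plain-text turns for SFT.
    # The `tool_call` field and tag format here are hypothetical.
    out = []
    for m in messages:
        content = m["content"]
        if "tool_call" in m:
            content += "\n<function_call>" + m["tool_call"] + "</function_call>"
        out.append({"role": m["role"], "content": content})
    return out

def filter_trajectories(trajectories):
    kept = []
    for t in trajectories:
        if not t["resolved"]:  # keep only successful (resolved) runs
            continue
        msgs = to_sft_messages(t["messages"])
        if count_tokens(msgs) > MAX_TOKENS:  # drop over-length trajectories
            continue
        kept.append(msgs)
    return kept
```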
## Training Curve
| Step | Epoch | Loss | Learning Rate |
|---|---|---|---|
| 5 | 0.25 | 0.489 | 6.67e-06 |
| 10 | 0.50 | 0.369 | 9.92e-06 |
| 15 | 0.75 | 0.317 | 9.47e-06 |
| 20 | 1.00 | 0.280 | 8.64e-06 |
| 25 | 1.25 | 0.255 | 7.50e-06 |
| 30 | 1.50 | 0.234 | 6.15e-06 |
| 35 | 1.75 | 0.231 | 4.71e-06 |
| 40 | 2.00 | 0.226 | 3.29e-06 |
| 45 | 2.25 | 0.208 | 2.01e-06 |
| 50 | 2.50 | 0.202 | 9.89e-07 |
| 55 | 2.75 | 0.211 | 3.02e-07 |
| 60 | 3.00 | 0.206 | 8.46e-09 |
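The learning-rate column is reproduced by a standard linear-warmup + cosine-decay schedule. A sketch, assuming 6 warmup steps (10% of 60) and that the logger reports the rate of the previous, 0-indexed optimizer step:

```python
import math

PEAK_LR = 1e-5
TOTAL_STEPS = 60
WARMUP_STEPS = 6  # 10% warmup over 60 steps

def lr_at(logged_step):
    # Assumption: logged step s corresponds to 0-indexed internal step s - 1.
    step = logged_step - 1
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak rate toward 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

for s in (5, 10, 30, 60):
    print(s, f"{lr_at(s):.3g}")
```

Under these assumptions the formula matches the table to three significant figures at every logged step (e.g. 6.67e-06 at step 5, 6.15e-06 at step 30, 8.46e-09 at step 60).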
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "MMR115/swegym-qwen2.5-coder-32b-instruct-sft-64k",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("MMR115/swegym-qwen2.5-coder-32b-instruct-sft-64k")
```
## Citation

If you use this model, please cite SWE-Gym and OpenHands:

```bibtex
@inproceedings{pan2025swegym,
  title={Training Software Engineering Agents and Verifiers with SWE-Gym},
  author={Pan, Jiayi and Wang, Xingyao and Neubig, Graham and Jaitly, Navdeep and Ji, Heng and Suhr, Alane and Zhang, Yizhe},
  booktitle={ICML},
  year={2025}
}
```