# Qwen3-30B-A3B LoRA for R2E-Gym Bug Fixing
This is a LoRA (Low-Rank Adaptation) fine-tuned version of Qwen/Qwen3-30B-A3B for automated bug fixing tasks in the R2E-Gym environment.
## Model Details
- Base Model: Qwen/Qwen3-30B-A3B
- Training Method: Supervised Fine-Tuning (SFT)
- LoRA Rank: 16
- LoRA Alpha: 32
- Target Modules: all-linear
- Training Framework: Tinker
## Training Results

### Action Format Validation
| Metric | Base Model (Qwen3-30B-A3B) | After SFT |
|---|---|---|
| Valid Format Rate | 73.3% | 100% (first attempt) |
| Has Thought Rate | 100% | 100% |
The base model already produces valid actions 73.3% of the time, but it often fails on the first step (no "Action:" line found in the output). After SFT, the model produces correctly formatted actions from the very first step.
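A minimal sketch of how the "Valid Format Rate" and "Has Thought Rate" above could be checked. The regexes and function names here are assumptions for illustration, not the actual R2E-Gym validator:

```python
import re

# An output is counted as valid if it contains an "Action:" line;
# "has thought" checks for a "Thought:" line. Both are assumptions
# about the metric, mirroring the format described later in this card.
ACTION_RE = re.compile(r"^Action:\s*\S", re.MULTILINE)
THOUGHT_RE = re.compile(r"^Thought:\s*\S", re.MULTILINE)

def has_valid_format(output: str) -> bool:
    return bool(ACTION_RE.search(output))

def has_thought(output: str) -> bool:
    return bool(THOUGHT_RE.search(output))

good = "Thought: the bug is in auth.py\nAction: read auth.py"
bad = "I think the bug is in auth.py, let me take a look."
```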
### Action Type Distribution (Base Model)
- search: 66.7%
- read: 6.7%
- invalid: 26.7%
## Training Data
The model was fine-tuned on synthetic trajectories generated from gold patches in the R2E-Gym benchmark. Each training example consists of:
- A problem description from a GitHub issue
- A sequence of actions (search, read, edit, submit) to fix the bug
- Thought processes explaining the reasoning behind each action
**Training Data Stats:**
- 254 SFT training examples from 50 instances
- 3 epochs x 32 batches = 96 total training steps
- Training time: ~10 minutes
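To make the example structure concrete, here is a hypothetical sketch of one SFT trajectory as a Python dict. All field names and contents are illustrative and are not the actual dataset schema:

```python
# Hypothetical shape of one training example: a problem description
# plus a sequence of (thought, action) steps ending in "submit".
example = {
    "problem": "GitHub issue text describing the bug",
    "trajectory": [
        {"thought": "Locate the failing function", "action": "search login src/"},
        {"thought": "Inspect the file", "action": "read src/auth.py"},
        {"thought": "Fix the comparison", "action": "edit src/auth.py 42 42\nif user.check_password(pw):"},
        {"thought": "The fix is in place", "action": "submit"},
    ],
}
```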
## Intended Use
This model is designed for:
- Automated bug fixing in software repositories
- Code editing tasks following a structured action format
- R2E-Gym evaluation
## Action Format

The model produces outputs in the following format:

```
Thought: [reasoning about the problem]
Action: [action_type] [arguments]
```

Available actions:

- `bash <command>`: Run shell command
- `read <file>`: Read file content
- `search <pattern> [path]`: Search for pattern (`grep -rn`)
- `edit <file> <start> <end>\n<new_content>`: Replace lines start-end with new_content
- `submit`: Submit the solution
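A response in this format can be split into its thought and action parts with a small parser. This is a simplified sketch (it ignores the multi-line body of `edit`), not the evaluator's actual code:

```python
def parse_action(text: str):
    """Split a model response into (thought, action_type, args).

    Returns None when no "Action:" line is present. Simplified sketch:
    the multi-line new_content of an `edit` action is not captured.
    """
    thought, action = "", ""
    for line in text.splitlines():
        if line.startswith("Thought:"):
            thought = line[len("Thought:"):].strip()
        elif line.startswith("Action:"):
            action = line[len("Action:"):].strip()
    if not action:
        return None
    parts = action.split(maxsplit=1)
    return thought, parts[0], parts[1] if len(parts) > 1 else ""
```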
## Usage
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B", trust_remote_code=True)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "k-l-lambda/qwen3-30b-a3b-r2e-gym-sft")

# Generate
prompt = """You are an expert software engineer fixing bugs.
AVAILABLE ACTIONS:
- bash <command>: Run shell command
- read <file>: Read file content
- search <pattern>: Search for pattern
- edit <file> <start> <end>: Edit file
- submit: Submit solution
## Problem:
Fix a bug in the authentication module where users cannot log in with valid credentials.
## Your turn:
Thought:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Configuration
- Learning Rate: 1e-5
- Epochs: 3
- Batch Size: 8
- Max Sequence Length: 4096
- LoRA Rank: 16
- LoRA Alpha: 32
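The LoRA hyperparameters above map onto a `peft.LoraConfig` roughly as follows. This is a reconstruction from the listed values, not the actual Tinker training script:

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters above; the actual training
# config used by Tinker may differ in detail.
lora_config = LoraConfig(
    r=16,                          # LoRA rank
    lora_alpha=32,                 # LoRA alpha
    target_modules="all-linear",   # adapt every linear layer
    task_type="CAUSAL_LM",
)
```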
## Limitations
- The model was trained on a sample of R2E-Gym data and may not generalize to all bug types
- Performance depends on the quality of problem descriptions
## License
This model is released under the Apache 2.0 license.
## Citation
If you use this model, please cite the R2E-Gym benchmark and Qwen3 model.