Qwen3-30B-A3B LoRA for R2E-Gym Bug Fixing

This is a LoRA (Low-Rank Adaptation) fine-tuned version of Qwen/Qwen3-30B-A3B for automated bug fixing tasks in the R2E-Gym environment.

Model Details

  • Base Model: Qwen/Qwen3-30B-A3B
  • Training Method: Supervised Fine-Tuning (SFT)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: all-linear
  • Training Framework: Tinker

Training Results

Action Format Validation

Metric               Base Model (Qwen3-30B-A3B)   After SFT
Valid Format Rate    73.3%                        100% (first attempt)
Has Thought Rate     100%                         100%

The base model already produces validly formatted actions 73.3% of the time, but it frequently fails on the first step because no "Action:" line is emitted. After SFT, the model produces correctly formatted actions from the very first step.

Action Type Distribution (Base Model)

  • search: 66.7%
  • read: 6.7%
  • invalid: 26.7%

Training Data

The model was fine-tuned on synthetic trajectories generated from gold patches in the R2E-Gym benchmark. Each training example consists of:

  1. A problem description from a GitHub issue
  2. A sequence of actions (search, read, edit, submit) to fix the bug
  3. Thought processes explaining the reasoning behind each action

Training Data Stats:

  • 254 SFT training examples from 50 instances
  • 3 epochs x 32 batches = 96 total training steps
  • Training time: ~10 minutes
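The step count above follows directly from the batch size listed in the Training Configuration section: 254 examples at batch size 8 give 32 batches per epoch (rounding the last partial batch up), and 3 epochs give 96 steps. A quick sanity check:

```python
import math

num_examples = 254
batch_size = 8
epochs = 3

# 254 / 8 = 31.75, so the last partial batch rounds up to 32 batches/epoch
batches_per_epoch = math.ceil(num_examples / batch_size)
total_steps = batches_per_epoch * epochs

print(batches_per_epoch, total_steps)  # 32 96
```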

Intended Use

This model is designed for:

  • Automated bug fixing in software repositories
  • Code editing tasks following a structured action format
  • R2E-Gym evaluation

Action Format

The model produces outputs in the following format:

Thought: [reasoning about the problem]
Action: [action_type] [arguments]

Available actions:

  • bash <command>: Run shell command
  • read <file>: Read file content
  • search <pattern> [path]: Search for pattern (grep -rn)
  • edit <file> <start> <end>\n<new_content>: Replace lines start-end with new_content
  • submit: Submit the solution
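A minimal sketch of how an environment might split a model step into its Thought and Action parts (the actual R2E-Gym parser may differ; `parse_step` is a hypothetical helper, not part of the benchmark's API):

```python
import re

def parse_step(output: str):
    """Split one model step in "Thought: ... / Action: ..." format.

    Returns None when no "Action:" line is found (an invalid step),
    otherwise a dict with the thought, action type, inline arguments,
    and any multi-line body (e.g. new_content for an edit action).
    """
    thought_m = re.search(r"Thought:\s*(.*?)\s*(?=Action:|$)", output, re.DOTALL)
    action_m = re.search(r"Action:\s*(.*)", output, re.DOTALL)
    if action_m is None:
        return None  # invalid step: no "Action:" line found
    # First line after "Action:" holds the action type and arguments;
    # any following lines are the action body.
    action_line, _, body = action_m.group(1).partition("\n")
    parts = action_line.strip().split(maxsplit=1)
    if not parts:
        return None  # "Action:" present but empty
    return {
        "thought": thought_m.group(1).strip() if thought_m else "",
        "action": parts[0],
        "args": parts[1] if len(parts) > 1 else "",
        "body": body,
    }

step = parse_step("Thought: the bug is in auth.py\nAction: read src/auth.py")
# step["action"] == "read", step["args"] == "src/auth.py"
```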

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    torch_dtype="auto",      # load in the checkpoint's native precision
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B", trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "k-l-lambda/qwen3-30b-a3b-r2e-gym-sft")

# Generate
prompt = """You are an expert software engineer fixing bugs.

AVAILABLE ACTIONS:
- bash <command>: Run shell command
- read <file>: Read file content
- search <pattern>: Search for pattern
- edit <file> <start> <end>: Edit file
- submit: Submit solution

## Problem:
Fix a bug in the authentication module where users cannot log in with valid credentials.

## Your turn:
Thought:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Configuration

  • Learning Rate: 1e-5
  • Epochs: 3
  • Batch Size: 8
  • Max Sequence Length: 4096
  • LoRA Rank: 16
  • LoRA Alpha: 32
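These hyperparameters map onto a peft LoraConfig roughly as follows. This is a sketch for reference only: training actually used the Tinker framework, whose configuration may differ, and the field names below follow peft's API.

```python
from peft import LoraConfig

# Sketch reproducing the hyperparameters listed above via peft;
# the actual Tinker training config is not shown in this card.
lora_config = LoraConfig(
    r=16,                         # LoRA rank
    lora_alpha=32,                # scaling factor (alpha / r = 2.0)
    target_modules="all-linear",  # adapt every linear layer
    task_type="CAUSAL_LM",
)
```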

Limitations

  • The model was trained on a sample of R2E-Gym data and may not generalize to all bug types
  • Performance depends on the quality of problem descriptions

License

This model is released under the Apache 2.0 license.

Citation

If you use this model, please cite the R2E-Gym benchmark and Qwen3 model.
