Qwen3-30B-A3B LoRA for R2E-Gym Bug Fixing

This is a LoRA (Low-Rank Adaptation) fine-tuned version of Qwen/Qwen3-30B-A3B for automated bug fixing tasks in the R2E-Gym environment.

Model Details

  • Base Model: Qwen/Qwen3-30B-A3B
  • Training Method: Supervised Fine-Tuning (SFT)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: all-linear
  • Training Framework: Tinker

Training Results

Action Format Validation

Metric               Base Model (Qwen3-30B-A3B)   After SFT
Valid Format Rate    73.3%                        100% (first attempt)
Has Thought Rate     100%                         100%

The base model already produces validly formatted actions 73.3% of the time, but it frequently fails on the first step because no "Action:" line is emitted. After SFT, the model produces correctly formatted actions from the very first step.

Action Type Distribution (Base Model)

  • search: 66.7%
  • read: 6.7%
  • invalid: 26.7%

Training Data

The model was fine-tuned on synthetic trajectories generated from gold patches in the R2E-Gym benchmark. Each training example consists of:

  1. A problem description from a GitHub issue
  2. A sequence of actions (search, read, edit, submit) to fix the bug
  3. Thought processes explaining the reasoning behind each action

Training Data Stats:

  • 254 SFT training examples from 50 instances
  • 3 epochs x 32 batches = 96 total training steps
  • Training time: ~10 minutes
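The step count above follows directly from the batch size listed in the Training Configuration section: 254 examples at batch size 8 give 32 batches per epoch (rounding the last partial batch up), and 3 epochs give 96 steps. A quick sanity check:

```python
import math

num_examples = 254
batch_size = 8
epochs = 3

# 254 / 8 = 31.75, so the last partial batch rounds up to 32 batches/epoch
batches_per_epoch = math.ceil(num_examples / batch_size)
total_steps = batches_per_epoch * epochs

print(batches_per_epoch, total_steps)  # 32 96
```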

Intended Use

This model is designed for:

  • Automated bug fixing in software repositories
  • Code editing tasks following a structured action format
  • R2E-Gym evaluation

Action Format

The model produces outputs in the following format:

Thought: [reasoning about the problem]
Action: [action_type] [arguments]

Available actions:

  • bash <command>: Run shell command
  • read <file>: Read file content
  • search <pattern> [path]: Search for pattern (grep -rn)
  • edit <file> <start> <end>\n<new_content>: Replace lines start-end with new_content
  • submit: Submit the solution
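A minimal sketch of how an environment might split a model step into its Thought and Action parts (the actual R2E-Gym parser may differ; `parse_step` is a hypothetical helper, not part of the benchmark's API):

```python
import re

def parse_step(output: str):
    """Split one model step in "Thought: ... / Action: ..." format.

    Returns None when no "Action:" line is found (an invalid step),
    otherwise a dict with the thought, action type, inline arguments,
    and any multi-line body (e.g. new_content for an edit action).
    """
    thought_m = re.search(r"Thought:\s*(.*?)\s*(?=Action:|$)", output, re.DOTALL)
    action_m = re.search(r"Action:\s*(.*)", output, re.DOTALL)
    if action_m is None:
        return None  # invalid step: no "Action:" line found
    # First line after "Action:" holds the action type and arguments;
    # any following lines are the action body.
    action_line, _, body = action_m.group(1).partition("\n")
    parts = action_line.strip().split(maxsplit=1)
    if not parts:
        return None  # "Action:" present but empty
    return {
        "thought": thought_m.group(1).strip() if thought_m else "",
        "action": parts[0],
        "args": parts[1] if len(parts) > 1 else "",
        "body": body,
    }

step = parse_step("Thought: the bug is in auth.py\nAction: read src/auth.py")
# step["action"] == "read", step["args"] == "src/auth.py"
```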

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-30B-A3B",
    torch_dtype="auto",      # load in the checkpoint's native precision
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B", trust_remote_code=True)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "k-l-lambda/qwen3-30b-a3b-r2e-gym-sft")

# Generate
prompt = """You are an expert software engineer fixing bugs.

AVAILABLE ACTIONS:
- bash <command>: Run shell command
- read <file>: Read file content
- search <pattern>: Search for pattern
- edit <file> <start> <end>: Edit file
- submit: Submit solution

## Problem:
Fix a bug in the authentication module where users cannot log in with valid credentials.

## Your turn:
Thought:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Configuration

  • Learning Rate: 1e-5
  • Epochs: 3
  • Batch Size: 8
  • Max Sequence Length: 4096
  • LoRA Rank: 16
  • LoRA Alpha: 32
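These hyperparameters map onto a peft LoraConfig roughly as follows. This is a sketch for reference only: training actually used the Tinker framework, whose configuration may differ, and the field names below follow peft's API.

```python
from peft import LoraConfig

# Sketch reproducing the hyperparameters listed above via peft;
# the actual Tinker training config is not shown in this card.
lora_config = LoraConfig(
    r=16,                         # LoRA rank
    lora_alpha=32,                # scaling factor (alpha / r = 2.0)
    target_modules="all-linear",  # adapt every linear layer
    task_type="CAUSAL_LM",
)
```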

Limitations

  • The model was trained on a sample of R2E-Gym data and may not generalize to all bug types
  • Performance depends on the quality of problem descriptions

License

This model is released under the Apache 2.0 license.

Citation

If you use this model, please cite the R2E-Gym benchmark and Qwen3 model.
