AgentTune: Qwen2.5-3B ReAct Agent LoRA

A QLoRA fine-tuned adapter that teaches Qwen2.5-3B-Instruct multi-step agent reasoning in the ReAct format (Thought → Action → Observation → Answer).
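To make the Thought → Action → Observation → Answer structure concrete, the following is a minimal sketch of how one such trajectory could be represented and rendered as text. The `Step` class, the `render_trajectory` helper, the `Question:` prefix, and the `tool_name[argument]` action syntax are all illustrative assumptions; the card does not specify the exact prompt format used in training.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One reasoning step of a ReAct trajectory (hypothetical layout)."""
    thought: str
    action: str       # assumed syntax: tool_name[argument]
    observation: str  # what the tool returned


def render_trajectory(question: str, steps: List[Step], answer: str) -> str:
    """Render a trajectory as plain text, one labelled line per field."""
    lines = [f"Question: {question}"]
    for step in steps:
        lines.append(f"Thought: {step.thought}")
        lines.append(f"Action: {step.action}")
        lines.append(f"Observation: {step.observation}")
    lines.append(f"Answer: {answer}")
    return "\n".join(lines)


# Toy example with a made-up "calculator" tool.
example = render_trajectory(
    "What is 12 * 7 plus 5?",
    [
        Step("I should multiply first.", "calculator[12 * 7]", "84"),
        Step("Now I add 5 to the product.", "calculator[84 + 5]", "89"),
    ],
    "89",
)
print(example)
```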

Key Results

Metric	Zero-Shot	Fine-Tuned	Improvement (percentage points)
Task Success Rate	93.3%	100%	+6.7
Tool Selection Accuracy	30.0%	100%	+70.0
Exact Tool Match	30.0%	100%	+70.0
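The card does not define how these metrics are computed, so the sketch below is only one plausible reading: task success as the share of tasks solved, and tool selection accuracy as the share of tasks where the predicted tool equals the expected one. The record fields (`expected_tool`, `predicted_tool`, `success`) and the toy data are hypothetical and are not the evaluation set behind the table.

```python
from typing import Dict, List


def percent(hits: int, total: int) -> float:
    """Return hits / total as a percentage, or 0.0 for an empty set."""
    return 100.0 * hits / total if total else 0.0


def evaluate(records: List[Dict]) -> Dict[str, float]:
    """Compute two simple agent metrics over evaluation records."""
    total = len(records)
    solved = sum(1 for r in records if r["success"])
    right_tool = sum(
        1 for r in records if r["predicted_tool"] == r["expected_tool"]
    )
    return {
        "task_success_rate": percent(solved, total),
        "tool_selection_accuracy": percent(right_tool, total),
    }


def percentage_point_gain(before: float, after: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(after - before, 1)


# Made-up records, for illustration only.
records = [
    {"expected_tool": "calculator", "predicted_tool": "calculator", "success": True},
    {"expected_tool": "search", "predicted_tool": "calculator", "success": True},
    {"expected_tool": "search", "predicted_tool": "search", "success": False},
    {"expected_tool": "calculator", "predicted_tool": "calculator", "success": True},
]
scores = evaluate(records)
print(scores)
print(percentage_point_gain(93.3, 100.0))
```

Reporting the change as an absolute difference in percentage points avoids confusion with a relative change; for example, going from 30.0% to 100% is a gain of 70.0 points.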

Training Details

Method: QLoRA (4-bit NF4, double quantization)
LoRA rank / alpha: 16 / 32
Target modules: All attention + MLP projections
Training samples: 500 ReAct trajectories
Epochs: 3
Learning rate: 2e-4 (cosine schedule)
Training time: ~10 minutes on an NVIDIA L4 GPU
Final loss: 0.419
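The settings above could be expressed roughly as follows with `transformers` and `peft`. This is a sketch under stated assumptions, not the training script used for this adapter: the bfloat16 compute dtype is assumed, and the module names are the projection layers found in Qwen2.5 models, taken here as what "all attention + MLP projections" refers to.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization with double quantization, as listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
)

# LoRA rank 16, alpha 32, applied to attention and MLP projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```

In a typical QLoRA setup, `bnb_config` would be passed as `quantization_config` to `AutoModelForCausalLM.from_pretrained`, and `lora_config` would be applied with `peft.get_peft_model` before training.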

Usage

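from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base instruct model, then attach the LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "Cheng-1/agenttune-qwen2.5-3b-react-lora")
model.eval()  # switch to inference mode

# The tokenizer is loaded from the base model.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")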
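Once the model is loaded, it still needs a driver loop that executes the tools it asks for. The following is a minimal sketch of such a loop, with the model replaced by a scripted stand-in so the example runs without any weights. `run_react`, `generate_fn`, the `Action: tool[argument]` syntax, and the `add` tool are all hypothetical; the action format this adapter actually emits is not documented in the card and may differ.

```python
import re
from typing import Callable, Dict, Optional

# Assumed output syntax: "Action: tool_name[argument]" and "Answer: text".
ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.*?)\]")
ANSWER_RE = re.compile(r"Answer:\s*(.+)")


def run_react(
    question: str,
    generate_fn: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Optional[str]:
    """Alternate between model output and tool calls until an answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        completion = generate_fn(transcript)
        transcript += completion + "\n"
        answer = ANSWER_RE.search(completion)
        if answer:
            return answer.group(1).strip()
        action = ACTION_RE.search(completion)
        if action is None:
            break  # neither an action nor an answer: give up
        name, argument = action.groups()
        tool = tools.get(name)
        observation = tool(argument) if tool else f"unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return None


def fake_generate(transcript: str) -> str:
    """Scripted stand-in for the model: call a tool once, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: add[2, 3]"
    return "Thought: I have the result.\nAnswer: 5"


def add_tool(argument: str) -> str:
    """Toy tool: add two comma-separated integers."""
    a, b = (int(part) for part in argument.split(","))
    return str(a + b)


result = run_react("What is 2 + 3?", fake_generate, {"add": add_tool})
print(result)
```

With the real model, `generate_fn` would wrap `tokenizer.apply_chat_template` and `model.generate`, decoding only the newly generated tokens. Capping the loop with `max_steps` keeps a model that never produces an answer from running indefinitely.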

