AgentTune: Qwen2.5-3B ReAct Agent LoRA

QLoRA fine-tuned adapter that teaches Qwen2.5-3B-Instruct multi-step agent reasoning using the ReAct (Thought → Action → Observation → Answer) framework.
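A trajectory in this format interleaves reasoning with tool calls. An illustrative example (the question and tool name are hypothetical, not taken from the training set):

```
Question: What is the weather in Paris today?
Thought: I need current weather data, so I should call a weather tool.
Action: get_weather
Action Input: {"city": "Paris"}
Observation: 18°C, partly cloudy
Thought: I now have the information needed to answer.
Answer: It is currently 18°C and partly cloudy in Paris.
```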

Key Results

Metric                    Zero-Shot   Fine-Tuned   Improvement
Task Success Rate         93.3%       100%         +6.7%
Tool Selection Accuracy   30.0%       100%         +70.0%
Exact Tool Match          30.0%       100%         +70.0%
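Tool selection accuracy here is the fraction of evaluation tasks where the model chose the reference tool. A minimal sketch of how such a metric can be computed (the function and data are illustrative, not the actual eval harness):

```python
def tool_selection_accuracy(predicted, reference):
    """Fraction of tasks where the predicted tool name matches the reference tool."""
    assert len(predicted) == len(reference)
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)

# Illustrative check: 3 of 4 predictions pick the right tool
print(tool_selection_accuracy(
    ["search", "calculator", "search", "weather"],
    ["search", "calculator", "weather", "weather"],
))  # 0.75
```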

Training Details

  • Method: QLoRA (4-bit NF4, double quantization)
  • LoRA rank / alpha: 16 / 32
  • Target modules: All attention + MLP projections
  • Training samples: 500 ReAct trajectories
  • Epochs: 3
  • Learning rate: 2e-4 (cosine schedule)
  • Training time: ~10 minutes on L4 GPU
  • Final loss: 0.419

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach the LoRA adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "Cheng-1/agenttune-qwen2.5-3b-react-lora")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
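To drive a tool loop with the model's output, the Action / Action Input lines have to be parsed out of each generation. A minimal stdlib sketch (the field names follow the ReAct format; the parsing helper itself is not part of this repo):

```python
import re

def parse_react_step(text):
    """Extract the tool call, or the final answer, from one ReAct generation."""
    answer = re.search(r"Answer:\s*(.+)", text)
    if answer:
        return {"answer": answer.group(1).strip()}
    action = re.search(r"Action:\s*(\S+)", text)
    action_input = re.search(r"Action Input:\s*(.+)", text)
    if action and action_input:
        return {"tool": action.group(1), "input": action_input.group(1).strip()}
    return None  # incomplete step: keep generating or retry

step = parse_react_step(
    'Thought: I need the weather.\nAction: get_weather\nAction Input: {"city": "Paris"}'
)
print(step)  # {'tool': 'get_weather', 'input': '{"city": "Paris"}'}
```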

