Qwen3-4B Function Calling - LoRA Adapter
This is a standalone LoRA adapter (PEFT) for Qwen/Qwen3-4B-Instruct-2507, fine-tuned for function calling / tool use.
Developed by Prabhu Nithin Gollapudi.
If you prefer loading a single self-contained model without needing the original base weights, use the fully merged version:
prabhu-nithin/qwen3-4b-xlam-function-calling-60k
What this model does
The model decides when to call a tool vs. answer directly. When a tool call is needed, it outputs a structured JSON payload wrapped in <tool_call> tags. Given the tool result back as a <tool_response>, it then produces a natural language final answer.
User Query -> Model -> <tool_call>{"name": "fn", "arguments": {...}}</tool_call>
|
Execute Python function
|
<tool_response>{"result": ...}</tool_response>
|
Model -> Final Answer
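The "Execute Python function" step in the flow above needs glue code that pulls the JSON payload out of the `<tool_call>` tags and dispatches it. The following is a minimal sketch of that step, not the author's implementation: `parse_tool_calls`, `run_tool_calls`, `TOOL_REGISTRY`, and the stub `get_weather` are hypothetical names introduced for illustration, and the `{"result": ...}` wrapper simply mirrors the diagram.

```python
import json
import re

# Captures whatever sits between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts found in text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of crashing
        if isinstance(call, dict) and "name" in call:
            calls.append({"name": call["name"],
                          "arguments": call.get("arguments", {})})
    return calls


# Hypothetical tool implementation with canned data, for illustration only.
def get_weather(city, unit="celsius"):
    return {"city": city, "temperature": 21, "unit": unit}


TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(text):
    """Execute each parsed call and return one <tool_response> string per call."""
    responses = []
    for call in parse_tool_calls(text):
        fn = TOOL_REGISTRY.get(call["name"])
        if fn is None:
            result = {"error": f"unknown tool: {call['name']}"}
        else:
            result = {"result": fn(**call["arguments"])}
        responses.append(f"<tool_response>{json.dumps(result)}</tool_response>")
    return responses
```

If `parse_tool_calls` returns an empty list, the model chose to answer directly and its text can be shown to the user as is.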
Quick start
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
import json
base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "prabhu-nithin/qwen3-4b-xlam-function-calling-60k-lora"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
tool_definitions = [
    {
        "name": "get_weather",
        "description": "Get current weather information for a given city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "The city name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"], "default": "celsius"}
            },
            "required": ["city"]
        }
    }
]
system_prompt = f"""You are a helpful assistant.
# Tools
You may call one or more functions to assist with the user query.
You are provided with function signatures within <tools></tools> XML tags:
<tools>
{json.dumps(tool_definitions, indent=2)}
</tools>
For each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:
<tool_call>
{{"name": "<function-name>", "arguments": <args-json-object>}}
</tool_call>"""
messages = [
{"role": "system", "content": system_prompt},
{"role": "user", "content": "What's the weather like in Paris?"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
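The quick start stops at the first generation, which yields the tool call. To get the final answer, the tool output has to be fed back to the model. The sketch below shows one way to build that second-turn message list. It assumes the base model's chat template renders a message with role `"tool"` as a `<tool_response>` block, which should be verified against the tokenizer in use. `build_followup_messages` is a hypothetical helper name.

```python
import json


def build_followup_messages(messages, assistant_text, tool_results):
    """Return a new message list with the tool call turn and tool outputs appended.

    messages       -- the original conversation (left unmodified)
    assistant_text -- the raw text the model generated, including <tool_call> tags
    tool_results   -- one JSON-serializable result per executed tool call
    """
    followup = list(messages)
    followup.append({"role": "assistant", "content": assistant_text})
    for result in tool_results:
        # Assumption: the chat template wraps "tool" messages in <tool_response> tags.
        followup.append({"role": "tool", "content": json.dumps(result)})
    return followup


# Usage with the quick start objects (commented out because it needs the model):
# followup = build_followup_messages(messages, generated_text, [{"temperature": 21}])
# text = tokenizer.apply_chat_template(followup, tokenize=False, add_generation_prompt=True)
# inputs = tokenizer(text, return_tensors="pt").to(model.device)
# outputs = model.generate(**inputs, max_new_tokens=512)
```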
Training procedure
This adapter was trained with SFT (Supervised Fine-Tuning) using QLoRA (4-bit quantization + LoRA). Training took approximately 18 hours on an NVIDIA RTX 3060 Laptop GPU.
Dataset
- Source: Salesforce/xlam-function-calling-60k
- Split: 95% train / 5% eval
- Format: Converted to Qwen3's chat template with <tool_call>/<tool_response> tokens
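The conversion and split described above could look roughly like the following. This is a sketch under assumptions, not the script used for training: it assumes each dataset record carries a `query` string plus `tools` and `answers` fields stored as JSON strings, and the helper names, the shuffle, and the seed are hypothetical.

```python
import json
import random


def xlam_to_example(record):
    """Turn one record into a tool list plus a user/assistant message pair."""
    tools = json.loads(record["tools"])    # assumed to be a JSON-encoded list
    calls = json.loads(record["answers"])  # assumed to be a JSON-encoded list
    assistant_text = "\n".join(
        "<tool_call>\n"
        + json.dumps({"name": c["name"], "arguments": c["arguments"]})
        + "\n</tool_call>"
        for c in calls
    )
    return {
        "tools": tools,
        "messages": [
            {"role": "user", "content": record["query"]},
            {"role": "assistant", "content": assistant_text},
        ],
    }


def train_eval_split(items, eval_fraction=0.05, seed=42):
    """Shuffle deterministically and hold out eval_fraction of the items."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_eval = round(len(items) * eval_fraction)
    return items[n_eval:], items[:n_eval]
```

The tool list is kept separate from the messages so it can be rendered into the system prompt in whichever way the chat template expects.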
Hyperparameters
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Quantization | 4-bit (QLoRA) |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Learning rate | 2e-4 |
| LR scheduler | cosine |
| Warmup ratio | 0.05 |
| Batch size (per device) | 1 |
| Gradient accumulation steps | 8 (effective batch size = 8) |
| Weight decay | 0.01 |
| Optimizer | paged_adamw_8bit |
| Max sequence length | 2048 |
| Precision | bf16 |
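For anyone reproducing the run, the table maps onto library arguments roughly as follows. The values come from the table, but the keyword names are my reading of `peft.LoraConfig` and `transformers.TrainingArguments` and may differ between library versions, so treat this as a sketch rather than the original training script. The 4-bit quantization details (quantization type, compute dtype) are not stated in the card and are left out.

```python
# Keyword arguments mirroring the hyperparameter table.
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: standard setting for causal LM fine-tuning
)

training_kwargs = dict(
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    weight_decay=0.01,
    optim="paged_adamw_8bit",
    bf16=True,
)

# The argument name for this limit varies across TRL versions, so it is kept separate.
max_sequence_length = 2048

# LoRA scales the adapter update by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# One optimizer step sees batch size x accumulation steps examples (single GPU).
effective_batch_size = (training_kwargs["per_device_train_batch_size"]
                        * training_kwargs["gradient_accumulation_steps"])

# With the libraries installed, the dicts would be unpacked like this:
# from peft import LoraConfig
# lora_config = LoraConfig(**lora_kwargs)
```

With rank 16 and alpha 32 the scaling factor is 2, and the effective batch size matches the 8 given in the table.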
Hardware
| Setting | Value |
|---|---|
| GPU | NVIDIA RTX 3060 Laptop |
| VRAM | ~8 GB used (QLoRA 4-bit) |
| Training time | ~18 hours |
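As a rough sanity check on the training time, the step count can be estimated from the figures in this card. Two assumptions are not stated here: that the dataset holds exactly 60,000 examples, and that none were dropped by the sequence-length limit. The per-step time is therefore only an estimate.

```python
import math

# Assumptions (not stated in the card): 60,000 examples, none filtered out.
total_examples = 60_000
eval_percent = 5          # 95% train / 5% eval
epochs = 3
effective_batch = 1 * 8   # per-device batch size x gradient accumulation steps
warmup_ratio = 0.05
training_hours = 18       # approximate, from the hardware table

eval_examples = total_examples * eval_percent // 100
train_examples = total_examples - eval_examples
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
seconds_per_step = training_hours * 3600 / total_steps

print(f"train examples:   {train_examples}")
print(f"optimizer steps:  {total_steps} ({warmup_steps} warmup)")
print(f"seconds per step: {seconds_per_step:.2f}")
```

Under those assumptions the run covers 57,000 training examples per epoch and 21,375 optimizer steps in total, which works out to about 3 seconds per optimizer step.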
Framework versions
- PEFT: 0.18.1
- TRL: 0.29.0
- Transformers: 5.3.0
- PyTorch: 2.5.1+cu121
- Datasets: 4.6.1
- Tokenizers: 0.22.2
License
- Code: MIT License
- Base Model: Apache 2.0
- Dataset: CC-BY-4.0