# Qwen3-4B Agent 2-Phase SFT (ALF+DB) 20260225-2c11
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using a two-phase sequential fine-tuning strategy with LoRA and Unsloth.
## Training Strategy: 2-Phase Learning (Catastrophic Forgetting Prevention)

### Phase 1: ALFWorld Specialization
- Dataset: ALFWorld trajectory data + 6 custom ALFWorld samples (×80 = 480 samples)
- Goal: Achieve ALF 72-74% performance
- Epochs: 2
- Learning Rate: 6e-6
- Custom data ratio: ~16%
### Phase 2: DBBench Addition (Continual Learning)
- Dataset: ALFWorld data + 9 custom DBBench samples (×70 = 630 samples)
- Goal: Maintain ALF 70-72%, achieve DB 55%+
- Epochs: 1
- Learning Rate: 3e-6 (lowered to limit forgetting of Phase 1 knowledge)
- Based on Phase 1 adapter
- Custom data ratio: ~20%
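The per-phase data mixing described above (a small set of hand-written custom samples upsampled by replication, then shuffled into the larger trajectory dataset) can be sketched as follows. This is an illustrative sketch, not the actual training script: `build_phase_mix` and the base-dataset size of 2,520 are assumptions chosen so that the Phase 2 numbers from the card (9 custom samples ×70 = 630, ~20% custom ratio) work out.

```python
import random

def build_phase_mix(base_data, custom_samples, replication_factor, seed=42):
    """Upsample a small custom set by replication, then shuffle it into
    the larger base dataset. Sketch of the mixing strategy above;
    the function name and signature are illustrative."""
    upsampled = custom_samples * replication_factor
    mixed = base_data + upsampled
    random.Random(seed).shuffle(mixed)
    return mixed, len(upsampled) / len(mixed)

# Phase 2 numbers from the card: 9 custom DBBench samples x70 = 630.
base = [{"task": "alfworld", "id": i} for i in range(2520)]  # assumed size
custom = [{"task": "dbbench", "id": i} for i in range(9)]
mixed, ratio = build_phase_mix(base, custom, 70)
print(len(mixed), round(ratio, 2))  # 3150 0.2
```

With these assumed sizes the custom ratio comes out to exactly 20%, matching the Phase 2 figure stated above.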
## Why 2-Phase Training?

**Problem:** Single-phase training on both tasks forces a trade-off between ALFWorld and DBBench: as DBBench performance improves, ALFWorld performance degrades.

**Solution:** Sequential fine-tuning minimizes catastrophic forgetting:
- Phase 1 establishes strong ALFWorld foundation
- Phase 2 carefully adds DBBench capability with lower LR
- Maintains ALFWorld knowledge while gaining DBBench skills
## Expected Performance
- ALFWorld: 72%+
- DBBench: 55%+
- Overall Score: 5.0+
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: 2-Phase LoRA (continual learning)
- Max sequence length: 2048
- Phase 1: 2 epochs @ 6e-6 LR
- Phase 2: 1 epoch @ 3e-6 LR
- LoRA: r=128, alpha=128
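The LoRA hyperparameters above (r=128, alpha=128) could be expressed as a PEFT `LoraConfig` roughly as follows. This is a sketch, not the actual training config: the `target_modules` list, `lora_dropout`, and `bias` values are assumptions (the projection layers typically adapted in Qwen-family models), since the card does not state them.

```python
from peft import LoraConfig

# Sketch of the LoRA setup described above (r=128, alpha=128).
# target_modules, dropout, and bias are assumptions not stated in the card.
lora_config = LoraConfig(
    r=128,
    lora_alpha=128,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)
```

With alpha equal to r, the effective LoRA scaling factor (alpha/r) is 1.0, so adapter updates are applied at full strength.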
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "TToyo2511/ttoyo_advance_2c11"  # TTT20260225 2c11 version

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

# Example: chat-style generation with the adapted model
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Sources & Terms (IMPORTANT)

- Training data: u-10bei/sft_alfworld_trajectory_dataset_v5
- Dataset license: MIT. This dataset is used and distributed under the terms of the MIT License.
- Compliance: Users must comply with the MIT License and the base model's original terms of use.