Qwen3-4B Agent 2-Phase SFT (ALF+DB) 20260225-2c11

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using 2-Phase Sequential Fine-Tuning with LoRA + Unsloth.

Training Strategy: 2-Phase Learning (Catastrophic Forgetting Prevention)

Phase 1: ALFWorld Specialization

  • Dataset: ALFWorld trajectory data + 6 custom ALFWorld samples (×80 = 480 samples)
  • Goal: Achieve ALF 72-74% performance
  • Epochs: 2
  • Learning Rate: 6e-6
  • Custom data ratio: ~16%
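The ×80 duplication and the ~16% ratio above fit together arithmetically; a minimal sketch (the records below are placeholders, not the actual trajectory data):

```python
# Phase 1 data mix: oversample the 6 custom ALFWorld samples by 80x.
# (Placeholder records; the real samples are ALFWorld trajectories.)
custom_alfworld = [{"id": i, "source": "custom"} for i in range(6)]

oversampled = custom_alfworld * 80   # 6 x 80 = 480 custom samples
print(len(oversampled))              # 480

# At a ~16% custom ratio, the full Phase 1 mix is roughly
# 480 / 0.16 = 3000 samples including the base trajectory data.
approx_total = int(len(oversampled) / 0.16)
print(approx_total)                  # 3000
```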

Phase 2: DBBench Addition (Continual Learning)

  • Dataset: ALFWorld data + 9 custom DBBench samples (×70 = 630 samples)
  • Goal: Maintain ALF 70-72%, achieve DB 55%+
  • Epochs: 1
  • Learning Rate: 3e-6 (halved from Phase 1 to learn gently and limit forgetting)
  • Based on Phase 1 adapter
  • Custom data ratio: ~20%
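The same arithmetic applies to the Phase 2 mix; a sketch of the counts and settings listed above (the dict keys are illustrative, not the actual trainer arguments):

```python
# Phase 2 data mix: the 9 custom DBBench samples duplicated 70x.
custom_dbbench = [{"id": i, "source": "custom_db"} for i in range(9)]
phase2_custom = custom_dbbench * 70  # 9 x 70 = 630 custom samples
print(len(phase2_custom))            # 630

# At a ~20% custom ratio: roughly 630 / 0.20 = 3150 samples overall.
approx_total = int(len(phase2_custom) / 0.20)
print(approx_total)                  # 3150

# Phase settings from this card (names are hypothetical):
phase2 = {"epochs": 1, "learning_rate": 3e-6, "init_from": "phase1_adapter"}
assert phase2["learning_rate"] < 6e-6  # gentler than Phase 1's LR
```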

Why 2-Phase Training?

Problem: Single-phase training causes a trade-off between ALFWorld and DBBench: when DBBench improves, ALFWorld degrades.

Solution: Sequential fine-tuning minimizes catastrophic forgetting:

  1. Phase 1 establishes strong ALFWorld foundation
  2. Phase 2 carefully adds DBBench capability with lower LR
  3. Maintains ALFWorld knowledge while gaining DBBench skills
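The three steps above can be sketched as a sequential schedule; `train()` here is a hypothetical stub standing in for a real LoRA + Unsloth SFT run:

```python
# Sketch of the 2-phase sequential schedule. train() is a stub that
# only records what was trained; it is not the actual training code.
def train(adapter, dataset, epochs, lr):
    return f"{adapter}+{dataset}@{lr:g}x{epochs}"

adapter = "qwen3-4b-lora-init"
# Phase 1: establish the ALFWorld foundation (2 epochs @ 6e-6)
adapter = train(adapter, "alfworld_mix", epochs=2, lr=6e-6)
# Phase 2: continue FROM the Phase 1 adapter, lower LR (1 epoch @ 3e-6)
adapter = train(adapter, "alfworld+dbbench_mix", epochs=1, lr=3e-6)
print(adapter)
```

The key point the stub captures is that Phase 2 initializes from the Phase 1 adapter rather than from scratch, so the ALFWorld weights are only gently perturbed.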

Expected Performance

  • ALFWorld: 72%+
  • DBBench: 55%+
  • Overall Score: 5.0+

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: 2-Phase LoRA (continual learning)
  • Max sequence length: 2048
  • Phase 1: 2 epochs @ 6e-6 LR
  • Phase 2: 1 epoch @ 3e-6 LR
  • LoRA: r=128, alpha=128
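With r=128 and alpha=128, the LoRA scaling factor (alpha/r, as applied by peft to the low-rank update) is 1.0, so adapter updates are applied at full strength; a quick check of the card's numbers:

```python
# LoRA hyperparameters from this card
r, alpha = 128, 128
scaling = alpha / r   # peft scales the BA update by lora_alpha / r
print(scaling)        # 1.0

# The two phase learning rates also sit in a simple 2:1 ratio.
phase_lrs = {"phase1": 6e-6, "phase2": 3e-6}
assert phase_lrs["phase2"] == phase_lrs["phase1"] / 2
```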

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "TToyo2511/ttoyo_advance_2c11"  # ★TTT20260225 2c11 version

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Sources & Terms (IMPORTANT)

Training data: u-10bei/sft_alfworld_trajectory_dataset_v5

Dataset License: MIT. The dataset is used and distributed under the terms of the MIT License; users must also comply with the base model's original terms of use.

Format: Safetensors · Model size: 4B params · Tensor type: BF16