Qwen3-4B Agent 2-Phase SFT (ALF+DB) 20260225-2c11

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using 2-Phase Sequential Fine-Tuning with LoRA + Unsloth.

Training Strategy: 2-Phase Learning (Catastrophic Forgetting Prevention)

Phase 1: ALFWorld Specialization

  • Dataset: ALFWorld trajectory data + 6 custom ALFWorld samples (×80 = 480 samples)
  • Goal: Achieve ALF 72-74% performance
  • Epochs: 2
  • Learning Rate: 6e-6
  • Custom data ratio: ~16%
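The ×80 duplication and the ~16% ratio above fit together arithmetically; a minimal sketch (the records below are placeholders, not the actual trajectory data):

```python
# Phase 1 data mix: oversample the 6 custom ALFWorld samples by 80x.
# (Placeholder records; the real samples are ALFWorld trajectories.)
custom_alfworld = [{"id": i, "source": "custom"} for i in range(6)]

oversampled = custom_alfworld * 80   # 6 x 80 = 480 custom samples
print(len(oversampled))              # 480

# At a ~16% custom ratio, the full Phase 1 mix is roughly
# 480 / 0.16 = 3000 samples including the base trajectory data.
approx_total = int(len(oversampled) / 0.16)
print(approx_total)                  # 3000
```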

Phase 2: DBBench Addition (Continual Learning)

  • Dataset: ALFWorld data + 9 custom DBBench samples (×70 = 630 samples)
  • Goal: Maintain ALF 70-72%, achieve DB 55%+
  • Epochs: 1
  • Learning Rate: 3e-6 (halved from Phase 1 to learn gently and limit forgetting)
  • Based on Phase 1 adapter
  • Custom data ratio: ~20%
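The same arithmetic applies to the Phase 2 mix; a sketch of the counts and settings listed above (the dict keys are illustrative, not the actual trainer arguments):

```python
# Phase 2 data mix: the 9 custom DBBench samples duplicated 70x.
custom_dbbench = [{"id": i, "source": "custom_db"} for i in range(9)]
phase2_custom = custom_dbbench * 70  # 9 x 70 = 630 custom samples
print(len(phase2_custom))            # 630

# At a ~20% custom ratio: roughly 630 / 0.20 = 3150 samples overall.
approx_total = int(len(phase2_custom) / 0.20)
print(approx_total)                  # 3150

# Phase settings from this card (names are hypothetical):
phase2 = {"epochs": 1, "learning_rate": 3e-6, "init_from": "phase1_adapter"}
assert phase2["learning_rate"] < 6e-6  # gentler than Phase 1's LR
```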

Why 2-Phase Training?

Problem: Single-phase training causes a trade-off between ALFWorld and DBBench: when DBBench improves, ALFWorld degrades.

Solution: Sequential fine-tuning minimizes catastrophic forgetting:

  1. Phase 1 establishes strong ALFWorld foundation
  2. Phase 2 carefully adds DBBench capability with lower LR
  3. Maintains ALFWorld knowledge while gaining DBBench skills
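The three steps above can be sketched as a sequential schedule; `train()` here is a hypothetical stub standing in for a real LoRA + Unsloth SFT run:

```python
# Sketch of the 2-phase sequential schedule. train() is a stub that
# only records what was trained; it is not the actual training code.
def train(adapter, dataset, epochs, lr):
    return f"{adapter}+{dataset}@{lr:g}x{epochs}"

adapter = "qwen3-4b-lora-init"
# Phase 1: establish the ALFWorld foundation (2 epochs @ 6e-6)
adapter = train(adapter, "alfworld_mix", epochs=2, lr=6e-6)
# Phase 2: continue FROM the Phase 1 adapter, lower LR (1 epoch @ 3e-6)
adapter = train(adapter, "alfworld+dbbench_mix", epochs=1, lr=3e-6)
print(adapter)
```

The key point the stub captures is that Phase 2 initializes from the Phase 1 adapter rather than from scratch, so the ALFWorld weights are only gently perturbed.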

Expected Performance

  • ALFWorld: 72%+
  • DBBench: 55%+
  • Overall Score: 5.0+

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: 2-Phase LoRA (continual learning)
  • Max sequence length: 2048
  • Phase 1: 2 epochs @ 6e-6 LR
  • Phase 2: 1 epoch @ 3e-6 LR
  • LoRA: r=128, alpha=128
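With r=128 and alpha=128, the LoRA scaling factor (alpha/r, as applied by peft to the low-rank update) is 1.0, so adapter updates are applied at full strength; a quick check of the card's numbers:

```python
# LoRA hyperparameters from this card
r, alpha = 128, 128
scaling = alpha / r   # peft scales the BA update by lora_alpha / r
print(scaling)        # 1.0

# The two phase learning rates also sit in a simple 2:1 ratio.
phase_lrs = {"phase1": 6e-6, "phase2": 3e-6}
assert phase_lrs["phase2"] == phase_lrs["phase1"] / 2
```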

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "TToyo2511/ttoyo_advance_2c11"  # ★TTT20260225 2c11 version

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Sources & Terms (IMPORTANT)

Training data: u-10bei/sft_alfworld_trajectory_dataset_v5

Dataset License: MIT. The dataset is used and distributed under the terms of the MIT License; users must also comply with the base model's original terms of use.

Format: Safetensors · Model size: 4B params · Tensor type: BF16