qwen2.5-7b-instruct-sft-v6
This repository provides a full merged model produced by supervised fine-tuning (SFT), aimed at improving robustness on the ALFWorld and DBBench tasks of AgentBench.
Training Objective
Improve the reliability of strict action selection on ALFWorld prompts and strengthen SQL error recovery on DBBench prompts, while keeping behavior balanced across the mixed task set.
Training Configuration
- Method: SFT (Unsloth LoRA) + merge to full model
- Base model ID (upstream): Qwen/Qwen2.5-7B-Instruct
- Initialization model for this stage: prior merged checkpoint from the previous advanced retraining stage
- LoRA: r=16, alpha=32, dropout=0.0
- LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Output learn mode: from_marker (ACTION: and Action: markers)
- Max sequence length: 4096
- Max steps: 400
- Epochs: 1
- Learning rate: 2.0e-6
- Per-device train batch size: 1
- Per-device eval batch size: 2
- Gradient accumulation steps: 32
- Effective global batch size: 32
- Warmup ratio: 0.03
- Weight decay: 0.01
- Eval/Save steps: 50 / 25
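The card does not define the from_marker output learn mode further. One plausible reading, sketched below under that assumption, is that the loss is computed only from the first `ACTION:` or `Action:` marker in the assistant response onward, with the prompt and any preceding reasoning text masked out. The function names, the whitespace `toy_tokenize` stand-in, and the policy of dropping responses with no marker are all hypothetical; a real pipeline would take character offsets from the tokenizer instead.

```python
import re

# Hypothetical reading of "from_marker": not the training code used for this model.
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss skips targets equal to ignore_index (default -100)
MARKERS = ("ACTION:", "Action:")


def marker_start(text, response_start, markers=MARKERS):
    """Char index of the earliest marker at or after response_start, or None."""
    hits = [text.find(m, response_start) for m in markers]
    hits = [h for h in hits if h != -1]
    return min(hits) if hits else None


def labels_from_marker(input_ids, offsets, text, response_start, markers=MARKERS):
    """Keep a token's label only if its character span ends after the marker starts."""
    start = marker_start(text, response_start, markers)
    if start is None:
        # Assumed policy: a response without a marker contributes no loss at all.
        return [IGNORE_INDEX] * len(input_ids)
    return [tok if end > start else IGNORE_INDEX
            for tok, (_, end) in zip(input_ids, offsets)]


def toy_tokenize(text):
    """Whitespace stand-in for a real tokenizer: returns ids and (start, end) char offsets."""
    spans = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    return list(range(len(spans))), spans


if __name__ == "__main__":
    prompt = "AVAILABLE ACTIONS: go to desk 1 | look\n"
    response = "Think: I need the desk.\nACTION: go to desk 1"
    text = prompt + response
    ids, offsets = toy_tokenize(text)
    labels = labels_from_marker(ids, offsets, text, response_start=len(prompt))
    # Tokens that still carry a training label:
    print([text[s:e] for (s, e), lab in zip(offsets, labels) if lab != IGNORE_INDEX])
```

Run as a script, this prints `['ACTION:', 'go', 'to', 'desk', '1']`: under this reading the marker itself is learned, while the "Think:" text before it is not.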
Usage
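from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "uchkw/qwen2.5-7b-instruct-sft-v6"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" uses the dtype recorded in the checkpoint config instead of defaulting to float32;
# device_map="auto" (requires the accelerate package) places the weights on available devices.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")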
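The snippet above only loads the model. Because training targeted strict one-line `ACTION: ...` outputs that must exactly match an entry in AVAILABLE ACTIONS, an evaluation harness has to pull that line out of the generated text. The helper below is a hypothetical sketch of such a post-processing step (it is not shipped with this repository): it takes the last line that starts with a marker and accepts the action only on a case-sensitive exact match.

```python
from typing import Optional, Sequence

# Hypothetical post-processing helper; not part of this repository.
MARKERS = ("ACTION:", "Action:")


def parse_action(generated: str, available_actions: Sequence[str]) -> Optional[str]:
    """Return the action on the last marker line if it exactly matches an available action."""
    candidate = None
    for line in generated.splitlines():
        stripped = line.strip()
        for marker in MARKERS:
            if stripped.startswith(marker):
                # Later marker lines overwrite earlier ones, so the last one wins.
                candidate = stripped[len(marker):].strip()
    if candidate is None:
        return None
    # Exact, case-sensitive match only: no fuzzy correction of near-miss actions.
    return candidate if candidate in set(available_actions) else None


if __name__ == "__main__":
    available = ["go to desk 1", "open drawer 2", "look"]
    print(parse_action("Think: check the desk.\nACTION: go to desk 1", available))
```

Returning `None` instead of snapping to the nearest valid action keeps formatting errors visible, which is what an invalid-action metric needs in order to count them.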
Training Data / Sources & License (IMPORTANT)
- Primary source datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v5
  - u-10bei/dbbench_sft_dataset_react_v4
  - u-10bei/dbbench_sft_dataset_react_v3
- Data construction policy (concise):
  - ALFWorld samples were converted into strict one-line action supervision (ACTION: ...) with exact matching against AVAILABLE ACTIONS.
  - Added hard-copy-style ALF augmentation to reinforce exact action copying and reduce formatting drift.
  - Mixed in DBBench supervision and recovery-oriented examples for Unknown column-style failures.
  - The mixed training ratio was controlled at approximately ALF:DB = 55:45.
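The recovery examples target DBBench failures such as MySQL's `Unknown column '...' in 'field list'` error. The sketch below illustrates the intended behavior (inspect the schema, correct the column name, retry) outside the model, using Python's built-in `sqlite3` as a stand-in database; SQLite reports the same failure as `no such column: ...`. The helper `run_with_column_recovery` and its single-retry, closest-name policy are assumptions for illustration, not the procedure used to build the training data. It only handles unqualified column names and assumes a trusted table name.

```python
import difflib
import re
import sqlite3


def table_columns(conn, table):
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # The table name is interpolated directly, so it must come from trusted code.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def run_with_column_recovery(conn, sql, table):
    """Hypothetical helper: run sql and, on a missing-column error, retry once
    with the closest real column name. Returns (final_sql, rows)."""
    try:
        return sql, conn.execute(sql).fetchall()
    except sqlite3.OperationalError as err:
        # SQLite's counterpart of MySQL's "Unknown column '...' in 'field list'".
        match = re.search(r"no such column: (\w+)", str(err))
        if match is None:
            raise
        bad = match.group(1)
        candidates = difflib.get_close_matches(bad, table_columns(conn, table), n=1)
        if not candidates:
            raise
        fixed = re.sub(rf"\b{re.escape(bad)}\b", candidates[0], sql)
        return fixed, conn.execute(fixed).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [("ann", 100), ("bo", 90)])
    # "emp_nme" is a deliberate typo for the real column "emp_name".
    print(run_with_column_recovery(conn, "SELECT emp_nme FROM employees WHERE salary > 95", "employees"))
```

In the fine-tuned model this loop is expected to happen in the ReAct trace itself (observe the error, check the schema, issue a corrected query) rather than in wrapper code.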
- Dataset scale (fix8 stage2):
  - Train samples: 138496
  - Validation samples: 7289
  - Train ALF rows: 76173
  - Train DB rows: 62323
  - ALF strict-match in training set: 1.0
  - ALF completion-verb ratio: 0.4726
  - ALF toggle rows: 1772
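Several of these counts can be cross-checked against the training configuration. The arithmetic below assumes a single device (consistent with an effective global batch of 32 = 1 x 32) and Hugging Face Trainer semantics, where a positive `max_steps` overrides the epoch count and warmup steps are rounded up from `max_steps * warmup_ratio`. Under those assumptions, 400 steps of 32 samples cover 12,800 samples, roughly 9% of one epoch (4,328 steps), and the ALF/DB row counts reproduce the stated 55:45 mix.

```python
import math

# Figures copied from this card (fix8 stage2 data and training configuration).
TRAIN_SAMPLES = 138_496
ALF_ROWS = 76_173
DB_ROWS = 62_323
PER_DEVICE_BATCH = 1
GRAD_ACCUM = 32
NUM_DEVICES = 1  # assumption: implied by effective global batch size 32 = 1 x 32
MAX_STEPS = 400
WARMUP_RATIO = 0.03

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = math.ceil(TRAIN_SAMPLES / effective_batch)
samples_seen = MAX_STEPS * effective_batch  # max_steps caps training before the epoch ends
epoch_fraction = samples_seen / TRAIN_SAMPLES
warmup_steps = math.ceil(MAX_STEPS * WARMUP_RATIO)  # assumed Trainer-style rounding up
alf_share = ALF_ROWS / TRAIN_SAMPLES

if __name__ == "__main__":
    print(f"effective batch: {effective_batch}")
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"samples seen: {samples_seen} ({epoch_fraction:.1%} of one epoch)")
    print(f"warmup steps: {warmup_steps}")
    print(f"ALF:DB = {alf_share:.2f}:{1 - alf_share:.2f}")
```

With saves every 25 steps and evals every 50, the checkpoint-350 snapshot reported in the evaluation section falls on both a save step and an eval step.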
- Evaluation snapshot (checkpoint-350, official_v02 setting):
  - DB overall_cat_accuracy: 0.4993505979
  - ALF success_rate: 0.60
  - ALF invalid_action_rate: 0.10
  - ALF task_limit_rate: 0.30
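The three ALF rates in the snapshot add up to 1.00, which is consistent with (though not proof of) each episode ending in exactly one of three outcomes. The sketch below computes the rates under that assumption. The outcome labels and `alf_rates` are hypothetical, and the demo input is made up purely to reproduce the snapshot's proportions; it says nothing about how many episodes the actual evaluation ran. The card does not define DB overall_cat_accuracy, so it is not modeled here.

```python
from collections import Counter

# Hypothetical outcome labels; each episode is assumed to end in exactly one of them.
OUTCOMES = ("success", "invalid_action", "task_limit")


def alf_rates(episode_outcomes):
    """Turn per-episode outcome labels into success/invalid_action/task_limit rates."""
    if not episode_outcomes:
        raise ValueError("need at least one episode")
    counts = Counter(episode_outcomes)
    unexpected = set(counts) - set(OUTCOMES)
    if unexpected:
        raise ValueError(f"unexpected outcome labels: {sorted(unexpected)}")
    n = len(episode_outcomes)
    return {f"{name}_rate": counts[name] / n for name in OUTCOMES}


if __name__ == "__main__":
    # Made-up input chosen only to match the snapshot's proportions.
    demo = ["success"] * 6 + ["invalid_action"] + ["task_limit"] * 3
    print(alf_rates(demo))
```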
- Compliance:
  - Follow each source dataset's card and license terms.
  - Follow the base model's terms of use.