# Qwen2.5-7B-Instruct + Phase A Multi-Benchmark LoRA

## Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct optimized for multi-benchmark agent tasks (ALFWorld + DBBench).
Key characteristics:
- Base model: Qwen2.5-7B-Instruct
- Training method: bf16 LoRA (not 4-bit QLoRA), so the adapter merge introduces no quantization rounding error
- Format: bfloat16 safetensors (no quantization)
- Size: ~15GB
- Compatible with: vLLM v0.13.0+, transformers, etc.
## Training Details

### LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | ~0.28% of total |
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Epochs | 1.0 |
| Batch size (effective) | 16 (4 × 4 grad accum) |
| Max sequence length | 4096 |
| LR scheduler | linear |
| Optimizer | AdamW 8-bit |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Precision | bfloat16 |
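As a quick sanity check on the table, the effective batch size and approximate warmup-step count (using the 225 total steps reported under Training Results) work out as:

```python
# Effective batch = per-device batch x gradient-accumulation steps.
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps  # 16

# Warmup ratio 0.03 over 225 total optimizer steps.
total_steps = 225
warmup_steps = int(0.03 * total_steps)  # 6
```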
### Training Data
- Total samples: 3,500
- Composition:
- Official DBBench v4: 1,200 samples (34.3%)
- Official ALFWorld v5: 1,050 samples (30.0%)
- Existing Spider/BIRD: 1,250 samples (35.7%)
- Sources:
  - DBBench: `u-10bei/dbbench_sft_dataset_react_v4`
  - ALFWorld: `u-10bei/sft_alfworld_trajectory_dataset_v5`
  - Existing: Spider (Yale) + BIRD (HKU)
- Generation method: Official datasets + template-based synthetic data
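The composition percentages above can be reproduced directly from the sample counts:

```python
# Sample counts from the composition list above.
composition = {
    "dbbench_v4": 1200,
    "alfworld_v5": 1050,
    "spider_bird": 1250,
}
total = sum(composition.values())  # 3500
shares = {name: round(100 * n / total, 1) for name, n in composition.items()}
# shares == {'dbbench_v4': 34.3, 'alfworld_v5': 30.0, 'spider_bird': 35.7}
```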
### Training Results
- Training steps: 225
- Training time: 12.4 minutes (RTX 5090)
- Best checkpoint: step 150
- Train loss: 0.6436 → 0.2643 (59% reduction)
- Eval loss: 0.6588 → 0.2769 (58% reduction)
- Best eval loss: 0.2769
- Peak VRAM: ~26GB / 32GB
## Performance Metrics

| Benchmark | Expected score |
|---|---|
| DBBench | 55%+ |
| ALFWorld | 65%+ |
## Usage

### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged bf16 checkpoint; device_map="auto" places layers across
# the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    trust_remote_code=True,
)
```
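For an end-to-end call, the loaded model and tokenizer can be wrapped in a small helper that applies Qwen's chat template before generating (a sketch; `generate_reply` and its defaults are illustrative, not part of the release):

```python
def generate_reply(model, tokenizer, user_msg: str, max_new_tokens: int = 256) -> str:
    # Format the conversation with the tokenizer's chat template.
    messages = [{"role": "user", "content": user_msg}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens so only the new completion is returned.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```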
### vLLM Deployment

```bash
python -m vllm.entrypoints.openai.api_server \
    --model astom-M/matsuo-llm-advanced-phase-a \
    --dtype bfloat16 \
    --max-model-len 4096
```
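Once the server is up, it exposes the OpenAI-compatible `/v1/chat/completions` route (default port 8000). A minimal request body looks like this (the prompt content is illustrative):

```python
import json

# Request body for vLLM's OpenAI-compatible chat endpoint; the model name
# must match the --model flag passed to the server.
payload = {
    "model": "astom-M/matsuo-llm-advanced-phase-a",
    "messages": [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "List the tables in the current database."},
    ],
    "max_tokens": 512,
    "temperature": 0.0,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions, e.g. with
# requests.post(url, data=body, headers={"Content-Type": "application/json"}).
```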
## Important Notes
- No quantization artifacts: trained in full bf16 precision (not 4-bit QLoRA), eliminating rounding errors from a quantized-to-bf16 merge
- `config.json` does NOT contain `quantization_config` (clean bf16 model)
- All safetensors weights are stored in `torch.bfloat16`
- Multi-benchmark optimization: balanced training across ALFWorld and DBBench tasks
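The "no `quantization_config`" claim can be checked mechanically against a loaded `config.json`; a minimal sketch (the helper name and example dict are illustrative, not part of the release):

```python
def is_clean_bf16(config: dict) -> bool:
    # A merged bf16 checkpoint should declare bfloat16 and carry no
    # quantization_config block left over from QLoRA training.
    return (
        config.get("torch_dtype") == "bfloat16"
        and "quantization_config" not in config
    )

# Illustrative config dicts, not the actual files from the Hub.
clean = {"torch_dtype": "bfloat16", "model_type": "qwen2"}
tainted = {"torch_dtype": "bfloat16", "quantization_config": {"bits": 4}}
```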
## Compliance
- Base model: Qwen2.5-7B-Instruct (Apache 2.0 license, whitelisted for competition)
- Training data: Official competition datasets + template-based synthetic data
- No inference code modification
- No RAG/ToolUse
- No commercial API usage
## Training Strategy
This model was trained as Phase A of a multi-phase optimization strategy:
- Goal: Improve base model performance on both ALFWorld (household tasks) and DBBench (SQL generation)
- Approach: Conservative LoRA fine-tuning with balanced dataset composition
- Constraint: Must maintain compatibility with the production evaluation environment (the evaluation YAML cannot be modified)
The training data composition was carefully balanced to:
- Leverage official competition datasets (64.3%)
- Preserve base model capabilities through existing data (35.7%)
- Avoid catastrophic forgetting through moderate learning rate and careful hyperparameter tuning
## License
Apache 2.0 (inherited from Qwen2.5-7B-Instruct)
## Model Card Metadata
- Model size: 8B parameters
- Tensor type: BF16
- Format: Safetensors
- Training date: 2026-02-16