# Qwen2.5-7B-Instruct + Phase A Multi-Benchmark LoRA

## Model Description
This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct optimized for multi-benchmark agent tasks (ALFWorld + DBBench).
Key characteristics:
- Base model: Qwen2.5-7B-Instruct
- Training method: bf16 LoRA (not 4-bit QLoRA), so the adapter merge introduces no quantization rounding error
- Format: bfloat16 safetensors (no quantization)
- Size: ~15GB
- Compatible with: vLLM v0.13.0+, transformers, etc.
## Training Details

### LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable params | ~0.28% of total |
### Training Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 3e-5 |
| Epochs | 1.0 |
| Batch size (effective) | 16 (4 × 4 grad accum) |
| Max sequence length | 4096 |
| LR scheduler | linear |
| Optimizer | AdamW 8-bit |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Precision | bfloat16 |
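As a quick sanity check on the table, the effective batch size and approximate warmup-step count (using the 225 total steps reported under Training Results) work out as:

```python
# Effective batch = per-device batch x gradient-accumulation steps.
per_device_batch = 4
grad_accum_steps = 4
effective_batch = per_device_batch * grad_accum_steps  # 16

# Warmup ratio 0.03 over 225 total optimizer steps.
total_steps = 225
warmup_steps = int(0.03 * total_steps)  # 6
```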
### Training Data
- Total samples: 3,500
- Composition:
- Official DBBench v4: 1,200 samples (34.3%)
- Official ALFWorld v5: 1,050 samples (30.0%)
- Existing Spider/BIRD: 1,250 samples (35.7%)
- Sources:
  - DBBench: `u-10bei/dbbench_sft_dataset_react_v4`
  - ALFWorld: `u-10bei/sft_alfworld_trajectory_dataset_v5`
  - Existing: Spider (Yale) + BIRD (HKU)
- Generation method: Official datasets + template-based synthetic data
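The composition percentages above can be reproduced directly from the sample counts:

```python
# Sample counts from the composition list above.
composition = {
    "dbbench_v4": 1200,
    "alfworld_v5": 1050,
    "spider_bird": 1250,
}
total = sum(composition.values())  # 3500
shares = {name: round(100 * n / total, 1) for name, n in composition.items()}
# shares == {'dbbench_v4': 34.3, 'alfworld_v5': 30.0, 'spider_bird': 35.7}
```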
### Training Results
- Training steps: 225
- Training time: 12.4 minutes (RTX 5090)
- Best checkpoint: step 150
- Train loss: 0.6436 → 0.2643 (59% reduction)
- Eval loss: 0.6588 → 0.2769 (58% reduction)
- Best eval loss: 0.2769
- Peak VRAM: ~26GB / 32GB
## Performance Metrics

| Benchmark | Expected score |
|---|---|
| DBBench | 55%+ |
| ALFWorld | 65%+ |
## Usage

### Basic Usage with Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged bf16 checkpoint; device_map="auto" places layers across
# the available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    trust_remote_code=True,
)
```
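For an end-to-end call, the loaded model and tokenizer can be wrapped in a small helper that applies Qwen's chat template before generating (a sketch; `generate_reply` and its defaults are illustrative, not part of the release):

```python
def generate_reply(model, tokenizer, user_msg: str, max_new_tokens: int = 256) -> str:
    # Format the conversation with the tokenizer's chat template.
    messages = [{"role": "user", "content": user_msg}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Strip the prompt tokens so only the new completion is returned.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
    )
```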
### vLLM Deployment

```bash
python -m vllm.entrypoints.openai.api_server \
    --model astom-M/matsuo-llm-advanced-phase-a \
    --dtype bfloat16 \
    --max-model-len 4096
```
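Once the server is up, it exposes the OpenAI-compatible `/v1/chat/completions` route (default port 8000). A minimal request body looks like this (the prompt content is illustrative):

```python
import json

# Request body for vLLM's OpenAI-compatible chat endpoint; the model name
# must match the --model flag passed to the server.
payload = {
    "model": "astom-M/matsuo-llm-advanced-phase-a",
    "messages": [
        {"role": "system", "content": "You are a helpful agent."},
        {"role": "user", "content": "List the tables in the current database."},
    ],
    "max_tokens": 512,
    "temperature": 0.0,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions, e.g. with
# requests.post(url, data=body, headers={"Content-Type": "application/json"}).
```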
## Important Notes
- No quantization artifacts: trained in full bf16 precision (not 4-bit QLoRA), eliminating rounding errors from a quantized-to-bf16 merge
- `config.json` does NOT contain `quantization_config` (clean bf16 model)
- All safetensors weights are stored in `torch.bfloat16`
- Multi-benchmark optimization: balanced training across ALFWorld and DBBench tasks
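The "no `quantization_config`" claim can be checked mechanically against a loaded `config.json`; a minimal sketch (the helper name and example dict are illustrative, not part of the release):

```python
def is_clean_bf16(config: dict) -> bool:
    # A merged bf16 checkpoint should declare bfloat16 and carry no
    # quantization_config block left over from QLoRA training.
    return (
        config.get("torch_dtype") == "bfloat16"
        and "quantization_config" not in config
    )

# Illustrative config dicts, not the actual files from the Hub.
clean = {"torch_dtype": "bfloat16", "model_type": "qwen2"}
tainted = {"torch_dtype": "bfloat16", "quantization_config": {"bits": 4}}
```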
## Compliance
- Base model: Qwen2.5-7B-Instruct (Apache 2.0 license, whitelisted for competition)
- Training data: Official competition datasets + template-based synthetic data
- No inference code modification
- No RAG/ToolUse
- No commercial API usage
## Training Strategy
This model was trained as Phase A of a multi-phase optimization strategy:
- Goal: Improve base model performance on both ALFWorld (household tasks) and DBBench (SQL generation)
- Approach: Conservative LoRA fine-tuning with balanced dataset composition
- Constraint: Must maintain compatibility with the production evaluation environment (the evaluation YAML cannot be modified)
The training data composition was carefully balanced to:
- Leverage official competition datasets (64.3%)
- Preserve base model capabilities through existing data (35.7%)
- Avoid catastrophic forgetting through moderate learning rate and careful hyperparameter tuning
## License
Apache 2.0 (inherited from Qwen2.5-7B-Instruct)
## Model Card Metadata
- Model size: 8B parameters
- Tensor type: BF16
- Format: Safetensors
- Training date: 2026-02-16