Qwen2.5-7B-Instruct + Phase A Multi-Benchmark LoRA

Model Description

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct optimized for multi-benchmark agent tasks (ALFWorld + DBBench).

Key characteristics:

  • Base model: Qwen2.5-7B-Instruct
  • Training method: bf16 LoRA (NOT QLoRA 4-bit) — zero rounding errors during merge
  • Format: bfloat16 safetensors (no quantization)
  • Size: ~15GB
  • Compatible with: vLLM v0.13.0+, transformers, etc.

Training Details

LoRA Configuration

Parameter          Value
LoRA rank (r)      16
LoRA alpha         32
LoRA dropout       0
Target modules     q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params   ~0.28% of total
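As a reminder of what the rank and alpha control: LoRA learns a low-rank update ΔW = (α/r)·B·A that is added onto each frozen target projection at merge time. A minimal NumPy sketch using the r=16, α=32 values from the table (the matrix dimensions here are illustrative, not Qwen2.5-7B's actual projection sizes):

```python
import numpy as np

# Illustrative dimensions only (not the model's real projection sizes).
d_in, d_out = 64, 64
r, alpha = 16, 32                      # rank and alpha from the table above

W = np.zeros((d_out, d_in))            # frozen base weight (stand-in)
A = np.random.randn(r, d_in) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))               # trainable up-projection (zero-init, standard LoRA)

# Merged weight: W' = W + (alpha / r) * B @ A
scaling = alpha / r                    # 32 / 16 = 2.0
W_merged = W + scaling * (B @ A)
```

Because the merge is a plain bf16 addition here (no 4-bit dequantization step), the "zero rounding errors during merge" claim above follows directly.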

Training Hyperparameters

Parameter                Value
Learning rate            3e-5
Epochs                   1.0
Batch size (effective)   16 (4 × 4 grad accum)
Max sequence length      4096
LR scheduler             linear
Optimizer                AdamW 8-bit
Warmup ratio             0.03
Weight decay             0.01
Precision                bfloat16
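The linear scheduler with a 0.03 warmup ratio ramps the LR up over the first ~3% of steps, then decays it linearly to zero. A sketch of that schedule (mirroring the behavior of transformers' `get_linear_schedule_with_warmup`; the 225-step total comes from the results below):

```python
# Linear warmup / linear decay, as implied by the hyperparameter table.
def linear_lr(step, total_steps=225, peak_lr=3e-5, warmup_ratio=0.03):
    warmup_steps = max(1, int(total_steps * warmup_ratio))  # ~6-7 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps                # linear ramp up
    # linear decay from peak down to zero over the remaining steps
    return peak_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```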

Training Data

  • Total samples: 3,500
  • Composition:
    • Official DBBench v4: 1,200 samples (34.3%)
    • Official ALFWorld v5: 1,050 samples (30.0%)
    • Existing Spider/BIRD: 1,250 samples (35.7%)
  • Sources:
    • DBBench: u-10bei/dbbench_sft_dataset_react_v4
    • ALFWorld: u-10bei/sft_alfworld_trajectory_dataset_v5
    • Existing: Spider (Yale) + BIRD (HKU)
  • Generation method: Official datasets + template-based synthetic data
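The composition percentages above can be re-derived from the raw sample counts (dataset keys here are shorthand labels, not the Hub dataset IDs):

```python
# Recompute the data-mix percentages from the sample counts above.
mix = {
    "dbbench_v4": 1200,   # Official DBBench v4
    "alfworld_v5": 1050,  # Official ALFWorld v5
    "spider_bird": 1250,  # Existing Spider/BIRD
}
total = sum(mix.values())                                   # 3,500 samples
shares = {k: round(100 * v / total, 1) for k, v in mix.items()}
```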

Training Results

  • Training steps: 225
  • Training time: 12.4 minutes (RTX 5090)
  • Best checkpoint: step 150
  • Train loss: 0.6436 → 0.2643 (59% reduction)
  • Eval loss: 0.6588 → 0.2769 (58% reduction)
  • Best eval loss: 0.2769
  • Peak VRAM: ~26GB / 32GB
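The quoted percentages follow directly from the raw loss values; a quick sanity check:

```python
# Percentage reduction from start loss to end loss, rounded to whole percent.
def pct_reduction(start, end):
    return round(100 * (1 - end / start))

train_pct = pct_reduction(0.6436, 0.2643)  # train loss
eval_pct = pct_reduction(0.6588, 0.2769)   # eval loss
```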

Performance Metrics

Benchmark            Score
DBBench (expected)   55%+
ALFWorld (expected)  65%+

Usage

Basic Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged bf16 weights; device_map="auto" places them on available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "astom-M/matsuo-llm-advanced-phase-a",
    trust_remote_code=True
)

vLLM Deployment

python -m vllm.entrypoints.openai.api_server \
    --model astom-M/matsuo-llm-advanced-phase-a \
    --dtype bfloat16 \
    --max-model-len 4096
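Once the server is up, it exposes the standard OpenAI-compatible chat endpoint (by default at http://localhost:8000/v1/chat/completions). A minimal request payload, with an illustrative prompt:

```python
import json

# Minimal body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
payload = {
    "model": "astom-M/matsuo-llm-advanced-phase-a",
    "messages": [
        # Illustrative DBBench-style prompt; replace with your own task input.
        {"role": "user", "content": "List the tables in the sales database."},
    ],
    "max_tokens": 256,
    "temperature": 0.0,   # deterministic decoding for benchmark-style evaluation
}
body = json.dumps(payload)  # POST this as the JSON request body
```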

Important Notes

  • No quantization artifacts: Trained in bf16 full precision (not QLoRA 4-bit), so the adapter merge involves no quantization-to-bf16 rounding errors
  • config.json does NOT contain quantization_config — clean bf16 model
  • All safetensor weights are in torch.bfloat16 dtype
  • Multi-benchmark optimization: Balanced training across ALFWorld and DBBench tasks
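The "no quantization_config" claim can be verified by inspecting config.json. A sketch against an illustrative excerpt of what a clean bf16 config looks like (after downloading, the real file can be checked the same way with `json.load`):

```python
import json

# Illustrative excerpt of a clean bf16 config.json (not the full real file);
# after download, load the actual file and run the same check.
config = json.loads("""
{
  "model_type": "qwen2",
  "torch_dtype": "bfloat16"
}
""")

is_clean_bf16 = (
    "quantization_config" not in config       # no quantization block present
    and config.get("torch_dtype") == "bfloat16"
)
```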

Compliance

  • Base model: Qwen2.5-7B-Instruct (Apache 2.0 license, whitelisted for competition)
  • Training data: Official competition datasets + template-based synthetic data
  • No inference code modification
  • No RAG/ToolUse
  • No commercial API usage

Training Strategy

This model was trained as Phase A of a multi-phase optimization strategy:

  • Goal: Improve base model performance on both ALFWorld (household tasks) and DBBench (SQL generation)
  • Approach: Conservative LoRA fine-tuning with balanced dataset composition
  • Constraint: Must maintain compatibility with production evaluation environment (the evaluation YAML must not be modified)

The training data composition was carefully balanced to:

  1. Leverage official competition datasets (64.3%)
  2. Preserve base model capabilities through existing data (35.7%)
  3. Avoid catastrophic forgetting through moderate learning rate and careful hyperparameter tuning

License

Apache 2.0 (inherited from Qwen2.5-7B-Instruct)


Model Card Metadata:

  • Model size: 8B parameters
  • Tensor type: BF16
  • Format: Safetensors
  • Training date: 2026-02-16