matsuo-llm-advanced-phase-x1

Fine-tuned from unsloth/Qwen2.5-7B-Instruct for agent tasks (DB + ALFWorld).

Architecture

Two training stages followed by a weighted LoRA merge:

  1. Phase D reproduction (unsloth base): same data and hyperparameters as Phase D, but with unsloth/Qwen2.5-7B-Instruct as the base model
  2. DB-specific LoRA: Trained on 480 teacher-generated DB traces (Qwen3-32B-AWQ)
  3. Weighted merge: final = phase_d + λ * (lora_merged - phase_d), with λ = 0.2

Training Configuration

Stage 1: Phase D Reproduction

  • Base: unsloth/Qwen2.5-7B-Instruct
  • LoRA: r=8, alpha=16
  • lr: 1e-5, epochs: 0.3, effective batch: 4 × 4 = 16
  • Data: Spider/BIRD 70% + DBBench v4 20% + ALFWorld v5 10% (3500 samples)
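
The 70/20/10 mixture over 3,500 samples works out to 2,450 / 700 / 350 examples per source. A minimal sketch of that allocation (the function and source names are illustrative, not taken from the training code):

```python
# Sketch of the Stage 1 data mixture: 70% Spider/BIRD, 20% DBBench v4,
# 10% ALFWorld v5 over 3500 total samples. Names are illustrative.
def mixture_counts(total, ratios):
    """Allocate `total` samples across sources by ratio, giving any
    rounding shortfall to the largest source."""
    counts = {name: int(total * r) for name, r in ratios.items()}
    shortfall = total - sum(counts.values())
    if shortfall:
        biggest = max(ratios, key=ratios.get)
        counts[biggest] += shortfall
    return counts

counts = mixture_counts(
    3500, {"spider_bird": 0.70, "dbbench_v4": 0.20, "alfworld_v5": 0.10}
)
print(counts)  # {'spider_bird': 2450, 'dbbench_v4': 700, 'alfworld_v5': 350}
```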

Stage 2: DB LoRA

  • Base: Stage 1 merged model
  • LoRA: r=8, alpha=16
  • lr: 5e-6, epochs: 0.25, effective batch: 2 × 8 = 16
  • Data: 480 teacher-generated DB traces (Qwen3-32B-AWQ distilled)
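
The two stages share the LoRA shape and effective batch size and differ only in learning rate, epoch fraction, and batch layout. A sketch of the configs as plain dicts (the per-device/grad-accum split of the "4 × 4" and "2 × 8" notation is an assumption, not confirmed by the card):

```python
# Hedged sketch of the two LoRA stages' hyperparameters as plain config dicts.
# The split of the effective batch (per-device x grad-accum) is an assumption.
LORA = {"r": 8, "alpha": 16}

STAGE_1 = {"lr": 1e-5, "epochs": 0.3, "per_device_batch": 4, "grad_accum": 4, **LORA}
STAGE_2 = {"lr": 5e-6, "epochs": 0.25, "per_device_batch": 2, "grad_accum": 8, **LORA}

def effective_batch(cfg):
    # Effective batch = per-device batch size x gradient accumulation steps.
    return cfg["per_device_batch"] * cfg["grad_accum"]

# Both stages train with the same effective batch size of 16.
assert effective_batch(STAGE_1) == effective_batch(STAGE_2) == 16
```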

Stage 3: Weighted Merge

  • Lambda: 0.2 (conservative; preserves ALFWorld performance)
  • Formula: final = base + λ * (full_merge - base)
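
The formula amounts to linear interpolation of each weight tensor between the Stage 1 base and the fully merged model. A minimal pure-Python sketch over flat parameter dicts (real checkpoints would be torch state dicts, which the lists of floats stand in for here):

```python
# Linear interpolation: final = base + lam * (merged - base), per parameter.
# Pure-Python stand-in for torch state dicts: each "tensor" is a list of floats.
def weighted_merge(base, merged, lam=0.2):
    assert base.keys() == merged.keys(), "checkpoints must share parameter names"
    return {
        name: [b + lam * (m - b) for b, m in zip(base[name], merged[name])]
        for name in base
    }

base = {"layer.weight": [1.0, 2.0]}
full = {"layer.weight": [2.0, 0.0]}
print(weighted_merge(base, full))  # {'layer.weight': [1.2, 1.6]}
```

With lam=0 the result is exactly the base; with lam=1 it is the full merge — λ=0.2 keeps the final model close to the Phase D reproduction, which is what preserves its ALFWorld behavior.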

Teacher Signal Generation

  • Teacher model: Qwen/Qwen3-32B-AWQ (whitelisted for distillation/synthesis)
  • Input: Spider/BIRD table schemas only (no AgentBench/ALFWorld data)
  • 500 generated, 480 validated (regex + sqlparse filtering, no LLM)
  • SQL type distribution: SELECT 73%, INSERT 18%, DELETE 5%, UPDATE 4%
  • Weak category emphasis: INSERT, counting, agg-max, agg-sum
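
A regex-only sketch of the non-LLM validation step described above (the actual pipeline also parses each statement with sqlparse, omitted here; the patterns and the rejection rules are illustrative):

```python
import re
from collections import Counter

# Regex-only sketch of the non-LLM trace validation: keep statements that
# start with a whitelisted SQL verb and look structurally complete.
# Patterns are illustrative; the real pipeline additionally uses sqlparse.
SQL_VERB = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)

def validate(sql):
    """Return the statement's SQL type if it passes, else None."""
    m = SQL_VERB.match(sql)
    if not m:
        return None
    if not sql.rstrip().endswith(";"):
        return None  # require a terminated statement
    return m.group(1).upper()

traces = [
    "SELECT name FROM singer WHERE age > 30;",
    "INSERT INTO orders (id) VALUES (1);",
    "DROP TABLE users;",  # rejected: not a whitelisted verb
    "SELECT * FROM",      # rejected: unterminated fragment
]
kept = Counter(t for t in (validate(s) for s in traces) if t)
print(kept)  # Counter({'SELECT': 1, 'INSERT': 1})
```

Tallying the kept types this way is also how a distribution like "SELECT 73%, INSERT 18%, ..." would be measured over the 480 validated traces.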

Datasets

  • u-10bei/dbbench_sft_dataset_react_v4 — Listed in the organizer-shared Phase B dataset list. Used as provided (no modification).
  • xlangai/spider — CC BY-SA 4.0 (Yale/Columbia Spider project)
  • birdsql/bird_mini_dev — CC BY-SA 4.0 (HKU)
  • Official Phase B ALFWorld v5 dataset — Organizer-provided, used as provided.
  • Synthetic DB traces generated by Qwen/Qwen3-32B-AWQ (whitelisted model)

Compliance

  • Evaluation data not used in training.
  • LLM was not used for data quality filtering or selection.
  • Teacher signals generated from whitelisted model (Qwen/Qwen3-32B-AWQ) using Spider/BIRD schemas only.
  • Filtering is regex/sqlparse only (non-LLM).
  • Inference code not modified.
  • Base model: unsloth/Qwen2.5-7B-Instruct (whitelisted learning-designated model).

Usage

Compatible with vLLM v0.13.0+.
