astom-M
/

matsuo-llm-advanced-phase-x1

Text Generation

Model card Files Files and versions

matsuo-llm-advanced-phase-x1

Fine-tuned from unsloth/Qwen2.5-7B-Instruct for agent tasks (DB + ALFWorld).

Architecture

Two-stage training with weighted LoRA merge:

Phase D reproduction (unsloth base): Same data/hyperparams as Phase D but using unsloth/Qwen2.5-7B-Instruct
DB-specific LoRA: Trained on 480 teacher-generated DB traces (Qwen3-32B-AWQ)
Weighted merge: final = phase_d_weight + 0.2 * (lora_merged - phase_d_weight)

Training Configuration

Stage 1: Phase D Reproduction

Base: unsloth/Qwen2.5-7B-Instruct
LoRA: r=8, alpha=16
lr: 1e-5, epochs: 0.3, batch: 4x4=16
Data: Spider/BIRD 70% + DBBench v4 20% + ALFWorld v5 10% (3500 samples)

Stage 2: DB LoRA

Base: Stage 1 merged model
LoRA: r=8, alpha=16
lr: 5e-6, epochs: 0.25, batch: 2x8=16
Data: 480 teacher-generated DB traces (Qwen3-32B-AWQ distilled)

Stage 3: Weighted Merge

Lambda: 0.2 (conservative — preserves ALFWorld ability)
Formula: final = base + λ * (full_merge - base)

Teacher Signal Generation

Teacher model: Qwen/Qwen3-32B-AWQ (whitelisted for distillation/synthesis)
Input: Spider/BIRD table schemas only (no AgentBench/ALFWorld data)
500 generated, 480 validated (regex + sqlparse filtering, no LLM)
SQL type distribution: SELECT 73%, INSERT 18%, DELETE 5%, UPDATE 4%
Weak category emphasis: INSERT, counting, agg-max, agg-sum

Datasets

u-10bei/dbbench_sft_dataset_react_v4 — Listed in the organizer-shared Phase B dataset list. Used as provided (no modification).
xlangai/spider — CC BY-SA 4.0 (Yale/Columbia Spider project)
birdsql/bird_mini_dev — CC BY-SA 4.0 (HKU)
Official Phase B ALFWorld v5 dataset — Organizer-provided, used as provided.
Synthetic DB traces generated by Qwen/Qwen3-32B-AWQ (whitelisted model)

Compliance

Evaluation data not used in training.
LLM was not used for data quality filtering or selection.
Teacher signals generated from whitelisted model (Qwen/Qwen3-32B-AWQ) using Spider/BIRD schemas only.
Filtering is regex/sqlparse only (non-LLM).
Inference code not modified.
Base model: unsloth/Qwen2.5-7B-Instruct (whitelisted learning-designated model).

Usage

Compatible with vLLM v0.13.0+.

Downloads last month: 2

Safetensors

Model size

8B params

Tensor type

BF16

·

Model tree for astom-M/matsuo-llm-advanced-phase-x1

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Finetuned

unsloth/Qwen2.5-7B-Instruct

Finetuned

(2127)

this model