matsuo-llm-advanced-phase-x1
Fine-tuned from unsloth/Qwen2.5-7B-Instruct for agent tasks (DB + ALFWorld).
Architecture
Two-stage training with weighted LoRA merge:
- Phase D reproduction (unsloth base): same data and hyperparameters as Phase D, but with unsloth/Qwen2.5-7B-Instruct as the base model
- DB-specific LoRA: Trained on 480 teacher-generated DB traces (Qwen3-32B-AWQ)
- Weighted merge: final = phase_d_weight + 0.2 * (lora_merged - phase_d_weight)
Training Configuration
Stage 1: Phase D Reproduction
- Base: unsloth/Qwen2.5-7B-Instruct
- LoRA: r=8, alpha=16
- lr: 1e-5, epochs: 0.3, batch: 4x4=16
- Data: Spider/BIRD 70% + DBBench v4 20% + ALFWorld v5 10% (3500 samples)
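The Stage 1 data line describes a fixed-ratio mixture of three sources. Below is a minimal sketch of how such a 70/20/10 mixture of 3500 samples could be assembled. It assumes sampling without replacement and a seeded shuffle. The card does not describe its sampling procedure, and the pool names and sizes are hypothetical placeholders.

```python
import random
from typing import Dict, List


def build_mixture(pools: Dict[str, List[str]], ratios: Dict[str, float],
                  total: int, seed: int = 0) -> List[str]:
    """Draw round(total * ratio) examples from each pool, then shuffle.

    Assumes sampling without replacement and that every pool is large enough.
    """
    assert abs(sum(ratios.values()) - 1.0) < 1e-9, "ratios must sum to 1"
    rng = random.Random(seed)
    mixture: List[str] = []
    for name, ratio in ratios.items():
        k = round(total * ratio)  # e.g. 3500 * 0.7 -> 2450
        mixture.extend(rng.sample(pools[name], k))
    rng.shuffle(mixture)  # interleave sources so batches are mixed
    return mixture


# Hypothetical pools standing in for Spider/BIRD, DBBench v4 and ALFWorld v5.
pools = {
    "spider_bird": [f"sql-{i}" for i in range(5000)],
    "dbbench_v4": [f"db-{i}" for i in range(1000)],
    "alfworld_v5": [f"alf-{i}" for i in range(1000)],
}
ratios = {"spider_bird": 0.7, "dbbench_v4": 0.2, "alfworld_v5": 0.1}
mix = build_mixture(pools, ratios, total=3500)
print(len(mix))  # 3500
```

With ratios that do not divide the total evenly, per-source rounding can make the sizes sum to slightly more or less than `total`. For 70/20/10 of 3500, the split is exact: 2450, 700 and 350.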
Stage 2: DB LoRA
- Base: Stage 1 merged model
- LoRA: r=8, alpha=16
- lr: 5e-6, epochs: 0.25, batch: 2x8=16
- Data: 480 teacher-generated DB traces (Qwen3-32B-AWQ distilled)
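Both stages use fractional epochs, so the actual training budget is small. Below is a rough sketch of the arithmetic. It assumes "4x4" and "2x8" mean per-device batch size × gradient-accumulation steps on a single device, which is a hypothetical reading because the card does not spell it out. A trainer may also round a partial final step differently.

```python
def training_budget(dataset_size: int, epochs: float, per_device_batch: int,
                    grad_accum: int, num_devices: int = 1):
    """Return (effective batch, samples seen, approximate optimizer steps).

    Hypothetical reading of the card: 'AxB' = per-device batch x grad-accum steps.
    """
    effective_batch = per_device_batch * grad_accum * num_devices
    samples_seen = dataset_size * epochs  # fractional epochs see part of the data
    optimizer_steps = samples_seen / effective_batch  # before any trainer rounding
    return effective_batch, samples_seen, optimizer_steps


stage1 = training_budget(3500, 0.3, per_device_batch=4, grad_accum=4)
stage2 = training_budget(480, 0.25, per_device_batch=2, grad_accum=8)
print(stage1)  # (16, 1050.0, 65.625)
print(stage2)  # (16, 120.0, 7.5)
```

Under this reading, Stage 2 performs fewer than ten optimizer updates on roughly 120 of the 480 traces.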
Stage 3: Weighted Merge
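- Lambda: 0.2 (conservative; intended to preserve ALFWorld ability)
- Formula: final = base + λ * (full_merge - base), where base is the Stage 1 merged model and full_merge is the Stage 2 LoRA fully merged into it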
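The weighted merge formula is applied per weight matrix. Below is a minimal pure-Python sketch on a toy rank-1 adapter. `alpha / r = 2` matches the card's alpha=16, r=8 ratio, but all matrix values are made up. It assumes the standard LoRA merge W + (alpha / r) · B·A. A real implementation would loop over every adapted tensor in the model's state dict.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def axpy(a: Matrix, b: Matrix, scale: float) -> Matrix:
    """Elementwise a + scale * b."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def lora_full_merge(base: Matrix, lora_a: Matrix, lora_b: Matrix,
                    r: int, alpha: float) -> Matrix:
    """Standard LoRA merge: W + (alpha / r) * B @ A."""
    return axpy(base, matmul(lora_b, lora_a), alpha / r)


def weighted_merge(base: Matrix, full_merge: Matrix, lam: float) -> Matrix:
    """final = base + lam * (full_merge - base)."""
    diff = axpy(full_merge, base, -1.0)
    return axpy(base, diff, lam)


# Toy example: base stands in for one Stage 1 weight matrix.
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[0.5, -0.5]]   # shape (r, in) = (1, 2)
lora_b = [[2.0], [4.0]]  # shape (out, r) = (2, 1)
full = lora_full_merge(base, lora_a, lora_b, r=1, alpha=2.0)
final = weighted_merge(base, full, lam=0.2)

# Because the LoRA update is linear, this equals merging with alpha scaled by lam.
shortcut = lora_full_merge(base, lora_a, lora_b, r=1, alpha=2.0 * 0.2)
```

The linearity shown by `shortcut` means that, up to floating-point rounding in the stored merges, λ = 0.2 behaves like applying the DB LoRA at one fifth of its trained strength.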
Teacher Signal Generation
- Teacher model: Qwen/Qwen3-32B-AWQ (whitelisted for distillation/synthesis)
- Input: Spider/BIRD table schemas only (no AgentBench/ALFWorld data)
- 500 traces generated, 480 kept after validation (regex + sqlparse filtering, no LLM)
- SQL type distribution: SELECT 73%, INSERT 18%, DELETE 5%, UPDATE 4%
- Weak category emphasis: INSERT, counting, agg-max, agg-sum
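The card states only that filtering used regex and sqlparse, without listing the rules. Below is a stdlib-only sketch of non-LLM validation plus the SQL type breakdown. Every check is a hypothetical example, and the sqlparse step is not reproduced, so the sketch runs without extra dependencies.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Hypothetical rule set; the card does not list its actual regex/sqlparse checks.
SQL_TYPE_RE = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.IGNORECASE)


def sql_type(sql: str) -> Optional[str]:
    """Return the statement type from the leading keyword, or None."""
    m = SQL_TYPE_RE.match(sql)
    return m.group(1).upper() if m else None


def is_valid_sql(sql: str) -> bool:
    """Cheap structural checks; no LLM involved."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt or ";" in stmt:  # empty or multi-statement (conservative: also hits ';' in literals)
        return False
    if sql_type(stmt) is None:  # keep only the four DML types
        return False
    if stmt.count("(") != stmt.count(")"):  # unbalanced parentheses
        return False
    return stmt.count("'") % 2 == 0  # unbalanced single quotes -> reject


def type_distribution(sqls: Iterable[str]) -> Dict[str, float]:
    """Share of each statement type among the traces that pass validation."""
    kept = [s for s in sqls if is_valid_sql(s)]
    if not kept:
        return {}
    counts = Counter(sql_type(s) for s in kept)
    return {t: n / len(kept) for t, n in counts.items()}


samples = [
    "SELECT name FROM users WHERE id = 1;",
    "INSERT INTO t (a) VALUES ('x')",
    "DELETE FROM t WHERE a = 'y",         # unbalanced quote -> dropped
    "UPDATE t SET a = 1; DROP TABLE t",   # two statements -> dropped
    "select count(*) from t",
]
print(type_distribution(samples))  # SELECT: 2/3, INSERT: 1/3
```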
Datasets
- u-10bei/dbbench_sft_dataset_react_v4: Listed in the organizer-shared Phase B dataset list. Used as provided (no modification).
- xlangai/spider: CC BY-SA 4.0 (Yale/Columbia Spider project)
- birdsql/bird_mini_dev: CC BY-SA 4.0 (HKU)
- Official Phase B ALFWorld v5 dataset: Organizer-provided, used as provided.
- Synthetic DB traces generated by Qwen/Qwen3-32B-AWQ (whitelisted model)
Compliance
- Evaluation data not used in training.
- No LLM was used for data quality filtering or selection.
- Teacher signals generated from whitelisted model (Qwen/Qwen3-32B-AWQ) using Spider/BIRD schemas only.
- Filtering is regex/sqlparse only (non-LLM).
- Inference code not modified.
- Base model: unsloth/Qwen2.5-7B-Instruct (whitelisted model designated for training).
Usage
Compatible with vLLM v0.13.0+.