qwen3-4b-structured-output-phase3-merged

This repository provides a merged FP16 model built from Qwen/Qwen3-4B-Instruct-2507 and a Phase3 LoRA adapter (trained on top of Phase2).
This repository contains the full merged model weights, so it can be loaded directly with from_pretrained().

Training Objective

This model is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV), with a particular focus in Phase3 on reducing TOML parse failures (e.g., overwrite/declaration issues, inline tables) and improving CSV robustness (e.g., header-only outputs, inconsistent columns).

Loss is applied only to the final assistant output (completion-only loss), while the prompt portion is masked (labels = -100).
Data preparation removes <think>...</think> and code fences to prioritize strict, parseable outputs.

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: QLoRA SFT (4-bit) → merged into FP16 weights for inference
Framework: transformers + trl + peft
Output: merged FP16 model (safe_serialization)

Phases

Phase1 (mixed SFT)

python scripts/20_train_sft.py \
  --train_jsonl data/train.jsonl \
  --valid_jsonl data/valid.jsonl \
  --out_dir outputs/sft_phase1 \
  --max_seq_length 2048 \
  --epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-4

Phase2 (hard-only SFT resume from Phase1)

python scripts/20_train_sft.py \
  --train_jsonl data/phase2_train.jsonl \
  --valid_jsonl data/phase2_valid.jsonl \
  --resume_from outputs/sft_phase1 \
  --out_dir outputs/sft_phase2 \
  --max_seq_length 2048 \
  --epochs 1 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --learning_rate 5e-5

Phase3 (TOML/CSV-focused SFT resume from Phase2)

python scripts/20_train_sft.py \
  --train_jsonl data/phase3_train_3000.jsonl \
  --valid_jsonl data/phase3_valid_3000.jsonl \
  --resume_from outputs/sft_phase2 \
  --out_dir outputs/sft_phase3 \
  --max_seq_length 2048 \
  --epochs 2 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --learning_rate 1e-5

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

Sources & License (IMPORTANT)

Base model: Qwen/Qwen3-4B-Instruct-2507 (please follow the base model’s terms of use)
Training Data (examples/mixes):
- daichira/structured-5k-mix-sft
- daichira/structured-hard-sft-4k
- u-10bei/structured_data_with_cot_dataset_512_v5 (output-only extraction used)
Phase3 dataset is built by filtering to TOML/CSV and removing unsafe patterns (e.g., TOML inline tables, overly long outputs), then sampling a subset (3000 train examples).
Dataset License: Please follow each dataset’s license and attribution requirements.
Compliance: Users must comply with both the datasets’ attribution requirements and the base model’s original terms of use.

＜日本語訳＞

qwen3-4b-structured-output-phase3-merged

このリポジトリは、Qwen/Qwen3-4B-Instruct-2507 と Phase3 の LoRA アダプター（Phase2から継続学習）から作成した マージ済み FP16 モデルを提供します。本リポジトリには マージ済みのフルモデル重みが含まれているため、from_pretrained() で直接ロードできます。

学習の目的

このモデルは、構造化出力（JSON / YAML / XML / TOML / CSV）の精度向上を目的として学習されています。特に Phase3 では、TOMLのパース失敗（上書き/重複宣言/インラインテーブル等）を減らし、CSVの頑健性（ヘッダのみ出力、列数不一致など）を改善することに重点を置いています。

学習時の損失（Loss）は 最終アシスタント出力（completion）のみに適用され、プロンプト部分はマスク（labels=-100）されています。また、厳密にパース可能な出力を優先するため、データ整形時に <think>...</think> やコードフェンスを除去しています。

学習設定

ベースモデル: Qwen/Qwen3-4B-Instruct-2507
手法: QLoRA SFT (4-bit) → 推論用に FP16 へマージ
フレームワーク: transformers + trl + peft
出力: マージ済み FP16（safe_serialization）

学習フェーズ

Phase1（混合SFT）

python scripts/20_train_sft.py \
  --train_jsonl data/train.jsonl \
  --valid_jsonl data/valid.jsonl \
  --out_dir outputs/sft_phase1 \
  --max_seq_length 2048 \
  --epochs 1 \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 4 \
  --learning_rate 2e-4

Phase2（hard-onlyをPhase1から継続学習）

python scripts/20_train_sft.py \
  --train_jsonl data/phase2_train.jsonl \
  --valid_jsonl data/phase2_valid.jsonl \
  --resume_from outputs/sft_phase1 \
  --out_dir outputs/sft_phase2 \
  --max_seq_length 2048 \
  --epochs 1 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --learning_rate 5e-5

Phase3（TOML/CSV特化をPhase2から継続学習）

python scripts/20_train_sft.py \
  --train_jsonl data/phase3_train_3000.jsonl \
  --valid_jsonl data/phase3_valid_3000.jsonl \
  --resume_from outputs/sft_phase2 \
  --out_dir outputs/sft_phase3 \
  --max_seq_length 2048 \
  --epochs 2 \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 16 \
  --learning_rate 1e-5

使い方

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

ソースおよびライセンス（重要）

ベースモデル: Qwen/Qwen3-4B-Instruct-2507（ベースモデルの利用規約に従ってください）
学習データ（例/混合）:
- daichira/structured-5k-mix-sft
- daichira/structured-hard-sft-4k
- u-10bei/structured_data_with_cot_dataset_512_v5（出力部のみ抽出して利用）
Phase3データは TOML/CSV にフィルタし、危険パターン（例：TOMLインラインテーブル、過度に長い出力など）を除外した上で、学習用にサブセット（train 3000例）をサンプリングして作成しています。
データセットライセンス: 各データセットのライセンスおよび帰属要件に従ってください。
遵守事項: 利用者は、データセットの帰属表記（クレジット）要件、およびベースモデルの元の利用規約の両方を遵守する必要があります。

Downloads last month: 10

Safetensors

Model size

4B params

Tensor type

F16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for daisuke-hoshina/qwen3-4b-structured-output-phase3-merged

Base model

Qwen/Qwen3-4B-Instruct-2507

Finetuned

(1537)

this model