# matsuo-llm-2025-qwen3-4b-sft-v6a-merged
This repository provides a merged (16-bit) fine-tuned model based on Qwen/Qwen3-4B-Instruct-2507, trained with QLoRA (4-bit) using Unsloth.

This is a fully merged model; no separate base model loading is required.
## Training Objective
This model is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).
Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked.
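As an illustration, here is a minimal sketch of how such marker-based loss masking is commonly implemented for Hugging Face-style `labels`. The marker string and function name are assumptions for this sketch, not the exact ones used in training:

```python
def build_masked_labels(full_text, tokenizer, marker="### Answer:"):
    """Tokenize the prompt + CoT prefix and the final answer separately,
    then set labels to -100 (ignored by cross-entropy loss) for everything
    up to and including the marker, so loss covers only the final output."""
    prefix, answer = full_text.split(marker, 1)
    prefix_ids = tokenizer(prefix + marker, add_special_tokens=False)["input_ids"]
    answer_ids = tokenizer(answer, add_special_tokens=False)["input_ids"]

    input_ids = prefix_ids + answer_ids
    labels = [-100] * len(prefix_ids) + answer_ids  # loss only on the answer
    return {"input_ids": input_ids, "labels": labels}
```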
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit) → merged to 16-bit
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-5
- LoRA: r=128, alpha=256, dropout=0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 2 (per device) × 8 (gradient accumulation)
- Warmup ratio: 0.1
- Weight decay: 0.05
- Scheduler: cosine
- Precision: BF16
- CoT masking: enabled (loss is computed only on tokens after the final-answer marker)
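For reference, the hyperparameters above map onto an Unsloth + TRL training setup roughly as follows. This is a hedged sketch under stated assumptions, not the actual training script: the dataset variable is a placeholder, the CoT-masking preprocessing is omitted, and the exact `SFTConfig`/`SFTTrainer` fields vary slightly across `trl` versions.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# Load the base model in 4-bit for QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters matching the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=256,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # placeholder: the formatted, CoT-masked dataset
    args=SFTConfig(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=2,
        learning_rate=2e-5,
        warmup_ratio=0.1,
        weight_decay=0.05,
        lr_scheduler_type="cosine",
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA weights back into the base model at 16-bit precision.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```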
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "karaage0703/matsuo-llm-2025-qwen3-4b-sft-v6a-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
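The snippet above only loads the model. A minimal generation example using the Qwen chat template might look like the following; the prompt and generation parameters are illustrative, not prescribed by this model card:

```python
messages = [
    {"role": "user", "content": "Return the following as JSON: name=Alice, age=30"},
]

# Apply the chat template and generate a structured response.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```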
## Sources & License (IMPORTANT)
- Training Data: u-10bei/structured_data_with_cot_dataset_512_v4 (4,608 samples)
- Dataset License: MIT License. The dataset is used, and may be redistributed, under the terms of the MIT license.
- Compliance: Users must comply with both the dataset's attribution requirements and the base model's original terms of use.
This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
