# matsuo-llm-2025-qwen3-4b-sft-v6a-merged

This repository provides a merged (16-bit) fine-tuned model based on Qwen/Qwen3-4B-Instruct-2507, trained using QLoRA (4-bit, Unsloth).

This is a fully merged model. No separate base model loading is required.

## Training Objective

This model is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).

Loss is applied only to the final assistant output; intermediate reasoning (Chain-of-Thought) tokens are masked out of the loss.
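The exact marker used for masking is not published here, so the snippet below is only a schematic illustration of the technique: tokens up to and including a final-answer marker are given the label `-100`, which PyTorch's cross-entropy loss ignores, so gradients flow only through the structured answer.

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch CrossEntropyLoss

def mask_cot_labels(input_ids, marker_ids):
    """Copy input_ids into labels, masking everything up to and including
    the last occurrence of the final-answer marker sequence."""
    labels = list(input_ids)
    for start in range(len(input_ids) - len(marker_ids), -1, -1):
        if input_ids[start:start + len(marker_ids)] == marker_ids:
            for i in range(start + len(marker_ids)):
                labels[i] = IGNORE_INDEX
            break
    return labels

# Toy example with integer ids, where 99 stands in for the marker token:
# the CoT prefix (and the marker itself) is excluded from the loss.
print(mask_cot_labels([5, 6, 99, 7, 8], [99]))  # [-100, -100, -100, 7, 8]
```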

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit) → merged to 16-bit
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-5
- LoRA: r=128, alpha=256, dropout=0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 2 per device × 8 gradient-accumulation steps (effective batch size 16)
- Warmup ratio: 0.1
- Weight decay: 0.05
- Scheduler: cosine
- Precision: BF16
- CoT masking: enabled (loss computed only on tokens after the final-answer marker)
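
As a rough reference, these hyperparameters map onto the usual Unsloth + TRL recipe along the following lines. This is a minimal sketch, not the actual training script: the dataset text field, the CoT-masking collator, and the exact `SFTTrainer` arguments (which vary across TRL versions) are assumptions.

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Load the base model with 4-bit quantized weights (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the configuration listed above
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=256,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("u-10bei/structured_data_with_cot_dataset_512_v4", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed field name; CoT-loss masking not shown
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,  # effective batch size 16
        num_train_epochs=2,
        learning_rate=2e-5,
        warmup_ratio=0.1,
        weight_decay=0.05,
        lr_scheduler_type="cosine",
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA weights back into the base model at 16-bit precision
model.save_pretrained_merged("merged", tokenizer, save_method="merged_16bit")
```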

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "karaage0703/matsuo-llm-2025-qwen3-4b-sft-v6a-merged"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```
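
Inference then follows the standard Transformers chat flow. The prompt below is illustrative, not a prescribed format:

```python
messages = [
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30, city=Tokyo"},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```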

## Sources & License (IMPORTANT)

- Training data: u-10bei/structured_data_with_cot_dataset_512_v4 (4,608 samples)
- Dataset license: MIT License; the dataset is used and may be redistributed under its terms.
- Compliance: users must comply with both the dataset's attribution requirements and the base model's original terms of use.

This model was trained 2x faster with Unsloth and Hugging Face's TRL library.
