# Qwen3-4B-Instruct-2507-lora001

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth + PEFT (LoRA).

This repository contains LoRA adapter weights only. The base model must be loaded separately.

## Training Objective

This adapter is trained to improve structured-output accuracy (JSON / YAML / XML / TOML / CSV), with an emphasis on returning only the requested structured output (no explanations, no markdown fences).

Key design choices implemented in the training code:

  • Assistant-only loss: loss is applied only to the final assistant output tokens, while the prompt/context is provided as input.
  • CoT masking (optional): when enabled, training can ignore intermediate reasoning and apply loss only after markers such as Output: / Final: / Answer:.

In addition, the training pipeline applies output cleaning and format-biased augmentation:

  • Removes <think>...</think> blocks when present
  • Strips markdown fences and leading “explanation” lines
  • Cuts to the first likely structure start ({, [, <, YAML/TOML-like starts)
  • Adds TOML-focused augmentation (strict instruction copies + TOML repair tasks)
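The cleaning steps above can be sketched roughly as follows. The regexes and the function name are illustrative, not the repository's actual implementation:

```python
import re

# Matches a structure start: {, [, <, or a line beginning with key = / key:
STRUCT_START = re.compile(r"[{\[<]|^[\w.\"'-]+\s*[:=]", re.MULTILINE)

def clean_structured_output(text: str) -> str:
    # Drop <think>...</think> reasoning blocks if present.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Strip markdown code fences (```toml ... ```).
    text = re.sub(r"```[a-zA-Z]*\n?", "", text)
    # Cut to the first likely structure start, dropping leading explanation lines.
    m = STRUCT_START.search(text)
    if m:
        text = text[m.start():]
    return text.strip()
```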

## Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (PEFT) with Unsloth loader
  • Max sequence length: 1536
  • Epochs: 2
  • Learning rate: 5e-05
  • LoRA: r=128 (SFT_LORA_R), alpha=256 (SFT_LORA_ALPHA), dropout=0.05 (SFT_LORA_DROPOUT)
  • LoRA target modules (default):
    • q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj (SFT_LORA_TARGET_MODULES)
  • Eval/Save: every 200 steps (SFT_EVAL_STEPS, SFT_SAVE_STEPS)
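Under standard PEFT naming, the hyperparameters listed above correspond to a `LoraConfig` roughly like the following. This is a sketch, not the repository's training script:

```python
from peft import LoraConfig

# Values mirror the card: r=128, alpha=256, dropout=0.05,
# with all attention and MLP projections targeted.
lora_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
```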

Notes:

  • The script currently loads the base model with load_in_4bit=False (i.e., not QLoRA 4-bit by default).
  • BF16 is enabled in TrainingArguments (bf16=True).

### Data / Split

  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v5
  • Validation split: 0.03
  • Seed: 3407

### Batch / Steps

  • Train batch size (per device): 6
  • Eval batch size (per device): 8
  • Gradient accumulation steps: 4
  • Effective batch size: 24 × (number_of_gpus) (effective = per_device_train_bs × grad_accum × num_gpus)
  • Max steps: -1 (epoch-based)
  • Logging steps: 20
  • Eval strategy: steps (eval_steps=200)
  • Save strategy: steps (save_steps=200, save_total_limit=6)
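As a sanity check, the effective batch size follows directly from the numbers above (`num_gpus` is an assumption for illustration, not specified by the card):

```python
per_device_train_bs = 6
grad_accum_steps = 4
num_gpus = 1  # assumption: adjust for your setup

# effective = per_device_train_bs * grad_accum * num_gpus
effective_bs = per_device_train_bs * grad_accum_steps * num_gpus
print(effective_bs)  # 24 on a single GPU
```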

### Optimization

  • LR scheduler: cosine
  • Warmup ratio: 0.03
  • Weight decay: 0.01
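The schedule shape (linear warmup into cosine decay) can be sketched as follows; `lr_at` is an illustrative helper mirroring the listed values, not part of the training code:

```python
import math

def lr_at(step, total_steps, base_lr=5e-05, warmup_ratio=0.03):
    """Linear warmup to base_lr, then cosine decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```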

### CoT masking (Output-only supervision)

  • mask_cot: enabled (SFT_MASK_COT=1)
  • output_markers: Output:, OUTPUT:, Final:, Answer:, Result:, Response:
  • output_learn_mode: after_marker
  • upsampling: disabled (SFT_USE_UPSAMPLING=0)
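In `after_marker` mode, labels up to and including the first output marker are replaced with the ignore index, so the loss sees only the final structured output. A minimal sketch, assuming character-offset mappings from the tokenizer; the function and its signature are illustrative, not the exact training code:

```python
IGNORE_INDEX = -100  # label value ignored by Hugging Face trainers' loss
MARKERS = ["Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:"]

def mask_before_marker(input_ids, offsets, text):
    """Return labels where every token ending at or before the first
    output marker is set to IGNORE_INDEX (after_marker mode)."""
    cut = -1
    for marker in MARKERS:
        pos = text.find(marker)
        if pos != -1:
            cut = pos + len(marker)
            break
    labels = list(input_ids)
    if cut == -1:
        return labels  # no marker found: supervise the whole output
    for i, (start, end) in enumerate(offsets):
        if end <= cut:  # token lies inside the reasoning/marker prefix
            labels[i] = IGNORE_INDEX
    return labels
```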

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {
        "role": "user",
        "content": (
            "IMPORTANT:\n"
            "- Output ONLY the requested TOML.\n"
            "- No explanations, no markdown fences.\n"
            "- Ensure the output parses correctly.\n\n"
            "Task: Create a TOML config with title='demo' and ports=[8000,8001].\n\n"
            "Output:\n"
        )
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
    )

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```

## Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v5

Dataset license: MIT. The dataset is used and distributed under the terms of the MIT License; users must comply with that license (including preservation of the copyright notice) and with the base model's original terms of use.
