# Qwen3-4B-Instruct-2507-lora001
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth + PEFT (LoRA). It contains LoRA adapter weights only; the base model must be loaded separately.
## Training Objective

This adapter is trained to improve structured-output accuracy (JSON / YAML / XML / TOML / CSV), with an emphasis on returning only the requested structured output (no explanations, no markdown fences).
Key design choices implemented in the training code:
- Assistant-only loss: loss is applied only to the final assistant output tokens, while the prompt/context is provided as input.
- CoT masking (optional): when enabled, training can ignore intermediate reasoning and apply loss only after markers such as `Output:`, `Final:`, or `Answer:`.
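Assistant-only loss is typically implemented by replacing the label ids of prompt/context positions with `-100`, the index that PyTorch's cross-entropy (and the HF Trainer) ignore. A minimal sketch of the idea, not the actual training code (the token ids below are illustrative):

```python
# Sketch of assistant-only loss masking: labels are the input ids with
# every prompt/context position set to -100, so only the assistant's
# reply tokens contribute to the loss.
IGNORE_INDEX = -100

def build_labels(input_ids, prompt_len):
    """Supervise only tokens after the prompt (the assistant reply)."""
    labels = list(input_ids)
    for i in range(min(prompt_len, len(labels))):
        labels[i] = IGNORE_INDEX
    return labels

# Example: a 6-token sequence whose first 4 tokens are the prompt.
print(build_labels([11, 22, 33, 44, 55, 66], prompt_len=4))
# -> [-100, -100, -100, -100, 55, 66]
```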
In addition, the training pipeline applies output cleaning and format-biased augmentation:
- Removes `<think>...</think>` blocks when present
- Strips markdown fences and leading "explanation" lines
- Cuts to the first likely structure start (`{`, `[`, `<`, YAML/TOML-like starts)
- Adds TOML-focused augmentation (strict instruction copies + TOML repair tasks)
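The cleaning steps above can be sketched as a small post-processing function. This is a simplified approximation of the described pipeline, not the exact training code:

```python
import re

def clean_output(text: str) -> str:
    """Approximate the described output cleaning: drop <think> blocks,
    strip markdown fences, and cut to the first likely structure start."""
    # Remove <think>...</think> blocks when present.
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Strip markdown code-fence lines (``` or ```lang).
    text = re.sub(r"^```[a-zA-Z]*\s*$", "", text, flags=re.MULTILINE)
    # Cut to the first likely structure start: {, [, <, or a TOML-like
    # "key = value" at the start of a line.
    m = re.search(r"[\{\[<]|^\s*[\w.-]+\s*=", text, flags=re.MULTILINE)
    if m:
        text = text[m.start():]
    return text.strip()

print(clean_output('<think>plan</think>Sure!\n```toml\ntitle = "demo"\n```'))
# -> title = "demo"
```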
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (PEFT) with Unsloth loader
- Max sequence length: 1536
- Epochs: 2
- Learning rate: 5e-05
- LoRA: `r=128` (`SFT_LORA_R`), `alpha=256` (`SFT_LORA_ALPHA`), `dropout=0.05` (`SFT_LORA_DROPOUT`)
- LoRA target modules (default): `q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj` (`SFT_LORA_TARGET_MODULES`)
- Eval/Save: every `200` steps (`SFT_EVAL_STEPS`, `SFT_SAVE_STEPS`)
Notes:
- The script currently loads the base model with `load_in_4bit=False` (i.e., not QLoRA 4-bit by default).
- BF16 is enabled in `TrainingArguments` (`bf16=True`).
## Data / Split
- Dataset: u-10bei/structured_data_with_cot_dataset_512_v5
- Validation split: 0.03
- Seed: 3407
## Batch / Steps
- Train batch size (per device): 6
- Eval batch size (per device): 8
- Gradient accumulation steps: 4
- Effective batch size: 24 × (number_of_gpus) (effective = per_device_train_bs × grad_accum × num_gpus)
- Max steps: -1 (epoch-based)
- Logging steps: 20
- Eval strategy: steps (eval_steps=200)
- Save strategy: steps (save_steps=200, save_total_limit=6)
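The effective batch size follows directly from the values listed above; as a quick check (assuming a single GPU for illustration):

```python
# Effective batch size = per-device batch × gradient accumulation × GPU count.
per_device_train_bs = 6
grad_accum = 4
num_gpus = 1  # assumption for illustration; scale linearly for multi-GPU

effective = per_device_train_bs * grad_accum * num_gpus
print(effective)  # -> 24
```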
## Optimization
- LR scheduler: cosine
- Warmup ratio: 0.03
- Weight decay: 0.01
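A pure-Python sketch of the cosine schedule with linear warmup implied by these settings (base LR 5e-5, warmup ratio 0.03; the actual run uses the Trainer's built-in scheduler, so this is only an approximation of its shape):

```python
import math

def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.03):
    """Linear warmup to base_lr, then cosine decay toward 0."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

print(lr_at(0, 1000))   # -> 0.0 (start of warmup)
print(lr_at(30, 1000))  # peak: 5e-05 at the end of warmup
```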
## CoT masking (Output-only supervision)
- mask_cot: enabled (`SFT_MASK_COT=1`)
- output_markers: `Output:`, `OUTPUT:`, `Final:`, `Answer:`, `Result:`, `Response:`
- output_learn_mode: `after_marker`
- upsampling: disabled (`SFT_USE_UPSAMPLING=0`)
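In `after_marker` mode, loss is applied only to tokens following the first output marker. A text-level sketch of the idea (the real implementation masks token ids, not strings):

```python
# Markers from the config above; only text after the first one is supervised.
MARKERS = ["Output:", "OUTPUT:", "Final:", "Answer:", "Result:", "Response:"]

def supervised_span(text: str) -> str:
    """Return the substring after the earliest marker; under after_marker
    masking this is the only part that would receive loss."""
    positions = [(text.find(m), m) for m in MARKERS if m in text]
    if not positions:
        return text  # no marker: supervise the full assistant output
    pos, marker = min(positions)
    return text[pos + len(marker):].lstrip()

print(supervised_span('Let me reason step by step... Output: {"ok": true}'))
# -> {"ok": true}
```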
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

messages = [
    {
        "role": "user",
        "content": (
            "IMPORTANT:\n"
            "- Output ONLY the requested TOML.\n"
            "- No explanations, no markdown fences.\n"
            "- Ensure the output parses correctly.\n"
            "Task: Create a TOML config with title='demo' and ports=[8000,8001].\n"
            "Output:\n"
        ),
    }
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
    )

print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
## Sources & Terms (IMPORTANT)

- Training data: u-10bei/structured_data_with_cot_dataset_512_v5
- Dataset license: MIT License. The dataset is used and distributed under the terms of the MIT License.
- Compliance: users must comply with the MIT License (including the copyright notice) and the base model's original terms of use.