# qwen3-4b-cell21-sft-lora-todzpp7v-20260219

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using SFT + QLoRA (4-bit, Unsloth) in the Cell21 sweep workflow.

This repository contains **LoRA adapter weights only**; the base model must be loaded separately.
## Training Objective
This adapter is trained to improve structured-output quality on conversion and extraction tasks across JSON / YAML / XML / TOML / CSV-style outputs.

Following the standard SFT notebook design, loss is applied only to assistant outputs; intermediate reasoning segments are masked where the training pipeline is configured to do so.
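The assistant-only supervision described above is commonly implemented by setting labels outside the supervised spans to `-100`, the index that Hugging Face loss functions ignore. The sketch below illustrates that pattern with plain lists; the span indices and helper name are illustrative, not the exact Cell21 code.

```python
# Illustrative assistant-only loss masking: tokens outside the assistant
# spans get label -100, which is ignored when computing cross-entropy.
IGNORE_INDEX = -100

def mask_labels(input_ids, assistant_spans):
    """assistant_spans: list of (start, end) token-index ranges to supervise."""
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# Example: 8 tokens, only positions 5..8 (the assistant reply) are supervised.
labels = mask_labels([10, 11, 12, 13, 14, 15, 16, 17], [(5, 8)])
# labels == [-100, -100, -100, -100, -100, 15, 16, 17]
```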
## Training Configuration (Selected Run: todzpp7v)
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
- Method: QLoRA (4-bit)
- Max sequence length: 512
- Epochs: 1
- per_device_train_bs: 4
- per_device_eval_bs: 2
- grad_accum: 4
- learning_rate: 5.952370361506472e-06
- warmup_ratio: 0.11578683981780603
- weight_decay: 0.008896168851922471
- LoRA: r=32, alpha=192, dropout=0
- Best eval_loss (W&B summary snapshot): 1.0048444271087646
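As a rough orientation, the hyperparameters above could map onto TRL/PEFT configuration objects as sketched below. Argument names follow the public `peft.LoraConfig` and `trl.SFTConfig` APIs; any setting not listed in the table (e.g. `target_modules`, packing, sequence-length options, which vary across trl versions) is omitted rather than guessed.

```python
# Hedged mapping of the listed run hyperparameters onto TRL/PEFT configs.
# This is a sketch, not the exact Cell21 training script.
from peft import LoraConfig
from trl import SFTConfig

peft_config = LoraConfig(
    r=32,
    lora_alpha=192,
    lora_dropout=0.0,
)

training_args = SFTConfig(
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5.952370361506472e-06,
    warmup_ratio=0.11578683981780603,
    weight_decay=0.008896168851922471,
)
```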
## Validation Snapshot (internal checker on public_150.json)
- parse_success_rate: 0.32
- strict_success_rate: 0.10666666666666667
- requirement_coverage_rate: 0.2
- conversion_schema_f1_mean: 0.1365028769545505
Note: these are internal diagnostics from this repository and are not the official competition score.
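The internal checker itself is not published in this card, but a parse-success metric of the kind reported above can be sketched as follows (JSON-only here for brevity; the actual checker presumably also covers YAML/XML/TOML/CSV).

```python
# Illustrative parse-success checker: an output "parses" if it is valid JSON.
# This is a sketch in the spirit of parse_success_rate, not the real script.
import json

def parse_success_rate(outputs):
    def parses(text):
        try:
            json.loads(text)
            return True
        except json.JSONDecodeError:
            return False
    return sum(parses(o) for o in outputs) / len(outputs)

rate = parse_success_rate(['{"a": 1}', "not json", "[1, 2]"])  # 2 of 3 parse
```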
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "feel7jp/qwen3-4b-cell21-sft-lora-todzpp7v-20260219"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```
## Reproducibility Notes
- Standard training notebook reference: 2025最終課題メインコンペ_標準コード1(SFT).ipynb (2025 final-assignment main competition, standard code 1 (SFT))
- Docker operation references: README.md, DOCKER_RUN.md
- Runtime used in this repository baseline: Python 3.12 / torch 2.8.0 / transformers 4.56.2 / trl 0.24.0 / unsloth 2025.12.7
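To approximate the pinned runtime outside Docker, one option is to install the same versions listed above (assuming pip in a Python 3.12 environment; package availability for these exact pins may vary by platform):

```shell
pip install "torch==2.8.0" "transformers==4.56.2" "trl==0.24.0" "unsloth==2025.12.7" peft
```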
## License and Compliance
- This repo card declares the adapter repository license as apache-2.0.
- Please also comply with:
  - Base model license/terms: Qwen/Qwen3-4B-Instruct-2507
  - Dataset terms for u-10bei/structured_data_with_cot_dataset_512_v2
## Limitations
- This model is specialized for structured output tasks and may degrade in open-domain chat.
- Format strictness can still fail on specific schemas or long constraints.
- Always validate outputs before production use.