qwen3-4b-cell21-sft-lora-todzpp7v-20260219

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using SFT + QLoRA (4-bit, Unsloth) in the Cell21 sweep workflow.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output quality on conversion and extraction tasks that emit JSON, YAML, XML, TOML, or CSV outputs.

Following the standard SFT notebook design, loss is applied only to the assistant's output tokens; intermediate reasoning segments are additionally masked where the training pipeline is configured to do so.
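Assistant-only loss amounts to label masking: tokens outside the assistant span are set to the ignore index (-100) so the cross-entropy loss skips them. The sketch below is a minimal pure-Python illustration, not the notebook's actual collator; in the real pipeline the span boundaries come from the chat template's role markers.

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_non_assistant(input_ids, assistant_start, assistant_end):
    """Return labels where only the assistant span contributes to the loss.

    assistant_start / assistant_end are token indices delimiting the
    assistant response (hypothetical here; the training pipeline would
    locate them from the chat template's role markers).
    """
    labels = list(input_ids)
    for i in range(len(labels)):
        if not (assistant_start <= i < assistant_end):
            labels[i] = IGNORE_INDEX
    return labels

# Example: an 8-token sequence where tokens 5..7 are the assistant reply.
labels = mask_non_assistant([11, 12, 13, 14, 15, 16, 17, 18], 5, 8)
# labels -> [-100, -100, -100, -100, -100, 16, 17, 18]
```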

Training Configuration (Selected Run: todzpp7v)

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
  • Method: QLoRA (4-bit)
  • Max sequence length: 512
  • Epochs: 1
  • per_device_train_bs: 4
  • per_device_eval_bs: 2
  • grad_accum: 4
  • learning_rate: 5.952370361506472e-06
  • warmup_ratio: 0.11578683981780603
  • weight_decay: 0.008896168851922471
  • LoRA: r=32, alpha=192, dropout=0
  • Best eval_loss (W&B summary snapshot): 1.0048444271087646
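The batch settings above combine into an effective train batch size via gradient accumulation. A quick check, assuming a single GPU (the device count for run todzpp7v is not stated in this card):

```python
# Effective train batch size for run todzpp7v (device count assumed).
per_device_train_bs = 4
grad_accum = 4
num_gpus = 1  # assumption; not recorded in this card

effective_bs = per_device_train_bs * grad_accum * num_gpus
print(effective_bs)  # 16
```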

Validation Snapshot (internal checker on public_150.json)

  • parse_success_rate: 0.320
  • strict_success_rate: 0.107
  • requirement_coverage_rate: 0.200
  • conversion_schema_f1_mean: 0.137

Note: these are internal diagnostics from this repository and are not the official competition score.
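As a rough illustration of what a metric like parse_success_rate measures, here is a simplified stand-in that only checks JSON parseability; the repository's internal checker additionally covers YAML/XML/TOML/CSV and stricter schema requirements.

```python
import json

def parse_success_rate(outputs):
    """Fraction of model outputs that parse as valid JSON.

    Simplified sketch of a parse-success metric; the actual internal
    checker handles more formats and requirement coverage.
    """
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs)

samples = ['{"a": 1}', '[1, 2, 3]', 'not json', '{"b": }']
print(parse_success_rate(samples))  # 0.5
```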

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "feel7jp/qwen3-4b-cell21-sft-lora-todzpp7v-20260219"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model first; the LoRA adapter is then applied on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
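Because this adapter targets structured output, it is worth validating each response before use. Below is a minimal, self-contained helper that pulls the first JSON object out of a generated reply by locating the outermost braces; it is illustrative only, and a production checker should enforce the expected schema rather than just parseability.

```python
import json

def extract_json(generated: str):
    """Extract and parse the first JSON object in a model response.

    Finds the outermost brace pair and attempts to parse it; returns
    None when nothing parses. Illustrative sketch only.
    """
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end == -1 or end < start:
        return None
    try:
        return json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = 'Here is the result: {"name": "x", "count": 3} Done.'
print(extract_json(reply))  # {'name': 'x', 'count': 3}
```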

Reproducibility Notes

  • Standard training notebook reference: 2025最終課題メインコンペ_標準コード1(SFT).ipynb ("2025 final-assignment main competition, standard code 1 (SFT)")
  • Docker operation references: README.md, DOCKER_RUN.md
  • Runtime used in this repository baseline: Python 3.12 / torch 2.8.0 / transformers 4.56.2 / trl 0.24.0 / unsloth 2025.12.7

License and Compliance

  • This adapter repository is licensed under apache-2.0.
  • Please also comply with:
    • The base model's license and terms: Qwen/Qwen3-4B-Instruct-2507
    • The dataset terms for u-10bei/structured_data_with_cot_dataset_512_v2

Limitations

  • This model is specialized for structured output tasks and may degrade in open-domain chat.
  • Format strictness can still fail on specific schemas or long constraints.
  • Always validate outputs before production use.