qwen3-4b-cell21-sft-lora-todzpp7v-20260219

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using SFT + QLoRA (4-bit, Unsloth) in the Cell21 sweep workflow.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output quality on conversion and extraction tasks that emit JSON, YAML, XML, TOML, or CSV outputs.

Following the standard SFT notebook design, loss is applied only to the assistant's output tokens; intermediate reasoning segments are additionally masked where the training pipeline is configured to do so.
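Assistant-only loss amounts to label masking: tokens outside the assistant span are set to the ignore index (-100) so the cross-entropy loss skips them. The sketch below is a minimal pure-Python illustration, not the notebook's actual collator; in the real pipeline the span boundaries come from the chat template's role markers.

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def mask_non_assistant(input_ids, assistant_start, assistant_end):
    """Return labels where only the assistant span contributes to the loss.

    assistant_start / assistant_end are token indices delimiting the
    assistant response (hypothetical here; the training pipeline would
    locate them from the chat template's role markers).
    """
    labels = list(input_ids)
    for i in range(len(labels)):
        if not (assistant_start <= i < assistant_end):
            labels[i] = IGNORE_INDEX
    return labels

# Example: an 8-token sequence where tokens 5..7 are the assistant reply.
labels = mask_non_assistant([11, 12, 13, 14, 15, 16, 17, 18], 5, 8)
# labels -> [-100, -100, -100, -100, -100, 16, 17, 18]
```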

Training Configuration (Selected Run: todzpp7v)

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
  • Method: QLoRA (4-bit)
  • Max sequence length: 512
  • Epochs: 1
  • per_device_train_bs: 4
  • per_device_eval_bs: 2
  • grad_accum: 4
  • learning_rate: 5.952370361506472e-06
  • warmup_ratio: 0.11578683981780603
  • weight_decay: 0.008896168851922471
  • LoRA: r=32, alpha=192, dropout=0
  • Best eval_loss (W&B summary snapshot): 1.0048444271087646
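The batch settings above combine into an effective train batch size via gradient accumulation. A quick check, assuming a single GPU (the device count for run todzpp7v is not stated in this card):

```python
# Effective train batch size for run todzpp7v (device count assumed).
per_device_train_bs = 4
grad_accum = 4
num_gpus = 1  # assumption; not recorded in this card

effective_bs = per_device_train_bs * grad_accum * num_gpus
print(effective_bs)  # 16
```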

Validation Snapshot (internal checker on public_150.json)

  • parse_success_rate: 0.320
  • strict_success_rate: 0.107
  • requirement_coverage_rate: 0.200
  • conversion_schema_f1_mean: 0.137

Note: these are internal diagnostics from this repository and are not the official competition score.
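As a rough illustration of what a metric like parse_success_rate measures, here is a simplified stand-in that only checks JSON parseability; the repository's internal checker additionally covers YAML/XML/TOML/CSV and stricter schema requirements.

```python
import json

def parse_success_rate(outputs):
    """Fraction of model outputs that parse as valid JSON.

    Simplified sketch of a parse-success metric; the actual internal
    checker handles more formats and requirement coverage.
    """
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass
    return ok / len(outputs)

samples = ['{"a": 1}', '[1, 2, 3]', 'not json', '{"b": }']
print(parse_success_rate(samples))  # 0.5
```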

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "feel7jp/qwen3-4b-cell21-sft-lora-todzpp7v-20260219"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model first; the LoRA adapter is then applied on top.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()
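Because this adapter targets structured output, it is worth validating each response before use. Below is a minimal, self-contained helper that pulls the first JSON object out of a generated reply by locating the outermost braces; it is illustrative only, and a production checker should enforce the expected schema rather than just parseability.

```python
import json

def extract_json(generated: str):
    """Extract and parse the first JSON object in a model response.

    Finds the outermost brace pair and attempts to parse it; returns
    None when nothing parses. Illustrative sketch only.
    """
    start = generated.find("{")
    end = generated.rfind("}")
    if start == -1 or end == -1 or end < start:
        return None
    try:
        return json.loads(generated[start:end + 1])
    except json.JSONDecodeError:
        return None

reply = 'Here is the result: {"name": "x", "count": 3} Done.'
print(extract_json(reply))  # {'name': 'x', 'count': 3}
```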

Reproducibility Notes

  • Standard training notebook reference: 2025最終課題メインコンペ_標準コード1(SFT).ipynb ("2025 final-assignment main competition, standard code 1 (SFT)")
  • Docker operation references: README.md, DOCKER_RUN.md
  • Runtime used in this repository baseline: Python 3.12 / torch 2.8.0 / transformers 4.56.2 / trl 0.24.0 / unsloth 2025.12.7

License and Compliance

  • This adapter repository is licensed under apache-2.0.
  • Please also comply with:
    • The base model's license and terms: Qwen/Qwen3-4B-Instruct-2507
    • The dataset terms for u-10bei/structured_data_with_cot_dataset_512_v2

Limitations

  • This model is specialized for structured output tasks and may degrade in open-domain chat.
  • Format strictness can still fail on specific schemas or long constraints.
  • Always validate outputs before production use.