qwen3-4b-sft-dpo-no-cot-merged-dataclearn1

This model is a fully merged 16-bit version of Qwen3-4B, optimized for strict structured data generation (JSON, YAML, CSV, etc.). It was developed for the Matsuo Lab LLM Competition with a focus on eliminating conversational noise and maximizing format compliance.

🛠 Strategic Training Pipeline

This model deviates from standard tuning by applying rigorous data cleaning to both SFT and DPO stages:

1. Supervised Fine-Tuning (SFT)

  • Objective: Direct Mapping from User Query to Structured Data.
  • Data Cleaning:
    • System Prompts Removed: matches inference environments where the system role is unavailable.
    • CoT Removed: reasoning traces (Chain-of-Thought) were deleted from the training targets so the model emits the final answer immediately, reducing token waste and parse errors.
  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
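The cleaning steps above can be sketched with a small helper. This is a hypothetical illustration, not the published pipeline: the actual reasoning-trace tag format in the dataset is an assumption (here, `<think>...</think>`).

```python
import re

def clean_sample(messages):
    """Drop system turns and strip <think>...</think> reasoning traces.

    Hypothetical helper illustrating the cleaning described above; the
    real pipeline and trace format were not published with the model.
    """
    cleaned = []
    for m in messages:
        if m["role"] == "system":
            continue  # the inference environment has no system role
        content = re.sub(r"<think>.*?</think>\s*", "", m["content"], flags=re.DOTALL)
        cleaned.append({"role": m["role"], "content": content})
    return cleaned

sample = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Convert to JSON: name: Alice"},
    {"role": "assistant", "content": "<think>The key is name.</think>{\"name\": \"Alice\"}"},
]
print(clean_sample(sample))
```

After cleaning, the assistant turn contains only the structured payload, which is exactly what the SFT stage maps the user query to.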

2. Direct Preference Optimization (DPO)

  • Objective: Refine output quality and format adherence through preference learning.
  • Data Cleaning: Both Chosen and Rejected pairs were stripped of CoT and System prompts. This ensures the model learns the preference based strictly on the content and validity of the structured data itself.
  • Dataset: u-10bei/dpo-dataset-qwen-cot-dataclearn2
  • Beta: 0.4
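The β parameter scales how sharply DPO rewards the margin between the chosen and rejected completions relative to the reference model. A minimal numeric sketch of the standard per-pair DPO loss (illustrative log-probabilities only, not values from this training run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.4):
    """Per-pair DPO loss: -log sigmoid(beta * implicit reward margin)."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative numbers: the policy prefers the chosen answer more
# strongly than the reference model does, so the margin is positive.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

A larger β makes the same margin count for more, pushing the loss toward zero faster for pairs the policy already ranks correctly; β = 0.4 is a moderately conservative setting.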

🚀 Model Status: Fully Merged

This repository contains the complete 16-bit weights; no adapters are required. The model is tuned to respond immediately with the requested data, without "Here is the JSON..." or other filler text.
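Because no filler text surrounds the payload, a generation can in principle be parsed directly. A minimal sketch, assuming the model's output is valid JSON (the string below stands in for a real generation):

```python
import json

raw_output = '{"name": "Alice", "age": 30}'  # stands in for a model generation

try:
    record = json.loads(raw_output)  # no "Here is the JSON..." preamble to strip
except json.JSONDecodeError:
    record = None  # in a real pipeline: retry, repair, or log the failure
```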

📈 Configuration

  • Learning rate: 1e-07
  • Epochs: 1
  • Max length: 2048
  • LoRA Config: r=64, alpha=64
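With r = alpha = 64, the LoRA scaling factor alpha/r is 1.0. A sketch of what the adapter configuration may have looked like using `peft`; the `target_modules` are an assumption (typical attention projections for Qwen-family models) and were not published with the model:

```python
from peft import LoraConfig

# Sketch only: r and lora_alpha come from the card above;
# target_modules is an assumed, typical choice, not a published fact.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,  # alpha/r = 1.0, so adapter updates are unscaled
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```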

💻 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mark-22/dpo-qwen-cot-merged-dataclearn3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)

# The model outputs the data immediately after the user prompt.
messages = [
    {"role": "user", "content": "Convert to JSON: name: Alice, age: 30"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,   # return a dict so the batch can be unpacked into generate()
    return_tensors="pt",
).to(model.device)      # follow device_map placement instead of hardcoding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))