qwen3-4b-sft-dpo-no-cot-merged-dataclearn1

This model is a fully merged 16-bit version of Qwen3-4B, optimized for strict structured data generation (JSON, YAML, CSV, etc.). It was developed for the Matsuo Lab LLM Competition with a focus on eliminating conversational noise and maximizing format compliance.

🛠 Strategic Training Pipeline

This model deviates from standard tuning by applying rigorous data cleaning to both SFT and DPO stages:

1. Supervised Fine-Tuning (SFT)

  • Objective: Direct Mapping from User Query to Structured Data.
  • Data Cleaning:
    • System Prompts Removed: matches inference environments where the system role is unavailable.
    • CoT Removed: reasoning traces (Chain-of-Thought) were deleted from the training targets so the model emits the final answer immediately, reducing token waste and parse errors.
  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
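The cleaning steps above can be sketched with a small helper. This is a hypothetical illustration, not the published pipeline: the actual reasoning-trace tag format in the dataset is an assumption (here, `<think>...</think>`).

```python
import re

def clean_sample(messages):
    """Drop system turns and strip <think>...</think> reasoning traces.

    Hypothetical helper illustrating the cleaning described above; the
    real pipeline and trace format were not published with the model.
    """
    cleaned = []
    for m in messages:
        if m["role"] == "system":
            continue  # the inference environment has no system role
        content = re.sub(r"<think>.*?</think>\s*", "", m["content"], flags=re.DOTALL)
        cleaned.append({"role": m["role"], "content": content})
    return cleaned

sample = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Convert to JSON: name: Alice"},
    {"role": "assistant", "content": "<think>The key is name.</think>{\"name\": \"Alice\"}"},
]
print(clean_sample(sample))
```

After cleaning, the assistant turn contains only the structured payload, which is exactly what the SFT stage maps the user query to.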

2. Direct Preference Optimization (DPO)

  • Objective: Refine output quality and format adherence through preference learning.
  • Data Cleaning: Both Chosen and Rejected pairs were stripped of CoT and System prompts. This ensures the model learns the preference based strictly on the content and validity of the structured data itself.
  • Dataset: u-10bei/dpo-dataset-qwen-cot-dataclearn2
  • Beta: 0.4
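The β parameter scales how sharply DPO rewards the margin between the chosen and rejected completions relative to the reference model. A minimal numeric sketch of the standard per-pair DPO loss (illustrative log-probabilities only, not values from this training run):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.4):
    """Per-pair DPO loss: -log sigmoid(beta * implicit reward margin)."""
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Illustrative numbers: the policy prefers the chosen answer more
# strongly than the reference model does, so the margin is positive.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0)
```

A larger β makes the same margin count for more, pushing the loss toward zero faster for pairs the policy already ranks correctly; β = 0.4 is a moderately conservative setting.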

🚀 Model Status: Fully Merged

This repository contains the complete 16-bit weights; no adapters are required. The model is tuned to respond immediately with the requested data, without "Here is the JSON..." or other filler text.
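Because no filler text surrounds the payload, a generation can in principle be parsed directly. A minimal sketch, assuming the model's output is valid JSON (the string below stands in for a real generation):

```python
import json

raw_output = '{"name": "Alice", "age": 30}'  # stands in for a model generation

try:
    record = json.loads(raw_output)  # no "Here is the JSON..." preamble to strip
except json.JSONDecodeError:
    record = None  # in a real pipeline: retry, repair, or log the failure
```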

📈 Configuration

  • Learning rate: 1e-07
  • Epochs: 1
  • Max length: 2048
  • LoRA Config: r=64, alpha=64
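With r = alpha = 64, the LoRA scaling factor alpha/r is 1.0. A sketch of what the adapter configuration may have looked like using `peft`; the `target_modules` are an assumption (typical attention projections for Qwen-family models) and were not published with the model:

```python
from peft import LoraConfig

# Sketch only: r and lora_alpha come from the card above;
# target_modules is an assumed, typical choice, not a published fact.
lora_config = LoraConfig(
    r=64,
    lora_alpha=64,  # alpha/r = 1.0, so adapter updates are unscaled
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```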

💻 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mark-22/dpo-qwen-cot-merged-dataclearn3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto"
)

# The model outputs the data immediately after the user prompt.
messages = [
    {"role": "user", "content": "Convert to JSON: name: Alice, age: 30"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,   # return a dict so the batch can be unpacked into generate()
    return_tensors="pt",
).to(model.device)      # follow device_map placement instead of hardcoding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))