qwen3-4b-sft-dpo-no-cot-merged-dataclearn1
This model is a high-performance, full-merged 16-bit version of Qwen3-4B, specifically optimized for strict structured data generation (JSON, YAML, CSV, etc.). It was developed for the Matsuo Lab LLM Competition with a focus on eliminating conversational noise and maximizing format compliance.
🛠 Strategic Training Pipeline
This model deviates from standard tuning by applying rigorous data cleaning to both SFT and DPO stages:
1. Supervised Fine-Tuning (SFT)
- Objective: Direct Mapping from User Query to Structured Data.
- Data Cleaning:
- System Prompts Removed: Matches inference environments where system roles are unavailable.
- CoT Physically Removed: Chain-of-Thought (CoT) reasoning traces were deleted to force the model to output the final answer immediately, reducing token waste and parse errors.
- Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
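The cleaning steps above can be sketched as a simple preprocessing pass. This is a hypothetical helper, not the actual pipeline code; it assumes reasoning traces are wrapped in `<think>...</think>` tags, as in Qwen3-style outputs:

```python
import re

def clean_example(messages):
    """Strip system turns and CoT traces from a chat transcript.

    `messages` is a list of {"role": ..., "content": ...} dicts.
    """
    cleaned = []
    for msg in messages:
        if msg["role"] == "system":
            continue  # system prompts removed to match inference environments
        # Delete any <think>...</think> reasoning block from the content.
        content = re.sub(r"<think>.*?</think>\s*", "", msg["content"], flags=re.DOTALL)
        cleaned.append({"role": msg["role"], "content": content.strip()})
    return cleaned

example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Convert to JSON: name: Alice"},
    {"role": "assistant", "content": "<think>Parse the fields...</think>{\"name\": \"Alice\"}"},
]
print(clean_example(example))
```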
2. Direct Preference Optimization (DPO)
- Objective: Refine output quality and format adherence through preference learning.
- Data Cleaning: Both Chosen and Rejected pairs were stripped of CoT and System prompts. This ensures the model learns the preference based strictly on the content and validity of the structured data itself.
- Dataset: u-10bei/dpo-dataset-qwen-cot-dataclearn2
- Beta: 0.4
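For intuition on the beta value, the per-pair DPO objective can be written down in a few lines. This is a minimal sketch of the standard DPO loss (not the training code used here), where each log-ratio is log π_θ(y|x) − log π_ref(y|x) summed over the response tokens:

```python
import math

def dpo_loss(chosen_logratio, rejected_logratio, beta=0.4):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen - rejected))).

    beta=0.4 matches the setting reported above; larger beta makes the
    loss more sensitive to the margin between chosen and rejected.
    """
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A wider margin in favour of the chosen response lowers the loss.
print(dpo_loss(2.0, -1.0))   # chosen preferred: low loss
print(dpo_loss(-1.0, 2.0))   # rejected preferred: high loss
```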
🚀 Model Status: Fully Merged
This repository contains the complete 16-bit weights. No adapters are required. Optimized for immediate response without "Here is the JSON..." or other filler text.
📈 Configuration
- Learning rate: 1e-07
- Epochs: 1
- Max length: 2048
- LoRA Config: r=64, alpha=64
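With r=64 and alpha=64 the LoRA scaling factor alpha/r is exactly 1.0, so the low-rank update is applied at full strength when merged. A toy pure-Python sketch of the merge arithmetic W' = W + (alpha/r)·B·A (illustrative only, with tiny hypothetical matrices):

```python
def lora_update(W, A, B, r=64, alpha=64):
    """Merge a LoRA update into W: W' = W + (alpha / r) * (B @ A).

    W is d_out x d_in, B is d_out x rank, A is rank x d_in (plain lists).
    With r=64 and alpha=64 the scale is 1.0, so B @ A is added as-is.
    """
    scale = alpha / r
    d_out, d_in = len(W), len(W[0])
    rank = len(A)
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(d_in)
        ]
        for i in range(d_out)
    ]

# Tiny rank-1 example just to show the arithmetic.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.5]]
print(lora_update(W, A, B))
```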
💻 Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "mark-22/dpo-qwen-cot-merged-dataclearn3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16,
    device_map="auto",
)

# The model outputs the structured data immediately after the user prompt.
messages = [
    {"role": "user", "content": "Convert to JSON: name: Alice, age: 30"}
]
input_ids = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
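Because the model is tuned to emit bare structured data with no filler text, its output can be handed straight to a parser. A minimal validation helper (assuming JSON output; `parse_strict_json` is a hypothetical name, not part of this repository):

```python
import json

def parse_strict_json(raw: str):
    """Parse model output that is expected to be bare JSON.

    Raises ValueError if any filler text surrounds the object,
    which is exactly what this model is tuned to avoid.
    """
    text = raw.strip()
    return json.loads(text)  # fails loudly on "Here is the JSON..." preambles

print(parse_strict_json('{"name": "Alice", "age": 30}'))
```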
Base model: Qwen/Qwen3-4B-Instruct-2507