Qwen2.5-1.5B-Orca-BD3LM-SFT
Model Description
This is a Block Diffusion Language Model (BD3LM) fine-tuned on the Vietnamese Intel Orca dataset for instruction following and question answering. It is based on the Qwen2.5-1.5B architecture and decodes with the BD3LM block-diffusion approach.
Training Details
- Base Model: ChaosAiVision/Qwen2.5-1.5B-ddlm-bd3lm-pretrain-5550-vi
- Training Method: BD3LM (Block Diffusion Language Model) - SFT
- Dataset: Vietnamese Intel Orca (5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated)
- Training Samples: 11,862 instruction-response pairs
- Training Epochs: 3
- Max Length: 1024 tokens
- Block Size: 32 tokens
- Batch Size: 2 per device × 4 gradient accumulation = 8 effective batch size
- Learning Rate: 1e-4
- Framework: dLLM (Diffusion Language Model Library)
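A minimal sketch of the arithmetic behind the hyperparameters above, assuming a single training device (the device count is not stated; variable names are illustrative and do not mirror the dLLM training script):

```python
# Back-of-envelope training math from the hyperparameters listed above.
# Assumes one device; names are illustrative, not from the dLLM codebase.
import math

per_device_batch = 2
grad_accum = 4
effective_batch = per_device_batch * grad_accum  # 2 x 4 = 8, as stated

samples = 11_862
epochs = 3
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch)  # 8
print(total_steps)      # ~4.4K optimizer steps under these assumptions
```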
Model Architecture
- Architecture: A2D-Qwen2 (Autoregressive to Diffusion) with BD3LM
- Hidden Size: 1536
- Num Layers: 28
- Num Attention Heads: 12
- Num KV Heads: 2
- Intermediate Size: 8960
- Vocab Size: 151,936
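As a sanity check, the configuration above roughly accounts for the 1.5B parameter count. Here is a back-of-envelope estimate that ignores biases and layer norms and assumes tied input/output embeddings (an assumption on my part, not confirmed by the card):

```python
# Rough parameter count from the architecture table above.
# Ignores biases/norms; assumes tied embeddings (assumption, not confirmed).
hidden, layers = 1536, 28
heads, kv_heads = 12, 2
inter, vocab = 8960, 151_936

head_dim = hidden // heads      # 128
kv_dim = kv_heads * head_dim    # 256 (grouped-query attention)

attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
mlp = 3 * hidden * inter                          # gate, up, down projections
embed = vocab * hidden                            # counted once if tied

total = layers * (attn + mlp) + embed
print(f"{total / 1e9:.2f}B")  # ~1.54B, consistent with a 1.5B-class model
```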
Dataset Format
The model was trained on Vietnamese instruction-following data with:
- System prompts (`system_vi`): task instructions
- Questions (`question_vi`): user queries
- Answers (`chosen_vi`): expected responses
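A hedged sketch of how one such row might be mapped to chat messages for SFT. The field names come from the dataset description above; the mapping itself is my assumption, not the exact dLLM preprocessing, and the answer text is an illustrative example:

```python
# Map one Vietnamese Orca row to chat-format messages for SFT.
# Field names (system_vi, question_vi, chosen_vi) are from the dataset;
# the actual preprocessing in dLLM may differ.
def row_to_messages(row: dict) -> list[dict]:
    return [
        {"role": "system", "content": row["system_vi"]},
        {"role": "user", "content": row["question_vi"]},
        {"role": "assistant", "content": row["chosen_vi"]},
    ]

row = {
    "system_vi": "Bạn là một trợ lý AI hữu ích.",   # "You are a helpful AI assistant."
    "question_vi": "Thủ đô của Việt Nam là gì?",    # "What is the capital of Vietnam?"
    "chosen_vi": "Thủ đô của Việt Nam là Hà Nội.",  # "The capital of Vietnam is Hanoi." (illustrative)
}
messages = row_to_messages(row)
print([m["role"] for m in messages])  # ['system', 'user', 'assistant']
```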
Usage
```python
import torch
import dllm
from transformers import AutoTokenizer

model_name = "ChaosAiVision/qwen2.5-1.5b-orca-bd3lm-sft-orca"

# Load model and tokenizer
model_args = type("Args", (), {"model_name_or_path": model_name})()
model = dllm.utils.get_model(model_args=model_args).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Set up the BD3LM sampler
sampler = dllm.core.samplers.BD3LMSampler(model=model, tokenizer=tokenizer)
sampler_config = dllm.core.samplers.BD3LMSamplerConfig(
    steps=128,
    max_new_tokens=512,
    temperature=0.0,
    block_size=32,
)

# Prepare messages
messages = [
    {"role": "system", "content": "Bạn là một trợ lý AI hữu ích."},  # "You are a helpful AI assistant."
    {"role": "user", "content": "Thủ đô của Việt Nam là gì?"},       # "What is the capital of Vietnam?"
]

# Generate
prompt_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).cuda()
output = sampler.sample(inputs=[prompt_ids[0]], config=sampler_config)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```
Training Process
- Pretraining: the base model was first pretrained on Vietnamese Wikipedia (50K samples, 5,500 steps)
- SFT: it was then fine-tuned on the Vietnamese Intel Orca dataset (11,862 samples, 3 epochs)
Limitations
- The model may generate repetitive text in some cases
- Performance depends on inference parameters (steps, temperature, block_size)
- Best results with steps >= 128 and appropriate temperature settings
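To make the interplay of these parameters concrete, here is the block arithmetic for the settings used in the usage example. How the dLLM sampler actually distributes `steps` across blocks is not documented here, so the even split is an assumption:

```python
# Block arithmetic for the usage example: 512 new tokens in 32-token blocks.
max_new_tokens, block_size, steps = 512, 32, 128

num_blocks = max_new_tokens // block_size  # 16 blocks of 32 tokens
# If denoising steps were spread evenly across blocks (an assumption about
# the sampler, not confirmed), each block would get:
steps_per_block = steps // num_blocks      # 8
print(num_blocks, steps_per_block)         # 16 8
```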
License
Apache 2.0
Citation
```bibtex
@misc{qwen25-bd3lm-orca-sft,
  title={Qwen2.5-1.5B BD3LM Vietnamese Orca SFT},
  author={ChaosAiVision},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/ChaosAiVision/qwen2.5-1.5b-orca-bd3lm-sft-orca}}
}
```
Model Tree
- Base model: Qwen/Qwen2.5-1.5B