Qwen2.5-1.5B-Orca-BD3LM-SFT

Model Description

This is a Block Diffusion Language Model (BD3LM) fine-tuned on the Vietnamese Intel Orca dataset for instruction-following and question-answering tasks. The model is based on the Qwen2.5-1.5B architecture and uses the BD3LM block-diffusion approach: instead of decoding one token at a time, it generates fixed-size blocks of tokens by iterative denoising, conditioning each block on the previously generated context.
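To make the block-diffusion idea concrete, here is a toy sketch of the decoding loop: a block of masked positions is iteratively denoised, with the most confident prediction unmasked at each step, and the finished block is appended to the context before the next block starts. The function names and the confidence-based unmasking rule here are illustrative assumptions, not the dLLM internals, and the "model" is a random stand-in.

```python
# Toy illustration of block-diffusion decoding. Names and the
# confidence-based unmasking rule are illustrative, not dLLM internals.
import random

MASK = -1

def denoise_step(block, context):
    # Stand-in for the model: "predicts" a token and a confidence score
    # for every masked position. A real BD3LM runs the transformer here,
    # attending to both the context and the partially unmasked block.
    return {i: (random.randint(0, 9), random.random())
            for i, t in enumerate(block) if t == MASK}

def sample_block_diffusion(prompt, max_new_tokens=8, block_size=4, steps=4):
    out = list(prompt)
    for _ in range(max_new_tokens // block_size):
        block = [MASK] * block_size          # start from a fully masked block
        for _ in range(steps):
            preds = denoise_step(block, out)
            if not preds:
                break
            # unmask the single most confident position this step
            i, (tok, _) = max(preds.items(), key=lambda kv: kv[1][1])
            block[i] = tok
        # any still-masked slots get the model's current best guess
        for i, (tok, _) in denoise_step(block, out).items():
            block[i] = tok
        out.extend(block)                    # block becomes fixed context
    return out

print(sample_block_diffusion([1, 2, 3]))
```

This is why the `steps` and `block_size` inference parameters below matter: more denoising steps per block generally means fewer forced low-confidence commitments.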

Training Details

  • Base Model: ChaosAiVision/Qwen2.5-1.5B-ddlm-bd3lm-pretrain-5550-vi
  • Training Method: BD3LM (Block Diffusion Language Model) - SFT
  • Dataset: Vietnamese Intel Orca (5CD-AI/Vietnamese-Intel-orca_dpo_pairs-gg-translated)
  • Training Samples: 11,862 instruction-response pairs
  • Training Epochs: 3 epochs
  • Max Length: 1024 tokens
  • Block Size: 32 tokens
  • Batch Size: 2 per device × 4 gradient-accumulation steps = effective batch size of 8
  • Learning Rate: 1e-4
  • Framework: dLLM (Diffusion Language Model Library)

Model Architecture

  • Architecture: A2D-Qwen2 (Autoregressive to Diffusion) with BD3LM
  • Hidden Size: 1536
  • Num Layers: 28
  • Num Attention Heads: 12
  • Num KV Heads: 2
  • Intermediate Size: 8960
  • Vocab Size: 151,936
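A back-of-the-envelope parameter count from the numbers above helps sanity-check the model size. This sketch assumes Qwen2-style QKV biases, tied input/output embeddings, and head_dim = hidden_size / num_heads = 128; check the repository's config.json to confirm these assumptions.

```python
# Rough parameter estimate from the architecture table above (a sketch;
# assumes Qwen2-style QKV biases and tied embeddings -- verify in config.json).
hidden, layers, heads, kv_heads = 1536, 28, 12, 2
inter, vocab = 8960, 151_936
head_dim = hidden // heads                      # 128
kv_dim = kv_heads * head_dim                    # 256 (grouped-query attention)

attn = (hidden * hidden + hidden) \
     + 2 * (hidden * kv_dim + kv_dim) \
     + hidden * hidden                          # q, k, v (with bias), o
mlp = 3 * hidden * inter                        # gate, up, down projections
norms = 2 * hidden                              # two RMSNorm weights per layer

total = vocab * hidden + layers * (attn + mlp + norms) + hidden  # + final norm
print(f"{total / 1e9:.2f}B parameters")         # prints 1.54B parameters
```

The result lands around 1.5B, consistent with the Qwen2.5-1.5B base model.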

Dataset Format

The model was trained on Vietnamese instruction-following data with:

  • System prompts (system_vi): Task instructions
  • Questions (question_vi): User queries
  • Answers (chosen_vi): Expected responses
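The three fields above presumably map onto a standard chat transcript before tokenization. The sketch below shows one plausible mapping; the field names come from the list above, but the exact SFT preprocessing inside dLLM may differ.

```python
# Hedged sketch: mapping the dataset fields listed above onto chat
# messages. The exact preprocessing used during training may differ.
def to_messages(example):
    return [
        {"role": "system", "content": example["system_vi"]},
        {"role": "user", "content": example["question_vi"]},
        {"role": "assistant", "content": example["chosen_vi"]},
    ]

sample = {
    "system_vi": "Bạn là một trợ lý AI hữu ích.",       # "You are a helpful AI assistant."
    "question_vi": "Thủ đô của Việt Nam là gì?",         # "What is the capital of Vietnam?"
    "chosen_vi": "Thủ đô của Việt Nam là Hà Nội.",       # "The capital of Vietnam is Hanoi."
}
print(to_messages(sample))
```

A chat template (such as the tokenizer's `apply_chat_template` shown in Usage below) would then serialize these messages into the model's prompt format.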

Usage

import torch
import dllm
from transformers import AutoTokenizer

model_name = "ChaosAiVision/qwen2.5-1.5b-orca-bd3lm-sft-orca"

# Load model and tokenizer
model_args = type("Args", (), {"model_name_or_path": model_name})()
model = dllm.utils.get_model(model_args=model_args).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Setup BD3LM sampler
sampler = dllm.core.samplers.BD3LMSampler(model=model, tokenizer=tokenizer)
sampler_config = dllm.core.samplers.BD3LMSamplerConfig(
    steps=128,
    max_new_tokens=512,
    temperature=0.0,
    block_size=32
)

# Prepare messages
messages = [
    {"role": "system", "content": "Bạn là một trợ lý AI hữu ích."},  # "You are a helpful AI assistant."
    {"role": "user", "content": "Thủ đô của Việt Nam là gì?"},       # "What is the capital of Vietnam?"
]

# Generate
prompt_ids = tokenizer.apply_chat_template(
    messages, 
    tokenize=True, 
    add_generation_prompt=True, 
    return_tensors="pt"
).cuda()

output = sampler.sample(inputs=[prompt_ids[0]], config=sampler_config)
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)

Training Process

  1. Pretraining: the model was first pretrained on Vietnamese Wikipedia (50K samples, 5,500 steps)
  2. SFT: it was then fine-tuned on the Vietnamese Intel Orca dataset (11,862 samples, 3 epochs)

Limitations

  • The model may generate repetitive text in some cases
  • Output quality depends on the inference parameters (steps, temperature, block_size)
  • Best results are typically obtained with steps >= 128 and an appropriate temperature setting

License

Apache 2.0

Citation

@misc{qwen25-bd3lm-orca-sft,
  title={Qwen2.5-1.5B BD3LM Vietnamese Orca SFT},
  author={ChaosAiVision},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/ChaosAiVision/qwen2.5-1.5b-orca-bd3lm-sft-orca}}
}