Qwen2.5-0.5B-SFT-OpenHermes-2.5-100

This model is a fine-tuned version of Qwen/Qwen2.5-0.5B trained on the teknium/OpenHermes-2.5 dataset using SFT with LoRA adapters.

Overview

Qwen2.5-0.5B-SFT-OpenHermes-2.5-100 is a language model optimized using Supervised Fine-Tuning (SFT), which trains the model to follow instructions by learning from high-quality demonstration data.

Key Features

  • High-Quality Fine-Tuning: Trained on carefully curated examples from OpenHermes-2.5 (sample count: N/A)
  • Efficient Training: Uses LoRA (Low-Rank Adaptation) with 4-bit quantization
  • Strong Performance: Token accuracy on the evaluation set: N/A
  • Optimized for Inference: Available in multiple formats including GGUF quantizations

Model Details

| Property | Value |
|---|---|
| Developed by | ermiaazarkhalili |
| License | CC-BY-NC-4.0 |
| Language | English |
| Base Model | Qwen/Qwen2.5-0.5B |
| Model Size | 0.5B parameters |
| Tensor Type | BF16 |
| Context Length | 2,048 tokens |
| Training Method | SFT with LoRA |

Training Information

Training Configuration

| Parameter | Value |
|---|---|
| Learning Rate | 0.0002 |
| Batch Size | 2 per device |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 16 |
| Number of Epochs | 1 |
| Max Sequence Length | 2,048 tokens |
| LR Scheduler | Linear warmup + cosine annealing |
| Warmup Ratio | 0.1 |
| Precision | BF16 mixed precision |
| Gradient Checkpointing | Enabled |
| Random Seed | 42 |
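The schedule above (linear warmup over the first 10% of steps, then cosine annealing) can be sketched in plain Python. The `total_steps` value and the decay-to-zero floor below are illustrative assumptions, not values recorded from the training run:

```python
import math

def lr_at_step(step, total_steps, base_lr=2e-4, warmup_ratio=0.1):
    """Linear warmup followed by cosine annealing toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from ~0 up to base_lr over the warmup phase
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

# Effective batch size: 2 per device × 8 gradient accumulation steps = 16
effective_batch = 2 * 8
```

Note that the effective batch size of 16 in the table follows directly from the per-device batch size and the gradient accumulation steps on a single device.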

LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA Rank (r) | 32 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Quantization | 4-bit NF4 |
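The adapter settings above can be expressed as `peft` and `bitsandbytes` configuration objects. This is a sketch of how such a setup is typically written, not the exact training script used for this model:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization, matching the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter settings from the table above
lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```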

Training Metrics

| Property | Value |
|---|---|
| Hardware | NVIDIA H100 MIG |

Dataset

This model was trained on the teknium/OpenHermes-2.5 dataset.

| Split | Samples |
|---|---|
| Training | N/A |
| Evaluation | N/A |

Usage

Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the sum of 2 + 2?"},
]

# Format the conversation with the model's chat template
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

Using Pipeline

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100",
    device_map="auto",
)
messages = [{"role": "user", "content": "Explain the concept of machine learning."}]
output = generator(messages, max_new_tokens=256, return_full_text=False)
print(output[0]["generated_text"])
```

4-bit Quantized Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

model_id = "ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100"

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",
)
```

GGUF Versions

For CPU or mixed CPU/GPU inference, GGUF quantized versions are available at: ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100-GGUF

Using with Ollama

```shell
ollama pull hf.co/ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100-GGUF:Q4_K_M
ollama run hf.co/ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100-GGUF:Q4_K_M "Hello!"
```

Limitations

  • Language: Primarily trained on English data
  • Knowledge Cutoff: Limited to base model's training data cutoff
  • Hallucinations: May generate plausible-sounding but incorrect information
  • Context Length: Fine-tuned with 2,048 token limit
  • Safety: Not extensively safety-tuned; use with appropriate guardrails

Intended Use

Recommended Uses

  • Research on language model fine-tuning
  • Educational purposes
  • Personal projects
  • Prototyping conversational AI

Out-of-Scope Uses

  • Production systems without additional safety measures
  • Medical, legal, or financial advice
  • Generating harmful or misleading content

Training Framework

  • TRL: 0.26.2
  • Transformers: 4.57.3
  • PyTorch: 2.7.1
  • Datasets: 4.4.2
  • PEFT: 0.18.0
  • BitsAndBytes: 0.49.0

Citation

```bibtex
@misc{ermiaazarkhalili_qwen2.5_0.5b_sft_openhermes_2.5_100,
    author = {ermiaazarkhalili},
    title = {Qwen2.5-0.5B-SFT-OpenHermes-2.5-100: Fine-tuned Qwen2.5-0.5B on OpenHermes-2.5},
    year = {2025},
    publisher = {Hugging Face},
    howpublished = {\url{https://huggingface.co/ermiaazarkhalili/Qwen2.5-0.5B-SFT-OpenHermes-2.5-100}}
}
```

Acknowledgments

  • Base model developers at Qwen
  • Hugging Face TRL Team for the training library
  • Dataset creators and contributors
  • Compute Canada / DRAC for HPC resources

Contact

For questions or issues, please open an issue on the model repository.
