	---
	base_model:
	- Qwen/Qwen3-4B

	tags:
	- distillation
	- distilled
	- sft
	- peft
	- qwen3

	datasets:
	- ianncity/KIMI-K2.5-550000x
	- Jackrong/Qwen3.5-reasoning-700x
	- nohurry/Opus-4.6-Reasoning-3000x-filtered
	- TeichAI/claude-4.5-opus-high-reasoning-250x
	- TeichAI/gemini-3-pro-preview-high-reasoning-250x
	- TeichAI/claude-haiku-4.5-high-reasoning-1700x
	- TeichAI/gpt-5.2-high-reasoning-250x
	- Roman1111111/gemini-3.1-pro-hard-high-reasoning
	- Jackrong/glm-4.7-multiturn-CoT
	- bmeyer2025/glm5-reasoning-traces
	- TeichAI/claude-sonnet-4.5-high-reasoning-250x
	- TeichAI/deepseek-v3.2-speciale-openr1-math-3k
	- TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k
	- TeichAI/deepseek-v3.2-speciale-1000x
	- TeichAI/gpt-5-codex-1000x

	model-index:
	- name: hadadxyz/Qwen3-4B-Diversity
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 67.8
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Humanities
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 57.9
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Formal Logic
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 58.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School European History
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 78.2
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Us History
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 84.8
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School World History
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 83.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu International Law
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 77.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Jurisprudence
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 78.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Logical Fallacies
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 82.8
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Moral Disputes
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Moral Scenarios
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 28.4
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Philosophy
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 73.3
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Prehistory
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 76.2
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Professional Law
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 47.4
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu World Religions
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 78.4
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Other
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 72.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Business Ethics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 73.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Clinical Knowledge
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 75.5
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Medicine
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Global Facts
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 41.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Human Aging
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 67.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Management
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 84.5
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Marketing
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 85.5
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Medical Genetics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 75.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Miscellaneous
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 79.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Nutrition
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 74.8
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Professional Accounting
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 55.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Professional Medicine
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Virology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 53.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Social Sciences
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 78.4
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Econometrics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 64.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Geography
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 84.3
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Government And Politics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 87.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Macroeconomics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 74.6
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Microeconomics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 80.7
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Psychology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 87.2
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Human Sexuality
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 75.6
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Professional Psychology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.2
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Public Relations
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.8
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Security Studies
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 74.3
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Sociology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 84.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Us Foreign Policy
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 81.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Stem
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 68.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Abstract Algebra
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 45.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Anatomy
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 61.5
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Astronomy
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 78.9
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Biology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 83.3
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Chemistry
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 54.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Computer Science
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 69.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Mathematics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 58.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu College Physics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 53.9
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Computer Security
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 80.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Conceptual Physics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 77.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Electrical Engineering
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 76.6
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Elementary Mathematics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 65.6
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Biology
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 86.1
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Chemistry
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 70.4
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Computer Science
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 86.0
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Mathematics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 42.6
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Physics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 62.9
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu High School Statistics
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 71.3
	      name: accuracy
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Mmlu Machine Learning
	      type: cais/mmlu
	    metrics:
	    - type: acc
	      value: 57.1
	      name: accuracy

	pipeline_tag: text-generation

	library_name: transformers

	license: apache-2.0

	license_link: https://huggingface.co/hadadxyz/Qwen3-4B-Diversity/blob/main/LICENSE
	---

	# Introduction

	![MMLU](evaluations/mmlu.png)

	Qwen3-4B-Diversity is a fine-tuned language model based on [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B) that has been trained on a diverse collection of high-quality reasoning datasets. This model combines knowledge distilled from various state-of-the-art AI systems to provide enhanced reasoning capabilities across multiple domains including mathematics, coding, general problem-solving, and multi-turn conversations.

	### Training Configuration

	The model was trained with supervised fine-tuning (SFT) using parameter-efficient fine-tuning (PEFT) methods to keep compute requirements low. Key training parameters:

	| Parameter        | Value  |
	|------------------|--------|
	| Number of Epochs | 2      |
	| Context Length   | 40,960 |

	### Hardware and Resources

	| Resource          | Specification          |
	|-------------------|------------------------|
	| GPU               | A100-80GB              |
	| Training Duration | Approximately 17 hours |
	| Estimated Cost    | $27 to $30             |
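
As a rough sanity check, the duration and cost figures in the table imply an hourly GPU rate. The sketch below assumes a single A100-80GB billed for the full 17 hours; the GPU count is not stated in the table, so treat the result as an estimate only.

```python
# Figures copied from the Hardware and Resources table.
TRAINING_HOURS = 17          # "Approximately 17 hours"
COST_RANGE_USD = (27, 30)    # "$27 to $30"


def implied_hourly_rate(cost_usd: float, hours: float) -> float:
    """Return the cost per GPU-hour, assuming one GPU billed for every hour."""
    return cost_usd / hours


low, high = (implied_hourly_rate(cost, TRAINING_HOURS) for cost in COST_RANGE_USD)
print(f"implied rate: ${low:.2f} to ${high:.2f} per GPU-hour")
```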

	### Training Data

	| Dataset | Rows Used | Source Model |
	|---------|-----------|--------------|
	| [ianncity/KIMI-K2.5-550000x](https://huggingface.co/datasets/ianncity/KIMI-K2.5-550000x) (General-Distillation) | 1,000 | Kimi K2.5 |
	| [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x) | 633 | Qwen3.5 |
	| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | 2,326 | Claude Opus 4.6 |
	| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x) | 250 | Claude Opus 4.5 |
	| [TeichAI/gemini-3-pro-preview-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/gemini-3-pro-preview-high-reasoning-250x) | 248 | Gemini 3 Pro |
	| [TeichAI/claude-haiku-4.5-high-reasoning-1700x](https://huggingface.co/datasets/TeichAI/claude-haiku-4.5-high-reasoning-1700x) | 1,688 | Claude Haiku 4.5 |
	| [TeichAI/gpt-5.2-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/gpt-5.2-high-reasoning-250x) | 249 | GPT-5.2 |
	| [Roman1111111/gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning) | 3,150 | Gemini 3.1 Pro |
	| [Jackrong/glm-4.7-multiturn-CoT](https://huggingface.co/datasets/Jackrong/glm-4.7-multiturn-CoT) | 5,090 | GLM-4.7 |
	| [bmeyer2025/glm5-reasoning-traces](https://huggingface.co/datasets/bmeyer2025/glm5-reasoning-traces) | 1,744 | GLM-5 |
	| [TeichAI/claude-sonnet-4.5-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-sonnet-4.5-high-reasoning-250x) | 247 | Claude Sonnet 4.5 |
	| [TeichAI/deepseek-v3.2-speciale-openr1-math-3k](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-openr1-math-3k) | 3,317 | DeepSeek V3.2-Speciale |
	| [TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k) | 2,953 | DeepSeek V3.2-Speciale |
	| [TeichAI/deepseek-v3.2-speciale-1000x](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-1000x) | 991 | DeepSeek V3.2-Speciale |
	| [TeichAI/gpt-5-codex-1000x](https://huggingface.co/datasets/TeichAI/gpt-5-codex-1000x) | 991 | GPT-5 Codex |
	| Total | 24,877 | Combined diverse reasoning dataset |
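
To see how the mixture is weighted, the row counts from the table can be tallied directly. This is a small illustrative sketch, not part of the training pipeline; the helper name `mixture_shares` is hypothetical.

```python
# Rows used per dataset, copied from the Training Data table.
ROWS_USED = {
    "ianncity/KIMI-K2.5-550000x": 1000,
    "Jackrong/Qwen3.5-reasoning-700x": 633,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "TeichAI/gemini-3-pro-preview-high-reasoning-250x": 248,
    "TeichAI/claude-haiku-4.5-high-reasoning-1700x": 1688,
    "TeichAI/gpt-5.2-high-reasoning-250x": 249,
    "Roman1111111/gemini-3.1-pro-hard-high-reasoning": 3150,
    "Jackrong/glm-4.7-multiturn-CoT": 5090,
    "bmeyer2025/glm5-reasoning-traces": 1744,
    "TeichAI/claude-sonnet-4.5-high-reasoning-250x": 247,
    "TeichAI/deepseek-v3.2-speciale-openr1-math-3k": 3317,
    "TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k": 2953,
    "TeichAI/deepseek-v3.2-speciale-1000x": 991,
    "TeichAI/gpt-5-codex-1000x": 991,
}


def mixture_shares(rows: dict) -> dict:
    """Return each dataset's fraction of the combined training rows."""
    total = sum(rows.values())
    return {name: count / total for name, count in rows.items()}


total_rows = sum(ROWS_USED.values())
shares = mixture_shares(ROWS_USED)

print(f"total rows: {total_rows}")
# Show the three largest contributors to the mixture.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {share:.1%}")
```

The per-dataset counts sum to the 24,877 rows reported in the Total row, with the multi-turn GLM-4.7 set as the largest single contributor.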

	## Model Capabilities

	The training mix targets several key areas:

	1. Advanced Reasoning: The model can break down complex problems into steps and provide detailed reasoning processes.

	2. Mathematical Problem Solving: Enhanced capabilities for mathematical reasoning and problem-solving through dedicated math-focused datasets.

	3. Code Generation and Understanding: Improved coding abilities from multiple code-reasoning datasets including DeepSeek and GPT-5 Codex data.

	4. Multi-Turn Conversations: Better handling of extended dialogues and context-aware responses.

	5. Domain Versatility: Exposure to reasoning patterns from various AI systems provides flexibility across different domains and task types.

	## Usage

	### Quick Demo

	For a free quick demo, use this [Google Colab](https://colab.research.google.com/drive/1qy1n9MigDuwT0cA1Y6ImHChAIlsZPIcC) notebook.

	### Ollama (Local)

	```bash
	# https://ollama.com/hadad/qwen3-4bd

	# hadad/qwen3-4bd:Q8_0 | 4.3GB
	# hadad/qwen3-4bd:BF16 | 8.1GB

	# ollama pull hadad/qwen3-4bd:Q8_0

	ollama run hadad/qwen3-4bd:Q8_0
	```

	If you use Ollama and need tools or function calling, use the OpenAI-compatible API that Ollama provides, which accepts the standard `tools` field in chat requests.

	Refer to the [Ollama documentation](https://docs.ollama.com/api/openai-compatibility).
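
A minimal sketch of such a request, using only the Python standard library. It assumes a default local Ollama install listening on port 11434 and that the `hadad/qwen3-4bd:Q8_0` tag has been pulled; the `get_weather` tool and the helper names are hypothetical and exist only for illustration. Whether the model actually emits tool calls depends on the chat template shipped with the Ollama build.

```python
import json
import urllib.request

# Assumption: default local Ollama endpoint for the OpenAI-compatible API.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "hadad/qwen3-4bd:Q8_0"


def build_tool_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request that exposes one hypothetical tool."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical tool, for illustration
                    "description": "Get the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


def send(payload: dict, url: str = OLLAMA_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


payload = build_tool_request("What is the weather in Jakarta?")
print(json.dumps(payload, indent=2))

# Uncomment once the Ollama server is running and the model is pulled:
# reply = send(payload)
# print(reply["choices"][0]["message"].get("tool_calls"))
```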

	### Python (Local)

	```bash
	#pip install transformers==4.56.2
	```

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_name = "hadadxyz/Qwen3-4B-Diversity"

	# Load the tokenizer and the model.
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype="auto",
	    device_map="auto",
	)

	# Prepare the model input.
	prompt = "Give me a short introduction to large language models."
	messages = [
	    {"role": "user", "content": prompt},
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	    enable_thinking=True,  # Switches between thinking and non-thinking modes. Default is True.
	)
	model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

	# Generate the completion and keep only the newly generated tokens.
	generated_ids = model.generate(
	    **model_inputs,
	    max_new_tokens=32768,
	)
	output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

	# Split the thinking content from the final answer.
	try:
	    # Find the position just after the last occurrence of token 151668 (</think>).
	    index = len(output_ids) - output_ids[::-1].index(151668)
	except ValueError:
	    # No </think> token was generated, so everything is treated as the answer.
	    index = 0

	thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
	content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

	print("thinking content:", thinking_content)
	print("content:", content)
	```
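
The reverse-index step used to locate `</think>` can be tried in isolation on a toy token list, without loading the model. The helper name `split_at_think_end` is hypothetical; the token id 151668 is the one used in the snippet above.

```python
THINK_END_ID = 151668  # id of the </think> token, as used in the snippet above


def split_at_think_end(output_ids, end_id=THINK_END_ID):
    """Split token ids into (thinking part, answer part) at the last end marker."""
    try:
        # Searching the reversed list finds the LAST marker; converting the
        # position back gives the index just past it in the original list.
        index = len(output_ids) - output_ids[::-1].index(end_id)
    except ValueError:
        # No marker present: nothing counts as thinking content.
        index = 0
    return output_ids[:index], output_ids[index:]


thinking_ids, answer_ids = split_at_think_end([11, 12, THINK_END_ID, 21, 22])
print(thinking_ids, answer_ids)

no_thinking, answer_only = split_at_think_end([21, 22])
print(no_thinking, answer_only)
```

Note that the thinking slice includes the marker itself; in the full snippet it disappears on decoding because `skip_special_tokens=True` is set.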

	## Inference Parameters

	For optimal results, we recommend the following generation parameters:

	### Thinking

	| Parameter   | Recommended Value | Description                       |
	|-------------|-------------------|-----------------------------------|
	| temperature | 0.6               | Controls randomness in generation |
	| top_p       | 0.95              | Nucleus sampling threshold        |
	| top_k       | 20                | Top-k sampling parameter          |
	| min_p       | 0                 | Minimum probability threshold     |

	### Non-Thinking

	| Parameter   | Recommended Value | Description                       |
	|-------------|-------------------|-----------------------------------|
	| temperature | 0.7               | Controls randomness in generation |
	| top_p       | 0.8               | Nucleus sampling threshold        |
	| top_k       | 20                | Top-k sampling parameter          |
	| min_p       | 0                 | Minimum probability threshold     |
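
One way to keep the two presets together is a small helper that returns keyword arguments for `model.generate`. This is a sketch: the helper name `sampling_kwargs` is hypothetical, and `do_sample=True` is an added assumption, since Transformers only applies sampling parameters when sampling is enabled.

```python
def sampling_kwargs(enable_thinking: bool) -> dict:
    """Return the recommended sampling parameters for the chosen mode."""
    # Values shared by both modes in the tables above.
    common = {
        "do_sample": True,  # assumption: required for the values below to take effect
        "top_k": 20,
        "min_p": 0.0,
    }
    if enable_thinking:
        return {**common, "temperature": 0.6, "top_p": 0.95}
    return {**common, "temperature": 0.7, "top_p": 0.8}


thinking_params = sampling_kwargs(enable_thinking=True)
non_thinking_params = sampling_kwargs(enable_thinking=False)
print(thinking_params)
print(non_thinking_params)

# Example use with the Python snippet from the Usage section:
# generated_ids = model.generate(**model_inputs, max_new_tokens=32768, **thinking_params)
```

Pass the same `enable_thinking` value to both `apply_chat_template` and this helper so the prompt format and sampling settings stay in sync.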

	## Citation

	If you use this model in your research or applications, please cite both this model and the base model:

	```bibtex
	@misc{qwen3-4b-diversity,
	author = {hadadxyz},
	title = {Qwen3-4B-Diversity},
	year = {2026},
	url = {https://huggingface.co/hadadxyz/Qwen3-4B-Diversity}
	}
	```

	## Acknowledgments

	This model was made possible through the combination of multiple high-quality datasets from the community. We acknowledge and thank all dataset creators and the Qwen team for providing the excellent base model.