---
base_model:
  - Qwen/Qwen3-4B

tags:
  - distillation
  - distilled
  - sft
  - peft
  - qwen3

datasets:
  - ianncity/KIMI-K2.5-550000x
  - Jackrong/Qwen3.5-reasoning-700x
  - nohurry/Opus-4.6-Reasoning-3000x-filtered
  - TeichAI/claude-4.5-opus-high-reasoning-250x
  - TeichAI/gemini-3-pro-preview-high-reasoning-250x
  - TeichAI/claude-haiku-4.5-high-reasoning-1700x
  - TeichAI/gpt-5.2-high-reasoning-250x
  - Roman1111111/gemini-3.1-pro-hard-high-reasoning
  - Jackrong/glm-4.7-multiturn-CoT
  - bmeyer2025/glm5-reasoning-traces
  - TeichAI/claude-sonnet-4.5-high-reasoning-250x
  - TeichAI/deepseek-v3.2-speciale-openr1-math-3k
  - TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k
  - TeichAI/deepseek-v3.2-speciale-1000x
  - TeichAI/gpt-5-codex-1000x

model-index:
  - name: hadadxyz/Qwen3-4B-Diversity
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu
          type: cais/mmlu
        metrics:
          - type: acc
            value: 67.8
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Humanities
          type: cais/mmlu
        metrics:
          - type: acc
            value: 57.9
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Formal Logic
          type: cais/mmlu
        metrics:
          - type: acc
            value: 58.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School European History
          type: cais/mmlu
        metrics:
          - type: acc
            value: 78.2
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Us History
          type: cais/mmlu
        metrics:
          - type: acc
            value: 84.8
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School World History
          type: cais/mmlu
        metrics:
          - type: acc
            value: 83.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu International Law
          type: cais/mmlu
        metrics:
          - type: acc
            value: 77.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Jurisprudence
          type: cais/mmlu
        metrics:
          - type: acc
            value: 78.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Logical Fallacies
          type: cais/mmlu
        metrics:
          - type: acc
            value: 82.8
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Moral Disputes
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Moral Scenarios
          type: cais/mmlu
        metrics:
          - type: acc
            value: 28.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Philosophy
          type: cais/mmlu
        metrics:
          - type: acc
            value: 73.3
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Prehistory
          type: cais/mmlu
        metrics:
          - type: acc
            value: 76.2
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Professional Law
          type: cais/mmlu
        metrics:
          - type: acc
            value: 47.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu World Religions
          type: cais/mmlu
        metrics:
          - type: acc
            value: 78.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Other
          type: cais/mmlu
        metrics:
          - type: acc
            value: 72.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Business Ethics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 73.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Clinical Knowledge
          type: cais/mmlu
        metrics:
          - type: acc
            value: 75.5
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Medicine
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Global Facts
          type: cais/mmlu
        metrics:
          - type: acc
            value: 41.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Human Aging
          type: cais/mmlu
        metrics:
          - type: acc
            value: 67.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Management
          type: cais/mmlu
        metrics:
          - type: acc
            value: 84.5
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Marketing
          type: cais/mmlu
        metrics:
          - type: acc
            value: 85.5
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Medical Genetics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 75.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Miscellaneous
          type: cais/mmlu
        metrics:
          - type: acc
            value: 79.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Nutrition
          type: cais/mmlu
        metrics:
          - type: acc
            value: 74.8
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Professional Accounting
          type: cais/mmlu
        metrics:
          - type: acc
            value: 55.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Professional Medicine
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Virology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 53.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Social Sciences
          type: cais/mmlu
        metrics:
          - type: acc
            value: 78.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Econometrics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 64.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Geography
          type: cais/mmlu
        metrics:
          - type: acc
            value: 84.3
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Government And Politics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 87.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Macroeconomics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 74.6
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Microeconomics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 80.7
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Psychology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 87.2
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Human Sexuality
          type: cais/mmlu
        metrics:
          - type: acc
            value: 75.6
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Professional Psychology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.2
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Public Relations
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.8
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Security Studies
          type: cais/mmlu
        metrics:
          - type: acc
            value: 74.3
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Sociology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 84.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Us Foreign Policy
          type: cais/mmlu
        metrics:
          - type: acc
            value: 81.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Stem
          type: cais/mmlu
        metrics:
          - type: acc
            value: 68.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Abstract Algebra
          type: cais/mmlu
        metrics:
          - type: acc
            value: 45.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Anatomy
          type: cais/mmlu
        metrics:
          - type: acc
            value: 61.5
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Astronomy
          type: cais/mmlu
        metrics:
          - type: acc
            value: 78.9
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Biology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 83.3
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Chemistry
          type: cais/mmlu
        metrics:
          - type: acc
            value: 54.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Computer Science
          type: cais/mmlu
        metrics:
          - type: acc
            value: 69.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Mathematics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 58.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu College Physics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 53.9
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Computer Security
          type: cais/mmlu
        metrics:
          - type: acc
            value: 80.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Conceptual Physics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 77.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Electrical Engineering
          type: cais/mmlu
        metrics:
          - type: acc
            value: 76.6
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Elementary Mathematics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 65.6
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Biology
          type: cais/mmlu
        metrics:
          - type: acc
            value: 86.1
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Chemistry
          type: cais/mmlu
        metrics:
          - type: acc
            value: 70.4
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Computer Science
          type: cais/mmlu
        metrics:
          - type: acc
            value: 86.0
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Mathematics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 42.6
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Physics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 62.9
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu High School Statistics
          type: cais/mmlu
        metrics:
          - type: acc
            value: 71.3
            name: accuracy
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Mmlu Machine Learning
          type: cais/mmlu
        metrics:
          - type: acc
            value: 57.1
            name: accuracy

pipeline_tag: text-generation

library_name: transformers

license: apache-2.0

license_link: https://huggingface.co/hadadxyz/Qwen3-4B-Diversity/blob/main/LICENSE
---

# Introduction

![MMLU](evaluations/mmlu.png)

Qwen3-4B-Diversity is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), trained on a diverse collection of high-quality reasoning datasets. The training data is distilled from a range of state-of-the-art AI systems and is intended to strengthen reasoning across mathematics, coding, general problem-solving, and multi-turn conversations.

### Training Configuration

The model was trained with supervised fine-tuning (SFT) using parameter-efficient fine-tuning (PEFT) methods to keep compute requirements low. Key training parameters include:

| Parameter        | Value  |
|------------------|--------|
| Number of Epochs | 2      |
| Context Length   | 40,960 |

### Hardware and Resources

| Resource          | Specification          |
|-------------------|------------------------|
| GPU               | A100-80GB              |
| Training Duration | Approximately 17 hours |
| Estimated Cost    | $27 to $30             |

### Training Data

| Dataset                                                                                                                                    | Rows Used  | Model                              |
|--------------------------------------------------------------------------------------------------------------------------------------------|------------|------------------------------------|
| [ianncity/KIMI-K2.5-550000x](https://huggingface.co/datasets/ianncity/KIMI-K2.5-550000x) (General-Distillation)                            | 1,000      | Kimi K2.5                          |
| [Jackrong/Qwen3.5-reasoning-700x](https://huggingface.co/datasets/Jackrong/Qwen3.5-reasoning-700x)                                         | 633        | Qwen3.5                            |
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered)                     | 2,326      | Claude Opus 4.6                    |
| [TeichAI/claude-4.5-opus-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-4.5-opus-high-reasoning-250x)                 | 250        | Claude Opus 4.5                    |
| [TeichAI/gemini-3-pro-preview-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/gemini-3-pro-preview-high-reasoning-250x)       | 248        | Gemini 3 Pro                       |
| [TeichAI/claude-haiku-4.5-high-reasoning-1700x](https://huggingface.co/datasets/TeichAI/claude-haiku-4.5-high-reasoning-1700x)             | 1,688      | Claude Haiku 4.5                   |
| [TeichAI/gpt-5.2-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/gpt-5.2-high-reasoning-250x)                                 | 249        | GPT-5.2                            |
| [Roman1111111/gemini-3.1-pro-hard-high-reasoning](https://huggingface.co/datasets/Roman1111111/gemini-3.1-pro-hard-high-reasoning)         | 3,150      | Gemini 3.1 Pro                     |
| [Jackrong/glm-4.7-multiturn-CoT](https://huggingface.co/datasets/Jackrong/glm-4.7-multiturn-CoT)                                           | 5,090      | GLM-4.7                            |
| [bmeyer2025/glm5-reasoning-traces](https://huggingface.co/datasets/bmeyer2025/glm5-reasoning-traces)                                       | 1,744      | GLM-5                              |
| [TeichAI/claude-sonnet-4.5-high-reasoning-250x](https://huggingface.co/datasets/TeichAI/claude-sonnet-4.5-high-reasoning-250x)             | 247        | Claude Sonnet 4.5                  |
| [TeichAI/deepseek-v3.2-speciale-openr1-math-3k](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-openr1-math-3k)             | 3,317      | DeepSeek V3.2-Speciale             |
| [TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k) | 2,953      | DeepSeek V3.2-Speciale             |
| [TeichAI/deepseek-v3.2-speciale-1000x](https://huggingface.co/datasets/TeichAI/deepseek-v3.2-speciale-1000x)                               | 991        | DeepSeek V3.2-Speciale             |
| [TeichAI/gpt-5-codex-1000x](https://huggingface.co/datasets/TeichAI/gpt-5-codex-1000x)                                                     | 991        | GPT-5 Codex                        |
| **Total**                                                                                                                                  | **24,877** | Combined diverse reasoning dataset |
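
As a quick sanity check on the table above, the following sketch recomputes the total and each source's share of the training mix. The row counts are copied from the table; the variable names are illustrative only and are not part of any training script.

```python
# Row counts copied from the "Rows Used" column of the table above.
ROWS_USED = {
    "ianncity/KIMI-K2.5-550000x": 1000,
    "Jackrong/Qwen3.5-reasoning-700x": 633,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "TeichAI/gemini-3-pro-preview-high-reasoning-250x": 248,
    "TeichAI/claude-haiku-4.5-high-reasoning-1700x": 1688,
    "TeichAI/gpt-5.2-high-reasoning-250x": 249,
    "Roman1111111/gemini-3.1-pro-hard-high-reasoning": 3150,
    "Jackrong/glm-4.7-multiturn-CoT": 5090,
    "bmeyer2025/glm5-reasoning-traces": 1744,
    "TeichAI/claude-sonnet-4.5-high-reasoning-250x": 247,
    "TeichAI/deepseek-v3.2-speciale-openr1-math-3k": 3317,
    "TeichAI/deepseek-v3.2-speciale-OpenCodeReasoning-3k": 2953,
    "TeichAI/deepseek-v3.2-speciale-1000x": 991,
    "TeichAI/gpt-5-codex-1000x": 991,
}

# Total number of training rows across all sources.
total_rows = sum(ROWS_USED.values())

# Fraction of the training mix contributed by each dataset.
shares = {name: rows / total_rows for name, rows in ROWS_USED.items()}

# Print the datasets from largest to smallest contribution.
for name, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {share:.1%}")
print("total rows:", total_rows)
```

The sum matches the 24,877 rows reported in the table, and the largest single contributor is the multi-turn GLM-4.7 dataset.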

## Model Capabilities

This model excels in several key areas:

1. **Advanced Reasoning**: The model can break down complex problems into steps and provide detailed reasoning processes.

2. **Mathematical Problem Solving**: Enhanced capabilities for mathematical reasoning and problem-solving through dedicated math-focused datasets.

3. **Code Generation and Understanding**: Improved coding abilities from multiple code-reasoning datasets including DeepSeek and GPT-5 Codex data.

4. **Multi-Turn Conversations**: Better handling of extended dialogues and context-aware responses.

5. **Domain Versatility**: Exposure to reasoning patterns from various AI systems provides flexibility across different domains and task types.

## Usage

### Quick Demo

For a quick, free demo, you can use [Google Colab](https://colab.research.google.com/drive/1qy1n9MigDuwT0cA1Y6ImHChAIlsZPIcC).

### Ollama (Local)

```bash
# https://ollama.com/hadad/qwen3-4bd

# hadad/qwen3-4bd:Q8_0  |  4.3GB
# hadad/qwen3-4bd:BF16  |  8.1GB

# ollama pull hadad/qwen3-4bd:Q8_0

ollama run hadad/qwen3-4bd:Q8_0
```

If you are using Ollama and need **tools** or **function calling**, it is recommended to use the **OpenAI-compatible API** provided by Ollama.

Refer to the [Ollama documentation](https://docs.ollama.com/api/openai-compatibility).
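
A minimal sketch of such a request, using only the Python standard library, is shown below. It assumes Ollama is running locally on its default port (`11434`) and that the `hadad/qwen3-4bd:Q8_0` tag has already been pulled. The `get_weather` tool and the helper names `build_chat_request` and `send_chat_request` are hypothetical and exist only for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_CHAT_URL = "http://localhost:11434/v1/chat/completions"

# Hypothetical tool definition in the OpenAI function-calling format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_chat_request(prompt, model="hadad/qwen3-4bd:Q8_0", tools=None):
    """Build an OpenAI-style chat completion payload."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if tools:
        payload["tools"] = tools
    return payload


def send_chat_request(payload, url=OLLAMA_CHAT_URL):
    """POST the payload to the server and return the first response message."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]


# Build (but do not send) a request that offers the hypothetical tool.
payload = build_chat_request(
    "What is the weather in Jakarta?", tools=[GET_WEATHER_TOOL]
)
```

With the server running, `send_chat_request(payload)` returns the assistant message. If the model decides to call the tool, that message should carry a `tool_calls` entry rather than plain text content, and whether it does so depends on the model and the prompt.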

### Python (Local)

```bash
# pip install transformers==4.56.2
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "hadadxyz/Qwen3-4B-Diversity"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
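output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# split the thinking content from the final answer
try:
    # position just after the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> token found: treat the whole output as the final answer
    index = 0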

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```

## Inference Parameters

For optimal results, we recommend the following generation parameters:

### Thinking

| Parameter       | Recommended Value | Description                              |
|-----------------|-------------------|------------------------------------------|
| temperature     | 0.6               | Controls randomness in generation        |
| top_p           | 0.95              | Nucleus sampling threshold               |
| top_k           | 20                | Top-k sampling parameter                 |
| min_p           | 0                 | Minimum probability threshold            |

### Non-Thinking

| Parameter       | Recommended Value | Description                              |
|-----------------|-------------------|------------------------------------------|
| temperature     | 0.7               | Controls randomness in generation        |
| top_p           | 0.8               | Nucleus sampling threshold               |
| top_k           | 20                | Top-k sampling parameter                 |
| min_p           | 0                 | Minimum probability threshold            |
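
The two tables above can be collected into a small helper that returns keyword arguments for `model.generate`. This is a sketch rather than part of the released code: the name `generation_kwargs` is hypothetical, and it assumes sampling is enabled with `do_sample=True` so that the temperature and top-p/top-k/min-p settings take effect.

```python
# Recommended sampling settings, copied from the tables above.
RECOMMENDED_SAMPLING = {
    "thinking": {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0.0},
    "non_thinking": {"temperature": 0.7, "top_p": 0.8, "top_k": 20, "min_p": 0.0},
}


def generation_kwargs(enable_thinking=True, max_new_tokens=32768):
    """Return generate() keyword arguments for the chosen mode."""
    mode = "thinking" if enable_thinking else "non_thinking"
    return {
        "do_sample": True,  # assumption: sampling must be on for these to apply
        "max_new_tokens": max_new_tokens,
        **RECOMMENDED_SAMPLING[mode],
    }


thinking_kwargs = generation_kwargs(enable_thinking=True)
non_thinking_kwargs = generation_kwargs(enable_thinking=False)
```

In the Python example from the Usage section, these would be passed as `model.generate(**model_inputs, **generation_kwargs(enable_thinking=True))`, with the same `enable_thinking` value given to `apply_chat_template`.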

## Citation

If you use this model in your research or applications, please cite it as shown below, and also cite the base model, [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B):

```bibtex
@misc{qwen3-4b-diversity,
  author = {hadadxyz},
  title  = {Qwen3-4B-Diversity},
  year   = {2026},
  url    = {https://huggingface.co/hadadxyz/Qwen3-4B-Diversity}
}
```

## Acknowledgments

This model was made possible by multiple high-quality datasets from the community. We thank all dataset creators, and the Qwen team for providing the base model.