---
license: other
license_name: ztech-license
license_link: https://huggingface.co/ZirTech/OmniMath-2B/resolve/main/LICENSE
language:
- en
pipeline_tag: text-generation
---

# 🧮 OmniMath-2B
OmniMath-2B is a compact yet capable mathematical reasoning model, fine‑tuned on top of **Qwen3.5‑2B**'s hybrid architecture (Gated Delta Networks interleaved with standard attention). Trained on **10,000** carefully selected math problems from five diverse open‑source datasets, it excels at step‑by‑step solutions, arithmetic word problems, geometry reasoning, and error recovery.
Despite its small size, OmniMath-2B demonstrates strong chain‑of‑thought performance and is ideally suited for resource‑constrained environments, edge deployment, and fast prototyping.
---
## ✨ Key Features
- **Efficient 2B Scale**: Only 2 billion parameters – runs smoothly on a single T4 GPU, or even on CPU with quantization.
- **Multi‑Source Math Training**: Balanced mix of real‑world problems (`orca‑math`, `GSM8K`), synthetic reasoning (`MetaMathQA`), geometry (`Geo‑Thought`), and multi‑modal math (`DeepVision` text subset).
- **Step‑by‑Step Reasoning**: Trained with explicit chain‑of‑thought prompts that spell out intermediate reasoning steps.
- **Hybrid Architecture**: Inherits Qwen3.5's Gated Delta Networks for efficient long‑context processing.
---
## 📊 Benchmarks
*Preliminary results (evaluation ongoing).*
| Model | Size (params) | GSM8K Accuracy |
|-------|---------------|----------------|
| **OmniMath-2B (0-shot CoT)** | **2B** | **63.76%** |
| gemma-3-1b-it | 1.1B | 62.8% |
| dolphin-2_6-phi-2 | 2.7B | 58.07% |
| Qwen2.5-Math-1.5B | 1.5B | 54.0% |
| MobileLLM-R1.5 950M | 0.95B | 52.8% |
| Phi-2 (0-shot CoT) | 2.7B | 50.0% |
| Qwen2.5-0.5B-Instruct | 0.5B | 49.6% |
| Gemma 2 2B IT | 2B | 23.9% |
*Updates coming soon.*
---
## 🚀 Quickstart
### 🤗 Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ZirTech/OmniMath-2B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
    {"role": "user", "content": "A store sells apples for $2 each. If you buy 5 apples, how much do you pay?"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(outputs[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
---
## ⚡ vLLM
```shell
vllm serve ZirTech/OmniMath-2B --tensor-parallel-size 1 --max-model-len 4096
```
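Once the server is up, vLLM exposes an OpenAI-compatible API. A minimal client sketch using only the standard library (this assumes the default server address `http://localhost:8000`; adjust if you changed the port):

```python
import json
import urllib.request

# vLLM's OpenAI-compatible chat endpoint (default address assumed).
url = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "ZirTech/OmniMath-2B",
    "messages": [
        {"role": "system", "content": "You are a helpful math assistant. Solve problems step by step."},
        {"role": "user", "content": "What is 17 * 24?"},
    ],
    "temperature": 0.6,
    "max_tokens": 256,
}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Uncomment once the server is running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```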
Alternatively, load the model directly with Transformers for greedy decoding:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ZirTech/OmniMath-2B"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model.eval()

def ask(question):
    # Build a ChatML prompt manually (matches the tokenizer's chat format).
    prompt = (
        "<|im_start|>system\nYou are a helpful math assistant.<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    # Trim any run-on turn the model may start after its answer.
    if "user" in response:
        response = response.split("user")[0].strip()
    return response

print(ask("Find the degree for the given field extension Q(sqrt(2), sqrt(3), sqrt(18)) over Q. Give me the answer."))
```
---
## 🏗️ Architecture
OmniMath‑2B fully preserves Qwen3.5‑2B's design:
* **Gated Delta Networks**: linear-attention layers interleaved with standard attention.
* **262K native context**: supports up to 262,144 tokens (extendable with YaRN).
* **Built on `Qwen3_5ForCausalLM`**: integrates seamlessly with the Hugging Face ecosystem.
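Context extension with YaRN is typically enabled through the `rope_scaling` entry in `config.json`. A sketch of what that looks like, assuming Qwen-style rope-scaling support (the `factor` value here is illustrative, not a tested setting for this model):

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```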
---
## ⚠️ Limitations
* Numerical accuracy may occasionally falter; always double‑check critical calculations.
* Geometry training used only textual descriptions, so performance on image‑based geometry problems is limited.
* Non‑English math problems have not been thoroughly evaluated.
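One lightweight guard against the arithmetic slips noted above is to pull the final number out of a response and re-check it programmatically. A sketch (`extract_final_number` is a hypothetical helper, not part of this model's tooling):

```python
import re

def extract_final_number(text: str):
    """Return the last number mentioned in a model response, or None."""
    # Strip thousands separators, then grab every signed int/decimal.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return float(matches[-1]) if matches else None

response = "Each apple costs $2, so 5 apples cost 5 * 2 = 10 dollars."
answer = extract_final_number(response)
assert answer == 10.0  # cross-check the model's claim independently
```

For critical use, compare the extracted value against an independent computation (e.g. `sympy` or plain Python arithmetic) before trusting it.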
---
## 🙏 Acknowledgments
* Qwen Team for the outstanding Qwen3.5 base models.
* Hugging Face for dataset hosting and the Transformers library.
* Kaggle for providing free GPU hours.
---
## 📖 Citation
```bibtex
@misc{omnimath2b2026,
  title={OmniMath-2B: A Lightweight Open Mathematical Reasoning Model},
  author={Zirt Techniques},
  year={2026},
  url={https://huggingface.co/ZirTech/OmniMath-2B}
}
```
---
**Built by [Zirt Tech](https://huggingface.co/ZirTech) ❤️**