---
license: apache-2.0
base_model: Qwen/Qwen2.5-Math-7B-Instruct
tags:
- quantized
- 4-bit
- int4
- qwen2.5
- math
language:
- en
pipeline_tag: text-generation
---

# Qwen2.5-Math-7B-Instruct-HQQ-INT4

INT4 quantization of [`Qwen/Qwen2.5-Math-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct), produced with [HQQ](https://github.com/mobiusml/hqq), which requires no calibration data. Companion to the calibration-based [`drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4`](https://huggingface.co/drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4).

## Footprint
| Metric | Value |
|---|---|
| Source params | 7B (math-specialized) |
| Quantized weights | ~5.3 GB on disk |
| Inference VRAM (incl. KV cache @ 32K context) | ~10 GB |
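
As a rough sanity check on the disk figure above (all numbers below are illustrative assumptions, not read from the checkpoint: ~7.6B total parameters, 4-bit weights, group size 64 with fp16 scales and zero-points), the quantized core works out to roughly 4.3 GB, with any layers kept in higher precision (e.g. fp16 embeddings) accounting for the remainder of the ~5.3 GB:

```python
# Back-of-envelope estimate of 4-bit checkpoint size.
# All constants are assumptions for illustration, not values from this repo.
params = 7.6e9                    # ~7.6B total parameters (Qwen2.5 7B class)
weight_bits = 4                   # INT4 weights
group_size = 64                   # assumed HQQ group size
meta_bits = 2 * 16 / group_size   # fp16 scale + zero-point per group = 0.5 bit/weight

core_gb = params * (weight_bits + meta_bits) / 8 / 1e9
print(f"quantized core: ~{core_gb:.2f} GB")
# Layers left unquantized (embeddings, etc.) add the rest toward ~5.3 GB on disk.
```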

## Quick start

```python
# pip install hqq transformers
from transformers import AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel

tok = AutoTokenizer.from_pretrained("drawais/Qwen2.5-Math-7B-Instruct-HQQ-INT4")
model = AutoHQQHFModel.from_quantized("drawais/Qwen2.5-Math-7B-Instruct-HQQ-INT4", device="cuda")
```
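
Once loaded, prompt it like any Qwen2.5 instruct model. A self-contained sketch of the prompt shape is below; it assumes Qwen2.5's ChatML-style template and the chain-of-thought system prompt suggested for Qwen2.5-Math. In practice, prefer `tok.apply_chat_template(messages, add_generation_prompt=True)`, which builds this string for you:

```python
# Hand-rolled Qwen2.5-style (ChatML) chat prompt, for illustration only.
def build_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt(
    "Please reason step by step, and put your final answer within \\boxed{}.",
    "What is 12 * 37?",
)
print(prompt)
# Then generate with, e.g.:
# out = model.generate(**tok(prompt, return_tensors="pt").to("cuda"), max_new_tokens=512)
```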

## License

Apache 2.0 (inherited from the base model).