---
license: apache-2.0
base_model: Qwen/Qwen2.5-Math-7B-Instruct
tags:
- quantized
- 4-bit
- int4
- qwen2.5
- math
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Math-7B-Instruct-HQQ-INT4
INT4 quantization of [`Qwen/Qwen2.5-Math-7B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct) using [HQQ](https://github.com/mobiusml/hqq) (Half-Quadratic Quantization). HQQ requires no calibration data, making this the calibration-free companion to [`drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4`](https://huggingface.co/drawais/Qwen2.5-Math-7B-Instruct-AWQ-INT4).
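To make the storage scheme concrete, here is a minimal round-to-nearest sketch of group-wise 4-bit quantization. This is illustrative only: HQQ itself *optimizes* the scale and zero-point with a half-quadratic solver rather than using raw min/max, but the layout (4-bit codes plus per-group metadata) is the same idea. Group size 64 is an assumption, not read from this checkpoint.

```python
# Illustrative round-to-nearest (RTN) 4-bit quantization with a per-group
# scale and zero-point. NOT HQQ's actual solver -- HQQ optimizes scale/zero;
# this sketch just uses min/max to show the storage format.
import numpy as np

def quantize_int4(w, group_size=64):
    """Quantize a flat float array to 4-bit codes, one scale/zero per group."""
    w = w.reshape(-1, group_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = (w_max - w_min) / 15.0  # 4 bits -> 16 levels, codes 0..15
    zero = w_min
    q = np.clip(np.round((w - zero) / scale), 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize_int4(q, scale, zero):
    """Reconstruct approximate fp32 weights from codes + metadata."""
    return q * scale + zero

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
q, scale, zero = quantize_int4(w)
w_hat = dequantize_int4(q, scale, zero).reshape(-1)
print(int(q.max()), float(np.abs(w - w_hat).max()))
```

The reconstruction error per weight is bounded by half a quantization step (`scale / 2`); HQQ's optimized zero-point typically tightens this further on real weight distributions.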
## Footprint
| Metric | Value |
|---|---|
| Source params | 7B (math-specialized) |
| Quantized weights | ~5.3 GB on disk |
| Inference VRAM (incl. KV cache @ 32K context) | ~10 GB |
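The ~5.3 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The numbers below are assumptions (roughly ~7.6B total params for a Qwen2.5 7B-class model, group size 64, fp16 scale/zero per group, embedding and `lm_head` left in fp16 and untied), not values read from this checkpoint, so the estimate only lands in the right ballpark.

```python
# Rough disk-size estimate for a group-wise INT4 checkpoint.
# All constants below are assumptions, not read from the model files.
params_total = 7.6e9             # ~7.6B params for a Qwen2.5 7B-class model
vocab, hidden = 152_064, 3_584   # assumed Qwen2.5 vocab size / hidden size
group_size = 64                  # assumed quantization group size

fp16_params = 2 * vocab * hidden          # embed + lm_head kept in fp16 (assumed untied)
q_params = params_total - fp16_params     # everything else quantized to 4 bits

int4_bytes = q_params * 4 / 8                    # 0.5 byte per 4-bit weight
meta_bytes = q_params / group_size * (2 + 2)     # fp16 scale + fp16 zero per group
fp16_bytes = fp16_params * 2                     # 2 bytes per fp16 param

total_gb = (int4_bytes + meta_bytes + fp16_bytes) / 1e9
print(round(total_gb, 1))
```

The estimate comes out within a few hundred MB of the quoted ~5.3 GB; the gap closes further if the `lm_head` is also quantized.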
## Quick start
```python
from transformers import AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel

tok = AutoTokenizer.from_pretrained("drawais/Qwen2.5-Math-7B-Instruct-HQQ-INT4")

# Load the pre-quantized weights directly onto the GPU.
model = AutoHQQHFModel.from_quantized(
    "drawais/Qwen2.5-Math-7B-Instruct-HQQ-INT4",
    device="cuda",
)
```
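For math problems, the upstream Qwen2.5-Math-Instruct cards recommend a chain-of-thought system prompt. In practice you would build the prompt with `tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`; the sketch below hand-writes the ChatML string so the format is visible without loading the tokenizer (the example question is illustrative).

```python
# Hand-built ChatML prompt, equivalent to what apply_chat_template produces
# for Qwen2.5 chat models. In real use, prefer tok.apply_chat_template(...).
system = ("Please reason step by step, and put your final answer "
          "within \\boxed{}.")
user = "Solve for x: 2x + 3 = 11"  # illustrative question

prompt = (
    f"<|im_start|>system\n{system}<|im_end|>\n"
    f"<|im_start|>user\n{user}<|im_end|>\n"
    f"<|im_start|>assistant\n"
)
print(prompt)
```

From here, `tok(prompt, return_tensors="pt").to("cuda")` and `model.generate(...)` complete the loop.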
## License
Apache 2.0 (inherits from base model).