---
license: mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- quantized
- 4-bit
- int4
- reasoning
- distill
language:
- en
pipeline_tag: text-generation
---
# DeepSeek-R1-Distill-Qwen-7B-HQQ-INT4
INT4 quantization of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B). Calibration-free companion to [`drawais/DeepSeek-R1-Distill-Qwen-7B-AWQ-INT4`](https://huggingface.co/drawais/DeepSeek-R1-Distill-Qwen-7B-AWQ-INT4).
## Footprint
| Metric | Value |
|---|---|
| Source params | 7B (distilled from R1) |
| Quantized weights | ~5.3 GB on disk |
| Inference VRAM (incl. KV cache @ 32K context) | ~10 GB |
Best used within the model's native ≤32K context. For longer contexts, prefer the AWQ companion served via vLLM.
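The figures above can be sanity-checked with a back-of-envelope estimate. This is a sketch, not a measurement; the architecture shapes (28 layers, 4 KV heads under GQA, head_dim 128, fp16 KV cache) are assumed from the Qwen2.5-7B family and are not stated in this card:

```python
# Assumed Qwen2.5-7B-family shapes: 28 layers, 4 KV heads (GQA),
# head_dim 128; KV cache kept in fp16 (2 bytes per value).
layers, kv_heads, head_dim = 28, 4, 128

# Packed 4-bit weights: 7e9 params * 0.5 byte each, before
# group-wise scale/zero metadata and any higher-precision tensors.
weights_gb = 7e9 * 0.5 / 1e9                      # ~3.5 GB payload

# fp16 KV cache per token: K and V, per layer, per KV head.
kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2
kv_gb_32k = kv_bytes_per_token * 32_768 / 1e9     # ~1.9 GB at 32K

print(weights_gb, kv_gb_32k)
```

The ~5.3 GB on-disk size exceeds the ~3.5 GB packed payload because of quantization metadata and non-quantized tensors (e.g. embeddings); adding ~1.9 GB of KV cache at 32K plus activations and runtime overhead lands near the ~10 GB figure.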
## Quick start
```python
from transformers import AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel

tok = AutoTokenizer.from_pretrained("drawais/DeepSeek-R1-Distill-Qwen-7B-HQQ-INT4")
model = AutoHQQHFModel.from_quantized("drawais/DeepSeek-R1-Distill-Qwen-7B-HQQ-INT4", device="cuda")

# Generate through the chat template, as the distill models expect.
ids = tok.apply_chat_template(
    [{"role": "user", "content": "What is 12 * 17?"}],
    add_generation_prompt=True, return_tensors="pt",
).to("cuda")
print(tok.decode(model.generate(ids, max_new_tokens=256)[0], skip_special_tokens=True))
```
## License
MIT, following the DeepSeek-R1-Distill series. The underlying base model, Qwen2.5-Math-7B, is licensed Apache 2.0.