---
license: apache-2.0
base_model: Qwen/Qwen2.5-Coder-14B-Instruct
tags:
- quantized
- 4-bit
- int4
- qwen2.5
- coder
- code
language:
- en
pipeline_tag: text-generation
---
# Qwen2.5-Coder-14B-Instruct-HQQ-INT4
HQQ INT4 quantization of [`Qwen/Qwen2.5-Coder-14B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-14B-Instruct). HQQ needs no calibration data, making this the calibration-free companion to the calibration-based [`drawais/Qwen2.5-Coder-14B-Instruct-AWQ-INT4`](https://huggingface.co/drawais/Qwen2.5-Coder-14B-Instruct-AWQ-INT4).
## Footprint
| Metric | Value |
|---|---|
| Source parameters | 14B (code-specialized) |
| Quantized weights on disk | ~9.5 GB |
| Inference VRAM (incl. KV cache at 32K context) | ~16 GB |
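As a back-of-envelope check on the disk figure (the overhead breakdown below is an assumption, not a measured split):

```python
# Rough footprint estimate; the exact overhead depends on the HQQ group size
# and which layers (e.g. embeddings, lm_head) stay in higher precision.
params = 14e9
raw_int4_gb = params * 4 / 8 / 1e9   # ~7.0 GB of raw 4-bit weights
print(f"{raw_int4_gb:.1f} GB raw")   # per-group scales/zero-points and any
                                     # fp16 layers account for the rest of ~9.5 GB
```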
Best used at its native context of up to 32K tokens. For longer contexts, prefer the AWQ companion served via vLLM, as in the sketch below.
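A minimal serving sketch using vLLM's offline API (the `max_model_len` and sampling values are illustrative assumptions; extending past 32K requires the YaRN rope-scaling settings described on the base model card):

```python
from vllm import LLM, SamplingParams

# Illustrative settings; tune max_model_len to your hardware and workload.
llm = LLM(
    model="drawais/Qwen2.5-Coder-14B-Instruct-AWQ-INT4",
    quantization="awq",
    max_model_len=32768,
)
params = SamplingParams(temperature=0.2, max_tokens=256)
print(llm.generate(["Write a Python quicksort."], params)[0].outputs[0].text)
```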
## Quick start
```python
from transformers import AutoTokenizer
from hqq.models.hf.base import AutoHQQHFModel

# Load the tokenizer and the pre-quantized HQQ weights onto the GPU.
tok = AutoTokenizer.from_pretrained("drawais/Qwen2.5-Coder-14B-Instruct-HQQ-INT4")
model = AutoHQQHFModel.from_quantized("drawais/Qwen2.5-Coder-14B-Instruct-HQQ-INT4", device="cuda")
```
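A short generation example to go with it (assumes the HQQ-patched model exposes the standard `generate` API; the prompt is illustrative):

```python
# Build a chat prompt with the model's template and decode only the new tokens.
messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
out = model.generate(inputs, max_new_tokens=256)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```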
## License
Apache 2.0, inherited from the base model.