---
license: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
base_model: Qwen/Qwen2.5-Coder-32B-Instruct
tags:
- quantized
- 4-bit
- int4
- awq
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Qwen2.5-Coder-32B-Instruct-AWQ-INT4

AWQ INT4 weight-only quantization of [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct).

Qwen2.5 Coder 32B Instruct in INT4: roughly 19.2 GB on disk, small enough to run on a single 24 GB consumer GPU.

| Property | Value |
|---|---|
| Base model | [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) |
| Quantization | AWQ INT4 weight-only |
| Approx. on-disk size | ~19.2 GB |
| License | Apache License, Version 2.0 |
| Languages | English |
|
| ## Load (vLLM) |
|
|
| ```bash |
| vllm serve drawais/Qwen2.5-Coder-32B-Instruct-AWQ-INT4 \ |
| --max-model-len 32768 \ |
| --gpu-memory-utilization 0.94 |
| ``` |
|
|
```python
from vllm import LLM, SamplingParams

llm = LLM(model="drawais/Qwen2.5-Coder-32B-Instruct-AWQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```
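
Once `vllm serve` is running, it exposes an OpenAI-compatible HTTP API. A minimal client sketch using only the standard library (assumes the default `localhost:8000` address; the `build_payload` and `chat` helpers are illustrative, not part of vLLM):

```python
import json
import urllib.request

MODEL = "drawais/Qwen2.5-Coder-32B-Instruct-AWQ-INT4"

def build_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Chat-completions request body for a single user turn."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }

def chat(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """POST to the server's /v1/chat/completions endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]

# With the server running:
# print(chat("Write a Python function that checks if a string is a palindrome."))
```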

## Footprint

~19.2 GB on disk. In VRAM, plan for the weights plus the KV cache: on a 24 GB GPU the weights leave only a few GB of headroom, so lower `--max-model-len` or `--gpu-memory-utilization` if vLLM fails to allocate the KV cache at startup.

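These numbers can be sanity-checked with back-of-envelope arithmetic. A sketch, assuming the architecture values published in the Qwen2.5-32B config (64 layers, 8 KV heads, head dim 128, ~32.5B parameters; these are assumptions, not stated in this card) and an fp16 KV cache:

```python
# Rough disk/VRAM arithmetic for this quant (approximate figures).
PARAMS = 32.5e9                            # total parameters (assumed from base config)
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128    # assumed from base config

# INT4 weights: 0.5 bytes per parameter. The actual files are larger (~19.2 GB)
# because quantization scales/zeros and some tensors (embeddings, norms)
# stay in higher precision.
weight_gb = PARAMS * 0.5 / 1e9
print(f"INT4 weight payload: ~{weight_gb:.2f} GB")

# fp16 KV cache per token: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes.
kv_bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * 2
kv_gb_32k = kv_bytes_per_token * 32768 / 1e9
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")
print(f"KV cache for one 32768-token sequence: ~{kv_gb_32k:.1f} GB")
```

Under these assumptions, ~19 GB of weights plus roughly 0.26 MB of KV cache per token means a 24 GB card cannot hold a full 32768-token window, which is why lowering `--max-model-len` helps when memory runs short.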

## License & attribution

This artifact is a derivative work of [`Qwen/Qwen2.5-Coder-32B-Instruct`](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct),
released by its original authors under the **Apache License, Version 2.0**.

This artifact is distributed under the same license. The full license text is
included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE).

License text: https://www.apache.org/licenses/LICENSE-2.0
Source model: https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct