---
license: mit
license_link: https://opensource.org/license/mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
tags:
- quantized
- 4-bit
- int4
- awq
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# DeepSeek-R1-Distill-Qwen-14B-AWQ-INT4
|
|
AWQ INT4 weight-only quantization of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B).
|
|
DeepSeek-R1 reasoning distilled into Qwen 14B, then quantized to INT4 with AWQ. About 9.9 GB on disk; runs on a 12 GB consumer GPU.
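To make "INT4 weight-only" concrete, here is a toy sketch of group-wise 4-bit quantization, the general scheme AWQ-style quantizers use: each group of 128 weights shares one floating-point scale and zero-point, and only the 4-bit codes plus per-group parameters are stored. This is an illustration of the storage format, not the actual AWQ algorithm (which additionally rescales salient channels using activation statistics).

```python
import numpy as np

GROUP = 128  # weights per quantization group (a common AWQ group size)

def quantize_int4(w):
    """Map each group of GROUP weights to 4-bit codes (0..15) plus scale/offset."""
    w = w.reshape(-1, GROUP)
    wmin = w.min(axis=1, keepdims=True)
    wmax = w.max(axis=1, keepdims=True)
    scale = (wmax - wmin) / 15.0  # 4 bits -> 16 representable levels
    q = np.clip(np.round((w - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_int4(q, scale, wmin):
    """Reconstruct approximate fp32 weights from codes + group parameters."""
    return q * scale + wmin

rng = np.random.default_rng(0)
w = rng.normal(size=(1024,)).astype(np.float32)
q, scale, wmin = quantize_int4(w)
w_hat = dequantize_int4(q, scale, wmin).reshape(-1)
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale / 2 per group
```

At inference time the 4-bit codes are dequantized on the fly and the matmul still runs in higher precision, which is why this is called *weight-only* quantization: activations and the KV cache stay fp16/bf16.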
|
|
| Property | Value |
|---|---|
| Base model | [deepseek-ai/DeepSeek-R1-Distill-Qwen-14B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B) |
| Quantization | AWQ INT4 weight-only |
| Approx. on-disk size | ~9.9 GB |
| License | MIT |
| Languages | English |
|
|
## Load (vLLM)
|
|
```bash
vllm serve drawais/DeepSeek-R1-Distill-Qwen-14B-AWQ-INT4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94
```
|
|
```python
from vllm import LLM, SamplingParams

llm = LLM(model="drawais/DeepSeek-R1-Distill-Qwen-14B-AWQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```
|
|
## Footprint
|
|
~9.9 GB on disk. Budget VRAM for the quantized weights plus the KV cache, which grows linearly with context length; on a 12 GB card, lowering `--max-model-len` frees headroom.
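A back-of-envelope sketch of why context length dominates VRAM beyond the weights. The architecture constants below are assumptions taken from the base Qwen2.5-14B configuration (48 layers, 8 grouped-query KV heads of dimension 128, fp16 cache); check this repo's `config.json` for the exact values.

```python
# Assumed architecture constants (from the Qwen2.5-14B config, not read from this repo).
NUM_LAYERS = 48
NUM_KV_HEADS = 8   # grouped-query attention: far fewer KV heads than the 40 query heads
HEAD_DIM = 128
FP16_BYTES = 2

# One K and one V vector per layer, per KV head, per token, stored in fp16.
kv_bytes_per_token = NUM_LAYERS * 2 * NUM_KV_HEADS * HEAD_DIM * FP16_BYTES
print(f"KV cache per token: {kv_bytes_per_token / 1024:.0f} KiB")

for ctx in (4096, 16384, 32768):
    gib = kv_bytes_per_token * ctx / 1024**3
    print(f"KV cache at {ctx:>5} tokens: {gib:.2f} GiB")
```

Under these assumptions the cache costs about 192 KiB per token, so a full 32K-token context needs roughly 6 GiB on top of the weights; this is the quantity `--max-model-len` caps when you trade context length for headroom.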
|
|
## License & attribution
|
|
This artifact is a derivative work of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B),
released by its original authors under the **MIT License**.
|
|
This artifact is distributed under the same license. The full license text is
included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE).
|
|
License text: https://opensource.org/license/mit
Source model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
|
|