---
license: mit
license_link: https://opensource.org/license/mit
base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
tags:
- quantized
- 4-bit
- int4
- awq
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# DeepSeek-R1-Distill-Qwen-7B-AWQ-INT4

AWQ INT4 weight-only quantization of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B): DeepSeek-R1 reasoning distilled into Qwen 7B, then quantized to INT4. About 5.6 GB on disk; runs on an 8 GB consumer GPU.

| Property | Value |
|---|---|
| Base model | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) |
| Quantization | AWQ INT4 weight-only |
| Approx. on-disk size | ~5.6 GB |
| License | MIT |
| Languages | English |

## Load (vLLM)

```bash
vllm serve drawais/DeepSeek-R1-Distill-Qwen-7B-AWQ-INT4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94
```

```python
from vllm import LLM, SamplingParams

llm = LLM(model="drawais/DeepSeek-R1-Distill-Qwen-7B-AWQ-INT4", max_model_len=32768)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```

## Footprint

~5.6 GB on disk. For VRAM, budget the ~5.6 GB of weights plus headroom for the KV cache; an 8 GB consumer GPU is sufficient at moderate context lengths.

## License & attribution

This artifact is a derivative work of [`deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), released by its original authors under the **MIT License**. This artifact is distributed under the same license. The full license text is included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE).

License text: https://opensource.org/license/mit

Source model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
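The ~5.6 GB on-disk figure can be sanity-checked with a back-of-envelope estimate. The sketch below is illustrative only: the split between quantized and fp16-kept parameters (roughly 6.5B quantized weights plus ~1.1B fp16 parameters for embeddings and the LM head), the AWQ group size of 128, and the assumption of one fp16 scale and one zero point per group are all assumptions, not values read from this checkpoint.

```python
def awq_int4_size_gb(quant_params: float, fp16_params: float, group_size: int = 128) -> float:
    """Rough on-disk size estimate for an AWQ INT4 weight-only checkpoint.

    Assumptions (illustrative, not read from the checkpoint):
    - quantized weights are packed at 4 bits (0.5 bytes) each
    - each group of `group_size` weights carries one fp16 scale and
      one fp16 zero point (2 bytes each)
    - embeddings / LM head stay in fp16 (2 bytes per parameter)
    """
    int4_bytes = quant_params * 0.5
    meta_bytes = (quant_params / group_size) * 2 * 2  # scale + zero point per group
    fp16_bytes = fp16_params * 2
    return (int4_bytes + meta_bytes + fp16_bytes) / 1e9


# ~6.5B quantized + ~1.1B fp16 parameters (assumed split for a 7B Qwen model)
print(f"{awq_int4_size_gb(6.5e9, 1.1e9):.1f} GB")
```

Under these assumptions the estimate lands in the mid-5 GB range, consistent with the ~5.6 GB reported above; the remainder is tokenizer files and packing overhead.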