| --- |
| license: mit |
| license_link: https://opensource.org/license/mit |
| base_model: microsoft/Phi-4-mini-instruct |
| tags: |
| - quantized |
| - 4-bit |
| - int4 |
| - awq |
| language: |
| - en |
| library_name: transformers |
| pipeline_tag: text-generation |
| --- |
| |
| # Phi-4-mini-instruct-AWQ-INT4 |
|
|
INT4 weight-only (AWQ) quantization of [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct). About 2.9 GB on disk; fits on a 4 GB consumer GPU at short context lengths.
|
|
| | Property | Value | |
| |---|---| |
| | Base model | [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) | |
| Quantization | AWQ, INT4 weight-only |
| | Approx. on-disk size | ~2.9 GB | |
| | License | MIT License | |
| | Languages | English | |
|
|
| ## Load (vLLM) |
|
|
| ```bash |
| vllm serve drawais/Phi-4-mini-instruct-AWQ-INT4 \ |
| --max-model-len 32768 \ |
| --gpu-memory-utilization 0.94 |
| ``` |
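
Once the server is up, vLLM exposes an OpenAI-compatible API (port 8000 by default), so any OpenAI client or plain `curl` can query it:

```shell
# Send a chat completion request to the locally running vLLM server.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "drawais/Phi-4-mini-instruct-AWQ-INT4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 128
  }'
```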
|
|
| ```python |
| from vllm import LLM, SamplingParams |
| llm = LLM(model="drawais/Phi-4-mini-instruct-AWQ-INT4", max_model_len=32768) |
| print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text) |
| ``` |
|
|
| ## Footprint |
|
|
~2.9 GB on disk. VRAM usage is the weights plus the KV cache, which grows linearly with context length, so budget headroom beyond the ~2.9 GB of weights for long prompts.
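
As a rough illustration, KV-cache growth can be estimated from the model's attention geometry. The sketch below uses assumed values for Phi-4-mini's configuration (32 layers, 8 grouped-query KV heads, head dim 128, FP16 cache); read the real numbers from the model's `config.json` before relying on them:

```python
# Back-of-the-envelope KV-cache sizing. The layer/head numbers below are
# ASSUMED for illustration -- take them from the model's config.json in practice.
NUM_LAYERS = 32      # assumed
NUM_KV_HEADS = 8     # assumed (grouped-query attention)
HEAD_DIM = 128       # assumed
BYTES_PER_ELEM = 2   # FP16 cache entries

def kv_cache_bytes(num_tokens: int) -> int:
    """Total KV-cache size in bytes: K and V tensors (factor 2) at every layer."""
    return 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM * num_tokens

print(f"Per token: {kv_cache_bytes(1) / 1024:.0f} KiB")            # 128 KiB
print(f"At 4k tokens: {kv_cache_bytes(4096) / 2**30:.2f} GiB")     # 0.50 GiB
```

At these assumed values, a full 32k-token context would need roughly 4 GiB of KV cache on top of the weights, which is why long contexts call for more than a 4 GB card.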
|
|
| ## License & attribution |
|
|
| This artifact is a derivative work of [`microsoft/Phi-4-mini-instruct`](https://huggingface.co/microsoft/Phi-4-mini-instruct), |
| released by its original authors under the **MIT License**. |
|
|
| This artifact is distributed under the same license. The full license text is |
| included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE). |
|
|
| License text: https://opensource.org/license/mit |
| Source model: https://huggingface.co/microsoft/Phi-4-mini-instruct |
|
|