---
license: apache-2.0
license_link: https://www.apache.org/licenses/LICENSE-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.3
tags:
  - quantized
  - 4-bit
  - nvfp4
language:
  - en
library_name: transformers
pipeline_tag: text-generation
---

# Mistral-7B-Instruct-v0.3-SpinQuant-NVFP4

NVFP4 W4A4 quantization (4-bit weights + 4-bit activations) of [`mistralai/Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), stored in the vLLM-native compressed-tensors format. Supported on Hopper, Ada, and Blackwell tensor cores. About 4.5 GB on disk.

| Property | Value |
|---|---|
| Base model | [mistralai/Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) |
| Quantization | NVFP4 W4A4 (4-bit weights, 4-bit activations) |
| Approx. on-disk size | ~4.5 GB |
| License | Apache License, Version 2.0 |
| Languages | English |

## Load (vLLM)

```bash
vllm serve drawais/Mistral-7B-Instruct-v0.3-SpinQuant-NVFP4 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.94
```

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="drawais/Mistral-7B-Instruct-v0.3-SpinQuant-NVFP4",
    max_model_len=32768,
)
print(llm.generate(["Hello!"], SamplingParams(max_tokens=128))[0].outputs[0].text)
```

## Footprint

~4.5 GB on disk. Plan VRAM for the weights plus KV-cache headroom, which grows with context length and concurrency.

## License & attribution

This artifact is a derivative work of [`mistralai/Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3), released by its original authors under the **Apache License, Version 2.0**, and is distributed under the same license. The full license text is included in [`LICENSE`](LICENSE), and required attribution is in [`NOTICE`](NOTICE).

License text: https://www.apache.org/licenses/LICENSE-2.0
Source model: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
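## Prompt format

The `llm.generate` call above sends raw text; Mistral instruct checkpoints expect the `[INST] … [/INST]` wrapper (vLLM's `llm.chat` applies the bundled chat template automatically). A minimal sketch of the wrapper for manual prompting — the helper name is illustrative, not part of any API:

```python
def build_prompt(user_msg: str) -> str:
    # Mistral instruct format; the BOS token (<s>) is normally added by the
    # tokenizer, so only the [INST] wrapper is written here.
    return f"[INST] {user_msg} [/INST]"

print(build_prompt("Hello!"))  # [INST] Hello! [/INST]
```

For multi-turn conversations, prefer `llm.chat` (or the server's `/v1/chat/completions` endpoint) so the checkpoint's own chat template is applied.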
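## KV-cache sizing

The KV-cache headroom mentioned under Footprint can be sanity-checked with a back-of-envelope calculation. This is a sketch assuming the base model's published config (32 layers, 8 KV heads via grouped-query attention, head dim 128) and a 16-bit cache, which is vLLM's default unless a quantized KV cache is enabled:

```python
# Rough KV-cache sizing for Mistral-7B-Instruct-v0.3.
num_layers = 32     # num_hidden_layers
num_kv_heads = 8    # num_key_value_heads (GQA)
head_dim = 128
elem_bytes = 2      # fp16/bf16 cache entries

# One K and one V vector per layer, per token.
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * elem_bytes
kv_gib_at_32k = bytes_per_token * 32768 / 2**30

print(f"{bytes_per_token} bytes/token, {kv_gib_at_32k:.1f} GiB at 32k tokens")
# 131072 bytes/token, 4.0 GiB at 32k tokens
```

So a single full 32k-token sequence needs roughly 4 GiB of cache on top of the ~4.5 GB of weights; budget more for concurrent requests.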