Mistral-Small-24B-Instruct-2501-AWQ

4-bit AWQ quantization of mistralai/Mistral-Small-24B-Instruct-2501, produced for efficient inference on consumer/prosumer GPUs with vLLM.

Quantization details

Field                 Value
Method                AWQ (Activation-aware Weight Quantization)
Scheme                W4A16_ASYM
Group size            128
Ignored layers        lm_head (kept at full precision)
Format                compressed-tensors (pack-quantized)
Tool                  llmcompressor 0.6.0
Calibration dataset   HuggingFaceH4/ultrachat_200k (train_sft split)
Calibration samples   256
Max sequence length   512 tokens

The weights are saved in compressed-tensors format, which vLLM supports natively. No separate autoawq package is needed.

W4A16_ASYM: weights are stored as 4-bit integers; activations remain in 16-bit (fp16/bf16) during inference. Asymmetric quantization allows an independent zero-point per group, giving better coverage of skewed weight distributions. Groups of 128 consecutive weights share a single scale and zero-point.
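
The per-group scale and zero-point described above can be illustrated with a minimal pure-Python sketch. It is not the llmcompressor or compressed-tensors implementation: the function names are hypothetical, the rounding rule is plain round-to-nearest, and the choice to widen each group's range so that it includes 0.0 is an assumption of this sketch (it keeps the zero-point inside the 4-bit range). AWQ's activation-aware scaling step is not modeled here; only the final group-wise integer mapping is shown.

```python
from typing import List, Tuple

GROUP_SIZE = 128          # group size from the table above
NUM_BITS = 4              # W4: 4-bit weights
QMAX = 2 ** NUM_BITS - 1  # unsigned 4-bit codes span 0..15


def quantize_group(weights: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric quantization of one group: returns (codes, scale, zero_point)."""
    # Assumption of this sketch: widen the range to include 0.0 so that the
    # zero-point always fits in 0..QMAX.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / QMAX
    if scale == 0.0:
        scale = 1.0  # all-zero group: any scale works, avoid dividing by zero
    zero_point = max(0, min(QMAX, round(-lo / scale)))
    # Map each weight to an integer code and clamp to the 4-bit range.
    codes = [max(0, min(QMAX, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_group(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate weights from the integer codes."""
    return [(q - zero_point) * scale for q in codes]


def quantize_row(row: List[float], group_size: int = GROUP_SIZE):
    """Split a row into consecutive groups and quantize each independently."""
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]
```

Because every group stores its own scale and zero-point next to the 4-bit codes, the effective storage cost is slightly more than 4 bits per weight. Smaller groups track the local weight range more closely at the price of more metadata.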

Usage

Start an OpenAI-compatible server with vLLM:

vllm serve dark-side-of-the-code/Mistral-Small-24B-Instruct-2501-AWQ --dtype auto

Then query it from Python with the openai client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="dark-side-of-the-code/Mistral-Small-24B-Instruct-2501-AWQ",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
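
If the openai package is not installed, the same chat completions endpoint can be called with the Python standard library alone. The following is a minimal sketch, assuming the server from the command above is listening on localhost:8000. The helper names build_chat_payload and chat are hypothetical, and the max_tokens and temperature values are illustrative defaults, not recommendations from this card.

```python
import json
import urllib.request
from typing import Any, Dict

MODEL_ID = "dark-side-of-the-code/Mistral-Small-24B-Instruct-2501-AWQ"


def build_chat_payload(prompt: str, max_tokens: int = 128,
                       temperature: float = 0.2) -> Dict[str, Any]:
    """Build the JSON body for a single-turn chat completion request."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,      # illustrative value
        "temperature": temperature,    # illustrative value
    }


def chat(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to the chat completions endpoint and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the vLLM server to be running):
# print(chat("Hello!"))
```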

Limitations

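  • Quantizing the weights to 4 bits introduces some accuracy degradation compared to the bf16 base model. This card reports no benchmark figures, so evaluate the model on your own task before relying on it.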
  • This model inherits all limitations and intended-use restrictions from the base model. Refer to the base model card for details.

License

The base model, mistralai/Mistral-Small-24B-Instruct-2501, is released under the Apache 2.0 license. These quantized weights are a derivative work, so confirm the terms on the base model card before use.
