# Mistral-Small-24B-Instruct-2501-AWQ
A 4-bit AWQ quantization of mistralai/Mistral-Small-24B-Instruct-2501, produced for efficient inference on consumer and prosumer GPUs with vLLM.
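As a rough back-of-envelope for why 4-bit weights fit on such GPUs (assuming ~24 billion quantized parameters; the real checkpoint differs slightly since lm_head stays in 16-bit):

```python
# Back-of-envelope weight-memory estimate for a 4-bit, group-128 checkpoint.
# "params" is an assumption (~24e9); embeddings, norms, and the full-precision
# lm_head are ignored here.
params = 24e9
bits_per_weight = 4
group_size = 128
# Per group of 128 weights: one fp16 scale (2 bytes) plus one zero-point
# (budgeted at 2 bytes to be conservative).
overhead_bits = (2 + 2) * 8 / group_size      # extra bits amortized per weight
total_bytes = params * (bits_per_weight + overhead_bits) / 8
print(f"~{total_bytes / 1e9:.1f} GB of quantized weights")
```

That lands around 12.8 GB of weights, versus roughly 48 GB for the bf16 original, before activations and KV cache are added on top.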
## Quantization details
| Field | Value |
|---|---|
| Method | AWQ (Activation-aware Weight Quantization) |
| Scheme | W4A16_ASYM |
| Group size | 128 |
| Ignored layers | lm_head (kept at full precision) |
| Format | compressed-tensors (pack-quantized) |
| Tool | llmcompressor 0.6.0 |
| Calibration dataset | HuggingFaceH4/ultrachat_200k (train_sft split) |
| Calibration samples | 256 |
| Max sequence length | 512 tokens |
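The settings above correspond to an llmcompressor one-shot run. A hedged sketch follows, modeled on llmcompressor's published AWQ examples; the exact recipe used for this checkpoint is not included in the repo, and argument names may differ between llmcompressor versions:

```python
# Sketch of reproducing the quantization with llmcompressor (assumption:
# this mirrors the library's AWQ examples; verify names against your version).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.awq import AWQModifier

MODEL_ID = "mistralai/Mistral-Small-24B-Instruct-2501"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# 256 calibration samples from ultrachat_200k, rendered with the chat template
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft[:256]")
ds = ds.map(lambda ex: {
    "text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)
})

# W4A16_ASYM on all Linear layers, keeping lm_head at full precision
recipe = AWQModifier(targets=["Linear"], scheme="W4A16_ASYM", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=512,
    num_calibration_samples=256,
)
model.save_pretrained("Mistral-Small-24B-Instruct-2501-AWQ", save_compressed=True)
tokenizer.save_pretrained("Mistral-Small-24B-Instruct-2501-AWQ")
```

Running this requires a GPU with enough memory to hold the bf16 model during calibration.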
The weights are saved in the compressed-tensors format, which vLLM supports natively; no separate `autoawq` package is needed.

**W4A16_ASYM** means weights are stored as 4-bit integers while activations remain in 16-bit (fp16/bf16) during inference. Groups of 128 consecutive weights share a single scale and zero-point, and asymmetric quantization gives each group an independent zero-point, which covers skewed weight distributions better than a symmetric scheme.
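To make the scheme concrete, here is a minimal NumPy sketch of asymmetric group quantization. It is illustrative only: real kernels pack two 4-bit values per byte, and the stored scale and zero-point formats differ.

```python
import numpy as np

def quantize_group_asym(w, bits=4):
    """Asymmetric quantization of one weight group: a single scale and
    zero-point are shared by every weight in the group."""
    qmax = 2**bits - 1                       # 15 for 4-bit
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / qmax           # full range maps onto [0, 15]
    zero_point = np.round(-w_min / scale)    # integer that represents zero
    q = np.clip(np.round(w / scale) + zero_point, 0, qmax)
    return q.astype(np.uint8), scale, zero_point

def dequantize_group(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# One group of 128 weights with a skewed (non-zero-mean) distribution,
# where an asymmetric zero-point helps most
rng = np.random.default_rng(0)
w = rng.normal(loc=0.02, scale=0.01, size=128).astype(np.float32)

q, scale, zp = quantize_group_asym(w)
w_hat = dequantize_group(q, scale, zp)
print(f"max abs error: {np.abs(w - w_hat).max():.6f}")
```

The per-weight reconstruction error is bounded by roughly half a quantization step, which is why smaller group sizes (tighter min/max per group) trade metadata overhead for accuracy.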
## Usage

Serve with vLLM:

```shell
vllm serve dark-side-of-the-code/Mistral-Small-24B-Instruct-2501-AWQ --dtype auto
```

Then query it through the OpenAI-compatible API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
response = client.chat.completions.create(
    model="dark-side-of-the-code/Mistral-Small-24B-Instruct-2501-AWQ",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
## Limitations
- Quantization introduces a small accuracy degradation compared to the bf16 base model.
- This model inherits all limitations and intended-use restrictions from the base model. Refer to the base model card for details.
## License

Mistral Small is released under the Mistral Research License (MRL-0.1). These quantized weights are a derivative work; verify that your intended use complies with the license before use.
## Model tree

Base model: mistralai/Mistral-Small-24B-Base-2501