mattbucci
/

Devstral-24B-AWQ

4-bit precision

Model card Files Files and versions

Devstral-24B-AWQ / README.md

mattbucci's picture

Vision tested and working

b68f4c9 verified 23 days ago

|

history blame contribute delete

1.72 kB

	---
	base_model: mistralai/Devstral-Small-2507
	tags:
	- awq
	- 4-bit
	- rdna4
	- gfx1201
	- rocm
	- sglang
	- quantized
	license: apache-2.0
	---

	# Devstral-24B AWQ 4-bit

	AWQ 4-bit quantization of [Devstral Small 24B](https://huggingface.co/mistralai/Devstral-Small-2507) optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang).

	## Model Details

	\| \| \|
	\|---\|---\|
	\| Base model \| [mistralai/Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) \|
	\| Architecture \| Dense \|
	\| Parameters \| 24B \|
	\| Layers \| 40 \|
	\| Context \| 32K (tested), 393K (max) \|
	\| Quantization \| AWQ 4-bit, group_size=128 \|

	## Performance (2x AMD Radeon AI PRO R9700, TP=2)

	- Decode speed: 37 tok/s single-user on 2x R9700
	- Launch: `scripts/launch.sh devstral`

	## Notes

	GPTQ-calibrated with 128 samples. BOS token removed from chat template (fixes `<unk>` output). Text-only warmup to avoid radix cache pollution from vision tokens.

	## Known Limitations

	- Vision: WORKING. Vision tower weights preserved in original precision (`modules_to_not_convert` includes `vision_tower`, `multi_modal_projector`). Tested: correctly identifies a red square image.

	## Usage with SGLang

	```bash
	git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
	cd 2x-R9700-RDNA4-GFX1201-sglang-inference
	./scripts/setup.sh
	scripts/launch.sh devstral
	```

	See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks.

	## Hardware

	Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+34 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches.