| --- |
| base_model: mistralai/Devstral-Small-2507 |
| tags: |
| - awq |
| - 4-bit |
| - rdna4 |
| - gfx1201 |
| - rocm |
| - sglang |
| - quantized |
| license: apache-2.0 |
| --- |
| |
| # Devstral-24B AWQ 4-bit |
|
|
| AWQ 4-bit quantization of [Devstral Small 24B](https://huggingface.co/mistralai/Devstral-Small-2507) optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang). |
|
|
| ## Model Details |
|
|
| | | | |
| |---|---| |
| | **Base model** | [mistralai/Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) | |
| | **Architecture** | Dense | |
| | **Parameters** | 24B | |
| | **Layers** | 40 | |
| | **Context** | 32K (tested), 393K (max) | |
| | **Quantization** | AWQ 4-bit, group_size=128 | |
| |
| ## Performance (2x AMD Radeon AI PRO R9700, TP=2) |
| |
| - **Decode speed**: 37 tok/s single-user on 2x R9700 |
| - **Launch**: `scripts/launch.sh devstral` |
| |
| ## Notes |
| |
| GPTQ-calibrated with 128 samples. BOS token removed from chat template (fixes `<unk>` output). Text-only warmup to avoid radix cache pollution from vision tokens. |
| |
| ## Known Limitations |
| |
| - **Vision**: WORKING. Vision tower weights preserved in original precision (`modules_to_not_convert` includes `vision_tower`, `multi_modal_projector`). Tested: correctly identifies a red square image. |
| |
| ## Usage with SGLang |
| |
| ```bash |
| git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference |
| cd 2x-R9700-RDNA4-GFX1201-sglang-inference |
| ./scripts/setup.sh |
| scripts/launch.sh devstral |
| ``` |
| |
| See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks. |
| |
| ## Hardware |
| |
| Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+34 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches. |
| |