---
base_model: mistralai/Devstral-Small-2507
tags:
- awq
- 4-bit
- rdna4
- gfx1201
- rocm
- sglang
- quantized
license: apache-2.0
---

# Devstral-24B AWQ 4-bit

AWQ 4-bit quantization of [Devstral Small 24B](https://huggingface.co/mistralai/Devstral-Small-2507), optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang).

## Model Details

| | |
|---|---|
| **Base model** | [mistralai/Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) |
| **Architecture** | Dense |
| **Parameters** | 24B |
| **Layers** | 40 |
| **Context** | 32K (tested), 393K (max) |
| **Quantization** | AWQ 4-bit, group_size=128 |

## Performance (2x AMD Radeon AI PRO R9700, TP=2)

- **Decode speed**: 37 tok/s, single user
- **Launch**: `scripts/launch.sh devstral`

## Notes

- GPTQ-calibrated with 128 samples.
- BOS token removed from the chat template (fixes stray `<s>` output).
- Warmup is text-only, to avoid polluting the radix cache with vision tokens.

## Known Limitations

- **Vision**: WORKING. Vision tower weights are preserved in their original precision (`modules_to_not_convert` includes `vision_tower` and `multi_modal_projector`). Tested: the model correctly identifies a red square image.

## Usage with SGLang

```bash
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
./scripts/setup.sh
scripts/launch.sh devstral
```

See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks.

## Hardware

Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+32 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches.
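
As a rough sanity check on how the quantized weights relate to the tested VRAM, the arithmetic below estimates weight memory from the figures in Model Details (24B parameters, 4-bit, group_size=128). It is only a sketch: the per-group overhead (one 16-bit scale and one 4-bit zero point per group) is an assumption about the storage layout, not something this card states, and it treats all parameters as quantized even though the vision tower is kept in original precision. The result is therefore a lower bound on weight memory and excludes KV cache and activations.

```bash
#!/usr/bin/env bash
# Rough weight-memory estimate for a 4-bit, group_size=128 quantized model.
# Assumption (not stated in this card): each group of 128 weights stores
# one 16-bit scale and one 4-bit zero point alongside the 4-bit weights.
params=24000000000      # 24B parameters (from Model Details)
group_size=128          # from Model Details
weight_bits=4           # from Model Details
scale_bits=16           # assumed: fp16 scale per group
zero_bits=4             # assumed: 4-bit zero point per group

# Bits needed to store one group of weights plus its metadata.
bits_per_group=$(( group_size * weight_bits + scale_bits + zero_bits ))
groups=$(( params / group_size ))
total_bytes=$(( groups * bits_per_group / 8 ))
total_mb=$(( total_bytes / 1000000 ))   # decimal megabytes, rounded down
per_gpu_mb=$(( total_mb / 2 ))          # assumes TP=2 splits weights evenly

echo "bits per group: ${bits_per_group}"
echo "approx. weight memory: ${total_mb} MB total, ${per_gpu_mb} MB per GPU"
```

Under these assumptions the quantized weights come to roughly 12.5 GB in total, or about 6.2 GB per GPU with TP=2, which leaves most of each card's VRAM for KV cache and the unquantized vision modules.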