File size: 1,715 Bytes
b456643 c4c56b9 b68f4c9 c4c56b9 b456643 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | ---
base_model: mistralai/Devstral-Small-2507
tags:
- awq
- 4-bit
- rdna4
- gfx1201
- rocm
- sglang
- quantized
license: apache-2.0
---
# Devstral-24B AWQ 4-bit
AWQ 4-bit quantization of [Devstral Small 24B](https://huggingface.co/mistralai/Devstral-Small-2507) optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang).
## Model Details
| | |
|---|---|
| **Base model** | [mistralai/Devstral-Small-2507](https://huggingface.co/mistralai/Devstral-Small-2507) |
| **Architecture** | Dense |
| **Parameters** | 24B |
| **Layers** | 40 |
| **Context** | 32K (tested), 393K (max) |
| **Quantization** | AWQ 4-bit, group_size=128 |
## Performance (2x AMD Radeon AI PRO R9700, TP=2)
- **Decode speed**: 37 tok/s single-user on 2x R9700
- **Launch**: `scripts/launch.sh devstral`
## Notes
GPTQ-calibrated with 128 samples. BOS token removed from chat template (fixes `<unk>` output). Text-only warmup to avoid radix cache pollution from vision tokens.
## Known Limitations
- **Vision**: WORKING. Vision tower weights preserved in original precision (`modules_to_not_convert` includes `vision_tower`, `multi_modal_projector`). Tested: correctly identifies a red square image.
## Usage with SGLang
```bash
git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
cd 2x-R9700-RDNA4-GFX1201-sglang-inference
./scripts/setup.sh
scripts/launch.sh devstral
```
See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks.
## Hardware
Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+34 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches.
|