mattbucci commited on
Commit
7704836
·
verified ·
1 Parent(s): 5b529db

Add model card for Gemma 4 26B MoE AWQ 4-bit

Browse files
Files changed (1) hide show
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: google/gemma-4-26b-a4b-it
3
+ tags:
4
+ - awq
5
+ - 4-bit
6
+ - rdna4
7
+ - gfx1201
8
+ - rocm
9
+ - sglang
10
+ - quantized
11
+ license: apache-2.0
12
+ ---
13
+
14
+ # Gemma 4 26B MoE AWQ 4-bit
15
+
16
+ AWQ 4-bit quantization of [Gemma 4 26B-A4B-it](https://huggingface.co/google/gemma-4-26b-a4b-it) optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang).
17
+
18
+ ## Model Details
19
+
20
+ | | |
21
+ |---|---|
22
+ | **Base model** | [google/gemma-4-26b-a4b-it](https://huggingface.co/google/gemma-4-26b-a4b-it) |
23
+ | **Architecture** | MoE (128 experts, top-8) |
24
+ | **Parameters** | 26B total / 4B active |
25
+ | **Layers** | 30 |
26
+ | **Context** | 4K (tested) |
27
+ | **Quantization** | AWQ 4-bit, group_size=32. Forced-routing GPTQ calibration covers all 128 experts (standard GPTQ only calibrates ~1/128). |
28
+
29
+ ## Performance (2x AMD Radeon AI PRO R9700, TP=2)
30
+
31
+ - **Decode speed**: 30 tok/s single-user on 2x R9700
32
+ - **Launch**: `scripts/launch.sh gemma4`
33
+
34
+ ## Notes
35
+
36
+ Standard community GPTQ under-calibrates rare experts due to routing imbalance. This model uses forced-routing calibration to ensure all 128 experts are properly quantized.
37
+
38
+ ## Usage with SGLang
39
+
40
+ ```bash
41
+ git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
42
+ cd 2x-R9700-RDNA4-GFX1201-sglang-inference
43
+ ./scripts/setup.sh
44
+ scripts/launch.sh gemma4
45
+ ```
46
+
47
+ See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks.
48
+
49
+ ## Hardware
50
+
51
+ Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+34 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches.