Update vision status: untestable due to server crash

d47e216 verified 27 days ago

1.99 kB

	---
	base_model: google/gemma-4-26b-a4b-it
	tags:
	- awq
	- 4-bit
	- rdna4
	- gfx1201
	- rocm
	- sglang
	- quantized
	license: apache-2.0
	---

	# Gemma 4 26B MoE AWQ 4-bit

	AWQ 4-bit quantization of [Gemma 4 26B-A4B-it](https://huggingface.co/google/gemma-4-26b-a4b-it) optimized for AMD RDNA4 (gfx1201) inference with [SGLang](https://github.com/sgl-project/sglang).

	## Model Details

	\| \| \|
	\|---\|---\|
	\| Base model \| [google/gemma-4-26b-a4b-it](https://huggingface.co/google/gemma-4-26b-a4b-it) \|
	\| Architecture \| MoE (128 experts, top-8) \|
	\| Parameters \| 26B total / 4B active \|
	\| Layers \| 30 \|
	\| Context \| 4K (tested) \|
	\| Quantization \| AWQ 4-bit, group_size=32. Forced-routing GPTQ calibration covers all 128 experts (standard GPTQ only calibrates ~1/128). \|

	## Performance (2x AMD Radeon AI PRO R9700, TP=2)

	- Decode speed: 30 tok/s single-user on 2x R9700
	- Launch: `scripts/launch.sh gemma4`

	## Notes

	Standard community GPTQ under-calibrates rare experts due to routing imbalance. This model uses forced-routing calibration to ensure all 128 experts are properly quantized.

	## Known Limitations

	- Vision: UNTESTABLE — Vision encoder layers (`embed_vision.`) were quantized to INT4, which likely degrades vision quality. Server crashes on first request (pre-existing RDNA4 triton issue with this model's SWA configuration, not vision-specific). Text-only inference recommended.* A future version should add vision layers to `modules_to_not_convert`.

	## Usage with SGLang

	```bash
	git clone https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference
	cd 2x-R9700-RDNA4-GFX1201-sglang-inference
	./scripts/setup.sh
	scripts/launch.sh gemma4
	```

	See the [RDNA4 Inference Repository](https://github.com/mattbucci/2x-R9700-RDNA4-GFX1201-sglang-inference) for full setup instructions, patches, and benchmarks.

	## Hardware

	Tested on 2x AMD Radeon AI PRO R9700 (gfx1201, RDNA4, 32+34 GB VRAM) with ROCm 7.2 and SGLang v0.5.10 + RDNA4 patches.