Add known limitations (vision status)
README.md
CHANGED
@@ -35,6 +35,10 @@ AWQ 4-bit quantization of [Gemma 4 26B-A4B-it](https://huggingface.co/google/gem
Standard community GPTQ under-calibrates rare experts due to routing imbalance. This model uses forced-routing calibration to ensure all 128 experts are properly quantized.

+## Known Limitations
+
+- **Vision: BROKEN.** The vision encoder layers (`embed_vision.*`) were quantized to INT4, which likely degrades or destroys vision quality. **Use this model for text-only inference.** A future version should add the vision layers to `modules_to_not_convert`.
+
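As a minimal sketch of the fix suggested in the limitation note, the exclusion list could be built by collecting every module whose dotted name contains an `embed_vision` component and supplying the result under `modules_to_not_convert`. The helper name `vision_modules_to_skip`, the sample module names, and the possibility of a parent prefix such as `model.` are assumptions made for illustration. Only the `embed_vision` name and the `modules_to_not_convert` key come from the note itself.

```python
from typing import Iterable, List


def vision_modules_to_skip(
    module_names: Iterable[str], component: str = "embed_vision"
) -> List[str]:
    """Return the sorted, de-duplicated names that contain `component`
    as a whole dotted path segment (e.g. "embed_vision.proj")."""
    skipped = {name for name in module_names if component in name.split(".")}
    return sorted(skipped)


# Hypothetical module names, for illustration only.
example_names = [
    "embed_vision.proj",
    "layers.0.mlp.gate",
    "embed_vision.norm",
]

# Shape of the setting that a quantization config would receive.
quant_overrides = {"modules_to_not_convert": vision_modules_to_skip(example_names)}
print(quant_overrides)
```

How this list is consumed depends on the quantization tool in use. The dictionary above only illustrates the shape of the setting, and the real module names should be read from the checkpoint rather than assumed.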
## Usage with SGLang
```bash