# Gemma-4-abliterated
NVFP4 quantization of huihui-ai/Huihui-gemma-4-E2B-it-abliterated, quantized using NVIDIA ModelOpt with NVFP4_MLP_ONLY strategy (only MLP layers quantized, attention preserved in higher precision).
## Model overview

| Item | Value |
|---|---|
| Architecture | Dense, Per-Layer Embeddings (PLE), ~2.3B effective parameters |
| Base model | google/gemma-4-E2B-it |
| Fine-tuned by | huihui-ai (abliteration) |
| Quantized by | YuYu1015 |
| Model size | ~7.4 GB (NVFP4) |
| Context length | Up to 128,000 tokens |
| Multimodal | Vision + Audio supported |
## Quantization

| Item | Value |
|---|---|
| Method | NVIDIA ModelOpt v0.42.0 |
| Scheme | NVFP4 (E2M1 + FP8 per-group scaling, group size 16) |
| Strategy | NVFP4_MLP_ONLY — only MLP/FFN layers quantized, all attention layers preserved |
| Calibration dataset | abisee/cnn_dailymail |
| Calibration samples | 512 |
| Hardware | NVIDIA DGX Spark (GB10, 128GB unified memory) |
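The NVFP4 scheme above (E2M1 values with one scale per group of 16) can be sketched in plain Python. This is an illustrative toy, not ModelOpt's API: the helper names are made up, and the FP8 (E4M3) encoding of the per-group scale itself is omitted for clarity.

```python
# Illustrative NVFP4 group quantization sketch (hypothetical helpers, not
# ModelOpt's API). E2M1 has 1 sign, 2 exponent, and 1 mantissa bit, giving
# these representable magnitudes:
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
GROUP_SIZE = 16

def quantize_group(values):
    """Quantize one group of 16 floats to signed E2M1 codes plus a scale."""
    assert len(values) == GROUP_SIZE
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / 6.0  # map the largest magnitude onto E2M1's max value, 6.0
    # (real NVFP4 additionally stores this scale in FP8 E4M3; float here)
    codes = []
    for v in values:
        # round |v|/scale to the nearest representable E2M1 magnitude
        mag = min(E2M1_MAGNITUDES, key=lambda c: abs(abs(v) / scale - c))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes

def dequantize_group(scale, codes):
    return [scale * c for c in codes]
```

Because the widest gap between adjacent E2M1 codes is 2 (between 4 and 6), the round-trip error per element is bounded by `scale * 1.0`.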
## Excluded layers (kept in higher precision)

| Layer | Reason |
|---|---|
| `self_attn.*` (all layers) | Attention layers preserved for accuracy (MLP_ONLY strategy) |
| `lm_head` | Output head |
| `vision_tower.*` | Vision encoder |
| `audio_tower.*` | Audio encoder |
| `multi_modal_projector.*` | Multimodal projection |
| `embed_tokens` | Input embeddings |
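The exclusion list above amounts to wildcard matching over module names. A minimal sketch of that selection logic, where the patterns are a paraphrase of the table (not ModelOpt's exact config keys):

```python
from fnmatch import fnmatch

# Layers kept in higher precision, paraphrasing the exclusion table above.
# The wildcard patterns are illustrative, not ModelOpt's exact config keys.
EXCLUDED_PATTERNS = [
    "*self_attn*",             # all attention layers (MLP_ONLY strategy)
    "*lm_head*",               # output head
    "*vision_tower*",          # vision encoder
    "*audio_tower*",           # audio encoder
    "*multi_modal_projector*", # multimodal projection
    "*embed_tokens*",          # input embeddings
]

def is_quantized(module_name: str) -> bool:
    """True if a module would be NVFP4-quantized under NVFP4_MLP_ONLY."""
    return not any(fnmatch(module_name, p) for p in EXCLUDED_PATTERNS)
```

Under this rule only the MLP/FFN projections (e.g. `mlp.gate_proj`, `mlp.up_proj`, `mlp.down_proj`) end up quantized, matching the strategy described above.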
## Serving with vLLM

```bash
vllm serve /path/to/model \
  --quantization modelopt \
  --served-model-name gemma-4-e2b \
  --trust-remote-code \
  --gpu-memory-utilization 0.90 \
  --max-model-len 32768 \
  --enable-prefix-caching \
  --enable-chunked-prefill \
  --language-model-only
```
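Once the server is up it exposes an OpenAI-compatible API. A minimal text-only request, assuming vLLM's default host and port; the model name matches `--served-model-name` above:

```python
import json
import urllib.request

# Assumes vLLM's default host/port; "gemma-4-e2b" matches --served-model-name.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "gemma-4-e2b",
    "messages": [
        {"role": "user", "content": "Summarize NVFP4 in one sentence."}
    ],
    "max_tokens": 128,
}

def chat(url=URL, body=payload):
    """POST a chat-completion request and return the assistant's reply text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```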
## Notes

- NVFP4 weights are decoded natively in hardware (the `cvt.e2m1x2` instruction).
- Pass `--quantization modelopt` (not `compressed-tensors`).
- `--language-model-only` skips vision/audio encoder profiling, speeding up text-only inference.
- To drop a stale page cache before loading: `sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'`.

**Disclaimer:** This model has safety filtering removed (abliterated) and may generate inappropriate content. Users are solely responsible for all consequences arising from its use.