Experimental global target bits-per-weight quantization of HauhauCS/GLM-4.7-Flash-Uncensored-HauhauCS-Aggressive

  • Using a non-standard (forked) llama.cpp branch for quantization.
  • Using a CLI tool to build the KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
  • Using dataset sources: tools, math, code, text_en, text_ru.
  • Using dataset chunks: 750.
  • Tensors quantized to F16 instead of BF16, which is friendlier to NVIDIA Pascal GPUs such as the P100.
  • A small set of patches applied.

Many thanks to Ed Addario for his impressive work.
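The KLD columns in the comparison table below measure how far each quant's next-token distribution drifts from the full-precision baseline. A toy sketch of how per-token KL divergence can be computed from two sets of logits (NumPy, synthetic data; this is an illustration of the metric, not the fork's actual implementation):

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def token_kld(base_logits, quant_logits):
    """Per-token KL divergence D_KL(P_base || P_quant), in nats."""
    logp = log_softmax(base_logits)
    logq = log_softmax(quant_logits)
    return (np.exp(logp) * (logp - logq)).sum(axis=-1)

# Toy data: 1000 token positions, vocabulary of 32.
rng = np.random.default_rng(0)
base = rng.normal(size=(1000, 32))
quant = base + rng.normal(scale=0.05, size=base.shape)  # fake quantization noise

kld = token_kld(base, quant)
print(f"mean {kld.mean():.6f}  median {np.median(kld):.6f}  "
      f"max {kld.max():.6f}  99.9% {np.quantile(kld, 0.999):.6f}")
```

Summary statistics over these per-token values (mean, median, maximum, 99.9th percentile) are what the table reports.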

Quantization comparison

| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Median KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|--------:|----------------:|---------------:|-----:|---------:|-----------:|------------:|----------:|--------:|-------:|
| 3.50 | 93.18% | 1.205301 ± 0.003693 | 3.066378 ± 0.059410 | 0.346790 ± 0.003130 | 0.113532 | 36.079124 | 17.036463 | -1.548 ± 0.027 % | 12.313 ± 0.063 % |
| 4.00 | 92.50% | 1.081636 ± 0.003399 | 1.219317 ± 0.050468 | 0.350924 ± 0.003492 | 0.094924 | 35.051384 | 19.111687 | -0.644 ± 0.027 % | 11.874 ± 0.066 % |
| 4.50 | 93.92% | 1.159525 ± 0.003402 | 2.382673 ± 0.054462 | 0.212138 ± 0.003037 | 0.035486 | 37.016945 | 19.404484 | -0.722 ± 0.019 % | 8.384 ± 0.070 % |
| 5.00 | 94.32% | 1.030399 ± 0.002807 | 0.454046 ± 0.041667 | 0.223456 ± 0.003202 | 0.029363 | 33.802094 | 20.041710 | -0.307 ± 0.018 % | 7.986 ± 0.072 % |
| 5.50 | 93.59% | 0.970038 ± 0.002771 | -0.447509 ± 0.042194 | 0.234948 ± 0.003451 | 0.024535 | 32.256840 | 19.587420 | 0.123 ± 0.020 % | 8.691 ± 0.081 % |
| 6.00 | 96.85% | 1.028155 ± 0.002101 | 0.420519 ± 0.031506 | 0.107335 ± 0.002290 | 0.008626 | 36.048412 | 17.149174 | -0.060 ± 0.012 % | 5.211 ± 0.072 % |
| 6.50 | 97.55% | 1.037597 ± 0.001880 | 0.561555 ± 0.028552 | 0.080116 ± 0.001919 | 0.007975 | 32.534607 | 14.545952 | -0.128 ± 0.011 % | 4.691 ± 0.069 % |
| 7.00 | 96.92% | 1.015746 ± 0.002049 | 0.235178 ± 0.030606 | 0.099637 ± 0.002328 | 0.003383 | 35.733624 | 17.560083 | 0.007 ± 0.010 % | 4.330 ± 0.078 % |
| 7.50 | 97.57% | 1.030245 ± 0.001857 | 0.451735 ± 0.028043 | 0.067106 ± 0.001900 | 0.002916 | 37.250160 | 14.828805 | -0.069 ± 0.009 % | 3.827 ± 0.077 % |
| 8.00 | 97.42% | 1.022089 ± 0.001892 | 0.329923 ± 0.028377 | 0.077545 ± 0.002062 | 0.002760 | 33.393574 | 16.344349 | -0.035 ± 0.009 % | 3.835 ± 0.077 % |
| 8.50 | 97.26% | 1.026630 ± 0.001957 | 0.397746 ± 0.029396 | 0.082303 ± 0.002123 | 0.002307 | 31.664230 | 17.153591 | -0.049 ± 0.009 % | 3.815 ± 0.079 % |
| 9.00 | 98.34% | 1.019983 ± 0.001520 | 0.298461 ± 0.022937 | 0.044058 ± 0.001500 | 0.001003 | 35.295940 | 10.766310 | -0.013 ± 0.007 % | 3.132 ± 0.075 % |
| 9.50 | 98.27% | 1.010330 ± 0.001530 | 0.154286 ± 0.022915 | 0.051995 ± 0.001669 | 0.000858 | 32.079849 | 13.546452 | 0.032 ± 0.007 % | 3.195 ± 0.078 % |
| 10.00 | 98.40% | 1.013286 ± 0.001478 | 0.198433 ± 0.022200 | 0.044456 ± 0.001528 | 0.000833 | 31.551548 | 12.255160 | 0.002 ± 0.007 % | 3.022 ± 0.076 % |
| 10.50 | 98.30% | 1.012990 ± 0.001525 | 0.194020 ± 0.022882 | 0.047429 ± 0.001597 | 0.000826 | 33.701038 | 13.457508 | 0.019 ± 0.007 % | 3.073 ± 0.078 % |
| 11.00 | 98.35% | 1.019113 ± 0.001514 | 0.285470 ± 0.022865 | 0.042238 ± 0.001490 | 0.000819 | 31.194330 | 11.399836 | 0.001 ± 0.006 % | 2.878 ± 0.075 % |
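To gauge which BPW target fits your hardware, a rough file-size estimate is params × BPW / 8 bytes. A minimal sketch, assuming the ~30B parameter count from this card and ignoring GGUF metadata overhead and any tensors kept unquantized (real files run slightly larger):

```python
PARAMS = 30e9  # ~30B parameters, per this model card

def est_size_gib(bpw: float, params: float = PARAMS) -> float:
    """Approximate GGUF file size in GiB for a global bits-per-weight target."""
    return params * bpw / 8 / 2**30

for bpw in (3.5, 4.0, 6.0, 8.0):
    print(f"{bpw:>4.1f} BPW -> ~{est_size_gib(bpw):.1f} GiB")
```

For example, the 8.00 BPW quant works out to roughly 28 GiB before overhead.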
Format: GGUF · Model size: 30B params · Architecture: deepseek2

Model: ENOSYS/GLM-4.7-Flash-Uncensored-750-v1-GGUF