Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-4B
- Using a non-standard (forked) llama.cpp branch for quantization.
- Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
- Using dataset sources: text_en, text_ru.
- Using dataset chunks: 750.
- Small set of patches added.
- Tensors kept at 16-bit precision are stored as F16 instead of BF16, which suits NVIDIA Pascal GPUs such as the P100 (Pascal has no native BF16 support).
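Storing 16-bit tensors as F16 rather than BF16 trades range for precision: BF16 is simply the upper half of an IEEE float32, while F16 overflows above 65504. The minimal NumPy sketch below shows a BF16 to F16 conversion that reports how many values overflow. It assumes raw BF16 bit patterns in a `uint16` array. The function names are hypothetical, and this is not the fork's conversion code.

```python
import numpy as np

def bf16_bits_to_f32(bits: np.ndarray) -> np.ndarray:
    # BF16 is the upper 16 bits of a float32: widen to uint32, shift, reinterpret.
    return (bits.astype(np.uint32) << np.uint32(16)).view(np.float32)

def bf16_to_f16(bits: np.ndarray):
    """Convert BF16 bit patterns to F16; return the array and the overflow count."""
    f32 = bf16_bits_to_f32(bits)
    with np.errstate(over="ignore", invalid="ignore"):
        f16 = f32.astype(np.float16)  # values beyond +/-65504 become +/-inf
    overflow = np.isfinite(f32) & ~np.isfinite(f16)
    return f16, int(overflow.sum())
```

For example, the BF16 patterns `0x3F80` (1.0) and `0x4780` (65536.0) convert to F16 `1.0` and `inf`, with one overflow reported.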
Many thanks to Ed Addario for an impressive job.
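A global bits-per-weight target is presumably an average over all weights in the model, so individual tensors can use different GGUF types as long as the parameter-weighted mean matches the target. The sketch below computes that mean. The per-type widths come from the standard llama.cpp block layouts (Q4_K: 144 bytes per 256 weights, Q6_K: 210 bytes per 256, Q8_0: 34 bytes per 32). The tensor list is hypothetical, and how the fork picks per-tensor types is not shown.

```python
# Bits per weight of some llama.cpp GGUF formats: block bytes * 8 / weights per block.
BPW = {
    "Q4_K": 144 * 8 / 256,  # 4.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q8_0": 34 * 8 / 32,    # 8.5
    "F16": 16.0,
}

def global_bpw(tensors):
    """Parameter-weighted mean BPW over (num_weights, gguf_type) pairs."""
    total_bits = sum(n * BPW[t] for n, t in tensors)
    total_weights = sum(n for n, _ in tensors)
    return total_bits / total_weights

# Hypothetical model: two equally sized tensors at Q4_K and Q8_0.
print(global_bpw([(1_000_000, "Q4_K"), (1_000_000, "Q8_0")]))  # 6.5
```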
Quantization comparison
| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 88.91% | 1.059237 ± 0.005791 | 17.814284 ± 1.729713 | 1.168223 ± 0.005542 | 42.036110 | 25.354170 | -1.391 ± 0.037 % | 17.793 ± 0.064 % |
| 4.00 | 91.97% | 1.124329 ± 0.005392 | 37.389267 ± 1.678855 | 0.809469 ± 0.004837 | 44.454895 | 24.103703 | -0.450 ± 0.030 % | 14.205 ± 0.061 % |
| 4.50 | 94.56% | 1.312286 ± 0.005436 | 93.912806 ± 2.084847 | 0.449165 ± 0.003776 | 36.050694 | 23.000446 | -0.433 ± 0.022 % | 10.308 ± 0.056 % |
| 5.00 | 95.14% | 1.287066 ± 0.005078 | 86.328447 ± 1.959110 | 0.353164 ± 0.003435 | 40.124615 | 22.546082 | -0.282 ± 0.019 % | 8.891 ± 0.056 % |
| 5.50 | 95.95% | 1.185050 ± 0.004268 | 55.649627 ± 1.544628 | 0.234975 ± 0.002845 | 35.043465 | 20.616793 | 0.127 ± 0.015 % | 7.105 ± 0.054 % |
| 6.00 | 96.31% | 1.181371 ± 0.004077 | 54.543325 ± 1.493333 | 0.187329 ± 0.002574 | 34.484653 | 20.085846 | 0.138 ± 0.013 % | 6.297 ± 0.056 % |
| 6.50 | 96.58% | 1.192068 ± 0.004003 | 57.760093 ± 1.508316 | 0.156484 ± 0.002392 | 32.430752 | 19.101099 | 0.152 ± 0.012 % | 5.546 ± 0.056 % |
| 7.00 | 96.60% | 1.212270 ± 0.004081 | 63.835350 ± 1.579181 | 0.146085 ± 0.002389 | 33.821136 | 19.701403 | 0.111 ± 0.011 % | 5.325 ± 0.057 % |
| 7.50 | 96.63% | 1.208265 ± 0.004054 | 62.630910 ± 1.564522 | 0.139075 ± 0.002314 | 36.313965 | 19.207874 | 0.110 ± 0.011 % | 5.203 ± 0.057 % |
| 8.00 | 96.68% | 1.209469 ± 0.004038 | 62.993184 ± 1.565838 | 0.134810 ± 0.002289 | 34.888683 | 19.164957 | 0.137 ± 0.011 % | 5.067 ± 0.058 % |
| 8.50 | 96.75% | 1.212194 ± 0.004012 | 63.812622 ± 1.568517 | 0.125588 ± 0.002229 | 36.403027 | 18.989708 | 0.131 ± 0.010 % | 4.896 ± 0.057 % |
| 9.00 | 96.76% | 1.204192 ± 0.003980 | 61.406117 ± 1.541008 | 0.123881 ± 0.002209 | 36.165089 | 18.446331 | 0.154 ± 0.010 % | 4.867 ± 0.058 % |
| 9.50 | 96.74% | 1.206242 ± 0.003997 | 62.022543 ± 1.550053 | 0.123900 ± 0.002210 | 36.027378 | 18.774128 | 0.146 ± 0.010 % | 4.872 ± 0.057 % |
| 10.00 | 96.75% | 1.203023 ± 0.003979 | 61.054587 ± 1.537726 | 0.123887 ± 0.002199 | 36.569180 | 18.522949 | 0.153 ± 0.010 % | 4.819 ± 0.057 % |
| 10.50 | 96.74% | 1.210538 ± 0.004012 | 63.314626 ± 1.564145 | 0.122872 ± 0.002194 | 36.219513 | 18.574490 | 0.138 ± 0.010 % | 4.842 ± 0.058 % |
| 11.00 | 96.74% | 1.213551 ± 0.004025 | 64.220636 ± 1.575624 | 0.122960 ± 0.002205 | 37.248238 | 18.881664 | 0.125 ± 0.010 % | 4.840 ± 0.057 % |
| 11.50 | 96.75% | 1.209483 ± 0.004002 | 62.997316 ± 1.559427 | 0.123296 ± 0.002196 | 36.439632 | 18.905373 | 0.137 ± 0.010 % | 4.850 ± 0.057 % |
| 12.00 | 96.75% | 1.207165 ± 0.003990 | 62.300073 ± 1.550363 | 0.123031 ± 0.002189 | 36.319935 | 18.708921 | 0.141 ± 0.010 % | 4.828 ± 0.057 % |
| 12.50 | 96.73% | 1.203487 ± 0.003989 | 61.194047 ± 1.540763 | 0.122924 ± 0.002186 | 36.546139 | 18.402393 | 0.157 ± 0.010 % | 4.856 ± 0.058 % |
| 13.00 | 96.73% | 1.207328 ± 0.004005 | 62.349166 ± 1.554780 | 0.122282 ± 0.002186 | 34.890934 | 18.439240 | 0.147 ± 0.010 % | 4.846 ± 0.058 % |
| 13.50 | 96.73% | 1.201897 ± 0.003983 | 60.715976 ± 1.534983 | 0.123082 ± 0.002199 | 35.561474 | 18.604710 | 0.150 ± 0.010 % | 4.833 ± 0.058 % |
| 14.00 | 96.76% | 1.206603 ± 0.003988 | 62.131097 ± 1.548757 | 0.122074 ± 0.002183 | 36.555859 | 18.393436 | 0.148 ± 0.010 % | 4.865 ± 0.059 % |
| 14.50 | 96.77% | 1.207201 ± 0.003984 | 62.311014 ± 1.549158 | 0.120715 ± 0.002176 | 37.773457 | 18.576935 | 0.131 ± 0.010 % | 4.763 ± 0.057 % |
| 15.00 | 96.75% | 1.207969 ± 0.004000 | 62.541896 ± 1.555455 | 0.123216 ± 0.002222 | 36.987923 | 19.083401 | 0.150 ± 0.010 % | 4.777 ± 0.057 % |
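The columns compare the quantized model's token distributions against the unquantized model's on the same text. A minimal NumPy sketch of the point estimates follows. It assumes per-token logits from both models and reference token ids. It also assumes KLD is KL(P_base ‖ P_quant) per token and Δp is the change in probability assigned to the reference token. Uncertainties (±) and the PPL correlation column are omitted. This is an illustration, not the fork's evaluation code.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def quant_metrics(base_logits, quant_logits, targets):
    """base_logits, quant_logits: [n_tokens, vocab]; targets: [n_tokens] token ids."""
    lp_base = log_softmax(base_logits)
    lp_quant = log_softmax(quant_logits)
    # Per-token KL divergence of the quantized distribution from the base one.
    kld = (np.exp(lp_base) * (lp_base - lp_quant)).sum(axis=-1)
    idx = np.arange(len(targets))
    nll_base = -lp_base[idx, targets]
    nll_quant = -lp_quant[idx, targets]
    # Change in probability of the reference token, quantized minus base.
    dp = np.exp(-nll_quant) - np.exp(-nll_base)
    return {
        "ppl_ratio": float(np.exp(nll_quant.mean() - nll_base.mean())),
        "delta_ppl": float(np.exp(nll_quant.mean()) - np.exp(nll_base.mean())),
        "mean_kld": float(kld.mean()),
        "max_kld": float(kld.max()),
        "kld_99_9": float(np.percentile(kld, 99.9)),
        "mean_dp_pct": float(100 * dp.mean()),
        "rms_dp_pct": float(100 * np.sqrt((dp ** 2).mean())),
    }
```

Identical logits yield a KLD of 0 and a PPL ratio of 1. Any perturbation of the quantized logits raises the mean KLD above 0.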