Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-4B

  • Using a non-standard (forked) llama.cpp branch for quantization.
  • Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
  • Using dataset sources: text_en, text_ru.
  • Using dataset chunks: 750.
  • Small set of patches added.
  • Tensor quantization uses F16 instead of BF16, which is friendlier to the Nvidia Pascal architecture (e.g. P100).
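
The F16 versus BF16 choice in the list above can be illustrated with a small sketch. Pascal-generation GPUs predate hardware BF16 support, while the P100 handles F16 natively, which is presumably the motivation here. The helper names (`to_f16`, `to_bf16`, `fits_f16`) are hypothetical and only emulate the two formats with the Python standard library; they are not part of llama.cpp.

```python
import struct

def to_f16(x: float) -> float:
    """Round-trip x through IEEE 754 half precision (5 exponent bits, 10 mantissa bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16(x: float) -> float:
    """Round-trip x through bfloat16, emulated as the top 16 bits of a float32.

    Intended for ordinary finite values only (no NaN/inf handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    bits &= 0xFFFF0000                   # keep sign, 8 exponent bits, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fits_f16(x: float) -> bool:
    """Return True if x is representable in F16 without overflowing."""
    try:
        struct.pack("<e", x)
        return True
    except (OverflowError, struct.error):
        return False

if __name__ == "__main__":
    w = 0.1234567  # an illustrative weight value, not taken from the model
    print("F16 error :", abs(to_f16(w) - w))
    print("BF16 error:", abs(to_bf16(w) - w))
    print("1e5 fits in F16:", fits_f16(1e5))
```

The trade-off is that F16 keeps more mantissa bits (finer precision for typical weight magnitudes) but has a much narrower range than BF16, so values above 65504 overflow.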

Many thanks to Ed Addario for an impressive job.

Quantization comparison

| BPW/TGS | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|---|---|---|---|---|---|---|---|---|
| 3.50 | 88.91% | 1.059237 ± 0.005791 | 17.814284 ± 1.729713 | 1.168223 ± 0.005542 | 42.036110 | 25.354170 | -1.391 ± 0.037 % | 17.793 ± 0.064 % |
| 4.00 | 91.97% | 1.124329 ± 0.005392 | 37.389267 ± 1.678855 | 0.809469 ± 0.004837 | 44.454895 | 24.103703 | -0.450 ± 0.030 % | 14.205 ± 0.061 % |
| 4.50 | 94.56% | 1.312286 ± 0.005436 | 93.912806 ± 2.084847 | 0.449165 ± 0.003776 | 36.050694 | 23.000446 | -0.433 ± 0.022 % | 10.308 ± 0.056 % |
| 5.00 | 95.14% | 1.287066 ± 0.005078 | 86.328447 ± 1.959110 | 0.353164 ± 0.003435 | 40.124615 | 22.546082 | -0.282 ± 0.019 % | 8.891 ± 0.056 % |
| 5.50 | 95.95% | 1.185050 ± 0.004268 | 55.649627 ± 1.544628 | 0.234975 ± 0.002845 | 35.043465 | 20.616793 | 0.127 ± 0.015 % | 7.105 ± 0.054 % |
| 6.00 | 96.31% | 1.181371 ± 0.004077 | 54.543325 ± 1.493333 | 0.187329 ± 0.002574 | 34.484653 | 20.085846 | 0.138 ± 0.013 % | 6.297 ± 0.056 % |
| 6.50 | 96.58% | 1.192068 ± 0.004003 | 57.760093 ± 1.508316 | 0.156484 ± 0.002392 | 32.430752 | 19.101099 | 0.152 ± 0.012 % | 5.546 ± 0.056 % |
| 7.00 | 96.60% | 1.212270 ± 0.004081 | 63.835350 ± 1.579181 | 0.146085 ± 0.002389 | 33.821136 | 19.701403 | 0.111 ± 0.011 % | 5.325 ± 0.057 % |
| 7.50 | 96.63% | 1.208265 ± 0.004054 | 62.630910 ± 1.564522 | 0.139075 ± 0.002314 | 36.313965 | 19.207874 | 0.110 ± 0.011 % | 5.203 ± 0.057 % |
| 8.00 | 96.68% | 1.209469 ± 0.004038 | 62.993184 ± 1.565838 | 0.134810 ± 0.002289 | 34.888683 | 19.164957 | 0.137 ± 0.011 % | 5.067 ± 0.058 % |
| 8.50 | 96.75% | 1.212194 ± 0.004012 | 63.812622 ± 1.568517 | 0.125588 ± 0.002229 | 36.403027 | 18.989708 | 0.131 ± 0.010 % | 4.896 ± 0.057 % |
| 9.00 | 96.76% | 1.204192 ± 0.003980 | 61.406117 ± 1.541008 | 0.123881 ± 0.002209 | 36.165089 | 18.446331 | 0.154 ± 0.010 % | 4.867 ± 0.058 % |
| 9.50 | 96.74% | 1.206242 ± 0.003997 | 62.022543 ± 1.550053 | 0.123900 ± 0.002210 | 36.027378 | 18.774128 | 0.146 ± 0.010 % | 4.872 ± 0.057 % |
| 10.00 | 96.75% | 1.203023 ± 0.003979 | 61.054587 ± 1.537726 | 0.123887 ± 0.002199 | 36.569180 | 18.522949 | 0.153 ± 0.010 % | 4.819 ± 0.057 % |
| 10.50 | 96.74% | 1.210538 ± 0.004012 | 63.314626 ± 1.564145 | 0.122872 ± 0.002194 | 36.219513 | 18.574490 | 0.138 ± 0.010 % | 4.842 ± 0.058 % |
| 11.00 | 96.74% | 1.213551 ± 0.004025 | 64.220636 ± 1.575624 | 0.122960 ± 0.002205 | 37.248238 | 18.881664 | 0.125 ± 0.010 % | 4.840 ± 0.057 % |
| 11.50 | 96.75% | 1.209483 ± 0.004002 | 62.997316 ± 1.559427 | 0.123296 ± 0.002196 | 36.439632 | 18.905373 | 0.137 ± 0.010 % | 4.850 ± 0.057 % |
| 12.00 | 96.75% | 1.207165 ± 0.003990 | 62.300073 ± 1.550363 | 0.123031 ± 0.002189 | 36.319935 | 18.708921 | 0.141 ± 0.010 % | 4.828 ± 0.057 % |
| 12.50 | 96.73% | 1.203487 ± 0.003989 | 61.194047 ± 1.540763 | 0.122924 ± 0.002186 | 36.546139 | 18.402393 | 0.157 ± 0.010 % | 4.856 ± 0.058 % |
| 13.00 | 96.73% | 1.207328 ± 0.004005 | 62.349166 ± 1.554780 | 0.122282 ± 0.002186 | 34.890934 | 18.439240 | 0.147 ± 0.010 % | 4.846 ± 0.058 % |
| 13.50 | 96.73% | 1.201897 ± 0.003983 | 60.715976 ± 1.534983 | 0.123082 ± 0.002199 | 35.561474 | 18.604710 | 0.150 ± 0.010 % | 4.833 ± 0.058 % |
| 14.00 | 96.76% | 1.206603 ± 0.003988 | 62.131097 ± 1.548757 | 0.122074 ± 0.002183 | 36.555859 | 18.393436 | 0.148 ± 0.010 % | 4.865 ± 0.059 % |
| 14.50 | 96.77% | 1.207201 ± 0.003984 | 62.311014 ± 1.549158 | 0.120715 ± 0.002176 | 37.773457 | 18.576935 | 0.131 ± 0.010 % | 4.763 ± 0.057 % |
| 15.00 | 96.75% | 1.207969 ± 0.004000 | 62.541896 ± 1.555455 | 0.123216 ± 0.002222 | 36.987923 | 19.083401 | 0.150 ± 0.010 % | 4.777 ± 0.057 % |
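
The column names above resemble the KL-divergence statistics printed by llama.cpp's perplexity tool, so the sketch below assumes those definitions. KLD is the per-token KL divergence from the baseline to the quantized distribution, Δp is the change in probability assigned to the correct token, and ΔPPL is the quantized PPL minus the baseline PPL. All function names and the toy distributions are hypothetical. The last helper only rearranges the table's own numbers: if ΔPPL = PPL(Q) − PPL(base) and the ratio is PPL(Q)/PPL(base), then PPL(base) = ΔPPL / (ratio − 1), which gives roughly 300.7 for both the 3.50 and 4.00 rows.

```python
import math

def kl_divergence(p_base, p_quant):
    """KL(base || quant) in nats for one token position.

    Assumes p_quant is strictly positive wherever p_base is positive.
    """
    return sum(p * math.log(p / q) for p, q in zip(p_base, p_quant) if p > 0.0)

def summarize(base_dists, quant_dists, correct_ids):
    """Aggregate per-token statistics in the spirit of the table columns."""
    klds = [kl_divergence(p, q) for p, q in zip(base_dists, quant_dists)]
    # Change in probability of the correct token, quantized minus baseline.
    dps = [q[t] - p[t] for p, q, t in zip(base_dists, quant_dists, correct_ids)]
    n = len(klds)
    return {
        "mean_kld": sum(klds) / n,
        "max_kld": max(klds),
        "mean_dp_pct": 100.0 * sum(dps) / n,
        "rms_dp_pct": 100.0 * math.sqrt(sum(d * d for d in dps) / n),
    }

def baseline_ppl(ppl_ratio, delta_ppl):
    """Recover the baseline PPL implied by a (ratio, ΔPPL) pair."""
    return delta_ppl / (ppl_ratio - 1.0)

if __name__ == "__main__":
    # Toy 3-token vocabulary over two positions; not real model output.
    base = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    quant = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
    print(summarize(base, quant, correct_ids=[0, 1]))
    # Central values from the 3.50 and 4.00 BPW rows of the table.
    print(baseline_ppl(1.059237, 17.814284))
    print(baseline_ppl(1.124329, 37.389267))
```

Under these assumptions, Mean KLD and RMS Δp are the more direct measures of how far the quantized model drifts from the baseline. This matches the table, where both fall steadily up to about 8.50 BPW and then flatten, while the PPL ratio does not move monotonically.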
GGUF · Model size: 4B params · Architecture: qwen3