Experimental global target bits-per-weight quantization of unsloth/Qwen3-Embedding-0.6B

  • Using a non-standard (forked) llama.cpp branch for quantization.
  • Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
  • Using dataset sources: tools, text_en, text_ru.
  • Using dataset chunks: 250.
  • Tensors quantized from F16 instead of BF16, keeping the files friendly to NVIDIA Pascal-architecture GPUs such as the P100.
  • A small set of patches applied on top.

Many thanks to Ed Addario for an impressive job.
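In stock llama.cpp, imatrix-based quantization of this kind is driven by two tools: `llama-imatrix` to build the importance matrix from calibration text, and `llama-quantize` to apply it. A hedged sketch follows; the file paths are placeholders, `Q5_K_M` merely stands in for a target (the global bits-per-weight targeting used here comes from the fork and is not a stock llama.cpp flag), and the forked branch may expose different options:

```shell
# Build an importance matrix from the calibration dataset
# (paths are hypothetical placeholders).
./llama-imatrix -m Qwen3-Embedding-0.6B-F16.gguf \
    -f calibration.txt -o imatrix.dat

# Quantize using the imatrix. Q5_K_M is only an illustrative type;
# the fork used for this repo targets a global bits-per-weight instead.
./llama-quantize --imatrix imatrix.dat \
    Qwen3-Embedding-0.6B-F16.gguf \
    Qwen3-Embedding-0.6B-quantized.gguf Q5_K_M
```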

Quantization comparison

| BPW | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|------|--------|----------------------|------------------------|----------------------|-----------|----------|------------------|-----------------|
| 5.00 | 97.51% | 1.229589 ± 0.004913 | 123.695152 ± 3.455547 | 0.254131 ± 0.001247 | 13.281865 | 3.295894 | -0.945 ± 0.030 % | 7.861 ± 0.073 % |
| 5.25 | 97.93% | 1.198584 ± 0.004387 | 106.990729 ± 3.089464 | 0.201051 ± 0.001020 | 16.191372 | 2.680833 | -0.726 ± 0.027 % | 7.070 ± 0.070 % |
| 5.30 | 98.05% | 1.199072 ± 0.004269 | 107.253986 ± 3.067627 | 0.181907 ± 0.000926 | 11.642469 | 2.371547 | -0.680 ± 0.026 % | 6.811 ± 0.067 % |
| 5.50 | 98.42% | 1.143577 ± 0.003665 | 77.354913 ± 2.488164 | 0.132384 ± 0.000693 | 9.056372 | 1.854777 | -0.483 ± 0.022 % | 5.860 ± 0.061 % |
| 5.75 | 98.67% | 1.107662 ± 0.003264 | 58.005137 ± 2.117964 | 0.097616 ± 0.000599 | 11.806184 | 1.765460 | -0.368 ± 0.019 % | 5.023 ± 0.062 % |
| 5.80 | 98.72% | 1.117510 ± 0.003238 | 63.310457 ± 2.168267 | 0.092513 ± 0.000578 | 10.540298 | 1.608747 | -0.386 ± 0.019 % | 4.906 ± 0.059 % |
| 6.00 | 98.71% | 1.125959 ± 0.003273 | 67.862792 ± 2.234984 | 0.092148 ± 0.000585 | 12.128985 | 1.559407 | -0.435 ± 0.019 % | 4.931 ± 0.061 % |
| 6.25 | 98.89% | 1.082024 ± 0.002912 | 44.191950 ± 1.831029 | 0.067634 ± 0.000431 | 7.849868 | 1.006397 | -0.264 ± 0.016 % | 4.210 ± 0.055 % |
| 6.30 | 98.94% | 1.089430 ± 0.002886 | 48.181944 ± 1.865568 | 0.062879 ± 0.000377 | 7.467944 | 0.950639 | -0.262 ± 0.016 % | 4.112 ± 0.055 % |
| 6.50 | 99.07% | 1.110195 ± 0.002773 | 59.369600 ± 1.962020 | 0.046909 ± 0.000273 | 5.759321 | 0.702734 | -0.293 ± 0.013 % | 3.510 ± 0.044 % |
| 6.75 | 99.22% | 1.071155 ± 0.002479 | 38.336145 ± 1.614146 | 0.027021 ± 0.000217 | 6.292867 | 0.423712 | -0.069 ± 0.011 % | 2.756 ± 0.051 % |
| 6.80 | 99.23% | 1.079566 ± 0.002483 | 42.867759 ± 1.667153 | 0.026098 ± 0.000205 | 6.737493 | 0.397282 | -0.098 ± 0.010 % | 2.714 ± 0.051 % |
| 7.00 | 99.24% | 1.083987 ± 0.002490 | 45.249386 ± 1.706124 | 0.023685 ± 0.000190 | 6.608492 | 0.377540 | -0.084 ± 0.010 % | 2.559 ± 0.049 % |
| 7.25 | 99.27% | 1.088304 ± 0.002447 | 47.575202 ± 1.717173 | 0.019738 ± 0.000130 | 2.865925 | 0.332578 | -0.090 ± 0.009 % | 2.290 ± 0.037 % |
| 7.30 | 99.28% | 1.084746 ± 0.002430 | 45.658594 ± 1.686292 | 0.019120 ± 0.000148 | 4.343514 | 0.295245 | -0.092 ± 0.009 % | 2.291 ± 0.048 % |
| 7.50 | 99.29% | 1.085544 ± 0.002411 | 46.088344 ± 1.685529 | 0.017551 ± 0.000142 | 4.905227 | 0.298486 | -0.078 ± 0.008 % | 2.215 ± 0.050 % |
| 7.75 | 99.32% | 1.091225 ± 0.002385 | 49.149079 ± 1.714694 | 0.014173 ± 0.000128 | 4.346512 | 0.237379 | -0.101 ± 0.008 % | 2.013 ± 0.052 % |
| 7.80 | 99.29% | 1.083209 ± 0.002410 | 44.830380 ± 1.669455 | 0.016992 ± 0.000133 | 3.100509 | 0.285571 | -0.077 ± 0.008 % | 2.162 ± 0.041 % |
| 8.00 | 99.31% | 1.080260 ± 0.002373 | 43.241571 ± 1.636024 | 0.015367 ± 0.000130 | 4.035001 | 0.269038 | -0.060 ± 0.008 % | 2.071 ± 0.049 % |
| 8.25 | 99.33% | 1.088309 ± 0.002352 | 47.578112 ± 1.683232 | 0.012024 ± 0.000105 | 4.370751 | 0.192501 | -0.089 ± 0.007 % | 1.860 ± 0.044 % |
| 8.30 | 99.34% | 1.081929 ± 0.002328 | 44.140973 ± 1.628892 | 0.011498 ± 0.000084 | 1.861967 | 0.182563 | -0.075 ± 0.007 % | 1.784 ± 0.037 % |
| 8.50 | 99.36% | 1.078487 ± 0.002286 | 42.286422 ± 1.587902 | 0.009095 ± 0.000081 | 2.475720 | 0.147575 | -0.064 ± 0.006 % | 1.635 ± 0.048 % |
| 8.75 | 99.37% | 1.078905 ± 0.002274 | 42.511791 ± 1.587059 | 0.007852 ± 0.000073 | 2.163608 | 0.118868 | -0.054 ± 0.006 % | 1.535 ± 0.049 % |
| 8.80 | 99.37% | 1.078538 ± 0.002273 | 42.314029 ± 1.583733 | 0.007742 ± 0.000079 | 2.609957 | 0.127498 | -0.053 ± 0.006 % | 1.533 ± 0.053 % |
| 9.00 | 99.37% | 1.077841 ± 0.002265 | 41.938514 ± 1.575722 | 0.007380 ± 0.000073 | 2.547039 | 0.125235 | -0.050 ± 0.006 % | 1.499 ± 0.053 % |
| 9.25 | 99.37% | 1.075351 ± 0.002256 | 40.596804 ± 1.555539 | 0.006905 ± 0.000066 | 2.041534 | 0.103068 | -0.042 ± 0.006 % | 1.448 ± 0.044 % |
| 9.30 | 99.37% | 1.071370 ± 0.002244 | 38.451639 ± 1.524417 | 0.006834 ± 0.000062 | 1.959199 | 0.105753 | -0.031 ± 0.006 % | 1.444 ± 0.040 % |
| 9.50 | 99.38% | 1.073834 ± 0.002244 | 39.779313 ± 1.540846 | 0.006470 ± 0.000062 | 1.991340 | 0.098428 | -0.041 ± 0.005 % | 1.400 ± 0.041 % |
| 9.75 | 99.38% | 1.074517 ± 0.002240 | 40.147460 ± 1.543907 | 0.006206 ± 0.000068 | 2.638701 | 0.101805 | -0.038 ± 0.005 % | 1.388 ± 0.053 % |
| 9.80 | 99.38% | 1.076663 ± 0.002244 | 41.303655 ± 1.559774 | 0.006146 ± 0.000069 | 2.405736 | 0.094807 | -0.044 ± 0.005 % | 1.401 ± 0.057 % |
| 10.00 | 99.38% | 1.076016 ± 0.002240 | 40.955259 ± 1.553864 | 0.005796 ± 0.000055 | 2.059185 | 0.095681 | -0.049 ± 0.005 % | 1.313 ± 0.039 % |
| 10.25 | 99.39% | 1.074654 ± 0.002230 | 40.220957 ± 1.540922 | 0.005526 ± 0.000061 | 2.114943 | 0.099634 | -0.043 ± 0.005 % | 1.315 ± 0.050 % |
| 10.30 | 99.39% | 1.072069 ± 0.002222 | 38.828737 ± 1.519978 | 0.005418 ± 0.000059 | 2.062696 | 0.094137 | -0.038 ± 0.005 % | 1.288 ± 0.047 % |
| 10.50 | 99.39% | 1.070241 ± 0.002212 | 37.843815 ± 1.503987 | 0.005140 ± 0.000048 | 1.621503 | 0.087822 | -0.032 ± 0.005 % | 1.218 ± 0.028 % |
| 10.75 | 99.39% | 1.066075 ± 0.002195 | 35.599290 ± 1.470596 | 0.004415 ± 0.000060 | 2.666409 | 0.074945 | -0.011 ± 0.004 % | 1.152 ± 0.049 % |
| 10.80 | 99.40% | 1.066699 ± 0.002195 | 35.935359 ± 1.473675 | 0.004566 ± 0.000051 | 2.669465 | 0.074244 | -0.025 ± 0.004 % | 1.109 ± 0.017 % |
| 11.00 | 99.40% | 1.071004 ± 0.002207 | 38.254894 ± 1.509298 | 0.004029 ± 0.000035 | 1.280535 | 0.060520 | -0.019 ± 0.004 % | 1.112 ± 0.045 % |
| 11.25 | 99.40% | 1.071390 ± 0.002201 | 38.462580 ± 1.508764 | 0.003687 ± 0.000024 | 0.614354 | 0.054238 | -0.025 ± 0.004 % | 0.995 ± 0.011 % |
| 11.30 | 99.40% | 1.071772 ± 0.002202 | 38.668624 ± 1.511744 | 0.003665 ± 0.000024 | 0.613789 | 0.055209 | -0.023 ± 0.004 % | 0.984 ± 0.011 % |
| 11.50 | 99.40% | 1.072891 ± 0.002200 | 39.271504 ± 1.518324 | 0.003498 ± 0.000022 | 0.426572 | 0.052573 | -0.031 ± 0.004 % | 0.977 ± 0.013 % |
| 11.75 | 99.41% | 1.067707 ± 0.002184 | 36.478332 ± 1.478998 | 0.002967 ± 0.000023 | 0.803354 | 0.045490 | -0.001 ± 0.003 % | 0.895 ± 0.013 % |
| 11.80 | 99.41% | 1.065799 ± 0.002180 | 35.450545 ± 1.465811 | 0.002931 ± 0.000021 | 0.697018 | 0.045669 | 0.006 ± 0.003 % | 0.899 ± 0.018 % |
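The KLD and Δp columns compare each quantized model's per-token output distribution against the F16 baseline, in the manner of llama.cpp's `llama-perplexity --kl-divergence` mode. A minimal sketch of how these per-token statistics are computed, using toy three-token vocab distributions (the numbers below are illustrative, not from this evaluation):

```python
import math

def kld(p, q):
    # KL divergence D(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy per-position distributions: baseline (F16) vs quantized model.
base  = [[0.70, 0.20, 0.10], [0.50, 0.30, 0.20]]
quant = [[0.65, 0.25, 0.10], [0.55, 0.25, 0.20]]
correct = [0, 1]  # index of the reference token at each position

klds = [kld(p, q) for p, q in zip(base, quant)]
mean_kld = sum(klds) / len(klds)   # "Mean KLD" column
max_kld = max(klds)                # "Maximum KLD" column

# Δp: change in probability assigned to the correct token (quant - base).
dps = [q[i] - p[i] for p, q, i in zip(base, quant, correct)]
mean_dp = sum(dps) / len(dps)                             # "Mean Δp"
rms_dp = math.sqrt(sum(d * d for d in dps) / len(dps))    # "RMS Δp"
```

The "99.9% KLD" column is the corresponding 99.9th percentile of the per-token KLD values, which captures worst-case distortion better than the mean.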