Experimental global target bits-per-weight quantization of unsloth/Qwen3-Embedding-0.6B

  • Using a non-standard (forked) llama.cpp branch for quantization.
  • Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
  • Using dataset sources: tools, text_en, text_ru.
  • Using dataset chunks: 250.
  • Tensors quantized from F16 instead of BF16, keeping the files friendly to NVIDIA Pascal-architecture GPUs such as the P100.
  • A small set of patches applied on top.

Many thanks to Ed Addario for an impressive job.
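In stock llama.cpp, imatrix-based quantization of this kind is driven by two tools: `llama-imatrix` to build the importance matrix from calibration text, and `llama-quantize` to apply it. A hedged sketch follows; the file paths are placeholders, `Q5_K_M` merely stands in for a target (the global bits-per-weight targeting used here comes from the fork and is not a stock llama.cpp flag), and the forked branch may expose different options:

```shell
# Build an importance matrix from the calibration dataset
# (paths are hypothetical placeholders).
./llama-imatrix -m Qwen3-Embedding-0.6B-F16.gguf \
    -f calibration.txt -o imatrix.dat

# Quantize using the imatrix. Q5_K_M is only an illustrative type;
# the fork used for this repo targets a global bits-per-weight instead.
./llama-quantize --imatrix imatrix.dat \
    Qwen3-Embedding-0.6B-F16.gguf \
    Qwen3-Embedding-0.6B-quantized.gguf Q5_K_M
```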

Quantization comparison

| BPW | PPL correlation | PPL mean ratio | ΔPPL | Mean KLD | Maximum KLD | 99.9% KLD | Mean Δp | RMS Δp |
|------|--------|----------------------|------------------------|----------------------|-----------|----------|------------------|-----------------|
| 5.00 | 97.51% | 1.229589 ± 0.004913 | 123.695152 ± 3.455547 | 0.254131 ± 0.001247 | 13.281865 | 3.295894 | -0.945 ± 0.030 % | 7.861 ± 0.073 % |
| 5.25 | 97.93% | 1.198584 ± 0.004387 | 106.990729 ± 3.089464 | 0.201051 ± 0.001020 | 16.191372 | 2.680833 | -0.726 ± 0.027 % | 7.070 ± 0.070 % |
| 5.30 | 98.05% | 1.199072 ± 0.004269 | 107.253986 ± 3.067627 | 0.181907 ± 0.000926 | 11.642469 | 2.371547 | -0.680 ± 0.026 % | 6.811 ± 0.067 % |
| 5.50 | 98.42% | 1.143577 ± 0.003665 | 77.354913 ± 2.488164 | 0.132384 ± 0.000693 | 9.056372 | 1.854777 | -0.483 ± 0.022 % | 5.860 ± 0.061 % |
| 5.75 | 98.67% | 1.107662 ± 0.003264 | 58.005137 ± 2.117964 | 0.097616 ± 0.000599 | 11.806184 | 1.765460 | -0.368 ± 0.019 % | 5.023 ± 0.062 % |
| 5.80 | 98.72% | 1.117510 ± 0.003238 | 63.310457 ± 2.168267 | 0.092513 ± 0.000578 | 10.540298 | 1.608747 | -0.386 ± 0.019 % | 4.906 ± 0.059 % |
| 6.00 | 98.71% | 1.125959 ± 0.003273 | 67.862792 ± 2.234984 | 0.092148 ± 0.000585 | 12.128985 | 1.559407 | -0.435 ± 0.019 % | 4.931 ± 0.061 % |
| 6.25 | 98.89% | 1.082024 ± 0.002912 | 44.191950 ± 1.831029 | 0.067634 ± 0.000431 | 7.849868 | 1.006397 | -0.264 ± 0.016 % | 4.210 ± 0.055 % |
| 6.30 | 98.94% | 1.089430 ± 0.002886 | 48.181944 ± 1.865568 | 0.062879 ± 0.000377 | 7.467944 | 0.950639 | -0.262 ± 0.016 % | 4.112 ± 0.055 % |
| 6.50 | 99.07% | 1.110195 ± 0.002773 | 59.369600 ± 1.962020 | 0.046909 ± 0.000273 | 5.759321 | 0.702734 | -0.293 ± 0.013 % | 3.510 ± 0.044 % |
| 6.75 | 99.22% | 1.071155 ± 0.002479 | 38.336145 ± 1.614146 | 0.027021 ± 0.000217 | 6.292867 | 0.423712 | -0.069 ± 0.011 % | 2.756 ± 0.051 % |
| 6.80 | 99.23% | 1.079566 ± 0.002483 | 42.867759 ± 1.667153 | 0.026098 ± 0.000205 | 6.737493 | 0.397282 | -0.098 ± 0.010 % | 2.714 ± 0.051 % |
| 7.00 | 99.24% | 1.083987 ± 0.002490 | 45.249386 ± 1.706124 | 0.023685 ± 0.000190 | 6.608492 | 0.377540 | -0.084 ± 0.010 % | 2.559 ± 0.049 % |
| 7.25 | 99.27% | 1.088304 ± 0.002447 | 47.575202 ± 1.717173 | 0.019738 ± 0.000130 | 2.865925 | 0.332578 | -0.090 ± 0.009 % | 2.290 ± 0.037 % |
| 7.30 | 99.28% | 1.084746 ± 0.002430 | 45.658594 ± 1.686292 | 0.019120 ± 0.000148 | 4.343514 | 0.295245 | -0.092 ± 0.009 % | 2.291 ± 0.048 % |
| 7.50 | 99.29% | 1.085544 ± 0.002411 | 46.088344 ± 1.685529 | 0.017551 ± 0.000142 | 4.905227 | 0.298486 | -0.078 ± 0.008 % | 2.215 ± 0.050 % |
| 7.75 | 99.32% | 1.091225 ± 0.002385 | 49.149079 ± 1.714694 | 0.014173 ± 0.000128 | 4.346512 | 0.237379 | -0.101 ± 0.008 % | 2.013 ± 0.052 % |
| 7.80 | 99.29% | 1.083209 ± 0.002410 | 44.830380 ± 1.669455 | 0.016992 ± 0.000133 | 3.100509 | 0.285571 | -0.077 ± 0.008 % | 2.162 ± 0.041 % |
| 8.00 | 99.31% | 1.080260 ± 0.002373 | 43.241571 ± 1.636024 | 0.015367 ± 0.000130 | 4.035001 | 0.269038 | -0.060 ± 0.008 % | 2.071 ± 0.049 % |
| 8.25 | 99.33% | 1.088309 ± 0.002352 | 47.578112 ± 1.683232 | 0.012024 ± 0.000105 | 4.370751 | 0.192501 | -0.089 ± 0.007 % | 1.860 ± 0.044 % |
| 8.30 | 99.34% | 1.081929 ± 0.002328 | 44.140973 ± 1.628892 | 0.011498 ± 0.000084 | 1.861967 | 0.182563 | -0.075 ± 0.007 % | 1.784 ± 0.037 % |
| 8.50 | 99.36% | 1.078487 ± 0.002286 | 42.286422 ± 1.587902 | 0.009095 ± 0.000081 | 2.475720 | 0.147575 | -0.064 ± 0.006 % | 1.635 ± 0.048 % |
| 8.75 | 99.37% | 1.078905 ± 0.002274 | 42.511791 ± 1.587059 | 0.007852 ± 0.000073 | 2.163608 | 0.118868 | -0.054 ± 0.006 % | 1.535 ± 0.049 % |
| 8.80 | 99.37% | 1.078538 ± 0.002273 | 42.314029 ± 1.583733 | 0.007742 ± 0.000079 | 2.609957 | 0.127498 | -0.053 ± 0.006 % | 1.533 ± 0.053 % |
| 9.00 | 99.37% | 1.077841 ± 0.002265 | 41.938514 ± 1.575722 | 0.007380 ± 0.000073 | 2.547039 | 0.125235 | -0.050 ± 0.006 % | 1.499 ± 0.053 % |
| 9.25 | 99.37% | 1.075351 ± 0.002256 | 40.596804 ± 1.555539 | 0.006905 ± 0.000066 | 2.041534 | 0.103068 | -0.042 ± 0.006 % | 1.448 ± 0.044 % |
| 9.30 | 99.37% | 1.071370 ± 0.002244 | 38.451639 ± 1.524417 | 0.006834 ± 0.000062 | 1.959199 | 0.105753 | -0.031 ± 0.006 % | 1.444 ± 0.040 % |
| 9.50 | 99.38% | 1.073834 ± 0.002244 | 39.779313 ± 1.540846 | 0.006470 ± 0.000062 | 1.991340 | 0.098428 | -0.041 ± 0.005 % | 1.400 ± 0.041 % |
| 9.75 | 99.38% | 1.074517 ± 0.002240 | 40.147460 ± 1.543907 | 0.006206 ± 0.000068 | 2.638701 | 0.101805 | -0.038 ± 0.005 % | 1.388 ± 0.053 % |
| 9.80 | 99.38% | 1.076663 ± 0.002244 | 41.303655 ± 1.559774 | 0.006146 ± 0.000069 | 2.405736 | 0.094807 | -0.044 ± 0.005 % | 1.401 ± 0.057 % |
| 10.00 | 99.38% | 1.076016 ± 0.002240 | 40.955259 ± 1.553864 | 0.005796 ± 0.000055 | 2.059185 | 0.095681 | -0.049 ± 0.005 % | 1.313 ± 0.039 % |
| 10.25 | 99.39% | 1.074654 ± 0.002230 | 40.220957 ± 1.540922 | 0.005526 ± 0.000061 | 2.114943 | 0.099634 | -0.043 ± 0.005 % | 1.315 ± 0.050 % |
| 10.30 | 99.39% | 1.072069 ± 0.002222 | 38.828737 ± 1.519978 | 0.005418 ± 0.000059 | 2.062696 | 0.094137 | -0.038 ± 0.005 % | 1.288 ± 0.047 % |
| 10.50 | 99.39% | 1.070241 ± 0.002212 | 37.843815 ± 1.503987 | 0.005140 ± 0.000048 | 1.621503 | 0.087822 | -0.032 ± 0.005 % | 1.218 ± 0.028 % |
| 10.75 | 99.39% | 1.066075 ± 0.002195 | 35.599290 ± 1.470596 | 0.004415 ± 0.000060 | 2.666409 | 0.074945 | -0.011 ± 0.004 % | 1.152 ± 0.049 % |
| 10.80 | 99.40% | 1.066699 ± 0.002195 | 35.935359 ± 1.473675 | 0.004566 ± 0.000051 | 2.669465 | 0.074244 | -0.025 ± 0.004 % | 1.109 ± 0.017 % |
| 11.00 | 99.40% | 1.071004 ± 0.002207 | 38.254894 ± 1.509298 | 0.004029 ± 0.000035 | 1.280535 | 0.060520 | -0.019 ± 0.004 % | 1.112 ± 0.045 % |
| 11.25 | 99.40% | 1.071390 ± 0.002201 | 38.462580 ± 1.508764 | 0.003687 ± 0.000024 | 0.614354 | 0.054238 | -0.025 ± 0.004 % | 0.995 ± 0.011 % |
| 11.30 | 99.40% | 1.071772 ± 0.002202 | 38.668624 ± 1.511744 | 0.003665 ± 0.000024 | 0.613789 | 0.055209 | -0.023 ± 0.004 % | 0.984 ± 0.011 % |
| 11.50 | 99.40% | 1.072891 ± 0.002200 | 39.271504 ± 1.518324 | 0.003498 ± 0.000022 | 0.426572 | 0.052573 | -0.031 ± 0.004 % | 0.977 ± 0.013 % |
| 11.75 | 99.41% | 1.067707 ± 0.002184 | 36.478332 ± 1.478998 | 0.002967 ± 0.000023 | 0.803354 | 0.045490 | -0.001 ± 0.003 % | 0.895 ± 0.013 % |
| 11.80 | 99.41% | 1.065799 ± 0.002180 | 35.450545 ± 1.465811 | 0.002931 ± 0.000021 | 0.697018 | 0.045669 | 0.006 ± 0.003 % | 0.899 ± 0.018 % |
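The KLD and Δp columns compare each quantized model's per-token output distribution against the F16 baseline, in the manner of llama.cpp's `llama-perplexity --kl-divergence` mode. A minimal sketch of how these per-token statistics are computed, using toy three-token vocab distributions (the numbers below are illustrative, not from this evaluation):

```python
import math

def kld(p, q):
    # KL divergence D(p || q) between two discrete distributions.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy per-position distributions: baseline (F16) vs quantized model.
base  = [[0.70, 0.20, 0.10], [0.50, 0.30, 0.20]]
quant = [[0.65, 0.25, 0.10], [0.55, 0.25, 0.20]]
correct = [0, 1]  # index of the reference token at each position

klds = [kld(p, q) for p, q in zip(base, quant)]
mean_kld = sum(klds) / len(klds)   # "Mean KLD" column
max_kld = max(klds)                # "Maximum KLD" column

# Δp: change in probability assigned to the correct token (quant - base).
dps = [q[i] - p[i] for p, q, i in zip(base, quant, correct)]
mean_dp = sum(dps) / len(dps)                             # "Mean Δp"
rms_dp = math.sqrt(sum(d * d for d in dps) / len(dps))    # "RMS Δp"
```

The "99.9% KLD" column is the corresponding 99.9th percentile of the per-token KLD values, which captures worst-case distortion better than the mean.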