Experimental global target bits‑per‑weight quantization of Octen/Octen-Embedding-4B

Using a non-standard (forked) llama.cpp branch for quantization.
Using a CLI tool to build KLD evaluation and imatrix calibration datasets for GGUF models, sourced from eaddario/imatrix-calibration.
Using dataset sources: text_en, text_ru.
Using dataset chunks: 750.
Small set of patches added.
Tensor quantization uses F16 instead of BF16, which suits Nvidia Pascal GPUs such as the P100.
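
A global bits-per-weight target maps almost directly to file size. The following is a rough sketch only: it assumes exactly 4e9 parameters (the card just says "4B") and ignores GGUF metadata and any tensors stored at a different precision. `estimated_size_bytes` is a hypothetical helper, not part of any tool used here.

```python
def estimated_size_bytes(n_params: float, bpw: float) -> float:
    """Approximate tensor payload in bytes for a bits-per-weight target."""
    return n_params * bpw / 8.0


N_PARAMS = 4e9  # assumption: "4B params" taken as exactly 4e9

for bpw in (3.5, 4.5, 6.0, 8.5):
    gib = estimated_size_bytes(N_PARAMS, bpw) / 2**30
    print(f"{bpw:5.2f} bpw -> ~{gib:.2f} GiB")
```

Real files will differ somewhat, since the achieved average may not hit the target exactly.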

Many thanks to Ed Addario for an impressive job.

Quantization comparison

BPW/TGS	PPL correlation	PPL mean ratio	ΔPPL	Mean KLD	Maximum KLD	99.9% KLD	Mean Δp	RMS Δp
3.50	88.91%	1.059237 ± 0.005791	17.814284 ± 1.729713	1.168223 ± 0.005542	42.036110	25.354170	-1.391 ± 0.037 %	17.793 ± 0.064 %
4.00	91.97%	1.124329 ± 0.005392	37.389267 ± 1.678855	0.809469 ± 0.004837	44.454895	24.103703	-0.450 ± 0.030 %	14.205 ± 0.061 %
4.50	94.56%	1.312286 ± 0.005436	93.912806 ± 2.084847	0.449165 ± 0.003776	36.050694	23.000446	-0.433 ± 0.022 %	10.308 ± 0.056 %
5.00	95.14%	1.287066 ± 0.005078	86.328447 ± 1.959110	0.353164 ± 0.003435	40.124615	22.546082	-0.282 ± 0.019 %	8.891 ± 0.056 %
5.50	95.95%	1.185050 ± 0.004268	55.649627 ± 1.544628	0.234975 ± 0.002845	35.043465	20.616793	0.127 ± 0.015 %	7.105 ± 0.054 %
6.00	96.31%	1.181371 ± 0.004077	54.543325 ± 1.493333	0.187329 ± 0.002574	34.484653	20.085846	0.138 ± 0.013 %	6.297 ± 0.056 %
6.50	96.58%	1.192068 ± 0.004003	57.760093 ± 1.508316	0.156484 ± 0.002392	32.430752	19.101099	0.152 ± 0.012 %	5.546 ± 0.056 %
7.00	96.60%	1.212270 ± 0.004081	63.835350 ± 1.579181	0.146085 ± 0.002389	33.821136	19.701403	0.111 ± 0.011 %	5.325 ± 0.057 %
7.50	96.63%	1.208265 ± 0.004054	62.630910 ± 1.564522	0.139075 ± 0.002314	36.313965	19.207874	0.110 ± 0.011 %	5.203 ± 0.057 %
8.00	96.68%	1.209469 ± 0.004038	62.993184 ± 1.565838	0.134810 ± 0.002289	34.888683	19.164957	0.137 ± 0.011 %	5.067 ± 0.058 %
8.50	96.75%	1.212194 ± 0.004012	63.812622 ± 1.568517	0.125588 ± 0.002229	36.403027	18.989708	0.131 ± 0.010 %	4.896 ± 0.057 %
9.00	96.76%	1.204192 ± 0.003980	61.406117 ± 1.541008	0.123881 ± 0.002209	36.165089	18.446331	0.154 ± 0.010 %	4.867 ± 0.058 %
9.50	96.74%	1.206242 ± 0.003997	62.022543 ± 1.550053	0.123900 ± 0.002210	36.027378	18.774128	0.146 ± 0.010 %	4.872 ± 0.057 %
10.00	96.75%	1.203023 ± 0.003979	61.054587 ± 1.537726	0.123887 ± 0.002199	36.569180	18.522949	0.153 ± 0.010 %	4.819 ± 0.057 %
10.50	96.74%	1.210538 ± 0.004012	63.314626 ± 1.564145	0.122872 ± 0.002194	36.219513	18.574490	0.138 ± 0.010 %	4.842 ± 0.058 %
11.00	96.74%	1.213551 ± 0.004025	64.220636 ± 1.575624	0.122960 ± 0.002205	37.248238	18.881664	0.125 ± 0.010 %	4.840 ± 0.057 %
11.50	96.75%	1.209483 ± 0.004002	62.997316 ± 1.559427	0.123296 ± 0.002196	36.439632	18.905373	0.137 ± 0.010 %	4.850 ± 0.057 %
12.00	96.75%	1.207165 ± 0.003990	62.300073 ± 1.550363	0.123031 ± 0.002189	36.319935	18.708921	0.141 ± 0.010 %	4.828 ± 0.057 %
12.50	96.73%	1.203487 ± 0.003989	61.194047 ± 1.540763	0.122924 ± 0.002186	36.546139	18.402393	0.157 ± 0.010 %	4.856 ± 0.058 %
13.00	96.73%	1.207328 ± 0.004005	62.349166 ± 1.554780	0.122282 ± 0.002186	34.890934	18.439240	0.147 ± 0.010 %	4.846 ± 0.058 %
13.50	96.73%	1.201897 ± 0.003983	60.715976 ± 1.534983	0.123082 ± 0.002199	35.561474	18.604710	0.150 ± 0.010 %	4.833 ± 0.058 %
14.00	96.76%	1.206603 ± 0.003988	62.131097 ± 1.548757	0.122074 ± 0.002183	36.555859	18.393436	0.148 ± 0.010 %	4.865 ± 0.059 %
14.50	96.77%	1.207201 ± 0.003984	62.311014 ± 1.549158	0.120715 ± 0.002176	37.773457	18.576935	0.131 ± 0.010 %	4.763 ± 0.057 %
15.00	96.75%	1.207969 ± 0.004000	62.541896 ± 1.555455	0.123216 ± 0.002222	36.987923	19.083401	0.150 ± 0.010 %	4.777 ± 0.057 %
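
To make the KLD and Δp columns easier to read, here is a minimal sketch of what they are assumed to measure at a single token position: the KL divergence between the reference and the quantized model's token distributions, and the change in probability assigned to the correct token. The logits are made-up toy values, and the function names are illustrative rather than taken from the evaluation tool.

```python
import math


def softmax(logits):
    """Convert logits to probabilities, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats, where p is the reference distribution."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy logits for one token position (assumed values, not real model output).
ref_logits = [2.0, 1.0, 0.1]
quant_logits = [1.8, 1.1, 0.3]
correct_token = 0  # index of the token that actually follows

p_ref = softmax(ref_logits)
p_quant = softmax(quant_logits)

kld = kl_divergence(p_ref, p_quant)
delta_p = p_quant[correct_token] - p_ref[correct_token]

print(f"KLD = {kld:.6f} nats")
print(f"delta p = {100 * delta_p:.3f} %")
```

Averaging the per-token KLD over the evaluation set would give a mean like the one in the table, and the root mean square of the per-token Δp would give the RMS Δp column.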

Format: GGUF
Model size: 4B params
Architecture: qwen3

Model tree for ENOSYS/Octen-Embedding-4B-750-v1-GGUF

Base model: Qwen/Qwen3-4B-Base
Finetuned: Qwen/Qwen3-Embedding-4B
Finetuned: Octen/Octen-Embedding-4B
Quantized (3): this model
