Upload folder using huggingface_hub
- README.md +65 -108
- graph_new.png +0 -0
- graph_old.png +0 -0
- snowflake2_m_uint8.onnx +2 -2
README.md
CHANGED
|
@@ -87,145 +87,102 @@ language:
|
|
| 87 |
- yo
|
| 88 |
- zh
|
| 89 |
---
|
| 90 |
# snowflake2_m_uint8
|
| 91 |
|
| 92 |
This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0
|
| 93 |
|
| 94 |
-
I have added a linear quantization node before the `
|
| 95 |
|
| 96 |
This is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections.
|
| 97 |
|
| 98 |
# Quantization method
|
| 99 |
|
| 100 |
-
Linear quantization for the scale -
|
| 101 |
|
| 102 |
Here's what the graph of the original output looks like:
|
| 103 |
|
| 104 |
-
 I generate embeddings for each token in this model. I do this with the original model, and my quantized output model
|
| 119 |
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
3) I compare the models by querying a token on one model, then the other model, and seeing how different the results are
|
| 123 |
-
|
| 124 |
-
For instance:
|
| 125 |
-
|
| 126 |
-
When I query the embedding for token 0, limit=10 using `model_uint8.onnx` I get the top result here.
|
| 127 |
-
Same query for this model is the bottom result.
|
| 128 |
|
| 129 |
```
|
| 130 |
-
|
| 131 |
-
|
| 132 |
```
|
| 133 |
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
My benchmark here is measuring how often this happens.
|
| 137 |
-
|
| 138 |
-
The code for reproducing this benchmark is located in this repo in [benchmark.py](./benchmark.py)
|
| 139 |
-
|
| 140 |
-
...
|
| 141 |
-
|
| 142 |
-
Here are the results for [model_uint8.onnx](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0/blob/main/onnx/model_uint8.onnx) vs my model here. Exact means the same tokens were in the same position. 'off by 1' means the correct token was in the results, but it was in a position 1 away from the original position. 'missing' means that a token which was present in the original query wasn't found in the results for my model.
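The categories above can be sketched in a few lines. This is not the code from `benchmark.py` (which is not shown here); it is a minimal illustration that assumes each query returns an ordered list of token IDs, and the function name `rank_shift_stats` is hypothetical.

```python
from collections import Counter

def rank_shift_stats(reference, candidate):
    """Classify each reference result by how far it moved in the candidate ranking."""
    # Map every candidate item to its position for quick lookup.
    position = {item: i for i, item in enumerate(candidate)}
    counts = Counter()
    for i, item in enumerate(reference):
        if item not in position:
            counts["missing"] += 1
            continue
        shift = abs(position[item] - i)
        if shift == 0:
            counts["exact"] += 1
        elif shift >= 5:
            counts["off by 5+"] += 1
        else:
            counts[f"off by {shift}"] += 1
    return counts

# Made-up token IDs, purely for illustration.
reference = [10, 11, 12, 13, 14]
candidate = [10, 12, 11, 13, 99]
stats = rank_shift_stats(reference, candidate)
print(dict(stats))
```

Summing these counts over every query and dividing by the total number of reference results would give percentages in the same form as the tables here.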
|
| 143 |
-
|
| 144 |
-
Note that discrepancies here don't necessarily mean *wrong* results, just *different* results. The best way to see differences is to test directly on your own data and see if the results are to your liking.
|
| 145 |
-
|
| 146 |
-
```
|
| 147 |
-
Stats for top 10 query results across entire token range:
|
| 148 |
-
exact : 76.18%
|
| 149 |
-
off by 1 : 19.77%
|
| 150 |
-
off by 2 : 2.72%
|
| 151 |
-
off by 3 : 0.54%
|
| 152 |
-
off by 4 : 0.12%
|
| 153 |
-
off by 5+: 0.04%
|
| 154 |
-
missing : 0.63%
|
| 155 |
-
|
| 156 |
-
Stats for top 20 query results across entire token range:
|
| 157 |
-
exact : 65.86%
|
| 158 |
-
off by 1 : 25.00%
|
| 159 |
-
off by 2 : 5.87%
|
| 160 |
-
off by 3 : 1.68%
|
| 161 |
-
off by 4 : 0.53%
|
| 162 |
-
off by 5+: 0.27%
|
| 163 |
-
missing : 0.78%
|
| 164 |
-
|
| 165 |
-
Stats for top 50 query results across entire token range:
|
| 166 |
-
exact : 48.54%
|
| 167 |
-
off by 1 : 29.09%
|
| 168 |
-
off by 2 : 11.35%
|
| 169 |
-
off by 3 : 5.02%
|
| 170 |
-
off by 4 : 2.38%
|
| 171 |
-
off by 5+: 2.36%
|
| 172 |
-
missing : 1.26%
|
| 173 |
-
```
|
| 174 |
-
|
| 175 |
-
Here are the results for [model_fp16.onnx](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0/blob/main/onnx/model_fp16.onnx) vs [model_uint8.onnx](https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0/blob/main/onnx/model_uint8.onnx):
|
| 176 |
|
| 177 |
```
|
| 178 |
-
|
| 179 |
```
|
| 180 |
|
| 181 |
-
|
| 182 |
|
| 183 |
-
|
| 184 |
-
Stats for top 10 query results across entire token range:
|
| 185 |
-
exact : 86.65%
|
| 186 |
-
off by 1 : 12.45%
|
| 187 |
-
off by 2 : 0.44%
|
| 188 |
-
off by 3 : 0.06%
|
| 189 |
-
off by 4 : 0.01%
|
| 190 |
-
off by 5+: 0.01%
|
| 191 |
-
missing : 0.38%
|
| 192 |
-
|
| 193 |
-
Stats for top 20 query results across entire token range:
|
| 194 |
-
exact : 83.34%
|
| 195 |
-
off by 1 : 14.81%
|
| 196 |
-
off by 2 : 1.11%
|
| 197 |
-
off by 3 : 0.20%
|
| 198 |
-
off by 4 : 0.05%
|
| 199 |
-
off by 5+: 0.03%
|
| 200 |
-
missing : 0.47%
|
| 201 |
-
|
| 202 |
-
Stats for top 50 query results across entire token range:
|
| 203 |
-
exact : 75.57%
|
| 204 |
-
off by 1 : 19.34%
|
| 205 |
-
off by 2 : 3.08%
|
| 206 |
-
off by 3 : 0.85%
|
| 207 |
-
off by 4 : 0.28%
|
| 208 |
-
off by 5+: 0.19%
|
| 209 |
-
missing : 0.69%
|
| 210 |
-
```
|
| 211 |
-
# Example inference code
|
| 212 |
|
| 213 |
```python
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
)
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
)
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
)
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
```
|
| 87 |
- yo
|
| 88 |
- zh
|
| 89 |
---
|
| 90 |
+
# Update
|
| 91 |
+
|
| 92 |
+
I've updated this model to be compatible with Fastembed.
|
| 93 |
+
|
| 94 |
+
I removed the `sentence_embedding` output and quantized the main model output instead. The model now outputs a multivector of dimension 768.
|
| 95 |
+
|
| 96 |
+
To use the output, apply CLS pooling with normalization disabled.
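As a rough illustration of what CLS pooling without normalization means, here is a minimal NumPy sketch. It assumes the model output has shape `(batch, seq_len, 768)`; the random array stands in for real model output, and `cls_pool` is a hypothetical helper, not part of Fastembed.

```python
import numpy as np

def cls_pool(token_embeddings: np.ndarray) -> np.ndarray:
    """Return the first-token (CLS) vector of each sequence, with no normalization."""
    if token_embeddings.ndim != 3:
        raise ValueError("expected shape (batch, seq_len, dim)")
    return token_embeddings[:, 0, :]

# Stand-in for the model output: 2 sequences, 5 tokens, 768 uint8 values each.
rng = np.random.default_rng(0)
fake_output = rng.integers(0, 256, size=(2, 5, 768), dtype=np.uint8)

pooled = cls_pool(fake_output)
print(pooled.shape, pooled.dtype)
```

Skipping normalization keeps the values as the uint8 codes the model emits, which is what a uint8 collection stores.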
|
| 97 |
+
|
| 98 |
# snowflake2_m_uint8
|
| 99 |
|
| 100 |
This is a slightly modified version of the uint8 quantized ONNX model from https://huggingface.co/Snowflake/snowflake-arctic-embed-m-v2.0
|
| 101 |
|
| 102 |
+
I have added a linear quantization node before the `token_embeddings` output so that it directly outputs a dimension 768 uint8 multivector.
|
| 103 |
|
| 104 |
This is compatible with the [qdrant](https://github.com/qdrant/qdrant) uint8 datatype for collections.
|
| 105 |
|
| 106 |
+
I took the liberty of removing the `sentence_embedding` output; I can add it back in if anybody wants it.
|
| 107 |
+
|
| 108 |
# Quantization method
|
| 109 |
|
| 110 |
+
Linear quantization over the range -7 to 7.
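For readers unfamiliar with linear quantization, here is a minimal NumPy sketch of the idea. Only the -7 to 7 range comes from this README; the clipping, the 256 evenly spaced levels, and the rounding mode are assumptions, and the node inside the ONNX graph may differ in those details.

```python
import numpy as np

LOW, HIGH = -7.0, 7.0  # range stated in this README
LEVELS = 255.0         # assumed: codes 0..255 spread evenly over the range

def quantize_uint8(x):
    """Clip to [LOW, HIGH], then map linearly onto the uint8 codes 0..255."""
    clipped = np.clip(np.asarray(x, dtype=np.float32), LOW, HIGH)
    return np.rint((clipped - LOW) * LEVELS / (HIGH - LOW)).astype(np.uint8)

def dequantize(q):
    """Approximate inverse: map uint8 codes back into [LOW, HIGH]."""
    return q.astype(np.float32) * (HIGH - LOW) / LEVELS + LOW

values = np.array([-10.0, -7.0, 0.0, 3.5, 7.0], dtype=np.float32)
codes = quantize_uint8(values)
restored = dequantize(codes)
print(codes, restored)
```

Under these assumptions, values outside the range saturate at 0 or 255, and in-range values are recovered to within about half a step (14 / 255 / 2, roughly 0.027).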
|
| 111 |
|
| 112 |
Here's what the graph of the original output looks like:
|
| 113 |
|
| 114 |
+

|
| 115 |
|
| 116 |
Here's what the new graph in this model looks like:
|
| 117 |
|
| 118 |
+

|
| 119 |
|
| 120 |
# Benchmark
|
| 121 |
|
| 122 |
+
I used beir-qdrant with the scifact dataset.
|
| 123 |
|
| 124 |
|
| 125 |
+
quantized output (this model):
|
| 126 |
|
| 127 |
```
|
| 128 |
+
ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.64619, 'NDCG@5': 0.6687, 'NDCG@10': 0.69228, 'NDCG@100': 0.72204, 'NDCG@1000': 0.72747}
|
| 129 |
+
recall: {'Recall@1': 0.56094, 'Recall@3': 0.68394, 'Recall@5': 0.73983, 'Recall@10': 0.80689, 'Recall@100': 0.94833, 'Recall@1000': 0.99333}
|
| 130 |
+
precision: {'P@1': 0.59333, 'P@3': 0.25, 'P@5': 0.16467, 'P@10': 0.09167, 'P@100': 0.01077, 'P@1000': 0.00112}
|
| 131 |
```
|
| 132 |
|
| 133 |
+
unquantized output (model_uint8.onnx):
|
| 134 |
|
| 135 |
```
|
| 136 |
+
ndcg: {'NDCG@1': 0.59333, 'NDCG@3': 0.65417, 'NDCG@5': 0.6741, 'NDCG@10': 0.69675, 'NDCG@100': 0.7242, 'NDCG@1000': 0.7305}
|
| 137 |
+
recall: {'Recall@1': 0.56094, 'Recall@3': 0.69728, 'Recall@5': 0.74817, 'Recall@10': 0.81356, 'Recall@100': 0.945, 'Recall@1000': 0.99667}
|
| 138 |
+
precision: {'P@1': 0.59333, 'P@3': 0.25444, 'P@5': 0.16667, 'P@10': 0.09233, 'P@100': 0.01073, 'P@1000': 0.00113}
|
| 139 |
```
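To make the two result sets easier to compare, the NDCG numbers can be subtracted directly. This small sketch copies the values from the two outputs above; the variable names are illustrative only.

```python
# NDCG values copied from the two benchmark outputs above.
quantized = {"NDCG@1": 0.59333, "NDCG@3": 0.64619, "NDCG@5": 0.6687,
             "NDCG@10": 0.69228, "NDCG@100": 0.72204, "NDCG@1000": 0.72747}
unquantized = {"NDCG@1": 0.59333, "NDCG@3": 0.65417, "NDCG@5": 0.6741,
               "NDCG@10": 0.69675, "NDCG@100": 0.7242, "NDCG@1000": 0.7305}

# Negative delta means the quantized output scored lower.
deltas = {k: round(quantized[k] - unquantized[k], 5) for k in unquantized}
for name, delta in deltas.items():
    print(f"{name:>10}: {delta:+.5f}")
```

On these figures the quantized output matches at NDCG@1 and trails by less than 0.01 at every other cutoff, with the largest gap at NDCG@3.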
|
| 140 |
|
| 141 |
+
# Example inference/benchmark code and how to use the model with Fastembed
|
| 142 |
|
| 143 |
+
After installing beir-qdrant, make sure to upgrade fastembed.
|
| 144 |
|
| 145 |
```python
|
| 146 |
+
# pip install qdrant_client beir-qdrant
|
| 147 |
+
# pip install -U fastembed
|
| 148 |
+
from fastembed import TextEmbedding
|
| 149 |
+
from fastembed.common.model_description import PoolingType, ModelSource
|
| 150 |
+
from beir import util
|
| 151 |
+
from beir.datasets.data_loader import GenericDataLoader
|
| 152 |
+
from beir.retrieval.evaluation import EvaluateRetrieval
|
| 153 |
+
from qdrant_client import QdrantClient
|
| 154 |
+
from qdrant_client.models import Datatype
|
| 155 |
+
from beir_qdrant.retrieval.models.fastembed import DenseFastEmbedModelAdapter
|
| 156 |
+
from beir_qdrant.retrieval.search.dense import DenseQdrantSearch
|
| 157 |
+
|
| 158 |
+
TextEmbedding.add_custom_model(
|
| 159 |
+
model="electroglyph/snowflake2_m_uint8",
|
| 160 |
+
pooling=PoolingType.CLS,
|
| 161 |
+
normalization=False,
|
| 162 |
+
sources=ModelSource(hf="electroglyph/snowflake2_m_uint8"),
|
| 163 |
+
dim=768,
|
| 164 |
+
model_file="snowflake2_m_uint8.onnx",
|
| 165 |
)
|
| 166 |
+
|
| 167 |
+
dataset = "scifact"
|
| 168 |
+
url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/{}.zip".format(dataset)
|
| 169 |
+
data_path = util.download_and_unzip(url, "datasets")
|
| 170 |
+
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
|
| 171 |
+
|
| 172 |
+
qdrant_client = QdrantClient("http://localhost:6333")
|
| 173 |
+
|
| 174 |
+
model = DenseQdrantSearch(
|
| 175 |
+
qdrant_client,
|
| 176 |
+
model=DenseFastEmbedModelAdapter(
|
| 177 |
+
model_name="electroglyph/snowflake2_m_uint8"
|
| 178 |
+
),
|
| 179 |
+
collection_name="scifact-uint8",
|
| 180 |
+
initialize=True,
|
| 181 |
+
datatype=Datatype.UINT8
|
| 182 |
)
|
| 183 |
+
retriever = EvaluateRetrieval(model)
|
| 184 |
+
results = retriever.retrieve(corpus, queries)
|
| 185 |
+
|
| 186 |
+
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
|
| 187 |
+
print(f"ndcg: {ndcg}\nrecall: {recall}\nprecision: {precision}")
|
| 188 |
```
|
graph_new.png
ADDED
|
graph_old.png
ADDED
|
snowflake2_m_uint8.onnx
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1c8c12c07ce3a6f23519c6db127a8129df264288b2a42457883308335bfbd901
|
| 3 |
+
size 310915658
|