mean pooled result. Note that the vector math happens in the f32 space, but storage
for the embeddings is in a lower precision.

> f32 vs f16: cosine similarity = 1.00000000<br/>
> → They are essentially identical in direction.
>
> f32 vs f8: cosine similarity = 0.99956375<br/>
> → Very close, only tiny quantization effects.

Note that this was measured with `torch.float8_e4m3fn`; `torch.float8_e5m2` generally
has more loss.
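
The comparison above can be sketched with NumPy: cast a stored embedding down to `float16`, upcast it back to `float32` for the vector math, and compare directions. The 128-dim random vector here is a stand-in for a real mean-pooled embedding, and the `float8` variants are omitted since they need `torch`; this is an illustration, not the repo's actual benchmark script.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding; a real one would come from the model's mean pooling.
emb_f32 = rng.standard_normal(128).astype(np.float32)

# Store at lower precision, then upcast to f32 before doing any vector math.
emb_f16 = emb_f32.astype(np.float16).astype(np.float32)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity computed in f32 space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(f"f32 vs f16: cosine similarity = {cosine(emb_f32, emb_f16):.8f}")
```

The f16 round trip barely moves the vector's direction, which is why the measured similarity rounds to 1.0.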

Precision also affects download size. For instance, with the larger
[minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
model, `fp32` is 228M compressed, while `fp8_e4m3` is only 51M and still has competitive
quantization quality.

| precision    | dimensions | size    |
| ------------ | ---------- | ------- |
| fp32         | 128        | 228M    |
| fp16         | 128        | 114M    |
| **fp8_e4m3** | 128        | **51M** |
| fp8_e5m2    | 128        | 44M     |