Commit 0c5c68a by gregtatum · Parent(s): f2fce80

Add a precisions table

Files changed (1): README.md (+17 -5)

README.md CHANGED
@@ -42,11 +42,23 @@ to end to test in Firefox on some vectors here was the cosine similarity for the
 mean pooled result. Note that the vector math happens in the f32 space, but storage
 for the embeddings is in a lower precision.
 
-f32 vs f16: cosine similarity = 1.00000000<br/>
-→ They are essentially identical in direction.
-
-f32 vs f8: cosine similarity = 0.99956375<br/>
-→ Very close, only tiny quantization effects.
+> f32 vs f16: cosine similarity = 1.00000000<br/>
+> → They are essentially identical in direction.
+>
+> f32 vs f8: cosine similarity = 0.99956375<br/>
+> → Very close, only tiny quantization effects.
 
 Note that this was done on `torch.float8_e4m3fn`, while `torch.float8_e5m2` generally
 has more loss.
+
+Precision also affects download size. For instance, with the larger
+[minishlab/potion-multilingual-128M/](models/minishlab/potion-multilingual-128M/README.md)
+model, `fp32` is 228M compressed while `fp8_e4m3` is only 51M, with competitive
+quantization quality.
+
+| precision    | dimensions | size    |
+| ------------ | ---------- | ------- |
+| fp32         | 128        | 228M    |
+| fp16         | 128        | 114M    |
+| **fp8_e4m3** | 128        | **51M** |
+| fp8_e5m2     | 128        | 44M     |
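The cosine-similarity comparison described in the diff above can be reproduced with a short sketch. This is an illustrative approximation, not the repo's actual measurement code: numpy stands in for torch, and because numpy has no 8-bit float type, fp8 storage is simulated by rounding the mantissa (3 bits for e4m3, 2 bits for e5m2) while ignoring the reduced exponent range and overflow behavior.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # The vector math happens in f32 space, matching the README's note;
    # only the *storage* of the embedding is lower precision.
    a = a.astype(np.float32)
    b = b.astype(np.float32)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def round_mantissa(x: np.ndarray, mantissa_bits: int) -> np.ndarray:
    # Crude fp8 stand-in: keep only `mantissa_bits` bits of mantissa.
    # e4m3 has 3 mantissa bits, e5m2 only 2, hence e5m2's larger loss.
    m, e = np.frexp(x)                     # x == m * 2**e, with |m| in [0.5, 1)
    scale = 2.0 ** (mantissa_bits + 1)
    return np.ldexp(np.round(m * scale) / scale, e).astype(np.float32)

rng = np.random.default_rng(0)
embedding = rng.standard_normal(128).astype(np.float32)  # 128 dims, as in the table

sim_f16 = cosine_similarity(embedding, embedding.astype(np.float16))
sim_e4m3 = cosine_similarity(embedding, round_mantissa(embedding, 3))
sim_e5m2 = cosine_similarity(embedding, round_mantissa(embedding, 2))
```

The float16 round-trip leaves the direction essentially unchanged, while the coarser e5m2 mantissa drifts more than e4m3, matching the loss ordering noted in the README.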