Add YAML metadata to model card

README.md CHANGED

@@ -1,3 +1,23 @@
---
language:
- multilingual
- ko
- en
license: apache-2.0
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- onnx
- quantized
- xlm-roberta
- dense-encoder
- dense
- fastembed
base_model: telepix/PIXIE-Rune-v1.0
pipeline_tag: feature-extraction
---

# PIXIE-Rune-v1.0 – ONNX Quantized Variants

ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),

@@ -33,48 +53,46 @@ retrieval across 74 languages with specialization in Korean/English aerospace domains
| `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
| `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |

**Metrics** measured on 8 semantically diverse sentences against the FP32 reference.
Pearson r = correlation of the pairwise cosine-similarity matrices (structure preservation).
MRR = Mean Reciprocal Rank on a retrieval probe; 1.00 means the retrieval ranking is perfectly preserved.
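
For intuition, here is a minimal sketch of such a probe. The 8-sentence set and the exact harness behind the table are not published here, so random vectors stand in for real embeddings; only the metric definitions match the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
emb_fp32 = rng.normal(size=(8, 1024))                      # stand-in FP32 embeddings
emb_q = emb_fp32 + rng.normal(scale=0.02, size=(8, 1024))  # stand-in quantized embeddings

def cos_matrix(e):
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    return e @ e.T

# Pearson r between the two pairwise cosine-similarity matrices,
# using the upper triangle so each sentence pair is counted once.
iu = np.triu_indices(8, k=1)
pearson_r = np.corrcoef(cos_matrix(emb_fp32)[iu], cos_matrix(emb_q)[iu])[0, 1]

# MRR probe: for each sentence, the quantized model should rank the
# FP32 nearest neighbour first; reciprocal rank 1.0 = ranking preserved.
rr = []
for i in range(8):
    fp32_rank = np.argsort(-cos_matrix(emb_fp32)[i])
    q_rank = np.argsort(-cos_matrix(emb_q)[i])
    target = fp32_rank[fp32_rank != i][0]   # FP32 top neighbour, self excluded
    ranking = q_rank[q_rank != i]
    rr.append(1.0 / (int(np.where(ranking == target)[0][0]) + 1))
mrr = float(np.mean(rr))
print(f"Pearson r = {pearson_r:.3f}, MRR = {mrr:.2f}")
```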

### Quantization methodology

The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
table the dominant weight (250,002 × 1024 × 4 bytes ≈ 977 MB in FP32). Each variant handles
it differently; a sketch of the recipes follows the list.

- **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)`
  quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
- **INT4 + INT8 emb** (`model_int4.onnx`): Two passes. Pass 1: `MatMulNBitsQuantizer(block_size=32,
  symmetric=True)` packs transformer MatMul weights into 4-bit nibbles. Pass 2:
  `quantize_dynamic(op_types=["Gather"], weight_type=QInt8)` brings the embedding table from
  977 MB FP32 down to 244 MB INT8.
- **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual `DequantizeLinear(axis=0)`
  node insertion packs the embedding table as per-row symmetric INT4 nibbles (scale = max(|row|)/7).
  Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
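
A compact outline of how these passes can be reproduced with onnxruntime's quantization tooling. File names are placeholders, and the 4-bit quantizer's class/import path varies across onnxruntime releases (older versions ship `MatMul4BitsQuantizer`, newer ones expose it as `MatMulNBitsQuantizer`), so treat this as a sketch rather than the exact script used for these files:

```python
import onnx
from onnxruntime.quantization import QuantType, quantize_dynamic
# Newer onnxruntime releases: matmul_nbits_quantizer.MatMulNBitsQuantizer
from onnxruntime.quantization.matmul_4bits_quantizer import MatMul4BitsQuantizer

# Pass 1: block-wise 4-bit packing of the transformer MatMul weights.
model = onnx.load("onnx/model.onnx")
quant = MatMul4BitsQuantizer(model, block_size=32, is_symmetric=True)
quant.process()
quant.model.save_model_to_file("onnx/model_matmul_int4.onnx")

# Pass 2: dynamic INT8 quantization restricted to Gather ops,
# i.e. the embedding table (yields the INT4 + INT8 emb variant).
quantize_dynamic(
    "onnx/model_matmul_int4.onnx",
    "onnx/model_int4.onnx",
    op_types_to_quantize=["Gather"],
    weight_type=QuantType.QInt8,
)
```

For the INT4-full variant, the per-row packing described above reduces to the following numpy arithmetic (an illustration of the math only; the actual graph edit replaces the FP32 initializer with the packed tensor plus a `DequantizeLinear(axis=0)` node):

```python
import numpy as np

W = np.zeros((250002, 1024), dtype=np.float32)       # embedding table (placeholder values)
scale = np.abs(W).max(axis=1, keepdims=True) / 7.0   # one symmetric scale per row
scale[scale == 0] = 1.0                              # guard all-zero rows
q = np.clip(np.round(W / scale), -8, 7).astype(np.int8)
lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
packed = lo | (hi << 4)                              # two 4-bit weights per byte, ~122 MB
```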

---

## Usage

### fastembed (Rust)

This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):

```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

// INT8 – most compatible, 542 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;

// INT4 + INT8 embedding – 434 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;

// INT4 full – smallest, 337 MB
let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;

let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
```

### ONNX Runtime (Python)

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
tokenizer.enable_truncation(max_length=512)
tokenizer.enable_padding(pad_token="<pad>", pad_id=1)

session = ort.InferenceSession("onnx/model_quantized.onnx",
                               providers=["CPUExecutionProvider"])

texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
         "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]

enc = tokenizer.encode_batch(texts)
ids = np.array([e.ids for e in enc], dtype=np.int64)
mask = np.array([e.attention_mask for e in enc], dtype=np.int64)

out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)

# Mean pooling + L2 normalize
pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
embeddings = pooled / norms.clip(1e-12)

# Cosine similarity
scores = embeddings @ embeddings.T
print(scores)
```

### sentence-transformers (original FP32 weights)

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")

queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
           "국방 분야에 어떤 위성 서비스가 제공되나요?"]
documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
             "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]

q_emb = model.encode(queries, prompt_name="query")
d_emb = model.encode(documents)
scores = model.similarity(q_emb, d_emb)
print(scores)
```

---

## Quality Benchmarks (original model)

Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).

### 6 Datasets of MTEB (Korean)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
| nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
| telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
| telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
| BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
| jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
| openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |

Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.

### 7 Datasets of BEIR (English)

| Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
|---|---|---|---|---|---|---|
| Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
| **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
| telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
| Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
| Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
| BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
| jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |

Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.

---

## License

Apache 2.0 – same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).

## Citation

```bibtex
@software{TelePIX-PIXIE-Rune-v1,
  title = {PIXIE-Rune-v1.0},
  author = {TelePIX AI Research Team and Bongmin Kim},
  year = {2025},
  url = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
}
```

## Contact

Original model authors: bmkim@telepix.net
ONNX quantization: [cstr](https://huggingface.co/cstr) – open an issue on this repo for questions.