cstr committed on
Commit 8a5712e · verified · 1 Parent(s): c3ff3a4

Add YAML metadata to model card

Files changed (1): README.md (+74 -39)

README.md CHANGED
@@ -1,3 +1,23 @@
  # PIXIE-Rune-v1.0 — ONNX Quantized Variants
 
  ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
@@ -33,48 +53,46 @@ retrieval across 74 languages with specialization in Korean/English aerospace do
  | `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
  | `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
 
- **Metrics** measured on 8 semantically diverse English sentences vs the FP32 reference.
- Pearson r is the correlation of pairwise cosine similarity matrices (structure preservation).
  MRR = Mean Reciprocal Rank on a retrieval probe — 1.00 = perfect retrieval ranking preserved.
 
  ### Quantization methodology
 
- - **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic` with
- `weight_type=QInt8` — quantizes all weight tensors (MatMul + embedding Gather) to INT8.
- - **INT4+INT8 emb** (`model_int4.onnx`): Two-pass approach.
- Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` quantizes transformer
- MatMul weights to 4-bit. Pass 2: `quantize_dynamic` with `op_types_to_quantize=["Gather"]`
- compresses the 250K-token embedding table to INT8. Net: 977 MB FP32 embedding → 244 MB INT8.
  - **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
- `DequantizeLinear(axis=0)` node insertion packs the word embedding table as INT4 nibbles
- (per-row symmetric, scale = max(|row|)/7). Requires opset 21 for INT4 DequantizeLinear.
- The 977 MB FP32 embedding table becomes 122 MB packed INT4.
 
  ---
 
  ## Usage
 
- ### fastembed (Rust / Python)
 
  This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):
 
  ```rust
  use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
 
- let model = TextEmbedding::try_new(
-     InitOptions::new(EmbeddingModel::PixieRuneV1Q) // INT8
-     // EmbeddingModel::PixieRuneV1Int4             // INT4+INT8 emb
-     // EmbeddingModel::PixieRuneV1Int4Full         // INT4 full
- )?;
 
- let embeddings = model.embed(vec!["Hello", "World"], None)?;
- ```
 
- ```python
- from fastembed import TextEmbedding
 
- model = TextEmbedding("telepix/PIXIE-Rune-v1.0", model_file="onnx/model_quantized.onnx")
- embeddings = list(model.embed(["Hello", "World"]))
  ```
 
  ### ONNX Runtime (Python)
@@ -88,53 +106,68 @@ tokenizer = Tokenizer.from_file("tokenizer.json")
  tokenizer.enable_truncation(max_length=512)
  tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
 
- session = ort.InferenceSession("onnx/model_quantized.onnx")
 
- texts = ["Hello, world!", "This is a test."]
- enc = tokenizer.encode_batch(texts)
  ids = np.array([e.ids for e in enc], dtype=np.int64)
  mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
 
- out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
 
  # Mean pooling + L2 normalize
  pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
  norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
  embeddings = pooled / norms.clip(1e-12)
  ```
 
- ### sentence-transformers (original weights)
 
  ```python
  from sentence_transformers import SentenceTransformer
 
  model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
 
- queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?"]
- documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]
 
  q_emb = model.encode(queries, prompt_name="query")
  d_emb = model.encode(documents)
  scores = model.similarity(q_emb, d_emb)
  ```
 
  ---
 
  ## Quality Benchmarks (original model)
 
- Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
 
  ### 6 Datasets of MTEB (Korean)
 
  | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |---|---|---|---|---|---|---|
  | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
  | nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
  | BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
  | jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
 
  ### 7 Datasets of BEIR (English)
 
@@ -142,30 +175,32 @@ Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune
  |---|---|---|---|---|---|---|
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
  | BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
  | jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
 
- Benchmarks from [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
 
  ---
 
  ## License
 
- Apache 2.0 — same as the original model.
 
  ## Citation
 
  ```bibtex
  @software{TelePIX-PIXIE-Rune-v1,
- title={PIXIE-Rune-v1.0},
- author={TelePIX AI Research Team and Bongmin Kim},
- year={2025},
- url={https://huggingface.co/telepix/PIXIE-Rune-v1.0}
  }
  ```
 
  ## Contact
 
- Original model: bmkim@telepix.net
- ONNX quantization: [cstr](https://huggingface.co/cstr) — issues welcome.
 
+ ---
+ language:
+ - multilingual
+ - ko
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - onnx
+ - quantized
+ - xlm-roberta
+ - dense-encoder
+ - dense
+ - fastembed
+ base_model: telepix/PIXIE-Rune-v1.0
+ pipeline_tag: feature-extraction
+ ---
+
  # PIXIE-Rune-v1.0 — ONNX Quantized Variants
 
  ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
 
  | `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
  | `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
 
+ **Metrics** measured on 8 semantically diverse sentences vs the FP32 reference.
+ Pearson r = correlation of pairwise cosine similarity matrices (structure preservation).
  MRR = Mean Reciprocal Rank on a retrieval probe — 1.00 = perfect retrieval ranking preserved.
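The two probe metrics can be sketched in NumPy. This is an illustrative stand-in: random vectors play the role of FP32 embeddings, and small additive noise plays the role of quantization error, so the printed numbers are synthetic, not the table's values (which come from the real models).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for FP32 and quantized embeddings of 8 probe sentences
ref = rng.normal(size=(8, 1024))
quant = ref + rng.normal(scale=0.01, size=ref.shape)  # "quantization noise"

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def sim_matrix(x):
    x = l2norm(x)
    return x @ x.T

# Pearson r between the off-diagonal entries of the two cosine-similarity matrices
iu = np.triu_indices(8, k=1)
r = np.corrcoef(sim_matrix(ref)[iu], sim_matrix(quant)[iu])[0, 1]

# MRR probe: for each "query" row, where does the quantized model rank the
# FP32-nearest neighbour?
s_ref, s_q = sim_matrix(ref), sim_matrix(quant)
np.fill_diagonal(s_ref, -np.inf)
np.fill_diagonal(s_q, -np.inf)
reciprocal_ranks = []
for i in range(8):
    target = np.argmax(s_ref[i])           # ground-truth neighbour per FP32
    order = np.argsort(-s_q[i])            # quantized ranking, best first
    rank = int(np.where(order == target)[0][0]) + 1
    reciprocal_ranks.append(1.0 / rank)
mrr = float(np.mean(reciprocal_ranks))

print(round(r, 3), mrr)
```

With real embeddings, `ref` and `quant` would come from running the FP32 and quantized ONNX sessions on the same 8 sentences.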
 
  ### Quantization methodology
 
+ The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
+ table the dominant weight (~977 MB FP32). Each variant handles it differently:
+
+ - **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)` —
+ quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
+ - **INT4 + INT8 emb** (`model_int4.onnx`): Two passes.
+ Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` packs transformer MatMul weights
+ into 4-bit nibbles. Pass 2: `quantize_dynamic(op_types_to_quantize=["Gather"], weight_type=QInt8)` brings
+ the embedding table from 977 MB FP32 → 244 MB INT8.
  - **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
+ `DequantizeLinear(axis=0)` node insertion packs the embedding table as per-row symmetric
+ INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
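The per-row symmetric scheme and the size arithmetic can be sketched in NumPy. A toy 16-row table stands in for the real 250,002 × 1024 embedding; this illustrates the packing math only, not the repo's actual conversion script.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for the 250,002 x 1024 FP32 embedding table
table = rng.normal(scale=0.05, size=(16, 1024)).astype(np.float32)

# Per-row symmetric INT4: scale = max(|row|)/7, quantized values land in [-7, 7]
scale = np.abs(table).max(axis=1, keepdims=True) / 7.0
q = np.clip(np.round(table / scale), -8, 7).astype(np.int8)

# Pack two signed nibbles per byte (how an INT4 initializer is stored)
lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
packed = lo | (hi << 4)          # half the bytes of the unpacked INT8 form

# Dequantize and bound the reconstruction error (at most half a quantization step)
dq = q.astype(np.float32) * scale
err = np.abs(dq - table).max() / scale.max()

# Size arithmetic for the real table
fp32_mb = 250_002 * 1024 * 4 / 2**20
int8_mb = fp32_mb / 4
int4_mb = fp32_mb / 8
print(round(fp32_mb), round(int8_mb), round(int4_mb))  # 977 244 122
```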
 
 
  ---
 
  ## Usage
 
+ ### fastembed (Rust)
 
  This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):
 
  ```rust
  use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
 
+ // INT8 — most compatible, 542 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;
 
+ // INT4 + INT8 embedding — 434 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;
 
+ // INT4 full — smallest, 337 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;
 
+ let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
  ```
 
  ### ONNX Runtime (Python)
 
  tokenizer.enable_truncation(max_length=512)
  tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
 
+ session = ort.InferenceSession("onnx/model_quantized.onnx",
+     providers=["CPUExecutionProvider"])
 
+ texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
+     "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]
+
+ enc = tokenizer.encode_batch(texts)
  ids = np.array([e.ids for e in enc], dtype=np.int64)
  mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
 
+ out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)
 
  # Mean pooling + L2 normalize
  pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
  norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
  embeddings = pooled / norms.clip(1e-12)
+ # cosine similarity
+ scores = embeddings @ embeddings.T
+ print(scores)
  ```
 
+ ### sentence-transformers (original FP32 weights)
 
  ```python
  from sentence_transformers import SentenceTransformer
 
  model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
 
+ queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
+     "국방 분야에 어떤 위성 서비스가 제공되나요?"]
+ documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
+     "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]
 
  q_emb = model.encode(queries, prompt_name="query")
  d_emb = model.encode(documents)
  scores = model.similarity(q_emb, d_emb)
+ print(scores)
  ```
 
  ---
 
  ## Quality Benchmarks (original model)
 
+ Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
+ evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
 
  ### 6 Datasets of MTEB (Korean)
 
  | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |---|---|---|---|---|---|---|
  | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
+ | telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
+ | telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
  | nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
  | BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
  | jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
+ | openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |
+
+ Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.
 
  ### 7 Datasets of BEIR (English)
 
  |---|---|---|---|---|---|---|
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
+ | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
+ | Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
  | BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
  | jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
 
+ Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.
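NDCG@k, the metric reported in both tables, discounts a document's relevance by the log of its rank and normalizes by the ideal ordering. A minimal sketch with hypothetical graded relevance labels:

```python
import numpy as np

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k relevance grades, in ranked order."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float((rels / np.log2(np.arange(2, rels.size + 2))).sum())

def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance grades of retrieved documents, in the order a system returned them
ranked_rels = [3, 2, 0, 1]
print(round(ndcg_at_k(ranked_rels, 10), 4))  # 0.9854
```

A perfect ranking scores 1.0; swapping a relevant document below an irrelevant one lowers the score more near the top of the list than near the bottom.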
 
  ---
 
  ## License
 
+ Apache 2.0 — same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
 
  ## Citation
 
  ```bibtex
  @software{TelePIX-PIXIE-Rune-v1,
+ title = {PIXIE-Rune-v1.0},
+ author = {TelePIX AI Research Team and Bongmin Kim},
+ year = {2025},
+ url = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
  }
  ```
 
  ## Contact
 
+ Original model authors: bmkim@telepix.net
+ ONNX quantization: [cstr](https://huggingface.co/cstr) — open an issue on this repo for questions.