cstr committed on
Commit 8a5712e · verified · 1 Parent(s): c3ff3a4

Add YAML metadata to model card

Files changed (1): README.md (+74 -39)

README.md CHANGED
@@ -1,3 +1,23 @@
  # PIXIE-Rune-v1.0 — ONNX Quantized Variants
 
  ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
@@ -33,48 +53,46 @@ retrieval across 74 languages with specialization in Korean/English aerospace do
  | `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
  | `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
 
- **Metrics** measured on 8 semantically diverse English sentences vs the FP32 reference.
- Pearson r is the correlation of pairwise cosine similarity matrices (structure preservation).
  MRR = Mean Reciprocal Rank on a retrieval probe — 1.00 = perfect retrieval ranking preserved.
 
  ### Quantization methodology
 
- - **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic` with
- `weight_type=QInt8` — quantizes all weight tensors (MatMul + embedding Gather) to INT8.
- - **INT4+INT8 emb** (`model_int4.onnx`): Two-pass approach.
- Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` quantizes transformer
- MatMul weights to 4-bit. Pass 2: `quantize_dynamic` with `op_types_to_quantize=["Gather"]`
- compresses the 250K-token embedding table to INT8. Net: 977 MB FP32 embedding → 244 MB INT8.
  - **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
- `DequantizeLinear(axis=0)` node insertion packs the word embedding table as INT4 nibbles
- (per-row symmetric, scale = max(|row|)/7). Requires opset 21 for INT4 DequantizeLinear.
- The 977 MB FP32 embedding table becomes 122 MB packed INT4.
 
  ---
 
  ## Usage
 
- ### fastembed (Rust / Python)
 
  This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):
 
  ```rust
  use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
 
- let model = TextEmbedding::try_new(
-     InitOptions::new(EmbeddingModel::PixieRuneV1Q) // INT8
-     // EmbeddingModel::PixieRuneV1Int4             // INT4+INT8 emb
-     // EmbeddingModel::PixieRuneV1Int4Full         // INT4 full
- )?;
 
- let embeddings = model.embed(vec!["Hello", "World"], None)?;
- ```
 
- ```python
- from fastembed import TextEmbedding
 
- model = TextEmbedding("telepix/PIXIE-Rune-v1.0", model_file="onnx/model_quantized.onnx")
- embeddings = list(model.embed(["Hello", "World"]))
  ```
 
  ### ONNX Runtime (Python)
@@ -88,53 +106,68 @@ tokenizer = Tokenizer.from_file("tokenizer.json")
  tokenizer.enable_truncation(max_length=512)
  tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
 
- session = ort.InferenceSession("onnx/model_quantized.onnx")
 
- texts = ["Hello, world!", "This is a test."]
- enc = tokenizer.encode_batch(texts)
  ids = np.array([e.ids for e in enc], dtype=np.int64)
  mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
 
- out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]
 
  # Mean pooling + L2 normalize
  pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
  norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
  embeddings = pooled / norms.clip(1e-12)
  ```
 
- ### sentence-transformers (original weights)
 
  ```python
  from sentence_transformers import SentenceTransformer
 
  model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
 
- queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?"]
- documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]
 
  q_emb = model.encode(queries, prompt_name="query")
  d_emb = model.encode(documents)
  scores = model.similarity(q_emb, d_emb)
  ```
 
  ---
 
  ## Quality Benchmarks (original model)
 
- Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
 
  ### 6 Datasets of MTEB (Korean)
 
  | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |---|---|---|---|---|---|---|
  | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
  | nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
  | BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
  | jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
 
  ### 7 Datasets of BEIR (English)
 
@@ -142,30 +175,32 @@ Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune
  |---|---|---|---|---|---|---|
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
  | BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
  | jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
 
- Benchmarks from [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
 
  ---
 
  ## License
 
- Apache 2.0 — same as the original model.
 
  ## Citation
 
  ```bibtex
  @software{TelePIX-PIXIE-Rune-v1,
- title={PIXIE-Rune-v1.0},
- author={TelePIX AI Research Team and Bongmin Kim},
- year={2025},
- url={https://huggingface.co/telepix/PIXIE-Rune-v1.0}
  }
  ```
 
  ## Contact
 
- Original model: bmkim@telepix.net
- ONNX quantization: [cstr](https://huggingface.co/cstr) — issues welcome.
 
+ ---
+ language:
+ - multilingual
+ - ko
+ - en
+ license: apache-2.0
+ tags:
+ - sentence-transformers
+ - feature-extraction
+ - sentence-similarity
+ - onnx
+ - quantized
+ - xlm-roberta
+ - dense-encoder
+ - dense
+ - fastembed
+ base_model: telepix/PIXIE-Rune-v1.0
+ pipeline_tag: feature-extraction
+ ---
+
  # PIXIE-Rune-v1.0 — ONNX Quantized Variants
 
  ONNX-quantized derivatives of [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
 
  | `onnx/model_int4.onnx` | INT4 + INT8 emb | 434 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT8 Gather |
  | `onnx/model_int4_full.onnx` | INT4 full | 337 MB | 0.941 | 0.998 | 1.00 | `MatMulNBits` + INT4 Gather (opset 21) |
 
+ **Metrics** measured on 8 semantically diverse sentences vs the FP32 reference.
+ Pearson r = correlation of pairwise cosine similarity matrices (structure preservation).
  MRR = Mean Reciprocal Rank on a retrieval probe — 1.00 = perfect retrieval ranking preserved.
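The two probe metrics can be sketched in NumPy. This is an illustrative stand-in: random vectors play the role of FP32 embeddings, and small additive noise plays the role of quantization error, so the printed numbers are synthetic, not the table's values (which come from the real models).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for FP32 and quantized embeddings of 8 probe sentences
ref = rng.normal(size=(8, 1024))
quant = ref + rng.normal(scale=0.01, size=ref.shape)  # "quantization noise"

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def sim_matrix(x):
    x = l2norm(x)
    return x @ x.T

# Pearson r between the off-diagonal entries of the two cosine-similarity matrices
iu = np.triu_indices(8, k=1)
r = np.corrcoef(sim_matrix(ref)[iu], sim_matrix(quant)[iu])[0, 1]

# MRR probe: for each "query" row, where does the quantized model rank the
# FP32-nearest neighbour?
s_ref, s_q = sim_matrix(ref), sim_matrix(quant)
np.fill_diagonal(s_ref, -np.inf)
np.fill_diagonal(s_q, -np.inf)
reciprocal_ranks = []
for i in range(8):
    target = np.argmax(s_ref[i])           # ground-truth neighbour per FP32
    order = np.argsort(-s_q[i])            # quantized ranking, best first
    rank = int(np.where(order == target)[0][0]) + 1
    reciprocal_ranks.append(1.0 / rank)
mrr = float(np.mean(reciprocal_ranks))

print(round(r, 3), mrr)
```

With real embeddings, `ref` and `quant` would come from running the FP32 and quantized ONNX sessions on the same 8 sentences.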
 
  ### Quantization methodology
 
+ The XLM-RoBERTa vocabulary has 250,002 tokens × 1024 dimensions, making the word embedding
+ table the dominant weight (~977 MB FP32). Each variant handles it differently:
+
+ - **INT8** (`model_quantized.onnx`): `onnxruntime.quantization.quantize_dynamic(weight_type=QInt8)` —
+ quantizes all weight tensors, including the embedding Gather, to INT8. Compact, maximum compatibility.
+ - **INT4 + INT8 emb** (`model_int4.onnx`): Two passes.
+ Pass 1: `MatMulNBitsQuantizer(block_size=32, is_symmetric=True)` packs transformer MatMul weights
+ into 4-bit nibbles. Pass 2: `quantize_dynamic(op_types_to_quantize=["Gather"], weight_type=QInt8)` brings
+ the embedding table from 977 MB FP32 → 244 MB INT8.
  - **INT4 full** (`model_int4_full.onnx`): Same MatMulNBits pass, then manual
+ `DequantizeLinear(axis=0)` node insertion packs the embedding table as per-row symmetric
+ INT4 nibbles (scale = max(|row|)/7). Requires an opset upgrade from 14 to 21. Embedding: 977 MB → 122 MB.
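The per-row symmetric scheme and the size arithmetic can be sketched in NumPy. A toy 16-row table stands in for the real 250,002 × 1024 embedding; this illustrates the packing math only, not the repo's actual conversion script.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-in for the 250,002 x 1024 FP32 embedding table
table = rng.normal(scale=0.05, size=(16, 1024)).astype(np.float32)

# Per-row symmetric INT4: scale = max(|row|)/7, quantized values land in [-7, 7]
scale = np.abs(table).max(axis=1, keepdims=True) / 7.0
q = np.clip(np.round(table / scale), -8, 7).astype(np.int8)

# Pack two signed nibbles per byte (how an INT4 initializer is stored)
lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
packed = lo | (hi << 4)          # half the bytes of the unpacked INT8 form

# Dequantize and bound the reconstruction error (at most half a quantization step)
dq = q.astype(np.float32) * scale
err = np.abs(dq - table).max() / scale.max()

# Size arithmetic for the real table
fp32_mb = 250_002 * 1024 * 4 / 2**20
int8_mb = fp32_mb / 4
int4_mb = fp32_mb / 8
print(round(fp32_mb), round(int8_mb), round(int4_mb))  # 977 244 122
```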
 
 
  ---
 
  ## Usage
 
+ ### fastembed (Rust)
 
  This repo is integrated in [fastembed-rs](https://github.com/Anush008/fastembed-rs):
 
  ```rust
  use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};
 
+ // INT8 — most compatible, 542 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Q))?;
 
+ // INT4 + INT8 embedding — 434 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4))?;
 
+ // INT4 full — smallest, 337 MB
+ let model = TextEmbedding::try_new(InitOptions::new(EmbeddingModel::PixieRuneV1Int4Full))?;
 
+ let embeddings = model.embed(vec!["안녕하세요", "Hello world"], None)?;
  ```
 
  ### ONNX Runtime (Python)
 
  tokenizer.enable_truncation(max_length=512)
  tokenizer.enable_padding(pad_token="<pad>", pad_id=1)
 
+ session = ort.InferenceSession("onnx/model_quantized.onnx",
+     providers=["CPUExecutionProvider"])
 
+ texts = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
+     "텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다."]
+
+ enc = tokenizer.encode_batch(texts)
  ids = np.array([e.ids for e in enc], dtype=np.int64)
  mask = np.array([e.attention_mask for e in enc], dtype=np.int64)
 
+ out = session.run(None, {"input_ids": ids, "attention_mask": mask})[0]  # (batch, seq, 1024)
 
  # Mean pooling + L2 normalize
  pooled = (out * mask[..., None]).sum(1) / mask.sum(1, keepdims=True).clip(1e-9)
  norms = np.linalg.norm(pooled, axis=-1, keepdims=True)
  embeddings = pooled / norms.clip(1e-12)
+ # cosine similarity
+ scores = embeddings @ embeddings.T
+ print(scores)
  ```
 
+ ### sentence-transformers (original FP32 weights)
 
  ```python
  from sentence_transformers import SentenceTransformer
 
  model = SentenceTransformer("telepix/PIXIE-Rune-v1.0")
 
+ queries = ["텔레픽스는 어떤 산업 분야에서 위성 데이터를 활용하나요?",
+     "국방 분야에 어떤 위성 서비스가 제공되나요?"]
+ documents = ["텔레픽스는 해양, 자원, 농업 등 다양한 분야에서 위성 데이터를 분석하여 서비스를 제공합니다.",
+     "정찰 및 감시 목적의 위성 영상을 통해 국방 관련 정밀 분석 서비스를 제공합니다."]
 
  q_emb = model.encode(queries, prompt_name="query")
  d_emb = model.encode(documents)
  scores = model.similarity(q_emb, d_emb)
+ print(scores)
  ```
 
  ---
 
  ## Quality Benchmarks (original model)
 
+ Results from [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0),
+ evaluated using [Korean-MTEB-Retrieval-Evaluators](https://github.com/nlpai-lab/KURE/tree/main/evaluation).
 
  ### 6 Datasets of MTEB (Korean)
 
  | Model | # params | Avg. NDCG | NDCG@1 | NDCG@3 | NDCG@5 | NDCG@10 |
  |---|---|---|---|---|---|---|
  | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.7567 | 0.7149 | 0.7541 | 0.7696 | 0.7882 |
+ | telepix/PIXIE-Spell-Preview-0.6B | 0.6B | 0.7280 | 0.6804 | 0.7258 | 0.7448 | 0.7612 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.7383** | **0.6936** | **0.7356** | **0.7545** | **0.7698** |
+ | telepix/PIXIE-Splade-Preview | 0.1B | 0.7253 | 0.6799 | 0.7217 | 0.7416 | 0.7579 |
  | nlpai-lab/KURE-v1 | 0.5B | 0.7312 | 0.6826 | 0.7303 | 0.7478 | 0.7642 |
  | BAAI/bge-m3 | 0.5B | 0.7126 | 0.6613 | 0.7107 | 0.7301 | 0.7483 |
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.7050 | 0.6570 | 0.7015 | 0.7226 | 0.7390 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.6872 | 0.6423 | 0.6833 | 0.7017 | 0.7215 |
  | jinaai/jina-embeddings-v3 | 0.5B | 0.6731 | 0.6224 | 0.6715 | 0.6899 | 0.7088 |
+ | openai/text-embedding-3-large | N/A | 0.6465 | 0.5895 | 0.6467 | 0.6646 | 0.6853 |
+
+ Benchmarks: Ko-StrategyQA, AutoRAGRetrieval, MIRACLRetrieval, PublicHealthQA, BelebeleRetrieval, MultiLongDocRetrieval.
 
  ### 7 Datasets of BEIR (English)
 
  |---|---|---|---|---|---|---|
  | Snowflake/snowflake-arctic-embed-l-v2.0 | 0.5B | 0.5812 | 0.5725 | 0.5705 | 0.5811 | 0.6006 |
  | **telepix/PIXIE-Rune-v1.0** | **0.5B** | **0.5781** | **0.5691** | **0.5663** | **0.5791** | **0.5979** |
+ | telepix/PIXIE-Spell-Preview-1.7B | 1.7B | 0.5630 | 0.5446 | 0.5529 | 0.5660 | 0.5885 |
  | Qwen/Qwen3-Embedding-0.6B | 0.6B | 0.5558 | 0.5321 | 0.5451 | 0.5620 | 0.5839 |
+ | Alibaba-NLP/gte-multilingual-base | 0.3B | 0.5541 | 0.5446 | 0.5426 | 0.5574 | 0.5746 |
  | BAAI/bge-m3 | 0.5B | 0.5318 | 0.5078 | 0.5231 | 0.5389 | 0.5573 |
  | jinaai/jina-embeddings-v3 | 0.6B | 0.4482 | 0.4116 | 0.4379 | 0.4573 | 0.4861 |
 
+ Benchmarks: ArguAna, FEVER, FiQA-2018, HotpotQA, MSMARCO, NQ, SCIDOCS.
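NDCG@k, the metric reported in both tables, discounts a document's relevance by the log of its rank and normalizes by the ideal ordering. A minimal sketch with hypothetical graded relevance labels:

```python
import numpy as np

def dcg_at_k(rels, k):
    """Discounted cumulative gain over the top-k relevance grades, in ranked order."""
    rels = np.asarray(rels, dtype=float)[:k]
    return float((rels / np.log2(np.arange(2, rels.size + 2))).sum())

def ndcg_at_k(rels, k):
    """DCG normalized by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# Relevance grades of retrieved documents, in the order a system returned them
ranked_rels = [3, 2, 0, 1]
print(round(ndcg_at_k(ranked_rels, 10), 4))  # 0.9854
```

A perfect ranking scores 1.0; swapping a relevant document below an irrelevant one lowers the score more near the top of the list than near the bottom.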
 
  ---
 
  ## License
 
+ Apache 2.0 — same as the original [telepix/PIXIE-Rune-v1.0](https://huggingface.co/telepix/PIXIE-Rune-v1.0).
 
  ## Citation
 
  ```bibtex
  @software{TelePIX-PIXIE-Rune-v1,
+ title = {PIXIE-Rune-v1.0},
+ author = {TelePIX AI Research Team and Bongmin Kim},
+ year = {2025},
+ url = {https://huggingface.co/telepix/PIXIE-Rune-v1.0}
  }
  ```
 
  ## Contact
 
+ Original model authors: bmkim@telepix.net
+ ONNX quantization: [cstr](https://huggingface.co/cstr) — open an issue on this repo for questions.