---
language: zh
tags:
- embeddings
- retrieval
- numpy
- transformer-free
license: mit
---

# PipeOwl-1.0 (Geometric Embedding)

PipeOwl is a transformer-free geometric embedding package built on a **static embedding field** stored as NumPy arrays.

This repo provides (V is the vocabulary size):
- `L1_base_embeddings.npy`: float32 (V, 1024) embedding table with unit-normalized rows
- `L1_base_vocab.json`: list of vocab strings aligned with the embedding rows
- `delta_base_scalar.npy`: float32 (V,) optional scalar bias field
- a minimal inference engine (`engine.py`) and a usage script (`quickstart.py`)
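
As a rough sketch of how these three files line up, the snippet below loads them with NumPy and checks that row `i` of the table, entry `i` of the vocab, and entry `i` of the bias refer to the same item. `load_field` and `make_toy_field` are hypothetical helper names, not part of `engine.py`. The toy field (3 entries, 8 dimensions instead of 1024) only stands in for the real files so the example runs anywhere.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def load_field(data_dir):
    """Load the embedding table, vocab list, and scalar bias (hypothetical helper)."""
    data_dir = Path(data_dir)
    emb = np.load(data_dir / "L1_base_embeddings.npy")   # (V, D) float32
    delta = np.load(data_dir / "delta_base_scalar.npy")  # (V,) float32
    with open(data_dir / "L1_base_vocab.json", encoding="utf-8") as f:
        vocab = json.load(f)                             # list of V strings
    # Row i of emb, vocab[i], and delta[i] must describe the same entry.
    if not (emb.shape[0] == len(vocab) == delta.shape[0]):
        raise ValueError("embedding rows, vocab, and bias are misaligned")
    return emb, vocab, delta


def make_toy_field(data_dir, vocab, dim=8, seed=0):
    """Write a tiny synthetic field using the same file names (illustration only)."""
    data_dir = Path(data_dir)
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(len(vocab), dim)).astype(np.float32)
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)    # unit-normalize each row
    np.save(data_dir / "L1_base_embeddings.npy", emb)
    np.save(data_dir / "delta_base_scalar.npy", np.zeros(len(vocab), dtype=np.float32))
    with open(data_dir / "L1_base_vocab.json", "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    make_toy_field(tmp, ["雪鴞", "貓頭鷹", "企鵝"])
    emb, vocab, delta = load_field(tmp)

norms = np.linalg.norm(emb, axis=1)
print(emb.shape, emb.dtype, len(vocab), delta.shape)
print(np.allclose(norms, 1.0, atol=1e-5))  # rows should have unit length
```

With the real files, calling `load_field("data")` from the repository root should be enough, assuming the `data/` directory shipped with this repo.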

---

## Attribution
The base embedding vectors were generated by running inference with **BGE (Apache-2.0)**; they are model outputs.
This repository **does not redistribute any original BGE model weights**.

---

## Quickstart

```bash
pip install numpy
python quickstart.py
```

Or, for minimal usage:

```python
from engine import PipeOwlEngine, PipeOwlConfig

# Build the engine with its default configuration.
engine = PipeOwlEngine(PipeOwlConfig())
q = engine.encode("雪鴞好可愛")  # "Snowy owls are so cute"
# use q for similarity / retrieval
```
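
One way to use `q` for retrieval, sketched under assumptions: because the table rows are unit-normalized, a dot product against a normalized query equals cosine similarity, so nearest neighbours can be ranked with a single matrix-vector product. The `top_k` helper and the four-word toy table are hypothetical. The sketch also assumes `engine.encode` returns a 1-D vector with the same dimensionality as the table, which this README does not state.

```python
import numpy as np


def top_k(query, emb, vocab, k=3):
    """Rank vocab entries by cosine similarity to query (hypothetical helper)."""
    query = np.asarray(query, dtype=np.float32)
    query = query / np.linalg.norm(query)   # normalize the query vector
    scores = emb @ query                    # unit-norm rows: dot product == cosine
    order = np.argsort(-scores)[:k]         # indices of the k highest scores
    return [(vocab[i], float(scores[i])) for i in order]


# Toy stand-in for the real (V, 1024) table: 4 entries, 3 dimensions.
vocab = ["雪鴞", "貓頭鷹", "汽車", "火車"]
emb = np.array(
    [[1.0, 0.0, 0.0],
     [0.9, 0.1, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.9, 0.1]],
    dtype=np.float32,
)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize each row

# Stand-in for q = engine.encode(...); here it simply points along the first axis.
q = np.array([1.0, 0.0, 0.0], dtype=np.float32)
results = top_k(q, emb, vocab, k=2)
for word, score in results:
    print(word, round(score, 4))
```

A full sort is fine for a toy table; for a large V, `np.argpartition` would avoid sorting every score.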

---

## Files
- `data/L1_base_embeddings.npy`: embedding table (float32, V×1024)
- `data/L1_base_vocab.json`: vocab aligned with rows
- `data/delta_base_scalar.npy`: scalar bias (float32, V)
- `engine.py`: minimal runtime
- `quickstart.py`: example script

## Notes
No `safetensors` or `pytorch_model.bin` file is included, because this model is distributed as a static NumPy embedding field.

---

## Stress Test Results (Hard Retrieval Setting)

- corpus size = 1200
- eval size = 200
- OOD ratio = 0.28

| Model | in-domain MRR@10 | OOD MRR@10 |
|--------|-----------------|------------|
| MiniLM | 0.019 | 0.026 |
| BGE | 0.026 | 0.009 |
| PipeOwl | 0.013 | 0.023 |

Note: This test uses a harder corpus and adversarial-style queries.
Absolute scores are low for all three models because of the increased difficulty.
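
For readers unfamiliar with the metric, MRR@10 averages the reciprocal rank of the first relevant result per query, and a query scores 0 when nothing relevant appears in the top 10. The sketch below shows that standard definition on made-up rankings. `mrr_at_k` is a hypothetical helper, the one-relevant-document-per-query setup is an assumption, and this is not the evaluation script behind the table above.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank over queries, looking only at the top k results."""
    total = 0.0
    for ranking, target in zip(rankings, relevant):
        # Queries whose target is absent from the top k add nothing to the sum.
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id == target:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break
    return total / len(relevant)


# Made-up rankings for three queries (document ids are arbitrary).
rankings = [
    ["d1", "d7", "d3"],        # target d1 is at rank 1
    ["d4", "d2", "d9", "d5"],  # target d5 is at rank 4
    ["d6", "d8"],              # target d0 is not retrieved
]
relevant = ["d1", "d5", "d0"]

score = mrr_at_k(rankings, relevant, k=10)
print(round(score, 4))
```

Splitting the evaluation queries into in-domain and OOD subsets and calling the function once per subset would give two columns like those in the table.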

See full experimental notes here:
<https://hackmd.io/@galaxy4552/BkpUEnTwbl>

---


## Repository Layout

```text
pipeowl/
│
├─ README.md
├─ model_card.md
├─ LICENSE
│
├─ engine.py
├─ quickstart.py
│
└─ data/
   ├─ L1_base_embeddings.npy
   ├─ delta_base_scalar.npy
   └─ L1_base_vocab.json
```