PipeOwl-1.8.1-jp-evalbpb (Geometric Embedding)

A transformer-free semantic retrieval engine.

PipeOwl performs deterministic vocabulary scoring over a static embedding field:

score = α⋅base + (1 - α⋅base)⋅Δfield

where:

  • base = cosine similarity in embedding space
  • Δfield = static scalar field bias

eval bpb: 21.417518826563846
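
The scoring rule above can be sketched in NumPy as follows. This is a minimal illustration, not the shipped implementation: the α default, function name, and array shapes are assumptions; the released config.json may use different values.

```python
import numpy as np

def score(query_vec, emb, delta_field, alpha=0.7):
    """Score every vocabulary row against one query vector.

    emb: (V, D) embedding table; delta_field: (V,) scalar bias field.
    alpha=0.7 is a hypothetical default, not the released value.
    """
    # base: cosine similarity between the query and each embedding row
    q = query_vec / np.linalg.norm(query_vec)
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    base = e @ q                                   # shape (V,)
    # blend with the static field bias exactly as in the formula above
    return alpha * base + (1 - alpha * base) * delta_field
```

Because the whole expression is elementwise over V rows, one matrix-vector product scores the entire vocabulary in a single pass.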

Features:

  • O(V) scoring over the full vocabulary.
  • No attention.
  • No transformer weights.
  • CPU-friendly (<16 MB model).

Architecture

  • Static embedding table (V × D)
  • Aligned vocabulary index
  • Optional scalar bias field (Δfield)
  • Linear scoring
  • Pluggable decoder stage
  • Targeted at CPU environments and low-latency systems (e.g. IMEs).
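
To make the linear-scoring stage concrete, here is a sketch of a full-vocabulary top-k scan under the same assumptions as above (hypothetical `top_k` name, α default, and zero bias field when none is loaded); `engine.py` may structure this differently.

```python
import numpy as np

def top_k(query_vec, emb, vocab, k=5, alpha=0.7, delta_field=None):
    """Linear top-k scan over the whole vocabulary (the O(V) path).

    vocab: list of V token strings aligned with the embedding rows.
    delta_field defaults to zeros when no bias field is provided.
    """
    if delta_field is None:
        delta_field = np.zeros(len(vocab), dtype=emb.dtype)
    q = query_vec / np.linalg.norm(query_vec)
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    base = e @ q
    scores = alpha * base + (1 - alpha * base) * delta_field
    # argpartition finds the k largest in O(V); sort only those k
    idx = np.argpartition(scores, -k)[-k:]
    idx = idx[np.argsort(scores[idx])[::-1]]
    return [(float(scores[i]), vocab[i]) for i in idx]
```

Using `argpartition` keeps the scan linear in V; only the final k candidates pay a sort cost.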

Model Specs

| item | value |
| --- | --- |
| vocab size | 26155 |
| embedding dim | 256 |
| storage format | safetensors (FP16) |
| model size | ~13.2 MB |
| languages | Japanese |
| startup time | <1 s |
| query latency | ~1 ms (CPU, full vocabulary scan) |

Quickstart

git clone https://huggingface.co/WangKaiLin/PipeOwl-1.8.1-jp-evalbpb
cd PipeOwl-1.8.1-jp-evalbpb

pip install numpy safetensors

python quickstart.py

Example semantic retrieval results:

Please enter words: 東京

Top-K Tokens:
0.894 | は
0.739 | 東京
0.674 | 起
0.630 | リーズ
0.609 | ュニ

Please enter words: 大阪

Top-K Tokens:
0.898 | は
0.673 | 大阪
0.670 | 起
0.655 | 東京
0.623 | リーズ

Repository Structure

PipeOwl-1.8.1-jp-evalbpb/
 ├ README.md
 ├ config.json
 ├ DATA_SOURCES.md
 ├ eval_bpb.py
 ├ LICENSE
 ├ quickstart.py
 ├ engine.py
 ├ vocabulary.json
 └ pipeowl_fp16.safetensors

LICENSE

MIT
