WebRank

A 3.14M-parameter transformer that scores web text on a [0, 1] scale where 1 = real content and 0 = boilerplate (cookie banners, navs, footers, CTAs, error pages, JS placeholders, paywalls).

Ships as a 3.2 MB INT8 ONNX file that runs anywhere ONNX Runtime runs: Python, JS (browser/node), Go, Rust, C++, Java, .NET.

Built as the post-processing filter for the Keiro Browser crawl pipeline, released as open source.

Files

File               Size    Description
webrank.int8.onnx  3.2 MB  INT8-quantized model (recommended)
webrank.onnx       12 MB   FP32 model
tokenizer.json     1.1 MB  HuggingFace tokenizers BPE vocab

Architecture

input_ids [B, 256]   int64
   ↓
token + position embeddings (dim=128)
   ↓
5 × { LayerNorm → MHA(8 heads, SDPA) → residual
                → LayerNorm → FFN(512) → residual }
   ↓
LayerNorm → mean-pool over non-pad tokens
   ↓
Linear(128→128) → GELU → Dropout → Linear(128→1) → sigmoid
   ↓
score [B]   float32
  • Vocab: 16,384 byte-level BPE
  • Max seq length: 256 BPE tokens
  • Params: 3,135,617
  • Pretraining: masked language modeling
  • Fine-tuning: binary classification with BCE loss
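The pad-aware pooling step at the end of the trunk can be sketched in numpy. This is a minimal illustration of the pooling logic only, not the exported graph; `masked_mean_pool` is a hypothetical helper name, and the pad id 0 is an assumption (the real one comes from tokenizer.json):

```python
import numpy as np

PAD_ID = 0  # assumption: the actual [PAD] id is defined by tokenizer.json

def masked_mean_pool(hidden, input_ids, pad_id=PAD_ID):
    """Average final-layer states [B, T, D] over non-pad positions only."""
    mask = (input_ids != pad_id).astype(np.float32)          # [B, T]
    summed = (hidden * mask[:, :, None]).sum(axis=1)         # [B, D]
    counts = np.clip(mask.sum(axis=1, keepdims=True), 1.0, None)
    return summed / counts                                   # [B, D]

def sigmoid(x):
    """Final squash to the [0, 1] score range."""
    return 1.0 / (1.0 + np.exp(-x))
```

Masking before the mean is what makes padding length irrelevant to the score: pad positions contribute neither to the sum nor to the divisor.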

Usage

Python

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tok = Tokenizer.from_file("tokenizer.json")
sess = ort.InferenceSession("webrank.int8.onnx",
                            providers=["CPUExecutionProvider"])

def encode(text, max_len=256):
    pad_id = tok.token_to_id("[PAD]")
    ids = tok.encode(text).ids[:max_len]   # post-processor adds [CLS]/[SEP]
    ids += [pad_id] * (max_len - len(ids))
    return np.array([ids], dtype=np.int64)

def score(text):
    out = sess.run(["score"], {"input_ids": encode(text)})[0]
    return float(out.flatten()[0])

print(score("Mitochondria are membrane-bound organelles found in eukaryotic cells."))
# 0.93

print(score("We use cookies to improve your experience. Accept all cookies."))
# 0.08

Batched:

def score_batch(texts):
    ids = np.concatenate([encode(t) for t in texts], axis=0)
    return sess.run(["score"], {"input_ids": ids})[0].flatten()
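Documents longer than the 256-token cap must be chunked by the caller (see Limitations). A minimal sketch of fixed-window chunking over an already-tokenized id list; `chunk_ids` is a hypothetical helper and the pad id 0 is illustrative:

```python
def chunk_ids(ids, max_len=256, pad_id=0):
    """Split a long token-id list into fixed-size windows, padding the last."""
    chunks = []
    for start in range(0, len(ids), max_len):
        window = ids[start:start + max_len]
        window = window + [pad_id] * (max_len - len(window))
        chunks.append(window)
    return chunks
```

Feed `np.array(chunks, dtype=np.int64)` to a `score_batch`-style call, then aggregate per document (e.g. mean or max chunk score) as your pipeline requires.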

JavaScript (browser / Node)

import * as ort from "onnxruntime-web";

const session = await ort.InferenceSession.create("/webrank.int8.onnx");
// tokenize text into a BigInt64Array of length 256 using a JS BPE
// library that loads tokenizer.json
const tensor = new ort.Tensor("int64", ids, [1, 256]);
const out = await session.run({ input_ids: tensor });
console.log(out.score.data[0]);  // 0..1

Go

import ort "github.com/yalue/onnxruntime_go"

ort.SetSharedLibraryPath("libonnxruntime.so")
ort.InitializeEnvironment()
defer ort.DestroyEnvironment()

input, _  := ort.NewTensor(ort.NewShape(1, 256), ids /* []int64 */)
output, _ := ort.NewEmptyTensor[float32](ort.NewShape(1))
sess, _   := ort.NewAdvancedSession(
    "webrank.int8.onnx",
    []string{"input_ids"}, []string{"score"},
    []ort.Value{input}, []ort.Value{output}, nil,
)
sess.Run()
fmt.Println(output.GetData()[0])

Performance

Measured on a Ryzen 7 (CPU only, ONNX Runtime 1.20):

Variant  Single-row  Batch-18  Size
FP32     5.9 ms      238 ms    12 MB
INT8     6.6 ms      222 ms    3.2 MB

INT8 is 3.8× smaller with ≤0.024 score drift and identical predictions on every test case. Quantization overhead cancels the matmul savings at 3M params, so single-row latency is roughly equivalent; INT8 wins on size and on batched throughput.

Training data

  • Pretraining: Salesforce/wikitext wikitext-103-raw-v1, ~29k articles, ~110M tokens.
  • Fine-tuning: 30k labeled examples (15k positive / 15k negative).
    • Positives: 7.5k paragraph-level + 7.5k sentence-level extracts from wikitext articles, filtered for prose-like structure.
    • Negatives: synthetically generated boilerplate from 40+ templates (cookie banners, navs, footers, CTAs, JS placeholders, error pages, paywall stubs), with deliberately varied length (40% single template, 30% pair, 20% triple, 10% stack of 4–6).

The mixed-length sampling on both sides is important: without it the model learns to use sequence length as a shortcut.
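The stacked-negative sampling described above can be sketched as follows. The template strings and helper names here are illustrative stand-ins, not the actual training templates:

```python
import random

TEMPLATES = [  # stand-ins for the 40+ real boilerplate templates
    "We use cookies to improve your experience.",
    "Home | About | Contact | Privacy Policy",
    "Subscribe to our newsletter for updates.",
    "Please enable JavaScript to view this page.",
]

def sample_stack_size(rng):
    """40% single, 30% pair, 20% triple, 10% stack of 4-6."""
    r = rng.random()
    if r < 0.40:
        return 1
    if r < 0.70:
        return 2
    if r < 0.90:
        return 3
    return rng.randint(4, 6)

def sample_negative(rng):
    """Stack several templates into one variable-length negative example."""
    n = sample_stack_size(rng)
    return " ".join(rng.choice(TEMPLATES) for _ in range(n))
```

Stacking gives the negatives the same length spread as the positives, which is exactly what blocks the sequence-length shortcut.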

Training procedure

  • Pretraining: masked language modeling (BERT-style 80/10/10 mask), AdamW (lr 3e-4, betas 0.9/0.95, wd 0.01), cosine schedule with 100-step warmup, gradient clipping 1.0, batch size 32, 800 steps total. ~75 minutes on CPU.
  • Fine-tuning: binary classification head with BCE loss, AdamW (lr 5e-5), 3 epochs over 12k training rows, batch size 64. ~38 minutes on CPU.
  • Training framework: PyTorch (vanilla, no HuggingFace transformers for the model itself).
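The BERT-style 80/10/10 masking mentioned above can be sketched in numpy. `MASK_ID` and the ignore index -100 are assumed conventions here, not values taken from the repo:

```python
import numpy as np

VOCAB, MASK_ID, IGNORE = 16384, 4, -100  # MASK_ID is an assumed special-token id

def mlm_mask(ids, rng, p=0.15):
    """Return (inputs, labels). 15% of tokens become prediction targets;
    of those, 80% are replaced by [MASK], 10% by a random token, and 10%
    are left unchanged. Non-target labels are set to the ignore index."""
    ids = np.asarray(ids)
    targets = rng.random(ids.shape) < p
    labels = np.where(targets, ids, IGNORE)
    inputs = ids.copy()
    r = rng.random(ids.shape)
    inputs[targets & (r < 0.8)] = MASK_ID
    random_tok = targets & (r >= 0.8) & (r < 0.9)
    inputs[random_tok] = rng.integers(0, VOCAB, ids.shape)[random_tok]
    return inputs, labels
```

The 10% "keep unchanged" slice is what forces the model to build representations for every position, not just positions showing a [MASK] token.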

Evaluation

On a held-out 3,000-row validation split:

Metric     Value
Accuracy   1.000
Precision  1.000
Recall     0.999
F1         1.000
Loss       0.0074
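The table's metrics follow the standard binary-classification definitions; a minimal numpy sketch, thresholding raw scores at 0.5 as in the intended use:

```python
import numpy as np

def binary_metrics(scores, labels, threshold=0.5):
    """Accuracy / precision / recall / F1 from raw [0, 1] scores."""
    pred = (np.asarray(scores) >= threshold).astype(int)
    labels = np.asarray(labels)
    tp = int(((pred == 1) & (labels == 1)).sum())
    fp = int(((pred == 1) & (labels == 0)).sum())
    fn = int(((pred == 0) & (labels == 1)).sum())
    acc = float((pred == labels).mean())
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return acc, prec, rec, f1
```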

The held-out validation split is trivially separable: synthetic boilerplate vs. wikitext prose is an easy decision boundary. For a more honest read, here are results on 18 hand-written real-world snippets (none from the training distribution):

  • 16 / 18 correct on the binary cutoff.
  • The 2 failures are:
    • 404 - Page not found. The page you are looking for might have been removed... → 0.75 (false positive for content)
    • This article is for subscribers only. Subscribe now to read the full story... → 0.72 (false positive for content)

Both are paywall/error pages styled as natural prose; the synthetic templated negatives never showed the model that prose-shaped boilerplate exists. Closing this gap requires real-world hard-negative mining.

Limitations

  1. English only. The byte-level tokenizer tolerates other scripts but the classifier was never trained on them.
  2. Domain shift. Trained on wikitext-103 (encyclopedic English). Short technical statements like "PostgreSQL uses MVCC for transactions" or casual writing score lower than they should because they don't match wikitext prose style.
  3. Prose-shaped boilerplate. Paywall walls, well-written 404 pages, and "subscribe to read" stubs can confuse it because the synthetic negatives are templated, not naturalistic.
  4. Sequence cap of 256 tokens. Long documents must be chunked by the caller. The intended use is per-paragraph scoring during crawl post-processing, not whole-page classification.
  5. Pretraining cap of 800 steps. Final MLM loss ~7.18 (16K-vocab unigram baseline ≈ 7.2). The classifier still works fine because the binary task is easy enough that the trunk doesn't need a deeply converged language model, but a longer pretraining run would help the borderline cases.

Intended use

Drop into a web crawler / scraper as a post-extraction quality filter. Score each paragraph or block, drop anything below ~0.5, keep the rest. Cheap enough (≈6 ms/paragraph on CPU) to run inline at crawl time.
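The filtering step described above can be sketched like this; `score_fn` stands in for the `score()` function from the Usage section, and the 0.5 cutoff follows the text:

```python
def filter_blocks(blocks, score_fn, threshold=0.5):
    """Keep only the blocks the model considers real content."""
    return [b for b in blocks if score_fn(b) >= threshold]
```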

Not intended as a general-purpose text classifier, content moderator, toxicity detector, or anything else. It does one thing.

Reproducing

The full training pipeline is in the GitHub repo. End-to-end on a Ryzen 7 takes ~115 minutes:

python collect.py        #  1 min   download wikitext, build labels
python tokenizer.py      #  1 min   train 16K BPE
python pretrain.py       # 75 min   MLM pretraining
python finetune.py       # 38 min   binary classification
python export.py         #  2 sec   PyTorch → ONNX FP32
python quantize_onnx.py  #  5 sec   ONNX FP32 → INT8

License

MIT. Do whatever you want with it.

Citation

@misc{webrank2026,
  title  = {WebRank: a 3M-parameter boilerplate classifier for web text},
  author = {Keirolabs},
  year   = {2026},
  url    = {https://huggingface.co/mannybr/Webrank-nano}
}