search-reranker-irishgov-l5-k10-v1

Fast 5-layer bilingual (Irish/English) reranker tuned for the real Irish government serving path: reciprocal rank fusion (RRF) -> reranker over roughly the top 10 candidates.

This release is the first reranker line in this repo that is explicitly optimized, benchmarked, and promoted for the actual deployed K=10 target rather than the earlier K=80 stress case.
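For orientation, the candidate-generation step in front of this reranker can be sketched as below. This is a generic reciprocal rank fusion example, not this repo's retrieval code: the function name `rrf_top_k`, the smoothing constant `k=60`, and the two example hit lists are assumptions made for illustration.

```python
from collections import defaultdict

def rrf_top_k(rankings, top_k=10, k=60):
    """Fuse several best-first lists of doc IDs with reciprocal rank fusion.

    k is the usual RRF smoothing constant; 60 is a common default and is
    an assumption here, not a value taken from this model card.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in fused[:top_k]]

# Hypothetical retriever outputs (for example one lexical, one dense).
lexical_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d9", "d3"]
candidates = rrf_top_k([lexical_hits, dense_hits], top_k=10)
```

The resulting `candidates` list is the short K=10 set that the reranker is meant to reorder.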

Intended use

  • Best use: direct CPU reranking of the RRF top-10 candidate set.
  • Recommended runtime profile: ONNX dynamic q8, max_length=140, threads=10.
  • Secondary use: fast in-domain reranking on short gov search candidate lists where Irish/English query quality matters more than broad web-scale generality.
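A minimal sketch of the recommended runtime profile follows, assuming ONNX Runtime and the Hugging Face `transformers` tokenizer are installed. The helper names `rerank` and `make_onnx_scorer` are hypothetical, the tokenizer is assumed to load from the checkpoint root, and the model is assumed to emit a single relevance logit per query/document pair; none of this is confirmed by the repo itself.

```python
def rerank(query, candidates, score_fn):
    """Return (candidate, score) pairs sorted by descending score."""
    scores = [float(s) for s in score_fn(query, candidates)]
    # sorted() is stable, so equal scores keep their incoming (RRF) order.
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [(candidates[i], scores[i]) for i in order]

def make_onnx_scorer(model_path="onnx/model_quantized.onnx",
                     tokenizer_path="./", max_length=140, threads=10):
    """Build a score_fn backed by ONNX Runtime on CPU (imports are lazy)."""
    import onnxruntime as ort
    from transformers import AutoTokenizer

    opts = ort.SessionOptions()
    opts.intra_op_num_threads = threads
    session = ort.InferenceSession(model_path, sess_options=opts,
                                   providers=["CPUExecutionProvider"])
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
    input_names = {inp.name for inp in session.get_inputs()}

    def score_fn(query, candidates):
        enc = tokenizer([query] * len(candidates), candidates,
                        padding=True, truncation=True,
                        max_length=max_length, return_tensors="np")
        # Feed only the inputs the exported graph actually declares.
        feed = {name: arr.astype("int64")
                for name, arr in enc.items() if name in input_names}
        logits = session.run(None, feed)[0]
        # Assumption: one logit per pair, shape (n_candidates, 1).
        return logits.reshape(len(candidates), -1)[:, 0].tolist()

    return score_fn
```

Typical use would be `rerank(query, candidates, make_onnx_scorer())` on the K=10 candidate list. Keeping the scorer injectable makes it easy to swap in the raw checkpoint or a stub when comparing configurations.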

Why this line exists

The original broad reranker is still strong, but too expensive in the critical path. The previous fast 6-layer line improved latency substantially, but the K=10-specific cycle showed a better tradeoff:

  • keep perfect bilingual office-holder ranking
  • match the broad reranker on the new hard K=10 gov holdout
  • remain close enough on the broader finephrase gov test
  • hit the ~25 RPS CPU target on the bilingual proxy path with the right serving config

Key q8 benchmarks

This model

| Suite | Config (max_length, threads) | Overall MRR@10 | Irish MRR@10 | English MRR@10 | Top-1 | QPS |
|---|---|---|---|---|---|---|
| Office-holder bilingual | 140, threads=8 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 71.66 |
| Bilingual proxy | 140, threads=8 | 0.8872 | 0.8549 | 0.9195 | 0.8100 | 27.38 |
| Finephrase gov test | 140, threads=8 | 0.9762 | n/a | 0.9762 | 0.9524 | 44.73 |
| Hard gov K=10 holdout | 140, threads=8 | 0.9815 | n/a | 0.9815 | 0.9630 | 45.76 |
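As a reading aid for the MRR@10 and Top-1 columns, here is a small sketch of how the two metrics are usually computed, assuming one relevant document per query. The function names and the example ranks are made up. The ranks happen to reproduce a 0.9815 / 0.9630 pair, but the real holdout size and per-query ranks are not stated in this card.

```python
def mrr_at_k(first_relevant_ranks, k=10):
    """Mean reciprocal rank at cutoff k.

    Ranks are 1-based positions of the first relevant document;
    None (or a rank beyond k) contributes zero.
    """
    total = 0.0
    for rank in first_relevant_ranks:
        if rank is not None and rank <= k:
            total += 1.0 / rank
    return total / len(first_relevant_ranks)

def top1_accuracy(first_relevant_ranks):
    """Fraction of queries whose relevant document is ranked first."""
    hits = sum(1 for rank in first_relevant_ranks if rank == 1)
    return hits / len(first_relevant_ranks)

# Hypothetical outcome: 26 queries answered at rank 1, one at rank 2.
ranks = [1] * 26 + [2]
mrr = round(mrr_at_k(ranks), 4)
top1 = round(top1_accuracy(ranks), 4)
```

Because a single miss at rank 2 still earns 0.5, MRR@10 sits above Top-1 whenever the misses land near the top of the list.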

Comparison to the other public reranker lines

| Model | Suite | Config (max_length, threads) | Overall MRR@10 | Top-1 | QPS |
|---|---|---|---|---|---|
| search-reranker-broad-v1 q8 | Office-holder bilingual | 160, threads=8 | 0.8056 | 0.6667 | 7.30 |
| search-reranker-irishgov-l6-fast-v1 q8 | Office-holder bilingual | 128, threads=8 | 1.0000 | 1.0000 | 16.67 |
| search-reranker-irishgov-l5-k10-v1 q8 | Office-holder bilingual | 140, threads=8 | 1.0000 | 1.0000 | 71.66 |
| search-reranker-broad-v1 q8 | Hard gov K=10 holdout | 160, threads=8 | 0.9815 | 0.9630 | 3.01 |
| search-reranker-irishgov-l6-fast-v1 q8 | Hard gov K=10 holdout | 128, threads=8 | 0.9753 | 0.9630 | 6.75 |
| search-reranker-irishgov-l5-k10-v1 q8 | Hard gov K=10 holdout | 140, threads=8 | 0.9815 | 0.9630 | 45.76 |
| search-reranker-broad-v1 q8 | Finephrase gov test | 192, threads=8 | 1.0000 | 1.0000 | 1.04 |
| search-reranker-irishgov-l6-fast-v1 q8 | Finephrase gov test | 192, threads=8 | 1.0000 | 1.0000 | 2.33 |
| search-reranker-irishgov-l5-k10-v1 q8 | Finephrase gov test | 140, threads=8 | 0.9762 | 0.9524 | 44.73 |

Runtime note

Best measured proxy runtime for this release:

  • q8 ONNX
  • max_length=140
  • threads=10
  • proxy throughput = 28.72 qps
  • average query latency = 34.80 ms
  • p95 query latency = 38.38 ms

This is the configuration that cleared the ~25 RPS target on the current bilingual proxy benchmark.
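The relationship between the latency and throughput figures above can be illustrated with a small summary helper. This is a sketch, not the benchmark harness used for this release: `summarize_latencies` is a hypothetical name, the p95 uses the nearest-rank definition, and throughput is derived as 1000 / average latency, which only holds when one query is in flight at a time. The reported numbers are roughly consistent with that reading (1000 / 34.80 ms is about 28.7 qps).

```python
def summarize_latencies(latencies_ms):
    """Summarize per-query wall-clock latencies (milliseconds)."""
    n = len(latencies_ms)
    ordered = sorted(latencies_ms)
    avg_ms = sum(ordered) / n
    # Nearest-rank p95: ceil(0.95 * n) computed in integers to avoid
    # floating-point surprises at exact boundaries.
    p95_rank = (95 * n + 99) // 100
    p95_ms = ordered[p95_rank - 1]
    # Single-stream throughput: valid only without concurrent queries.
    qps = 1000.0 / avg_ms
    return {"avg_ms": avg_ms, "p95_ms": p95_ms, "qps": qps}

# Hypothetical measurements: 19 fast queries and one slow outlier.
stats = summarize_latencies([30.0] * 19 + [50.0])
```

Reporting p95 alongside the average matters for a critical-path reranker, since a few slow queries can hide behind a good mean.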

Artifacts

  • Raw HF checkpoint: ./
  • ONNX dynamic q8 artifact: onnx/model_quantized.onnx

Training provenance

  • Base lineage: cross-encoder/mmarco-mMiniLMv2-L12-H384-v1
  • Immediate parent: reranker_mmarco_mminilm_l5_irishgov_bd35a
  • Training mix: pairmix_k10_stage5_v1
    • bilingual Irish/English office-holder supervision
    • teacher-scored finephrase gov data
    • harder teacher-selected K=10 gov negatives
  • Training metadata: training_info.json

Caveats

  • This release is tuned for K=10 serving and should be evaluated that way.
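  • On the finephrase gov test, this model is still slightly below the broad and L6 lines on pure accuracy (0.9762 vs 1.0000 MRR@10 in the q8 comparison above, which runs those lines at max_length=192), but the CPU speed gain (44.73 vs 1.04 and 2.33 QPS) is large enough to matter in the critical path.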
  • max_length=128 is not the right deployment length for this checkpoint. Use 140.

Portfolio comparison

Updated 2026-03-20 from local reranker reports only.

Use this section as the side-by-side public temsa reranker view. Cells are intentionally left out or summarized when the local report set does not contain a trustworthy non-quantized benchmark for the same public path.

General bilingual rerankers

| Repo | Primary role | Non-quantized path | Quantized path | Extra trustworthy signals |
|---|---|---|---|---|
| temsa/search-reranker-broad-v1 | Broad final-stage reranker | ONNX fp32 l256: proxy 0.9490 / 2.08 qps | Sibling q8 l160: proxy 0.9458 / 8.41 qps | office 0.8056; hard-k10 0.9815; holdout-a03 0.8759 |
| temsa/search-reranker-broad-v1-qint8 | Broad CPU q8 sibling | See temsa/search-reranker-broad-v1 | ONNX q8 l160: proxy 0.9458 / 8.41 qps | office 0.8056; hard-k10 0.9815; holdout-a03 0.8759 |
| temsa/search-reranker-irishgov-l6-fast-v1 | Fast stage-1 / cascade prefilter | PT l192: proxy 0.9192 / 7.99 qps | ONNX q8 l128: proxy 0.9218 / 27.25 qps | office 1.0000; holdout-a03 0.9259 |
| temsa/search-reranker-irishgov-l6-fast-v2 | Fast K=10 policy serving release | Same raw checkpoint family as v1; q8 with corrected temporal + office policy is the recommended path | ONNX q8 l160/t10: corrected holdout-v5 1.0000 / 47.68 qps | office-holdout-v3 1.0000; office-valid 1.0000; legacy-holdout-v4 0.9375 |
| temsa/search-reranker-irishgov-l6-fast-v3 | Fastest public K=10 serving profile | Same raw checkpoint family as v2; the value is the shorter recommended serving length on corrected gates | ONNX q8 l128/t10: corrected holdout-v5 1.0000 / 48.27 qps | office-holdout-v3 1.0000; office-valid 1.0000; legacy-holdout-v4 0.9444 |
| temsa/search-reranker-irishgov-l6-fast-v4 | Current best fast K=10 reranker | Margin-MSE style continuation over v3; q8 per-channel is the recommended deployed artifact | ONNX q8 per-channel l128/t10: corrected holdout-v5 1.0000 / 50.78 qps | office-holdout-v3 1.0000; office-valid 1.0000; legacy-holdout-v4 0.9444 |
| temsa/search-reranker-irishgov-l6-fast-v5 | Fast K=10 serving release with shorter broad-policy route | Same raw checkpoint family as v4; the value is a shorter broad-policy serving profile while keeping the office route unchanged | ONNX q8 per-channel gov_broad_v1 l104/t10: corrected holdout-v5 1.0000 / 55.77 qps | fresh-holdout-v6 0.8917; office-holdout-v3 1.0000; office-valid 1.0000 |
| temsa/search-reranker-irishgov-l5-k10-v1 | Fast K=10 direct reranker | PT l160: proxy 0.8800 / 1.32 qps | ONNX q8 l140: proxy 0.8872 / 28.72 qps | office 1.0000; hard-k10 0.9815; holdout-a03 0.9630 |
| temsa/search-reranker-irishgov-l5-k10-v2 | Current K=10 successor | PT l128: office 1.0000, finephrase 0.9405 | ONNX q8 l140: proxy 0.8853 / 28.27 qps | office 1.0000; holdout-a03 1.0000 |

Policy rerankers

| Repo | Primary role | Non-quantized path | Quantized path | Extra trustworthy signals |
|---|---|---|---|---|
| temsa/search-reranker-broad-policy-v1 | Broad policy-tuned reranker | PT l224: policy-all 0.9270 / 7.94 qps | ONNX q8 l224: policy-all 0.9205 / 27.55 qps | office 0.9537; holdout-a04 0.9049 |
| temsa/search-reranker-broad-policy-v3 | Current broad policy successor | ONNX fp32 l224: policy-all 0.9259 / 15.75 qps | ONNX q8 reduce_range l224: policy-all 0.9268 / 30.12 qps | office 0.9676; holdout-v3 0.9286 |
| temsa/search-reranker-broad-policy-v4 | Current broad policy serving release | Same raw checkpoint family as v3; q8 gov_broad_v1 is the recommended path | ONNX q8 reduce_range + gov_broad_v1 l224: policy-all 0.9257 / 31.64 qps | office 0.9676; holdout-v3 0.9271; holdout-v4 0.9583 |
| temsa/search-reranker-broad-policy-v5 | Current broad policy serving release | Same raw checkpoint family as v4; q8 gov_broad_v1 is the recommended path | ONNX q8 reduce_range + gov_broad_v1 l224/t10: policy-all 0.9711 / 26.12 qps | office 1.0000; holdout-v3 1.0000; holdout-v4 1.0000 |
| temsa/search-reranker-broad-policy-v6 | Broad policy serving release with corrected temporal gate | Same raw checkpoint family as v5; q8 gov_broad_v1 + office_holder_v1 is the recommended path | ONNX q8 gov_broad_v1 l176/t10: corrected holdout-v5 0.8657 / 25.74 qps | office-l160 1.0000; legacy-holdout-v4 0.9444; legacy-policy-all 0.9167 |
| temsa/search-reranker-broad-policy-v7 | Broad policy serving release with stronger latest-turn and topic-specific news routing | Same raw checkpoint family as v6; q8 gov_broad_v1 + office_holder_v1 is the recommended path | ONNX q8 gov_broad_v1 l176/t10: fresh holdout-v6 0.9375 / 15.65 qps | office-l224 1.0000; office-holdout-v4 0.9659; corrected-holdout-v5 1.0000; legacy-policy-all 0.9196 |
| temsa/search-reranker-broad-policy-v8 | Broad policy serving-profile release with shorter broad and office routes | Same raw checkpoint family as v7; q8 gov_broad_v1 + office_holder_v1 is the recommended path | ONNX q8 gov_broad_v1 l160/t10: fresh holdout-v7 0.9500 / 29.85 qps | office-l208 1.0000; office-holdout-v3 1.0000; office-holdout-v4 0.9659; fresh-holdout-v6 0.9500 |
| temsa/search-reranker-broad-policy-v9 | Broad policy serving release with stronger year-specific change-history routing | Same raw checkpoint family as v8; q8 gov_broad_v1 + office_holder_v1 is the recommended path | ONNX q8 gov_broad_v1 l160/t10: gov.ie category-policy corrected 1.0000 / 26.01 qps | legacy-policy-all 0.9282; fresh-holdout-v6 0.9500; office-valid 1.0000; office-holdout-v4 0.9659 |

Dynamic q8 CPU at K=20

These rows use the current local K=20 benchmark harness:

  • proxy_k20: policy=none
  • broad_policy_k20: policy=gov_broad_v1
  • office_holder_k20: policy=office_holder_v1
  • fixed threads=10, batch_size=32

| Repo | Broad len | Office len | Proxy K=20 | Broad-policy K=20 | Office-holder K=20 | Notes |
|---|---|---|---|---|---|---|
| temsa/search-reranker-broad-v1 | 160 | 160 | 0.9458 / 6.93 qps | 0.9437 / 9.10 qps | 0.6073 / 8.23 qps | broad baseline |
| temsa/search-reranker-broad-v1-qint8 | 160 | 160 | 0.9458 / 6.65 qps | 0.9437 / 6.62 qps | 0.6073 / 5.91 qps | q8 sibling repo |
| temsa/search-reranker-broad-policy-v1 | 224 | 224 | 0.9414 / 7.20 qps | 0.9125 / 11.32 qps | 0.6966 / 6.49 qps | first policy model |
| temsa/search-reranker-broad-policy-v2 | 224 | 224 | 0.9414 / 6.44 qps | 0.9125 / 9.40 qps | 0.6966 / 7.75 qps | office-policy serving fix era |
| temsa/search-reranker-broad-policy-v3 | 224 | 224 | 0.9364 / 7.38 qps | 0.9250 / 12.20 qps | 0.7451 / 7.15 qps | custom CPU continuation |
| temsa/search-reranker-broad-policy-v4 | 224 | 224 | 0.9364 / 7.16 qps | 0.9250 / 11.40 qps | 0.7451 / 6.90 qps | gov_broad_v1 serving route |
| temsa/search-reranker-broad-policy-v5 | 224 | 224 | 0.9364 / 7.00 qps | 0.9250 / 12.35 qps | 0.7451 / 6.47 qps | stronger serving release |
| temsa/search-reranker-broad-policy-v6 | 176 | 160 | 0.9433 / 8.09 qps | 0.9250 / 9.97 qps | 0.7186 / 9.77 qps | corrected temporal gate |
| temsa/search-reranker-broad-policy-v7 | 176 | 224 | 0.9433 / 7.91 qps | 0.9250 / 12.11 qps | 0.7451 / 7.49 qps | stronger latest-turn routing |
| temsa/search-reranker-broad-policy-v8 | 160 | 208 | 0.9371 / 9.49 qps | 0.9250 / 11.21 qps | 0.7527 / 7.01 qps | current broad fallback |
| temsa/search-reranker-broad-policy-v9 | 160 | 208 | 0.9371 / 8.97 qps | 0.9250 / 11.56 qps | 0.7527 / 6.62 qps | current broad fallback + change-history fix |
| temsa/search-reranker-irishgov-l5-k10-v1 | 140 | 140 | 0.8872 / 15.56 qps | 0.8938 / 21.68 qps | 0.6751 / 21.01 qps | fast K=10 direct |
| temsa/search-reranker-irishgov-l5-k10-v2 | 140 | 140 | 0.8853 / 18.06 qps | 0.8854 / 18.87 qps | 0.6327 / 17.93 qps | fast K=10 successor |
| temsa/search-reranker-irishgov-l6-fast-v1 | 128 | 128 | 0.9286 / 18.70 qps | 0.9042 / 17.65 qps | 0.6934 / 20.36 qps | fast stage-1 v1 |
| temsa/search-reranker-irishgov-l6-fast-v2 | 128 | 128 | 0.9286 / 18.59 qps | 0.9042 / 19.67 qps | 0.6934 / 19.30 qps | fast stage-1 v2 |
| temsa/search-reranker-irishgov-l6-fast-v3 | 128 | 128 | 0.9286 / 18.48 qps | 0.9042 / 18.60 qps | 0.6934 / 19.30 qps | shorter serving profile |
| temsa/search-reranker-irishgov-l6-fast-v4 | 128 | 128 | 0.9184 / 17.66 qps | 0.9008 / 21.83 qps | 0.7197 / 17.31 qps | margin-MSE per-channel |
| temsa/search-reranker-irishgov-l6-fast-v5 | 104 | 128 | 0.8987 / 20.85 qps | 0.9133 / 20.29 qps | 0.7197 / 15.45 qps | current fast route |

Intentional gaps:

  • search-reranker-broad-v1: the local reports include strong fp32 ONNX proxy and office data, but not a matching fp32 in-domain finephrase / hard-K10 / holdout-A03 set, so those are not claimed here.
  • search-reranker-irishgov-l5-k10-v2: the local non-quantized reports are trustworthy for office / finephrase / hard-K10, but not for the same proxy runtime shape as the shipped q8 path.
  • search-reranker-irishgov-l6-fast-v1: the local non-quantized reports cover proxy and office, but not the fresh holdout-A03 slice used for the q8 K=10 comparison.