language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - cross-encoder
  - reranker
  - generated_from_trainer
  - dataset_size:143393475
  - loss:MSELoss
base_model: jhu-clsp/ettin-encoder-32m
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
  - map
  - mrr@10
  - ndcg@10
model-index:
  - name: ettin-reranker-32m-v1
    results:
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoMSMARCO R100
          type: NanoMSMARCO_R100
        metrics:
          - type: map
            value: 0.6366
            name: Map
          - type: mrr@10
            value: 0.6537
            name: Mrr@10
          - type: ndcg@10
            value: 0.7111
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNFCorpus R100
          type: NanoNFCorpus_R100
        metrics:
          - type: map
            value: 0.3534
            name: Map
          - type: mrr@10
            value: 0.5453
            name: Mrr@10
          - type: ndcg@10
            value: 0.3777
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNQ R100
          type: NanoNQ_R100
        metrics:
          - type: map
            value: 0.7338
            name: Map
          - type: mrr@10
            value: 0.7672
            name: Mrr@10
          - type: ndcg@10
            value: 0.7873
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoFiQA2018 R100
          type: NanoFiQA2018_R100
        metrics:
          - type: map
            value: 0.4774
            name: Map
          - type: mrr@10
            value: 0.604
            name: Mrr@10
          - type: ndcg@10
            value: 0.5401
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoTouche2020 R100
          type: NanoTouche2020_R100
        metrics:
          - type: map
            value: 0.4899
            name: Map
          - type: mrr@10
            value: 0.7815
            name: Mrr@10
          - type: ndcg@10
            value: 0.5663
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoSciFact R100
          type: NanoSciFact_R100
        metrics:
          - type: map
            value: 0.7029
            name: Map
          - type: mrr@10
            value: 0.7147
            name: Mrr@10
          - type: ndcg@10
            value: 0.7433
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoHotpotQA R100
          type: NanoHotpotQA_R100
        metrics:
          - type: map
            value: 0.9193
            name: Map
          - type: mrr@10
            value: 0.98
            name: Mrr@10
          - type: ndcg@10
            value: 0.9501
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoArguAna R100
          type: NanoArguAna_R100
        metrics:
          - type: map
            value: 0.5671
            name: Map
          - type: mrr@10
            value: 0.5932
            name: Mrr@10
          - type: ndcg@10
            value: 0.6787
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoFEVER R100
          type: NanoFEVER_R100
        metrics:
          - type: map
            value: 0.9325
            name: Map
          - type: mrr@10
            value: 0.9567
            name: Mrr@10
          - type: ndcg@10
            value: 0.9512
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoDBPedia R100
          type: NanoDBPedia_R100
        metrics:
          - type: map
            value: 0.6413
            name: Map
          - type: mrr@10
            value: 0.8847
            name: Mrr@10
          - type: ndcg@10
            value: 0.7178
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoClimateFEVER R100
          type: NanoClimateFEVER_R100
        metrics:
          - type: map
            value: 0.4464
            name: Map
          - type: mrr@10
            value: 0.696
            name: Mrr@10
          - type: ndcg@10
            value: 0.5251
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoSCIDOCS R100
          type: NanoSCIDOCS_R100
        metrics:
          - type: map
            value: 0.2919
            name: Map
          - type: mrr@10
            value: 0.5483
            name: Mrr@10
          - type: ndcg@10
            value: 0.3569
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoQuoraRetrieval R100
          type: NanoQuoraRetrieval_R100
        metrics:
          - type: map
            value: 0.9297
            name: Map
          - type: mrr@10
            value: 0.9617
            name: Mrr@10
          - type: ndcg@10
            value: 0.9538
            name: Ndcg@10
      - task:
          type: cross-encoder-nano-beir
          name: Cross Encoder Nano BEIR
        dataset:
          name: NanoBEIR R100 mean
          type: NanoBEIR_R100_mean
        metrics:
          - type: map
            value: 0.6248
            name: Map
          - type: mrr@10
            value: 0.7452
            name: Mrr@10
          - type: ndcg@10
            value: 0.6815
            name: Ndcg@10

ettin-reranker-32m-v1

This is a Cross Encoder model finetuned from jhu-clsp/ettin-encoder-32m on the cross-encoder/ettin-reranker-v1-data dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

See the release blogpost for details on the training recipe, evaluation results, and speed benchmarks against other public rerankers. The Evaluation section below also has the headline numbers.

Model Details

Model Description

Model Type: Cross Encoder
Base model: jhu-clsp/ettin-encoder-32m
Maximum Sequence Length: 7999 tokens
Number of Output Labels: 1 label
Supported Modality: Text
Training Dataset: cross-encoder/ettin-reranker-v1-data
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Full Model Architecture

CrossEncoder(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
  (1): Pooling({'embedding_dimension': 256, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Dense({'in_features': 256, 'out_features': 256, 'bias': False, 'activation_function': 'torch.nn.modules.activation.GELU', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (3): LayerNorm({'dimension': 256})
  (4): Dense({'in_features': 256, 'out_features': 1, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'scores'})
)
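To make the module list above concrete, here is a minimal pure-Python sketch of steps (1) to (4): CLS pooling, a bias-free dense layer with GELU, layer normalization, and a final dense layer that emits one score. It uses a toy dimension of 4 instead of 256 and random made-up weights. The helper names (`gelu`, `linear`, `layer_norm`, `score_head`) are illustrative only, and the learnable LayerNorm scale and shift are left out for brevity.

```python
import math
import random


def gelu(x):
    # Exact (erf-based) GELU.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias=None):
    # weight is laid out as [out_features][in_features].
    out = [sum(w * v for w, v in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def layer_norm(x, eps=1e-5):
    # Normalize to zero mean and unit variance (no learnable scale/shift here).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def score_head(token_embeddings, w_dense, w_out, b_out):
    cls = token_embeddings[0]                         # (1) CLS pooling: first token
    hidden = [gelu(v) for v in linear(cls, w_dense)]  # (2) Dense without bias + GELU
    hidden = layer_norm(hidden)                       # (3) LayerNorm
    return linear(hidden, w_out, b_out)[0]            # (4) Dense to a single score


# Toy inputs: 3 tokens of dimension 4, random weights (not the model's weights).
rng = random.Random(0)
dim = 4
token_embeddings = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
w_dense = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
w_out = [[rng.uniform(-1, 1) for _ in range(dim)]]
b_out = [0.1]

score = score_head(token_embeddings, w_dense, w_out, b_out)
print(score)
```

The final layer uses an identity activation, so the score is an unbounded real number rather than a probability.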

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder(
    "cross-encoder/ettin-reranker-32m-v1",
    model_kwargs={"dtype": "bfloat16", "attn_implementation": "flash_attention_2"},  # Optional: pip install kernels
)

# Get scores for pairs of inputs
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)
# [ 6.21875 10.8125   8.5625   9.875  ]

# Or rank passages by relevance to a single query
ranked = model.rank(query, passages)
print(ranked)
# [{'corpus_id': 1, 'score': np.float32(10.8125)}, ...]
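Conceptually, ranking amounts to sorting the passages by their predicted score. The sketch below reproduces that ordering in plain Python from the scores printed above. `rank_by_score` is a hypothetical helper written for illustration, not part of the library.

```python
def rank_by_score(scores):
    # Sort passage indices by score, highest first, and keep the score alongside.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]


# Scores copied from the example output above.
scores = [6.21875, 10.8125, 8.5625, 9.875]
ranked = rank_by_score(scores)
print([entry["corpus_id"] for entry in ranked])
# [1, 3, 2, 0]
```

The Mars passage (index 1) comes first, matching the first entry returned by `model.rank`.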

Evaluation

MTEB(eng, v2) Retrieval

Each model in the ettin-reranker-v1 family was evaluated on the full MTEB(eng, v2) Retrieval benchmark (10 tasks, top-100 reranked) using MTEB's two-stage reranking flow, pairing each reranker with six embedding models that span the speed/quality spectrum. The retriever-only baseline, drawn as a dashed line in each chart, is the headline number to beat: a reranker that scores below it actively hurts the pipeline on average.

Full table of results

Mean NDCG@10 over the 6 embedder pairings, sorted by score in descending order. The released ettin-reranker-v1 family consists of the six `cross-encoder/ettin-reranker-*-v1` rows, and `mixedbread-ai/mxbai-rerank-large-v2` is the teacher model.

Reranker	Params	MTEB(eng, v2) Retrieval NDCG@10
`Qwen/Qwen3-Reranker-4B`^†	4.02B	0.6367
`mixedbread-ai/mxbai-rerank-large-v2`	1.54B	0.6115
`cross-encoder/ettin-reranker-1b-v1`	1.00B	0.6114
`cross-encoder/ettin-reranker-400m-v1`	401M	0.6091
`cross-encoder/ettin-reranker-150m-v1`	151M	0.5994
`Qwen/Qwen3-Reranker-0.6B`	596M	0.5940
`mixedbread-ai/mxbai-rerank-base-v2`	494M	0.5920
`cross-encoder/ettin-reranker-68m-v1`	68.6M	0.5915
`jinaai/jina-reranker-m0`	2.44B	0.5856
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	0.5843
`cross-encoder/ettin-reranker-32m-v1`	32.8M	0.5779
`ibm-granite/granite-embedding-reranker-english-r2`	150M	0.5656
`cross-encoder/ettin-reranker-17m-v1`	17.6M	0.5576
`BAAI/bge-reranker-v2-m3`	568M	0.5526
`zeroentropy/zerank-2-reranker`^†	4.02B	0.5300
`BAAI/bge-reranker-large`	560M	0.5098
`cross-encoder/ms-marco-MiniLM-L6-v2`	22.7M	0.5082
`cross-encoder/ms-marco-MiniLM-L12-v2`	33.4M	0.5066
`mixedbread-ai/mxbai-rerank-large-v1`	435M	0.5063
`cross-encoder/ms-marco-MiniLM-L4-v2`	19.2M	0.4979
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70.8M	0.4968
`BAAI/bge-reranker-base`	278M	0.4890
`mixedbread-ai/mxbai-rerank-base-v1`	184M	0.4865

^† Capped to max_seq_length=8192 (the 4B Qwen3-based rerankers don't fit on a single H100 80GB at native context). Native-context evaluation is likely higher.

See the release blogpost for the full analysis and per-model commentary.

Speed

All six released models were benchmarked against thirteen public rerankers on three hardware tiers, using sentence-transformers/natural-questions at max_length=512 with each model's best supported attention implementation. The full sweep over fp32+SDPA, bf16+SDPA, padded bf16+FA2, and unpadded bf16+FA2 (showing why the ettin-reranker-v1 family is faster than other ModernBERT-based rerankers) is in the release blogpost. This table shows the throughput in pairs per second on an NVIDIA H100 80GB, all in bfloat16:

Model	Params	Attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	FA2	7517
`cross-encoder/ettin-reranker-32m-v1`	32M	FA2	6602
`cross-encoder/ettin-reranker-68m-v1`	68M	FA2	4913
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	FA2	4029
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	FA2	3817
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	FA2	3311
`cross-encoder/ettin-reranker-150m-v1`	150M	FA2	3237
`BAAI/bge-reranker-base`	278M	FA2	2858
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	2636
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	1953
`cross-encoder/ettin-reranker-400m-v1`	400M	FA2	1738
`BAAI/bge-reranker-large`	560M	FA2	1659
`BAAI/bge-reranker-v2-m3`	568M	FA2	1569
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	FA2	1418
`ibm-granite/granite-embedding-reranker-english-r2`	150M	FA2	1404
`cross-encoder/ettin-reranker-1b-v1`	1B	FA2	928
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	867
`mixedbread-ai/mxbai-rerank-base-v2`	494M	FA2	809
`mixedbread-ai/mxbai-rerank-large-v2`	1.5B	FA2	387

Same benchmark on a consumer GPU (RTX 3090, 24 GB)

Model	Params	Best attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	FA2	9008
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	FA2	5071
`cross-encoder/ettin-reranker-32m-v1`	32M	FA2	4497
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	FA2	4234
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	FA2	2847
`cross-encoder/ettin-reranker-68m-v1`	68M	FA2	1916
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	1677
`BAAI/bge-reranker-base`	278M	FA2	1329
`cross-encoder/ettin-reranker-150m-v1`	150M	FA2	982
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	772
`ibm-granite/granite-embedding-reranker-english-r2`	150M	FA2	598
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	FA2	586
`BAAI/bge-reranker-large`	560M	FA2	448
`BAAI/bge-reranker-v2-m3`	568M	FA2	436
`cross-encoder/ettin-reranker-400m-v1`	400M	FA2	429
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	266
`mixedbread-ai/mxbai-rerank-base-v2`	494M	FA2	221
`cross-encoder/ettin-reranker-1b-v1`	1B	FA2	189
`mixedbread-ai/mxbai-rerank-large-v2`	1.5B	FA2	69

Same benchmark on CPU (Intel Core i7-13700K)

Model	Params	Best attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	SDPA	267.4
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	SDPA	206.2
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	SDPA	143.9
`cross-encoder/ettin-reranker-32m-v1`	32M	SDPA	92.5
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	SDPA	75.9
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	38.9
`cross-encoder/ettin-reranker-68m-v1`	68M	SDPA	31.2
`BAAI/bge-reranker-base`	278M	SDPA	19.2
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	SDPA	14.7
`ibm-granite/granite-embedding-reranker-english-r2`	150M	SDPA	14.5
`cross-encoder/ettin-reranker-150m-v1`	150M	SDPA	14.0
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	13.4
`BAAI/bge-reranker-large`	560M	SDPA	6.2
`BAAI/bge-reranker-v2-m3`	568M	SDPA	6.0
`cross-encoder/ettin-reranker-400m-v1`	400M	SDPA	5.2
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	4.3
`mixedbread-ai/mxbai-rerank-base-v2`	494M	SDPA	3.5
`cross-encoder/ettin-reranker-1b-v1`	1B	SDPA	2.1
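Throughput converts directly into an approximate per-query reranking latency. The sketch below does that arithmetic for this model (ettin-reranker-32m-v1) with the numbers from the three tables above, assuming 100 candidates per query and that throughput stays constant for such a small batch, which is an idealization. `rerank_latency_ms` is a hypothetical helper.

```python
def rerank_latency_ms(pairs_per_second, num_candidates=100):
    # Time to score `num_candidates` pairs at a steady throughput, in milliseconds.
    return 1000.0 * num_candidates / pairs_per_second


# Throughput of cross-encoder/ettin-reranker-32m-v1 from the tables above.
throughput_32m = {
    "H100 80GB": 6602,
    "RTX 3090": 4497,
    "Core i7-13700K (CPU)": 92.5,
}
for hardware, pairs_per_second in throughput_32m.items():
    latency = rerank_latency_ms(pairs_per_second)
    print(f"{hardware}: {latency:.1f} ms per 100 candidates")
```

Under these assumptions, reranking 100 candidates takes roughly 15 ms on the H100, 22 ms on the RTX 3090, and a little over a second on the CPU.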

Metrics

Cross Encoder Reranking

Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100, NanoFiQA2018_R100, NanoTouche2020_R100, NanoSciFact_R100, NanoHotpotQA_R100, NanoArguAna_R100, NanoFEVER_R100, NanoDBPedia_R100, NanoClimateFEVER_R100, NanoSCIDOCS_R100 and NanoQuoraRetrieval_R100

Evaluated with CrossEncoderRerankingEvaluator with these parameters:

{
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	NanoMSMARCO_R100	NanoNFCorpus_R100	NanoNQ_R100	NanoFiQA2018_R100	NanoTouche2020_R100	NanoSciFact_R100	NanoHotpotQA_R100	NanoArguAna_R100	NanoFEVER_R100	NanoDBPedia_R100	NanoClimateFEVER_R100	NanoSCIDOCS_R100	NanoQuoraRetrieval_R100
map	0.6366 (+0.1470)	0.3534 (+0.0924)	0.7338 (+0.3142)	0.4774 (+0.1124)	0.4899 (-0.0600)	0.7029 (+0.0332)	0.9193 (+0.1510)	0.5671 (+0.1564)	0.9325 (+0.1606)	0.6413 (+0.1295)	0.4464 (+0.2061)	0.2919 (+0.0176)	0.9297 (+0.0988)
mrr@10	0.6537 (+0.1762)	0.5453 (+0.0454)	0.7672 (+0.3406)	0.6040 (+0.1132)	0.7815 (-0.1257)	0.7147 (+0.0366)	0.9800 (+0.0571)	0.5932 (+0.2001)	0.9567 (+0.1766)	0.8847 (+0.0840)	0.6960 (+0.2922)	0.5483 (-0.0112)	0.9617 (+0.0935)
ndcg@10	0.7111 (+0.1707)	0.3777 (+0.0526)	0.7873 (+0.2866)	0.5401 (+0.1027)	0.5663 (-0.1275)	0.7433 (+0.0334)	0.9501 (+0.1224)	0.6787 (+0.1898)	0.9512 (+0.1418)	0.7178 (+0.1034)	0.5251 (+0.2074)	0.3569 (+0.0218)	0.9538 (+0.0852)
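For reference, the three metrics in this table can be computed per query from a ranked list of relevance labels. The sketch below uses the standard textbook definitions with binary relevance. It illustrates the metrics and is not the evaluator's own code, whose handling of edge cases may differ. The function names are hypothetical.

```python
import math


def mrr_at_k(relevance, k=10):
    # Reciprocal rank of the first relevant document within the top k.
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    # DCG of the given ranking divided by the DCG of the ideal ordering.
    def dcg(labels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(labels, start=1))

    ideal = sorted(relevance, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(relevance[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(relevance):
    # Mean of precision@rank over the ranks that hold a relevant document.
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# One query: relevant documents sit at ranks 2 and 4 after reranking.
relevance = [0, 1, 0, 1, 0]
print(mrr_at_k(relevance), average_precision(relevance), round(ndcg_at_k(relevance), 4))
```

The reported `map`, `mrr@10`, and `ndcg@10` values are averages of such per-query scores over each dataset. This sketch assumes every relevant document appears in the ranked list.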

Cross Encoder Nano BEIR

Dataset: NanoBEIR_R100_mean

Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq",
        "fiqa2018",
        "touche2020",
        "scifact",
        "hotpotqa",
        "arguana",
        "fever",
        "dbpedia",
        "climatefever",
        "scidocs",
        "quoraretrieval"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en",
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}
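As an illustration of how `rerank_k` and `always_rerank_positives` plausibly interact, the sketch below builds the candidate list handed to the reranker for one query. This is a reading of the parameter names, not the evaluator's actual code. `build_candidates` and the document IDs are hypothetical.

```python
def build_candidates(first_stage_ranking, positives, rerank_k=100, always_rerank_positives=True):
    # Keep the top `rerank_k` documents from the first-stage retriever.
    candidates = list(first_stage_ranking[:rerank_k])
    if always_rerank_positives:
        # Assumed behavior: positives missed by the retriever are still reranked.
        seen = set(candidates)
        candidates += [doc for doc in positives if doc not in seen]
    return candidates


first_stage = ["d1", "d2", "d3", "d4"]
positives = ["d2", "d9"]
print(build_candidates(first_stage, positives, rerank_k=3))
print(build_candidates(first_stage, positives, rerank_k=3, always_rerank_positives=False))
```

If this reading is right, setting the flag to true measures the reranker more or less independently of first-stage recall, while false gives a stricter end-to-end picture.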

Metric	Value
map	0.6248 (+0.1199)
mrr@10	0.7452 (+0.1137)
ndcg@10	0.6815 (+0.1069)
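The mean row can be checked against the per-dataset table: the sketch below parses the thirteen `ndcg@10` cells and averages both the scores and the parenthesized deltas. It assumes the mean is an unweighted average over datasets, which the arithmetic supports. The deltas presumably give the change relative to the ranking before reranking, though the card does not say so. `parse_cell` is a hypothetical helper.

```python
import re

CELL = re.compile(r"(-?\d+\.\d+) \(([+-]\d+\.\d+)\)")


def parse_cell(cell):
    # "0.7111 (+0.1707)" -> (0.7111, 0.1707)
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return float(match.group(1)), float(match.group(2))


# ndcg@10 cells copied from the per-dataset table above, in column order.
ndcg_cells = [
    "0.7111 (+0.1707)", "0.3777 (+0.0526)", "0.7873 (+0.2866)", "0.5401 (+0.1027)",
    "0.5663 (-0.1275)", "0.7433 (+0.0334)", "0.9501 (+0.1224)", "0.6787 (+0.1898)",
    "0.9512 (+0.1418)", "0.7178 (+0.1034)", "0.5251 (+0.2074)", "0.3569 (+0.0218)",
    "0.9538 (+0.0852)",
]
values, deltas = zip(*(parse_cell(cell) for cell in ndcg_cells))
mean_value = sum(values) / len(values)
mean_delta = sum(deltas) / len(deltas)
print(f"{mean_value:.4f} ({mean_delta:+.4f})")
# 0.6815 (+0.1069)
```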

The release blogpost quotes a slightly higher NanoBEIR mean NDCG@10 of 0.6825 for this model, computed in fp32 rather than the bfloat16 used by the training-time evaluation above. Both numbers are valid.

Training Details

Training Dataset

ettin-reranker-v1-data

Dataset: cross-encoder/ettin-reranker-v1-data
Size: 143,393,475 training samples
Columns: query, document, and label

Approximate statistics based on the first 1000 samples:

	query	document	label
type	string	string	float
details	min: 26 characters mean: 55.52 characters max: 249 characters	min: 63 characters mean: 659.91 characters max: 3975 characters	min: -2.94 mean: 8.51 max: 13.88

Samples:

query	document	label
`Help me with my Reborn performance`	I was reading the comment section for Dotacinema's world of dota video, and a bunch of people were complaining how there were a lot of bugs and some talked about PERFORMANCE ISSUES. But there were also people saying that reborn has actually IMPROVED their gameplay? I am one of those people who is running into performance issues and would desperately like to know how some are getting BETTER performance while others like me are getting worse. I'm not complaining about bugs, I'm complaing about framerate, I use to get 60 fps solid in source 1 but I now have 40 or at worst 30 fps in source 2. I have an i3 processor/gtx560ti/16gb RAM i dont think it's a potato pc, so I dont know what's happening, I cleaned my computer recently so dust isnt affecting anything in anyway. So if you gained or had IMPROVED performance in source 2 please list the settings you are enabling, so I can see where I am at fault. (v sync is off btw) TLDR: Have bad performance now from source 2, if you have good p...	`9.5`
`Really wanna try out the game and expansion, ~$60 is hefty. Likelihood of sales?`	`As per title, steam sells the game and its expansions for $60 total. Heavy price to drop. Are there sales on any other website? This game looks fantastic to immerse in otherwise and I'm pleased that this subreddit has at least some attention to help out new folks!`	`9.25`
`Your Avatar. [MGSV Spoilers]`	`Was anyone else suprised he actually replaces the snake model in some cutscenes. I've only tried the first Quiet cutscenes, i was just amazed I haven't seen anybody else say this yet. Sorry if repost.`	`5.25`

Loss: MSELoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity"
}
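With an identity activation, the loss reduces to the mean squared error between the model's raw scores and the float labels. A minimal sketch follows, using the three labels from the samples above and made-up student scores. `mse_loss` is a hypothetical helper that mirrors mean-reduced MSE, not the library's implementation.

```python
def mse_loss(predictions, labels):
    # Mean of squared differences between predicted scores and target labels.
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


teacher_labels = [9.5, 9.25, 5.25]  # labels of the three samples shown above
student_scores = [9.0, 9.5, 4.75]   # hypothetical model outputs, for illustration
print(mse_loss(student_scores, teacher_labels))
# 0.1875
```

Because the targets are unbounded regression scores, the model learns to reproduce the scale of its labels, which is why the usage example prints values around 6 to 11 rather than probabilities.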

Evaluation Dataset

ettin-reranker-v1-data

Dataset: cross-encoder/ettin-reranker-v1-data
Size: 5,000 evaluation samples
Columns: query, document, and label

Approximate statistics based on the first 1000 samples:

	query	document	label
type	string	string	float
details	min: 14 characters mean: 52.62 characters max: 168 characters	min: 11 characters mean: 50.12 characters max: 184 characters	min: 4.44 mean: 13.49 max: 18.62

Samples:

query	document	label
`Why do we need binomial distribution?`	`Why is the binomial distribution important?`	`11.375`
`I already have Windows 10, can I delete Windows.old?`	`After resetting windows 10, can I safely delete the "old windows" folder?`	`10.875`
`How can guys last longer during sex?`	`How do men last longer in bed?`	`10.8125`

Loss: MSELoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 64
num_train_epochs: 1
learning_rate: 0.00012
warmup_steps: 0.03
bf16: True
per_device_eval_batch_size: 64
load_best_model_at_end: True
seed: 12
dataloader_num_workers: 4
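The learning-rate settings can be visualized with a small sketch of a linear schedule with warmup: the rate rises linearly from 0 to the peak, then decays linearly back to 0. It uses the peak `learning_rate` above and the final step count from the training logs (280065). It assumes the fractional `warmup_steps: 0.03` means 3% of all steps, and the exact rounding used by the trainer is also an assumption. `linear_schedule_lr` is a hypothetical helper.

```python
def linear_schedule_lr(step, total_steps, warmup_steps, peak_lr):
    # Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps.
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


TOTAL_STEPS = 280065                    # final step in the training logs
PEAK_LR = 0.00012                       # learning_rate
WARMUP_STEPS = int(0.03 * TOTAL_STEPS)  # assumption: 0.03 is a fraction of all steps

for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    lr = linear_schedule_lr(step, TOTAL_STEPS, WARMUP_STEPS, PEAK_LR)
    print(f"step {step:>6}: lr = {lr:.3e}")
```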

All Hyperparameters

per_device_train_batch_size: 64
num_train_epochs: 1
max_steps: -1
learning_rate: 0.00012
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.03
optim: adamw_torch
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: None
trackio_bucket_id: None
trackio_static_space_id: None
per_device_eval_batch_size: 64
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 12
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: True
dataloader_num_workers: 4
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_static_graph: None
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_R100_ndcg@10	NanoNFCorpus_R100_ndcg@10	NanoNQ_R100_ndcg@10	NanoFiQA2018_R100_ndcg@10	NanoTouche2020_R100_ndcg@10	NanoSciFact_R100_ndcg@10	NanoHotpotQA_R100_ndcg@10	NanoArguAna_R100_ndcg@10	NanoFEVER_R100_ndcg@10	NanoDBPedia_R100_ndcg@10	NanoClimateFEVER_R100_ndcg@10	NanoSCIDOCS_R100_ndcg@10	NanoQuoraRetrieval_R100_ndcg@10	NanoBEIR_R100_mean_ndcg@10
-1	-1	-	-	0.0320 (-0.5085)	0.2565 (-0.0686)	0.0418 (-0.4588)	0.0438 (-0.3936)	0.1218 (-0.5720)	0.0425 (-0.6674)	0.1154 (-0.7123)	0.0673 (-0.4215)	0.0519 (-0.7575)	0.1926 (-0.4217)	0.0770 (-0.2408)	0.0483 (-0.2868)	0.0550 (-0.8137)	0.0881 (-0.4864)
0.0000	1	73.0701	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0250	7002	3.8772	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0500	14004	1.6962	1.3073	0.6372 (+0.0968)	0.3971 (+0.0721)	0.6898 (+0.1892)	0.4969 (+0.0595)	0.5659 (-0.1279)	0.7500 (+0.0401)	0.9324 (+0.1047)	0.6589 (+0.1701)	0.9412 (+0.1318)	0.6867 (+0.0723)	0.4683 (+0.1506)	0.3544 (+0.0192)	0.9462 (+0.0775)	0.6558 (+0.0812)
0.0750	21006	1.5204	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1000	28008	1.4272	1.1940	0.6842 (+0.1438)	0.4023 (+0.0773)	0.7159 (+0.2152)	0.5034 (+0.0659)	0.5782 (-0.1156)	0.7249 (+0.0150)	0.9338 (+0.1061)	0.6179 (+0.1291)	0.9208 (+0.1114)	0.6951 (+0.0808)	0.5051 (+0.1873)	0.3628 (+0.0277)	0.9451 (+0.0764)	0.6607 (+0.0862)
0.1250	35010	1.3634	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1500	42012	1.3163	1.0090	0.6605 (+0.1201)	0.4069 (+0.0819)	0.7260 (+0.2254)	0.5327 (+0.0952)	0.5682 (-0.1256)	0.7211 (+0.0112)	0.9302 (+0.1025)	0.6496 (+0.1608)	0.9053 (+0.0959)	0.7033 (+0.0890)	0.5055 (+0.1877)	0.4028 (+0.0677)	0.9455 (+0.0768)	0.6660 (+0.0914)
0.1750	49014	1.2788	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2000	56016	1.2434	0.8689	0.6827 (+0.1422)	0.4053 (+0.0803)	0.7653 (+0.2647)	0.4848 (+0.0473)	0.5759 (-0.1179)	0.7387 (+0.0288)	0.9464 (+0.1187)	0.6446 (+0.1558)	0.9160 (+0.1066)	0.7077 (+0.0934)	0.5025 (+0.1848)	0.3707 (+0.0356)	0.9464 (+0.0777)	0.6682 (+0.0937)
0.2250	63018	1.2135	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2500	70020	1.1868	0.8525	0.6714 (+0.1310)	0.4041 (+0.0791)	0.7533 (+0.2526)	0.5034 (+0.0660)	0.5982 (-0.0956)	0.7458 (+0.0358)	0.9383 (+0.1106)	0.6642 (+0.1753)	0.9295 (+0.1201)	0.7090 (+0.0947)	0.5150 (+0.1973)	0.3674 (+0.0323)	0.9483 (+0.0796)	0.6729 (+0.0984)
0.2750	77022	1.1614	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3000	84024	1.1426	0.8028	0.6649 (+0.1245)	0.4001 (+0.0751)	0.7688 (+0.2682)	0.5109 (+0.0735)	0.5692 (-0.1246)	0.7681 (+0.0582)	0.9433 (+0.1156)	0.7080 (+0.2191)	0.9277 (+0.1183)	0.6873 (+0.0730)	0.5279 (+0.2102)	0.3766 (+0.0415)	0.9583 (+0.0896)	0.6778 (+0.1032)
0.3250	91026	1.1219	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3500	98028	1.1001	0.7750	0.6879 (+0.1474)	0.3860 (+0.0610)	0.7553 (+0.2546)	0.5386 (+0.1012)	0.5828 (-0.1110)	0.7089 (-0.0010)	0.9363 (+0.1086)	0.6587 (+0.1699)	0.9346 (+0.1252)	0.6903 (+0.0760)	0.5269 (+0.2092)	0.3950 (+0.0599)	0.9485 (+0.0798)	0.6731 (+0.0985)
0.3750	105030	1.0853	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4000	112032	1.0658	0.7609	0.6882 (+0.1478)	0.4008 (+0.0757)	0.7679 (+0.2673)	0.5054 (+0.0680)	0.5749 (-0.1189)	0.7378 (+0.0279)	0.9460 (+0.1182)	0.6375 (+0.1486)	0.9375 (+0.1281)	0.6996 (+0.0852)	0.5160 (+0.1983)	0.3653 (+0.0302)	0.9561 (+0.0874)	0.6718 (+0.0972)
0.4250	119034	1.0506	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4500	126036	1.0362	0.8689	0.7079 (+0.1675)	0.4064 (+0.0813)	0.7693 (+0.2686)	0.5118 (+0.0744)	0.5916 (-0.1022)	0.7200 (+0.0101)	0.9439 (+0.1162)	0.6686 (+0.1798)	0.9202 (+0.1108)	0.7056 (+0.0913)	0.5211 (+0.2034)	0.3740 (+0.0389)	0.9611 (+0.0924)	0.6770 (+0.1025)
0.4750	133038	1.0205	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5000	140040	1.0067	0.7718	0.6867 (+0.1463)	0.3975 (+0.0725)	0.7751 (+0.2745)	0.4923 (+0.0548)	0.5812 (-0.1126)	0.7203 (+0.0104)	0.9445 (+0.1168)	0.6774 (+0.1886)	0.9376 (+0.1282)	0.6991 (+0.0847)	0.5132 (+0.1954)	0.3640 (+0.0289)	0.9527 (+0.0841)	0.6724 (+0.0979)
0.5250	147042	0.9960	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5500	154044	0.9820	0.7580	0.7104 (+0.1700)	0.4012 (+0.0761)	0.7787 (+0.2781)	0.5099 (+0.0725)	0.5716 (-0.1223)	0.7429 (+0.0330)	0.9416 (+0.1139)	0.6791 (+0.1903)	0.9490 (+0.1396)	0.7085 (+0.0942)	0.5066 (+0.1888)	0.3756 (+0.0404)	0.9569 (+0.0883)	0.6794 (+0.1048)
0.5750	161046	0.9715	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6000	168048	0.9597	0.6707	0.6819 (+0.1415)	0.3924 (+0.0674)	0.7767 (+0.2760)	0.5066 (+0.0692)	0.5618 (-0.1320)	0.7440 (+0.0341)	0.9462 (+0.1185)	0.6393 (+0.1505)	0.9293 (+0.1199)	0.7106 (+0.0963)	0.5128 (+0.1951)	0.3777 (+0.0426)	0.9521 (+0.0834)	0.6717 (+0.0971)
0.6250	175050	0.9477	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6500	182052	0.9395	0.7027	0.6909 (+0.1505)	0.3866 (+0.0616)	0.8037 (+0.3031)	0.4867 (+0.0493)	0.5710 (-0.1228)	0.7571 (+0.0472)	0.9436 (+0.1159)	0.6785 (+0.1897)	0.9329 (+0.1235)	0.7195 (+0.1051)	0.5204 (+0.2026)	0.3750 (+0.0398)	0.9480 (+0.0794)	0.6780 (+0.1034)
0.6750	189054	0.9304	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7000	196056	0.9182	0.6734	0.7061 (+0.1657)	0.3774 (+0.0523)	0.7877 (+0.2871)	0.5269 (+0.0895)	0.5726 (-0.1212)	0.7434 (+0.0335)	0.9486 (+0.1209)	0.6663 (+0.1775)	0.9414 (+0.1320)	0.7109 (+0.0965)	0.5032 (+0.1855)	0.3767 (+0.0416)	0.9527 (+0.0841)	0.6780 (+0.1035)
0.7250	203058	0.9105	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7500	210060	0.9006	0.6761	0.7077 (+0.1673)	0.3643 (+0.0393)	0.7947 (+0.2941)	0.5503 (+0.1129)	0.5676 (-0.1262)	0.7360 (+0.0261)	0.9515 (+0.1237)	0.6716 (+0.1827)	0.9415 (+0.1320)	0.7142 (+0.0998)	0.5285 (+0.2107)	0.3721 (+0.0370)	0.9510 (+0.0823)	0.6808 (+0.1063)
0.7750	217062	0.8912	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8000	224064	0.8840	0.6350	0.7071 (+0.1667)	0.3827 (+0.0577)	0.7847 (+0.2841)	0.5194 (+0.0819)	0.5692 (-0.1246)	0.7333 (+0.0234)	0.9427 (+0.1150)	0.6952 (+0.2064)	0.9408 (+0.1314)	0.7113 (+0.0970)	0.5328 (+0.2150)	0.3606 (+0.0254)	0.9505 (+0.0819)	0.6793 (+0.1047)
0.8250	231066	0.8762	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8500	238068	0.8673	0.6276	0.7218 (+0.1814)	0.3709 (+0.0458)	0.7783 (+0.2777)	0.5525 (+0.1151)	0.5689 (-0.1249)	0.7517 (+0.0417)	0.9411 (+0.1134)	0.6744 (+0.1856)	0.9427 (+0.1332)	0.7094 (+0.0951)	0.5338 (+0.2161)	0.3612 (+0.0261)	0.9544 (+0.0857)	0.6816 (+0.1071)
0.8750	245070	0.8583	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9000	252072	0.8544	0.6251	0.7240 (+0.1836)	0.3695 (+0.0445)	0.7761 (+0.2755)	0.5362 (+0.0987)	0.5672 (-0.1267)	0.7364 (+0.0265)	0.9449 (+0.1172)	0.6903 (+0.2015)	0.9518 (+0.1424)	0.7196 (+0.1053)	0.5362 (+0.2185)	0.3648 (+0.0296)	0.9539 (+0.0852)	0.6824 (+0.1078)
0.9250	259074	0.8491	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9501	266076	0.8423	0.6143	0.7129 (+0.1724)	0.3796 (+0.0546)	0.7821 (+0.2814)	0.5496 (+0.1122)	0.5681 (-0.1257)	0.7443 (+0.0344)	0.9451 (+0.1173)	0.6738 (+0.1849)	0.9513 (+0.1419)	0.7160 (+0.1017)	0.5380 (+0.2203)	0.3594 (+0.0243)	0.9526 (+0.0839)	0.6825 (+0.1080)
0.9751	273078	0.8392	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.0	280065	-	0.6013	0.7111 (+0.1707)	0.3777 (+0.0526)	0.7873 (+0.2866)	0.5401 (+0.1027)	0.5663 (-0.1275)	0.7433 (+0.0334)	0.9501 (+0.1224)	0.6787 (+0.1898)	0.9512 (+0.1418)	0.7178 (+0.1034)	0.5251 (+0.2074)	0.3569 (+0.0218)	0.9538 (+0.0852)	0.6815 (+0.1069)

The final row (step 280065) is the saved checkpoint; its scores match those reported in the Metrics section above.
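The step count in the log also reveals the effective batch size. The sketch below divides the training-set size by the final step number and compares the result with `per_device_train_batch_size`. The device count it derives is an inference from these numbers, since the card does not state how many devices were used.

```python
TRAIN_SAMPLES = 143_393_475  # training set size
FINAL_STEP = 280_065         # last step in the training logs (1 epoch)
PER_DEVICE_BATCH = 64        # per_device_train_batch_size
GRAD_ACCUM = 1               # gradient_accumulation_steps

# Samples consumed per optimizer step, rounded down.
samples_per_step = TRAIN_SAMPLES // FINAL_STEP
# Inferred, not stated in the card: devices needed to reach that global batch.
implied_devices = samples_per_step // (PER_DEVICE_BATCH * GRAD_ACCUM)
# Samples left over at the end of the epoch (dataloader_drop_last is True).
dropped = TRAIN_SAMPLES - FINAL_STEP * samples_per_step

print(samples_per_step, implied_devices, dropped)
# 512 8 195
```

A global batch of 512 is consistent with 8 devices at 64 samples each, with the last 195 samples dropped as an incomplete batch.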

Training Time

Training: 5.2 hours
Evaluation: 8.5 minutes
Total: 5.3 hours

Framework Versions

Python: 3.11.15
Sentence Transformers: 5.4.1
Transformers: 5.7.0
PyTorch: 2.7.0+cu126
Accelerate: 1.13.0
Datasets: 4.8.5
Tokenizers: 0.22.2

Citation

BibTeX

Ettin Reranker Blogpost

@misc{aarsen2026ettin-reranker,
    title = "Introducing the Ettin Reranker Family",
    author = "Aarsen, Tom",
    year = "2026",
    publisher = "Hugging Face",
    url = "https://huggingface.co/blog/ettin-reranker",
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}