language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - cross-encoder
  - reranker
  - generated_from_trainer
  - dataset_size:143393475
  - loss:MSELoss
base_model: jhu-clsp/ettin-encoder-400m
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
  - map
  - mrr@10
  - ndcg@10
model-index:
  - name: ettin-reranker-400m-v1
    results:
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoMSMARCO R100
          type: NanoMSMARCO_R100
        metrics:
          - type: map
            value: 0.641
            name: Map
          - type: mrr@10
            value: 0.6376
            name: Mrr@10
          - type: ndcg@10
            value: 0.7091
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNFCorpus R100
          type: NanoNFCorpus_R100
        metrics:
          - type: map
            value: 0.3997
            name: Map
          - type: mrr@10
            value: 0.6745
            name: Mrr@10
          - type: ndcg@10
            value: 0.4463
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoNQ R100
          type: NanoNQ_R100
        metrics:
          - type: map
            value: 0.7508
            name: Map
          - type: mrr@10
            value: 0.7702
            name: Mrr@10
          - type: ndcg@10
            value: 0.802
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoFiQA2018 R100
          type: NanoFiQA2018_R100
        metrics:
          - type: map
            value: 0.577
            name: Map
          - type: mrr@10
            value: 0.7107
            name: Mrr@10
          - type: ndcg@10
            value: 0.6442
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoTouche2020 R100
          type: NanoTouche2020_R100
        metrics:
          - type: map
            value: 0.4763
            name: Map
          - type: mrr@10
            value: 0.7603
            name: Mrr@10
          - type: ndcg@10
            value: 0.5526
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoSciFact R100
          type: NanoSciFact_R100
        metrics:
          - type: map
            value: 0.708
            name: Map
          - type: mrr@10
            value: 0.7057
            name: Mrr@10
          - type: ndcg@10
            value: 0.7548
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoHotpotQA R100
          type: NanoHotpotQA_R100
        metrics:
          - type: map
            value: 0.9344
            name: Map
          - type: mrr@10
            value: 0.98
            name: Mrr@10
          - type: ndcg@10
            value: 0.9539
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoArguAna R100
          type: NanoArguAna_R100
        metrics:
          - type: map
            value: 0.6714
            name: Map
          - type: mrr@10
            value: 0.6842
            name: Mrr@10
          - type: ndcg@10
            value: 0.7594
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoFEVER R100
          type: NanoFEVER_R100
        metrics:
          - type: map
            value: 0.936
            name: Map
          - type: mrr@10
            value: 0.9733
            name: Mrr@10
          - type: ndcg@10
            value: 0.9526
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoDBPedia R100
          type: NanoDBPedia_R100
        metrics:
          - type: map
            value: 0.7008
            name: Map
          - type: mrr@10
            value: 0.8967
            name: Mrr@10
          - type: ndcg@10
            value: 0.7633
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoClimateFEVER R100
          type: NanoClimateFEVER_R100
        metrics:
          - type: map
            value: 0.5122
            name: Map
          - type: mrr@10
            value: 0.7655
            name: Mrr@10
          - type: ndcg@10
            value: 0.5958
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoSCIDOCS R100
          type: NanoSCIDOCS_R100
        metrics:
          - type: map
            value: 0.3336
            name: Map
          - type: mrr@10
            value: 0.5857
            name: Mrr@10
          - type: ndcg@10
            value: 0.3902
            name: Ndcg@10
      - task:
          type: cross-encoder-reranking
          name: Cross Encoder Reranking
        dataset:
          name: NanoQuoraRetrieval R100
          type: NanoQuoraRetrieval_R100
        metrics:
          - type: map
            value: 0.9563
            name: Map
          - type: mrr@10
            value: 0.98
            name: Mrr@10
          - type: ndcg@10
            value: 0.9724
            name: Ndcg@10
      - task:
          type: cross-encoder-nano-beir
          name: Cross Encoder Nano BEIR
        dataset:
          name: NanoBEIR R100 mean
          type: NanoBEIR_R100_mean
        metrics:
          - type: map
            value: 0.6613
            name: Map
          - type: mrr@10
            value: 0.7788
            name: Mrr@10
          - type: ndcg@10
            value: 0.7151
            name: Ndcg@10

ettin-reranker-400m-v1

This is a Cross Encoder model finetuned from jhu-clsp/ettin-encoder-400m on the cross-encoder/ettin-reranker-v1-data dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

See the release blogpost for details on the training recipe, evaluation results, and speed benchmarks against other public rerankers. The Evaluation section below also has the headline numbers.

Model Details

Model Description

Model Type: Cross Encoder
Base model: jhu-clsp/ettin-encoder-400m
Maximum Sequence Length: 7999 tokens
Number of Output Labels: 1 label
Supported Modality: Text
Training Dataset: cross-encoder/ettin-reranker-v1-data
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Full Model Architecture

CrossEncoder(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'ModernBertModel'})
  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'cls', 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': False, 'activation_function': 'torch.nn.modules.activation.GELU', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (3): LayerNorm({'dimension': 1024})
  (4): Dense({'in_features': 1024, 'out_features': 1, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'scores'})
)
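The printed module list can be read as CLS pooling followed by a small scoring head. The block below is a minimal PyTorch sketch of modules (1) to (4) only, assuming the encoder output is a `(batch, seq_len, 1024)` tensor. The class name `RerankerHead`, the random input, and the random weights are illustrative and are not the library's own implementation.

```python
import torch
from torch import nn


class RerankerHead(nn.Module):
    """Illustrative stand-in for modules (1)-(4) of the printed architecture."""

    def __init__(self, hidden_size: int = 1024) -> None:
        super().__init__()
        self.dense = nn.Linear(hidden_size, hidden_size, bias=False)  # (2) Dense, no bias
        self.activation = nn.GELU()
        self.norm = nn.LayerNorm(hidden_size)  # (3) LayerNorm
        self.scorer = nn.Linear(hidden_size, 1, bias=True)  # (4) Dense to a single score

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # (1) CLS pooling: keep the first token's embedding of each sequence.
        cls_embedding = token_embeddings[:, 0]
        hidden = self.norm(self.activation(self.dense(cls_embedding)))
        # Identity activation on the last layer: return raw, unbounded scores.
        return self.scorer(hidden).squeeze(-1)


# Random tensor standing in for the encoder output: (batch, seq_len, hidden).
fake_token_embeddings = torch.randn(4, 16, 1024)
scores = RerankerHead()(fake_token_embeddings)
print(scores.shape)  # torch.Size([4])
```

Because the final layer uses an identity activation, the scores are unbounded logits rather than probabilities, which is consistent with the double-digit values in the usage example.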

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder(
    "cross-encoder/ettin-reranker-400m-v1",
    model_kwargs={"dtype": "bfloat16", "attn_implementation": "flash_attention_2"},  # Optional: pip install kernels
)

# Get scores for pairs of inputs
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)
# [ 3.6875 11.6875  4.75    9.375 ]

# Or rank passages by relevance to a single query
ranked = model.rank(query, passages)
print(ranked)
# [{'corpus_id': 1, 'score': np.float32(11.6875)}, ...]
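To see how the two outputs above relate, the sketch below rebuilds a ranking from the printed `predict` scores with NumPy. It assumes that ranking amounts to sorting passages by descending score, which agrees with the first entry shown; the dictionary layout imitates the printed output and is not taken from the library source.

```python
import numpy as np

# Scores printed by model.predict(...) in the example above, one per passage.
scores = np.array([3.6875, 11.6875, 4.75, 9.375], dtype=np.float32)

# Sort passage indices by descending score.
order = np.argsort(-scores)
ranked = [{"corpus_id": int(i), "score": float(scores[i])} for i in order]

for entry in ranked:
    print(entry)
```

Under that assumption the Mars passage (index 1) comes first and the Venus passage (index 0) last.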

Evaluation

MTEB(eng, v2) Retrieval

Each model in the ettin-reranker-v1 family was evaluated on the full MTEB(eng, v2) Retrieval benchmark (10 tasks, top-100 reranked) using MTEB's two-stage reranking flow, pairing each reranker with six embedding models that span the speed/quality spectrum. The dashed retriever-only line in each chart below is the headline number to beat: a reranker that scores below it actively hurts the pipeline on average.

Full table of results

Mean NDCG@10 over the 6 embedder pairings, sorted by MTEB(eng, v2) Retrieval NDCG@10 in descending order. The released ettin-reranker-v1 family is in bold, and the teacher mixedbread-ai/mxbai-rerank-large-v2 is underlined.

Reranker	Params	MTEB(eng, v2) Retrieval NDCG@10
`Qwen/Qwen3-Reranker-4B`^†	4.02B	0.6367
`mixedbread-ai/mxbai-rerank-large-v2`	1.54B	0.6115
`cross-encoder/ettin-reranker-1b-v1`	1.00B	0.6114
`cross-encoder/ettin-reranker-400m-v1`	401M	0.6091
`cross-encoder/ettin-reranker-150m-v1`	151M	0.5994
`Qwen/Qwen3-Reranker-0.6B`	596M	0.5940
`mixedbread-ai/mxbai-rerank-base-v2`	494M	0.5920
`cross-encoder/ettin-reranker-68m-v1`	68.6M	0.5915
`jinaai/jina-reranker-m0`	2.44B	0.5856
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	0.5843
`cross-encoder/ettin-reranker-32m-v1`	32.8M	0.5779
`ibm-granite/granite-embedding-reranker-english-r2`	150M	0.5656
`cross-encoder/ettin-reranker-17m-v1`	17.6M	0.5576
`BAAI/bge-reranker-v2-m3`	568M	0.5526
`zeroentropy/zerank-2-reranker`^†	4.02B	0.5300
`BAAI/bge-reranker-large`	560M	0.5098
`cross-encoder/ms-marco-MiniLM-L6-v2`	22.7M	0.5082
`cross-encoder/ms-marco-MiniLM-L12-v2`	33.4M	0.5066
`mixedbread-ai/mxbai-rerank-large-v1`	435M	0.5063
`cross-encoder/ms-marco-MiniLM-L4-v2`	19.2M	0.4979
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70.8M	0.4968
`BAAI/bge-reranker-base`	278M	0.4890
`mixedbread-ai/mxbai-rerank-base-v1`	184M	0.4865

^† Capped to max_seq_length=8192 (the 4B Qwen3-based rerankers don't fit on a single H100 80GB at native context). Scores at native context are likely higher.

See the release blogpost for the full analysis and per-model commentary.
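For readers unfamiliar with the metric in the table above, here is a minimal sketch of NDCG@10 with linear gain. The relevance lists are hypothetical toy labels, and the ideal ranking is computed from the candidate list alone, which is a simplification; MTEB's own implementation may differ in detail.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k positions (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances: list[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance labels of the same five candidates, before and after reranking.
retriever_order = [0.0, 1.0, 0.0, 0.0, 1.0]
reranked_order = [1.0, 1.0, 0.0, 0.0, 0.0]

retriever_ndcg = ndcg_at_k(retriever_order)
reranked_ndcg = ndcg_at_k(reranked_order)
print(f"retriever only: {retriever_ndcg:.4f}")
print(f"after reranking: {reranked_ndcg:.4f}")
```

In this toy case the reranker moves both relevant candidates to the top, so NDCG@10 rises from roughly 0.62 to 1.0. A reranker that pushed them further down would land below the retriever-only score.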

Speed

All six released models were benchmarked against thirteen public rerankers on three hardware tiers, using sentence-transformers/natural-questions at max_length=512 with each model's best supported attention implementation. The full sweep over fp32+SDPA, bf16+SDPA, padded bf16+FA2, and unpadded bf16+FA2 (showing why the ettin-reranker-v1 family is faster than other ModernBERT-based rerankers) is in the release blogpost. The table below shows throughput in pairs per second on an NVIDIA H100 80GB, with every model in bfloat16:

Model	Params	Best attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	FA2	7517
`cross-encoder/ettin-reranker-32m-v1`	32M	FA2	6602
`cross-encoder/ettin-reranker-68m-v1`	68M	FA2	4913
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	FA2	4029
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	FA2	3817
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	FA2	3311
`cross-encoder/ettin-reranker-150m-v1`	150M	FA2	3237
`BAAI/bge-reranker-base`	278M	FA2	2858
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	2636
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	1953
`cross-encoder/ettin-reranker-400m-v1`	400M	FA2	1738
`BAAI/bge-reranker-large`	560M	FA2	1659
`BAAI/bge-reranker-v2-m3`	568M	FA2	1569
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	FA2	1418
`ibm-granite/granite-embedding-reranker-english-r2`	150M	FA2	1404
`cross-encoder/ettin-reranker-1b-v1`	1B	FA2	928
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	867
`mixedbread-ai/mxbai-rerank-base-v2`	494M	FA2	809
`mixedbread-ai/mxbai-rerank-large-v2`	1.5B	FA2	387

Same benchmark on a consumer GPU (RTX 3090, 24 GB)

Model	Params	Best attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	FA2	9008
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	FA2	5071
`cross-encoder/ettin-reranker-32m-v1`	32M	FA2	4497
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	FA2	4234
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	FA2	2847
`cross-encoder/ettin-reranker-68m-v1`	68M	FA2	1916
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	1677
`BAAI/bge-reranker-base`	278M	FA2	1329
`cross-encoder/ettin-reranker-150m-v1`	150M	FA2	982
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	772
`ibm-granite/granite-embedding-reranker-english-r2`	150M	FA2	598
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	FA2	586
`BAAI/bge-reranker-large`	560M	FA2	448
`BAAI/bge-reranker-v2-m3`	568M	FA2	436
`cross-encoder/ettin-reranker-400m-v1`	400M	FA2	429
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	266
`mixedbread-ai/mxbai-rerank-base-v2`	494M	FA2	221
`cross-encoder/ettin-reranker-1b-v1`	1B	FA2	189
`mixedbread-ai/mxbai-rerank-large-v2`	1.5B	FA2	69

Same benchmark on CPU (Intel Core i7-13700K)

Model	Params	Best attn	pairs / second
`cross-encoder/ettin-reranker-17m-v1`	17M	SDPA	267.4
`cross-encoder/ms-marco-MiniLM-L4-v2`	19M	SDPA	206.2
`cross-encoder/ms-marco-MiniLM-L6-v2`	22M	SDPA	143.9
`cross-encoder/ettin-reranker-32m-v1`	32M	SDPA	92.5
`cross-encoder/ms-marco-MiniLM-L12-v2`	33M	SDPA	75.9
`mixedbread-ai/mxbai-rerank-xsmall-v1`	70M	eager	38.9
`cross-encoder/ettin-reranker-68m-v1`	68M	SDPA	31.2
`BAAI/bge-reranker-base`	278M	SDPA	19.2
`Alibaba-NLP/gte-reranker-modernbert-base`	150M	SDPA	14.7
`ibm-granite/granite-embedding-reranker-english-r2`	150M	SDPA	14.5
`cross-encoder/ettin-reranker-150m-v1`	150M	SDPA	14.0
`mixedbread-ai/mxbai-rerank-base-v1`	184M	eager	13.4
`BAAI/bge-reranker-large`	560M	SDPA	6.2
`BAAI/bge-reranker-v2-m3`	568M	SDPA	6.0
`cross-encoder/ettin-reranker-400m-v1`	400M	SDPA	5.2
`mixedbread-ai/mxbai-rerank-large-v1`	435M	eager	4.3
`mixedbread-ai/mxbai-rerank-base-v2`	494M	SDPA	3.5
`cross-encoder/ettin-reranker-1b-v1`	1B	SDPA	2.1
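A pairs-per-second figure like the ones above can be measured with a simple timing loop. The sketch below is not the benchmark script used for these tables: `pairs_per_second` and `fake_predict` are hypothetical names, the batch size and warmup count are arbitrary, and the stub scorer only stands in for a real call such as `model.predict`.

```python
import time
from typing import Callable, Sequence

Pair = tuple[str, str]


def pairs_per_second(
    predict: Callable[[Sequence[Pair]], object],
    pairs: Sequence[Pair],
    batch_size: int = 32,
    warmup_batches: int = 1,
) -> float:
    """Time predict() over all pairs and return the measured throughput."""
    batches = [pairs[i : i + batch_size] for i in range(0, len(pairs), batch_size)]
    # Warm up first so one-time setup costs are not counted.
    for batch in batches[:warmup_batches]:
        predict(batch)
    start = time.perf_counter()
    for batch in batches:
        predict(batch)
    # On a GPU, synchronize the device here before reading the clock.
    elapsed = time.perf_counter() - start
    return len(pairs) / elapsed


def fake_predict(batch: Sequence[Pair]) -> list[float]:
    """Stub scorer standing in for a real cross-encoder call."""
    return [float(len(query) + len(document)) for query, document in batch]


pairs = [("what is a reranker?", f"candidate passage number {i}") for i in range(256)]
throughput = pairs_per_second(fake_predict, pairs)
print(f"{throughput:.1f} pairs / second")
```

Absolute numbers depend heavily on sequence length, batch size, precision, and attention implementation, so comparisons are only meaningful when those are held fixed, as in the tables above.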

Metrics

Cross Encoder Reranking

Datasets: NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100, NanoFiQA2018_R100, NanoTouche2020_R100, NanoSciFact_R100, NanoHotpotQA_R100, NanoArguAna_R100, NanoFEVER_R100, NanoDBPedia_R100, NanoClimateFEVER_R100, NanoSCIDOCS_R100 and NanoQuoraRetrieval_R100

Evaluated with CrossEncoderRerankingEvaluator with these parameters:

{
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	NanoMSMARCO_R100	NanoNFCorpus_R100	NanoNQ_R100	NanoFiQA2018_R100	NanoTouche2020_R100	NanoSciFact_R100	NanoHotpotQA_R100	NanoArguAna_R100	NanoFEVER_R100	NanoDBPedia_R100	NanoClimateFEVER_R100	NanoSCIDOCS_R100	NanoQuoraRetrieval_R100
map	0.6410 (+0.1514)	0.3997 (+0.1387)	0.7508 (+0.3312)	0.5770 (+0.2119)	0.4763 (-0.0736)	0.7080 (+0.0383)	0.9344 (+0.1661)	0.6714 (+0.2607)	0.9360 (+0.1641)	0.7008 (+0.1890)	0.5122 (+0.2719)	0.3336 (+0.0593)	0.9563 (+0.1254)
mrr@10	0.6376 (+0.1601)	0.6745 (+0.1747)	0.7702 (+0.3435)	0.7107 (+0.2199)	0.7603 (-0.1468)	0.7057 (+0.0276)	0.9800 (+0.0571)	0.6842 (+0.2911)	0.9733 (+0.1933)	0.8967 (+0.0960)	0.7655 (+0.3616)	0.5857 (+0.0262)	0.9800 (+0.1118)
ndcg@10	0.7091 (+0.1687)	0.4463 (+0.1212)	0.8020 (+0.3014)	0.6442 (+0.2068)	0.5526 (-0.1412)	0.7548 (+0.0449)	0.9539 (+0.1262)	0.7594 (+0.2705)	0.9526 (+0.1432)	0.7633 (+0.1489)	0.5958 (+0.2781)	0.3902 (+0.0551)	0.9724 (+0.1037)
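The `value (+delta)` cells above are read here as the reranked score followed by its change relative to the original candidate order. That reading is an assumption, though it is consistent with the Training Logs rows. The sketch below illustrates MRR@10 and average precision on hypothetical relevance flags; the function names and the toy data are illustrative, and the evaluator's exact handling of positives may differ.

```python
def mrr_at_k(ranked_is_relevant: list[bool], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, relevant in enumerate(ranked_is_relevant[:k], start=1):
        if relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked_is_relevant: list[bool]) -> float:
    """Mean of precision@rank taken at each relevant position."""
    hits, precision_sum = 0, 0.0
    for rank, relevant in enumerate(ranked_is_relevant, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


# Hypothetical relevance flags for one query, before and after reranking.
base_order = [False, False, True, False, True]
reranked_order = [True, False, True, False, False]

base_mrr, reranked_mrr = mrr_at_k(base_order), mrr_at_k(reranked_order)
base_ap, reranked_ap = average_precision(base_order), average_precision(reranked_order)

# Same "value (+delta)" layout as the table cells.
print(f"mrr@10  {reranked_mrr:.4f} ({reranked_mrr - base_mrr:+.4f})")
print(f"map     {reranked_ap:.4f} ({reranked_ap - base_ap:+.4f})")
```

Averaging these per-query values over all queries of a dataset gives the dataset-level figures. A negative delta, as in the NanoTouche2020 column, would mean the reranked order scores below the original order.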

Cross Encoder Nano BEIR

Dataset: NanoBEIR_R100_mean

Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:

{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq",
        "fiqa2018",
        "touche2020",
        "scifact",
        "hotpotqa",
        "arguana",
        "fever",
        "dbpedia",
        "climatefever",
        "scidocs",
        "quoraretrieval"
    ],
    "dataset_id": "sentence-transformers/NanoBEIR-en",
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}

Metric	Value
map	0.6613 (+0.1565)
mrr@10	0.7788 (+0.1474)
ndcg@10	0.7151 (+0.1406)

The release blogpost quotes a slightly higher NanoBEIR mean NDCG@10 of 0.7193 for this model, computed in fp32 rather than the bfloat16 used by the training-time evaluation above. Both numbers are valid.
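The NanoBEIR mean can be checked by hand: it matches the arithmetic mean of the 13 per-dataset NDCG@10 values in the Cross Encoder Reranking table. The short sketch below reproduces that calculation from the reported four-decimal values.

```python
from statistics import mean

# Per-dataset NDCG@10 values copied from the Cross Encoder Reranking table.
ndcg_at_10 = {
    "NanoMSMARCO": 0.7091,
    "NanoNFCorpus": 0.4463,
    "NanoNQ": 0.8020,
    "NanoFiQA2018": 0.6442,
    "NanoTouche2020": 0.5526,
    "NanoSciFact": 0.7548,
    "NanoHotpotQA": 0.9539,
    "NanoArguAna": 0.7594,
    "NanoFEVER": 0.9526,
    "NanoDBPedia": 0.7633,
    "NanoClimateFEVER": 0.5958,
    "NanoSCIDOCS": 0.3902,
    "NanoQuoraRetrieval": 0.9724,
}

nano_beir_mean = mean(ndcg_at_10.values())
print(f"{nano_beir_mean:.4f}")  # 0.7151
```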

Training Details

Training Dataset

ettin-reranker-v1-data

Dataset: cross-encoder/ettin-reranker-v1-data
Size: 143,393,475 training samples
Columns: query, document, and label

Approximate statistics based on the first 1000 samples:

	query	document	label
type	string	string	float
details	min: 26 characters, mean: 55.52 characters, max: 249 characters	min: 63 characters, mean: 659.91 characters, max: 3975 characters	min: -2.94, mean: 8.51, max: 13.88

Samples:

query	document	label
`Help me with my Reborn performance`	I was reading the comment section for Dotacinema's world of dota video, and a bunch of people were complaining how there were a lot of bugs and some talked about PERFORMANCE ISSUES. But there were also people saying that reborn has actually IMPROVED their gameplay? I am one of those people who is running into performance issues and would desperately like to know how some are getting BETTER performance while others like me are getting worse. I'm not complaining about bugs, I'm complaing about framerate, I use to get 60 fps solid in source 1 but I now have 40 or at worst 30 fps in source 2. I have an i3 processor/gtx560ti/16gb RAM i dont think it's a potato pc, so I dont know what's happening, I cleaned my computer recently so dust isnt affecting anything in anyway. So if you gained or had IMPROVED performance in source 2 please list the settings you are enabling, so I can see where I am at fault. (v sync is off btw) TLDR: Have bad performance now from source 2, if you have good p...	`9.5`
`Really wanna try out the game and expansion, ~$60 is hefty. Likelihood of sales?`	`As per title, steam sells the game and its expansions for $60 total. Heavy price to drop. Are there sales on any other website? This game looks fantastic to immerse in otherwise and I'm pleased that this subreddit has at least some attention to help out new folks!`	`9.25`
`Your Avatar. [MGSV Spoilers]`	`Was anyone else suprised he actually replaces the snake model in some cutscenes. I've only tried the first Quiet cutscenes, i was just amazed I haven't seen anybody else say this yet. Sorry if repost.`	`5.25`

Loss: MSELoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity"
}
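With an identity activation, this objective reduces to a plain mean squared error between the model's raw scores and the float labels. The sketch below illustrates that with PyTorch; the labels are the three sample labels shown above, while the predicted logits are hypothetical values made up for the example. Treating the labels as teacher scores to be regressed is an assumption, not something this section states.

```python
import torch
from torch import nn

# Labels from the three sample rows above.
labels = torch.tensor([9.5, 9.25, 5.25])
# Hypothetical raw model outputs for the same three pairs.
logits = torch.tensor([9.0, 9.75, 5.25])

activation_fn = nn.Identity()  # matches "activation_fn" in the loss parameters
loss = nn.functional.mse_loss(activation_fn(logits), labels)
print(f"{loss.item():.4f}")  # (0.25 + 0.25 + 0.0) / 3
```

Because nothing squashes the outputs, the model learns to reproduce the label scale directly, which is why its predictions are not confined to the range 0 to 1.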

Evaluation Dataset

ettin-reranker-v1-data

Dataset: cross-encoder/ettin-reranker-v1-data
Size: 5,000 evaluation samples
Columns: query, document, and label

Approximate statistics based on the first 1000 samples:

	query	document	label
type	string	string	float
details	min: 14 characters, mean: 52.62 characters, max: 168 characters	min: 11 characters, mean: 50.12 characters, max: 184 characters	min: 4.44, mean: 13.49, max: 18.62

Samples:

query	document	label
`Why do we need binomial distribution?`	`Why is the binomial distribution important?`	`11.375`
`I already have Windows 10, can I delete Windows.old?`	`After resetting windows 10, can I safely delete the "old windows" folder?`	`10.875`
`How can guys last longer during sex?`	`How do men last longer in bed?`	`10.8125`

Loss: MSELoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity"
}

Training Hyperparameters

Non-Default Hyperparameters

per_device_train_batch_size: 4
num_train_epochs: 1
learning_rate: 7e-06
warmup_steps: 0.03
bf16: True
per_device_eval_batch_size: 4
load_best_model_at_end: True
seed: 12

All Hyperparameters

per_device_train_batch_size: 4
num_train_epochs: 1
max_steps: -1
learning_rate: 7e-06
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_steps: 0.03
optim: adamw_torch
optim_args: None
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
optim_target_modules: None
gradient_accumulation_steps: 1
average_tokens_across_devices: True
max_grad_norm: 1.0
label_smoothing_factor: 0.0
bf16: True
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
use_liger_kernel: False
liger_kernel_config: None
use_cache: False
neftune_noise_alpha: None
torch_empty_cache_steps: None
auto_find_batch_size: False
log_on_each_node: True
logging_nan_inf_filter: True
include_num_input_tokens_seen: no
log_level: passive
log_level_replica: warning
disable_tqdm: False
project: huggingface
trackio_space_id: None
trackio_bucket_id: None
trackio_static_space_id: None
per_device_eval_batch_size: 4
prediction_loss_only: True
eval_on_start: False
eval_do_concat_batches: True
eval_use_gather_object: False
eval_accumulation_steps: None
include_for_metrics: []
batch_eval_metrics: False
save_only_model: False
save_on_each_node: False
enable_jit_checkpoint: False
push_to_hub: False
hub_private_repo: None
hub_model_id: None
hub_strategy: every_save
hub_always_push: False
hub_revision: None
load_best_model_at_end: True
ignore_data_skip: False
restore_callback_states_from_checkpoint: False
full_determinism: False
seed: 12
data_seed: None
use_cpu: False
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
dataloader_drop_last: True
dataloader_num_workers: 0
dataloader_pin_memory: True
dataloader_persistent_workers: False
dataloader_prefetch_factor: None
remove_unused_columns: True
label_names: None
train_sampling_strategy: random
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
ddp_static_graph: None
ddp_backend: None
ddp_timeout: 1800
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
deepspeed: None
debug: []
skip_memory_metrics: True
do_predict: False
resume_from_checkpoint: None
warmup_ratio: None
local_rank: -1
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_R100_ndcg@10	NanoNFCorpus_R100_ndcg@10	NanoNQ_R100_ndcg@10	NanoFiQA2018_R100_ndcg@10	NanoTouche2020_R100_ndcg@10	NanoSciFact_R100_ndcg@10	NanoHotpotQA_R100_ndcg@10	NanoArguAna_R100_ndcg@10	NanoFEVER_R100_ndcg@10	NanoDBPedia_R100_ndcg@10	NanoClimateFEVER_R100_ndcg@10	NanoSCIDOCS_R100_ndcg@10	NanoQuoraRetrieval_R100_ndcg@10	NanoBEIR_R100_mean_ndcg@10
-1	-1	-	-	0.0541 (-0.4864)	0.2581 (-0.0669)	0.0549 (-0.4457)	0.0452 (-0.3922)	0.1699 (-0.5239)	0.0402 (-0.6697)	0.0445 (-0.7832)	0.0982 (-0.3906)	0.0648 (-0.7446)	0.1844 (-0.4300)	0.0973 (-0.2205)	0.1106 (-0.2245)	0.0172 (-0.8515)	0.0953 (-0.4792)
0.0000	1	84.3213	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0250	14004	3.9807	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0500	28007	-	0.5958	0.6974 (+0.1570)	0.4581 (+0.1331)	0.7909 (+0.2903)	0.6095 (+0.1721)	0.5916 (-0.1022)	0.7803 (+0.0704)	0.9479 (+0.1202)	0.6904 (+0.2015)	0.9391 (+0.1297)	0.7511 (+0.1367)	0.5744 (+0.2566)	0.4189 (+0.0838)	0.9522 (+0.0835)	0.7078 (+0.1333)
0.0500	28008	0.9853	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.0750	42012	0.8441	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1000	56014	-	0.4915	0.6925 (+0.1520)	0.4611 (+0.1361)	0.7899 (+0.2893)	0.6001 (+0.1627)	0.5888 (-0.1051)	0.7745 (+0.0646)	0.9489 (+0.1211)	0.7147 (+0.2258)	0.9470 (+0.1376)	0.7419 (+0.1275)	0.5885 (+0.2708)	0.4170 (+0.0819)	0.9641 (+0.0954)	0.7099 (+0.1354)
0.1000	56016	0.7794	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1250	70020	0.7366	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1500	84021	-	0.4492	0.7150 (+0.1746)	0.4731 (+0.1481)	0.7949 (+0.2943)	0.6255 (+0.1880)	0.5677 (-0.1262)	0.7534 (+0.0435)	0.9508 (+0.1231)	0.7306 (+0.2418)	0.9422 (+0.1328)	0.7535 (+0.1391)	0.5913 (+0.2736)	0.4193 (+0.0842)	0.9724 (+0.1037)	0.7146 (+0.1401)
0.1500	84024	0.7058	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.1750	98028	0.6819	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2000	112028	-	0.4182	0.6949 (+0.1544)	0.4693 (+0.1443)	0.8009 (+0.3002)	0.6365 (+0.1991)	0.5770 (-0.1168)	0.7748 (+0.0649)	0.9521 (+0.1244)	0.7251 (+0.2363)	0.9392 (+0.1298)	0.7536 (+0.1392)	0.5927 (+0.2750)	0.3958 (+0.0607)	0.9704 (+0.1017)	0.7140 (+0.1395)
0.2000	112032	0.6601	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2250	126036	0.6410	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2500	140035	-	0.3936	0.6878 (+0.1474)	0.4534 (+0.1283)	0.8011 (+0.3005)	0.6320 (+0.1946)	0.5789 (-0.1150)	0.7582 (+0.0483)	0.9475 (+0.1198)	0.7510 (+0.2621)	0.9503 (+0.1409)	0.7518 (+0.1375)	0.5899 (+0.2722)	0.3871 (+0.0520)	0.9714 (+0.1027)	0.7123 (+0.1378)
0.2500	140040	0.6258	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.2750	154044	0.6129	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3000	168042	-	0.3803	0.6929 (+0.1524)	0.4723 (+0.1472)	0.7983 (+0.2977)	0.6412 (+0.2038)	0.5742 (-0.1196)	0.7607 (+0.0508)	0.9601 (+0.1324)	0.7635 (+0.2747)	0.9575 (+0.1481)	0.7573 (+0.1430)	0.5929 (+0.2751)	0.3913 (+0.0562)	0.9823 (+0.1136)	0.7188 (+0.1443)
0.3000	168048	0.6019	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3250	182052	0.5909	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3500	196049	-	0.3724	0.7118 (+0.1713)	0.4643 (+0.1392)	0.8043 (+0.3036)	0.6342 (+0.1968)	0.5703 (-0.1235)	0.7604 (+0.0505)	0.9521 (+0.1244)	0.7475 (+0.2587)	0.9404 (+0.1309)	0.7587 (+0.1443)	0.6047 (+0.2870)	0.4021 (+0.0670)	0.9706 (+0.1019)	0.7170 (+0.1425)
0.3500	196056	0.5813	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.3750	210060	0.5721	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4000	224056	-	0.3645	0.7166 (+0.1761)	0.4591 (+0.1340)	0.8017 (+0.3011)	0.6256 (+0.1881)	0.5672 (-0.1266)	0.7713 (+0.0614)	0.9560 (+0.1283)	0.7473 (+0.2585)	0.9411 (+0.1317)	0.7619 (+0.1476)	0.6011 (+0.2834)	0.3908 (+0.0557)	0.9687 (+0.1000)	0.7160 (+0.1415)
0.4000	224064	0.5639	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4250	238068	0.5558	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4500	252063	-	0.3778	0.7137 (+0.1732)	0.4494 (+0.1244)	0.8045 (+0.3039)	0.6268 (+0.1894)	0.5671 (-0.1267)	0.7596 (+0.0497)	0.9598 (+0.1321)	0.7545 (+0.2657)	0.9369 (+0.1275)	0.7542 (+0.1399)	0.6043 (+0.2866)	0.3880 (+0.0529)	0.9742 (+0.1055)	0.7149 (+0.1403)
0.4500	252072	0.5490	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.4750	266076	0.5427	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5	280070	-	0.3364	0.7034 (+0.1630)	0.4615 (+0.1365)	0.8014 (+0.3008)	0.6315 (+0.1941)	0.5702 (-0.1236)	0.7643 (+0.0543)	0.9544 (+0.1267)	0.7759 (+0.2871)	0.9489 (+0.1395)	0.7641 (+0.1497)	0.6036 (+0.2858)	0.3959 (+0.0608)	0.9758 (+0.1071)	0.7193 (+0.1448)
0.5000	280080	0.5369	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5250	294084	0.5315	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5500	308077	-	0.3309	0.7139 (+0.1734)	0.4596 (+0.1345)	0.8014 (+0.3008)	0.6236 (+0.1862)	0.5687 (-0.1251)	0.7710 (+0.0611)	0.9529 (+0.1252)	0.7689 (+0.2801)	0.9406 (+0.1312)	0.7629 (+0.1486)	0.5962 (+0.2785)	0.3887 (+0.0536)	0.9726 (+0.1039)	0.7170 (+0.1424)
0.5500	308088	0.5258	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.5750	322092	0.5198	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6000	336084	-	0.3317	0.7060 (+0.1656)	0.4671 (+0.1420)	0.7952 (+0.2945)	0.6323 (+0.1949)	0.5588 (-0.1350)	0.7683 (+0.0584)	0.9590 (+0.1313)	0.7583 (+0.2695)	0.9569 (+0.1475)	0.7639 (+0.1496)	0.6008 (+0.2831)	0.3731 (+0.0380)	0.9717 (+0.1030)	0.7163 (+0.1417)
0.6000	336096	0.5161	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6250	350100	0.5112	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6500	364091	-	0.3306	0.7154 (+0.1750)	0.4536 (+0.1286)	0.8023 (+0.3016)	0.6299 (+0.1925)	0.5590 (-0.1349)	0.7597 (+0.0498)	0.9513 (+0.1236)	0.7569 (+0.2681)	0.9546 (+0.1452)	0.7666 (+0.1522)	0.5898 (+0.2721)	0.3970 (+0.0619)	0.9731 (+0.1045)	0.7161 (+0.1416)
0.6500	364104	0.5082	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.6750	378108	0.5046	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7000	392098	-	0.3271	0.7114 (+0.1710)	0.4447 (+0.1197)	0.8015 (+0.3008)	0.6375 (+0.2001)	0.5497 (-0.1441)	0.7628 (+0.0528)	0.9545 (+0.1268)	0.7585 (+0.2697)	0.9514 (+0.1420)	0.7708 (+0.1565)	0.5986 (+0.2809)	0.3943 (+0.0592)	0.9773 (+0.1086)	0.7164 (+0.1418)
0.7000	392112	0.5004	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7250	406116	0.4971	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7500	420105	-	0.3223	0.7146 (+0.1741)	0.4388 (+0.1137)	0.7964 (+0.2958)	0.6360 (+0.1986)	0.5536 (-0.1402)	0.7520 (+0.0421)	0.9529 (+0.1252)	0.7555 (+0.2667)	0.9492 (+0.1398)	0.7671 (+0.1528)	0.5988 (+0.2811)	0.3786 (+0.0435)	0.9765 (+0.1078)	0.7131 (+0.1385)
0.7500	420120	0.4941	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.7750	434124	0.4912	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8000	448112	-	0.3205	0.7126 (+0.1722)	0.4478 (+0.1227)	0.7990 (+0.2984)	0.6257 (+0.1882)	0.5508 (-0.1430)	0.7620 (+0.0521)	0.9522 (+0.1245)	0.7619 (+0.2731)	0.9527 (+0.1433)	0.7639 (+0.1495)	0.5966 (+0.2789)	0.3869 (+0.0518)	0.9715 (+0.1028)	0.7141 (+0.1396)
0.8000	448128	0.4892	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8250	462132	0.4847	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8500	476119	-	0.3174	0.7057 (+0.1652)	0.4441 (+0.1191)	0.7995 (+0.2989)	0.6480 (+0.2106)	0.5544 (-0.1394)	0.7616 (+0.0517)	0.9541 (+0.1264)	0.7547 (+0.2659)	0.9529 (+0.1435)	0.7657 (+0.1514)	0.5987 (+0.2810)	0.3868 (+0.0517)	0.9712 (+0.1025)	0.7152 (+0.1407)
0.8500	476136	0.4825	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.8750	490140	0.4807	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9000	504126	-	0.3140	0.7183 (+0.1779)	0.4453 (+0.1202)	0.7989 (+0.2982)	0.6457 (+0.2083)	0.5613 (-0.1325)	0.7582 (+0.0483)	0.9535 (+0.1258)	0.7603 (+0.2715)	0.9529 (+0.1435)	0.7634 (+0.1490)	0.5984 (+0.2807)	0.3892 (+0.0541)	0.9715 (+0.1028)	0.7167 (+0.1421)
0.9000	504144	0.4789	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9250	518148	0.4771	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9500	532133	-	0.3137	0.7073 (+0.1669)	0.4538 (+0.1288)	0.7986 (+0.2980)	0.6418 (+0.2044)	0.5553 (-0.1385)	0.7589 (+0.0490)	0.9535 (+0.1258)	0.7645 (+0.2756)	0.9492 (+0.1398)	0.7625 (+0.1482)	0.6052 (+0.2875)	0.3955 (+0.0604)	0.9715 (+0.1029)	0.7168 (+0.1422)
0.9501	532152	0.4758	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
0.9751	546156	0.4747	-	-	-	-	-	-	-	-	-	-	-	-	-	-	-
1.0	560130	-	0.3110	0.7091 (+0.1687)	0.4463 (+0.1212)	0.8020 (+0.3014)	0.6442 (+0.2068)	0.5526 (-0.1412)	0.7548 (+0.0449)	0.9539 (+0.1262)	0.7594 (+0.2705)	0.9526 (+0.1432)	0.7633 (+0.1489)	0.5958 (+0.2781)	0.3902 (+0.0551)	0.9724 (+0.1037)	0.7151 (+0.1406)

The bold row denotes the saved checkpoint.

Training Time

Training: 20.6 hours
Evaluation: 25.8 minutes
Total: 21.0 hours

Framework Versions

Python: 3.11.15
Sentence Transformers: 5.4.1
Transformers: 5.7.0
PyTorch: 2.7.0+cu126
Accelerate: 1.13.0
Datasets: 4.8.5
Tokenizers: 0.22.2

Citation

BibTeX

Ettin Reranker Blogpost

@misc{aarsen2026ettin-reranker,
    title = "Introducing the Ettin Reranker Family",
    author = "Aarsen, Tom",
    year = "2026",
    publisher = "Hugging Face",
    url = "https://huggingface.co/blog/ettin-reranker",
}

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}