# CrossEncoder based on jhu-clsp/ettin-encoder-150m
This is a Cross Encoder model finetuned from jhu-clsp/ettin-encoder-150m on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
## Model Details

### Model Description

- Model Type: Cross Encoder
- Base model: jhu-clsp/ettin-encoder-150m
- Maximum Sequence Length: 7999 tokens
- Number of Output Labels: 1 label
- Training Dataset: ms_marco
- Language: en
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import CrossEncoder

# Download the model from the Hugging Face Hub
model = CrossEncoder("bansalaman18/reranker-msmarco-v1.1-ettin-encoder-150m-bce")

# Score pairs of (query, passage) texts
pairs = [
    ['how to put word count on word', 'To insert a word count into a Word 2013 document, place the cursor where you would like the word count to appear (say in the Header or Footer) and then: 1 click the Insert tab. 2 click the Quick Parts icon (towards the right hand end of the toolbar). 3 on the drop down that appears, select Field...'],
    ['what is the difference between discipleship and evangelism', 'Discipleship, on the other hand, meant helping someone who was already a believer walk out the life of faith. The word “discipleship” brought to my mind a small group Bible study, a conversation across the table with another woman, or an accountability group. And I knew which one I preferred. As a result, the discipleship I offered others contained a lot of good information but lacked the transforming power that can only come from the gospel. (I was also, simply, a coward.). I am beginning to see that evangelism and discipleship are not all that different.'],
    ['what metal is a trophy made from', 'The trophy stands 36.5 centimetres (14.4 inches) tall and is made of 5 kg (11 lb) of 18 carat (75%) gold with a base (13 centimetres [5.1 inches] in diameter) containing t … wo layers of malachite. Making the world better, one answer at a time. Trophies can be made out of anything you want. however, aluminum is a very reliable and trustworthy metal and it.......... oh crap.......... i have to do a poo...'],
    ['how do you define what a cult is?', 'The term cult has been misused. The word cult comes from the French cult which is from the Latin word cultus (care/adoration) and Latin Colere (to cultivate.) So, we can plant seeds of good or bad. You can have political cults such as sit ins during the Vietnam War. A good cult could be a religious one, yet some Christians will consider Jehovah Witness a cult and have labeled them as preying on the weak. When someone labels such a thing it is usually because of the lack of understanding. Good cults are usually a small group of people that can have a cult in most anything.'],
    ['where is silchar', 'Silchar (/ˈsɪlˌʧə/ or /ˈʃɪlˌʧə/) (Bengali: শিলচর Shilchor) shilchôr is the headquarters Of cachar district in the state Of assam In. India it is 343 (kilometres 213) mi south east Of. Guwahati it is the-second largest city of the state in terms of population and municipal. area 1 The Bhubaneshwar temple is about 50 km from Silchar and is on the top the Bhuvan hill. 2 This is a place of pilgrimage and during the festival of Shivaratri, thousand of Shivayats march towards the hilltop to worship Lord Shiva.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank a list of passages against a single query
ranks = model.rank(
    'how to put word count on word',
    [
        'To insert a word count into a Word 2013 document, place the cursor where you would like the word count to appear (say in the Header or Footer) and then: 1 click the Insert tab. 2 click the Quick Parts icon (towards the right hand end of the toolbar). 3 on the drop down that appears, select Field...',
        'Discipleship, on the other hand, meant helping someone who was already a believer walk out the life of faith. The word “discipleship” brought to my mind a small group Bible study, a conversation across the table with another woman, or an accountability group. And I knew which one I preferred. As a result, the discipleship I offered others contained a lot of good information but lacked the transforming power that can only come from the gospel. (I was also, simply, a coward.). I am beginning to see that evangelism and discipleship are not all that different.',
        'The trophy stands 36.5 centimetres (14.4 inches) tall and is made of 5 kg (11 lb) of 18 carat (75%) gold with a base (13 centimetres [5.1 inches] in diameter) containing t … wo layers of malachite. Making the world better, one answer at a time. Trophies can be made out of anything you want. however, aluminum is a very reliable and trustworthy metal and it.......... oh crap.......... i have to do a poo...',
        'The term cult has been misused. The word cult comes from the French cult which is from the Latin word cultus (care/adoration) and Latin Colere (to cultivate.) So, we can plant seeds of good or bad. You can have political cults such as sit ins during the Vietnam War. A good cult could be a religious one, yet some Christians will consider Jehovah Witness a cult and have labeled them as preying on the weak. When someone labels such a thing it is usually because of the lack of understanding. Good cults are usually a small group of people that can have a cult in most anything.',
        'Silchar (/ˈsɪlˌʧə/ or /ˈʃɪlˌʧə/) (Bengali: শিলচর Shilchor) shilchôr is the headquarters Of cachar district in the state Of assam In. India it is 343 (kilometres 213) mi south east Of. Guwahati it is the-second largest city of the state in terms of population and municipal. area 1 The Bhubaneshwar temple is about 50 km from Silchar and is on the top the Bhuvan hill. 2 This is a place of pilgrimage and during the festival of Shivaratri, thousand of Shivayats march towards the hilltop to worship Lord Shiva.',
    ],
)
```
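The ranks returned by `model.rank` are simply the candidate documents sorted by their `model.predict` scores, highest first. A minimal pure-Python sketch of that post-processing, using hypothetical scores rather than real model output:

```python
# Hypothetical cross-encoder scores for three candidate documents
# (illustrative values only; real scores come from model.predict).
scores = [8.2, -4.1, -3.7]

# Rank-style output: one dict per document, best score first.
ranked = sorted(
    ({"corpus_id": i, "score": s} for i, s in enumerate(scores)),
    key=lambda d: d["score"],
    reverse=True,
)
print([d["corpus_id"] for d in ranked])  # [0, 2, 1]
```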
## Evaluation

### Metrics

#### Cross Encoder Reranking
| Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
|:-------|:-----------------|:------------------|:------------|
| map | 0.4799 (-0.0097) | 0.3320 (+0.0710) | 0.5099 (+0.0903) |
| mrr@10 | 0.4677 (-0.0098) | 0.5740 (+0.0742) | 0.5267 (+0.1000) |
| ndcg@10 | 0.5377 (-0.0028) | 0.3650 (+0.0399) | 0.5600 (+0.0594) |
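As a reference for reading the table, mrr@10 rewards placing the first relevant document early in the top 10. A minimal sketch of the per-query computation (the reported numbers average this over all queries in a dataset; the document IDs here are hypothetical):

```python
def reciprocal_rank_at_k(ranked_ids, relevant_ids, k=10):
    """1/rank of the first relevant document within the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# First relevant document at position 2 -> reciprocal rank 0.5
print(reciprocal_rank_at_k(["d7", "d3", "d9"], {"d3"}))
```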
#### Cross Encoder Nano BEIR

- Dataset: NanoBEIR_R100_mean
- Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:

```json
{
    "dataset_names": [
        "msmarco",
        "nfcorpus",
        "nq"
    ],
    "rerank_k": 100,
    "at_k": 10,
    "always_rerank_positives": true
}
```
| Metric | Value |
|:-------|:------|
| map | 0.4406 (+0.0505) |
| mrr@10 | 0.5228 (+0.0548) |
| ndcg@10 | 0.4875 (+0.0322) |
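The NanoBEIR_R100_mean values above appear to be the unweighted average of the three per-dataset scores, e.g. for ndcg@10:

```python
# Per-dataset ndcg@10 from the reranking table above
per_dataset_ndcg = {"msmarco": 0.5377, "nfcorpus": 0.3650, "nq": 0.5600}
mean_ndcg = sum(per_dataset_ndcg.values()) / len(per_dataset_ndcg)
print(round(mean_ndcg, 4))  # 0.4876, matching the reported 0.4875 up to rounding
```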
## Training Details

### Training Dataset

ms_marco

### Evaluation Dataset

ms_marco

### Training Hyperparameters

#### Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
seed: 12
bf16: True
remove_unused_columns: False
load_best_model_at_end: True
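The "-bce" suffix in the model name suggests the single output logit was trained with binary cross-entropy against 0/1 relevance labels; the card does not state the loss explicitly, so treat this as an assumption. A sketch of that objective for one query-passage pair:

```python
import math

def bce_with_logits(logit, label):
    """Binary cross-entropy on a raw logit (assumed training objective)."""
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid turns the logit into a probability
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# An uninformative logit of 0.0 costs ln(2) ~= 0.6931 for either label
print(round(bce_with_logits(0.0, 1.0), 4))
```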
#### All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 128
per_device_eval_batch_size: 128
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: False
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
tp_size: 0
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}
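With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps from 0 to 2e-05 over the first 10% of steps and then decays linearly to 0. A sketch under that assumption (the training log runs to step 5100, used here as the total step count):

```python
def lr_at(step, total_steps=5100, peak_lr=2e-05, warmup_ratio=0.1):
    """Linear warmup followed by linear decay, per the configured schedule."""
    warmup_steps = int(total_steps * warmup_ratio)  # 510 steps here
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (total_steps - step) / (total_steps - warmup_steps)

print(lr_at(255))   # halfway through warmup: 1e-05
print(lr_at(5100))  # end of training: 0.0
```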
### Training Logs

| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------|:-----|:--------------|:----------------|:-------------------------|:--------------------------|:--------------------|:---------------------------|
| -1 | -1 | - | - | 0.0509 (-0.4895) | 0.2451 (-0.0799) | 0.0160 (-0.4847) | 0.1040 (-0.3514) |
| 0.0002 | 1 | 0.8029 | - | - | - | - | - |
| 0.0196 | 100 | 0.5268 | 0.3891 | 0.0350 (-0.5054) | 0.2707 (-0.0544) | 0.0181 (-0.4825) | 0.1079 (-0.3474) |
| 0.0391 | 200 | 0.3958 | 0.3871 | 0.0448 (-0.4956) | 0.2859 (-0.0392) | 0.0592 (-0.4415) | 0.1299 (-0.3254) |
| 0.0587 | 300 | 0.3981 | 0.3834 | 0.0761 (-0.4643) | 0.2920 (-0.0330) | 0.0998 (-0.4009) | 0.1560 (-0.2994) |
| 0.0782 | 400 | 0.3893 | 0.3872 | 0.1118 (-0.4286) | 0.2566 (-0.0684) | 0.1088 (-0.3918) | 0.1591 (-0.2963) |
| 0.0978 | 500 | 0.3926 | 0.3748 | 0.2759 (-0.2645) | 0.2855 (-0.0395) | 0.2312 (-0.2695) | 0.2642 (-0.1912) |
| 0.1173 | 600 | 0.3785 | 0.3703 | 0.4123 (-0.1281) | 0.3161 (-0.0090) | 0.3703 (-0.1304) | 0.3662 (-0.0892) |
| 0.1369 | 700 | 0.3705 | 0.3575 | 0.4492 (-0.0912) | 0.3456 (+0.0205) | 0.4972 (-0.0035) | 0.4307 (-0.0247) |
| 0.1565 | 800 | 0.3607 | 0.3624 | 0.4847 (-0.0558) | 0.3295 (+0.0045) | 0.5240 (+0.0233) | 0.4460 (-0.0093) |
| 0.1760 | 900 | 0.362 | 0.3566 | 0.5084 (-0.0320) | 0.3516 (+0.0266) | 0.4776 (-0.0230) | 0.4459 (-0.0095) |
| **0.1956** | **1000** | **0.3655** | **0.3644** | **0.5377 (-0.0028)** | **0.3650 (+0.0399)** | **0.5600 (+0.0594)** | **0.4875 (+0.0322)** |
| 0.2151 | 1100 | 0.3564 | 0.3589 | 0.4870 (-0.0535) | 0.3674 (+0.0424) | 0.5042 (+0.0035) | 0.4529 (-0.0025) |
| 0.2347 | 1200 | 0.3606 | 0.3544 | 0.5591 (+0.0186) | 0.3614 (+0.0364) | 0.4803 (-0.0203) | 0.4669 (+0.0116) |
| 0.2543 | 1300 | 0.3639 | 0.3584 | 0.4513 (-0.0891) | 0.3578 (+0.0327) | 0.4583 (-0.0424) | 0.4224 (-0.0329) |
| 0.2738 | 1400 | 0.3628 | 0.3519 | 0.5510 (+0.0106) | 0.3643 (+0.0392) | 0.5178 (+0.0172) | 0.4777 (+0.0223) |
| 0.2934 | 1500 | 0.3586 | 0.3475 | 0.5499 (+0.0095) | 0.3536 (+0.0285) | 0.4808 (-0.0199) | 0.4614 (+0.0060) |
| 0.3129 | 1600 | 0.3549 | 0.3536 | 0.5499 (+0.0094) | 0.3869 (+0.0619) | 0.4560 (-0.0447) | 0.4643 (+0.0089) |
| 0.3325 | 1700 | 0.3529 | 0.3462 | 0.5336 (-0.0068) | 0.3740 (+0.0490) | 0.5136 (+0.0129) | 0.4737 (+0.0184) |
| 0.3520 | 1800 | 0.3498 | 0.3463 | 0.5225 (-0.0179) | 0.3607 (+0.0356) | 0.4553 (-0.0453) | 0.4462 (-0.0092) |
| 0.3716 | 1900 | 0.3492 | 0.3475 | 0.5295 (-0.0109) | 0.3665 (+0.0415) | 0.5074 (+0.0067) | 0.4678 (+0.0124) |
| 0.3912 | 2000 | 0.3475 | 0.3472 | 0.5382 (-0.0022) | 0.3508 (+0.0257) | 0.5278 (+0.0272) | 0.4723 (+0.0169) |
| 0.4107 | 2100 | 0.3557 | 0.3439 | 0.5424 (+0.0020) | 0.3714 (+0.0464) | 0.5170 (+0.0164) | 0.4769 (+0.0216) |
| 0.4303 | 2200 | 0.3523 | 0.3477 | 0.5458 (+0.0054) | 0.3640 (+0.0390) | 0.5412 (+0.0406) | 0.4837 (+0.0283) |
| 0.4498 | 2300 | 0.3363 | 0.3403 | 0.5507 (+0.0103) | 0.3371 (+0.0121) | 0.5121 (+0.0114) | 0.4666 (+0.0113) |
| 0.4694 | 2400 | 0.3604 | 0.3495 | 0.5734 (+0.0329) | 0.3336 (+0.0085) | 0.4734 (-0.0273) | 0.4601 (+0.0047) |
| 0.4889 | 2500 | 0.3472 | 0.3422 | 0.5580 (+0.0175) | 0.3430 (+0.0180) | 0.5442 (+0.0435) | 0.4817 (+0.0263) |
| 0.5085 | 2600 | 0.3495 | 0.3442 | 0.5714 (+0.0310) | 0.3248 (-0.0003) | 0.5574 (+0.0567) | 0.4845 (+0.0292) |
| 0.5281 | 2700 | 0.3311 | 0.3430 | 0.5098 (-0.0306) | 0.3184 (-0.0066) | 0.5240 (+0.0233) | 0.4507 (-0.0046) |
| 0.5476 | 2800 | 0.3433 | 0.3482 | 0.5154 (-0.0251) | 0.3338 (+0.0088) | 0.5135 (+0.0128) | 0.4542 (-0.0011) |
| 0.5672 | 2900 | 0.3457 | 0.3425 | 0.5300 (-0.0105) | 0.3211 (-0.0039) | 0.5569 (+0.0562) | 0.4693 (+0.0139) |
| 0.5867 | 3000 | 0.3378 | 0.3458 | 0.5244 (-0.0160) | 0.2984 (-0.0266) | 0.4908 (-0.0098) | 0.4379 (-0.0175) |
| 0.6063 | 3100 | 0.3462 | 0.3391 | 0.5261 (-0.0144) | 0.3283 (+0.0033) | 0.5074 (+0.0067) | 0.4539 (-0.0015) |
| 0.6259 | 3200 | 0.3495 | 0.3418 | 0.5671 (+0.0267) | 0.3130 (-0.0120) | 0.5373 (+0.0367) | 0.4725 (+0.0171) |
| 0.6454 | 3300 | 0.3464 | 0.3408 | 0.5366 (-0.0038) | 0.3190 (-0.0061) | 0.5256 (+0.0249) | 0.4604 (+0.0050) |
| 0.6650 | 3400 | 0.3381 | 0.3390 | 0.5451 (+0.0046) | 0.3332 (+0.0082) | 0.5269 (+0.0263) | 0.4684 (+0.0130) |
| 0.6845 | 3500 | 0.347 | 0.3365 | 0.5331 (-0.0073) | 0.3128 (-0.0122) | 0.5392 (+0.0385) | 0.4617 (+0.0063) |
| 0.7041 | 3600 | 0.3456 | 0.3398 | 0.5196 (-0.0208) | 0.3130 (-0.0120) | 0.5223 (+0.0216) | 0.4517 (-0.0037) |
| 0.7236 | 3700 | 0.3367 | 0.3405 | 0.5416 (+0.0012) | 0.3041 (-0.0209) | 0.5176 (+0.0170) | 0.4544 (-0.0009) |
| 0.7432 | 3800 | 0.3362 | 0.3401 | 0.5406 (+0.0002) | 0.3046 (-0.0204) | 0.5160 (+0.0153) | 0.4538 (-0.0016) |
| 0.7628 | 3900 | 0.3483 | 0.3396 | 0.5255 (-0.0149) | 0.2882 (-0.0368) | 0.5428 (+0.0421) | 0.4522 (-0.0032) |
| 0.7823 | 4000 | 0.3471 | 0.3403 | 0.5453 (+0.0049) | 0.3020 (-0.0230) | 0.5305 (+0.0299) | 0.4593 (+0.0039) |
| 0.8019 | 4100 | 0.3395 | 0.3396 | 0.5573 (+0.0169) | 0.3112 (-0.0138) | 0.5358 (+0.0352) | 0.4681 (+0.0127) |
| 0.8214 | 4200 | 0.3455 | 0.3392 | 0.5415 (+0.0011) | 0.3049 (-0.0201) | 0.5366 (+0.0359) | 0.4610 (+0.0056) |
| 0.8410 | 4300 | 0.3374 | 0.3386 | 0.5216 (-0.0188) | 0.3003 (-0.0247) | 0.5483 (+0.0477) | 0.4568 (+0.0014) |
| 0.8606 | 4400 | 0.3269 | 0.3372 | 0.5372 (-0.0032) | 0.3147 (-0.0103) | 0.5703 (+0.0697) | 0.4741 (+0.0187) |
| 0.8801 | 4500 | 0.3492 | 0.3378 | 0.5367 (-0.0038) | 0.3119 (-0.0131) | 0.5747 (+0.0741) | 0.4744 (+0.0191) |
| 0.8997 | 4600 | 0.3392 | 0.3372 | 0.5421 (+0.0016) | 0.3104 (-0.0147) | 0.5530 (+0.0524) | 0.4685 (+0.0131) |
| 0.9192 | 4700 | 0.3414 | 0.3370 | 0.5306 (-0.0098) | 0.3082 (-0.0169) | 0.5655 (+0.0649) | 0.4681 (+0.0127) |
| 0.9388 | 4800 | 0.3352 | 0.3361 | 0.5360 (-0.0044) | 0.3057 (-0.0193) | 0.5535 (+0.0528) | 0.4651 (+0.0097) |
| 0.9583 | 4900 | 0.3344 | 0.3364 | 0.5437 (+0.0032) | 0.3036 (-0.0215) | 0.5706 (+0.0699) | 0.4726 (+0.0172) |
| 0.9779 | 5000 | 0.3411 | 0.3361 | 0.5452 (+0.0047) | 0.3015 (-0.0235) | 0.5627 (+0.0620) | 0.4698 (+0.0144) |
| 0.9975 | 5100 | 0.3408 | 0.3362 | 0.5452 (+0.0047) | 0.3050 (-0.0200) | 0.5655 (+0.0649) | 0.4719 (+0.0165) |
| -1 | -1 | - | - | 0.5377 (-0.0028) | 0.3650 (+0.0399) | 0.5600 (+0.0594) | 0.4875 (+0.0322) |
- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.11.13
- Sentence Transformers: 5.0.0
- Transformers: 4.51.0
- PyTorch: 2.9.1+cu126
- Accelerate: 1.8.1
- Datasets: 3.6.0
- Tokenizers: 0.21.4-dev.0
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```