CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: microsoft/MiniLM-L12-H384-uncased
- Maximum Sequence Length: 512 tokens
- Number of Output Labels: 1 label
- Training Dataset: ms_marco
- Language: en
Model Sources
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder

# Download the model from the Hugging Face Hub
model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-normalize-sum")
# Score (query, passage) pairs; a higher score means a more relevant passage
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank several passages against a single query
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
print(ranks)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
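According to the Sentence Transformers documentation, rank returns a list of dictionaries, each with a corpus_id (the index of the passage in the input list) and a score, sorted from most to least relevant. The rough pure-Python sketch below shows that relationship using hypothetical scores in place of real predict output; rank_from_scores is a hypothetical helper for illustration, not part of the library.

```python
def rank_from_scores(scores):
    """Hypothetical helper: turn per-passage scores into a ranking like CrossEncoder.rank's output."""
    # Passage indices ordered by descending score
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]

example_scores = [7.2, 1.5, 3.4]  # made-up scores, not real model output
ranking = rank_from_scores(example_scores)
print(ranking)
# [{'corpus_id': 0, 'score': 7.2}, {'corpus_id': 2, 'score': 3.4}, {'corpus_id': 1, 'score': 1.5}]
```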
Evaluation
Metrics
Cross Encoder Reranking
| Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
|:--|:--|:--|:--|
| map | 0.4715 (-0.0181) | 0.3131 (+0.0521) | 0.5155 (+0.0959) |
| mrr@10 | 0.4636 (-0.0139) | 0.5461 (+0.0462) | 0.5212 (+0.0945) |
| ndcg@10 | 0.5387 (-0.0018) | 0.3326 (+0.0075) | 0.5741 (+0.0734) |

Values in parentheses are the change relative to the ordering of the candidate documents before reranking.
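map, mrr@10 and ndcg@10 are standard ranking metrics, reported as averages over the queries of each dataset. A minimal sketch for a single query with binary relevance follows; it uses the textbook definitions, so the library's evaluator may differ in details (for example, in how it treats relevant passages that never appear in the candidate list).

```python
import math

def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(relevance, k=10):
    """Binary-relevance NDCG@k: DCG of this ordering divided by DCG of the ideal ordering."""
    def dcg(rels):
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0

def average_precision(relevance):
    """Mean of precision@i over every position i that holds a relevant passage."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical reranked list for one query: relevant passages at positions 2 and 4
ranked_labels = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_labels))             # 0.5
print(round(ndcg_at_k(ranked_labels), 4))  # 0.6509
print(average_precision(ranked_labels))    # 0.5
```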
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_R100_mean
- Evaluated with CrossEncoderNanoBEIREvaluator with these parameters:
{
  "dataset_names": [
    "msmarco",
    "nfcorpus",
    "nq"
  ],
  "rerank_k": 100,
  "at_k": 10,
  "always_rerank_positives": true
}
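The "_R100" suffix and rerank_k: 100 indicate that, for each query, the top 100 candidates from a first-stage retriever are rescored by this model, and metrics are then computed at at_k = 10. The sketch below illustrates that flow on toy data. Reading always_rerank_positives: true as "make sure the relevant passages are in the candidate list even if the first stage missed them" is an assumption here, and build_candidates and rerank are hypothetical helpers, not library functions.

```python
def build_candidates(retrieved_ids, positive_ids, rerank_k=100, always_rerank_positives=True):
    """Hypothetical helper: the candidate list a reranker scores for one query."""
    candidates = list(retrieved_ids[:rerank_k])
    if always_rerank_positives:
        # Assumed semantics: relevant passages the first stage missed are appended
        candidates += [doc_id for doc_id in positive_ids if doc_id not in candidates]
    return candidates

def rerank(query, candidates, score_fn):
    """Order the candidates by descending score_fn(query, doc_id)."""
    return sorted(candidates, key=lambda doc_id: score_fn(query, doc_id), reverse=True)

# Toy example with rerank_k=3 instead of 100; all ids and scores are made up
retrieved = ["d3", "d1", "d7", "d2"]
positives = ["d2"]
toy_scores = {"d1": 0.2, "d2": 2.5, "d3": 0.9, "d7": -1.0}

candidates = build_candidates(retrieved, positives, rerank_k=3)
reranked = rerank("q", candidates, lambda q, d: toy_scores[d])
without = build_candidates(retrieved, positives, rerank_k=3, always_rerank_positives=False)
print(candidates)  # ['d3', 'd1', 'd7', 'd2']
print(reranked)    # ['d2', 'd3', 'd1', 'd7']
print(without)     # ['d3', 'd1', 'd7']
```

If that reading is right, the reported scores assume the relevant passages are always reachable by the reranker, which a real two-stage pipeline does not guarantee.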
| Metric | Value |
|:--|:--|
| map | 0.4334 (+0.0433) |
| mrr@10 | 0.5103 (+0.0423) |
| ndcg@10 | 0.4818 (+0.0264) |
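The NanoBEIR_R100_mean values are consistent with an unweighted mean of the three per-dataset scores in the reranking table, for both the metric and its change in parentheses. A quick check using only numbers copied from that table:

```python
# Per-dataset scores: (NanoMSMARCO_R100, NanoNFCorpus_R100, NanoNQ_R100), copied from the reranking table
per_dataset = {
    "map": (0.4715, 0.3131, 0.5155),
    "mrr@10": (0.4636, 0.5461, 0.5212),
    "ndcg@10": (0.5387, 0.3326, 0.5741),
}
# The changes shown in parentheses in the same table
deltas = {
    "map": (-0.0181, 0.0521, 0.0959),
    "mrr@10": (-0.0139, 0.0462, 0.0945),
    "ndcg@10": (-0.0018, 0.0075, 0.0734),
}

summary = {}
for metric, values in per_dataset.items():
    mean = sum(values) / len(values)
    mean_delta = sum(deltas[metric]) / len(deltas[metric])
    summary[metric] = (round(mean, 4), round(mean_delta, 4))
    print(f"{metric}: {mean:.4f} ({mean_delta:+.4f})")
# map: 0.4334 (+0.0433)
# mrr@10: 0.5103 (+0.0423)
# ndcg@10: 0.4818 (+0.0264)
```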
Training Details
Training Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 78,704 training samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
| | query | docs | labels |
|:--|:--|:--|:--|
| type | string | list | list |
| details | min: 10 characters, mean: 34.51 characters, max: 113 characters | min: 3 elements, mean: 7.11 elements, max: 12 elements | min: 3 elements, mean: 7.11 elements, max: 12 elements |
- Samples:
| query | docs | labels |
|:--|:--|:--|
| what makes insulin | ["Insulin is a hormone. It makes our body's cells absorb glucose from the blood. The glucose is stored in the liver and muscle as glycogen and stops the body from using fat as a source of energy. When there is very little insulin in the blood, or none at all, glucose is not taken up by most body cells. Insulin is also released when glucose is present in the blood. After eating carbohydrates, blood glucose levels rise. Insulin makes it possible for glucose to enter our body's cells-without glucose in our cells they would not be able to function. Without insulin the glucose cannot enter our", "Type 1 Diabetes Type 1 diabetes is a serious condition that occurs when the pancreas makes little or no insulin. Without insulin, the body is unable to take the glucose (blood sugar) it gets from food into cells to fuel the body. People with type 1 diabetes must take daily insulin or other medications daily. With the help of insulin, the body's cells take up the glucose and use it for energy. When ... | [1, 0, 0, 0, 0, ...] |
| what is the temperature in puerto plata dominican republic | ['Puerto Plata: Annual Weather Averages. July is the hottest month in Puerto Plata with an average temperature of 27°C (81°F) and the coldest is January at 23°C (73°F) with the most daily sunshine hours at 8 in August. The wettest month is December with an average of 246mm of rain. ', 'The average daily temperature in Puerto Plata can reach highs of around 28 C, which can drop to 17 C. The temperature of the sea stays on average at around 26 C (79 F). There is an average rainfall of 148 mm over 11 days in Puerto Plata throughout this month. Puerto Plata sees 6 hours of sunshine a day during this month.', 'This report describes the typical weather at the Gregorio Luperon Luperón International (Airport Puerto, Plata Dominican) republic weather station over the course of an Average. December it is based on the historical records from 1997 to. 2012 earlier records are either unavailable or. unreliable Wind. The wind is most often out of the east (27% of the time). The wind is least often o... | [1, 0, 0, 0, 0, ...] |
| what nutrients are in guacamole | ['Guacamole contains mashed avocados and seasonings, such as lime or lemon juice, garlic and cilantro. You can eat this Mexican food with tortilla chips or as a topping. Avocados contribute vitamins, minerals and healthy fats to the dish. In moderation, guacamole is a healthy addition to a balanced diet. Sodium and Potassium. Each serving of guacamole contains 10 milligrams of sodium and 452 milligrams of potassium. A high-sodium diet can lead to high blood pressure and cause congestive heart failure, kidney failure and stroke, according to MayoClinic.com.', "Calories and Macronutrients. A guacamole recipe made with four large avocados makes eight servings, each containing 150 calories and 2 grams of protein. Because guacamole is a calorie-dense food, watch your portion size if you're counting calories. A serving of guacamole contains 9 grams of total carbohydrates. Sodium and Potassium. Each serving of guacamole contains 10 milligrams of sodium and 452 milligrams of potassium. A high-... | [1, 1, 0, 0, 0, ...] |
- Loss: PListMLELoss with these parameters:
{
  "lambda_weight": "sentence_transformers.cross_encoder.losses.PListMLELoss.PListMLELambdaWeight",
  "activation_fct": "torch.nn.modules.linear.Identity",
  "mini_batch_size": null,
  "respect_input_order": true
}
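PListMLELoss follows the Position-Aware ListMLE idea of Lan et al. (2014), cited below: the loss is the negative log-likelihood of the target ordering under a Plackett-Luce model, with each position's term multiplied by a weight so that mistakes near the top cost more. The plain-Python sketch below handles one query. It assumes raw logits (matching activation_fct: Identity) and takes the given document order as the target ranking, which is how respect_input_order: true is read here (in the samples above, relevant passages come first). The 2^(n - i) - 1 weighting is an assumption modelled on common position-aware choices; the card does not state the formula PListMLELambdaWeight uses, and the library's batching, masking and normalization are not reproduced.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def plistmle_loss(scores, weights=None):
    """Position-weighted ListMLE for one query.

    scores: logits for the documents, listed in the target order (most relevant first).
    weights: one weight per position; None gives plain, unweighted ListMLE.
    """
    n = len(scores)
    if weights is None:
        weights = [1.0] * n
    loss = 0.0
    for i in range(n):
        # -log P(document i is picked next among documents i..n-1) under Plackett-Luce
        loss += weights[i] * (logsumexp(scores[i:]) - scores[i])
    return loss

def exponential_position_weights(n):
    """Assumed weighting: 2^(n - i) - 1 for 1-based position i, so early positions dominate."""
    return [2.0 ** (n - i) - 1.0 for i in range(1, n + 1)]

good = [5.0, 0.0, -5.0]   # scores agree with the target order
bad = [-5.0, 0.0, 5.0]    # scores reverse the target order
w = exponential_position_weights(3)
print(w)                                              # [3.0, 1.0, 0.0]
print(plistmle_loss(good, w) < plistmle_loss(bad, w))  # True
```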
Evaluation Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 1,000 evaluation samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
| | query | docs | labels |
|:--|:--|:--|:--|
| type | string | list | list |
| details | min: 11 characters, mean: 33.06 characters, max: 99 characters | min: 3 elements, mean: 6.50 elements, max: 10 elements | min: 3 elements, mean: 6.50 elements, max: 10 elements |
- Samples:
| query | docs | labels |
|:--|:--|:--|
| what currency is accepted in london | ["The UK unit of currency is the pound sterling (£). In London we often call one pound (£1) a quid and sometimes a nicker. A lot of European countries have changed their currency to the Euro, but the UK has not yet joined. There is a lot of speculation about if and when we ever will join, but that's another story.", "The UK's currency is the pound sterling (£ / GBP). Despite being a member of the European Union, the UK has not adopted the euro. There are 100 pence (p) to the pound (£). Notes come in denominations of £5, £10, £20 and £50. Coins come in 1p, 2p, 5p, 10p, 20p, 50p, £1 and £2. Money Talks: Speak Like a Londoner. You will usually hear British people say pee rather than pence, as in 50p (50 pee). More colloquially, a pound is known as a quid, a five pound note is a fiver and a ten pound note a tenner.", 'Either change your dollars at the airport when you arrive or use your ATM. There are countless places (banks, bureaux de change) in London to change money, and some major sho... | [1, 0, 0, 0, 0, ...] |
| what are earwigs | ['Earwigs are nocturnal insects commonly found in high moisture areas near human dwellings. They are omnivorous feeding on both plants and other insects living and dead. Contrary to the old wives tales, earwigs do not intentionally crawl into your ears and they do not eat your brains, their omnivory notwithstanding. ', 'With about 2,000 species in 12 families, they are one of the smaller insect orders. Earwigs have characteristic cerci, a pair of forceps pincers on their abdomen, and membranous wings folded underneath short forewings, hence the scientific order name, skin wings.. Earwigs rarely use their flying ability. Earwigs are mostly nocturnal and often hide in small, moist crevices during the day, and are active at night, feeding on a wide variety of insects and plants. Damage to foliage, flowers, and various crops is commonly blamed on earwigs, especially the common earwig Forficula auricularia', 'Earwigs, or pincher bugs, like to eat decomposing plants and wet leaves. They inva... | [1, 0, 0, 0, 0, ...] |
| what is the water temperature in daydream island | ['Daydream Island weather consists of warm winters, sunny springs and autumns and hot humid summers. Water temperatures are a beautiful 25 degrees Celsius all year round. Maximum temperatures in Daydream Island rarely move out of the 31C to 24C range all year round. Daydream Island experiences higher rainfall averages in December through to February. This is also when the highest average temperatures are recorded. The tropical showers are typically heavy but brief, and there are usually plenty of sunshine periods during these months.', 'To get to Daydream Island Resort and Spa you travel by luxury launch transfer (ferry) or helicopter. If you are transferring by launch this is done from Port of Airlie (mainland Australia) or the Great Barrier Reef Airport (Hamilton Island).', 'What To Bring To Daydream Island! Daydream is the dream tropical island complete with the perfect tropical climate that you dream about. The island has a beautiful warm sub-tropical climate that is perfect for th... | [1, 0, 0, 0, 0, ...] |
- Loss: PListMLELoss with these parameters:
{
  "lambda_weight": "sentence_transformers.cross_encoder.losses.PListMLELoss.PListMLELambdaWeight",
  "activation_fct": "torch.nn.modules.linear.Identity",
  "mini_batch_size": null,
  "respect_input_order": true
}
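Both splits use a listwise layout: one query, a list of candidate passages, and a parallel list of relevance labels (1 for a relevant passage, 0 otherwise, in the samples shown). Because each label annotates the passage at the same position, the docs and labels statistics are identical. A small sketch of how min/mean/max summaries like the ones in the statistics tables can be computed, on hypothetical toy rows rather than real ms_marco data:

```python
from statistics import mean

# Hypothetical toy rows in the listwise (query, docs, labels) format; not taken from ms_marco
rows = [
    {"query": "what is a cross encoder", "docs": ["passage a", "passage b", "passage c"], "labels": [1, 0, 0]},
    {"query": "define reranking", "docs": ["passage d", "passage e", "passage f", "passage g"], "labels": [1, 1, 0, 0]},
]

def describe(values, unit):
    """min / mean / max summary in the style of the statistics tables."""
    return f"min: {min(values)} {unit}, mean: {mean(values):.2f} {unit}, max: {max(values)} {unit}"

# Each label marks the passage at the same position, so the two lists must have equal length
assert all(len(row["docs"]) == len(row["labels"]) for row in rows)

query_stats = describe([len(row["query"]) for row in rows], "characters")
docs_stats = describe([len(row["docs"]) for row in rows], "elements")
print(query_stats)  # min: 16 characters, mean: 19.50 characters, max: 23 characters
print(docs_stats)   # min: 3 elements, mean: 3.50 elements, max: 4 elements
```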
Training Hyperparameters
Non-Default Hyperparameters
eval_strategy: steps
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
seed: 12
bf16: True
load_best_model_at_end: True
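The schedule implied by these settings can be checked with a little arithmetic. Assuming a single device (the card does not say how many were used), 78,704 samples at a batch size of 16 give 4,919 optimizer steps, which matches the Epoch column in the training log (for example, 250 / 4919 ≈ 0.0508). The warmup length and linear decay below follow the Hugging Face Transformers convention of ceil(warmup_ratio × total steps) warmup steps; treat it as a sketch of the schedule rather than the trainer's own code.

```python
import math

num_samples = 78_704
batch_size = 16    # per_device_train_batch_size, assuming a single device
grad_accum = 1     # gradient_accumulation_steps
epochs = 1
warmup_ratio = 0.1
base_lr = 2e-5

steps_per_epoch = math.ceil(num_samples / batch_size)  # last partial batch kept (dataloader_drop_last: False)
total_steps = steps_per_epoch // grad_accum * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

def linear_lr(step):
    """Linear warmup to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(total_steps, warmup_steps)    # 4919 492
print(round(250 / total_steps, 4))  # 0.0508, the Epoch value logged at step 250
```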
All Hyperparameters
overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 16
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
Training Logs
| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:--|:--|:--|:--|:--|:--|:--|:--|
| -1 | -1 | - | - | 0.1311 (-0.4093) | 0.2772 (-0.0479) | 0.0728 (-0.4278) | 0.1604 (-0.2950) |
| 0.0002 | 1 | 2.1843 | - | - | - | - | - |
| 0.0508 | 250 | 2.0934 | - | - | - | - | - |
| 0.1016 | 500 | 1.9667 | 1.9470 | 0.0856 (-0.4548) | 0.2062 (-0.1189) | 0.1352 (-0.3655) | 0.1423 (-0.3130) |
| 0.1525 | 750 | 1.927 | - | - | - | - | - |
| 0.2033 | 1000 | 1.8803 | 1.8826 | 0.4032 (-0.1372) | 0.2588 (-0.0662) | 0.4783 (-0.0223) | 0.3801 (-0.0753) |
| 0.2541 | 1250 | 1.8766 | - | - | - | - | - |
| 0.3049 | 1500 | 1.8778 | 1.8625 | 0.4667 (-0.0738) | 0.2987 (-0.0263) | 0.5095 (+0.0089) | 0.4250 (-0.0304) |
| 0.3558 | 1750 | 1.866 | - | - | - | - | - |
| 0.4066 | 2000 | 1.8586 | 1.8422 | 0.5211 (-0.0193) | 0.3072 (-0.0178) | 0.5527 (+0.0521) | 0.4604 (+0.0050) |
| 0.4574 | 2250 | 1.8588 | - | - | - | - | - |
| **0.5082** | **2500** | **1.845** | **1.8368** | **0.5387 (-0.0018)** | **0.3326 (+0.0075)** | **0.5741 (+0.0734)** | **0.4818 (+0.0264)** |
| 0.5591 | 2750 | 1.8499 | - | - | - | - | - |
| 0.6099 | 3000 | 1.8396 | 1.8326 | 0.5161 (-0.0243) | 0.3296 (+0.0046) | 0.5773 (+0.0766) | 0.4743 (+0.0190) |
| 0.6607 | 3250 | 1.8373 | - | - | - | - | - |
| 0.7115 | 3500 | 1.8372 | 1.8296 | 0.5154 (-0.0250) | 0.3109 (-0.0141) | 0.5724 (+0.0717) | 0.4662 (+0.0109) |
| 0.7624 | 3750 | 1.8405 | - | - | - | - | - |
| 0.8132 | 4000 | 1.8304 | 1.8294 | 0.5389 (-0.0015) | 0.3155 (-0.0095) | 0.5748 (+0.0741) | 0.4764 (+0.0210) |
| 0.8640 | 4250 | 1.8292 | - | - | - | - | - |
| 0.9148 | 4500 | 1.8268 | 1.8217 | 0.5298 (-0.0106) | 0.3097 (-0.0154) | 0.5653 (+0.0647) | 0.4683 (+0.0129) |
| 0.9656 | 4750 | 1.8273 | - | - | - | - | - |
| -1 | -1 | - | - | 0.5387 (-0.0018) | 0.3326 (+0.0075) | 0.5741 (+0.0734) | 0.4818 (+0.0264) |
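- The bold row (step 2500) denotes the saved checkpoint. The two rows with Epoch and Step -1 are evaluations run outside the training loop: the first before training started, the last after training with the saved checkpoint loaded (load_best_model_at_end: True), which is why it repeats the step 2500 scores.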
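The hyperparameter lists do not name the metric used to pick the best checkpoint, but the evaluation rows suggest it was NanoBEIR_R100_mean_ndcg@10 rather than validation loss: step 2500 has the highest mean NDCG@10, while the lowest validation loss occurs later, at step 4500. A quick check on values copied from the log:

```python
# (step, validation loss, NanoBEIR_R100_mean_ndcg@10), copied from the evaluation rows of the training log
eval_rows = [
    (500, 1.9470, 0.1423),
    (1000, 1.8826, 0.3801),
    (1500, 1.8625, 0.4250),
    (2000, 1.8422, 0.4604),
    (2500, 1.8368, 0.4818),
    (3000, 1.8326, 0.4743),
    (3500, 1.8296, 0.4662),
    (4000, 1.8294, 0.4764),
    (4500, 1.8217, 0.4683),
]

best_by_ndcg = max(eval_rows, key=lambda row: row[2])[0]  # step with the highest mean NDCG@10
best_by_loss = min(eval_rows, key=lambda row: row[1])[0]  # step with the lowest validation loss
print(best_by_ndcg, best_by_loss)  # 2500 4500
```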
Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.0
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
PListMLELoss
@inproceedings{lan2014position,
title={Position-Aware ListMLE: A Sequential Learning Process for Ranking.},
author={Lan, Yanyan and Zhu, Yadong and Guo, Jiafeng and Niu, Shuzi and Cheng, Xueqi},
booktitle={UAI},
volume={14},
pages={449--458},
year={2014}
}