# CrossEncoder based on microsoft/MiniLM-L12-H384-uncased
This is a Cross Encoder model finetuned from microsoft/MiniLM-L12-H384-uncased on the ms_marco dataset using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** microsoft/MiniLM-L12-H384-uncased
- **Maximum Sequence Length:** 512 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:** ms_marco
- **Language:** en
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import CrossEncoder

# Download from the Hugging Face Hub
model = CrossEncoder("yjoonjang/reranker-msmarco-v1.1-MiniLM-L12-H384-uncased-plistmle-sigmoid")

# Score pairs of (query, passage) texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)  # (3,)

# Or rank a list of passages for a single query
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ],
)
```
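Conceptually, the `rank` call above is equivalent to scoring each (query, passage) pair with `predict` and sorting by descending score. A minimal sketch of that equivalence with placeholder scores (the numbers below are illustrative, not actual model outputs):

```python
docs = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]
scores = [0.92, 0.41, 0.67]  # placeholder relevance scores, one per document

# Sort document indices by descending score, mirroring what model.rank(...) does
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranking:
    print(f"{scores[i]:.2f}  {docs[i]}")
```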
## Evaluation

### Metrics

#### Cross Encoder Reranking

| Metric | NanoMSMARCO_R100 | NanoNFCorpus_R100 | NanoNQ_R100 |
|:-------|:-----------------|:------------------|:------------|
| map | 0.4636 (-0.0260) | 0.3174 (+0.0564) | 0.5700 (+0.1504) |
| mrr@10 | 0.4500 (-0.0275) | 0.4912 (-0.0086) | 0.5739 (+0.1472) |
| ndcg@10 | 0.5191 (-0.0213) | 0.3169 (-0.0081) | 0.6383 (+0.1377) |
#### Cross Encoder Nano BEIR

- Dataset: `NanoBEIR_R100_mean`
- Evaluated with `CrossEncoderNanoBEIREvaluator` with these parameters:

  ```json
  {
      "dataset_names": ["msmarco", "nfcorpus", "nq"],
      "rerank_k": 100,
      "at_k": 10,
      "always_rerank_positives": true
  }
  ```

| Metric | Value |
|:-------|:------|
| map | 0.4503 (+0.0603) |
| mrr@10 | 0.5051 (+0.0371) |
| ndcg@10 | 0.4915 (+0.0361) |
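The NanoBEIR mean is the unweighted average of the three per-dataset scores, up to rounding of the reported values; for ndcg@10:

```python
# Per-dataset ndcg@10 scores from the reranking table above
ndcg_at_10 = {
    "NanoMSMARCO_R100": 0.5191,
    "NanoNFCorpus_R100": 0.3169,
    "NanoNQ_R100": 0.6383,
}
mean = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(round(mean, 4))  # 0.4914 -- the reported 0.4915 differs only by rounding of the inputs
```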
## Training Details

### Training Dataset

#### ms_marco

- Dataset: ms_marco at revision `a47ee7a`
- Size: 78,704 training samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:------|:-----|:-------|
| type | string | list | list |
| details | min: 11 characters, mean: 33.74 characters, max: 100 characters | min: 3 elements, mean: 6.50 elements, max: 10 elements | min: 3 elements, mean: 6.50 elements, max: 10 elements |

- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| cost of installing central air | ['Central Air Average Costs. The actual cost of central air installation depends on a number of factors, including the size of the home as well as the unit’s tonnage and SEER rating. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 1 In a 2,000 square foot home with existing ductwork, central air conditioning costs $3,000 to $5,000 installed. 2 If ductwork is additionally required, costs could reach $6,000 to $10,000 or more. 3 Mini-split central air conditioner prices average $1,500 to $3', 'For example, homes with forced hot air heating will have the duct work necessary for a fast and easy installation, when the project involves the running of ducts however the prices climb significantly. The average price to install a central air conditioner will range from $2650 to upwards of $15K. This installation cannot be considered a DIY project, and it is traditional for a homeowner to hire a contractor for the job. Central ai... | [1, 0, 0, 0, 0, ...] |
| how much does it cost to set up a cabinet shop | ['According to Kennedy, most cabinets range from $500 to $1,500 per cabinet box. Based on an estimated 30 cabinets in an average-size kitchen, you can be looking at a cost of about $15,000-$45,000, she says. Discover everything you need to know about cabinets with our free guide! 1. Measure the dimensions of your kitchen', "December 28, 2005 Question Those of you who consider your operation small, what type of machinery is the minimum for what you do? I'm starting a one man shop, 2,400 square feet, and know what I would like to have to start, but am curious how the rest of you get by. A simple streamlined operation that worked for professional builders, and sell some to DIYers for a retail price. I am a one man shop that builds cabinets, furniture and exterior/interior doors. My shop is 1600 sq ft with 300 sq ft of it being a small spray room.", 'Seven years later and I moved out of the garage to a more legitimate setting in an industrial park. Today, 25 years after starting out, my co... | [1, 0, 0, 0, 0, ...] |
| how close can a gas meter be to a condensing unit | ['Is it dangerous if it is close to the gas meter/pipe? Thanks! It should be 3 feet from the gas meter vent, and not the actual gas meter itself. The gas company can come out later to extend this vent further away from the meter if it is within 3 feet. But the chance that anything actually happening because of the ac too close to the vent is insanely remote. I would be more worried about getting hit by lightning than any problems with the gas.', 'Condensing Unit Too Close to House – Bad air conditioner installation jobs such as this one proves that it is in the best interest of the homeowner to hire competent HVAC air conditioner and heating installers so that the job is done correctly.', "Re: Condensing furnace Exhaust, Distances from window, electric and gas meters. Joel, 3 ft from operable window is what I have on the Electrical Service. Gas meter looks OK. Install instructions in your post says if below 100,000 btu clearance is 12', and 36' if over 100,000 btu.", 'Condensing Unit Too Close to House. This condensing unit was too close to the house to effectively reject heat. It was a bad HVAC condensing unit installation job by the HVAC installers. A mechanical inspector rejected the final for the permit until the condensing unit was correctly installed. It is recommended that condensing units have at least 2 feet of space so that it can'] | [1, 0, 0, 0] |
- Loss: `ListMLELoss` with these parameters:

  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
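ListMLELoss maximizes the likelihood, under a Plackett–Luce model, of the permutation that sorts documents by their relevance labels; the `activation_fct` above is applied to the raw logits first. A minimal NumPy sketch of the core computation (an illustration, not the library implementation: it omits the position-aware `lambda_weight` and mini-batching, and the function names are made up):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def list_mle_loss(logits, labels):
    """Negative log-likelihood of the label-sorted permutation under a
    Plackett-Luce model (plain ListMLE, no position-aware weighting)."""
    scores = sigmoid(np.asarray(logits, dtype=float))  # activation_fct: Sigmoid
    # The target permutation: documents ordered by descending relevance label.
    order = np.argsort(-np.asarray(labels), kind="stable")
    s = scores[order]
    loss = 0.0
    for i in range(len(s)):
        # -log P(choosing doc i next from the remaining docs) = logsumexp(s[i:]) - s[i]
        loss -= s[i] - np.log(np.exp(s[i:]).sum())
    return loss

# Lower loss when the model scores the relevant document highest:
print(float(list_mle_loss([2.0, -1.0, 0.5], [1, 0, 0])))
```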
### Evaluation Dataset

#### ms_marco

- Dataset: ms_marco at revision `a47ee7a`
- Size: 1,000 evaluation samples
- Columns: `query`, `docs`, and `labels`
- Approximate statistics based on the first 1000 samples:

| | query | docs | labels |
|:--------|:------|:-----|:-------|
| type | string | list | list |
| details | min: 11 characters, mean: 34.38 characters, max: 99 characters | min: 2 elements, mean: 6.00 elements, max: 10 elements | min: 2 elements, mean: 6.00 elements, max: 10 elements |

- Samples:

| query | docs | labels |
|:------|:-----|:-------|
| how long does an iva stay on your credit file | ['For example your payments to your mobile phone (if you’re on a contract) and electricity companies will also appear in your credit report. Your IVA will show on your credit file for six years from the day it started. So if your IVA was five years long it will only be listed on your credit file for a further 12 months. The idea behind asking creditors to correct the dates on default notices is to make sure that these too will be gone within 12 months. Post IVA credit file clean up. It’s a happy day when your individual voluntary arrangement (IVA) finally ends, you’re well and truly free and clear and your money is your own again. You can also take satisfaction from the fact that you have done your best by your creditors.', 'LinkedIn0. An Individual Voluntary Arrangement (IVA) is recorded on your credit file for 6 years. During this time your credit rating will be negatively affected. Unfortunately your credit rating will not suddenly become good again after your Arrangement has ended ... | [1, 0, 0, 0, 0, ...] |
| Plants which produce their gametes in flowers are called what? | ['Plants which produce their gametes in flowers are called: antheridium, gymnosperms, angiosperms, or vascular. They are called angiosperms.', 'In humans, cells that do not produce gametes are collectively called somatic cells. Somatic cells do not include sperm and ova, the cells from which they are made, and und … ifferentiated stem cells.', 'This event is called fertilization. The male gametes produced by animals and some plants (e.g., club mosses, horsetails, ferns) are called spermatozoa (plural of spermatozoon), or simply sperm. Their female gametes are called ova (plural of ovum). Ova are often called eggs. Most plants produce male gametes called pollen grains.', 'Unlike animals, plants have multicellular haploid and multicellular diploid stages in their life cycle. Gametes develop from the multicellular haploid gametophytes (Greek phyton, plant). Fertilization gives rise to a multicellular, diploid sporophyte that produces haploid spores via meiosis.', 'Original conversation... | [1, 0, 0, 0, 0, ...] |
| what is a dts sound system | ['DTS is a series of multichannel audio technologies owned by DTS, Inc. (formerly known as D igital T heater S ystems, Inc.), an American company specializing in digital surround sound formats used for both commercial/theatrical and consumer grade applications. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, but DTS was not part of the original DVD specification, so early DVD players do not recognize DTS audio tracks at all.', 'DTS Connect is a blanket name for a two-part system used on the computer platform only, in order to convert PC audio into the DTS format, transported via a single S/PDIF cable. The two components of the system are DTS Interactive and DTS Neo:PC. This system is the consumer version of the DTS standard, using a similar codec without needing separate DTS CD-ROM media. Both music and movie DVDs allow delivery of DTS audio signal, bu... | [1, 0, 0, 0, 0, ...] |
- Loss: `ListMLELoss` with these parameters:

  ```json
  {
      "lambda_weight": "sentence_transformers.cross_encoder.losses.ListMLELoss.ListMLELambdaWeight",
      "activation_fct": "torch.nn.modules.activation.Sigmoid",
      "mini_batch_size": 16,
      "respect_input_order": true
  }
  ```
## Training Hyperparameters

### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `seed`: 12
- `bf16`: True
- `load_best_model_at_end`: True
### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 12
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
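With `warmup_ratio: 0.1`, the warmup length is derived from the total number of optimizer steps. A quick sanity check of the schedule, assuming a single device and no gradient accumulation (and the `transformers` convention of rounding warmup steps up):

```python
import math

num_samples = 78_704   # training samples, from the Training Dataset section
batch_size = 16        # per_device_train_batch_size
num_epochs = 1
warmup_ratio = 0.1

steps_per_epoch = math.ceil(num_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
print(total_steps, warmup_steps)  # 4919 492

# Consistent with the training log: step 500 corresponds to epoch 500/4919
print(round(500 / total_steps, 4))  # 0.1016
```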
## Training Logs

| Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_R100_ndcg@10 | NanoNFCorpus_R100_ndcg@10 | NanoNQ_R100_ndcg@10 | NanoBEIR_R100_mean_ndcg@10 |
|:------|:-----|:--------------|:----------------|:-------------------------|:--------------------------|:--------------------|:---------------------------|
| -1 | -1 | - | - | 0.0407 (-0.4997) | 0.2816 (-0.0435) | 0.0231 (-0.4775) | 0.1151 (-0.3402) |
| 0.0002 | 1 | 883.6996 | - | - | - | - | - |
| 0.0508 | 250 | 921.6613 | - | - | - | - | - |
| 0.1016 | 500 | 904.6479 | 856.3090 | 0.1094 (-0.4310) | 0.2034 (-0.1216) | 0.2049 (-0.2957) | 0.1726 (-0.2828) |
| 0.1525 | 750 | 900.1757 | - | - | - | - | - |
| 0.2033 | 1000 | 892.1912 | 847.0684 | 0.3615 (-0.1789) | 0.2856 (-0.0394) | 0.5605 (+0.0598) | 0.4025 (-0.0528) |
| 0.2541 | 1250 | 891.0896 | - | - | - | - | - |
| 0.3049 | 1500 | 882.4826 | 844.2736 | 0.4446 (-0.0959) | 0.3072 (-0.0178) | 0.6115 (+0.1108) | 0.4544 (-0.0009) |
| 0.3558 | 1750 | 878.0654 | - | - | - | - | - |
| 0.4066 | 2000 | 878.2091 | 840.3965 | 0.4614 (-0.0791) | 0.3450 (+0.0200) | 0.6472 (+0.1466) | 0.4845 (+0.0292) |
| 0.4574 | 2250 | 878.5553 | - | - | - | - | - |
| 0.5082 | 2500 | 877.2454 | 841.2769 | 0.4602 (-0.0802) | 0.3123 (-0.0127) | 0.5765 (+0.0759) | 0.4497 (-0.0057) |
| 0.5591 | 2750 | 864.5746 | - | - | - | - | - |
| 0.6099 | 3000 | 899.3305 | 838.2897 | 0.4752 (-0.0652) | 0.3152 (-0.0099) | 0.6333 (+0.1326) | 0.4746 (+0.0192) |
| 0.6607 | 3250 | 870.9701 | - | - | - | - | - |
| **0.7115** | **3500** | **873.4406** | **835.9516** | **0.5191 (-0.0213)** | **0.3169 (-0.0081)** | **0.6383 (+0.1377)** | **0.4915 (+0.0361)** |
| 0.7624 | 3750 | 882.9871 | - | - | - | - | - |
| 0.8132 | 4000 | 881.5676 | 836.2292 | 0.5024 (-0.0380) | 0.3269 (+0.0019) | 0.6350 (+0.1343) | 0.4881 (+0.0327) |
| 0.8640 | 4250 | 884.8231 | - | - | - | - | - |
| 0.9148 | 4500 | 875.8995 | 834.7368 | 0.5028 (-0.0376) | 0.3284 (+0.0034) | 0.6200 (+0.1193) | 0.4837 (+0.0284) |
| 0.9656 | 4750 | 868.8395 | - | - | - | - | - |
| -1 | -1 | - | - | 0.5191 (-0.0213) | 0.3169 (-0.0081) | 0.6383 (+0.1377) | 0.4915 (+0.0361) |

- The bold row denotes the saved checkpoint.
## Framework Versions
- Python: 3.11.11
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.49.0
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.0
- Tokenizers: 0.21.1
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### ListMLELoss

```bibtex
@inproceedings{lan2013position,
    title = {Position-aware ListMLE: a sequential learning process for ranking},
    author = {Lan, Yanyan and Guo, Jiafeng and Cheng, Xueqi and Liu, Tie-Yan},
    booktitle = {Proceedings of the Twenty-Ninth Conference on Uncertainty in Artificial Intelligence},
    pages = {333--342},
    year = {2013}
}
```