Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Paper: arXiv:1908.10084
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
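The Pooling module is configured with `pooling_mode_cls_token: True`, so the sentence embedding is the [CLS] token's output vector, and the final Normalize module scales it to unit L2 norm. A minimal NumPy sketch of those two steps, using hypothetical token embeddings in place of the real BERT output:

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """CLS pooling followed by L2 normalization, mirroring the
    Pooling and Normalize modules above.

    token_embeddings: (seq_len, dim) array of per-token outputs from the
    Transformer module; row 0 is assumed to be the [CLS] token.
    """
    cls = token_embeddings[0]             # pooling_mode_cls_token=True
    return cls / np.linalg.norm(cls)      # Normalize(): unit L2 norm

# Toy example: a hypothetical 4-token, 768-dimensional transformer output
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 768))
emb = cls_pool_and_normalize(tokens)
print(emb.shape)  # (768,)
```

Because of the Normalize step, every embedding the model returns has length 1, which is why cosine similarity is the natural similarity function for it.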
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayushexel/emb-bge-base-en-v1.5-squad-7-epochs")
# Run inference
sentences = [
    'The ability to store and execute lists of instructions are called what?',
    'The ability to store and execute lists of instructions called programs makes computers extremely versatile, distinguishing them from calculators. The Church–Turing thesis is a mathematical statement of this versatility: any computer with a minimum capability (being Turing-complete) is, in principle, capable of performing the same tasks that any other computer can perform. Therefore, any type of computer (netbook, supercomputer, cellular automaton, etc.) is able to perform the same computational tasks, given enough time and storage capacity.',
    "In most computers, individual instructions are stored as machine code with each instruction being given a unique number (its operation code or opcode for short). The command to add two numbers together would have one opcode; the command to multiply them would have a different opcode, and so on. The simplest computers are able to perform any of a handful of different instructions; the more complex computers have several hundred to choose from, each with a unique numerical code. Since the computer's memory is able to store numbers, it can also store the instruction codes. This leads to the important fact that entire programs (which are just lists of these instructions) can be represented as lists of numbers and can themselves be manipulated inside the computer in the same way as numeric data. The fundamental concept of storing programs in the computer's memory alongside the data they operate on is the crux of the von Neumann, or stored program[citation needed], architecture. In some cases, a computer might store some or all of its program in memory that is kept separate from the data it operates on. This is called the Harvard architecture after the Harvard Mark I computer. Modern von Neumann computers display some traits of the Harvard architecture in their designs, such as in CPU caches.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
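Since the model's embeddings are L2-normalized, the cosine similarity that `model.similarity` computes by default reduces to a plain matrix product. A small NumPy sketch with hypothetical unit vectors standing in for `model.encode(...)` output:

```python
import numpy as np

# Hypothetical stand-ins for three normalized sentence embeddings
# (real embeddings would be 768-dimensional).
embeddings = np.array([
    [0.6, 0.8, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
])

# For unit-length rows, cosine similarity == dot product.
similarities = embeddings @ embeddings.T
print(similarities.shape)                    # (3, 3)
print(round(float(similarities[0, 1]), 2))   # 0.96: first two rows are close
print(round(float(similarities[0, 2]), 2))   # 0.0: orthogonal, unrelated
```

The diagonal is always 1.0 (every embedding is maximally similar to itself), and off-diagonal entries rank how related the sentences are.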
Triplet evaluation with TripletEvaluator on the gooqa-dev dataset:

| Metric | Value |
|---|---|
| cosine_accuracy | 0.4192 |
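The `cosine_accuracy` metric is the fraction of triplets for which the anchor (question) is more cosine-similar to its positive (context) than to its negative. A minimal sketch of that computation with hypothetical 2-d embeddings:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(anchors, positives, negatives) -> float:
    """Fraction of triplets with cos(anchor, positive) > cos(anchor, negative)."""
    hits = sum(
        cosine(a, p) > cosine(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)

# Two toy triplets: the first ranks the positive correctly, the second does not.
anchors   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
positives = [np.array([0.9, 0.1]), np.array([1.0, 0.0])]
negatives = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 0.5
```

Under this metric, the 0.4192 above means roughly 42% of gooqa-dev triplets rank the true context above the mined negative.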
Columns: question, context, and negative

| | question | context | negative |
|---|---|---|---|
| type | string | string | string |
| details | | | |
| question | context | negative |
|---|---|---|
| Which string instrument often played the basso continuo parts? | Baroque instruments included some instruments from the earlier periods (e.g., the hurdy-gurdy and recorder) and a number of new instruments (e.g, the cello, contrabass and fortepiano). Some instruments from previous eras fell into disuse, such as the shawm and the wooden cornet. The key Baroque instruments for strings included the violin, viol, viola, viola d'amore, cello, contrabass, lute, theorbo (which often played the basso continuo parts), mandolin, cittern, Baroque guitar, harp and hurdy-gurdy. Woodwinds included the Baroque flute, Baroque oboe, rackett, recorder and the bassoon. Brass instruments included the cornett, natural horn, Baroque trumpet, serpent and the trombone. Keyboard instruments included the clavichord, tangent piano, the fortepiano (an early version of the piano), the harpsichord and the pipe organ. Percussion instruments included the timpani, snare drum, tambourine and the castanets. | That said, the score does not provide complete and exact instructions on how to perform a historical work. Even if the tempo is written with an Italian instruction (e.g., Allegro), we do not know exactly how fast the piece should be played. As well, in the Baroque era, many works that were designed for basso continuo accompaniment do not specify which instruments should play the accompaniment or exactly how the chordal instrument (harpsichord, lute, etc.) should play the chords, which are not notated in the part (only a figured bass symbol beneath the bass part is used to guide the chord-playing performer). The performer and/or the conductor have a range of options for musical expression and interpretation of a scored piece, including the phrasing of melodies, the time taken during fermatas (held notes) or pauses, and the use (or choice not to use) of effects such as vibrato or glissando (these effects are possible on various stringed, brass and woodwind instruments and with the human ... |
| Besides fuels and paving, what accounts for most of the other use of bitumen? | Roofing shingles account for most of the remaining asphalt/bitumen consumption. Other uses include cattle sprays, fence-post treatments, and waterproofing for fabrics. Asphalt/bitumen is used to make Japan black, a lacquer known especially for its use on iron and steel, and it is also used in paint and marker inks by some graffiti supply companies to increase the weather resistance and permanence of the paint or ink, and to make the color much darker.[citation needed] Asphalt/bitumen is also used to seal some alkaline batteries during the manufacturing process. | The value of the deposit was obvious from the start, but the means of extracting the bitumen were not. The nearest town, Fort McMurray, Alberta was a small fur trading post, other markets were far away, and transportation costs were too high to ship the raw bituminous sand for paving. In 1915, Sidney Ells of the Federal Mines Branch experimented with separation techniques and used the bitumen to pave 600 feet of road in Edmonton, Alberta. Other roads in Alberta were paved with oil sands, but it was generally not economic. During the 1920s Dr. Karl A. Clark of the Alberta Research Council patented a hot water oil separation process and entrepreneur Robert C. Fitzsimmons built the Bitumount oil separation plant, which between 1925 and 1958 produced up to 300 barrels (50 m3) per day of bitumen using Dr. Clark's method. Most of the bitumen was used for waterproofing roofs, but other uses included fuels, lubrication oils, printers ink, medicines, rust and acid-proof paints, fireproof roofin... |
| Where is the "Lotte Shopping Avenue" located? | Parkson enters by acquiring local brand Centro Department Store in 2011. Centro still operates for middle market while the 'Parkson' brand itself, positioned for middle-up segment, enters in 2014 by opening its first store in Medan, followed by its second store in Jakarta. Lotte, meanwhile, enters the market by inking partnership with Ciputra Group, creating what its called 'Lotte Shopping Avenue' inside the Ciputra World Jakarta complex, as well as acquiring Makro and rebranding it into Lotte Mart. | 13th Street is in three parts. The first is a dead end from Avenue C. The second starts at a dead end, just before Avenue B, and runs to Greenwich Avenue, and the third part is from Eighth Avenue to Tenth Avenue. |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
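MultipleNegativesRankingLoss treats, for each question in a batch, its paired context as the positive class and every other context in the batch as a negative, then applies cross-entropy over cosine similarities multiplied by `scale`. A minimal NumPy sketch of the in-batch-negatives core (the real sentence-transformers implementation additionally appends the explicit hard-negative column as extra candidates, which this sketch omits for brevity):

```python
import numpy as np

def multiple_negatives_ranking_loss(q: np.ndarray, c: np.ndarray,
                                    scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss.

    q, c: (batch, dim) arrays of L2-normalized query/context embeddings;
    for row i, context i is the positive and all other rows are negatives.
    """
    sims = scale * (q @ c.T)  # scaled cosine similarities, (batch, batch)
    # Cross-entropy with target class i for row i: negative mean log-softmax
    # of the diagonal entries.
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Toy batch of 3 with perfectly matched unit embeddings: loss is near zero.
q = np.eye(3)
c = np.eye(3)
loss = multiple_negatives_ranking_loss(q, c)
print(loss)
```

The `scale` of 20.0 sharpens the softmax, which is why the training loss above can fall well below 0.1 once the positives dominate the in-batch negatives.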
Columns: question, context, and negative_1

| | question | context | negative_1 |
|---|---|---|---|
| type | string | string | string |
| details | | | |
| question | context | negative_1 |
|---|---|---|
| People of what ethnicity most visibly participated in the Draft Riots of 1863? | Democratic Party candidates were consistently elected to local office, increasing the city's ties to the South and its dominant party. In 1861, Mayor Fernando Wood called on the aldermen to declare independence from Albany and the United States after the South seceded, but his proposal was not acted on. Anger at new military conscription laws during the American Civil War (1861–1865), which spared wealthier men who could afford to pay a $300 (equivalent to $5,766 in 2016) commutation fee to hire a substitute, led to the Draft Riots of 1863, whose most visible participants were ethnic Irish working class. The situation deteriorated into attacks on New York's elite, followed by attacks on black New Yorkers and their property after fierce competition for a decade between Irish immigrants and blacks for work. Rioters burned the Colored Orphan Asylum to the ground, but more than 200 children escaped harm due to efforts of the New York City Police Department, which was mainly made up of Iris... | Democratic Party candidates were consistently elected to local office, increasing the city's ties to the South and its dominant party. In 1861, Mayor Fernando Wood called on the aldermen to declare independence from Albany and the United States after the South seceded, but his proposal was not acted on. Anger at new military conscription laws during the American Civil War (1861–1865), which spared wealthier men who could afford to pay a $300 (equivalent to $5,766 in 2016) commutation fee to hire a substitute, led to the Draft Riots of 1863, whose most visible participants were ethnic Irish working class. The situation deteriorated into attacks on New York's elite, followed by attacks on black New Yorkers and their property after fierce competition for a decade between Irish immigrants and blacks for work. Rioters burned the Colored Orphan Asylum to the ground, but more than 200 children escaped harm due to efforts of the New York City Police Department, which was mainly made up of Iris... |
| What is an example of irish broadcaster? | Honorary knighthoods are appointed to citizens of nations where Queen Elizabeth II is not Head of State, and may permit use of post-nominal letters but not the title of Sir or Dame. Occasionally honorary appointees are, incorrectly, referred to as Sir or Dame - Bill Gates or Bob Geldof, for example. Honorary appointees who later become a citizen of a Commonwealth realm can convert their appointment from honorary to substantive, then enjoy all privileges of membership of the order including use of the title of Sir and Dame for the senior two ranks of the Order. An example is Irish broadcaster Terry Wogan, who was appointed an honorary Knight Commander of the Order in 2005 and on successful application for dual British and Irish citizenship was made a substantive member and subsequently styled as "Sir Terry Wogan KBE". | In Ireland, pubs are known for their atmosphere or "craic". In Irish, a pub is referred to as teach tábhairne ("tavernhouse") or teach óil ("drinkinghouse"). Live music, either sessions of traditional Irish music or varieties of modern popular music, is frequently featured in the pubs of Ireland. Pubs in Northern Ireland are largely identical to their counterparts in the Republic of Ireland except for the lack of spirit grocers. A side effect of "The Troubles" was that the lack of a tourist industry meant that a higher proportion of traditional bars have survived the wholesale refitting of Irish pub interiors in the 'English style' in the 1950s and 1960s. New Zealand sports a number of Irish pubs. |
| What is the highest point in Eritrea? | Eritrea can be split into three ecoregions. To the east of the highlands are the hot, arid coastal plains stretching down to the southeast of the country. The cooler, more fertile highlands, reaching up to 3000m has a different habitat. Habitats here vary from the sub-tropical rainforest at Filfil Solomona to the precipitous cliffs and canyons of the southern highlands. The Afar Triangle or Danakil Depression of Eritrea is the probable location of a triple junction where three tectonic plates are pulling away from one another. The highest point of the country, Emba Soira, is located in the center of Eritrea, at 3,018 meters (9,902 ft) above sea level. | During the Middle Ages, the Eritrea region was known as Medri Bahri ("sea-land"). The name Eritrea is derived from the ancient Greek name for Red Sea (Ἐρυθρὰ Θάλασσα Erythra Thalassa, based on the adjective ἐρυθρός erythros "red"). It was first formally adopted in 1890, with the formation of Italian Eritrea (Colonia Eritrea). The territory became the Eritrea Governorate within Italian East Africa in 1936. Eritrea was annexed by Ethiopia in 1953 (nominally within a federation until 1962) and an Eritrean Liberation Front formed in 1960. Eritrea gained independence following the 1993 referendum, and the name of the new state was defined as State of Eritrea in the 1997 constitution.[citation needed] |
Loss: MultipleNegativesRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- num_train_epochs: 7
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 128
- per_device_eval_batch_size: 128
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 7
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | Validation Loss | gooqa-dev_cosine_accuracy |
|---|---|---|---|---|
| -1 | -1 | - | - | 0.3536 |
| 0.2890 | 100 | 0.7455 | 0.8072 | 0.3876 |
| 0.5780 | 200 | 0.4742 | 0.7586 | 0.3952 |
| 0.8671 | 300 | 0.4203 | 0.7294 | 0.4088 |
| 1.1561 | 400 | 0.323 | 0.7337 | 0.4130 |
| 1.4451 | 500 | 0.2707 | 0.7246 | 0.4112 |
| 1.7341 | 600 | 0.2631 | 0.7098 | 0.4182 |
| 2.0231 | 700 | 0.2566 | 0.7093 | 0.4200 |
| 2.3121 | 800 | 0.1478 | 0.7202 | 0.4146 |
| 2.6012 | 900 | 0.1543 | 0.7255 | 0.4176 |
| 2.8902 | 1000 | 0.1555 | 0.7219 | 0.4174 |
| 3.1792 | 1100 | 0.1227 | 0.7331 | 0.4134 |
| 3.4682 | 1200 | 0.1001 | 0.7305 | 0.4178 |
| 3.7572 | 1300 | 0.1011 | 0.7346 | 0.4234 |
| 4.0462 | 1400 | 0.0989 | 0.7443 | 0.4208 |
| 4.3353 | 1500 | 0.0738 | 0.7498 | 0.4138 |
| 4.6243 | 1600 | 0.079 | 0.7519 | 0.4172 |
| 4.9133 | 1700 | 0.0771 | 0.7563 | 0.4166 |
| 5.2023 | 1800 | 0.0646 | 0.7595 | 0.4136 |
| 5.4913 | 1900 | 0.0589 | 0.7618 | 0.4130 |
| 5.7803 | 2000 | 0.0605 | 0.7605 | 0.4172 |
| 6.0694 | 2100 | 0.0603 | 0.7597 | 0.4164 |
| 6.3584 | 2200 | 0.0556 | 0.7652 | 0.4164 |
| 6.6474 | 2300 | 0.0527 | 0.7664 | 0.4150 |
| 6.9364 | 2400 | 0.0541 | 0.7655 | 0.4174 |
| -1 | -1 | - | - | 0.4192 |
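With `lr_scheduler_type: linear` and `warmup_ratio: 0.1`, the learning rate ramps from 0 to 5e-05 over the first 10% of optimizer steps, then decays linearly back to 0. A sketch of that schedule, using a total-step count estimated from the logs above (~346 steps per epoch × 7 epochs; the exact figure is not stated in the card):

```python
import math

def linear_schedule_lr(step: int, total_steps: int,
                       base_lr: float = 5e-5,
                       warmup_ratio: float = 0.1) -> float:
    """Linear warmup then linear decay, as configured above.

    Rises from 0 to base_lr over the first warmup_ratio fraction of steps,
    then decays linearly back to 0 (the Transformers 'linear' scheduler).
    """
    warmup_steps = math.ceil(warmup_ratio * total_steps)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

total_steps = 346 * 7  # 2422: an estimate from the logs, not a stated value
print(math.ceil(0.1 * total_steps))           # 243 warmup steps
print(linear_schedule_lr(0, total_steps))     # 0.0 at the first step
print(linear_schedule_lr(243, total_steps))   # peak lr of 5e-05
```

This warmup phase lines up with the steep loss drop over the first few hundred steps in the table.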
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
Base model: BAAI/bge-base-en-v1.5