diff --git "a/README.md" "b/README.md" --- "a/README.md" +++ "b/README.md" @@ -7,152 +7,117 @@ tags: - feature-extraction - dense - generated_from_trainer -- dataset_size:3876398 -- loss:CachedMultipleNegativesRankingLoss +- dataset_size:7176192 +- loss:AnglELoss - loss:CoSENTLoss +- loss:CachedMultipleNegativesRankingLoss base_model: jhu-clsp/ettin-encoder-17m widget: -- source_sentence: what important natural resources in west africa did the songhai - empire control +- source_sentence: what is paediatric clinical psychology sentences: - - Timur In 1398, Timur invaded northern India, attacking the Delhi Sultanate ruled - by Sultan Nasir-ud-Din Mahmud Shah Tughluq of the Tughlaq Dynasty. He was opposed - by Ahirs and faced some reversals from the Jats, but the Sultanate at Delhi did - nothing to stop him.[59][60] After crossing the Indus river on 30 September 1398, - he sacked Tulamba and massacred its inhabitants.[61] Then he advanced and captured - Multan by October.[62] - - Timur In 1398, Timur invaded northern India, attacking the Delhi Sultanate ruled - by Sultan Nasir-ud-Din Mahmud Shah Tughluq of the Tughlaq Dynasty. He was opposed - by Ahirs and faced some reversals from the Jats, but the Sultanate at Delhi did - nothing to stop him.[59][60] After crossing the Indus river on 30 September 1398, - he sacked Tulamba and massacred its inhabitants.[61] Then he advanced and captured - Multan by October.[62] - - Timur In 1398, Timur invaded northern India, attacking the Delhi Sultanate ruled - by Sultan Nasir-ud-Din Mahmud Shah Tughluq of the Tughlaq Dynasty. 
He was opposed - by Ahirs and faced some reversals from the Jats, but the Sultanate at Delhi did - nothing to stop him.[59][60] After crossing the Indus river on 30 September 1398, - he sacked Tulamba and massacred its inhabitants.[61] Then he advanced and captured - Multan by October.[62] - - Indus Valley Civilisation Suggested contributory causes for the localisation of - the IVC include changes in the course of the river,[148] and climate change that - is also signalled for the neighbouring areas of the Middle East.[149][150] As - of 2016[update] many scholars believe that drought and a decline in trade with - Egypt and Mesopotamia caused the collapse of the Indus Civilisation.[151] - - History of West Africa In 1591, Morocco invaded the Songhai Empire under Ahmad - al-Mansur of the Saadi Dynasty to secure the goldfields of the Sahel. At the Battle - of Tondibi, the Songhai army was defeated. The Moroccans captured Djenne, Gao, - and Timbuktu, but they were unable to secure the whole region. Askiya Nuhu and - the Songhay army regrouped at Dendi in the heart of Songhai territory where a - spirited guerrilla resistance sapped the resources of the Moroccans, who were - dependent upon constant resupply from Morocco. Songhai split into several states - during the 17th century. - - Ancient Egypt The pharaoh was the absolute monarch of the country and, at least - in theory, wielded complete control of the land and its resources. The king was - the supreme military commander and head of the government, who relied on a bureaucracy - of officials to manage his affairs. In charge of the administration was his second - in command, the vizier, who acted as the king's representative and coordinated - land surveys, the treasury, building projects, the legal system, and the archives.[88] - At a regional level, the country was divided into as many as 42 administrative - regions called nomes each governed by a nomarch, who was accountable to the vizier - for his jurisdiction. 
The temples formed the backbone of the economy. Not only - were they houses of worship, but were also responsible for collecting and storing - the nation's wealth in a system of granaries and treasuries administered by overseers, - who redistributed grain and goods.[89] -- source_sentence: A group of friends and I went to a corn maize. It was a dark night - and the corn maize was said to be haunted. A scary man jumped out and scared us. - We ended up lost in the maize for two hours. We made it out and were glad we experienced - the maze together. + - Pediatric neuropsychology (paediatric in the UK) is a sub-speciality within the + field of clinical neuropsychology that studies the relationship between brain + health and behaviour in children.any pediatric neuropsychologists are involved + in teaching, research, supervision, and training of undergraduate and graduate + students in the field. In the United States undergraduate and graduate psychology + programs generally do not offer a track in pediatric neuropsychology, per se. + - "â\x80\x9CRealâ\x80\x9D hummus, should contain about 175 calories, out of which\ + \ 70-80 calories are contributed by fat. The average Israeli eats 8-10 kilograms\ + \ (18-22 pounds) of hummus every year, so weâ\x80\x99re talking about extra 15,000\ + \ calories which can make him gain about 2.5kg of body weight each year. So you\ + \ can see how excessive consumption of the packaged product might be fattening\ + \ over the years. The common serving size of hummus (real hummus, that is), which\ + \ is around one cup (220-240g) may contain 400-450 calories. And every pita (â\x80\ + \x9Cpita breadâ\x80\x9D) contains another 270, so itâ\x80\x99s not really â\x80\ + \x9Cdietaryâ\x80\x9D." + - "Pediatrics (also spelled paediatrics or pædiatrics) is the branch of medicine\ + \ that involves the medical care of infants, children, and adolescents. 
The American\ + \ Academy of Pediatrics recommends people be under pediatric care up to the age\ + \ of 21.[1] A medical practitioner who specializes in this area is known as a\ + \ pediatrician, or paediatrician. The word pediatrics and its cognates mean healer\ + \ of children; they derive from two Greek words: Ï\x80αá¿\x96Ï\x82 (pais child)\ + \ and ἰαÏ\x84Ï\x81Ï\x8CÏ\x82 (iatros doctor, healer)." +- source_sentence: For example , Elizabeth Coffin , daughter of a wealthy merchant + from Nantucket , was mother of the prominent Massachusetts industrialists Henry + Coffin Nevins and David Nevins , Jr.. sentences: - - 'hypothesis: The median income for a household in the city was $26,969, and the - median income for a family was $31,997.' - - 'hypothesis: There is one objects is Daniel carrying.' - - 'hypothesis: We calmed down in order to solve the maize ends after we made it - out' -- source_sentence: J. David Spurlock was born on November 18 , 1959 in Dallas , Texas - . He moved to Memphis , Tennessee in 1973 . + - Born in the middle of the Great Depression , Carl Carl Spitz was trained and owned + by Terry . + - For example , Elizabeth Coffin , a daughter of a wealthy merchant from Nantucket + , was the mother of the prominent Massachusetts industrialists Henry Coffin Nevins + and David Nevins , Jr ... + - The couple had their first child , Shalk Jr , in August 2012 , and in March 2014 + his second son Nicol was born . +- source_sentence: UN Chief Finds His Voice, Remains Cautious on China sentences: - - David Spurlock was born on 18 November 1959 in Dallas , Texas , and moved to Memphis - , Tennessee in 1973 . - - This is a list of the etymology of street names in the district Covent Garden - in London . - - During the five days of the journey , he brought some books about Elba which he - studied from Fontainebleau . 
-- source_sentence: The musical films '' Mark Twain `` and '' Huckleberry Finn `` , - both based on Tom Sawyer 's novels , were partially shot on site . + - 'Insight: U.N. chief finds his voice, but remains cautious on China' + - kashmir is claimed by both india and pakistan. + - Death toll in Kenya bus attack rises to six +- source_sentence: Mayor Michael R. Bloomberg said yesterday that the men's behavior + "was a disgrace, and totally inappropriate for city employees." sentences: - - The musical films `` Mark Twain '' and `` Huckleberry Finn '' , both based on - Tom Sawyer 's novels , were shot partially on location here . - - His father called him Ali and his nickname was Murtaza . - - A KWL chart can be used for all subjects in a whole group or a small group atmosphere - . -- source_sentence: what blood type can donate blood to all other blood types. + - SOCIAL ECONOMICAL RESEARCH STUDY + - I assume you arrive from you apartment in Old Town, Selina Kyle. + - The way the men acted "was a disgrace, and totally inappropriate for city employees" + according to Michael R. Bloomberg's comments yesterday +- source_sentence: what are the three subatomic particles called? sentences: - - A number of Blood donors also donate platelets by apheresis (a procedure in which - Blood is drawn from a Blood donor and separated into its components, some of which - are retained, such as plasma or platelets, and the remainder of the Blood is returned, - by transfusion, to the Blood donor; also called hemapheresis). - - "Uluru is easily the most iconic natural landform in Australia, and its formation\ - \ was equally special.The creation of Uluru and Kata Tjuta â\x80\x94 as both were\ - \ formed at the same time â\x80\x94 began over 500 million years ago. At this\ - \ time the big crustal blocks that form the Australian continent coming together.luru\ - \ is easily the most iconic natural landform in Australia, and its formation was\ - \ equally special." 
- - "AB can only donate blood to other ABs. A can donate blood to As and ABs. B can\ - \ donate blood to Bs and ABs. O can donate to all blood types. Negative Rh factors\ - \ can donate toâ\x80¦ both positive and negative Rh factors, however postive Rh\ - \ factors can only donate to other positive. So if I had O+ blood, I couldn't\ - \ donate to A-, but could donate to A+." + - Subatomic particles include electrons, the negatively charged, almost massless + particles that nevertheless account for most of the size of the atom, and they + include the heavier building blocks of the small but very dense nucleus of the + atom, the positively charged protons and the electrically neutral neutrons. + - Your body needs cholesterol to build healthy cells, but high levels of cholesterol + can increase your risk of heart disease. With high cholesterol, you can develop + fatty deposits in your blood vessels. Eventually, these deposits grow, making + it difficult for enough blood to flow through your arteries. + - 'If you experience any of the following symptoms, stop taking ibuprofen and call + your doctor: stomach pain, heartburn, vomit that is bloody or looks like coffee + grounds, blood in the stool, or black and tarry stools. Keep all appointments + with your doctor and the laboratory.' 
datasets: -- tasksource/merged-2l-nli -- tasksource/merged-3l-nli -- tasksource/zero-shot-label-nli -- MoritzLaurer/dataset_train_nli - google-research-datasets/paws - nyu-mll/glue - mwong/fever-evidence-related +- tasksource/parade +- tasksource/apt - tasksource/sts-companion -- tomaarsen/natural-questions-hard-negatives -- tomaarsen/gooaq-hard-negatives -- bclavie/msmarco-500k-triplets -- sentence-transformers/all-nli -- sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1 -- sentence-transformers/gooaq -- sentence-transformers/natural-questions +- tasksource/zero-shot-label-nli pipeline_tag: sentence-similarity library_name: sentence-transformers --- # SentenceTransformer based on jhu-clsp/ettin-encoder-17m -This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) on 18 datasets. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. +This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) on 19 datasets. It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. 
## Model Details ### Model Description - **Model Type:** Sentence Transformer - **Base model:** [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) -- **Maximum Sequence Length:** 512 tokens +- **Maximum Sequence Length:** 1024 tokens - **Output Dimensionality:** 256 dimensions - **Similarity Function:** Cosine Similarity - **Training Datasets:** - - [merged-2l-nli](https://huggingface.co/datasets/tasksource/merged-2l-nli) - - [merged-3l-nli](https://huggingface.co/datasets/tasksource/merged-3l-nli) - - [zero-shot-label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli) - - [dataset_train_nli](https://huggingface.co/datasets/MoritzLaurer/dataset_train_nli) - [paws/labeled_final](https://huggingface.co/datasets/paws) - [glue/mrpc](https://huggingface.co/datasets/glue) - - [glue/qqp](https://huggingface.co/datasets/glue) - [fever-evidence-related](https://huggingface.co/datasets/mwong/fever-evidence-related) + - [parade](https://huggingface.co/datasets/tasksource/parade) + - [apt](https://huggingface.co/datasets/tasksource/apt) - [glue/stsb](https://huggingface.co/datasets/glue) - sick/relatedness - [sts-companion](https://huggingface.co/datasets/tasksource/sts-companion) - - [tomaarsen/natural-questions-hard-negatives](https://huggingface.co/datasets/tomaarsen/natural-questions-hard-negatives) - - [tomaarsen/gooaq-hard-negatives](https://huggingface.co/datasets/tomaarsen/gooaq-hard-negatives) - - [bclavie/msmarco-500k-triplets](https://huggingface.co/datasets/bclavie/msmarco-500k-triplets) - - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) - - [sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1](https://huggingface.co/datasets/sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1) - - [sentence-transformers/gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) - - 
[sentence-transformers/natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) + - [zero-shot-label-nli](https://huggingface.co/datasets/tasksource/zero-shot-label-nli) + - tomaarsen/natural-questions-hard-negatives + - tomaarsen/gooaq-hard-negatives + - bclavie/msmarco-500k-triplets + - sentence-transformers/msmarco-co-condenser-margin-mse-sym-mnrl-mean-v1 + - sentence-transformers/gooaq + - sentence-transformers/natural-questions + - sentence-transformers/quora-duplicates + - sentence-transformers/s2orc + - sentence-transformers/codesearchnet + - sentence-transformers/stackexchange-duplicates - **Language:** en @@ -166,8 +131,9 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [j ``` SentenceTransformer( - (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'ModernBertModel'}) + (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False, 'architecture': 'ModernBertModel'}) (1): Pooling({'word_embedding_dimension': 256, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) + (2): Normalize() ) ``` @@ -189,12 +155,12 @@ from sentence_transformers import SentenceTransformer model = SentenceTransformer("tasksource/ettin-17m-embed") # Run inference queries = [ - "what blood type can donate blood to all other blood types.", + "what are the three subatomic particles called?", ] documents = [ - "AB can only donate blood to other ABs. A can donate blood to As and ABs. B can donate blood to Bs and ABs. O can donate to all blood types. Negative Rh factors can donate toâ\x80¦ both positive and negative Rh factors, however postive Rh factors can only donate to other positive. 
So if I had O+ blood, I couldn't donate to A-, but could donate to A+.", - 'A number of Blood donors also donate platelets by apheresis (a procedure in which Blood is drawn from a Blood donor and separated into its components, some of which are retained, such as plasma or platelets, and the remainder of the Blood is returned, by transfusion, to the Blood donor; also called hemapheresis).', - 'Uluru is easily the most iconic natural landform in Australia, and its formation was equally special.The creation of Uluru and Kata Tjuta â\x80\x94 as both were formed at the same time â\x80\x94 began over 500 million years ago. At this time the big crustal blocks that form the Australian continent coming together.luru is easily the most iconic natural landform in Australia, and its formation was equally special.', + 'Subatomic particles include electrons, the negatively charged, almost massless particles that nevertheless account for most of the size of the atom, and they include the heavier building blocks of the small but very dense nucleus of the atom, the positively charged protons and the electrically neutral neutrons.', + 'Your body needs cholesterol to build healthy cells, but high levels of cholesterol can increase your risk of heart disease. With high cholesterol, you can develop fatty deposits in your blood vessels. Eventually, these deposits grow, making it difficult for enough blood to flow through your arteries.', + 'If you experience any of the following symptoms, stop taking ibuprofen and call your doctor: stomach pain, heartburn, vomit that is bloody or looks like coffee grounds, blood in the stool, or black and tarry stools. 
Keep all appointments with your doctor and the laboratory.', ] query_embeddings = model.encode_query(queries) document_embeddings = model.encode_document(documents) @@ -204,7 +170,7 @@ print(query_embeddings.shape, document_embeddings.shape) # Get the similarity scores for the embeddings similarities = model.similarity(query_embeddings, document_embeddings) print(similarities) -# tensor([[ 0.6392, 0.7162, -0.1267]]) +# tensor([[ 0.7121, -0.0953, 0.0591]]) ```
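
The module listing in this diff shows mean-token pooling followed by a newly added `Normalize()` step. As a rough illustration of what those two steps compute, here is a NumPy-only sketch. The helper name `mean_pool_and_normalize` and the random toy inputs are hypothetical and are not part of sentence-transformers; the library's own implementation may differ in details.

```python
import numpy as np


def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Hypothetical helper: masked mean pooling, then L2 normalization.

    token_embeddings: float array of shape (batch, seq_len, dim)
    attention_mask:   0/1 array of shape (batch, seq_len); 0 marks padding
    """
    # Broadcast the mask over the embedding dimension so padded tokens contribute zero.
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)
    # Divide by the number of real tokens; the clip guards against an all-padding row.
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts
    # Scale every sentence vector to unit length.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)


# Toy stand-in for transformer outputs: 2 sentences, 4 token slots, 256 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 256))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])

embeddings = mean_pool_and_normalize(tokens, mask)
print(embeddings.shape)  # (2, 256)

# With unit-length vectors, the plain dot product is the cosine similarity.
cosine = embeddings @ embeddings.T
print(np.allclose(np.diag(cosine), 1.0))  # True
```

Because the output vectors have unit length, their dot product equals their cosine similarity, which is consistent with the Cosine Similarity function listed under Model Description.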