---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:4122
- loss:MultipleNegativesRankingLoss
base_model: sentence-transformers/all-MiniLM-L6-v2
widget:
- source_sentence: Environment Minister Greg Hunt the Coalition's emissions reduction fund, at $13.95 per tonne of carbon, is around 1 per cent of the cost of reducing carbon under the former Labor government's carbon pricing scheme, which he cost $1,300 a tonne.
  sentences:
  - Sirius's heliacal rising, just before the start of the Nile flood, gave Sopdet a close connection with the flood and the resulting growth of plants.
  - The proposal would have set an emissions price of NZ$15 per tonne of CO2-equivalent.
  - '"More recently, evaporation over lakes has steadily been increasing, largely due to increases in water surface temperature," Gronewold said.'
- source_sentence: “In 2013 the level of U.S. farm output was about 2.7 times its 1948 level, and productivity was growing at an average annual rate of 1.52%.
  sentences:
  - As the concentration of carbon dioxide increases in the atmosphere, the increased uptake of carbon dioxide into the oceans is causing a measurable decrease in the pH of the oceans, which is referred to as ocean acidification.
  - The IPCC was tasked with reviewing peer-reviewed scientific literature and other relevant publications to provide information on the state of knowledge about climate change.
  - Private sector productivity growth, measured as real output per hour of all persons, increased at an average rate of 1.9% during Reagan's eight years, compared to an average 1.3% during the preceding eight years.
- source_sentence: '''Phil Jones said that for the past 15 years there has been no "statistically significant" warming.'
  sentences:
  - From this, he concluded that "The post-1980 global warming trend from surface thermometers is not credible.
  - Fox News has widely been described as a major platform for climate change denial.
  - In comparison to the extended record, the sea-ice extent in the polar region by September 2007 was only half the recorded mass that had been estimated to exist within the 1950–1970 period.
- source_sentence: '"NASA satellite data from the years 2000 through 2011 show the Earth''s atmosphere is allowing far more heat to be released into space than alarmist computer models have predicted, reports a new study in the peer-reviewed science journal Remote Sensing.'
  sentences:
  - The Lamont–Doherty Earth Observatory at Columbia University is one of the world's leading research centers developing fundamental knowledge about the origin, evolution and future of the natural world.
  - Mann said, "Ten years ago, the availability of data became quite sparse by the time you got back to 1,000 AD, and what we had then was weighted towards tree-ring data; but now you can go back 1,300 years without using tree-ring data at all and still get a verifiable conclusion."
  - This premature announcement came from a preliminary news release about a study which had not yet been peer reviewed.
- source_sentence: '...there [is] anecdotal and other evidence suggesting similar melts from 1938-43 and on other occasions.'
  sentences:
  - They were formed by the melting of sulfur deposits at temperatures as low as 113 °C (235 °F).
  - For example, in the study of the origin of the earth, one can reasonably model earth's mass, temperature, and rate of rotation, as a function of time allowing one to extrapolate forward or backward in time and so predict future or prior events.
  - Consequently, summers are 2.3 °C (4 °F) warmer in the Northern Hemisphere than in the Southern Hemisphere under similar conditions.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
  results:
  - task:
      type: information-retrieval
      name: Information Retrieval
    dataset:
      name: claims dev
      type: claims-dev
    metrics:
    - type: cosine_accuracy@1
      value: 0.24025974025974026
      name: Cosine Accuracy@1
    - type: cosine_accuracy@3
      value: 0.44155844155844154
      name: Cosine Accuracy@3
    - type: cosine_accuracy@5
      value: 0.5454545454545454
      name: Cosine Accuracy@5
    - type: cosine_accuracy@10
      value: 0.6818181818181818
      name: Cosine Accuracy@10
    - type: cosine_precision@1
      value: 0.24025974025974026
      name: Cosine Precision@1
    - type: cosine_precision@3
      value: 0.19047619047619044
      name: Cosine Precision@3
    - type: cosine_precision@5
      value: 0.15454545454545457
      name: Cosine Precision@5
    - type: cosine_precision@10
      value: 0.10714285714285714
      name: Cosine Precision@10
    - type: cosine_recall@1
      value: 0.09577922077922078
      name: Cosine Recall@1
    - type: cosine_recall@3
      value: 0.21482683982683978
      name: Cosine Recall@3
    - type: cosine_recall@5
      value: 0.27532467532467536
      name: Cosine Recall@5
    - type: cosine_recall@10
      value: 0.36612554112554113
      name: Cosine Recall@10
    - type: cosine_ndcg@10
      value: 0.2932326612195408
      name: Cosine Ndcg@10
    - type: cosine_mrr@10
      value: 0.3742553081838797
      name: Cosine Mrr@10
    - type: cosine_map@100
      value: 0.23004915088757852
      name: Cosine Map@100
---

# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2).
It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for retrieval.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Supported Modality:** Text

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/huggingface/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("jmroth/my-awesome-model")
# Run inference
sentences = [
    '...there [is] anecdotal and other evidence suggesting similar melts from 1938-43 and on other occasions.',
    'They were formed by the melting of sulfur deposits at temperatures as low as 113\xa0°C (235\xa0°F).',
    'Consequently, summers are 2.3\xa0°C (4\xa0°F) warmer in the Northern Hemisphere than in the Southern Hemisphere under similar conditions.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4966, 0.1535],
#         [0.4966, 1.0000, 0.3254],
#         [0.1535, 0.3254, 1.0000]])
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `claims-dev`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.sentence_transformer.evaluation.InformationRetrievalEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.2403     |
| cosine_accuracy@3   | 0.4416     |
| cosine_accuracy@5   | 0.5455     |
| cosine_accuracy@10  | 0.6818     |
| cosine_precision@1  | 0.2403     |
| cosine_precision@3  | 0.1905     |
| cosine_precision@5  | 0.1545     |
| cosine_precision@10 | 0.1071     |
| cosine_recall@1     | 0.0958     |
| cosine_recall@3     | 0.2148     |
| cosine_recall@5     | 0.2753     |
| cosine_recall@10    | 0.3661     |
| **cosine_ndcg@10**  | **0.2932** |
| cosine_mrr@10       | 0.3743     |
| cosine_map@100      | 0.23       |

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 4,122 training samples
* Columns: anchor and positive
* Approximate statistics based on the first 1000 samples:

  |         | anchor | positive |
  |:--------|:-------|:---------|
  | type    | string | string   |
  | details |        |          |

* Samples:

  | anchor | positive |
  |:-------|:---------|
  | Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life. | At very high concentrations (100 times atmospheric concentration, or greater), carbon dioxide can be toxic to animal life, so raising the concentration to 10,000 ppm (1%) or higher for several hours will eliminate pests such as whiteflies and spider mites in a greenhouse. |
  | Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life. | Plants can grow as much as 50 percent faster in concentrations of 1,000 ppm CO 2 when compared with ambient conditions, though this assumes no change in climate and no limitation on other nutrients. |
  | Not only is there no scientific evidence that CO2 is a pollutant, higher CO2 concentrations actually help ecosystems support more plant and animal life. | Higher carbon dioxide concentrations will favourably affect plant growth and demand for water. |

* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:

  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim",
      "gather_across_devices": false,
      "directions": [
          "query_to_doc"
      ],
      "partition_mode": "joint",
      "hardness_mode": null,
      "hardness_strength": 0.0
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 128
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `warmup_steps`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `push_to_hub`: True
- `hub_model_id`: jmroth/nlp-biencoder-finetuned
- `hub_strategy`: end
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `do_predict`: False
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 128
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: None
- `warmup_steps`: 0.1
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `enable_jit_checkpoint`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `use_cpu`: False
- `seed`: 42
- `data_seed`: None
- `bf16`: False
- `fp16`: True
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: -1
- `ddp_backend`: None
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `push_to_hub`: True
- `resume_from_checkpoint`: None
- `hub_model_id`: jmroth/nlp-biencoder-finetuned
- `hub_strategy`: end
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `auto_find_batch_size`: False
- `full_determinism`: False
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `use_cache`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}
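
The loss parameters reported under Training Dataset (`scale` 20.0, `similarity_fct` `cos_sim`, direction `query_to_doc`) describe a softmax cross-entropy over scaled cosine similarities, where the other positives in a batch serve as negatives for each anchor. The following is a minimal pure-Python sketch of that idea for a single batch. It is an illustration under simplifying assumptions, not the library's implementation, and the names `cos_sim` and `mnrl_loss` are hypothetical helpers defined here.

```python
import math


def cos_sim(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss for one batch (illustrative sketch only)."""
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled cosine similarity of this anchor to every positive in the batch.
        scores = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Cross-entropy with positive i as the target; the other positives act as negatives.
        losses.append(log_z - scores[i])
    return sum(losses) / len(losses)


# Toy embeddings (made up for illustration): three orthogonal unit vectors.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned = mnrl_loss(anchors, anchors)  # each positive matches its own anchor
rotated = mnrl_loss(anchors, anchors[1:] + anchors[:1])  # every positive is mismatched
print(f"aligned={aligned:.6f} rotated={rotated:.6f}")
```

With the toy vectors, the loss is near zero when every anchor is paired with its own positive and close to the scale value when all pairs are mismatched. Because other samples in the batch are treated as negatives, a batch that contains the same text twice would penalise a correct match, which is the situation the `no_duplicates` batch sampler listed above is meant to avoid.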

### Training Logs

| Epoch      | Step    | Training Loss | claims-dev_cosine_ndcg@10 |
|:----------:|:-------:|:-------------:|:-------------------------:|
| 0.0775     | 10      | 1.4212        | -                         |
| 0.1550     | 20      | 1.4229        | -                         |
| 0.2326     | 30      | 1.1129        | -                         |
| 0.3101     | 40      | 0.9966        | -                         |
| 0.3876     | 50      | 0.9207        | 0.2829                    |
| 0.4651     | 60      | 0.8326        | -                         |
| 0.5426     | 70      | 0.8989        | -                         |
| 0.6202     | 80      | 0.9630        | -                         |
| 0.6977     | 90      | 0.8394        | -                         |
| 0.7752     | 100     | 0.8764        | 0.2893                    |
| 0.8527     | 110     | 0.8208        | -                         |
| 0.9302     | 120     | 0.7684        | -                         |
| 1.0078     | 130     | 0.7049        | -                         |
| 1.0853     | 140     | 0.7378        | -                         |
| 1.1628     | 150     | 0.6265        | 0.2941                    |
| 1.2403     | 160     | 0.6832        | -                         |
| 1.3178     | 170     | 0.6365        | -                         |
| 1.3953     | 180     | 0.5991        | -                         |
| 1.4729     | 190     | 0.5456        | -                         |
| **1.5504** | **200** | **0.6355**    | **0.2943**                |
| 1.6279     | 210     | 0.5927        | -                         |
| 1.7054     | 220     | 0.7117        | -                         |
| 1.7829     | 230     | 0.5096        | -                         |
| 1.8605     | 240     | 0.6036        | -                         |
| 1.9380     | 250     | 0.6768        | 0.2896                    |
| 2.0155     | 260     | 0.6589        | -                         |
| 2.0930     | 270     | 0.5436        | -                         |
| 2.1705     | 280     | 0.5173        | -                         |
| 2.2481     | 290     | 0.5544        | -                         |
| 2.3256     | 300     | 0.5583        | 0.2911                    |
| 2.4031     | 310     | 0.5903        | -                         |
| 2.4806     | 320     | 0.5265        | -                         |
| 2.5581     | 330     | 0.5107        | -                         |
| 2.6357     | 340     | 0.6144        | -                         |
| 2.7132     | 350     | 0.5175        | 0.2932                    |
| 2.7907     | 360     | 0.5805        | -                         |
| 2.8682     | 370     | 0.5299        | -                         |
| 2.9457     | 380     | 0.5621        | -                         |

* The bold row denotes the saved checkpoint.
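
To help read the `claims-dev_cosine_ndcg@10` column and the metrics in the Evaluation section, here is a rough sketch of how the per-query quantities behind accuracy@k, precision@k, recall@k, MRR@k and NDCG@k can be computed with binary relevance. It assumes every query has at least one relevant document; the function name `retrieval_metrics` and the document IDs are hypothetical, and the evaluator's exact implementation may differ in details. Reported scores would be averages of these values over all queries.

```python
import math


def retrieval_metrics(ranked_ids, relevant_ids, k=10):
    """Per-query retrieval metrics at cutoff k, assuming binary relevance."""
    top_k = ranked_ids[:k]
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    # Rank (1-based) of the first relevant document within the top k, if any.
    first_hit = next((rank for rank, hit in enumerate(hits, start=1) if hit), None)
    # DCG discounts each hit by log2(rank + 1); IDCG is the DCG of a perfect ranking.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant_ids)) + 1))
    return {
        "accuracy": 1.0 if any(hits) else 0.0,
        "precision": sum(hits) / k,
        "recall": sum(hits) / len(relevant_ids),
        "mrr": 1.0 / first_hit if first_hit else 0.0,
        "ndcg": dcg / idcg if idcg else 0.0,
    }


# Hypothetical ranking for one claim: "e2" and "e7" are relevant, only "e2" is retrieved.
ranked = ["e5", "e2", "e9", "e1", "e4"]
metrics = retrieval_metrics(ranked, relevant_ids={"e2", "e7"}, k=3)
print(metrics)
```

Under these definitions, a query with several relevant sentences can score 1.0 on accuracy@1 while its recall@1 stays below 1.0, which is consistent with the reported recall@1 (0.0958) being lower than accuracy@1 (0.2403).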

### Training Time

- **Training**: 32.6 minutes

### Framework Versions

- Python: 3.12.13
- Sentence Transformers: 5.4.1
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{oord2019representationlearningcontrastivepredictive,
    title={Representation Learning with Contrastive Predictive Coding},
    author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
    year={2019},
    eprint={1807.03748},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/1807.03748},
}
```