This is a sentence-transformers model finetuned from unsloth/embeddinggemma-300m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for retrieval.
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'Gemma3TextModel'})
(1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})
(2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
(3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
(4): Normalize({})
)
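The module stack above can be read as a small pipeline: mean pooling over token embeddings, two bias-free linear projections with identity activation, then L2 normalization. The sketch below illustrates that flow in plain Python. It uses toy dimensions and random weights (the real modules are 768 -> 3072 -> 768), ignores padding masks, and all function names are hypothetical; it is not the library's implementation.

```python
import math
import random


def mean_pool(token_embeddings):
    """Average the token vectors into a single sentence vector."""
    count = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / count for i in range(dim)]


def dense(vec, weight):
    """Bias-free linear layer with identity activation; weight is (out x in)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def l2_normalize(vec):
    """Scale the vector to unit length, as the final Normalize module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def embed(token_embeddings, w_up, w_down):
    """Pooling -> Dense (up) -> Dense (down) -> Normalize."""
    pooled = mean_pool(token_embeddings)
    return l2_normalize(dense(dense(pooled, w_up), w_down))


# Toy sizes for readability (assumption); the card lists 768 and 3072.
rng = random.Random(0)
HIDDEN, WIDE, TOKENS = 4, 16, 5
tokens = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(TOKENS)]
w_up = [[rng.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(WIDE)]
w_down = [[rng.gauss(0, 1) for _ in range(WIDE)] for _ in range(HIDDEN)]

sentence_embedding = embed(tokens, w_up, w_down)
print(len(sentence_embedding))
```

Because the output is unit length, the dot product of two such embeddings equals their cosine similarity.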
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Gem-Software/embeddinggemma-300m-gem-v5-hyde")
# Run inference
queries = [
'Senior Engineering Manager at Airbnb (2020-Present), Engineering Manager at Uber (2016-2020), Software Engineer at Amazon (2013-2016) | BASc Computer Engineering, University of Waterloo | Search Infrastructure, Machine Learning, Distributed Systems, Elasticsearch, Team Leadership, Ranking Algorithms, A/B Testing, Personalization, Roadmap Planning | Engineering leader with 11 years building search and discovery products at marketplace scale, currently leading a team of 20 engineers at Airbnb focused on search infrastructure and personalized ranking systems.',
]
documents = [
'Yahoo | Technical Lead - Social Patform | Worked with Yahoo Lab to develop identity mapping algorithm to unify various social graphs like Yahoo, Facebook, Twitter and Flickr to create a unified view of a person on internet.Technologies: Hadoop,HBase, Pig | Yahoo | Principal Engineer - Conversational Assistant | Developed interactive Natural Language Understanding Platform for chatbots. Platform provides Intent classification, Entity Detection, Dialog understanding, Slot filling, Domain detection etcWave of Yahoo Bots on Kik and Facebook heavily relied on this platform for online active learning.Technologies: Spark ML, Weka, Stanford NLP, CRF++ | Amazon | Senior Engineering Manager - Alexa Video | I help build personalized voice search and discovery experience on devices like Fire TV and Echo Show. When you ask Alexa to play a video e.g. play the 83 world cup movie, play something on Netflix, play the Seahawks game or tune to oscars etc., it will be my teams behind indexing, searching and ranking to select "the" video entity you are interested to watch. Similarly, when you are in an ambient mode on these devices and see the latest season of Marvelous Mrs. Maisel, a continue watching carousal or content similar to what you have watched pop-up on your screen, its highly likely that they are developed by my team. We are invested in developing (1) real time indexing solutions to several hundred million entities (2) state of the art deep learning based information retrieval and personalization ranking solutions and (3) low latency and highly available ML services that powers millions concurrent users. | Citrix | Lead Development Engineer | Developed Web Publishing Platform for citrix.com | Yahoo | Technical Lead- User Reputation | Developed platform to compute global and category wise user reputation scores. These scores were used as signals for content personalization, comment ranking , abuse detection and customer care | Amazon | Software Development Manager- Alexa Info',
'Esri | Product Engineering Intern | | Georgia Institute of Technology | Student | | ServiceNow | Application Developer | • Full-stack development of a productivity application extension for team schedule management. Intended for production.<br> • Designed database schema structure to efficiently handle concurrent operations for hundreds of team members<br> • REST API development in JavaScript using ServiceNow internal tooling to support application operations<br> • Front-end development using internal codeless platform as well as ServiceNow internal tool (SEISMIC/Tectonic) similar to React. <br> • Also built internal tool for finding Zoom meeting timestamps with transcripts relevant to user’s search term. | Meta | Software Engineer | Working across the stack in ads and ad delivery <br><br>- Native calling for lead generation ad products<br>- Machine learning methods for related ads <!----> | Stealth | Software Engineer | Making an AI assistant for friend groups <!----> | Georgia Tech college of computing | Teaching Assistant | As a teaching assistant for CS3510, GT\'s algorithm design and analysis course, I:<br>- Hold weekly office hours for students who want to better explore and understand algorithm design and analysis<br>- Host and answer discussions online pertaining to class material<br>- Grade homework and tests for 200 students <!----> | Georgia Institute of Technology | Teaching Assistant | | Retool | Software Engineer | AI agents <!----> | SWE + AI + ML. Retool, Meta | "Once you know that you can work with purpose, it becomes hard to work without it." | | | | ',
"Schlumberger | Software Engineer | Built surface control systems for underground robots in C++. | Coinbase | Director of Engineering, Transactions | | Airbnb | Employee Payments Software Engineer | Braintree Credit Card Tools Vault and Transaction Processing - Payments Event-based financial reporting system | Coinbase | Head of Payments Risk | | Coinbase | Senior Director, Engineering, Trading | | Airbnb | Engineering Manager and Payments Technical Lead | - Airbnb's price accuracy system - Airbnb's next generation payment ecosystem - Airbnb's billing system | Coinbase | Senior Engineering Manager, Risk/Payments | | Coinbase | Director of Payments Engineering | | Square | Senior Software Engineer | - Square Store Payments (15 months, Ruby, JavaScript) - Square Marketplace Search Infrastructure (9 months, Java) | Coinbase | engineering manager, onboarding/payments/risk | | Coinbase | Software Engineer | Lead cross-functional teams on all matters related to fraud and risk. Also, in other engineering roles, lead payments and risk engineering teams | Software Engineer | | | | | ",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8965, 0.6802, 0.8064]])
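For retrieval, the similarity row for a query is typically sorted to rank the candidate documents. The following is a minimal sketch using the scores printed above, hard-coded as a plain list so that it runs without downloading the model; `rank_documents` is a hypothetical helper, not part of the library.

```python
# Scores copied from the printed similarity tensor (one query, three documents).
similarities = [[0.8965, 0.6802, 0.8064]]


def rank_documents(scores):
    """Return (document_index, score) pairs, most similar first."""
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


ranking = rank_documents(similarities[0])
for doc_index, score in ranking:
    print(doc_index, score)
```

With these scores the first document (the Alexa search and ranking manager profile) ranks highest, followed by the third and then the second.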
Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| sentence1 | sentence2 | score |
|---|---|---|
| Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) \| BS Computer Science, UC Berkeley \| React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership \| Staff-level Full Stack Engineer with | Berkeley Unified School DIsrict \| Educator \| \| WCCUSD \| Educator \| \| Educator at WCCUSD \| \| \| \| \| | 0.0 |
| Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) \| BS Computer Science, UC Berkeley \| React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership \| Staff-level Full Stack Engineer with | Vkan Tech Solutions \| Software Developer \| | |
| Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) \| BS Computer Science, UC Berkeley \| React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership \| Staff-level Full Stack Engineer with | Uber \| Senior Staff Software Engineer, TLM \| | |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss",
"cos_score_transformation": "torch.nn.modules.linear.Identity"
}
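With these parameters, the loss compares the cosine similarity of the two sentence embeddings against the labelled score using mean squared error, and the identity transformation leaves the cosine value unchanged. A minimal sketch of that computation in plain Python follows. The toy vectors and labels are made up for illustration, and the function names are hypothetical rather than the library's.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs, labels):
    """Mean squared error between cos(u, v) and the gold score for each pair."""
    predictions = [cosine_similarity(u, v) for u, v in pairs]
    squared_errors = [(p - y) ** 2 for p, y in zip(predictions, labels)]
    return sum(squared_errors) / len(squared_errors)


# Toy batch (illustrative values only): identical vectors and orthogonal vectors.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
labels = [1.0, 0.5]

loss = cosine_similarity_loss(pairs, labels)
print(loss)  # 0.125: errors are 0.0 and 0.5, squared and averaged
```

Under this objective, a pair labelled 0.0 is pushed toward orthogonal embeddings and a pair labelled 1.0 toward identical directions.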
Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |
| sentence1 | sentence2 | score |
|---|---|---|
| Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) \| MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst \| Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS \| Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. | Direct Current Co., Ltd. \| Operating Department Intern \| | |
| Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) \| MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst \| Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS \| Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. | Roots Industries India \| Data Science Intern \| During my internship at Roots Industries India Private Limited, I developed a robust forecasting model to predict product sales quantities for future years using Python and a dataset containing over 3 lakh records of sales data. Under my mentor's guidance, I implemented an ARIMA model for time series forecasting, leveraging its effectiveness in capturing trends and seasonality. When the ARIMA model faced performance challenges, I integrated exponential smoothing to enhance predictive accuracy. \| Student at Amrita Vishwa Vidyapeetham \| \| \| \| 5 days workshop on Cricket Analytics\|Introduction to Data Analysis using Microsoft Excel \| | 0.5 |
| Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) \| MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst \| Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS \| Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. | TotalEnergies \| Console Operator \| Alkylation/Cogeneration/Process Water Treatment Center \| Console Operator @ TotalEnergies \| Industrial Technology \| Experienced Console Operator; seeking a Supervisor Role. | 0.0 |
CosineSimilarityLoss with these parameters:
{
"loss_fct": "torch.nn.modules.loss.MSELoss",
"cos_score_transformation": "torch.nn.modules.linear.Identity"
}
Non-default hyperparameters:

- per_device_train_batch_size: 32
- learning_rate: 2e-05
- warmup_steps: 0.1
- gradient_accumulation_steps: 2
- bf16: True
- gradient_checkpointing: True
- eval_strategy: steps
- per_device_eval_batch_size: 32
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- per_device_train_batch_size: 32
- num_train_epochs: 3
- max_steps: -1
- learning_rate: 2e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 2
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: steps
- per_device_eval_batch_size: 32
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: True
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0736 | 25 | 0.0704 | - |
| 0.1473 | 50 | 0.0403 | - |
| 0.2209 | 75 | 0.0403 | - |
| 0.2946 | 100 | 0.0384 | 0.0401 |
| 0.3682 | 125 | 0.0365 | - |
| 0.4418 | 150 | 0.0367 | - |
| 0.5155 | 175 | 0.0343 | - |
| 0.5891 | 200 | 0.0344 | 0.0360 |
| 0.6627 | 225 | 0.0348 | - |
| 0.7364 | 250 | 0.0322 | - |
| 0.8100 | 275 | 0.0328 | - |
| 0.8837 | 300 | 0.0305 | 0.0353 |
| 0.9573 | 325 | 0.0327 | - |
| 1.0295 | 350 | 0.0327 | - |
| 1.1031 | 375 | 0.0256 | - |
| 1.1767 | 400 | 0.0248 | 0.0358 |
| 1.2504 | 425 | 0.0239 | - |
| 1.3240 | 450 | 0.0255 | - |
| 1.3976 | 475 | 0.0229 | - |
| 1.4713 | 500 | 0.0246 | 0.0341 |
| 1.5449 | 525 | 0.0239 | - |
| 1.6186 | 550 | 0.0213 | - |
| 1.6922 | 575 | 0.0230 | - |
| 1.7658 | 600 | 0.0223 | 0.0328 |
| 1.8395 | 625 | 0.0212 | - |
| 1.9131 | 650 | 0.0208 | - |
| 1.9867 | 675 | 0.0255 | - |
| 2.0589 | 700 | 0.0192 | 0.0376 |
| 2.1325 | 725 | 0.0154 | - |
| 2.2062 | 750 | 0.0147 | - |
| 2.2798 | 775 | 0.0143 | - |
| 2.3535 | 800 | 0.0128 | 0.0326 |
| 2.4271 | 825 | 0.0127 | - |
| 2.5007 | 850 | 0.0131 | - |
| 2.5744 | 875 | 0.0130 | - |
| 2.6480 | 900 | 0.0137 | 0.0328 |
| 2.7216 | 925 | 0.0139 | - |
| 2.7953 | 950 | 0.0129 | - |
| 2.8689 | 975 | 0.0126 | - |
| 2.9426 | 1000 | 0.0126 | 0.0324 |
| 3.0 | 1020 | - | 0.0324 |
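The hyperparameters and the log together imply a few derived quantities that are easy to check by hand. The sketch below works through them under stated assumptions: the number of devices is not given in the card, so the effective batch size is per device, and the validation rows are copied from the table as (step, loss) pairs. With `load_best_model_at_end: True`, the final weights would plausibly come from the checkpoint with the lowest validation loss, but the card does not state which one was kept.

```python
# Values taken from the hyperparameter list and the training log.
per_device_train_batch_size = 32
gradient_accumulation_steps = 2
num_train_epochs = 3
total_steps = 1020  # last step reported in the log

# Examples consumed per optimizer step on one device (device count unknown).
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = total_steps // num_train_epochs

# (step, validation loss) rows copied from the log.
validation_log = [
    (100, 0.0401), (200, 0.0360), (300, 0.0353), (400, 0.0358),
    (500, 0.0341), (600, 0.0328), (700, 0.0376), (800, 0.0326),
    (900, 0.0328), (1000, 0.0324), (1020, 0.0324),
]

# min() returns the first row with the lowest loss; steps 1000 and 1020 tie
# at the four decimals shown, so the log alone cannot separate them.
best_step, best_loss = min(validation_log, key=lambda row: row[1])

print(effective_batch_size, steps_per_epoch, best_step, best_loss)
```

Training loss keeps falling through the third epoch (to about 0.013) while validation loss plateaus near 0.032 to 0.033, which is the usual signature of the model starting to fit the training pairs more closely than it generalizes.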
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}