SentenceTransformer based on unsloth/embeddinggemma-300m

This is a sentence-transformers model finetuned from unsloth/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: unsloth/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (4): Normalize({})
)
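As a rough illustration of the pipeline above, the snippet below reproduces the mean pooling and final L2 normalization steps by hand (the two Dense projections are omitted for brevity). This is a sketch, not the model's actual code:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()    # [batch, seq, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1e-9)       # [batch, 1], avoid div-by-zero
    return summed / counts

# Toy check: one "sentence" of 3 token positions, last position padded.
tokens = torch.randn(1, 3, 768)
mask = torch.tensor([[1, 1, 0]])
pooled = mean_pool(tokens, mask)
normalized = torch.nn.functional.normalize(pooled, p=2, dim=1)  # the final Normalize module
print(pooled.shape)  # torch.Size([1, 768])
```

After normalization, every sentence embedding has unit length, which is why cosine similarity reduces to a dot product for this model.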

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gem-Software/embeddinggemma-300m-gem-v5-hyde")
# Run inference
queries = [
    'Senior Engineering Manager at Airbnb (2020-Present), Engineering Manager at Uber (2016-2020), Software Engineer at Amazon (2013-2016) | BASc Computer Engineering, University of Waterloo | Search Infrastructure, Machine Learning, Distributed Systems, Elasticsearch, Team Leadership, Ranking Algorithms, A/B Testing, Personalization, Roadmap Planning | Engineering leader with 11 years building search and discovery products at marketplace scale, currently leading a team of 20 engineers at Airbnb focused on search infrastructure and personalized ranking systems.',
]
documents = [
    'Yahoo | Technical Lead - Social Patform | Worked with Yahoo Lab to develop identity mapping algorithm to unify various social graphs like Yahoo, Facebook, Twitter and Flickr to create a unified view of a person on internet.Technologies: Hadoop,HBase, Pig | Yahoo | Principal Engineer - Conversational Assistant | Developed interactive Natural Language Understanding Platform for chatbots. Platform provides Intent classification, Entity Detection, Dialog understanding, Slot filling, Domain detection etcWave of Yahoo Bots on Kik and Facebook heavily relied on this platform for online active learning.Technologies: Spark ML, Weka, Stanford NLP, CRF++ | Amazon | Senior Engineering Manager - Alexa Video | I help build personalized voice search and discovery experience on devices like Fire TV and Echo Show. When you ask Alexa to play a video e.g. play the 83 world cup movie, play something on Netflix, play the Seahawks game or tune to oscars etc., it will be my teams behind indexing, searching and ranking to select "the" video entity you are interested to watch. Similarly, when you are in an ambient mode on these devices and see the latest season of Marvelous Mrs. Maisel, a continue watching carousal or content similar to what you have watched pop-up on your screen, its highly likely that they are developed by my team. We are invested in developing (1) real time indexing solutions to several hundred million entities (2) state of the art deep learning based information retrieval and personalization ranking solutions and (3) low latency and highly available ML services that powers millions concurrent users. | Citrix | Lead Development Engineer | Developed Web Publishing Platform for citrix.com | Yahoo | Technical Lead- User Reputation | Developed platform to compute global and category wise user reputation scores. 
These scores were used as signals for content personalization, comment ranking , abuse detection and customer care | Amazon | Software Development Manager- Alexa Info',
    'Esri | Product Engineering Intern |  | Georgia Institute of Technology | Student |  | ServiceNow | Application Developer | &#x2022; Full-stack development of a productivity application extension for team schedule management. Intended for production.<br> &#x2022; Designed database schema structure to efficiently handle concurrent operations for hundreds of team members<br> &#x2022; REST API development in JavaScript using ServiceNow internal tooling to support application operations<br> &#x2022; Front-end development using internal codeless platform as well as ServiceNow internal tool (SEISMIC/Tectonic) similar to React. <br> &#x2022; Also built internal tool for finding Zoom meeting timestamps with transcripts relevant to user&#x2019;s search term. | Meta | Software Engineer | Working across the stack in ads and ad delivery <br><br>- Native calling for lead generation ad products<br>- Machine learning methods for related ads <!----> | Stealth | Software Engineer | Making an AI assistant for friend groups <!----> | Georgia Tech college of computing | Teaching Assistant | As a teaching assistant for CS3510, GT&apos;s algorithm design and analysis course, I:<br>- Hold weekly office hours for students who want to better explore and understand algorithm design and analysis<br>- Host and answer discussions online pertaining to class material<br>- Grade homework and tests for 200 students <!----> | Georgia Institute of Technology | Teaching Assistant |  | Retool | Software Engineer | AI agents <!----> | SWE + AI + ML. Retool, Meta | "Once you know that you can work with purpose, it becomes hard to work without it." |  |  |  | ',
    "Schlumberger | Software Engineer | Built surface control systems for underground robots in C++. | Coinbase | Director of Engineering, Transactions |  | Airbnb | Employee Payments Software Engineer | Braintree Credit Card Tools Vault and Transaction Processing - Payments Event-based financial reporting system | Coinbase | Head of Payments Risk |  | Coinbase | Senior Director, Engineering, Trading |  | Airbnb | Engineering Manager and Payments Technical Lead | - Airbnb's price accuracy system - Airbnb's next generation payment ecosystem - Airbnb's billing system | Coinbase | Senior Engineering Manager, Risk/Payments |  | Coinbase | Director of Payments Engineering |  | Square | Senior Software Engineer | - Square Store Payments (15 months, Ruby, JavaScript) - Square Marketplace Search Infrastructure (9 months, Java) | Coinbase | engineering manager, onboarding/payments/risk |  | Coinbase | Software Engineer | Lead cross-functional teams on all matters related to fraud and risk. Also, in other engineering roles, lead payments and risk engineering teams | Software Engineer |  |  |  |  | ",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8965, 0.6802, 0.8064]])
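For retrieval, the similarity scores are typically used to rank documents per query. Continuing from the scores printed above (hard-coded here so the sketch is self-contained):

```python
import torch

# Similarity scores shaped like the output above: 1 query x 3 documents.
similarities = torch.tensor([[0.8965, 0.6802, 0.8064]])

# Rank document indices for each query, best match first.
ranking = torch.argsort(similarities, dim=1, descending=True)
print(ranking.tolist())  # [[0, 2, 1]]
```

Here the first document (the Alexa search/ranking engineering manager) is the best match for the search-infrastructure leadership query, as expected.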

Training Details

Training Dataset

Unnamed Dataset

  • Size: 21,727 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | score |
    |:--------|:----------|:----------|:------|
    | type    | string    | string    | float |
    | details | min: 99 tokens<br>mean: 276.31 tokens<br>max: 385 tokens | min: 24 tokens<br>mean: 298.5 tokens<br>max: 659 tokens | min: 0.0<br>mean: 0.52<br>max: 1.0 |
  • Samples:
    sentence1 sentence2 score
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Berkeley Unified School DIsrict | Educator | | WCCUSD | Educator | | Educator at WCCUSD | | | | | 0.0
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Vkan Tech Solutions Software Developer
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Uber Senior Staff Software Engineer, TLM
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
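With `MSELoss` as the inner loss and an `Identity` score transformation, CosineSimilarityLoss regresses the cosine similarity of the two embeddings onto the gold score. A minimal sketch of that computation (not the library's exact implementation):

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """MSE between the cosine similarity of two embeddings and the gold score.

    The score transformation is Identity here, matching the config above.
    """
    cos = F.cosine_similarity(emb1, emb2, dim=1)
    return F.mse_loss(cos, scores)

# Toy example: identical vectors have cosine similarity 1.0,
# so a gold score of 1.0 yields zero loss.
u = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
loss = cosine_similarity_loss(u, u, torch.tensor([1.0, 1.0]))
print(float(loss))  # 0.0
```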
    

Evaluation Dataset

Unnamed Dataset

  • Size: 8,717 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | score |
    |:--------|:----------|:----------|:------|
    | type    | string    | string    | float |
    | details | min: 119 tokens<br>mean: 174.89 tokens<br>max: 322 tokens | min: 21 tokens<br>mean: 303.78 tokens<br>max: 538 tokens | min: 0.0<br>mean: 0.59<br>max: 1.0 |
  • Samples:
    sentence1 sentence2 score
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. Direct Current Co., Ltd. Operating Department Intern
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. Roots Industries India | Data Science Intern | During my internship at Roots Industries India Private Limited, I developed a robust forecasting model to predict product sales quantities for future years using Python and a dataset containing over 3 lakh records of sales data. Under my mentor's guidance, I implemented an ARIMA model for time series forecasting, leveraging its effectiveness in capturing trends and seasonality. When the ARIMA model faced performance challenges, I integrated exponential smoothing to enhance predictive accuracy. | Student at Amrita Vishwa Vidyapeetham | | | | 5 days workshop on Cricket Analytics|Introduction to Data Analysis using Microsoft Excel | 0.5
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. TotalEnergies | Console Operator | Alkylation/Cogeneration/Process Water Treatment Center | Console Operator @ TotalEnergies | Industrial Technology | Experienced Console Operator; seeking a Supervisor Role.

    AAS Instrumentation Technology Degree (2016)
    BS Industrial Technology Degree (2025)
    MS Engineering Management (expected 2027)


    8 yrs of Refinery Experience

    ~ 3 yrs Console Operator at TotalEnergies
    ~ 3 yrs Process Operator at TotalEnergies
    ~ 2 yrs packaging operator at Lion Elastomers | Troubleshooting|Sales|Refinery|Team Building|Customer Service|Social Media|Strategic Planning|Maintenance Management|Petroleum|Engineering|Calibration|Inspection|Electricians|Commissioning|Electronics|Maintenance|Microsoft Office|Microsoft Word|Veterans|Leadership|Maintenance & Repair | | |
    0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • warmup_steps: 0.1
  • gradient_accumulation_steps: 2
  • bf16: True
  • gradient_checkpointing: True
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
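Note that with gradient accumulation, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 32 × 2 = 64 per device. The run also uses a linear scheduler (lr_scheduler_type: linear) with warmup. The sketch below illustrates that schedule in plain Python; it is not the Trainer's internal code, and since the listed warmup_steps of 0.1 reads like a ratio, a concrete step count of 102 (10% of the 1020 optimizer steps in the log) is assumed here for illustration:

```python
def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float = 2e-5) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total, warmup = 1020, 102  # 1020 optimizer steps from the training log; warmup assumed
print(linear_schedule_lr(0, total, warmup))     # 0.0 (start of warmup)
print(linear_schedule_lr(102, total, warmup))   # 2e-05 (peak learning rate)
print(linear_schedule_lr(1020, total, warmup))  # 0.0 (end of decay)
```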

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 3
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 2
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0736 | 25   | 0.0704        | -               |
| 0.1473 | 50   | 0.0403        | -               |
| 0.2209 | 75   | 0.0403        | -               |
| 0.2946 | 100  | 0.0384        | 0.0401          |
| 0.3682 | 125  | 0.0365        | -               |
| 0.4418 | 150  | 0.0367        | -               |
| 0.5155 | 175  | 0.0343        | -               |
| 0.5891 | 200  | 0.0344        | 0.0360          |
| 0.6627 | 225  | 0.0348        | -               |
| 0.7364 | 250  | 0.0322        | -               |
| 0.8100 | 275  | 0.0328        | -               |
| 0.8837 | 300  | 0.0305        | 0.0353          |
| 0.9573 | 325  | 0.0327        | -               |
| 1.0295 | 350  | 0.0327        | -               |
| 1.1031 | 375  | 0.0256        | -               |
| 1.1767 | 400  | 0.0248        | 0.0358          |
| 1.2504 | 425  | 0.0239        | -               |
| 1.3240 | 450  | 0.0255        | -               |
| 1.3976 | 475  | 0.0229        | -               |
| 1.4713 | 500  | 0.0246        | 0.0341          |
| 1.5449 | 525  | 0.0239        | -               |
| 1.6186 | 550  | 0.0213        | -               |
| 1.6922 | 575  | 0.0230        | -               |
| 1.7658 | 600  | 0.0223        | 0.0328          |
| 1.8395 | 625  | 0.0212        | -               |
| 1.9131 | 650  | 0.0208        | -               |
| 1.9867 | 675  | 0.0255        | -               |
| 2.0589 | 700  | 0.0192        | 0.0376          |
| 2.1325 | 725  | 0.0154        | -               |
| 2.2062 | 750  | 0.0147        | -               |
| 2.2798 | 775  | 0.0143        | -               |
| 2.3535 | 800  | 0.0128        | 0.0326          |
| 2.4271 | 825  | 0.0127        | -               |
| 2.5007 | 850  | 0.0131        | -               |
| 2.5744 | 875  | 0.0130        | -               |
| 2.6480 | 900  | 0.0137        | 0.0328          |
| 2.7216 | 925  | 0.0139        | -               |
| 2.7953 | 950  | 0.0129        | -               |
| 2.8689 | 975  | 0.0126        | -               |
| 2.9426 | 1000 | 0.0126        | 0.0324          |
| 3.0    | 1020 | -             | 0.0324          |
  • The saved checkpoint is the one with the lowest validation loss (0.0324), per load_best_model_at_end.

Training Time

  • Training: 1.1 hours
  • Evaluation: 32.4 minutes
  • Total: 1.6 hours

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 5.4.0
  • Transformers: 5.5.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}