SentenceTransformer based on unsloth/embeddinggemma-300m

This is a sentence-transformers model finetuned from unsloth/embeddinggemma-300m. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: unsloth/embeddinggemma-300m
  • Maximum Sequence Length: 2048 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'embedding_dimension': 768, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
  (4): Normalize({})
)
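As a rough illustration of the pipeline above, the snippet below reproduces the mean pooling and final L2 normalization steps by hand (the two Dense projections are omitted for brevity). This is a sketch, not the model's actual code:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over the sequence, ignoring padding positions."""
    mask = attention_mask.unsqueeze(-1).float()    # [batch, seq, 1]
    summed = (token_embeddings * mask).sum(dim=1)  # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1e-9)       # [batch, 1], avoid div-by-zero
    return summed / counts

# Toy check: one "sentence" of 3 token positions, last position padded.
tokens = torch.randn(1, 3, 768)
mask = torch.tensor([[1, 1, 0]])
pooled = mean_pool(tokens, mask)
normalized = torch.nn.functional.normalize(pooled, p=2, dim=1)  # the final Normalize module
print(pooled.shape)  # torch.Size([1, 768])
```

After normalization, every sentence embedding has unit length, which is why cosine similarity reduces to a dot product for this model.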

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gem-Software/embeddinggemma-300m-gem-v5-hyde")
# Run inference
queries = [
    'Senior Engineering Manager at Airbnb (2020-Present), Engineering Manager at Uber (2016-2020), Software Engineer at Amazon (2013-2016) | BASc Computer Engineering, University of Waterloo | Search Infrastructure, Machine Learning, Distributed Systems, Elasticsearch, Team Leadership, Ranking Algorithms, A/B Testing, Personalization, Roadmap Planning | Engineering leader with 11 years building search and discovery products at marketplace scale, currently leading a team of 20 engineers at Airbnb focused on search infrastructure and personalized ranking systems.',
]
documents = [
    'Yahoo | Technical Lead - Social Patform | Worked with Yahoo Lab to develop identity mapping algorithm to unify various social graphs like Yahoo, Facebook, Twitter and Flickr to create a unified view of a person on internet.Technologies: Hadoop,HBase, Pig | Yahoo | Principal Engineer - Conversational Assistant | Developed interactive Natural Language Understanding Platform for chatbots. Platform provides Intent classification, Entity Detection, Dialog understanding, Slot filling, Domain detection etcWave of Yahoo Bots on Kik and Facebook heavily relied on this platform for online active learning.Technologies: Spark ML, Weka, Stanford NLP, CRF++ | Amazon | Senior Engineering Manager - Alexa Video | I help build personalized voice search and discovery experience on devices like Fire TV and Echo Show. When you ask Alexa to play a video e.g. play the 83 world cup movie, play something on Netflix, play the Seahawks game or tune to oscars etc., it will be my teams behind indexing, searching and ranking to select "the" video entity you are interested to watch. Similarly, when you are in an ambient mode on these devices and see the latest season of Marvelous Mrs. Maisel, a continue watching carousal or content similar to what you have watched pop-up on your screen, its highly likely that they are developed by my team. We are invested in developing (1) real time indexing solutions to several hundred million entities (2) state of the art deep learning based information retrieval and personalization ranking solutions and (3) low latency and highly available ML services that powers millions concurrent users. | Citrix | Lead Development Engineer | Developed Web Publishing Platform for citrix.com | Yahoo | Technical Lead- User Reputation | Developed platform to compute global and category wise user reputation scores. 
These scores were used as signals for content personalization, comment ranking , abuse detection and customer care | Amazon | Software Development Manager- Alexa Info',
    'Esri | Product Engineering Intern |  | Georgia Institute of Technology | Student |  | ServiceNow | Application Developer | &#x2022; Full-stack development of a productivity application extension for team schedule management. Intended for production.<br> &#x2022; Designed database schema structure to efficiently handle concurrent operations for hundreds of team members<br> &#x2022; REST API development in JavaScript using ServiceNow internal tooling to support application operations<br> &#x2022; Front-end development using internal codeless platform as well as ServiceNow internal tool (SEISMIC/Tectonic) similar to React. <br> &#x2022; Also built internal tool for finding Zoom meeting timestamps with transcripts relevant to user&#x2019;s search term. | Meta | Software Engineer | Working across the stack in ads and ad delivery <br><br>- Native calling for lead generation ad products<br>- Machine learning methods for related ads <!----> | Stealth | Software Engineer | Making an AI assistant for friend groups <!----> | Georgia Tech college of computing | Teaching Assistant | As a teaching assistant for CS3510, GT&apos;s algorithm design and analysis course, I:<br>- Hold weekly office hours for students who want to better explore and understand algorithm design and analysis<br>- Host and answer discussions online pertaining to class material<br>- Grade homework and tests for 200 students <!----> | Georgia Institute of Technology | Teaching Assistant |  | Retool | Software Engineer | AI agents <!----> | SWE + AI + ML. Retool, Meta | "Once you know that you can work with purpose, it becomes hard to work without it." |  |  |  | ',
    "Schlumberger | Software Engineer | Built surface control systems for underground robots in C++. | Coinbase | Director of Engineering, Transactions |  | Airbnb | Employee Payments Software Engineer | Braintree Credit Card Tools Vault and Transaction Processing - Payments Event-based financial reporting system | Coinbase | Head of Payments Risk |  | Coinbase | Senior Director, Engineering, Trading |  | Airbnb | Engineering Manager and Payments Technical Lead | - Airbnb's price accuracy system - Airbnb's next generation payment ecosystem - Airbnb's billing system | Coinbase | Senior Engineering Manager, Risk/Payments |  | Coinbase | Director of Payments Engineering |  | Square | Senior Software Engineer | - Square Store Payments (15 months, Ruby, JavaScript) - Square Marketplace Search Infrastructure (9 months, Java) | Coinbase | engineering manager, onboarding/payments/risk |  | Coinbase | Software Engineer | Lead cross-functional teams on all matters related to fraud and risk. Also, in other engineering roles, lead payments and risk engineering teams | Software Engineer |  |  |  |  | ",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8965, 0.6802, 0.8064]])
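For retrieval, the similarity scores are typically used to rank documents per query. Continuing from the scores printed above (hard-coded here so the sketch is self-contained):

```python
import torch

# Similarity scores shaped like the output above: 1 query x 3 documents.
similarities = torch.tensor([[0.8965, 0.6802, 0.8064]])

# Rank document indices for each query, best match first.
ranking = torch.argsort(similarities, dim=1, descending=True)
print(ranking.tolist())  # [[0, 2, 1]]
```

Here the first document (the Alexa search/ranking engineering manager) is the best match for the search-infrastructure leadership query, as expected.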

Training Details

Training Dataset

Unnamed Dataset

  • Size: 21,727 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | score |
    |:--------|:----------|:----------|:------|
    | type    | string    | string    | float |
    | details | min: 99 tokens<br>mean: 276.31 tokens<br>max: 385 tokens | min: 24 tokens<br>mean: 298.5 tokens<br>max: 659 tokens | min: 0.0<br>mean: 0.52<br>max: 1.0 |
  • Samples:
    sentence1 sentence2 score
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Berkeley Unified School DIsrict | Educator | | WCCUSD | Educator | | Educator at WCCUSD | | | | | 0.0
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Vkan Tech Solutions Software Developer
    Staff Software Engineer at Airbnb (2019-Present, 5 years), Senior Software Engineer at Uber (2015-2019, 4 years), Software Engineer at Square (2011-2015, 4 years) | BS Computer Science, UC Berkeley | React, Ruby on Rails, Go, PostgreSQL, Distributed Systems, Marketplace Dynamics, Full Stack Development, System Design, Agile Leadership | Staff-level Full Stack Engineer with Uber Senior Staff Software Engineer, TLM
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
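With `MSELoss` as the inner loss and an `Identity` score transformation, CosineSimilarityLoss regresses the cosine similarity of the two embeddings onto the gold score. A minimal sketch of that computation (not the library's exact implementation):

```python
import torch
import torch.nn.functional as F

def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, scores: torch.Tensor) -> torch.Tensor:
    """MSE between the cosine similarity of two embeddings and the gold score.

    The score transformation is Identity here, matching the config above.
    """
    cos = F.cosine_similarity(emb1, emb2, dim=1)
    return F.mse_loss(cos, scores)

# Toy example: identical vectors have cosine similarity 1.0,
# so a gold score of 1.0 yields zero loss.
u = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
loss = cosine_similarity_loss(u, u, torch.tensor([1.0, 1.0]))
print(float(loss))  # 0.0
```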
    

Evaluation Dataset

Unnamed Dataset

  • Size: 8,717 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:

    |         | sentence1 | sentence2 | score |
    |:--------|:----------|:----------|:------|
    | type    | string    | string    | float |
    | details | min: 119 tokens<br>mean: 174.89 tokens<br>max: 322 tokens | min: 21 tokens<br>mean: 303.78 tokens<br>max: 538 tokens | min: 0.0<br>mean: 0.59<br>max: 1.0 |
  • Samples:
    sentence1 sentence2 score
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. Direct Current Co., Ltd. Operating Department Intern
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. Roots Industries India | Data Science Intern | During my internship at Roots Industries India Private Limited, I developed a robust forecasting model to predict product sales quantities for future years using Python and a dataset containing over 3 lakh records of sales data. Under my mentor's guidance, I implemented an ARIMA model for time series forecasting, leveraging its effectiveness in capturing trends and seasonality. When the ARIMA model faced performance challenges, I integrated exponential smoothing to enhance predictive accuracy. | Student at Amrita Vishwa Vidyapeetham | | | | 5 days workshop on Cricket Analytics|Introduction to Data Analysis using Microsoft Excel | 0.5
    Senior Data Scientist, Supply Chain Analytics at Wayfair (2021-Present), Data Scientist at PepsiCo (2018-2021), Data Analyst at Target (2016-2018) | MS Data Science, Northeastern University; BS Statistics, University of Massachusetts Amherst | Time Series Forecasting, ARIMA, Prophet, LSTMs, Transformers, Python, SQL, S&OP Planning, Demand Planning, Inventory Optimization, Production ML Systems, AWS | Data scientist specializing in demand forecasting and S&OP planning with 7+ years of experience building and deploying production-grade forecasting models that drive strategic supply chain decisions and optimize inventory management. TotalEnergies | Console Operator | Alkylation/Cogeneration/Process Water Treatment Center | Console Operator @ TotalEnergies | Industrial Technology | Experienced Console Operator; seeking a Supervisor Role.

    AAS Instrumentation Technology Degree (2016)
    BS Industrial Technology Degree (2025)
    MS Engineering Management (expected 2027)


    8 yrs of Refinery Experience

    ~ 3 yrs Console Operator at TotalEnergies
    ~ 3 yrs Process Operator at TotalEnergies
    ~ 2 yrs packaging operator at Lion Elastomers | Troubleshooting|Sales|Refinery|Team Building|Customer Service|Social Media|Strategic Planning|Maintenance Management|Petroleum|Engineering|Calibration|Inspection|Electricians|Commissioning|Electronics|Maintenance|Microsoft Office|Microsoft Word|Veterans|Leadership|Maintenance & Repair | | |
    0.0
  • Loss: CosineSimilarityLoss with these parameters:
    {
        "loss_fct": "torch.nn.modules.loss.MSELoss",
        "cos_score_transformation": "torch.nn.modules.linear.Identity"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • learning_rate: 2e-05
  • warmup_steps: 0.1
  • gradient_accumulation_steps: 2
  • bf16: True
  • gradient_checkpointing: True
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates
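Note that with gradient accumulation, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 32 × 2 = 64 per device. The run also uses a linear scheduler (lr_scheduler_type: linear) with warmup. The sketch below illustrates that schedule in plain Python; it is not the Trainer's internal code, and since the listed warmup_steps of 0.1 reads like a ratio, a concrete step count of 102 (10% of the 1020 optimizer steps in the log) is assumed here for illustration:

```python
def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float = 2e-5) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total, warmup = 1020, 102  # 1020 optimizer steps from the training log; warmup assumed
print(linear_schedule_lr(0, total, warmup))     # 0.0 (start of warmup)
print(linear_schedule_lr(102, total, warmup))   # 2e-05 (peak learning rate)
print(linear_schedule_lr(1020, total, warmup))  # 0.0 (end of decay)
```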

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 3
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 2
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0736 | 25   | 0.0704        | -               |
| 0.1473 | 50   | 0.0403        | -               |
| 0.2209 | 75   | 0.0403        | -               |
| 0.2946 | 100  | 0.0384        | 0.0401          |
| 0.3682 | 125  | 0.0365        | -               |
| 0.4418 | 150  | 0.0367        | -               |
| 0.5155 | 175  | 0.0343        | -               |
| 0.5891 | 200  | 0.0344        | 0.0360          |
| 0.6627 | 225  | 0.0348        | -               |
| 0.7364 | 250  | 0.0322        | -               |
| 0.8100 | 275  | 0.0328        | -               |
| 0.8837 | 300  | 0.0305        | 0.0353          |
| 0.9573 | 325  | 0.0327        | -               |
| 1.0295 | 350  | 0.0327        | -               |
| 1.1031 | 375  | 0.0256        | -               |
| 1.1767 | 400  | 0.0248        | 0.0358          |
| 1.2504 | 425  | 0.0239        | -               |
| 1.3240 | 450  | 0.0255        | -               |
| 1.3976 | 475  | 0.0229        | -               |
| 1.4713 | 500  | 0.0246        | 0.0341          |
| 1.5449 | 525  | 0.0239        | -               |
| 1.6186 | 550  | 0.0213        | -               |
| 1.6922 | 575  | 0.0230        | -               |
| 1.7658 | 600  | 0.0223        | 0.0328          |
| 1.8395 | 625  | 0.0212        | -               |
| 1.9131 | 650  | 0.0208        | -               |
| 1.9867 | 675  | 0.0255        | -               |
| 2.0589 | 700  | 0.0192        | 0.0376          |
| 2.1325 | 725  | 0.0154        | -               |
| 2.2062 | 750  | 0.0147        | -               |
| 2.2798 | 775  | 0.0143        | -               |
| 2.3535 | 800  | 0.0128        | 0.0326          |
| 2.4271 | 825  | 0.0127        | -               |
| 2.5007 | 850  | 0.0131        | -               |
| 2.5744 | 875  | 0.0130        | -               |
| 2.6480 | 900  | 0.0137        | 0.0328          |
| 2.7216 | 925  | 0.0139        | -               |
| 2.7953 | 950  | 0.0129        | -               |
| 2.8689 | 975  | 0.0126        | -               |
| 2.9426 | 1000 | 0.0126        | 0.0324          |
| 3.0    | 1020 | -             | 0.0324          |
  • The saved checkpoint is the one with the lowest validation loss (0.0324), per load_best_model_at_end.

Training Time

  • Training: 1.1 hours
  • Evaluation: 32.4 minutes
  • Total: 1.6 hours

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 5.4.0
  • Transformers: 5.5.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}