SentenceTransformer based on google/embeddinggemma-300m

This is a sentence-transformers model fine-tuned from google/embeddinggemma-300m on the nz-hansard-triplets dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: google/embeddinggemma-300m
Maximum Sequence Length: 2048 tokens
Output Dimensionality: 768 dimensions
Similarity Function: Cosine Similarity
Training Dataset:
- nz-hansard-triplets

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 2048, 'do_lower_case': False, 'architecture': 'Gemma3TextModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 3072, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Dense({'in_features': 3072, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (4): Normalize()
)
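
The modules after the transformer can be read as a small pipeline: mean pooling over token vectors, two bias-free linear layers with identity activation (768 to 3072, then 3072 to 768), and L2 normalization. The following is a minimal pure-Python sketch of that flow on toy sizes. The helper names, the 2- and 3-dimensional sizes, and the weight values are all hypothetical stand-ins, not the model's actual parameters.

```python
import math

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_vectors[0])
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def linear_no_bias(vector, weight):
    """Bias-free linear layer with identity activation: y = W x."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale the vector to unit length, as the final Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy sizes: hidden size 2 stands in for 768, expanded size 3 for 3072.
tokens = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last token is padding
mask = [1, 1, 0]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]    # hypothetical 2 -> 3 weights
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]    # hypothetical 3 -> 2 weights

pooled = mean_pool(tokens, mask)
expanded = linear_no_bias(pooled, w_up)
projected = linear_no_bias(expanded, w_down)
embedding = l2_normalize(projected)
print(embedding)
```

Because the last step normalizes to unit length, the cosine similarity of two embeddings equals their dot product.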

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dinushiTJ/nz-hansard-embedding-gemma-zeroshot")
# Run inference
queries = [
    "non_maori_origin",
]
documents = [
    'The significance lies in the fundamental activities of government: ownership, expenditure, and the regulation of private property. While our oversight mechanisms for state-owned assets are reasonably robust, despite frequently yielding unsatisfactory returns—indeed, the government often proves to be an inefficient proprietor, yet its performance is adequately reported, highlighting deficiencies. Furthermore, our framework for monitoring public spending is globally recognised as exemplary, largely due to the Fiscal Responsibility Act, now integrated into the Public Finance Act. This ensures comprehensive scrutiny of governmental outlays. Although some may argue spending is excessive, sound fiscal regulations foster a strong political aversion to deficit-running administrations, owing to enhanced transparency.',
    "It matters profoundly because Government's activities—ownership, spending, and regulation—often intersect with Māori property, including whenua, taonga, and intellectual property, which it neither owns nor has a right to tax without Treaty partnership. While there is some oversight for Crown ownership of assets, the returns for Māori on Treaty settlements or co-governance arrangements are often disappointing, reflecting a failure to uphold rangatiratanga. We have some regimes for oversight of Government expenditure, but these often lack specific mechanisms to ensure equitable distribution or Treaty-consistent investment in Māori development. There is a strong need for fiscal rules that explicitly account for Treaty obligations and ensure transparency in spending that impacts Māori, fostering accountability for outcomes for tangata whenua.",
    "Consequently, this initiative serves to align New Zealand's parliamentary procedures with international best practices and comparable systems globally. Experience from other jurisdictions where this approach has been piloted or adopted indicates a frequent rise in petition submissions. Furthermore, it has consistently highlighted to legislative bodies—irrespective of their global location—matters that, while perhaps not central to the legislators' immediate focus, are undeniably paramount for a substantial portion of the citizenry. Thus, this will unequivocally strengthen our nation's democratic framework.",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 768) (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.8994, -0.7277,  0.9108]])
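
One way to read these scores for classification is to rank the documents against the label query and apply a cut-off. The sketch below works on the scores printed above, copied into a plain list so that it runs without downloading the model. The threshold of 0.0 is a hypothetical decision rule chosen for illustration, not something specified by the model card.

```python
# Scores copied from the printed tensor above; with a live model they could
# be obtained with similarities[0].tolist().
scores = [0.8994, -0.7277, 0.9108]
query = "non_maori_origin"

# Sort document indices from most to least similar to the query label.
ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Hypothetical decision rule: treat a positive cosine score as a match.
threshold = 0.0
matches = [i for i, s in enumerate(scores) if s > threshold]

for i in ranked:
    verdict = "match" if i in matches else "no match"
    print(f"document {i}: {scores[i]:+.4f} -> {verdict} for {query!r}")
```

With this rule, the first and third documents match the label and the second, Māori-focused passage does not.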

Evaluation

Metrics

Triplet

Dataset: nz-hansard-triplet-eval
Evaluated with TripletEvaluator

Metric	Value
cosine_accuracy	1.0
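
For intuition, cosine accuracy on triplets is the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The following is a minimal sketch of that calculation on toy 2-dimensional vectors. The function names and data are illustrative assumptions, and this is not the TripletEvaluator implementation itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_accuracy(triplets):
    """Share of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(1 for a, p, n in triplets if cosine(a, p) > cosine(a, n))
    return correct / len(triplets)

# Toy embeddings (hypothetical): the first triplet is ordered correctly,
# the second is not because its negative lies closer to the anchor.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [1.0, 0.0], [0.1, 0.9]),
]
accuracy = cosine_accuracy(toy_triplets)
print(accuracy)
```

A value of 1.0 therefore means every evaluation triplet was ordered correctly.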

Training Details

Training Dataset

nz-hansard-triplets

Dataset: nz-hansard-triplets at f9d1f78
Size: 2,770 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 6 tokens, mean: 7.73 tokens, max: 8 tokens	min: 57 tokens, mean: 136.89 tokens, max: 374 tokens	min: 76 tokens, mean: 139.27 tokens, max: 266 tokens

Samples:

anchor	positive	negative
`non_maori_origin`	A primary objective of these legislative changes involves modifying the Court Security Act of 1999, specifically by broadening the authority of court security personnel. This expanded mandate permits them to refuse admission, remove, and hold individuals found with illicit substances, or those exhibiting aggressive or abusive behaviour, or committing minor infractions within court facilities. This aspect of the legislation is crucial, as it provides explicit guidelines and specific powers for security officers to enforce order within the court environment. It grants them the necessary discretion to intervene, whether proceedings are active or not, against individuals bringing prohibited items or drugs into the courts, thereby ensuring swift and proper resolution of such incidents. Furthermore, the bill refines the legal definition of 'court' and clarifies the geographical scope within which these powers can be exercised.	A critical aspect of ensuring justice for all involves addressing the cultural safety and appropriate engagement of Māori within court settings. This legislation should have considered specific protocols for court security officers when interacting with Māori individuals, particularly those who may be unfamiliar with the Pākehā justice system or who are experiencing cultural distress. It is vital to ensure that security measures do not inadvertently create barriers or exacerbate existing inequities for Māori. This includes training for officers on Te Reo Māori, tikanga, and the historical context of Māori interactions with the justice system, to prevent misunderstandings and ensure respectful treatment. Furthermore, the definition of 'court premises' should acknowledge areas where Māori cultural practices, such as karakia or waiata, might occur, ensuring these are accommodated respectfully within security frameworks, rather than being seen as disruptive.
`non_maori_origin`	Perhaps the most compelling aspect of the Tribunals Powers and Procedures Legislation Bill concerns the Human Rights Review Tribunal, where the compelling submissions from its chairperson, Mr Rodger Haines QC, warrant particular acknowledgement. I commend Mr Haines for his valuable input. As highlighted by Mr Haines, the Human Rights Act 1993's structural limitations have led to a substantial accumulation of cases over recent years. This backlog, and the resulting frustration, is evident in media reports detailing how individuals seeking to uphold their human rights endure 'unacceptable' delays of two to three years for resolution, despite persistent calls for legislative reform. Mr Haines' submission further revealed that for several years, the tribunal's entire workload, intended for five full-time decision-makers, has been managed by the chairperson alone. Consequently, a severe backlog continues to grow annually, rendering the tribunal effectively non-functional for many parties. T...	A critical concern within the human rights framework is the persistent challenge Māori face in accessing justice for breaches of their Treaty rights and cultural protections. The Human Rights Review Tribunal, while vital, often struggles to adequately address the unique dimensions of Māori human rights, which are intrinsically linked to Te Tiriti o Waitangi. The existing backlog disproportionately affects Māori claimants, who may already face systemic barriers in navigating the Pākehā legal system. Future legislative reforms must specifically consider how to enhance the tribunal's capacity to hear and resolve cases involving Māori cultural rights, land rights, and the Crown's Treaty obligations. This includes ensuring culturally competent processes, the availability of Te Reo Māori services, and a deeper understanding of tikanga within the tribunal's operations, to ensure that justice delayed is not justice denied for Māori.
`non_maori_origin`	These legislative amendments deserve praise for their potential to shorten the duration required to hear and conclude disputes. They are designed to foster greater uniformity in tribunal operations, thereby solidifying tribunals' position as the preferred initial avenue for prompt and expert resolution of significant issues. This reinforces the vital function of tribunals in offering an alternative dispute resolution framework distinct from the traditional court system. The bill's emphasis on simplifying and standardising statutory authorities is crucial, as it translates into a more straightforward process for individuals involved in disputes. This enables them to resolve their issues more quickly, move past the conflict, and resume their normal lives, which is ultimately the core objective of a functioning justice system—to empower citizens to contribute positively to society.	While enhancing the efficiency of general tribunals is valuable, it is equally imperative to ensure that dispute resolution mechanisms adequately serve Māori communities, respecting tikanga and Te Ao Māori principles. The current system often fails to provide culturally appropriate pathways for resolving disputes that arise within or affect Māori, such as those concerning whānau, hapū, or iwi. Future legislative efforts should explore strengthening or establishing specific Māori dispute resolution bodies, or integrating tikanga-based processes more deeply into existing tribunals, to ensure that Māori can access justice in a manner that aligns with their cultural values. This would not only improve access to justice but also affirm the Crown's Treaty obligations by recognising and supporting Māori self-determination in dispute resolution, moving beyond a one-size-fits-all approach to justice.

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.3
}
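
With these parameters, the loss for one triplet can be sketched as max(d(anchor, positive) - d(anchor, negative) + 0.3, 0), where d is the cosine distance 1 - cos. The snippet below is a pure-Python illustration of that formula on toy vectors. The function names and inputs are hypothetical, and it is not the library's own code.

```python
import math

def cosine_distance(u, v):
    """Cosine distance: 1 minus the cosine similarity."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on the gap between positive and negative cosine distances."""
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)

# Well-separated triplet: the positive coincides with the anchor and the
# negative is orthogonal, so the margin is already satisfied.
easy = triplet_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# Inverted triplet: the negative coincides with the anchor instead.
hard = triplet_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
print(easy, hard)
```

Once a triplet is separated by at least the margin, its loss is exactly zero. With a batch size of 1, this is one plausible reading of the many 0.0 entries in the training log.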

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 1
learning_rate: 2e-05
num_train_epochs: 1
warmup_ratio: 0.1
warmup_steps: 0.1
load_best_model_at_end: True
eval_on_start: True
prompts: task: classification | query:
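
To make the schedule concrete: with 2,770 samples, a batch size of 1, and one epoch, training runs for about 2,770 optimizer steps, assuming a single device and no gradient accumulation. This agrees with the training log, where step 50 corresponds to epoch 0.0181. The sketch below computes a linear warmup followed by linear decay for those settings. The function name is hypothetical, and rounding the warmup length up to 277 steps is an assumption about how the ratio is converted to steps.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))

total_steps = 2770  # 2,770 samples / batch size 1 x 1 epoch (single device)

for step in (0, 50, 277, 1400, 2770):
    epoch = step / total_steps
    lr = linear_schedule_lr(step, total_steps)
    print(f"step {step:>4}  epoch {epoch:.4f}  lr {lr:.3e}")
```

Under these assumptions the learning rate peaks at 2e-05 around step 277 and decays to zero at the end of the epoch.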

All Hyperparameters

do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 1
per_device_eval_batch_size: 8
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 2e-05
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 1
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: None
warmup_ratio: 0.1
warmup_steps: 0.1
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
enable_jit_checkpoint: False
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
use_cpu: False
seed: 42
data_seed: None
bf16: False
fp16: False
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: -1
ddp_backend: None
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
parallelism_config: None
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch_fused
optim_args: None
group_by_length: False
length_column_name: length
project: huggingface
trackio_space_id: trackio
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_for_metrics: []
eval_do_concat_batches: True
auto_find_batch_size: False
full_determinism: False
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_num_input_tokens_seen: no
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: True
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
use_cache: False
prompts: task: classification | query:
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	nz-hansard-triplet-eval_cosine_accuracy
0	0	-	0.1594
0.0181	50	0.1438	-
0.0361	100	0.0038	-
0.0542	150	0.0041	-
0.0722	200	0.0104	0.8486
0.0903	250	0.0113	-
0.1083	300	0.0039	-
0.1264	350	0.0149	-
0.1444	400	0.0088	0.9980
0.1625	450	0.0057	-
0.1805	500	0.0038	-
0.1986	550	0.0081	-
0.2166	600	0.0	0.8526
0.2347	650	0.0011	-
0.2527	700	0.0	-
0.2708	750	0.0041	-
0.2888	800	0.0072	0.8665
0.3069	850	0.0078	-
0.3249	900	0.0	-
0.3430	950	0.0	-
0.3610	1000	0.0	0.8506
0.3791	1050	0.0	-
0.3971	1100	0.0	-
0.4152	1150	0.0	-
0.4332	1200	0.0062	0.9124
0.4513	1250	0.0175	-
0.4693	1300	0.0142	-
0.4874	1350	0.0	-
0.5054	1400	0.0089	0.9940
0.5235	1450	0.0	-
0.5415	1500	0.0	-
0.5596	1550	0.0098	-
0.5776	1600	0.0031	0.8486
0.5957	1650	0.0	-
0.6137	1700	0.0	-
0.6318	1750	0.0085	-
0.6498	1800	0.0046	0.8705
0.6679	1850	0.0	-
0.6859	1900	0.0045	-
0.7040	1950	0.0	-
0.7220	2000	0.0011	0.8665
0.7401	2050	0.0043	-
0.7581	2100	0.0	-
0.7762	2150	0.0	-
0.7942	2200	0.0017	0.8606
0.8123	2250	0.0034	-
0.8303	2300	0.0049	-
0.8484	2350	0.0	-
0.8664	2400	0.0059	0.8665
0.8845	2450	0.0006	-
0.9025	2500	0.0	-
0.9206	2550	0.0	-
0.9386	2600	0.0	1.0
0.9567	2650	0.0	-
0.9747	2700	0.0	-
0.9928	2750	0.0	-

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.12.12
Sentence Transformers: 5.2.2
Transformers: 5.0.0
PyTorch: 2.9.0+cu126
Accelerate: 1.12.0
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Safetensors

Model size: 0.3B params
Tensor type: F32
