This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m-v2.0. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: GteModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
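The module list above can be read as a three-step pipeline: the Transformer produces one vector per token, Pooling with pooling_mode_cls_token set to True keeps only the first ([CLS]) token's vector, and Normalize() rescales that vector to unit length. The following is a minimal pure-Python sketch of the last two steps on toy data. The function names `cls_pool` and `l2_normalize` are hypothetical and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's embedding, mirroring pooling_mode_cls_token=True."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length, as a normalization layer would."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy "transformer output": 3 tokens with 4-dimensional embeddings
# (the real model produces 768-dimensional token vectors).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # first ([CLS]) token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.0, 2.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the final vectors have unit length, the cosine similarity between two embeddings reduces to their dot product.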
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'How does the new Eurostat methodology differ in scope from the indicators used in this Directive for calculating energy consumption?',
    '(29) The methodology for calculation of primary energy consumption and final energy consumption is aligned with the new Eurostat methodology, but the indicators used for the purpose of this Directive have a different scope, in that they exclude ambient energy and include energy consumption in international aviation for the targets in primary energy consumption and final energy consumption. The use of new indicators also implies that any changes in energy consumption of blast furnaces are now only reflected in primary energy consumption.',
    '(92) InvestEU is the Union flagship programme to boost investment, especially the green and digital transition, by providing financing and technical assistance, for instance through blending mechanisms. Such an approach contributes to crowd in additional public and private capital. Moreover, Member States are encouraged to contribute to the InvestEU Member State compartment to support financial products available to net-zero technology manufacturing, without prejudice to applicable State aid rules.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
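Because training used MatryoshkaLoss over 768, 512, 256, 128 and 64 dimensions (see the loss parameters in this card), the leading components of each embedding are intended to remain useful on their own. The sketch below shows one way to exploit that, assuming the embeddings are already available as plain lists of floats: keep the first `dim` components, re-normalize, and rank by dot product. The helper names and the toy 8-dimensional vectors are hypothetical stand-ins, not model output. Recent sentence-transformers releases also document a `truncate_dim` argument on `SentenceTransformer`; check it against your installed version.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def rank_documents(query, documents, dim):
    """Return document indices sorted by cosine similarity at the given dimension."""
    q = truncate_and_normalize(query, dim)
    scores = []
    for doc in documents:
        d = truncate_and_normalize(doc, dim)
        # On unit-length vectors the dot product equals the cosine similarity.
        scores.append(sum(a * b for a, b in zip(q, d)))
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

# Toy stand-ins for real embeddings (8 dimensions instead of 768).
query = [0.9, 0.1, 0.0, 0.2, 0.05, 0.0, 0.1, 0.0]
documents = [
    [0.8, 0.2, 0.1, 0.1, 0.0, 0.1, 0.0, 0.05],
    [0.0, 0.9, 0.3, 0.0, 0.2, 0.0, 0.1, 0.0],
]

for dim in (8, 4):
    print(dim, rank_documents(query, documents, dim))
```

Smaller dimensions reduce storage and search cost; whether the accuracy trade-off is acceptable should be measured on your own data, since this card reports evaluation metrics only as a single table.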
InformationRetrievalEvaluator

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.7136 |
| cosine_accuracy@3 | 0.9244 |
| cosine_accuracy@5 | 0.9589 |
| cosine_accuracy@10 | 0.9819 |
| cosine_precision@1 | 0.7136 |
| cosine_precision@3 | 0.3081 |
| cosine_precision@5 | 0.1918 |
| cosine_precision@10 | 0.0982 |
| cosine_recall@1 | 0.7136 |
| cosine_recall@3 | 0.9244 |
| cosine_recall@5 | 0.9589 |
| cosine_recall@10 | 0.9819 |
| cosine_ndcg@10 | 0.8626 |
| cosine_mrr@10 | 0.8228 |
| cosine_map@100 | 0.8237 |
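In the table above, accuracy@k equals recall@k, and precision@k equals accuracy@k divided by k (for example 0.9244 / 3 ≈ 0.3081). That pattern is what one would expect if every query has exactly one relevant passage. The sketch below computes the same family of metrics under that assumption on toy rankings. It is not the evaluator's actual implementation, and the function name `retrieval_metrics` is hypothetical.

```python
def retrieval_metrics(rankings, relevant, k):
    """Compute accuracy, precision, recall and MRR at k.

    rankings: one ranked list of document ids per query.
    relevant: the single relevant document id for each query (assumption).
    """
    n = len(rankings)
    hits = 0
    reciprocal_rank_sum = 0.0
    for ranked, target in zip(rankings, relevant):
        top_k = ranked[:k]
        if target in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(target) + 1)
    accuracy = hits / n
    return {
        "accuracy": accuracy,       # fraction of queries with a hit in the top k
        "precision": accuracy / k,  # one relevant doc, so at most 1/k per query
        "recall": accuracy,         # one relevant doc, so recall is 0 or 1 per query
        "mrr": reciprocal_rank_sum / n,
    }

# Toy data: four queries, each with one relevant document.
rankings = [
    ["d1", "d2", "d3"],
    ["d5", "d4", "d6"],
    ["d9", "d8", "d7"],
    ["d0", "d3", "d2"],
]
relevant = ["d1", "d4", "d7", "d1"]

metrics_at_1 = retrieval_metrics(rankings, relevant, k=1)
metrics_at_3 = retrieval_metrics(rankings, relevant, k=3)
print(metrics_at_1)
print(metrics_at_3)
```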
Columns: query_text and doc_text

| | query_text | doc_text |
|---|---|---|
| type | string | string |
| details | | |

| query_text | doc_text |
|---|---|
| The regulation's applicability extends to various stakeholders involved in AI systems, including providers, deployers, importers, and manufacturers, regardless of their location. It specifically addresses high-risk AI systems and outlines the limitations of its scope, particularly concerning national security and military applications. Additionally, it clarifies that it does not interfere with the responsibilities of member states regarding national security or the operations of public authorities and international organizations in specific contexts. | (180) The European Data Protection Supervisor and the European Data Protection Board were consulted in accordance with Article 42(1) and (2) of Regulation (EU) 2018/1725 and delivered their joint opinion on 18 June 2021, |
| How should loans with unknown use of proceeds be allocated in terms of sectors and alignment metrics? | instruments. For loans whose use of proceeds is known, the value shall be included for the relevant sector and alignment metric. For loans whose use of proceeds is unknown, the gross carrying amount of the exposure shall be allocated to the relevant sectors and alignment metrics based on the counterparties’ activity distribution, including by counterparties’ turnover by activity. Institutions shall add a row in the template for each relevant combination of sectors disclosed in column (b) and alignment metrics included in column (d). --- |
| What measures must AIFMs implement to ensure they do not rely solely on credit ratings for assessing the creditworthiness of AIFs' assets? | ▼M1 |

MatryoshkaLoss with these parameters:
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
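The parameters above describe a MultipleNegativesRankingLoss that is applied once per truncation size and summed with equal weights. The following is a minimal pure-Python sketch of that idea on toy vectors, not the library's implementation. It assumes in-batch negatives (document i is the positive for query i), cosine similarity, and a similarity scale of 20.0. The function names are hypothetical, and the toy dimensions (4 and 2) stand in for 768 down to 64.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def mnr_loss(queries, docs, scale=20.0):
    """Cross-entropy over in-batch candidates: doc i is the positive for query i."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * sum(a * b for a, b in zip(q, d)) for d in docs]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

def matryoshka_loss(queries, docs, dims, weights):
    """Weighted sum of the ranking loss over truncated, re-normalized embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        q_trunc = [l2_normalize(q[:dim]) for q in queries]
        d_trunc = [l2_normalize(d[:dim]) for d in docs]
        total += weight * mnr_loss(q_trunc, d_trunc)
    return total

# Toy batch of three (query, document) pairs with 4-dimensional embeddings.
queries = [[1.0, 0.0, 0.3, 0.1], [0.0, 1.0, 0.1, 0.4], [-1.0, -1.0, 0.2, 0.2]]
docs = [[0.9, 0.1, 0.2, 0.0], [0.1, 0.8, 0.0, 0.5], [-0.7, -0.9, 0.1, 0.3]]

aligned = matryoshka_loss(queries, docs, dims=[4, 2], weights=[1, 1])
shuffled = matryoshka_loss(queries, docs[1:] + docs[:1], dims=[4, 2], weights=[1, 1])
print(aligned, shuffled)
```

With correctly paired documents the loss is small at every truncation size, while mismatched pairs are penalized at each size, which is what pushes useful information into the leading dimensions.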
Non-default hyperparameters:

- eval_strategy: steps
- learning_rate: 2e-05
- num_train_epochs: 4
- warmup_ratio: 0.1
- fp16: True
- load_best_model_at_end: True

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 8
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.7763 |
| 0.0863 | 500 | 0.2343 | - |
| 0.1726 | 1000 | 0.1259 | 0.814 |
| 0.2589 | 1500 | 0.1027 | - |
| 0.3452 | 2000 | 0.0757 | 0.8288 |
| 0.4316 | 2500 | 0.0617 | - |
| 0.5179 | 3000 | 0.0651 | 0.8288 |
| 0.6042 | 3500 | 0.0863 | - |
| 0.6905 | 4000 | 0.06 | 0.8376 |
| 0.7768 | 4500 | 0.0579 | - |
| 0.8631 | 5000 | 0.0593 | 0.8342 |
| 0.9494 | 5500 | 0.0485 | - |
| 1.0357 | 6000 | 0.0465 | 0.8384 |
| 1.1220 | 6500 | 0.0276 | - |
| 1.2084 | 7000 | 0.0353 | 0.8392 |
| 1.2947 | 7500 | 0.0335 | - |
| 1.3810 | 8000 | 0.0292 | 0.8436 |
| 1.4673 | 8500 | 0.0276 | - |
| 1.5536 | 9000 | 0.0404 | 0.8485 |
| 1.6399 | 9500 | 0.0476 | - |
| 1.7262 | 10000 | 0.0265 | 0.8601 |
| 1.8125 | 10500 | 0.017 | - |
| 1.8988 | 11000 | 0.0217 | 0.8549 |
| 1.9852 | 11500 | 0.0329 | - |
| 2.0715 | 12000 | 0.0207 | 0.8577 |
| 2.1578 | 12500 | 0.0199 | - |
| 2.2441 | 13000 | 0.015 | 0.8544 |
| 2.3304 | 13500 | 0.0143 | - |
| 2.4167 | 14000 | 0.0117 | 0.8574 |
| 2.5030 | 14500 | 0.0204 | - |
| 2.5893 | 15000 | 0.0141 | 0.8595 |
| 2.6756 | 15500 | 0.0123 | - |
| 2.7620 | 16000 | 0.0211 | 0.8538 |
| 2.8483 | 16500 | 0.0207 | - |
| 2.9346 | 17000 | 0.0134 | 0.8562 |
| 3.0209 | 17500 | 0.0276 | - |
| 3.1072 | 18000 | 0.0106 | 0.8552 |
| 3.1935 | 18500 | 0.0129 | - |
| 3.2798 | 19000 | 0.0157 | 0.8582 |
| 3.3661 | 19500 | 0.0164 | - |
| 3.4524 | 20000 | 0.0192 | 0.8614 |
| 3.5388 | 20500 | 0.0138 | - |
| 3.6251 | 21000 | 0.0141 | 0.8601 |
| 3.7114 | 21500 | 0.0109 | - |
| 3.7977 | 22000 | 0.0178 | 0.8605 |
| 3.8840 | 22500 | 0.0088 | - |
| 3.9703 | 23000 | 0.0255 | 0.8626 |
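The hyperparameters in this card specify learning_rate: 2e-05, lr_scheduler_type: linear and warmup_ratio: 0.1. As a rough illustration of what that schedule looks like over the logged steps, the sketch below computes a linear warmup followed by a linear decay to zero. The total of 23,176 steps is an estimate derived from the log (step 500 corresponds to epoch 0.0863, over 4 epochs) and is not stated in the card. The function name is hypothetical, and the formula is a simplified rendering of a linear schedule with warmup rather than the trainer's own code.

```python
import math

def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Assumption: about 5,794 steps per epoch (500 / 0.0863), times 4 epochs.
TOTAL_STEPS = 23176

for step in (0, 500, 2318, 12000, 23000):
    print(step, linear_schedule_lr(step, TOTAL_STEPS))
```

Under these assumptions the learning rate peaks at roughly step 2,318 and is close to zero by the final logged step, which is consistent with the evaluation metric changing only slightly late in training.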
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}