This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2 on the json dataset. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
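Read top to bottom, the module list above says: the Transformer produces one vector per token, the Pooling module averages the token vectors (pooling_mode_mean_tokens is the only mode set to True), and Normalize() rescales the result to unit length. The sketch below illustrates the last two steps in plain Python on toy 3-dimensional vectors rather than the model's 384 dimensions. The names `mean_pool` and `l2_normalize` are illustrative and are not library functions.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    return [value / max(count, 1) for value in summed]


def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize() step does."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Toy input: two real tokens and one padding token (mask 0).
tokens = [[1.0, 2.0, 2.0], [3.0, 2.0, 6.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)
sentence_embedding = l2_normalize(pooled)
print(pooled)
print(sentence_embedding)
```

Because the final vector has unit length, cosine similarity between two embeddings reduces to a dot product.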
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bbmb/deep-learning-for-embedding-model-ssilwal-qpham6_army_doc")
# Run inference
sentences = [
# One passage, written as two adjacent string literals that Python joins into a single string
'Offense \n11 January 2024 ATP 3-21.8 4-61\nlight the target, making it easier to acquire effectively. Leaders and Soldiers \nuse the infrared devices to identify enemy or friendly personnel and then \nengage targets using their aiming lights. \n4-172. Illuminating rounds fired to burn on the ground can mark objectives. This helps\nthe platoon orient on the objective but may adversely affect night vision devices.\n4-173. Leaders plan but do not always use illumination during limited visibility\nattacks. Battalion commanders normally control conventional illumination but ma y\na\nuthorize the company commander to do so. If the commander decides to use\nconventional illumination , the commander should not call for it until the assault is\ninitiated or the attack is detected. It should be placed on several locations over a wide\narea to confuse the enemy as to the exact place of the attack. It should be placed beyond\nthe objective to help assaulting Soldiers see and fire at withdrawing or counterattacking\nenemy Soldiers. Infrared illumination is a good capability to light the objective without\nlighting it for enemy forces without night vision devices. This advantage is degraded\nwhen used against a peer threat with the same night vision capabilities.\n4-174. The platoon leader , squad leaders , and vehicle commanders must know unit\ntactical SOP and develop sound COAs to synchronize the employment of infrared\nillumination devices , target designators , and aiming lights during their assault on the\nobjective. These include using luminous tape or chemical lights to mark personnel and\nusing weapons control restrictions.\n4-175. '
'The platoon leader may use the following techniques to increase control during\nthe assault:\n\uf06c Use no flares, grenades, or obscuration on the objective.\n\uf06c Use mortar or artillery rounds to orient attacking units.\n\uf06c Use a base squad or fire team to pace and guide others.\n\uf06c Reduce intervals between Soldiers and squads.\n4-176. Like a daylight attack , indirect and direct fires are planned for a limited\nvisibility attack but are not executed unless the platoon is detected or is ready to assault.\nSome weapons may fire before the attack and maintain a pattern to deceive the enemy\nor to help cover noise ma de by the platoon ’s movement. This is not done if it will\ndisclose the attack.\n4-177. Obscuration further reduces the enemy’s visibility, particularly if the enemy has\nnight vision devices. The FO fires obscuration rounds close to or on enemy positions ,\nso it does not restrict friendly movement or hinder the reduction of obstacles. Employing \nobscuration on the objective during the assault may make it hard for assaulting Soldiers\nto find enemy fighting positions. If enough thermal sights are available , obscuration on\nthe objective may provide a decisive advantage for a well-trained platoon.\nNote. I f the enemy is equipped with night vision devices , leaders must evaluate \nthe risk of using each technique and ensure the mission is not compromised by \nthe enemy’s ability to detect infrared light sources.',
'What are the advantages of using infrared illumination in assaults?',
'How can leaders effectively provide command and control during defensive operations?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
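Because the model was trained with a Matryoshka objective over 384, 256, 128 and 64 dimensions, the leading components of each embedding are meant to remain useful on their own. The sketch below shows the usual recipe in plain Python: keep the first `dim` components, rescale to unit length, then compare with cosine similarity. It uses toy 8-dimensional vectors as stand-ins for real embeddings, and `truncate_and_normalize` is an illustrative helper, not a library function.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy stand-ins for a passage embedding and a query embedding.
doc = [0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
query = [0.45, 0.42, 0.25, 0.25, 0.0, 0.1, 0.0, 0.03]

for dim in (8, 4, 2):
    score = cosine(truncate_and_normalize(doc, dim),
                   truncate_and_normalize(query, dim))
    print(dim, round(score, 4))
```

Recent sentence-transformers releases also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which applies the truncation at encode time. Check the documentation for your installed version before relying on it.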
Information retrieval metrics on the dim_384, dim_256, dim_128 and dim_64 evaluation sets, computed with InformationRetrievalEvaluator:

| Metric | dim_384 | dim_256 | dim_128 | dim_64 |
|---|---|---|---|---|
| cosine_accuracy@1 | 0.0037 | 0.0037 | 0.0037 | 0.0019 |
| cosine_accuracy@3 | 0.0131 | 0.0112 | 0.0093 | 0.0075 |
| cosine_accuracy@5 | 0.0485 | 0.0373 | 0.0466 | 0.0429 |
| cosine_accuracy@10 | 0.4496 | 0.4459 | 0.4366 | 0.4216 |
| cosine_precision@1 | 0.0037 | 0.0037 | 0.0037 | 0.0019 |
| cosine_precision@3 | 0.0044 | 0.0037 | 0.0031 | 0.0025 |
| cosine_precision@5 | 0.0097 | 0.0075 | 0.0093 | 0.0086 |
| cosine_precision@10 | 0.045 | 0.0446 | 0.0437 | 0.0422 |
| cosine_recall@1 | 0.0037 | 0.0037 | 0.0037 | 0.0019 |
| cosine_recall@3 | 0.0131 | 0.0112 | 0.0093 | 0.0075 |
| cosine_recall@5 | 0.0485 | 0.0373 | 0.0466 | 0.0429 |
| cosine_recall@10 | 0.4496 | 0.4459 | 0.4366 | 0.4216 |
| cosine_ndcg@10 | 0.1501 | 0.1489 | 0.1465 | 0.139 |
| cosine_mrr@10 | 0.0659 | 0.0653 | 0.0646 | 0.0595 |
| cosine_map@100 | 0.0862 | 0.0859 | 0.0847 | 0.079 |
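In the table above, accuracy@k equals recall@k in every row and precision@k is close to accuracy@k divided by k. That pattern is what you would expect if each query has exactly one relevant passage, which is an inference from the numbers and is not stated in the card. The sketch below computes the per-query quantities for a single ranked list; an evaluator would then average them over all queries. The function `ir_metrics` and the document ids are hypothetical.

```python
import math


def ir_metrics(ranked_ids, relevant_ids, k):
    """Per-query retrieval metrics at cutoff k for binary relevance."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant document")
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in ranked_ids[:k]]
    num_hits = sum(hits)

    # Reciprocal rank of the first relevant document inside the top k.
    mrr = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            mrr = 1.0 / rank
            break

    # Discounted cumulative gain, normalized by the best achievable DCG.
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))

    return {
        "accuracy": 1.0 if num_hits > 0 else 0.0,
        "precision": num_hits / k,
        "recall": num_hits / len(relevant_ids),
        "mrr": mrr,
        "ndcg": dcg / idcg,
    }


# The single relevant passage "d9" is retrieved at rank 3.
metrics = ir_metrics(["d7", "d2", "d9", "d4"], {"d9"}, k=3)
print(metrics)
```

With one relevant passage per query, accuracy and recall coincide and precision is at most 1/k, which matches the shape of the reported numbers.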
Columns: positive and anchor

| | positive | anchor |
|---|---|---|
| type | string | string |
| details | | |

Samples:

| positive | anchor |
|---|---|
| Appendix A | What is the purpose of having one squad engage while others observe in an observed fire scenario? |
| Glossary | How is the term SDM used in the military? |
| Chapter 1 | What offensive and defensive actions can an Infantry rifle platoon perform? |
MatryoshkaLoss with these parameters:
{
    "loss": "MultipleNegativesRankingLoss",
    "matryoshka_dims": [
        384,
        256,
        128,
        64
    ],
    "matryoshka_weights": [
        1,
        1,
        1,
        1
    ],
    "n_dims_per_step": -1
}
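One way to read the configuration above: for each prefix length in matryoshka_dims, the embeddings are truncated to that length, the base loss is computed on the truncated vectors, and the results are summed using matryoshka_weights. The sketch below illustrates that idea in plain Python with an in-batch ranking loss, where positives[i] is the correct match for anchors[i] and the other positives in the batch serve as negatives. It is a simplified illustration, not the library implementation. The `scale=20.0` factor and cosine scoring are assumptions about the base loss defaults, and the 4-dimensional vectors are toy data.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over the batch; the target for anchor i is positive i."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated copies of the embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        truncated_anchors = [vector[:dim] for vector in anchors]
        truncated_positives = [vector[:dim] for vector in positives]
        total += weight * in_batch_ranking_loss(truncated_anchors, truncated_positives)
    return total


# Toy batch of two (anchor, positive) pairs in 4 dimensions.
anchors = [[1.0, 0.0, 0.2, 0.0], [0.0, 1.0, 0.0, 0.2]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.9, 0.0, 0.1]]

aligned = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
swapped = matryoshka_loss(anchors, positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, swapped)
```

With equal weights, as in this card, every prefix length contributes equally to the gradient, so the shortest prefix is pushed to be as discriminative as it can be rather than treated as an afterthought.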
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 8
- num_train_epochs: 20
- lr_scheduler_type: cosine
- warmup_ratio: 0.2
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 64
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 8
- eval_accumulation_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 20
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.2
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

| Epoch | Step | Training Loss | dim_384_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|---|---|---|---|---|---|---|
| 0.9474 | 9 | - | 0.1225 | 0.1221 | 0.1145 | 0.0915 |
| 1.0526 | 10 | 7.2521 | - | - | - | - |
| 2.0 | 19 | - | 0.1296 | 0.1261 | 0.1157 | 0.1089 |
| 2.1053 | 20 | 5.4977 | - | - | - | - |
| 2.9474 | 28 | - | 0.1294 | 0.1377 | 0.1262 | 0.1090 |
| 3.1579 | 30 | 4.3477 | - | - | - | - |
| 4.0 | 38 | - | 0.1330 | 0.1378 | 0.1260 | 0.1126 |
| 4.2105 | 40 | 3.3767 | - | - | - | - |
| 4.9474 | 47 | - | 0.1415 | 0.1388 | 0.1294 | 0.1221 |
| 5.2632 | 50 | 2.6443 | - | - | - | - |
| 6.0 | 57 | - | 0.1515 | 0.1395 | 0.1348 | 0.1218 |
| 6.3158 | 60 | 2.0824 | - | - | - | - |
| 6.9474 | 66 | - | 0.1480 | 0.1411 | 0.1335 | 0.1242 |
| 7.3684 | 70 | 1.6734 | - | - | - | - |
| 8.0 | 76 | - | 0.1491 | 0.1481 | 0.1428 | 0.1313 |
| 8.4211 | 80 | 1.3894 | - | - | - | - |
| 8.9474 | 85 | - | 0.1449 | 0.1497 | 0.1419 | 0.1341 |
| 9.4737 | 90 | 1.1443 | - | - | - | - |
| 10.0 | 95 | - | 0.1466 | 0.1494 | 0.1399 | 0.1396 |
| 10.5263 | 100 | 1.0121 | - | - | - | - |
| 10.9474 | 104 | - | 0.1458 | 0.1477 | 0.1415 | 0.1371 |
| 11.5789 | 110 | 0.8833 | - | - | - | - |
| 12.0 | 114 | - | 0.1479 | 0.1474 | 0.1445 | 0.1374 |
| 12.6316 | 120 | 0.8201 | - | - | - | - |
| 12.9474 | 123 | - | 0.1519 | 0.1486 | 0.1458 | 0.1360 |
| 13.6842 | 130 | 0.736 | - | - | - | - |
| 14.0 | 133 | - | 0.1505 | 0.1471 | 0.1484 | 0.1376 |
| 14.7368 | 140 | 0.6924 | - | - | - | - |
| 14.9474 | 142 | - | 0.1496 | 0.1486 | 0.1451 | 0.1396 |
| 15.7895 | 150 | 0.672 | - | - | - | - |
| 16.0 | 152 | - | 0.1492 | 0.1489 | 0.1464 | 0.1404 |
| 16.8421 | 160 | 0.6455 | - | - | - | - |
| 16.9474 | 161 | - | 0.1496 | 0.1493 | 0.1468 | 0.1389 |
| 17.8947 | 170 | 0.6538 | - | - | - | - |
| 18.0 | 171 | - | 0.1501 | 0.1470 | 0.1461 | 0.1393 |
| 18.9474 | 180 | 0.628 | 0.1501 | 0.1489 | 0.1465 | 0.1390 |
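To relate the log above to the listed hyperparameters (learning_rate 5e-05, lr_scheduler_type cosine, warmup_ratio 0.2, per_device_train_batch_size 64, gradient_accumulation_steps 8), the sketch below computes a linear-warmup then cosine-decay learning rate and the effective batch size. It rests on two assumptions not stated in the card: that training ran for 180 optimizer steps in total (the last step in the log), and that a single device was used. The function `lr_at` is illustrative and approximates the common warmup-plus-cosine schedule; it is not the trainer's own code.

```python
import math


def lr_at(step, total_steps, base_lr=5e-05, warmup_ratio=0.2):
    """Linear warmup from 0 to base_lr, then cosine decay back to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = 180  # assumption: taken from the last row of the training log

# Sequences contributing to each optimizer step, assuming one device.
effective_batch = 64 * 8

for step in (0, 18, 36, 108, 180):
    print(step, lr_at(step, TOTAL_STEPS))
print("effective batch size:", effective_batch)
```

Under these assumptions the learning rate peaks at the end of warmup, roughly one fifth of the way through training, which is consistent with the training loss still dropping quickly over the first few logged steps.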
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}