This is a sentence-transformers model finetuned from laion/clap-htsat-unfused on the librispeech_asr dataset. It maps text and audio to a shared 512-dimensional dense vector space and can be used for semantic similarity, semantic search, paraphrase mining, classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'get_text_features', 'method_output_name': 'pooler_output'}, 'audio': {'method': 'get_audio_features', 'method_output_name': 'pooler_output'}}, 'module_output_name': 'sentence_embedding', 'message_format': 'auto', 'architecture': 'ClapModel'})
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs")
# Run inference
inputs = [
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_0.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_1.wav',
    'https://huggingface.co/tomaarsen/clap-htsat-unfused-librispeech-5-epochs-128bs/resolve/main/assets/audio_2.wav',
]
embeddings = model.encode(inputs)
print(embeddings.shape)
# [3, 512]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4362, 0.6843],
#         [0.4362, 1.0000, 0.2179],
#         [0.6843, 0.2179, 1.0000]])
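
The similarity matrix printed above has ones on the diagonal and is symmetric, which is what pairwise cosine similarity produces. The following is a minimal sketch of that computation in plain Python, assuming cosine similarity is the configured similarity function (the library default). The toy 3-dimensional vectors are made-up stand-ins for the 512-dimensional embeddings, and `cosine_similarity_matrix` is a hypothetical helper, not part of the library.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the pairwise cosine similarity matrix for a list of vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    matrix = []
    for i, a in enumerate(vectors):
        row = []
        for j, b in enumerate(vectors):
            dot = sum(x * y for x, y in zip(a, b))
            row.append(dot / (norms[i] * norms[j]))
        matrix.append(row)
    return matrix


# Made-up 3-dimensional vectors standing in for real embeddings.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.6, 0.8, 0.0],
    [0.0, 1.0, 0.0],
]

sims = cosine_similarity_matrix(toy_embeddings)
for row in sims:
    print([round(s, 4) for s in row])
```

Each entry compares one input with another, so the diagonal compares every input with itself and the matrix mirrors across it.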
Datasets: librispeech-eval and librispeech-test. Evaluated with InformationRetrievalEvaluator.

| Metric | librispeech-eval | librispeech-test |
|---|---|---|
| cosine_accuracy@1 | 0.245 | 0.0489 |
| cosine_accuracy@3 | 0.52 | 0.1183 |
| cosine_accuracy@5 | 0.645 | 0.1691 |
| cosine_accuracy@10 | 0.785 | 0.2641 |
| cosine_precision@1 | 0.245 | 0.0489 |
| cosine_precision@3 | 0.1733 | 0.0394 |
| cosine_precision@5 | 0.129 | 0.0338 |
| cosine_precision@10 | 0.0785 | 0.0264 |
| cosine_recall@1 | 0.245 | 0.0489 |
| cosine_recall@3 | 0.52 | 0.1183 |
| cosine_recall@5 | 0.645 | 0.1691 |
| cosine_recall@10 | 0.785 | 0.2641 |
| cosine_ndcg@10 | 0.503 | 0.1402 |
| cosine_mrr@10 | 0.414 | 0.1027 |
| cosine_map@100 | 0.4253 | 0.1195 |
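
In the table above, accuracy@k equals recall@k and precision@k equals accuracy@k divided by k, which is the pattern you get when every query has exactly one relevant item. That reading is an inference from the numbers, not something the card states. Under that assumption, the metrics can be sketched as follows. The `ranks` list is made-up illustration data, and `metrics_at_k` is a hypothetical helper rather than the evaluator's actual code.

```python
import math


def metrics_at_k(ranks, k):
    """Compute retrieval metrics when each query has exactly one relevant item.

    ranks: 1-based rank of the relevant item per query, or None if it was
    not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    return {
        "accuracy": accuracy,
        # With one relevant item, finding it at all means recall is 1 for that query.
        "recall": accuracy,
        # At most one of the k returned items can be relevant.
        "precision": accuracy / k,
        "mrr": sum(1.0 / r for r, h in zip(ranks, hits) if h) / n,
        # The ideal DCG is 1 when there is a single relevant item.
        "ndcg": sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n,
    }


# Made-up ranks for four queries; the last query never retrieves its match.
ranks = [1, 3, 2, None]
m = metrics_at_k(ranks, k=3)
print({name: round(value, 4) for name, value in m.items()})

# Consistency check against the table: precision@3 = accuracy@3 / 3.
print(round(0.52 / 3, 4))    # librispeech-eval row: 0.1733
print(round(0.1183 / 3, 4))  # librispeech-test row: 0.0394
```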
Columns: audio and text

| | audio | text |
|---|---|---|
| type | audio | string |
| details | | |

Samples:

| audio | text |
|---|---|
| | CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE BEGINNING OF THIS LIAISON IN A FEW LINES BUT I WANTED YOU TO SEE EVERY STEP BY WHICH WE CAME I TO AGREE TO WHATEVER MARGUERITE WISHED |
| | MARGUERITE TO BE UNABLE TO LIVE APART FROM ME IT WAS THE DAY AFTER THE EVENING WHEN SHE CAME TO SEE ME THAT I SENT HER MANON LESCAUT FROM THAT TIME SEEING THAT I COULD NOT CHANGE MY MISTRESS'S LIFE I CHANGED MY OWN |
| | I WISHED ABOVE ALL NOT TO LEAVE MYSELF TIME TO THINK OVER THE POSITION I HAD ACCEPTED FOR IN SPITE OF MYSELF IT WAS A GREAT DISTRESS TO ME THUS MY LIFE GENERALLY SO CALM |

Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
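
To illustrate what these parameters control, here is a minimal sketch of an in-batch-negatives ranking loss: cosine similarities are multiplied by `scale`, each pair's match sits on the diagonal of the score matrix, and cross-entropy is computed in both directions. Averaging the two directions is an assumption of this sketch, and `partition_mode`, `hardness_mode`, and `gather_across_devices` are not modeled. All function names and the toy vectors are hypothetical; this is not the library's implementation.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cross_entropy_diagonal(scores):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(scores):
        top = max(row)  # subtract the max for numerical stability
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in row))
        total += log_sum_exp - row[i]
    return total / len(scores)


def symmetric_in_batch_loss(queries, docs, scale=20.0):
    """Average of the query-to-doc and doc-to-query losses (an assumption)."""
    scores = [[scale * cos_sim(q, d) for d in docs] for q in queries]
    transposed = [list(col) for col in zip(*scores)]
    return (cross_entropy_diagonal(scores) + cross_entropy_diagonal(transposed)) / 2


# Made-up 2-dimensional "audio" and "text" embeddings for a batch of two pairs.
audio = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]

aligned = symmetric_in_batch_loss(audio, text)
shuffled = symmetric_in_batch_loss(audio, list(reversed(text)))
print(f"aligned pairs:  {aligned:.6f}")
print(f"shuffled pairs: {shuffled:.6f}")
```

When the pairs line up, the diagonal dominates each row and the loss is close to zero. Swapping the texts makes the other items in the batch score higher than the intended match, so the loss grows.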
Columns: audio and text

| | audio | text |
|---|---|---|
| type | audio | string |
| details | | |

Samples:

| audio | text |
|---|---|
| | HE WAS IN A FEVERED STATE OF MIND OWING TO THE BLIGHT HIS WIFE'S ACTION THREATENED TO CAST UPON HIS ENTIRE FUTURE |
| | HE WOULD HAVE TO PAY HER THE MONEY WHICH SHE WOULD NOW REGULARLY DEMAND OR THERE WOULD BE TROUBLE IT DID NOT MATTER WHAT HE DID |
| | HURSTWOOD WALKED THE FLOOR MENTALLY ARRANGING THE CHIEF POINTS OF HIS SITUATION |

Loss: MultipleNegativesRankingLoss with these parameters:
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc",
        "doc_to_query"
    ],
    "partition_mode": "per_direction",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
Non-default hyperparameters:

- per_device_train_batch_size: 4
- num_train_epochs: 5
- learning_rate: 2e-05
- warmup_steps: 0.1
- bf16: True
- eval_strategy: steps
- per_device_eval_batch_size: 4
- batch_sampler: no_duplicates

All hyperparameters:

- per_device_train_batch_size: 4
- num_train_epochs: 5
- max_steps: -1
- learning_rate: 2e-05
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_steps: 0.1
- optim: adamw_torch_fused
- optim_args: None
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- optim_target_modules: None
- gradient_accumulation_steps: 1
- average_tokens_across_devices: True
- max_grad_norm: 1.0
- label_smoothing_factor: 0.0
- bf16: True
- fp16: False
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- use_liger_kernel: False
- liger_kernel_config: None
- use_cache: False
- neftune_noise_alpha: None
- torch_empty_cache_steps: None
- auto_find_batch_size: False
- log_on_each_node: True
- logging_nan_inf_filter: True
- include_num_input_tokens_seen: no
- log_level: passive
- log_level_replica: warning
- disable_tqdm: False
- project: huggingface
- trackio_space_id: trackio
- eval_strategy: steps
- per_device_eval_batch_size: 4
- prediction_loss_only: True
- eval_on_start: False
- eval_do_concat_batches: True
- eval_use_gather_object: False
- eval_accumulation_steps: None
- include_for_metrics: []
- batch_eval_metrics: False
- save_only_model: False
- save_on_each_node: False
- enable_jit_checkpoint: False
- push_to_hub: False
- hub_private_repo: None
- hub_model_id: None
- hub_strategy: every_save
- hub_always_push: False
- hub_revision: None
- load_best_model_at_end: False
- ignore_data_skip: False
- restore_callback_states_from_checkpoint: False
- full_determinism: False
- seed: 42
- data_seed: None
- use_cpu: False
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- dataloader_prefetch_factor: None
- remove_unused_columns: True
- label_names: None
- train_sampling_strategy: random
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- ddp_backend: None
- ddp_timeout: 1800
- fsdp: []
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- deepspeed: None
- debug: []
- skip_memory_metrics: True
- do_predict: False
- resume_from_checkpoint: None
- warmup_ratio: None
- local_rank: -1
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss | librispeech-eval_cosine_ndcg@10 | librispeech-test_cosine_ndcg@10 |
|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.0279 | 0.0037 |
| 0.1001 | 714 | 1.4538 | 1.1503 | 0.0727 | - |
| 0.2001 | 1428 | 0.9953 | 0.8749 | 0.0841 | - |
| 0.3002 | 2142 | 0.9557 | 0.7760 | 0.1252 | - |
| 0.4003 | 2856 | 0.9621 | 2.4026 | 0.0353 | - |
| 0.5004 | 3570 | 0.9721 | 0.9326 | 0.0720 | - |
| 0.6004 | 4284 | 0.8931 | 0.8454 | 0.0934 | - |
| 0.7005 | 4998 | 0.8368 | 0.5494 | 0.1741 | - |
| 0.8006 | 5712 | 0.8001 | 0.4935 | 0.2170 | - |
| 0.9006 | 6426 | 0.7817 | 0.7168 | 0.1476 | - |
| 1.0007 | 7140 | 0.7235 | 0.6410 | 0.1809 | - |
| 1.1008 | 7854 | 0.6620 | 0.6527 | 0.1726 | - |
| 1.2008 | 8568 | 0.6492 | 0.4146 | 0.2116 | - |
| 1.3009 | 9282 | 0.6342 | 0.7536 | 0.1695 | - |
| 1.4010 | 9996 | 0.6438 | 0.6872 | 0.1873 | - |
| 1.5011 | 10710 | 0.6103 | 0.4385 | 0.2767 | - |
| 1.6011 | 11424 | 0.6052 | 0.8028 | 0.1805 | - |
| 1.7012 | 12138 | 0.5950 | 0.3628 | 0.2891 | - |
| 1.8013 | 12852 | 0.5672 | 0.6978 | 0.2120 | - |
| 1.9013 | 13566 | 0.5611 | 0.5946 | 0.1965 | - |
| 2.0014 | 14280 | 0.5546 | 0.2659 | 0.3589 | - |
| 2.1015 | 14994 | 0.5133 | 0.4273 | 0.2806 | - |
| 2.2015 | 15708 | 0.4588 | 0.4356 | 0.2929 | - |
| 2.3016 | 16422 | 0.4629 | 0.5123 | 0.2538 | - |
| 2.4017 | 17136 | 0.4429 | 0.3757 | 0.3092 | - |
| 2.5018 | 17850 | 0.5000 | 0.4237 | 0.3297 | - |
| 2.6018 | 18564 | 0.4328 | 0.5146 | 0.3291 | - |
| 2.7019 | 19278 | 0.4284 | 0.3348 | 0.3483 | - |
| 2.8020 | 19992 | 0.4598 | 0.3768 | 0.3865 | - |
| 2.9020 | 20706 | 0.4183 | 0.3908 | 0.2594 | - |
| 3.0021 | 21420 | 0.4180 | 0.3240 | 0.3470 | - |
| 3.1022 | 22134 | 0.3624 | 0.3487 | 0.4205 | - |
| 3.2022 | 22848 | 0.3627 | 0.3124 | 0.3650 | - |
| 3.3023 | 23562 | 0.3651 | 0.3025 | 0.3046 | - |
| 3.4024 | 24276 | 0.3644 | 0.3708 | 0.4050 | - |
| 3.5025 | 24990 | 0.3480 | 0.3458 | 0.3998 | - |
| 3.6025 | 25704 | 0.3542 | 0.2936 | 0.4141 | - |
| 3.7026 | 26418 | 0.2954 | 0.2692 | 0.3876 | - |
| 3.8027 | 27132 | 0.3336 | 0.2221 | 0.3915 | - |
| 3.9027 | 27846 | 0.3255 | 0.3140 | 0.4253 | - |
| 4.0028 | 28560 | 0.3093 | 0.2278 | 0.4607 | - |
| 4.1029 | 29274 | 0.2715 | 0.3176 | 0.4261 | - |
| 4.2029 | 29988 | 0.2812 | 0.2814 | 0.4590 | - |
| 4.3030 | 30702 | 0.2690 | 0.2390 | 0.4997 | - |
| 4.4031 | 31416 | 0.2697 | 0.2575 | 0.4720 | - |
| 4.5032 | 32130 | 0.2616 | 0.3054 | 0.4863 | - |
| 4.6032 | 32844 | 0.2437 | 0.2467 | 0.4852 | - |
| 4.7033 | 33558 | 0.2532 | 0.2505 | 0.5196 | - |
| 4.8034 | 34272 | 0.2640 | 0.2242 | 0.4926 | - |
| 4.9034 | 34986 | 0.2245 | 0.2345 | 0.4999 | - |
| -1 | -1 | - | - | 0.5030 | 0.1402 |
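
The hyperparameters listed earlier combine learning_rate: 2e-05, lr_scheduler_type: linear, and warmup_steps: 0.1. As a rough sketch of how the learning rate would evolve over the steps logged above, the function below ramps up linearly during warmup and then decays linearly to zero. Reading the fractional warmup_steps value as a share of total steps is an assumption, as is the rounding of the warmup length. The total step count is an estimate derived from the log (34986 steps at epoch 4.9034, scaled to 5 epochs), and `linear_schedule_lr` is a hypothetical helper, not the trainer's scheduler.

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-05, warmup=0.1):
    """Linear warmup followed by linear decay to zero.

    A warmup value below 1 is treated as a fraction of total_steps
    (an assumption of this sketch); otherwise it is a step count.
    """
    warmup_steps = int(total_steps * warmup) if warmup < 1 else int(warmup)
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# Estimated from the training log: 34986 / 4.9034 * 5 is roughly 35675 steps.
TOTAL_STEPS = 35_675

for step in (0, 714, 3567, 17_850, 34_986, TOTAL_STEPS):
    print(step, f"{linear_schedule_lr(step, TOTAL_STEPS):.3e}")
```

Under these assumptions, the first logged evaluation at step 714 falls inside the warmup phase, and the peak rate is reached shortly after the third logged step.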
Carbon emissions were measured using CodeCarbon.
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{günther2024jinaembeddings28192token,
title={Jina Embeddings 2: 8192-Token General-Purpose Text Embeddings for Long Documents},
author={Michael Günther and Jackmin Ong and Isabelle Mohr and Alaeddine Abdessalem and Tanguy Abel and Mohammad Kalim Akram and Susana Guzman and Georgios Mastrapas and Saba Sturua and Bo Wang and Maximilian Werk and Nan Wang and Han Xiao},
year={2024},
eprint={2310.19923},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2310.19923},
}