SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the trec dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
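The Pooling and Normalize modules above can be sketched in plain NumPy: mean pooling averages the token embeddings produced by the Transformer (ignoring padding via the attention mask), and Normalize L2-normalizes the result so that dot products between embeddings equal cosine similarities. The 4-dimensional toy vectors below stand in for the model's 768-dimensional ones; this is an illustration of the operations, not the library's implementation.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean pooling over non-padding tokens, then L2 normalization.

    token_embeddings: (seq_len, dim) token vectors from the Transformer module
    attention_mask:   (seq_len,) 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)                 # (dim,)
    count = max(float(mask.sum()), 1e-9)                           # avoid div by zero
    pooled = summed / count                                        # mean pooling
    return pooled / np.linalg.norm(pooled)                         # Normalize()

# Toy example with dim=4 instead of 768
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [9.0, 9.0, 9.0, 9.0]])  # padding token, masked out
mask = np.array([1, 1, 0])
emb = mean_pool_and_normalize(tokens, mask)
print(np.linalg.norm(emb))  # unit norm: 1.0
```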

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/all-distilroberta-v1-trec-group-by-label-d7c7ce7")
# Run inference
sentences = [
    "What country contains Africa 's northernmost point ?",
    'What is the difference between classical conditioning and operant conditioning ?',
    'Where can stocks be traded on-line ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.3409,  0.9962],
#         [-0.3409,  1.0000, -0.3273],
#         [ 0.9962, -0.3273,  1.0000]])
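Because the model L2-normalizes its outputs, the cosine similarities that `model.similarity` returns reduce to plain dot products, which also makes semantic search a matrix product followed by a sort. A minimal NumPy sketch of ranking a corpus against a query, assuming the embeddings have already been computed (3-dimensional toy vectors stand in for the model's 768-dimensional ones):

```python
import numpy as np

def rank_by_cosine(query_emb, corpus_embs):
    """Return corpus indices sorted by descending cosine similarity to the query.

    Assumes all embeddings are already L2-normalized (as this model's are),
    so cosine similarity is just a dot product.
    """
    scores = corpus_embs @ query_emb          # (n_corpus,)
    return np.argsort(-scores), scores

# Toy unit-length "embeddings"
query = np.array([1.0, 0.0, 0.0])
corpus = np.array([[0.8, 0.6, 0.0],   # fairly similar
                   [0.0, 1.0, 0.0],   # orthogonal
                   [1.0, 0.0, 0.0]])  # identical direction
order, scores = rank_by_cosine(query, corpus)
print(order)  # [2 0 1]
```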

Evaluation

Metrics

Triplet

Metric           trec-dev  trec-test
cosine_accuracy  0.9207    0.9354
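The cosine_accuracy above is the triplet-evaluation metric: the fraction of (anchor, positive, negative) triplets for which the anchor is more cosine-similar to the positive than to the negative. A minimal sketch of that computation on toy 2-dimensional vectors (illustrative, not the evaluator's actual code):

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(cos(a, p) > cos(a, n) for a, p, n in triplets)
    return hits / len(triplets)

# Toy (anchor, positive, negative) triplets
triplets = [
    (np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([0.0, 1.0])),  # ranked correctly
    (np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([1.0, 0.0])),  # ranked correctly
    (np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.9, 0.1])),  # ranked incorrectly
]
print(triplet_cosine_accuracy(triplets))  # 2/3
```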

Training Details

Training Dataset

trec

  • Dataset: trec at a073d2e
  • Size: 4,952 training samples
  • Columns: text and label
  • Approximate statistics based on the first 1000 samples:
    • text (string): min 6 tokens, mean 13.29 tokens, max 39 tokens
    • label (int), with the following approximate class distribution:
    • 0: ~0.30%
    • 1: ~1.50%
    • 2: ~3.10%
    • 3: ~0.60%
    • 4: ~0.80%
    • 5: ~3.90%
    • 7: ~1.80%
    • 8: ~1.20%
    • 9: ~1.70%
    • 10: ~0.20%
    • 11: ~0.40%
    • 12: ~0.20%
    • 13: ~4.30%
    • 14: ~0.40%
    • 15: ~0.90%
    • 16: ~0.10%
    • 17: ~1.20%
    • 18: ~0.70%
    • 19: ~0.10%
    • 20: ~0.70%
    • 21: ~1.40%
    • 22: ~0.20%
    • 23: ~0.50%
    • 24: ~7.90%
    • 25: ~4.40%
    • 26: ~4.90%
    • 27: ~3.90%
    • 28: ~3.30%
    • 29: ~17.50%
    • 30: ~0.50%
    • 31: ~0.70%
    • 32: ~2.70%
    • 33: ~2.80%
    • 34: ~0.80%
    • 35: ~7.90%
    • 36: ~1.40%
    • 37: ~0.10%
    • 38: ~6.30%
    • 39: ~4.80%
    • 40: ~0.40%
    • 41: ~1.10%
    • 42: ~0.10%
    • 43: ~0.70%
    • 44: ~0.40%
    • 45: ~0.40%
    • 46: ~0.50%
    • 47: ~0.10%
    • 48: ~0.20%
  • Samples:
    • "How did serfdom develop in and then leave Russia ?" → label: 26
    • "What films featured the character Popeye Doyle ?" → label: 5
    • "How can I find a list of celebrities ' real names ?" → label: 26
  • Loss: BatchAllTripletLoss
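BatchAllTripletLoss forms every valid (anchor, positive, negative) triplet within a batch, where the anchor and positive share a label and the negative carries a different one, and averages max(0, d(a,p) − d(a,n) + margin) over the triplets whose loss is positive. This is also why the group_by_label batch sampler matters: a batch must contain several samples per label to yield any anchor-positive pairs. A NumPy sketch with Euclidean distance and a margin of 5 (the library's defaults, as I understand them; the explicit loops are illustrative, not the library's vectorized implementation):

```python
import numpy as np
from itertools import permutations

def batch_all_triplet_loss(embeddings, labels, margin=5.0):
    """Average hinge loss over all valid in-batch triplets with positive loss.

    embeddings: (batch, dim) float array; labels: (batch,) int array.
    """
    losses = []
    n = len(labels)
    for a, p in permutations(range(n), 2):
        if labels[a] != labels[p]:
            continue                          # positive must share the anchor's label
        for neg in range(n):
            if labels[neg] == labels[a]:
                continue                      # negative must have a different label
            d_ap = np.linalg.norm(embeddings[a] - embeddings[p])
            d_an = np.linalg.norm(embeddings[a] - embeddings[neg])
            loss = max(0.0, d_ap - d_an + margin)
            if loss > 0:
                losses.append(loss)           # only "hard" triplets contribute
    return float(np.mean(losses)) if losses else 0.0

# Toy batch: two samples of label 0, two of label 1, clusters 3 apart
embs = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
labels = np.array([0, 0, 1, 1])
print(batch_all_triplet_loss(embs, labels))  # 3.0
```

Note that once the clusters are separated by more than the margin, no triplet violates the constraint and the loss drops to zero.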

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • warmup_steps: 0.1
  • eval_strategy: steps
  • batch_sampler: group_by_label

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  trec-dev_cosine_accuracy  trec-test_cosine_accuracy
-1      -1    -              0.7683                    -
0.1028  22    4.8286         -                         -
0.2009  43    -              0.9289                    -
0.2056  44    4.2511         -                         -
0.3084  66    4.0568         -                         -
0.4019  86    -              0.9085                    -
0.4112  88    3.9307         -                         -
0.5140  110   3.9124         -                         -
0.6028  129   -              0.9167                    -
0.6168  132   3.9655         -                         -
0.7196  154   3.9726         -                         -
0.8037  172   -              0.9207                    -
0.8224  176   3.8237         -                         -
0.9252  198   3.8720         -                         -
-1      -1    -              -                         0.9354

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.3.0.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
Model size: 82.1M parameters (F32, Safetensors)