SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the trec dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
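The module stack reads top to bottom: the transformer produces per-token embeddings, the Pooling module mean-pools them over non-padding tokens (pooling_mode_mean_tokens is True), and Normalize() L2-normalizes the result. A minimal numpy sketch of the pooling and normalization steps, with random toy tensors standing in for real transformer outputs:

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token embeddings over non-padding positions, then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid div-by-zero
    pooled = summed / counts
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# Toy batch: 2 sentences, 4 token positions, 768-dim token embeddings
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 4, 768))
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # second sentence has 2 real tokens
emb = mean_pool_and_normalize(tokens, mask)
print(emb.shape)                     # (2, 768)
print(np.linalg.norm(emb, axis=1))   # both norms are 1.0
```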

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/all-distilroberta-v1-trec-group-by-label-b3d9cd1")
# Run inference
sentences = [
    "What country contains Africa 's northernmost point ?",
    'What is the difference between classical conditioning and operant conditioning ?',
    'Where can stocks be traded on-line ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.6728,  0.9707],
#         [-0.6728,  1.0000, -0.6537],
#         [ 0.9707, -0.6537,  1.0000]])
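Because the model ends with a Normalize() module, every embedding has unit norm, so cosine similarity reduces to a plain dot product and semantic search over a corpus becomes a single matrix multiplication. A sketch with stand-in unit vectors rather than real model outputs:

```python
import numpy as np

# Stand-in corpus: 5 random unit vectors in the model's 768-dim space.
rng = np.random.default_rng(42)
corpus = rng.normal(size=(5, 768))
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)

# A query vector deliberately close to corpus item 2.
query = corpus[2] + 0.01 * rng.normal(size=768)
query /= np.linalg.norm(query)

scores = corpus @ query          # cosine similarities in one matmul
top_k = np.argsort(-scores)[:3]  # indices of the 3 best matches
print(top_k[0])                  # 2, the nearest corpus item
```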

Evaluation

Metrics

Triplet

Metric            trec_dev   trec-test
cosine_accuracy   0.935      0.9495
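cosine_accuracy is the standard triplet metric: the fraction of (anchor, positive, negative) triplets for which the anchor is more cosine-similar to the positive than to the negative. A minimal sketch on synthetic vectors:

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a * b).sum(axis=1)
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Toy check: positives are near-copies of anchors, negatives are unrelated
rng = np.random.default_rng(0)
anchors = rng.normal(size=(100, 768))
positives = anchors + 0.01 * rng.normal(size=(100, 768))
negatives = rng.normal(size=(100, 768))
acc = triplet_cosine_accuracy(anchors, positives, negatives)
print(acc)  # 1.0
```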

Training Details

Training Dataset

trec

  • Dataset: trec at a073d2e
  • Size: 4,952 training samples
  • Columns: text and label
  • Approximate statistics based on the first 1000 samples:
    • text (string): min: 6 tokens, mean: 13.29 tokens, max: 39 tokens
    • label (int): class distribution: 0: ~0.30%, 1: ~1.50%, 2: ~3.10%, 3: ~0.60%, 4: ~0.80%, 5: ~3.90%, 7: ~1.80%, 8: ~1.20%, 9: ~1.70%, 10: ~0.20%, 11: ~0.40%, 12: ~0.20%, 13: ~4.30%, 14: ~0.40%, 15: ~0.90%, 16: ~0.10%, 17: ~1.20%, 18: ~0.70%, 19: ~0.10%, 20: ~0.70%, 21: ~1.40%, 22: ~0.20%, 23: ~0.50%, 24: ~7.90%, 25: ~4.40%, 26: ~4.90%, 27: ~3.90%, 28: ~3.30%, 29: ~17.50%, 30: ~0.50%, 31: ~0.70%, 32: ~2.70%, 33: ~2.80%, 34: ~0.80%, 35: ~7.90%, 36: ~1.40%, 37: ~0.10%, 38: ~6.30%, 39: ~4.80%, 40: ~0.40%, 41: ~1.10%, 42: ~0.10%, 43: ~0.70%, 44: ~0.40%, 45: ~0.40%, 46: ~0.50%, 47: ~0.10%, 48: ~0.20%
  • Samples:
    text                                                   label
    How did serfdom develop in and then leave Russia ?     26
    What films featured the character Popeye Doyle ?       5
    How can I find a list of celebrities ' real names ?    26
  • Loss: BatchAllTripletLoss
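BatchAllTripletLoss forms every valid (anchor, positive, negative) triplet inside a batch, where anchor and positive share a label and the negative has a different one, and averages a hinge loss over the triplets that violate the margin. A simplified numpy sketch using Euclidean distance; the real implementation is sentence_transformers.losses.BatchAllTripletLoss, and the margin and 2-d toy embeddings here are purely illustrative:

```python
import numpy as np

def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Average hinge loss over all valid in-batch triplets with non-zero loss."""
    # Pairwise Euclidean distances, shape (batch, batch)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    losses = []
    n = len(labels)
    for a in range(n):
        for p in range(n):
            for neg in range(n):
                if a != p and labels[a] == labels[p] and labels[a] != labels[neg]:
                    losses.append(max(dist[a, p] - dist[a, neg] + margin, 0.0))
    # Batch-all convention: average only over triplets with a positive loss.
    active = [l for l in losses if l > 0]
    return sum(active) / len(active) if active else 0.0

emb = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
labels = [0, 0, 1, 1]
loss = batch_all_triplet_loss(emb, labels, margin=1.0)
print(loss)  # 1.0
```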

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • warmup_steps: 0.1
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • batch_sampler: group_by_label
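The group_by_label batch sampler matters for BatchAllTripletLoss: in-batch triplets only exist when a batch contains at least two examples with the same label, which a plain random sampler does not guarantee for rare classes. A rough sketch of the idea (not the library's actual sampler):

```python
import random
from collections import defaultdict

def group_by_label_batches(labels, batch_size, seed=0):
    """Yield index batches that keep same-label examples adjacent, so each
    batch is likely to contain the 2+ examples per label that in-batch
    triplet losses need."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    # Shuffle within each label group, shuffle the group order,
    # then concatenate and slice into fixed-size batches.
    groups = list(by_label.values())
    rng.shuffle(groups)
    flat = [i for g in groups for i in rng.sample(g, len(g))]
    for start in range(0, len(flat), batch_size):
        yield flat[start:start + batch_size]

labels = [0, 1, 0, 1, 2, 2, 0, 1]
batches = list(group_by_label_batches(labels, batch_size=4))
print(batches)  # two batches of 4 with same-label indices kept together
```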

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 32
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch  Step  Training Loss  trec_dev_cosine_accuracy  trec-test_cosine_accuracy
-1     -1    -              0.7683                    -
0.1    14    4.7867         -                         -
0.2    28    4.4926         0.9370                    -
0.3    42    4.2807         -                         -
0.4    56    4.1435         0.9207                    -
0.5    70    4.1765         -                         -
0.6    84    4.0990         0.9370                    -
0.7    98    4.0775         -                         -
0.8    112   4.0200         0.9390                    -
0.9    126   3.8469         -                         -
1.0    140   3.7847         0.9350                    -
-1     -1    -              -                         0.9495

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.3.0.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model size: 82.1M parameters (F32, stored as Safetensors)