SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the trec dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
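The Pooling and Normalize modules above can be sketched in plain NumPy: mean pooling averages the token embeddings over non-padding positions, and normalization scales each sentence vector to unit length. This is an illustrative sketch of the idea, not the library's implementation, and uses toy 4-dimensional vectors in place of the real 768-dimensional ones.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Mean-pool token embeddings over non-padding positions, then L2-normalize.

    token_embeddings: (seq_len, dim) array of per-token vectors
    attention_mask:   (seq_len,) array of 1s (real tokens) and 0s (padding)
    """
    mask = attention_mask[:, None].astype(float)    # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)  # sum over real tokens only
    counts = mask.sum()                             # number of real tokens
    pooled = summed / counts                        # mean pooling
    return pooled / np.linalg.norm(pooled)          # unit length (Normalize)

# Toy example: 3 tokens, the last one padding, dim 4
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [9.0, 9.0, 9.0, 9.0]])  # padding row, ignored via the mask
vec = mean_pool_and_normalize(tokens, np.array([1, 1, 0]))
print(np.linalg.norm(vec))  # ~1.0
```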

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("tomaarsen/all-distilroberta-v1-trec-balanced-sampler")
# Run inference
sentences = [
    "What country contains Africa 's northernmost point ?",
    'What is the difference between classical conditioning and operant conditioning ?',
    'Where can stocks be traded on-line ?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000, -0.6694,  0.9470],
#         [-0.6694,  1.0000, -0.6573],
#         [ 0.9470, -0.6573,  1.0000]])
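Because the model ends with a Normalize() module, cosine similarity between its embeddings reduces to a plain dot product. The sketch below shows what the similarity matrix above amounts to, using placeholder unit-length 2-dimensional vectors in place of real (3, 768) model output.

```python
import numpy as np

# Placeholder unit-length "embeddings" standing in for model.encode(...) output;
# real embeddings from this model would have shape (3, 768).
emb = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.6, 0.8]])

# With L2-normalized embeddings, cosine similarity is the pairwise dot product.
similarities = emb @ emb.T
print(similarities)  # symmetric matrix with 1.0 on the diagonal
```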

Evaluation

Metrics

Triplet

Metric           trec-dev  trec-test
cosine_accuracy  0.9451    0.9434
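Triplet cosine accuracy is, roughly, the fraction of (anchor, positive, negative) triplets for which the anchor is more similar to the positive than to the negative. The sketch below illustrates that computation on toy vectors; it mirrors the metric's definition but is not the evaluator's actual code.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    hits = sum(
        cosine(a, p) > cosine(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return hits / len(anchors)

# Toy triplets: the first is ranked correctly, the second is not
anchors   = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
positives = [np.array([0.9, 0.1]), np.array([1.0, 0.0])]  # second positive is far
negatives = [np.array([0.0, 1.0]), np.array([0.1, 0.9])]  # second negative is close
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 0.5
```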

Training Details

Training Dataset

trec

  • Dataset: trec at a073d2e
  • Size: 4,952 training samples
  • Columns: text and label
  • Approximate statistics based on the first 1000 samples:
    text (string):
    • min: 6 tokens
    • mean: 13.29 tokens
    • max: 39 tokens
    label (int), class distribution:
    • 0: ~0.30%
    • 1: ~1.50%
    • 2: ~3.10%
    • 3: ~0.60%
    • 4: ~0.80%
    • 5: ~3.90%
    • 7: ~1.80%
    • 8: ~1.20%
    • 9: ~1.70%
    • 10: ~0.20%
    • 11: ~0.40%
    • 12: ~0.20%
    • 13: ~4.30%
    • 14: ~0.40%
    • 15: ~0.90%
    • 16: ~0.10%
    • 17: ~1.20%
    • 18: ~0.70%
    • 19: ~0.10%
    • 20: ~0.70%
    • 21: ~1.40%
    • 22: ~0.20%
    • 23: ~0.50%
    • 24: ~7.90%
    • 25: ~4.40%
    • 26: ~4.90%
    • 27: ~3.90%
    • 28: ~3.30%
    • 29: ~17.50%
    • 30: ~0.50%
    • 31: ~0.70%
    • 32: ~2.70%
    • 33: ~2.80%
    • 34: ~0.80%
    • 35: ~7.90%
    • 36: ~1.40%
    • 37: ~0.10%
    • 38: ~6.30%
    • 39: ~4.80%
    • 40: ~0.40%
    • 41: ~1.10%
    • 42: ~0.10%
    • 43: ~0.70%
    • 44: ~0.40%
    • 45: ~0.40%
    • 46: ~0.50%
    • 47: ~0.10%
    • 48: ~0.20%
  • Samples:
    text → label
    How did serfdom develop in and then leave Russia ? → 26
    What films featured the character Popeye Doyle ? → 5
    How can I find a list of celebrities ' real names ? → 26
  • Loss: BatchAllTripletLoss
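BatchAllTripletLoss forms every valid (anchor, positive, negative) triplet inside a batch — same-label pairs as anchor/positive, different-label samples as negatives — and averages the hinge losses that are still positive. This is why a label-aware batch sampler matters: without in-batch positives there are no triplets. Below is a rough sketch of the idea using Euclidean distance and an illustrative margin of 1.0, not the library's implementation or its default hyperparameters.

```python
import numpy as np

def batch_all_triplet_loss(embeddings, labels, margin=1.0):
    """Average hinge loss over all valid in-batch triplets (a, p, n):
    a != p, label[a] == label[p], label[a] != label[n].
    Euclidean distance and margin=1.0 are illustrative choices."""
    n = len(labels)
    # Pairwise Euclidean distance matrix, shape (n, n)
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    losses = []
    for a in range(n):
        for p in range(n):
            if a == p or labels[a] != labels[p]:
                continue
            for neg in range(n):
                if labels[neg] == labels[a]:
                    continue
                losses.append(max(0.0, dist[a, p] - dist[a, neg] + margin))
    # "Batch all" averages over the triplets that still have positive loss
    positive = [l for l in losses if l > 0]
    return sum(positive) / len(positive) if positive else 0.0

# Two well-separated classes: every triplet satisfies the margin, loss is 0
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(batch_all_triplet_loss(emb, [0, 0, 1, 1]))  # 0.0

# Overlapping classes: some triplets violate the margin, loss is positive
hard = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(batch_all_triplet_loss(hard, [0, 0, 1, 1]) > 0)  # True
```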

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • warmup_steps: 0.1
  • eval_strategy: steps
  • batch_sampler: group_by_label

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 1
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: steps
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: group_by_label
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch   Step  Training Loss  trec-dev_cosine_accuracy  trec-test_cosine_accuracy
-1      -1    -              0.7683                    -
0.1066  13    4.8492         -                         -
0.2049  25    -              0.9309                    -
0.2131  26    4.5356         -                         -
0.3197  39    4.2800         -                         -
0.4098  50    -              0.9289                    -
0.4262  52    4.2069         -                         -
0.5328  65    4.1544         -                         -
0.6148  75    -              0.9390                    -
0.6393  78    4.1101         -                         -
0.7459  91    4.0749         -                         -
0.8197  100   -              0.9451                    -
0.8525  104   4.0868         -                         -
0.9590  117   4.0453         -                         -
-1      -1    -              -                         0.9434

Framework Versions

  • Python: 3.11.6
  • Sentence Transformers: 5.3.0.dev0
  • Transformers: 5.3.0.dev0
  • PyTorch: 2.10.0+cu126
  • Accelerate: 1.12.0
  • Datasets: 4.3.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

BatchAllTripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}