SentenceTransformer based on BAAI/bge-large-en-v1.5

This is a sentence-transformers model finetuned from BAAI/bge-large-en-v1.5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
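Paraphrase mining, one of the uses listed above, comes down to scanning a pairwise similarity matrix for pairs that score above a cut-off. For billing data this can surface near-duplicate entries. The pure-Python sketch below is illustrative only. It reuses the similarity values printed in this card's usage example, and its 0.95 threshold is a hypothetical value that has not been tuned for this model. With a live model you could build the matrix with `model.similarity(...)` and convert it with `.tolist()`. The library also provides `util.paraphrase_mining` for larger collections.

```python
# Pairwise cosine similarities copied from the usage example's printed output
similarities = [
    [1.0000, 0.9998, 0.4579],
    [0.9998, 1.0000, 0.4567],
    [0.4579, 0.4567, 1.0000],
]

def near_duplicate_pairs(sim, threshold):
    """Return (i, j, score) for each pair i < j whose similarity is >= threshold,
    highest score first."""
    pairs = []
    n = len(sim)
    for i in range(n):
        # Upper triangle only: skips self-similarity and mirrored (j, i) pairs
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                pairs.append((i, j, sim[i][j]))
    return sorted(pairs, key=lambda p: p[2], reverse=True)

# Hypothetical cut-off, not tuned for this model
print(near_duplicate_pairs(similarities, threshold=0.95))
# [(0, 1, 0.9998)]
```

Here the first two example entries are flagged. They differ only in units and in the witness subsection named in the narrative.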

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-large-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
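The three modules above work in sequence. The BERT encoder produces one vector per token. `Pooling` with `pooling_mode_cls_token: True` keeps only the vector at the first ([CLS]) position. `Normalize()` then rescales that vector to unit length. Because every output has unit length, cosine similarity reduces to a plain dot product. The following is a minimal pure-Python sketch of that post-encoder pipeline. It uses hypothetical 4-dimensional toy token vectors in place of real 1024-dimensional BERT outputs.

```python
import math

def cls_pool(token_embeddings):
    """CLS pooling: keep only the vector at position 0 (the [CLS] token)."""
    return list(token_embeddings[0])

def l2_normalize(vec):
    """Rescale a vector to unit L2 norm, as the Normalize() module does."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical encoder outputs: 3 tokens x 4 dims each (the real model emits seq_len x 1024)
tokens_a = [[0.5, 1.0, -0.5, 2.0], [9.0, 9.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]]
tokens_b = [[1.0, 2.0, -1.0, 3.0], [0.0, 0.0, 0.0, 0.0], [5.0, 5.0, 5.0, 5.0]]

emb_a = l2_normalize(cls_pool(tokens_a))
emb_b = l2_normalize(cls_pool(tokens_b))

# On unit-length embeddings, the dot product equals the cosine similarity of the raw CLS vectors
print(math.isclose(dot(emb_a, emb_b), cosine(cls_pool(tokens_a), cls_pool(tokens_b))))
# True
```

Only the first token's vector affects the result. The other token vectors are discarded by CLS pooling.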

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rshn-krn/bge-large-legal-billing")
# Run inference
sentences = [
    'Carrier: Coaction Global, Inc.\nAttorney: Senior Counsel | Rate: $245/hr | Units: 0.1\nTask: L120 - Analysis/Strategy | Activity: A103 - Draft/revise\nNarrative: Draft Litigation Status Report regarding the deposition testimony of California Highway Patrol Officer A.S. Johnson with attention to Section I. Testimony, subsection G. Witness Statements, subsection 3. Tuna Taleni ',
    'Carrier: Coaction Global, Inc.\nAttorney: Senior Counsel | Rate: $245/hr | Units: 0.3\nTask: L120 - Analysis/Strategy | Activity: A103 - Draft/revise\nNarrative: Draft Litigation Status Report regarding the deposition testimony of California Highway Patrol Officer A.S. Johnson with attention to Section I. Testimony, subsection G. Witness Statements, subsection 4. Lumafale Oti ',
    'Carrier: Mitsui Sumitomo Marine Management, Inc.\nAttorney: Senior Counsel | Rate: $325/hr | Units: 0.1\nTask: L110 - Fact Investigation/Development | Activity: A107 - Communicat/OUT\nNarrative: Email to current counsel for Google defendants regarding status of dismissal of non-related entities in our to evaluate the case for insured.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.9998, 0.4579],
#         [0.9998, 1.0000, 0.4567],
#         [0.4579, 0.4567, 1.0000]])
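Every example input follows the same layout:

- a `Carrier:` line;
- an `Attorney:` line with role, hourly rate and units;
- a `Task:` / `Activity:` line with UTBMS-style codes;
- a `Narrative:` line.

This layout is inferred from the examples and training samples in this card and is not documented, so the helper below rests on that assumption. The function name `format_billing_entry` and the narrative text are hypothetical. Formatting queries the same way as the training data may give more consistent similarity scores.

```python
def format_billing_entry(carrier, role, rate, units, task, activity, narrative):
    """Hypothetical helper: lay out one billing line the way this card's examples do."""
    return (
        f"Carrier: {carrier}\n"
        f"Attorney: {role} | Rate: ${rate}/hr | Units: {units}\n"
        f"Task: {task} | Activity: {activity}\n"
        f"Narrative: {narrative}"
    )

query = format_billing_entry(
    carrier="Argo Group U.S.",
    role="Paralegal",
    rate=135,
    units=0.2,
    task="L110 - Fact Investigation/Development",
    activity="A103 - Draft/revise",
    narrative="Draft letter to opposing counsel regarding outstanding discovery.",  # made-up narrative
)
print(query)
```

The resulting strings can be passed to `model.encode` in the same way as the `sentences` list above.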

Training Details

Training Dataset

Unnamed Dataset

  • Size: 28,621 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (string): min 50 tokens, mean 81.19 tokens, max 210 tokens
    • positive (string): min 44 tokens, mean 81.51 tokens, max 211 tokens
  • Samples:
    • Pair 1
      anchor:
        Carrier: Golden Bear Insurance Company
        Attorney: Partner | Rate: $210/hr | Units: 0.3
        Task: L120 - Analysis/Strategy | Activity: A104 - Review/analyze
        Narrative: Review and analysis of asserted claims with particular attention to elements necessary to establish and likely defenses available to rebut claims and related initial pleadings, investigation, and discovery to leverage same so as to strategically implement ideal defense strategy
      positive:
        Carrier: Golden Bear Insurance Company
        Attorney: Partner | Rate: $250/hr | Units: 0.1
        Task: L440 - Other Trial Preparation and Support | Activity: A104 - Review/analyze
        Narrative: Analyze the notice from the civil division manager for Monmouth County ordering a trial in this matter in April of 2025
    • Pair 2
      anchor:
        Carrier: Aspen Specialty Insurance Company
        Attorney: Associate | Rate: $240/hr | Units: 0.1
        Task: L110 - Fact Investigation/Development | Activity: A103 - Draft/revise
        Narrative: Draft and evaluate electronic correspondence to Ross McKissick, Plaintiff's counsel, discussing the nature and scope of Plaintiff's meet and confer issues for purposes of defense of same.
      positive:
        Carrier: Aspen Specialty Insurance Company
        Attorney: Associate | Rate: $240/hr | Units: 1.4
        Task: L310 - Written Discovery | Activity: A103 - Draft/revise
        Narrative: Prepare Defendant's First Request for Production to Plaintiff.
    • Pair 3
      anchor:
        Carrier: Argo Group U.S.
        Attorney: Paralegal | Rate: $135/hr | Units: 0.2
        Task: L110 - Fact Investigation/Development | Activity: A103 - Draft/revise
        Narrative: Draft r letter to American Pacific Concrete LLC to be in compliance with order issued by Arbitrator and to advise possible default can be entered if failure to respond.
      positive:
        Carrier: Argo Group U.S.
        Attorney: Paralegal | Rate: $135/hr | Units: 0.2
        Task: L130 - Experts/Consultants | Activity: A108 - Communicat/MISC
        Narrative: Emails (4) to and from retained expert Jeff Ambrosia regarding meeting of experts of all parties to discuss landscaping issues in compliance with Arbitrator's order.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
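MultipleNegativesRankingLoss treats every other positive in the batch as a negative for a given anchor. With `scale` 20.0 and `cos_sim`, each anchor's scaled cosine similarities to all positives in the batch feed a softmax cross-entropy whose target is the anchor's own positive. The pure-Python sketch below illustrates only the `query_to_doc` direction without hard negatives. It does not model `partition_mode`, `hardness_mode` or the other options listed, and its 2-dimensional vectors are made up.

```python
import math

def cos_sim(a, b):
    """Cosine similarity of two vectors."""
    num = sum(x * y for x, y in zip(a, b))
    return num / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors, positives, scale=20.0):
    """Mean in-batch cross-entropy: anchor i should rank positives[i] above
    every positives[j], j != i (query_to_doc direction, no hard negatives)."""
    n = len(anchors)
    total = 0.0
    for i in range(n):
        # Scaled similarities of anchor i to every positive in the batch
        logits = [scale * cos_sim(anchors[i], positives[j]) for j in range(n)]
        # -log softmax(logits)[i], using log-sum-exp for numerical stability
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / n

# Made-up 2-D embeddings for a batch of two (anchor, positive) pairs
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[1.0, 0.0], [0.0, 1.0]]   # each positive lines up with its anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]   # each positive points at the other anchor

print(mnrl_loss(anchors, matched))  # close to 0
print(mnrl_loss(anchors, swapped))  # close to 20 (= scale)
```

Consider `per_device_train_batch_size` 32 on a single device. In this setting each anchor would be contrasted with 31 in-batch negatives. As far as we understand, gradient accumulation sums gradients across micro-batches but does not enlarge that pool of negatives.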
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • warmup_steps: 0.1
  • gradient_accumulation_steps: 4
  • fp16: True
  • gradient_checkpointing: True
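The non-default values above imply the step counts that appear in the training logs. The sketch below works through that arithmetic under two assumptions:

- training ran on a single device (the card does not state the device count);
- the last, partial accumulation window of each epoch still counts as one optimizer step.

Under those assumptions, the sketch reproduces the epoch values logged at several steps.

```python
import math

# Values taken from this card
num_samples = 28_621
per_device_batch = 32
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum                # samples per optimizer step (single device assumed)
batches_per_epoch = math.ceil(num_samples / per_device_batch)  # dataloader_drop_last is False, so the partial batch is kept
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)    # assumption: a trailing partial accumulation counts as a step
total_steps = steps_per_epoch * epochs

print(effective_batch, batches_per_epoch, steps_per_epoch, total_steps)
# 128 895 224 672

def logged_epoch(step):
    """Epoch value a log line at optimizer `step` would show, under the assumptions above."""
    full, rem = divmod(step, steps_per_epoch)
    return round(full + rem * grad_accum / batches_per_epoch, 4)

print(logged_epoch(10), logged_epoch(230), logged_epoch(670))
# 0.0447 1.0268 2.9922
```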

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 3
  • max_steps: -1
  • learning_rate: 5e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 4
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: False
  • fp16: True
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: no
  • per_device_eval_batch_size: 8
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}
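This run uses `lr_scheduler_type: linear`, a `learning_rate` of 5e-05 and `warmup_steps: 0.1`. With these settings the learning rate ramps up linearly and then decays linearly to zero. Since `warmup_ratio` is None, a `warmup_steps` value below 1 is presumably read as a fraction of total training steps. The sketch below makes three assumptions:

- the fractional reading of `warmup_steps` described above;
- a total of 672 steps (28,621 samples, batch 32, accumulation 4, 3 epochs on one device);
- warmup rounded up to a whole step.

```python
import math

peak_lr = 5e-05          # learning_rate from this card
total_steps = 672        # assumption: 28,621 / 32 -> 895 batches, / 4 -> 224 steps per epoch, x 3 epochs
warmup_fraction = 0.1    # assumption: warmup_steps = 0.1 read as a fraction of total steps
warmup_steps = math.ceil(warmup_fraction * total_steps)  # assumption: rounded up to a whole step

def linear_lr(step):
    """Learning rate at optimizer `step`: linear warmup to peak_lr, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

for step in (0, warmup_steps, total_steps // 2, total_steps):
    print(step, f"{linear_lr(step):.2e}")
```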

Training Logs

Epoch Step Training Loss
0.0447 10 1.5338
0.0894 20 0.4740
0.1341 30 0.4362
0.1788 40 0.3799
0.2235 50 0.4078
0.2682 60 0.3891
0.3128 70 0.4182
0.3575 80 0.3938
0.4022 90 0.4459
0.4469 100 0.4123
0.4916 110 0.3640
0.5363 120 0.4194
0.5810 130 0.3928
0.6257 140 0.4267
0.6704 150 0.4228
0.7151 160 0.4358
0.7598 170 0.4309
0.8045 180 0.4161
0.8492 190 0.4289
0.8939 200 0.4091
0.9385 210 0.3994
0.9832 220 0.4184
1.0268 230 0.4119
1.0715 240 0.4279
1.1162 250 0.3907
1.1609 260 0.4242
1.2056 270 0.4049
1.2503 280 0.3787
1.2950 290 0.4150
1.3397 300 0.4472
1.3844 310 0.3944
1.4291 320 0.4288
1.4737 330 0.3718
1.5184 340 0.4148
1.5631 350 0.4160
1.6078 360 0.3907
1.6525 370 0.3918
1.6972 380 0.3777
1.7419 390 0.4300
1.7866 400 0.3913
1.8313 410 0.4205
1.8760 420 0.3863
1.9207 430 0.4370
1.9654 440 0.4225
2.0089 450 0.4057
2.0536 460 0.3843
2.0983 470 0.4034
2.1430 480 0.4115
2.1877 490 0.4128
2.2324 500 0.4028
2.2771 510 0.4198
2.3218 520 0.3613
2.3665 530 0.4017
2.4112 540 0.3639
2.4559 550 0.3978
2.5006 560 0.3982
2.5453 570 0.4059
2.5899 580 0.4175
2.6346 590 0.4510
2.6793 600 0.4210
2.7240 610 0.4098
2.7687 620 0.4082
2.8134 630 0.3970
2.8581 640 0.3846
2.9028 650 0.4155
2.9475 660 0.4071
2.9922 670 0.4293

Framework Versions

  • Python: 3.11.10
  • Sentence Transformers: 5.3.0
  • Transformers: 5.4.0
  • PyTorch: 2.11.0+cu130
  • Accelerate: 1.13.0
  • Datasets: 4.8.4
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}
Safetensors model size: 0.3B params (tensor type F32)