SentenceTransformer based on Alibaba-NLP/gte-Qwen2-1.5B-instruct

This is a sentence-transformers model finetuned from Alibaba-NLP/gte-Qwen2-1.5B-instruct. It maps sentences & paragraphs to a 1536-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Alibaba-NLP/gte-Qwen2-1.5B-instruct
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1536 dimensions
  • Similarity Function: Cosine Similarity
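The model scores pairs with cosine similarity, which depends only on the direction of two embeddings and not on their length. Below is a minimal pure-Python sketch that uses made-up 4-dimensional vectors in place of real 1536-dimensional embeddings. `cosine_similarity` and `l2_normalize` are illustrative helpers, not Sentence Transformers functions:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def l2_normalize(v):
    """Scale v to unit length so that a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Toy 4-dimensional vectors standing in for real 1536-dimensional embeddings.
u = [1.0, 2.0, 0.0, -1.0]
v = [2.0, 4.0, 0.0, -2.0]   # same direction as u, twice the length
w = [0.0, 0.0, 3.0, 0.0]    # orthogonal to u

print(round(cosine_similarity(u, v), 4))  # 1.0 (length does not matter)
print(round(cosine_similarity(u, w), 4))  # 0.0 (orthogonal)

# After normalization, a plain dot product gives the same score.
nu, nv = l2_normalize(u), l2_normalize(v)
print(round(sum(x * y for x, y in zip(nu, nv)), 4))  # 1.0
```

Because the score ignores vector length, you can L2-normalize embeddings once and then compare them with plain dot products. This yields the same scores.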


Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'Qwen2Model'})
  (1): Pooling({'word_embedding_dimension': 1536, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
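The Pooling module reduces the per-token hidden states produced by Qwen2Model to a single 1536-dimensional vector. With `pooling_mode_cls_token` set to True, that vector is the hidden state at the first position. The `pooling_mode_lasttoken` option, which is False here, would instead take the last non-padding token. The pure-Python sketch below contrasts the two on toy 3-dimensional states. It assumes right padding, and the helper names are hypothetical; this is not the library's implementation:

```python
def cls_pooling(token_embeddings):
    """Return the first token's vector, as pooling_mode_cls_token=True does."""
    return list(token_embeddings[0])

def last_token_pooling(token_embeddings, attention_mask):
    """Return the vector of the last non-padding token (assumes right padding)."""
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return list(token_embeddings[last_index])

# Toy hidden states: 4 token positions x 3 dimensions; the last position is padding.
hidden = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [0.0, 0.0, 0.0],  # padding
]
mask = [1, 1, 1, 0]

print(cls_pooling(hidden))               # [0.1, 0.2, 0.3]
print(last_token_pooling(hidden, mask))  # [0.7, 0.8, 0.9]
```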

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("atx-labs/gte-qwen-2.5-1.5B-ecfr_cms")
# Run inference
sentences = [
    'General supervision means that the physician need not be physically present at the patient\'s place of residence when the service is performed; however, the service must be performed under his or her overall supervision and control The physician orders the service(s) to be performed, and contact is maintained between the nurse or other employee and the physician, e.g., the employee contacts the physician directly if additional instructions are needed, and the physician must retain professional responsibility for the service All other "incident to" requirements must be met (see §§60-60.4). 3 The services are included in the physician\'s/clinic\'s bill, and the physician or clinic has incurred an expense for them (see §60.2). 4 The services of the paramedical are required for the patient\'s care; that is, they are reasonable and necessary as defined in the Medicare Benefit Policy Manual, Chapter 16, "General Exclusions from Coverage," §20. 5 When the service can be furnished by an HHA in the local area, it cannot be covered when furnished by a physician/clinic to a homebound patient under this provision, except as described in §60.4.C.',
    'General supervision means that the physician need not be physically present at the patient\'s place of residence when the service is performed; however, the service must be performed under his or her overall supervision and control The physician orders the service(s) to be performed, and contact is maintained between the nurse or other employee and the physician, e.g., the employee contacts the physician directly if additional instructions are needed, and the physician must retain professional responsibility for the service All other "incident to" requirements must be met (see §§60-60.4). 3 The services are included in the physician\'s/clinic\'s bill, and the physician or clinic has incurred an expense for them (see §60.2). 4 The services of the paramedical are required for the patient\'s care; that is, they are reasonable and necessary as defined in the Medicare Benefit Policy Manual, Chapter 16, "General Exclusions from Coverage," §20. 5 When the service can be furnished by an HHA in the local area, it cannot be covered when furnished by a physician/clinic to a homebound patient under this provision, except as described in §60.4.C.',
    'Implementation: 11-01-24) Transmit the CMS-2591 to CO via PC or terminal Use instructions in the CROWD User Guide available via the CMS Enterprise Portal The report is due as soon as possible after the end of the reporting month but no later than the 15th of the month following the end of the reporting month.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1536)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  1.0000, -0.1049],
#         [ 1.0000,  1.0000, -0.1049],
#         [-0.1049, -0.1049,  1.0000]])
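The same cosine scores support semantic search: you encode a query and a corpus, then rank the corpus by similarity to the query. Below is a minimal pure-Python sketch with made-up 3-dimensional vectors standing in for `model.encode(...)` output. The helpers `cosine` and `top_k` are hypothetical. In practice, `sentence_transformers.util.semantic_search` performs this ranking on tensors:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_emb, corpus_embs, k=2):
    """Rank corpus entries by cosine similarity to the query; return (index, score) pairs."""
    scored = [(i, cosine(query_emb, emb)) for i, emb in enumerate(corpus_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy corpus vectors; with the real model these would come from model.encode(passages).
corpus = [
    [0.9, 0.1, 0.0],
    [0.0, 0.2, 0.9],
    [0.8, 0.3, 0.1],
]
query = [1.0, 0.2, 0.0]  # would come from model.encode(query_text)

for idx, score in top_k(query, corpus, k=2):
    print(idx, round(score, 3))
```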

Training Details

Training Dataset

Unnamed Dataset

  • Size: 60,906 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    column       type     min tokens   mean tokens   max tokens
    sentence_0   string   32           145.93        512
    sentence_1   string   32           145.93        512
  • Samples (in each sample shown, sentence_1 is identical to sentence_0):
    1. sentence_0: The preadmission screening in the patient's IRF medical record serves as the primary documentation by the IRF clinical staff of the patient's status prior to admission and of the specific reasons that led the IRF clinical staff to conclude that the IRF admission would be reasonable and necessary As such, IRFs must make this documentation detailed and comprehensive In accordance with 42 CFR § 412.622(a)(4)(i)(B) the preadmission screening documentation must indicate the patient's prior level of function (prior to the event or condition that led to the patient's need for intensive rehabilitation therapy), expected level of improvement, and the expected length of time necessary to achieve that level of improvement
       sentence_1: identical to sentence_0
    2. sentence_0: and (C) An attestation that the component organization will prominently post notification on its Web site and publish in any promotional materials for dissemination to providers, a summary of the information that is required by paragraph (c)(4)(i)(A) of this section. (ii) Comply with the following requirements during its period of listing: (A) The component organization may not share staff with its parent organization(s). (B) The component organization may enter into a written agreement pursuant to paragraph (c)(3) but such agreements are limited to units or individuals of the parent organization(s) whose responsibilities do not involve the activities specified in the restrictions in paragraph (a)(2)(ii) of this section
       sentence_1: identical to sentence_0
    3. sentence_0: Review of the person-centered active treatment plan. (d) The CMHC interdisciplinary treatment team must review, revise, and document the individualized active treatment plan as frequently as the client's condition requires, but no less frequently than every 30-calendar day A revised active treatment plan must include information from the client's initial evaluation and comprehensive assessments, the client's progress toward outcomes and goals specified in the active treatment plan, and changes in the client's goals The CMHC must also meet partial hospitalization program requirements specified under § 424.24(e) of this chapter or intensive outpatient service requirements as specified under § 424.24(d) of this chapter, as applicable, if such services are included in the active treatment plan
       sentence_1: identical to sentence_0
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
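MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair in a batch as a positive. It uses the other sentence_1 entries in the same batch as negatives, so with per_device_train_batch_size 16 each anchor is scored against 16 candidates. The pure-Python sketch below illustrates that in-batch softmax objective with the parameters listed (scale 20.0, cosine similarity) on toy 2-dimensional vectors. It is an illustration under those assumptions, not the library's implementation, and `mnrl_loss` is a hypothetical name:

```python
import math

def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch softmax loss: positive i is the target for anchor i, all other positives are negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cos_sim(a, p) for p in positives]
        # Cross-entropy with target i, using the log-sum-exp trick for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

anchors = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
positives = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
print(round(mnrl_loss(anchors, positives), 4))
```

Cosine similarity is at most 1. A positive that is identical to its anchor, as in the samples shown, therefore receives the largest possible logit, scale × 1 = 20. The loss then depends on how highly the in-batch negatives score.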
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
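With lr_scheduler_type linear, warmup_steps 0 and warmup_ratio 0.0, the learning rate should decay linearly from 5e-05 to zero over the run. The sketch below estimates that schedule under three assumptions: a single device, gradient_accumulation_steps 1, and the last partial batch kept (dataloader_drop_last False). These give ceil(60,906 / 16) = 3,807 steps per epoch and 11,421 steps in total. The formula mirrors the common linear-with-warmup schedule and is an assumption, not read from the training code:

```python
import math

LEARNING_RATE = 5e-05
NUM_SAMPLES = 60_906
BATCH_SIZE = 16          # per_device_train_batch_size, assuming a single device
NUM_EPOCHS = 3
WARMUP_STEPS = 0

steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)   # last partial batch is kept
total_steps = steps_per_epoch * NUM_EPOCHS

def linear_lr(step):
    """Linear warmup (none here) followed by linear decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)

print(steps_per_epoch, total_steps)   # 3807 11421
for step in (0, 5000, 11000):
    print(step, f"{linear_lr(step):.3e}")
```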

Training Logs

Epoch    Step     Training Loss
0.1313   500      0.173
0.2627   1000     0.1505
0.3940   1500     0.1613
0.5253   2000     0.1568
0.6567   2500     0.1677
0.7880   3000     0.1611
0.9194   3500     0.1571
1.0507   4000     0.1688
1.1820   4500     0.1682
1.3134   5000     0.1609
1.4447   5500     0.1621
1.5760   6000     0.1528
1.7074   6500     0.1576
1.8387   7000     0.1581
1.9701   7500     0.1591
2.1014   8000     0.1479
2.2327   8500     0.1623
2.3641   9000     0.1572
2.4954   9500     0.1577
2.6267   10000    0.158
2.7581   10500    0.16
2.8894   11000    0.1693
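The Epoch column appears to be the step count divided by the number of batches per epoch. Assume one device and a batch size of 16, with the last partial batch kept. That gives ceil(60,906 / 16) = 3,807 batches per epoch, and the quick check below reproduces the logged epoch values for a few rows:

```python
import math

NUM_SAMPLES = 60_906
BATCH_SIZE = 16
steps_per_epoch = math.ceil(NUM_SAMPLES / BATCH_SIZE)  # 3807 batches; the last one is partial

# (step, epoch) pairs copied from the training log
logged = [(500, 0.1313), (4000, 1.0507), (8000, 2.1014), (11000, 2.8894)]

for step, epoch in logged:
    computed = round(step / steps_per_epoch, 4)
    print(step, epoch, computed)
```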

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 5.2.0
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu129
  • Accelerate: 1.10.1
  • Datasets: 4.4.1
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
Model size: 2B params (Safetensors, tensor type F32)

Model tree for atx-labs/gte-qwen-2.5-1.5B-ecfr_cms: finetuned from Alibaba-NLP/gte-Qwen2-1.5B-instruct
