SentenceTransformer based on hkunlp/instructor-xl

This is a sentence-transformers model fine-tuned from hkunlp/instructor-xl. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: hkunlp/instructor-xl
  • Maximum Sequence Length: 128 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 128, 'do_lower_case': False, 'architecture': 'T5EncoderModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': False})
  (2): Dense({'in_features': 1024, 'out_features': 768, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
  (3): Normalize()
)
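
The module list above reads as a pipeline: token embeddings from the encoder are mean-pooled, projected by a bias-free linear layer (1024 to 768 in this model), and L2-normalized. The sketch below illustrates those last three stages in plain Python. It uses toy dimensions (4 to 3) and made-up weights, and the function names are illustrative rather than part of the Sentence Transformers API.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (mean-token pooling)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def dense_no_bias(vector, weight):
    """Linear projection without bias; weight has shape (out_features, in_features)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weight]

def l2_normalize(vector):
    """Scale the vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def embed(token_embeddings, attention_mask, weight):
    pooled = mean_pool(token_embeddings, attention_mask)
    return l2_normalize(dense_no_bias(pooled, weight))

# Toy input: three token vectors of size 4; the third is padding and is masked out.
tokens = [[1.0, 2.0, 3.0, 4.0], [3.0, 2.0, 1.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
# Made-up 3x4 projection matrix standing in for the 768x1024 Dense weights.
weight = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]

sentence_embedding = embed(tokens, mask, weight)
print(sentence_embedding)
```

Because the final stage normalizes to unit length, the cosine similarity listed in the model description reduces to a plain dot product between two embeddings.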

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ahmedHamdi/ir-es-en-instructor-xl")
# Run inference
sentences = [
    'Represent the plot: A group of taxi drivers, calling themselves "the family," dedicate themselves to cleaning up the city of Madrid of Black people ("shit"), homosexuals, transsexuals, and transvestites ("fish"), and drug addicts ("meat") during their nightly patrols. One of them, Velasco, has a daughter, Paz, who has failed her exams and decided to drop out of school. Angry, Velasco forces her to work with him in the taxi: she drives during the day, and he drives at night. Dani is the son of a family member: Reme. He hadn\'t seen Paz since they were children, and now they fall in love and start dating. In one scene of the film, the family, along with Dani and Paz, visit Reme\'s husband in the hospital: a retired taxi driver who was paralyzed after being shot in the spinal cord by a couple of drug addicts. He survived thanks to one of the family members, Calero, a taxi driver and former police officer, who happened to be passing by. That was the reason the group decided to wage their own personal battle, and Dani is almost convinced he has to do the same. Little by little, Paz learns about the escapades of her boyfriend, her father, and his friends, something she vehemently disapproves of. It is then that the ex-policeman sees her as a threat, and fearing she will report them, decides to kill her.',
    # Adjacent string literals are joined into one sentence; "\n" marks a paragraph break in the plot.
    (
        "Represent the plot: After failing one of her midterm exams in a Madrid high school, Paz Velasco decides to drop out of the academic circuit. She takes this prompt decision much to the chagrin of her father, an overbearing taxi driver. In exchange, he forces her to join him at work so she can learn the trade and earn a living. Her reconnection with the so-called Family, the close-knit group of her father's co-workers, is initially uneventful. One of its members is Dani, Paz's childhood sweetheart and a young taxi apprentice about to be discharged from the military service. Upon meeting for the first time in many years, they rekindle their mutual attraction and start dating, much to the apparent satisfaction of Dani's mother Reme—a fellow cabbie who is having a secret affair with Paz's father. Reme's husband is a former taxi driver who, according to the explanations given to Paz, was shot by two drug dealers during a robbery. One of the shots left him paralyzed and the Family's visits to him in the hospital are a staple in their common social life. Aside from Dani, Velasco and Reme, this group is completed by a shady individual named Calero and a thuggish brute nicknamed Niño who seems to act as an informal henchman of sorts for the former. Paz realizes Calero is a former cop even before being told so, and their mutual distrust is vivid and ever-increasing throughout the film. From the outset, it is made clear to the audience that the Family is in fact a neo-fascist death squad whose members use their cabs to abduct non-white, immigrant, LGTB and drug-addicted customers with murderous intent. There is also some evidence of rudimentary tactics, e.g. using code words for each kind of victim in order to lure unwitting victims to the killing box. Dani has been recently groomed into the group and partakes in their activities, although it is made apparent to the audience that he does so reluctantly. \n"
        "The Family seems to be generally successful in its exploits: two murders are shown on-screen, as well as a raid on a Moroccan camp during which Dani accidentally kills a man. The film does show, however, one botched attack during which the victim manages to run away unharmed. It is left unclear whether or not the group is restricted to the core members shown onscreen. It is also left unclear whether the group's activities were prompted by the attack that crippled Dani's father or it was just a casual event that fuelled them further, or whether the version given to Paz was a lie and the attack was a direct result of the group's illegal pursuits. Their activities are spontaneous and seemingly not officially sanctioned, but there are ominous references by Calero to people up there who know and approve of the group's activities, leaving the door open to speculation on a wider conspiracy. Although initially ignorant of the actions and general demeanour of the Family, Paz gradually finds out the truth through a series of details and coincidences, such as a profusion of fascist and pre-democratic antics and regalia, purportedly intimate family reunions attracting skinheads and other sinister individuals and ending in massive Roman salutes, casual bigoted remarks from her father and Dani (as well as the latter's violent outburst on a black street vendor) and a disturbing pattern of codenames or labels directed at different collectivities. Paz's unassuming and long-suffering mother seems increasingly distraught at the prospect of having Dani close to her daughter, and although Paz initially blames this on his general lack of education and ambition, it becomes increasingly clear that her mother knows more than she says about him and the whole group. Once confronted with the evidence, her disgust and adamant disapproval set up a chain of events leading to the climax. \n"
        "The paranoid and increasingly unstable Calero, who saw her as a potential liability to the group from the very beginning, now wishes to silence Paz at all costs. He shoots (and presumably kills) her father when he tries to stop him, and confronts her and Dani in the Retiro Park, shooting Dani, and being shot to death by Paz seconds before he is about to finish him off. The camera pans over the park as she walks away with the injured Dani, and credits roll. The fate of the rest of the group is left undisclosed."
    ),
    (
        "Represent the plot: The film outlines the daily life of a punk named Stevo in Salt Lake City, Utah, in the fall of 1985. Stevo's best friend, Heroin Bob, is also a punk. The nickname Heroin is ironic, as Bob is afraid of needles and actually believes that any drug (with the notable exception of alcohol and cigarettes) is inherently dangerous. Stevo and Bob go from party to party while living in a dilapidated apartment. They spend much of their time fighting with members of other subcultures, particularly rednecks. Stevo has a casual relationship with a girl named Sandy, while Bob is in love with Trish, the owner of a head shop. The two of them are shaped by their experiences with their parents. Stevo's parents, now divorced, are former hippies who are proud of their youthful endeavors; however, Stevo is revolted by what he perceives as their selling out by becoming affluent Reagan Republicans, which they try to justify. Stevo's grades are excellent, and when his father sends an application to Harvard Law School and Stevo is accepted, he nevertheless rejects it because of his beliefs. By contrast, Bob's father is a mentally ill alcoholic who mistakes his son and his friend for Central Intelligence Agency operatives and chases them away with a shotgun when they visit him on his birthday. Stevo begins to see the drawbacks of living the punk life. Sean, a fellow punk, was a drug dealer who once attempted to stab his mother while under the influence of an entire 100-dose sheet of acid; in the present, Stevo finds him panhandling on the street with some obvious mental issues. While Stevo understands that his relationship with Sandy is casual, he is still enraged when he discovers her having sex with another man and savagely beats him, later loathing himself because his action contradicts his own belief in anarchism. \n"
        "His social circle also begins to drift away, as his dealer, Mark, and his friend, Mike, both leave Salt Lake City (Mark to return to Miami, Mike to attend the University of Notre Dame). Soon after, Stevo attends a party and falls in love with a rich girl named Brandy, who points out that his clothing and hair are fashion as opposed to true rebellion. Rather than being offended, Stevo takes the criticism thoughtfully, and they passionately kiss. At the same party, Bob complains of a headache (induced by Spandau Ballet's She Loved Like Diamond playing on a stereo), and is given Percodan, which he consumes with alcohol after being told the pills are simply vitamins that will help his headache. The accidental drug overdose kills him in his sleep. When Stevo discovers Bob's body, he breaks down completely. At the funeral, he appears with a shaved head and changed clothing, having decided he is done with being a punk. He plans to go to Harvard, and earlier narration suggests that he eventually marries Brandy. He notes in his closing narration that his youthful self would probably kick his future self's ass, wryly describing himself as having been ultimately just another poser."
    ),
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6338, 0.2186],
#         [0.6338, 1.0000, 0.1956],
#         [0.2186, 0.1956, 1.0000]])
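
For semantic search, the similarity matrix is usually turned into a ranking. The sketch below does this in plain Python, using the scores printed above as a nested list. The helper name `rank_neighbors` is hypothetical; with the real tensor, `similarities.tolist()` should give the same nested-list shape.

```python
def rank_neighbors(similarity_matrix, query_index):
    """Return (index, score) pairs for every other item, most similar first."""
    row = similarity_matrix[query_index]
    others = [(j, score) for j, score in enumerate(row) if j != query_index]
    return sorted(others, key=lambda pair: pair[1], reverse=True)

# Scores copied from the printed tensor above.
similarity_scores = [
    [1.0000, 0.6338, 0.2186],
    [0.6338, 1.0000, 0.1956],
    [0.2186, 0.1956, 1.0000],
]

ranking = rank_neighbors(similarity_scores, 0)
print(ranking)
# [(1, 0.6338), (2, 0.2186)]
```

The first two plots describe the same film and score 0.6338 against each other, while the unrelated third plot stays near 0.2.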

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,336 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (type: string): min 9 tokens, mean 104.8 tokens, max 128 tokens
    • sentence_1 (type: string): min 22 tokens, mean 122.47 tokens, max 128 tokens
  • Samples:
    Sample 1
    • sentence_0: Represent the plot: A biographical film that recounts parts of the life of the legendary World War I flying ace Manfred von Richthofen, whose skill and exploits led him to become a legend during that conflict, feared by his adversaries and acclaimed by the Blue Barons.
    • sentence_1: Represent the plot: In 1916, Manfred von Richthofen is serving as a fighter pilot with the Imperial German Air Service along the Western Front. After dropping a wreath over the funeral of an Allied pilot, Richthofen and his fellow pilots Werner Voss and Friedrich Sternberg encounter a squadron of the Royal Flying Corps led by Captain Lanoe Hawker. Richthofen shoots down Canadian pilot Arthur Roy Brown. After pulling Brown out of the wreckage of his aircraft, Richthofen assists Nurse Käte Otersdorf with a tourniquet on Brown's leg. After killing Hawker, Richthofen is awarded the Pour le Mérite medal and promoted to command a squadron. He is joined by his brother Lothar von Richthofen (Volker Bruch). He orders his men to avoid killing enemy pilots unless absolutely necessary and is dismayed when Lothar deliberately strafes and kills a British pilot who has already been forced into a landing. Later, during an aerial dogfight, Richthofen again encounters Captain Brown, who has escaped from...
    Sample 2
    • sentence_0: Represent the plot: The film is set in early 1900s Paris. Amedeo Modigliani, a talented and passionate artist, tries to find his way to the top of the art world. Modigliani's main rival is Pablo Picasso, who also lived in Paris at the time. Besides Modigliani and Picasso, the film portrays the Parisian artists of the early 1900s; for example, Maurice Utrillo, Chaim Soutine, and Valhir Voorijaard, Modigliani's friends. Modigliani falls in love with Jeanne Hébuterne, who soon becomes pregnant. However, Hébuterne's parents don't think Modigliani would be a suitable father because he is Jewish. Modigliani tries to gain recognition for his paintings not only for himself but also for his child.
    • sentence_1: Represent the plot: Set in Paris in 1919, this biopic presents the life of Italian artist Amedeo Modigliani, centering, artistically, on his relationship to and rivalry with Pablo Picasso when they both lived in Paris. Modigliani, an Italian Jew from Livorno, has fallen in love with Jeanne Hébuterne, a young and beautiful French Catholic girl. The couple have a child, and Jeanne's bigoted father sends the baby to a faraway convent to be raised by nuns. Modigliani is distraught but needs money to rescue and raise his child. Paris' annual art competition is in the offing. Prize money and a guaranteed career await the winner. Neither Modigliani nor his rival Picasso have ever entered the competition, believing that it is beneath true artists like themselves. But push comes to shove with the welfare of his child on the line, and the impoverished Modigliani signs up for the competition in a drunken and drug-induced act at the center of a café frequented by artists, including Picasso, who is...
    Sample 3
    • sentence_0: Represent the plot: Izzie is a fish who lives with her father, Harold, in an aquarium, and is frequently bullied by other fish in the surrounding area. Harold tries to protect Izzie from being returned to the ocean by the human who maintains the aquarium, since that same event is what separated them from Izzie's mother. Izzie and Harold end up being returned to the sea and are separated during an underwater volcanic eruption. A boat supporting the aquarium capsizes, causing the other fish to spill into the water. Izzie befriends the other fish while she and her father search for each other.
    • sentence_1: Represent the plot: Izzie is a fish who lives with her father Harold in an aquarium, and is frequently bullied by the other fish in the vicinity. Harold tries to protect Izzie from being returned to the ocean by the human who maintains the aquarium, as that very event is what separated them from Izzie's mother. Izzie and Harold do end up being returned to the sea, and are separated during the eruption of an underwater volcano. A boat holding the aquarium tips over, causing the other fish to spill into the waters. Izzie befriends the other fish as she and her father search for one another.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
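
To make the loss parameters concrete, here is a plain-Python sketch of the idea behind a multiple-negatives ranking loss. For each sentence_0 in a batch, its paired sentence_1 is the positive and every other sentence_1 in the batch is a negative; the loss is the cross-entropy over cosine similarities multiplied by `scale`. This is an illustration under those assumptions, not the library's implementation, and the toy 2-dimensional vectors are invented.

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, candidate) for candidate in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy batch of two pairs (made-up embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive points the same way as its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # positives paired with the wrong anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
# True
```

A larger `scale` sharpens the softmax, so a fixed gap in cosine similarity between the positive and the in-batch negatives produces a larger difference in loss.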
    

Training Hyperparameters

Non-Default Hyperparameters

  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
1.7123 500 0.5724
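
The logged epoch value can be checked against the dataset size and batch size reported above. The short calculation below assumes training ran on a single device, which the card does not state explicitly.

```python
import math

train_samples = 2336               # from the Training Dataset section
per_device_train_batch_size = 8
gradient_accumulation_steps = 1
num_devices = 1                    # assumption: not stated in the card

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(train_samples / effective_batch)
epoch_at_step_500 = round(500 / steps_per_epoch, 4)

print(steps_per_epoch)
# 292
print(epoch_at_step_500)
# 1.7123
```

The result matches the logged epoch of 1.7123 at step 500, which is consistent with the single-device assumption.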

Framework Versions

  • Python: 3.9.21
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.6
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 1B params (Safetensors, tensor type F32)