SentenceTransformer based on sentence-transformers/all-distilroberta-v1

This is a sentence-transformers model finetuned from sentence-transformers/all-distilroberta-v1 on the ai-job-embedding-finetuning dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-distilroberta-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: ai-job-embedding-finetuning

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: sentence-transformers on GitHub (https://github.com/UKPLab/sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
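The Pooling and Normalize stages above can be sketched in a few lines of NumPy. The token embeddings below are random stand-ins for what the RobertaModel stage would actually produce; only the pooling and normalization arithmetic is meant to be faithful.

```python
import numpy as np

# Hypothetical token embeddings standing in for RobertaModel output
rng = np.random.default_rng(0)
token_embeddings = rng.standard_normal((7, 768))  # 7 tokens, 768 dims
attention_mask = np.ones(7)                       # no padding tokens

# (1) Pooling: mean over non-padding tokens (pooling_mode_mean_tokens=True)
sentence_embedding = (
    (token_embeddings * attention_mask[:, None]).sum(axis=0) / attention_mask.sum()
)

# (2) Normalize: rescale to unit L2 norm, so dot products are cosine similarities
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)

print(sentence_embedding.shape)                             # (768,)
print(round(float(np.linalg.norm(sentence_embedding)), 6))  # 1.0
```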

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("khengkok/distilroberta-ai-job-embeddings")
# Run inference
queries = [
    "Revenue Operations data analysis, sales forecasting models, sales territory and quota optimization",
]
documents = [
    "skills:\n\nExperience with “Lean Management” and/or “Six Sigma” concepts.Be able to analyze processes/workflows and find opportunities to streamline/improve/eliminate waste.Be able to create value stream maps Experience with Microsoft Viso.Office products (MS Word/MS Excel/Teams) MS Access\n\nMinimum required work experience:\n\nExcellent entry level opportunity!\n\nJob/class description:\n\nExtracts data from multiple systems and departments using various data manipulation and extraction techniques for regular, cyclical, and ad hoc reporting.Performs research, analyzes reports, and creates statistical models for presentation/review. Summarizes findings and communicates results to management.Identifies operational inadequacies and uses various skills and resources to retool processes.Communicates with other areas regarding outcomes and reporting.\n\nRequired knowledge, skills, and abilities:\n\nGood organizational, customer service, communications, and analytical skills.Ability to use complex mathematical calculations and understand mathematical and statistical concepts.Knowledge of relevant computer support systems.Microsoft Office.Ability to acquire programming skills across various software platforms.Good communication verbal/written, good organization, good analysis, customer service, cross team facilitation.\n\nPreferred knowledge, skills, and abilities:\n\nNegotiation or persuasion skills.Ability to acquire or knowledge of ICD9/CPT4 coding.SAS and/or DB2, or other relational database.\n\nWork environment:\n\nTypical office environment. Some travel between buildings and out of town.The team has 11 members, each are diverse individuals whom strive to exceed customer expectations. With in the greater team is a smaller team of 3 individuals whom compose the “plan” team.This person would be a part of this sub team.They work as a close-knit group and embrace a team atmosphere.They enjoy having fun while getting the work done\n\nRequired education/equivalencies:\n\nBachelor's degree Statistics, Computer Science, Mathematics, Business, Healthcare, or other related field.OR 2 year degree in Computer Science, Business or related field and 2 years of reporting and data analysis work experienceOR 4 years reporting and data analysis experience.\n\nInterested? Learn more:\n\nClick the apply button or contact our recruiter Kyle at Kyle.Croft@dppit.com to learn more about this position (#24-00288).\n\nDPP offers a range of compensation and benefits packages to our employees and their eligible dependents. Call today to learn more about working with DPP.\n\nUS Citizen: This role requires the ability to obtain a low-level US security clearance, which requires a thorough background search and US citizenship. Residency requirements may apply.",
    'requirements, collect data, lead cleansing efforts, and load/support data into SAPthe gap between business and IT teams, effectively communicating data models and setting clear expectations of deliverablesand maintain trackers to showcase progress and hurdles to Project Managers and Stakeholders\nQualifications\nknowledge of SAP and MDGcommunication skillsto manage multiple high-priority, fast-paced projects with attention to detail and organizationan excellent opportunity to learn an in-demand area of SAP MDGa strong willingness to learn, with unlimited potential for growth and plenty of opportunities to expand skills\nThis role offers a dynamic environment where you can directly impact IT projects and contribute to the company’s success. You will work alongside a supportive team of professionals, with ample opportunities for personal and professional development. \nIf you’re ready to take on new challenges and grow your career in data analytics and SAP, apply now and be part of our journey toward excellence.',
    "experience with a minimum of 0+ years of experience in a Computer Science or Data Management related fieldTrack record of implementing software engineering best practices for multiple use cases.Experience of automation of the entire machine learning model lifecycle.Experience with optimization of distributed training of machine learning models.Use of Kubernetes and implementation of machine learning tools in that context.Experience partnering and/or collaborating with teams that have different competences.The role holder will possess a blend of design skills needed for Agile data development projects.Proficiency or passion for learning, in data engineer techniques and testing methodologies and Postgraduate degree in data related field of study will also help. \n\n\nDesirable for the role\n\n\nExperience with DevOps or DataOps concepts, preferably hands-on experience implementing continuous integration or highly automated end-to-end environments.Interest in machine learning will also be advantageous.Experience implementing a microservices architecture.Demonstrate initiative, strong customer orientation, and cross-cultural working.Strong communication and interpersonal skills.Prior significant experience working in Pharmaceutical or Healthcare industry environment.Experience of applying policies, procedures, and guidelines.\n\n\nWhy AstraZeneca?\n\nWe follow all applicable laws and regulations on non-discrimination in employment (and recruitment), as well as work authorization and employment eligibility verification requirements. We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment.\n\nWhen we put unexpected teams in the same room, we unleash bold thinking with the power to inspire life-changing medicines. In-person working gives us the platform we need to connect, work at pace and challenge perceptions. That’s why we work, on average, a minimum of three days per week from the office. But that doesn't mean we’re not flexible. We balance the expectation of being in the office while respecting individual flexibility. Join us in our unique and ambitious world.\n\nCompetitive Salary & Benefits\n\nClose date: 10/05/2024\n\nSo, what’s next! \n\n\nAre you already imagining yourself joining our team? Good, because we can’t wait to hear from you. Don't delay, apply today!\n\n\nWhere can I find out more?\n\nOur Social Media, Follow AstraZeneca on LinkedIn: https://www.linkedin.com/company/1603/\n\nInclusion & Diversity: https://careers.astrazeneca.com/inclusion-diversity\n\nCareer Site: https://careers.astrazeneca.com/",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 768] [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2975, 0.3023, 0.2415]])
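Because the architecture ends with a Normalize() module, the embeddings are unit-length and cosine similarity reduces to a plain matrix product. A toy sketch with random stand-in embeddings (not real model outputs) illustrates the shapes involved:

```python
import numpy as np

# Stand-in embeddings with the model's shapes: 1 query, 3 documents, 768 dims
rng = np.random.default_rng(42)
query_embeddings = rng.standard_normal((1, 768))
document_embeddings = rng.standard_normal((3, 768))

# The Normalize() module makes every embedding unit-length...
query_embeddings /= np.linalg.norm(query_embeddings, axis=1, keepdims=True)
document_embeddings /= np.linalg.norm(document_embeddings, axis=1, keepdims=True)

# ...so cosine similarity is just a dot product of the two matrices,
# which is what model.similarity(...) computes for this model
similarities = query_embeddings @ document_embeddings.T
print(similarities.shape)  # (1, 3)
```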

Evaluation

Metrics

Triplet

Metric           ai-job-validation  ai-job-test
cosine_accuracy  0.4545             0.6000
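cosine_accuracy is the fraction of (anchor, positive, negative) triplets for which the anchor embedding has higher cosine similarity to the positive than to the negative. A minimal sketch of the metric; the helper function below is illustrative, not the evaluator's actual API:

```python
import numpy as np

def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where cos(anchor, positive) > cos(anchor, negative)."""
    def cos(a, b):
        a = a / np.linalg.norm(a, axis=1, keepdims=True)
        b = b / np.linalg.norm(b, axis=1, keepdims=True)
        return (a * b).sum(axis=1)
    return float(np.mean(cos(anchors, positives) > cos(anchors, negatives)))

# Toy check: positives point the same way as their anchors, negatives are orthogonal
anchors = np.eye(4, 16)                  # 4 anchors in 16 dims
positives = anchors * 2.0                # same direction, different magnitude
negatives = np.roll(anchors, 1, axis=0)  # orthogonal to their anchors
print(triplet_cosine_accuracy(anchors, positives, negatives))  # 1.0
```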

Training Details

Training Dataset

ai-job-embedding-finetuning

  • Dataset: ai-job-embedding-finetuning at f10e8c6
  • Size: 269 training samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 269 samples:
                query   job_description_pos   job_description_neg
    type        string  string                string
    min tokens  8       14                    10
    mean tokens 14.81   330.53                333.68
    max tokens  26      512                   512
  • Samples:
    • Sample 1:
      • query: Orlando data analyst SQL query optimization advanced dashboard development utilities industry insights
      • job_description_pos: skills in a global environment. Finally, you will interact with other members of our United States Health and Benefits team and can make important contributions to process improvements and new analytical tools.

    This position requires an analytical mind who is detail oriented with work product and outputs using Microsoft Office tools. The position also requires the ability to accurately execute written and verbal instructions.

    The Role

    Manage NQTL Operational Data Portion Of Parity Assessment, Including

    Prepare NQTL carrier operational data requests on behalf of each client/carrierCoordinate with Project Manager regarding sending requests, timing, status, and follow-upAttend internal and client kick off meeting with QTL/NQTL team Monitor carrier and vendor responsiveness to data requestsValidate completeness of response and report any issues or impact to timeline proactively to Project ManagerComplete initial review of carrier responses for parity projectsMap carrier responses to ap...
      • job_description_neg: skills:Proficiency in Python programming languageKnowledge of natural language processing (NLP), data science, and deep learning algorithms (RNN, CNN, etc.)Ability to implement machine learning algorithms and statistical analysisStrong presentation and teaching skills to articulate complex concepts to non-technical audiencesUnderstanding of data structures and algorithms in PythonExcellent research skills, utilizing papers, textbooks, online resources, and GitHub repositoriesPotential involvement in writing and publishing academic papers
    Qualifications2nd or 3rd-year undergraduate student in computer science or statisticsRequired experience: candidates must have completed at least three of the following courses: Statistics, Machine Learning, Deep Learning, AI, and Data Structures and Algorithms.GPA of 3.5 or higher.Ability to work independently and collaborativelyExcellent problem-solving and analytical skillsStrong written and verbal communication skills
    Relevant coursework projects o...
    • Sample 2:
      • query: Clarity PPM data analysis, project portfolio reporting, resource capacity planning
      • job_description_pos: requirements into an efficient process and/or system solution? If so, DHL Supply Chain has the opportunity for you.
    Job DescriptionTo apply knowledge and analytics to develop and communicate timely, accurate, and actionable insight to the business through the use of modeling, visualization, and optimization. Responsible for the reporting, analyzing, and predicting of operational processes, performance, and Key Performance Indicators. Communication with site leadership, operations, and finance on efficiency, customer requirements, account specific issues, and insight into to the business, operations, and customer.
    Applies hindsight, insight, and foresight techniques to communicate complex findings and recommendations to influence others to take actionUses knowledge of business and data structure to discover and/or anticipate problems where data can be used to solve the problemUses spreadsheets, databases, and relevant software to provide ongoing analysis of operational activitiesApplies...
      • job_description_neg: Qualifications)

    Bachelor's degree in a relevant field such as mathematics, statistics, or computer science Minimum of 5 years of experience as a data analyst or similar role Proficiency in SQL, Python, and data visualization tools Strong analytical and problem-solving skills Excellent written and verbal communication skills

    How To Stand Out (Preferred Qualifications)

    Master's degree in a relevant field Experience with machine learning and predictive modeling Knowledge of cloud-based data platforms such as AWS or Google Cloud Familiarity with Agile methodologies and project management tools Strong attention to detail and ability to work independently

    #RecruitingSoftware #DataAnalysis #RemoteWork #CareerOpportunity #CompetitivePay

    At Talentify, we prioritize candidate privacy and champion equal-opportunity employment. Central to our mission is our partnership with companies that share this commitment. We aim to foster a fair, transparent, and secure hiring environment for all. If ...
    • Sample 3:
      • query: AAA game AI engineer pathfinding vehicle navigation
      • job_description_pos: skills and knowledge in a supportive and empowering environment.
    Technology StackWe utilize the Google Cloud Platform, Python, SQL, BigQuery, and Looker Studio for data analysis and management.We ingest data from a variety of third-party tools, each providing unique insights.Our stack includes DBT and Fivetran for efficient data integration and transformation.
    Key ResponsibilitiesCollaborate with teams to understand data needs and deliver tailored solutions.Analyze large sets of structured and unstructured data to identify trends and insights.Develop and maintain databases and data systems for improved data quality and accessibility.Create clear and effective data visualizations for stakeholders.Stay updated with the latest trends in data analysis and technologies.
    Qualifications and Skills2-3 years of hands-on experience in data.You can distill complex data into easy to read and interpret dashboards to enable leadership / business teams to gather data insights and monitor KPIs.Solid u...
      • job_description_neg: Qualifications: Good communication verbal/written, Good organization, Good analysis, Customer service, cross team facilitation.Experience with “Lean Management” and/or “Six Sigma” concepts.Be able to analyze processes/workflows and find opportunities to streamline/improve/eliminate waste.Be able to create value stream maps.Experience with Microsoft Visio.Office products (MS Word/MS Excel/Teams) MS AccessBachelors degree Statistics, Computer Science, Mathematics, Business, Healthcare, or other related field. or 2 year degree in Computer Science, Business or related field and 2 years of reporting and data analysis work experience OR 4 years reporting and data analysis experience.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
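MultipleNegativesRankingLoss treats each query's paired positive as the correct label and every other positive in the batch as a negative, then applies cross-entropy over scaled similarities. A minimal NumPy sketch under those assumptions (scale=20.0 and cosine similarity, as configured above; not the library's implementation):

```python
import numpy as np

def mnr_loss(query_emb, pos_emb, scale=20.0):
    """In-batch-negatives cross-entropy over scaled cosine similarities."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pos_emb / np.linalg.norm(pos_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)  # (batch, batch): row i scores query i vs all positives
    # log-softmax per row; the diagonal entry is each query's true positive
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

# Perfectly separated pairs give a near-zero loss
queries = np.eye(4, 768)    # 4 mutually orthogonal unit "query" embeddings
positives = queries.copy()  # each positive identical to its query
print(mnr_loss(queries, positives) < 0.01)  # True
```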
    

Evaluation Dataset

ai-job-embedding-finetuning

  • Dataset: ai-job-embedding-finetuning at f10e8c6
  • Size: 33 evaluation samples
  • Columns: query, job_description_pos, and job_description_neg
  • Approximate statistics based on the first 33 samples:
                query   job_description_pos   job_description_neg
    type        string  string                string
    min tokens  10      21                    31
    mean tokens 15.36   329.55                321.73
    max tokens  33      512                   512
  • Samples:
    • Sample 1:
      • query: Power BI dashboards, SQL data transformation, Databricks analytics consultant
      • job_description_pos: experience desired Extensive experience with database and SQL tools including MS SQL, Tableau, Visual BASIC, and EXCEL Ability to work with counterparts in the organization with varying levels of technical expertise, including Marketing, Product, and IT personnel Ability to work independently and efficiently on a high volume of tasks Stay updated with emerging trends and best practices in data visualization and analytics to continuously improve reporting capabilities

    Why Work For Us

    4 weeks accrued paid time off + 9 paid national holidays per year Tuition Reimbursement Low cost and excellent coverage health insurance options (medical, dental, vision) Gym membership reimbursement Robust health and wellness program and fitness reimbursements Auto and home insurance discounts Matching gift opportunities Annual 401(k) Employer Contribution (up to 7.5% of your base salary) Various Paid Family leave options including Paid Parental Leave $3,000 one-time bonus payment on healt...
      • job_description_neg: Qualifications:Relevant educational qualification or degree in Data analytics or Data Science or Statistics or Applied Mathematics or equivalent qualification. (Required)Experience with Tableau.(Optional)Familiar with Python, Big Data. (Optional)Proficient in SQL.Candidates who are missing the required skills, might be provided an option to enhance their skills, so that they can also apply for the role and can make a career in the IT industry.Freshers can also apply
    • Sample 2:
      • query: product analyst SQL data migration Agile user stories
      • job_description_pos: requirements, developing reporting, and enabling efficiencies. You will also encourage analytics independence as a subject matter expert and champion of business intelligence software (e.g. Power BI, Tableau, etc.). The group also leads the Accounting Department’s Robotic Process Automation efforts.

    Kiewit is known as an organization that encourages high performers to challenge themselves by operating in roles they may not be classically trained for. This position embodies this spirit as the experiences will lend themselves nicely into several potential paths including accounting roles / leadership, operations management, data analysis roles and technology group positions.

    District Overview

    At Kiewit, the scale of our operations is huge. Our construction and engineering projects span across the United States, Canada and Mexico, improving and connecting communities with every initiative. We depend on our high-performing operations support professionals — they’re the glue that holds m...
      • job_description_neg: QualificationsBachelor's degree in Computer Science, Statistics, Mathematics, Economics, or related field. At least five years of experience as a Data Analyst in a digital media or ecommerce setting.Proficiency in SQL, Python, R, or other programming languages for data manipulation and analysis.Experience with Google Data Studio or other data visualization tools.Experience creating custom data pipelines, automated reports, and data visualizations.Expertise in web and mobile analytics platforms (e.g. Google Analytics, Adobe Analytics, AppsFlyer, Amplitude).Current understanding of internet consumer data privacy matters.Excellent communication and collaboration skills, with the ability to present findings and recommendations to both technical and non-technical stakeholders.Strong analytical skills and attention to detail, with the ability to translate complex data into actionable insights.

    Preferred QualificationsExperience with video delivery systems (encoding platforms, video players,...
    • Sample 3:
      • query: healthcare claims data analysis, complex SQL optimization, ETL process support
      • job_description_pos: requirements, and integrated management systems for our countries civilian agencies (FAA, FDIC, HOR, etc.).Our primary mission is to best serve the needs of our clients by solutioning with our stakeholder teams to ensure that the goals and objectives of our customers are proactively solutioned, such that opportunities to invest our time in developing long-term solutions and assets are abundant and move our clients forward efficiently.At DEVIS, we are enthusiastic about our research, our work and embracing an environment where all are supported in the mission, while maintaining a healthy work-life balance.
    We are currently seeking a Data Analyst to join one of our Department of State programs. The candidate would support the Bureau of Population, Refugees, and Migration (PRM) Refugee Processing Center (RPC) in Rosslyn, VA. The ideal candidate must be well-versed in ETL services and adept at gathering business requirements from diverse stakeholders, assessing the pros/cons of ETL tools, ...
      • job_description_neg: experience in data analysis, preferably in a data warehouse environment.Strong proficiency in SQL and experience with data modeling and mapping.Familiarity with star schema design and data warehousing concepts.Excellent analytical and problem-solving skills.Strong communication and interpersonal skills, with the ability to explain complex data concepts to non-technical stakeholders.Ability to manage multiple projects and meet deadlines in a fast-paced environment.Experience with data visualization tools (e.g., Tableau) is a plus. Required Soft Skills:Good analytical and problem-solving skillsExceptional communication skills (written and verbal)Good documentation skillsProficiency in English language (as a medium of communication)Frank and open communication with peers and higher-ups about realistic estimations and meeting timelines/expectations and proactive communication of issues and concerns thereof.Nice to have:Dimensional Modeling using Star SchemaKnowledge about ETL tools and how...
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 1
  • warmup_ratio: 0.1
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step ai-job-validation_cosine_accuracy ai-job-test_cosine_accuracy
-1 -1 0.4545 0.6000

Framework Versions

  • Python: 3.12.12
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.1
  • PyTorch: 2.9.0+cu126
  • Accelerate: 1.11.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}