language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:391
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: nomic-ai/modernbert-embed-base
widget:
  - source_sentence: What does 'personal data breach' entail?
    sentences:
      - >-
        1. Processing of personal data revealing racial or ethnic origin,
        political opinions, religious or philosophical beliefs, or trade union
        membership, and the processing of genetic data, biometric data for the
        purpose of uniquely identifying a natural person, data concerning health
        or data concerning a natural person's sex life or sexual orientation
        shall be prohibited.

        2. Paragraph 1 shall not apply if one of the following applies: (a)  the
        data subject has given explicit consent to the processing of those
        personal data for one or more specified purposes, except where Union or
        Member State law provide that the prohibition referred to in paragraph 1
        may not be lifted by the data subject; (b)  processing is necessary for
        the purposes of carrying out the obligations and exercising specific
        rights of the controller or of the data subject in the field of
        employment and social security and social protection law in so far as it
        is authorised by Union or Member State law or a collective agreement
        pursuant to Member State law providing for appropriate safeguards for
        the fundamental rights and the interests of the data subject; (c) 
        processing is necessary to protect the vital interests of the data
        subject or of another natural person where the data subject is
        physically or legally incapable of giving consent; (d)  processing is
        carried out in the course of its legitimate activities with appropriate
        safeguards by a foundation, association or any other not-for-profit body
        with a political, philosophical, religious or trade union aim and on
        condition that the processing relates solely to the members or to former
        members of the body or to persons who have regular contact with it in
        connection with its purposes and that the personal data are not
        disclosed outside that body without the consent of the data subjects;
        (e)  processing relates to personal data which are manifestly made
        public by the data subject; (f)  processing is necessary for the
        establishment, exercise or defence of legal claims or whenever courts
        are acting in their judicial capacity; (g)  processing is necessary for
        reasons of substantial public interest, on the basis of Union or Member
        State law which shall be proportionate to the aim pursued, respect the
        essence of the right to data protection and provide for suitable and
        specific measures to safeguard the fundamental rights and the interests
        of the data subject; (h)  processing is necessary for the purposes of
        preventive or occupational medicine, for the assessment of the working
        capacity of the employee, medical diagnosis, the provision of health or
        social care or treatment or the management of health or social care
        systems and services on the basis of Union or Member State law or
        pursuant to contract with a health professional and subject to the
        conditions and safeguards referred to in paragraph 3; (i)  processing is
        necessary for reasons of public interest in the area of public health,
        such as protecting against serious cross-border threats to health or
        ensuring high standards of quality and safety of health care and of
        medicinal products or medical devices, on the basis of Union or Member
        State law which provides for suitable and specific measures to safeguard
        the rights and freedoms of the data subject, in particular professional
        secrecy; (j)  processing is necessary for archiving
        purposes in the public interest, scientific or historical research
        purposes or statistical purposes in accordance with Article 89(1) based
        on Union or Member State law which shall be proportionate to the aim
        pursued, respect the essence of the right to data protection and provide
        for suitable and specific measures to safeguard the fundamental rights
        and the interests of the data subject.

        3. Personal data referred to in paragraph 1 may be processed for the
        purposes referred to in point (h) of paragraph 2 when those data are
        processed by or under the responsibility of a professional subject to
        the obligation of professional secrecy under Union or Member State law
        or rules established by national competent bodies or by another person
        also subject to an obligation of secrecy under Union or Member State law
        or rules established by national competent bodies.

        4. Member States may maintain or introduce further conditions, including
        limitations, with regard to the processing of genetic data, biometric
        data or data concerning health.
      - >-
        (1) 'personal data' means any information relating to an identified or
        identifiable natural person ('data subject'); an identifiable natural
        person is one who can be identified, directly or indirectly, in
        particular by reference to an identifier such as a name, an
        identification number, location data, an online identifier or to one or
        more factors specific to the physical, physiological, genetic, mental,
        economic, cultural or social identity of that natural person;

        (2) ‘processing’ means any operation or set of operations which is
        performed on personal data or on sets of personal data, whether or not
        by automated means, such as collection, recording, organisation,
        structuring, storage, adaptation or alteration, retrieval, consultation,
        use, disclosure by transmission, dissemination or otherwise making
        available, alignment or combination, restriction, erasure or
        destruction;

        (3) ‘restriction of processing’ means the marking of stored personal
        data with the aim of limiting their processing in the future;

        (4) ‘profiling’ means any form of automated processing of personal data
        consisting of the use of personal data to evaluate certain personal
        aspects relating to a natural person, in particular to analyse or
        predict aspects concerning that natural person's performance at work,
        economic situation, health, personal preferences, interests,
        reliability, behaviour, location or movements;

        (5) ‘pseudonymisation’ means the processing of personal data in such a
        manner that the personal data can no longer be attributed to a specific
        data subject without the use of additional information, provided that
        such additional information is kept separately and is subject to
        technical and organisational measures to ensure that the personal data
        are not attributed to an identified or identifiable natural person;

        (6) ‘filing system’ means any structured set of personal data which are
        accessible according to specific criteria, whether centralised,
        decentralised or dispersed on a functional or geographical basis;

        (7) ‘controller’ means the natural or legal person, public authority,
        agency or other body which, alone or jointly with others, determines the
        purposes and means of the processing of personal data; where the
        purposes and means of such processing are determined by Union or Member
        State law, the controller or the specific criteria for its nomination
        may be provided for by Union or Member State law;

        (8) ‘processor’ means a natural or legal person, public authority,
        agency or other body which processes personal data on behalf of the
        controller;

        (9) ‘recipient’ means a natural or legal person, public authority,
        agency or another body, to which the personal data are disclosed,
        whether a third party or not. However, public authorities which may
        receive personal data in the framework of a particular inquiry in
        accordance with Union or Member State law shall not be regarded as
        recipients; the processing of those data by those public authorities
        shall be in compliance with the applicable data protection rules
        according to the purposes of the processing;

        (10) ‘third party’ means a natural or legal person, public authority,
        agency or body other than the data subject, controller, processor and
        persons who, under the direct authority of the controller or processor,
        are authorised to process personal data;

        (11) ‘consent’ of the data subject means any freely given, specific,
        informed and unambiguous indication of the data subject's wishes by
        which he or she, by a statement or by a clear affirmative action,
        signifies agreement to the processing of personal data relating to him
        or her;

        (12) ‘personal data breach’ means a breach of security leading to the
        accidental or unlawful destruction, loss, alteration, unauthorised
        disclosure of, or access to, personal data transmitted, stored or
        otherwise processed;

        (13) ‘genetic data’ means personal data relating to the inherited or
        acquired genetic characteristics of a natural person which give unique
        information about the physiology or the health of that natural person
        and which result, in particular, from an analysis of a biological sample
        from the natural person in question;

        (14) ‘biometric data’ means personal data resulting from specific
        technical processing relating to the physical, physiological or
        behavioural characteristics of a natural person, which allow or confirm
        the unique identification of that natural person, such as facial images
        or dactyloscopic data;

        (15) ‘data concerning health’ means personal data related to the
        physical or mental health of a natural person, including the provision
        of health care services, which reveal information about his or her
        health status;

        (16) ‘main establishment’ means: (a) as regards a controller with
        establishments in more than one Member State, the place of its central
        administration in the Union, unless the decisions on the purposes and
        means of the processing of personal data are taken in another
        establishment of the controller in the Union and the latter
        establishment has the power to have such decisions implemented, in which
        case the establishment having taken such decisions is to be considered
        to be the main establishment; (b) as regards a processor with
        establishments in more than one Member State, the place of its central
        administration in the Union, or, if the processor has no central
        administration in the Union, the establishment of the processor in the
        Union where the main processing activities in the context of the
        activities of an establishment of the processor take place to the extent
        that the processor is subject to specific obligations under this
        Regulation;

        (17) ‘representative’ means a natural or legal person established in the
        Union who, designated by the controller or processor in writing pursuant
        to Article 27, represents the controller or processor with regard to
        their respective obligations under this Regulation;

        (18) ‘enterprise’ means a natural or legal person engaged in an economic
        activity, irrespective of its legal form, including partnerships or
        associations regularly engaged in an economic activity;

        (19) ‘group of undertakings’ means a controlling undertaking and its
        controlled undertakings;

        (20) ‘binding corporate rules’ means personal data protection policies
        which are adhered to by a controller or processor established on the
        territory of a Member State for transfers or a set of transfers of
        personal data to a controller or processor in one or more third
        countries within a group of undertakings, or group of enterprises
        engaged in a joint economic activity;

        (21) ‘supervisory authority’ means an independent public authority which
        is established by a Member State pursuant to Article 51;

        (22) ‘supervisory authority concerned’ means a supervisory authority
        which is concerned by the processing of personal data because: (a) the
        controller or processor is established on the territory of the Member
        State of that supervisory authority; (b) data subjects residing in the
        Member State of that supervisory authority are substantially affected or
        likely to be substantially affected by the processing; or (c) a
        complaint has been lodged with that supervisory authority;

        (23) ‘cross-border processing’ means either: (a) processing of personal
        data which takes place in the context of the activities of
        establishments in more than one Member State of a controller or
        processor in the Union where the controller or processor is established
        in more than one Member State; or (b) processing of personal data which
        takes place in the context of the activities of a single establishment
        of a controller or processor in the Union but which substantially
        affects or is likely to substantially affect data subjects in more than
        one Member State.

        (24) ‘relevant and reasoned objection’ means an objection to a draft
        decision as to whether there is an infringement of this Regulation, or
        whether envisaged action in relation to the controller or processor
        complies with this Regulation, which clearly demonstrates the
        significance of the risks posed by the draft decision as regards the
        fundamental rights and freedoms of data subjects and, where applicable,
        the free flow of personal data within the Union;

        (25) ‘information society service’ means a service as defined in point
        (b) of Article 1(1) of Directive (EU) 2015/1535 of the European
        Parliament and of the Council (1);

        (26) ‘international organisation’ means an organisation and its
        subordinate bodies governed by public international law, or any other
        body which is set up by, or on the basis of, an agreement between two or
        more countries.
      - >-
        Any processing of personal data should be lawful and fair. It should be
        transparent to natural persons that personal data concerning them are
        collected, used, consulted or otherwise processed and to what extent the
        personal data are or will be processed. The principle of transparency
        requires that any information and communication relating to the
        processing of those personal data be easily accessible and easy to
        understand, and that clear and plain language be used. That principle
        concerns, in particular, information to the data subjects on the
        identity of the controller and the purposes of the processing and
        further information to ensure fair and transparent processing in respect
        of the natural persons concerned and their right to obtain confirmation
        and communication of personal data concerning them which are being
        processed. Natural persons should be made aware of risks, rules,
        safeguards and rights in relation to the processing of personal data and
        how to exercise their rights in relation to such processing. In
        particular, the specific purposes for which personal data are processed
        should be explicit and legitimate and determined at the time of the
        collection of the personal data. The personal data should be adequate,
        relevant and limited to what is necessary for the purposes for which
        they are processed. This requires, in particular, ensuring that the
        period for which the personal data are stored is limited to a strict
        minimum. Personal data should be processed only if the purpose of the
        processing could not reasonably be fulfilled by other means. In order to
        ensure that the personal data are not kept longer than necessary, time
        limits should be established by the controller for erasure or for a
        periodic review. Every reasonable step should be taken to ensure that
        personal data which are inaccurate are rectified or deleted. Personal
        data should be processed in a manner that ensures appropriate security
        and confidentiality of the personal data, including for preventing
        unauthorised access to or use of personal data and the equipment used
        for the processing.
  - source_sentence: >-
      In what situations could providing information to the data subject be
      considered impossible or involve a disproportionate effort?
    sentences:
      - >-
        1. The controller shall consult the supervisory authority prior to
        processing where a data protection impact assessment under Article 35
        indicates that the processing would result in a high risk in the absence
        of measures taken by the controller to mitigate the risk.

        2. Where the supervisory authority is of the opinion that the intended
        processing referred to in paragraph 1 would infringe this Regulation, in
        particular where the controller has insufficiently identified or
        mitigated the risk, the supervisory authority shall, within a period of up
        to eight weeks of receipt of the request for consultation, provide
        written advice to the controller and, where applicable to the processor,
        and may use any of its powers referred to in Article 58. That period may
        be extended by six weeks, taking into account the complexity of the
        intended processing. The supervisory authority shall inform the
        controller and, where applicable, the processor, of any such extension
        within one month of receipt of the request for consultation together
        with the reasons for the delay. Those periods may be suspended until the
        supervisory authority has obtained information it has requested for the
        purposes of the consultation.

        3. When consulting the supervisory authority pursuant to paragraph 1, the
        controller shall provide the supervisory authority with: (a)  where
        applicable, the respective responsibilities of the controller, joint
        controllers and processors involved in the processing, in particular for
        processing within a group of undertakings; (b)  the purposes and means
        of the intended processing; (c)  the measures and safeguards provided to
        protect the rights and freedoms of data subjects pursuant to this
        Regulation; (d)  where applicable, the contact details of the data
        protection officer; (e)  the data protection impact
        assessment provided for in Article 35; and (f)  any other information
        requested by the supervisory authority.

        4. Member States shall consult the supervisory authority during the
        preparation of a proposal for a legislative measure to be adopted by a
        national parliament, or of a regulatory measure based on such a
        legislative measure, which relates to processing.

        5. Notwithstanding paragraph 1, Member State law may require controllers
        to consult with, and obtain prior authorisation from, the supervisory
        authority in relation to processing by a controller for the performance
        of a task carried out by the controller in the public interest,
        including processing in relation to social protection and public health.
      - >-
        1. The Member States, the supervisory authorities, the Board and the
        Commission shall encourage, in particular at Union level, the
        establishment of data protection certification mechanisms and of data
        protection seals and marks, for the purpose of demonstrating compliance
        with this Regulation of processing operations by controllers and
        processors. The specific needs of micro, small and medium-sized
        enterprises shall be taken into account.

        2. In addition to adherence by controllers or processors subject to this
        Regulation, data protection certification mechanisms, seals or marks
        approved pursuant to paragraph 5 of this Article may be established for
        the purpose of demonstrating the existence of appropriate safeguards
        provided by controllers or processors that are not subject to this
        Regulation pursuant to Article 3 within the framework of personal data
        transfers to third countries or international organisations under the
        terms referred to in point (f) of Article 46(2). Such controllers or
        processors shall make binding and enforceable commitments, via
        contractual or other legally binding instruments, to apply those
        appropriate safeguards, including with regard to the rights of data
        subjects.

        3. The certification shall be voluntary and available via a process that
        is transparent.

        4. A certification pursuant to this Article does not reduce the
        responsibility of the controller or the processor for compliance with
        this Regulation and is without prejudice to the tasks and powers of the
        supervisory authorities which are competent pursuant to Article 55 or 56.

        5. A certification pursuant to this Article shall be issued by the
        certification bodies referred to in Article 43 or by the competent
        supervisory authority, on the basis of criteria approved by that
        competent supervisory authority pursuant to Article 58(3) or by the
        Board pursuant to Article 63. Where the criteria are approved by the
        Board, this may result in a common certification, the European Data
        Protection Seal.

        6. The controller or processor which submits its processing to the
        certification mechanism shall provide the certification body referred to
        in Article 43, or where applicable, the competent supervisory authority,
        with all information and access to its processing activities which are
        necessary to conduct the certification procedure.

        7. Certification shall be issued to a controller or processor for a
        maximum period of three years and may be renewed, under the same
        conditions, provided that the relevant requirements continue to be met.
        Certification shall be withdrawn, as applicable, by the certification
        bodies referred to in Article 43 or by the competent supervisory
        authority where the requirements for the certification are not or are no
        longer met.

        8. The Board shall collate all certification mechanisms and data
        protection seals and marks in a register and shall make them publicly
        available by any appropriate means.
      - >-
        However, it is not necessary to impose the obligation to provide
        information where the data subject already possesses the information,
        where the recording or disclosure of the personal data is expressly laid
        down by law or where the provision of information to the data subject
        proves to be impossible or would involve a disproportionate effort. The
        latter could in particular be the case where processing is carried out
        for archiving purposes in the public interest, scientific or historical
        research purposes or statistical purposes. In that regard, the number of
        data subjects, the age of the data and any appropriate safeguards
        adopted should be taken into consideration.
  - source_sentence: >-
      What is the data subject provided with prior to further processing of
      personal data?
    sentences:
      - >-
        1. Where personal data relating to a data subject are collected from the
        data subject, the controller shall, at the time when personal data are
        obtained, provide the data subject with all of the following
        information: (a)  the identity and the contact details of the controller
        and, where applicable, of the controller's representative; (b)  the
        contact details of the data protection officer, where applicable; (c) 
        the purposes of the processing for which the personal data are intended
        as well as the legal basis for the processing; (d)
        where the processing is based on point (f) of Article 6(1), the
        legitimate interests pursued by the controller or by a third party; (e) 
        the recipients or categories of recipients of the personal data, if any;
        (f)  where applicable, the fact that the controller intends to transfer
        personal data to a third country or international organisation and the
        existence or absence of an adequacy decision by the Commission, or in
        the case of transfers referred to in Article 46 or 47, or the second
        subparagraph of Article 49(1), reference to the appropriate or suitable
        safeguards and the means by which to obtain a copy of them or where they
        have been made available.

        2. In addition to the information referred to in paragraph 1, the
        controller shall, at the time when personal data are obtained, provide
        the data subject with the following further information necessary to
        ensure fair and transparent processing: (a)  the period for which the
        personal data will be stored, or if that is not possible, the criteria
        used to determine that period; (b)  the existence of the right to
        request from the controller access to and rectification or erasure of
        personal data or restriction of processing concerning the data subject
        or to object to processing as well as the right to data portability;
        (c)  where the processing is based on point (a) of Article 6(1) or point
        (a) of Article 9(2), the existence of the right to withdraw consent at
        any time, without affecting the lawfulness of processing based on
        consent before its withdrawal; (d)  the right to lodge a complaint with
        a supervisory authority; (e)  whether the provision of personal data is
        a statutory or contractual requirement, or a requirement necessary to
        enter into a contract, as well as whether the data subject is obliged to
        provide the personal data and of the possible consequences of failure to
        provide such data; (f)  the existence of automated decision-making,
        including profiling, referred to in Article 22(1) and (4) and, at least
        in those cases, meaningful information about the logic involved, as well
        as the significance and the envisaged consequences of such processing
        for the data subject.

        3. Where the controller intends to further process the personal data for
        a purpose other than that for which the personal data were collected,
        the controller shall provide the data subject prior to that further
        processing with information on that other purpose and with any relevant
        further information as referred to in paragraph 2.

        4. Paragraphs 1, 2 and 3 shall not apply where and insofar as the data
        subject already has the information.
      - >-
        This Regulation respects and does not prejudice the status under
        existing constitutional law of churches and religious associations or
        communities in the Member States, as recognised in Article 17 TFEU.
      - >-
        (1) 'personal data' means any information relating to an identified or
        identifiable natural person ('data subject'); an identifiable natural
        person is one who can be identified, directly or indirectly, in
        particular by reference to an identifier such as a name, an
        identification number, location data, an online identifier or to one or
        more factors specific to the physical, physiological, genetic, mental,
        economic, cultural or social identity of that natural person;

        (2) ‘processing’ means any operation or set of operations which is
        performed on personal data or on sets of personal data, whether or not
        by automated means, such as collection, recording, organisation,
        structuring, storage, adaptation or alteration, retrieval, consultation,
        use, disclosure by transmission, dissemination or otherwise making
        available, alignment or combination, restriction, erasure or
        destruction;

        (3) ‘restriction of processing’ means the marking of stored personal
        data with the aim of limiting their processing in the future;

        (4) ‘profiling’ means any form of automated processing of personal data
        consisting of the use of personal data to evaluate certain personal
        aspects relating to a natural person, in particular to analyse or
        predict aspects concerning that natural person's performance at work,
        economic situation, health, personal preferences, interests,
        reliability, behaviour, location or movements;

        (5) ‘pseudonymisation’ means the processing of personal data in such a
        manner that the personal data can no longer be attributed to a specific
        data subject without the use of additional information, provided that
        such additional information is kept separately and is subject to
        technical and organisational measures to ensure that the personal data
        are not attributed to an identified or identifiable natural person;

        (6) ‘filing system’ means any structured set of personal data which are
        accessible according to specific criteria, whether centralised,
        decentralised or dispersed on a functional or geographical basis;

        (7) ‘controller’ means the natural or legal person, public authority,
        agency or other body which, alone or jointly with others, determines the
        purposes and means of the processing of personal data; where the
        purposes and means of such processing are determined by Union or Member
        State law, the controller or the specific criteria for its nomination
        may be provided for by Union or Member State law;

        (8) ‘processor’ means a natural or legal person, public authority,
        agency or other body which processes personal data on behalf of the
        controller;

        (9) ‘recipient’ means a natural or legal person, public authority,
        agency or another body, to which the personal data are disclosed,
        whether a third party or not. However, public authorities which may
        receive personal data in the framework of a particular inquiry in
        accordance with Union or Member State law shall not be regarded as
        recipients; the processing of those data by those public authorities
        shall be in compliance with the applicable data protection rules
        according to the purposes of the processing;

        (10) ‘third party’ means a natural or legal person, public authority,
        agency or body other than the data subject, controller, processor and
        persons who, under the direct authority of the controller or processor,
        are authorised to process personal data;

        (11) ‘consent’ of the data subject means any freely given, specific,
        informed and unambiguous indication of the data subject's wishes by
        which he or she, by a statement or by a clear affirmative action,
        signifies agreement to the processing of personal data relating to him
        or her;

        (12) ‘personal data breach’ means a breach of security leading to the
        accidental or unlawful destruction, loss, alteration, unauthorised
        disclosure of, or access to, personal data transmitted, stored or
        otherwise processed;

        (13) ‘genetic data’ means personal data relating to the inherited or
        acquired genetic characteristics of a natural person which give unique
        information about the physiology or the health of that natural person
        and which result, in particular, from an analysis of a biological sample
        from the natural person in question;

        (14) ‘biometric data’ means personal data resulting from specific
        technical processing relating to the physical, physiological or
        behavioural characteristics of a natural person, which allow or confirm
        the unique identification of that natural person, such as facial images
        or dactyloscopic data;

        (15) ‘data concerning health’ means personal data related to the
        physical or mental health of a natural person, including the provision
        of health care services, which reveal information about his or her
        health status;

        (16) ‘main establishment’ means: (a) as regards a controller with
        establishments in more than one Member State, the place of its central
        administration in the Union, unless the decisions on the purposes and
        means of the processing of personal data are taken in another
        establishment of the controller in the Union and the latter
        establishment has the power to have such decisions implemented, in which
        case the establishment having taken such decisions is to be considered
        to be the main establishment; (b) as regards a processor with
        establishments in more than one Member State, the place of its central
        administration in the Union, or, if the processor has no central
        administration in the Union, the establishment of the processor in the
        Union where the main processing activities in the context of the
        activities of an establishment of the processor take place to the extent
        that the processor is subject to specific obligations under this
        Regulation;

        (17) ‘representative’ means a natural or legal person established in the
        Union who, designated by the controller or processor in writing pursuant
        to Article 27, represents the controller or processor with regard to
        their respective obligations under this Regulation;

        (18) ‘enterprise’ means a natural or legal person engaged in an economic
        activity, irrespective of its legal form, including partnerships or
        associations regularly engaged in an economic activity;

        (19) ‘group of undertakings’ means a controlling undertaking and its
        controlled undertakings;

        (20) ‘binding corporate rules’ means personal data protection policies
        which are adhered to by a controller or processor established on the
        territory of a Member State for transfers or a set of transfers of
        personal data to a controller or processor in one or more third
        countries within a group of undertakings, or group of enterprises
        engaged in a joint economic activity;

        (21) ‘supervisory authority’ means an independent public authority which
        is established by a Member State pursuant to Article 51;

        (22) ‘supervisory authority concerned’ means a supervisory authority
        which is concerned by the processing of personal data because: (a) the
        controller or processor is established on the territory of the Member
        State of that supervisory authority; (b) data subjects residing in the
        Member State of that supervisory authority are substantially affected or
        likely to be substantially affected by the processing; or (c) a
        complaint has been lodged with that supervisory authority;

        (23) ‘cross-border processing’ means either: (a) processing of personal
        data which takes place in the context of the activities of
        establishments in more than one Member State of a controller or
        processor in the Union where the controller or processor is established
        in more than one Member State; or (b) processing of personal data which
        takes place in the context of the activities of a single establishment
        of a controller or processor in the Union but which substantially
        affects or is likely to substantially affect data subjects in more than
        one Member State.

        (24) ‘relevant and reasoned objection’ means an objection to a draft
        decision as to whether there is an infringement of this Regulation, or
        whether envisaged action in relation to the controller or processor
        complies with this Regulation, which clearly demonstrates the
        significance of the risks posed by the draft decision as regards the
        fundamental rights and freedoms of data subjects and, where applicable,
        the free flow of personal data within the Union;

        (25) ‘information society service’ means a service as defined in point
        (b) of Article 1(1) of Directive (EU) 2015/1535 of the European
        Parliament and of the Council (1);

        (26) ‘international organisation’ means an organisation and its
        subordinate bodies governed by public international law, or any other
        body which is set up by, or on the basis of, an agreement between two or
        more countries.
  - source_sentence: >-
      What type of data may be processed for purposes related to point (h) of
      paragraph 2?
    sentences:
      - >-
        1.Processing of personal data revealing racial or ethnic origin,
        political opinions, religious or philosophical beliefs, or trade union
        membership, and the processing of genetic data, biometric data for the
        purpose of uniquely identifying a natural person, data concerning health
        or data concerning a natural person's sex life or sexual orientation
        shall be prohibited.

        2.Paragraph 1 shall not apply if one of the following applies: (a)  the
        data subject has given explicit consent to the processing of those
        personal data for one or more specified purposes, except where Union or
        Member State law provide that the prohibition referred to in paragraph 1
        may not be lifted by the data subject; (b)  processing is necessary for
        the purposes of carrying out the obligations and exercising specific
        rights of the controller or of the data subject in the field of
        employment and social security and social protection law in so far as it
        is authorised by Union or Member State law or a collective agreement
        pursuant to Member State law providing for appropriate safeguards for
        the fundamental rights and the interests of the data subject; (c) 
        processing is necessary to protect the vital interests of the data
        subject or of another natural person where the data subject is
        physically or legally incapable of giving consent; (d)  processing is
        carried out in the course of its legitimate activities with appropriate
        safeguards by a foundation, association or any other not-for-profit body
        with a political, philosophical, religious or trade union aim and on
        condition that the processing relates solely to the members or to former
        members of the body or to persons who have regular contact with it in
        connection with its purposes and that the personal data are not
        disclosed outside that body without the consent of the data subjects;
        (e)  processing relates to personal data which are manifestly made
        public by the data subject; (f)  processing is necessary for the
        establishment, exercise or defence of legal claims or whenever courts
        are acting in their judicial capacity; (g)  processing is necessary for
        reasons of substantial public interest, on the basis of Union or Member
        State law which shall be proportionate to the aim pursued, respect the
        essence of the right to data protection and provide for suitable and
        specific measures to safeguard the fundamental rights and the interests
        of the data subject; (h)  processing is necessary for the purposes of
        preventive or occupational medicine, for the assessment of the working
        capacity of the employee, medical diagnosis, the provision of health or
        social care or treatment or the management of health or social care
        systems and services on the basis of Union or Member State law or
        pursuant to contract with a health professional and subject to the
        conditions and safeguards referred to in paragraph 3; (i)  processing is
        necessary for reasons of public interest in the area of public health,
        such as protecting against serious cross-border threats to health or
        ensuring high standards of quality and safety of health care and of
        medicinal products or medical devices, on the basis of Union or Member
        State law which provides for suitable and specific measures to safeguard
        the rights and freedoms of the data subject, in particular professional
        secrecy; 4.5.2016 L 119/38   (j)  processing is necessary for archiving
        purposes in the public interest, scientific or historical research
        purposes or statistical purposes in accordance with Article 89(1) based
        on Union or Member State law which shall be proportionate to the aim
        pursued, respect the essence of the right to data protection and provide
        for suitable and specific measures to safeguard the fundamental rights
        and the interests of the data subject.

        3.Personal data referred to in paragraph 1 may be processed for the
        purposes referred to in point (h) of paragraph 2 when those data are
        processed by or under the responsibility of a professional subject to
        the obligation of professional secrecy under Union or Member State law
        or rules established by national competent bodies or by another person
        also subject to an obligation of secrecy under Union or Member State law
        or rules established by national competent bodies.

        4.Member States may maintain or introduce further conditions, including
        limitations, with regard to the processing of genetic data, biometric
        data or data concerning health.
      - >-
        1.The data protection officer shall have at least the following tasks:
        (a)  to inform and advise the controller or the processor and the
        employees who carry out processing of their obligations pursuant to this
        Regulation and to other Union or Member State data protection
        provisions; (b)  to monitor compliance with this Regulation, with other
        Union or Member State data protection provisions and with the policies
        of the controller or processor in relation to the protection of personal
        data, including the assignment of responsibilities, awareness-raising
        and training of staff involved in processing operations, and the related
        audits; (c)  to provide advice where requested as regards the data
        protection impact assessment and monitor its performance pursuant to
        Article 35; (d)  to cooperate with the supervisory authority; (e)  to
        act as the contact point for the supervisory authority on issues
        relating to processing, including the prior consultation referred to in
        Article 36, and to consult, where appropriate, with regard to any other
        matter.

        2.The data protection officer shall in the performance of his or her
        tasks have due regard to the risk associated with processing operations,
        taking into account the nature, scope, context and purposes of
        processing. Section 5 Codes of conduct and certification
      - >-
        Processing should be lawful where it is necessary in the context of a
        contract or the intention to enter into a contract.
  - source_sentence: >-
      What may impede authorities in the discharge of their responsibilities
      under Union law?
    sentences:
      - >-
        1.The controller and the processor shall designate a data protection
        officer in any case where: (a)  the processing is carried out by a
        public authority or body, except for courts acting in their judicial
        capacity; (b)  the core activities of the controller or the processor
        consist of processing operations which, by virtue of their nature, their
        scope and/or their purposes, require regular and systematic monitoring
        of data subjects on a large scale; or (c)  the core activities of the
        controller or the processor consist of processing on a large scale of
        special categories of data pursuant to Article 9 and personal data
        relating to criminal convictions and offences referred to in Article 10

        2.A group of undertakings may appoint a single data protection officer
        provided that a data protection officer is easily accessible from each
        establishment.

        3.Where the controller or the processor is a public authority or body, a
        single data protection officer may be designated for several such
        authorities or bodies, taking account of their organisational structure
        and size.

        4.In cases other than those referred to in paragraph 1, the controller
        or processor or associations and other bodies representing categories of
        controllers or processors may or, where required by Union or Member
        State law shall, designate a data protection officer. The data
        protection officer may act for such associations and other bodies
        representing controllers or processors.

        5.The data protection officer shall be designated on the basis of
        professional qualities and, in particular, expert knowledge of data
        protection law and practices and the ability to fulfil the tasks
        referred to in Article 39

        6.The data protection officer may be a staff member of the controller or
        processor, or fulfil the tasks on the basis of a service contract.

        7.The controller or the processor shall publish the contact details of
        the data protection officer and communicate them to the supervisory
        authority.
      - >-
        This Regulation is without prejudice to international agreements
        concluded between the Union and third countries regulating the transfer
        of personal data including appropriate safeguards for the data subjects.
        Member States may conclude international agreements which involve the
        transfer of personal data to third countries or international
        organisations, as far as such agreements do not affect this Regulation
        or any other provisions of Union law and include an appropriate level of
        protection for the fundamental rights of the data subjects.
      - >-
        The objectives and principles of Directive 95/46/EC remain sound, but it
        has not prevented fragmentation in the implementation of data protection
        across the Union, legal uncertainty or a widespread public perception
        that there are significant risks to the protection of natural persons,
        in particular with regard to online activity. Differences in the level
        of protection of the rights and freedoms of natural persons, in
        particular the right to the protection of personal data, with regard to
        the processing of personal data in the Member States may prevent the
        free flow of personal data throughout the Union. Those differences may
        therefore constitute an obstacle to the pursuit of economic activities
        at the level of the Union, distort competition and impede authorities in
        the discharge of their responsibilities under Union law. Such a
        difference in levels of protection is due to the existence of
        differences in the implementation and application of Directive 95/46/EC.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: modernbert-embed-base
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.4020486555697823
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.4052496798975672
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.42893725992317544
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.46094750320102434
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.4020486555697823
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.401195049082373
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.39129321382842513
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3589628681177977
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.04175313555284748
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.12278476862052412
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.18536806181354978
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.2777345271673647
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.4220301651533148
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.4118049204316806
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4807606113925945
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.39436619718309857
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.39884763124199746
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.4180537772087068
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.4532650448143406
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.39436619718309857
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.39436619718309857
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.38412291933418696
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.35262483994878363
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.04042853140523698
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.1196927035383132
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.18113736600353658
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.2725004471030787
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.41401334483433183
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.4040899437026195
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.47229660803723117
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.3860435339308579
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.39244558258642764
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.4167733674775928
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.44814340588988477
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.3860435339308579
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3866837387964149
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.37836107554417414
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3476952624839948
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.03962321211820953
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.11744680009464445
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.17843204724958808
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.26814407122008105
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.40839722669645323
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.3969033900371925
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4642655010516818
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.353393085787452
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.35979513444302175
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.3847631241997439
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.4142125480153649
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.353393085787452
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.35381988903115663
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.34609475032010245
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.31952624839948784
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.036464605000445925
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.10761057769873429
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.1630330675703894
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.2478046074721795
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.37609461891444046
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.36410660935308786
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4332448707193373
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.3079385403329065
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.31562099871959026
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.3348271446862996
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.36939820742637647
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.3079385403329065
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3092189500640205
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.3026888604353393
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.28040973111395645
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.03154669390524976
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.09368644321809669
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.14256307415283676
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.21725520447019792
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.3297439353321731
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.3185400280470698
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.3853517715303568
            name: Cosine Map@100

modernbert-embed-base

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
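
The architecture printout above lists three stages: a Transformer, a Pooling layer with pooling_mode_mean_tokens set to True, and a Normalize layer. The following is a minimal NumPy sketch of what the last two stages do conceptually, assuming the transformer has already produced per-token vectors. The function name and the toy inputs are hypothetical and are not part of the library.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average token vectors over non-padding positions, then L2-normalize.

    token_embeddings: array of shape (batch, seq_len, hidden)
    attention_mask:   array of shape (batch, seq_len), 1 = real token, 0 = padding
    """
    token_embeddings = np.asarray(token_embeddings, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]  # (batch, seq_len, 1)

    # Mean pooling: sum only the real tokens and divide by how many there are.
    summed = (token_embeddings * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)
    pooled = summed / counts

    # Normalize: scale every sentence vector to unit length.
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: one sentence, three token slots, the last slot is padding.
tokens = np.array([[[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]])
mask = np.array([[1, 1, 0]])
sentence_vec = mean_pool_and_normalize(tokens, mask)
print(sentence_vec)
```

Because the output vectors have unit length, the cosine similarity between two of them reduces to a plain dot product.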

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (the ID below is a placeholder; replace it with this model's Hub ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'What may impede authorities in the discharge of their responsibilities under Union law?',
    'The objectives and principles of Directive 95/46/EC remain sound, but it has not prevented fragmentation in the implementation of data protection across the Union, legal uncertainty or a widespread public perception that there are significant risks to the protection of natural persons, in particular with regard to online activity. Differences in the level of protection of the rights and freedoms of natural persons, in particular the right to the protection of personal data, with regard to the processing of personal data in the Member States may prevent the free flow of personal data throughout the Union. Those differences may therefore constitute an obstacle to the pursuit of economic activities at the level of the Union, distort competition and impede authorities in the discharge of their responsibilities under Union law. Such a difference in levels of protection is due to the existence of differences in the implementation and application of Directive 95/46/EC.',
    'This Regulation is without prejudice to international agreements concluded between the Union and third countries regulating the transfer of personal data including appropriate safeguards for the data subjects. Member States may conclude international agreements which involve the transfer of personal data to third countries or international organisations, as far as such agreements do not affect this Regulation or any other provisions of Union law and include an appropriate level of protection for the fundamental rights of the data subjects.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.5042, 0.0865],
#         [0.5042, 1.0000, 0.2632],
#         [0.0865, 0.2632, 1.0000]])
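
The metadata of this card lists MatryoshkaLoss and reports results at 768, 512, 256, 128 and 64 dimensions, which suggests that the leading dimensions of an embedding can be used on their own. The sketch below shows one common way to do that by hand: keep the first `dim` values and re-normalize. It uses random stand-in vectors instead of real model output, and the helper names are hypothetical. Recent versions of sentence-transformers also accept a `truncate_dim` argument when loading a model, which may be more convenient if your installed version supports it.

```python
import numpy as np

def truncate_and_renormalize(embeddings, dim):
    """Keep the first `dim` values of each embedding and rescale to unit length."""
    embeddings = np.asarray(embeddings, dtype=np.float64)
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

def cosine_similarity_matrix(a, b):
    """Cosine similarity for rows that are already unit-length (a dot product)."""
    return a @ b.T

# Stand-in for model.encode(sentences): three random unit vectors of size 768.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_and_renormalize(full, 256)
sims = cosine_similarity_matrix(small, small)
print(small.shape)
print(sims)
```

Smaller vectors reduce storage and search cost; the evaluation figures in this card indicate how much retrieval quality is given up at each size.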

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.402
cosine_accuracy@3 0.4052
cosine_accuracy@5 0.4289
cosine_accuracy@10 0.4609
cosine_precision@1 0.402
cosine_precision@3 0.4012
cosine_precision@5 0.3913
cosine_precision@10 0.359
cosine_recall@1 0.0418
cosine_recall@3 0.1228
cosine_recall@5 0.1854
cosine_recall@10 0.2777
cosine_ndcg@10 0.422
cosine_mrr@10 0.4118
cosine_map@100 0.4808

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.3944
cosine_accuracy@3 0.3988
cosine_accuracy@5 0.4181
cosine_accuracy@10 0.4533
cosine_precision@1 0.3944
cosine_precision@3 0.3944
cosine_precision@5 0.3841
cosine_precision@10 0.3526
cosine_recall@1 0.0404
cosine_recall@3 0.1197
cosine_recall@5 0.1811
cosine_recall@10 0.2725
cosine_ndcg@10 0.414
cosine_mrr@10 0.4041
cosine_map@100 0.4723

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.386
cosine_accuracy@3 0.3924
cosine_accuracy@5 0.4168
cosine_accuracy@10 0.4481
cosine_precision@1 0.386
cosine_precision@3 0.3867
cosine_precision@5 0.3784
cosine_precision@10 0.3477
cosine_recall@1 0.0396
cosine_recall@3 0.1174
cosine_recall@5 0.1784
cosine_recall@10 0.2681
cosine_ndcg@10 0.4084
cosine_mrr@10 0.3969
cosine_map@100 0.4643

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.3534
cosine_accuracy@3 0.3598
cosine_accuracy@5 0.3848
cosine_accuracy@10 0.4142
cosine_precision@1 0.3534
cosine_precision@3 0.3538
cosine_precision@5 0.3461
cosine_precision@10 0.3195
cosine_recall@1 0.0365
cosine_recall@3 0.1076
cosine_recall@5 0.163
cosine_recall@10 0.2478
cosine_ndcg@10 0.3761
cosine_mrr@10 0.3641
cosine_map@100 0.4332

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.3079
cosine_accuracy@3 0.3156
cosine_accuracy@5 0.3348
cosine_accuracy@10 0.3694
cosine_precision@1 0.3079
cosine_precision@3 0.3092
cosine_precision@5 0.3027
cosine_precision@10 0.2804
cosine_recall@1 0.0315
cosine_recall@3 0.0937
cosine_recall@5 0.1426
cosine_recall@10 0.2173
cosine_ndcg@10 0.3297
cosine_mrr@10 0.3185
cosine_map@100 0.3854
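
For readers unfamiliar with the metric names in the tables above, here is a small sketch of the usual definitions of accuracy@k, precision@k, recall@k and MRR@k for a single query. It is an illustration under those standard definitions, not a copy of the evaluator's code, and the function name and document IDs are made up for the example. The reported figures are averages of such per-query values.

```python
def retrieval_metrics_at_k(ranked_ids, relevant_ids, k):
    """Compute common top-k retrieval metrics for one query.

    ranked_ids:   document IDs ordered by decreasing similarity to the query
    relevant_ids: set of document IDs that count as correct for the query
    """
    top_k = ranked_ids[:k]
    hits = [doc_id in relevant_ids for doc_id in top_k]
    num_hits = sum(hits)

    # accuracy@k: 1 if at least one relevant document is in the top k.
    accuracy = 1.0 if num_hits > 0 else 0.0
    # precision@k: fraction of the top k that is relevant.
    precision = num_hits / k
    # recall@k: fraction of all relevant documents found in the top k.
    recall = num_hits / len(relevant_ids) if relevant_ids else 0.0
    # MRR@k: reciprocal rank of the first relevant document, 0 if none.
    mrr = 0.0
    for rank, is_hit in enumerate(hits, start=1):
        if is_hit:
            mrr = 1.0 / rank
            break

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mrr": mrr}

# Hypothetical ranking for one query with four relevant documents.
ranked = ["d7", "d2", "d9", "d4", "d1"]
relevant = {"d2", "d4", "d5", "d8"}
metrics = retrieval_metrics_at_k(ranked, relevant, k=3)
print(metrics)
```

Recall@k is capped at k divided by the number of relevant documents, so a query with many relevant passages can show low recall at small k even when precision is comparatively high. That may explain part of the gap between the precision and recall rows above, though the card does not state how many relevant passages each query has.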

Training Details

Training Dataset

Unnamed Dataset

  • Size: 391 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 391 samples:
    • anchor (type: string): min 7 tokens, mean 15.05 tokens, max 30 tokens
    • positive (type: string): min 25 tokens, mean 667.99 tokens, max 2429 tokens
  • Samples:
    anchor positive
    Anchor: On what date did the act occur?
    Positive: Court (Civil/Criminal): Civil
    Provisions: Directive 2015/366, Law 4537/2018
    Time of the act: 31.08.2022
    Outcome (not guilty, guilty): Partially accepts the claim.
    Reasoning: The Athens Peace Court ordered the bank to return the amount that was withdrawn from the plaintiffs' account and to pay additional compensation for the moral damage they suffered.
    Facts: The case concerns plaintiffs who fell victim to electronic fraud via phishing, resulting in the withdrawal of money from their bank account. The plaintiffs claimed that the bank did not take the necessary security measures to protect their accounts and sought compensation for the financial loss and moral damage they suffered. The court determined that the bank is responsible for the loss of the money, as it did not prove that the transactions were authorized by the plaintiffs. Furthermore, the court recognized that the bank's refusal to return the funds constitutes an infringement of the plaintiffs' personal rights, as it...
    Anchor: For what purposes can more specific rules be provided regarding the employment context?
    Positive: 1.Member States may, by law or by collective agreements, provide for more specific rules to ensure the protection of the rights and freedoms in respect of the processing of employees' personal data in the employment context, in particular for the purposes of the recruitment, the performance of the contract of employment, including discharge of obligations laid down by law or by collective agreements, management, planning and organisation of work, equality and diversity in the workplace, health and safety at work, protection of employer's or customer's property and for the purposes of the exercise and enjoyment, on an individual or collective basis, of rights and benefits related to employment, and for the purpose of the termination of the employment relationship.
    2.Those rules shall include suitable and specific measures to safeguard the data subject's human dignity, legitimate interests and fundamental rights, with particular regard to the transparency of processing, the transfer of p...
    Anchor: On which date were transactions detailed in the provided text conducted?
    Positive: Court (Civil/Criminal): Civil

    Provisions:

    Time of commission of the act:

    Outcome (not guilty, guilty):

    Rationale:

    Facts:
    The plaintiff holds credit card number ............ with the defendant banking corporation. Based on the application for alternative networks dated 19/7/2015 with number ......... submitted at a branch of the defendant, he was granted access to the electronic banking service (e-banking) to conduct banking transactions (debit, credit, updates, payments) remotely. On 30/11/2020, the plaintiff fell victim to electronic fraud through the "phishing" method, whereby an unknown perpetrator managed to withdraw a total amount of €3,121.75 from the aforementioned credit card. Specifically, the plaintiff received an email at 1:35 PM on 29/11/2020 from sender ...... with address ........, informing him that due to an impending system change, he needed to verify the mobile phone number linked to the credit card, urging him to complete the verification...
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
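
To make the parameters above concrete, here is a minimal pure-Python sketch of the idea behind wrapping an in-batch ranking loss in a Matryoshka objective. Embeddings are truncated to each listed dimension, the ranking loss is computed on the truncated vectors, and the weighted results are summed. This is an illustration under stated assumptions, not the sentence-transformers implementation: the function names, the toy 4-dimensional vectors and the similarity scale of 20 are all hypothetical choices. An n_dims_per_step of -1 means every listed dimension is used at every step, which is what the loop over all dims mirrors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]
    and every other positive in the batch acts as a negative.
    The scale of 20.0 is an assumed value for illustration."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)


def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the ranking loss over embeddings truncated to each dim."""
    return sum(
        w * in_batch_ranking_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )


# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

Because every truncation contributes to the loss, the leading dimensions are pushed to stay useful on their own, which is why the model can be evaluated at 768, 512, 256, 128 and 64 dimensions.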
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 20
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
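
As a rough check on how these settings interact, the sketch below derives the effective batch size and the number of optimizer steps per epoch from the values listed above. It assumes a single training device, which the card does not state, and the variable names are illustrative. The step arithmetic is a simplification of what the trainer does internally.

```python
import math

# Values taken from the hyperparameter list and the dataset size above.
train_samples = 391
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
num_train_epochs = 20
warmup_ratio = 0.1
num_devices = 1  # assumption: single-device training

# Samples contributing to each optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Batches per epoch (the last partial batch is kept), then optimizer steps per epoch.
batches_per_epoch = math.ceil(train_samples / (per_device_train_batch_size * num_devices))
steps_per_epoch = max(batches_per_epoch // gradient_accumulation_steps, 1)

# Totals if the run covers all configured epochs.
total_steps = steps_per_epoch * num_train_epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch_size, steps_per_epoch, total_steps, warmup_steps)
```

The 98 steps per epoch agree with the training logs, where step 98 is reported at epoch 1.0. Note that gradient accumulation enlarges the update batch but not the pool of in-batch negatives, so with a per-device batch size of 2 each anchor is likely contrasted against a single in-batch negative.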

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 2
  • per_device_eval_batch_size: 2
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.0102 1 0.0001 - - - - -
0.0204 2 0.001 - - - - -
0.0306 3 0.0938 - - - - -
0.0408 4 0.0084 - - - - -
0.0510 5 0.0 - - - - -
0.0612 6 0.0004 - - - - -
0.0714 7 0.003 - - - - -
0.0816 8 0.0012 - - - - -
0.0918 9 0.0001 - - - - -
0.1020 10 0.0053 - - - - -
0.1122 11 0.0068 - - - - -
0.1224 12 0.0006 - - - - -
0.1327 13 0.0007 - - - - -
0.1429 14 0.0003 - - - - -
0.1531 15 0.0096 - - - - -
0.1633 16 0.0004 - - - - -
0.1735 17 0.016 - - - - -
0.1837 18 0.0 - - - - -
0.1939 19 0.0005 - - - - -
0.2041 20 0.0 - - - - -
0.2143 21 0.003 - - - - -
0.2245 22 0.1395 - - - - -
0.2347 23 0.3967 - - - - -
0.2449 24 0.0023 - - - - -
0.2551 25 0.0003 - - - - -
0.2653 26 0.0027 - - - - -
0.2755 27 0.0147 - - - - -
0.2857 28 0.0522 - - - - -
0.2959 29 0.0001 - - - - -
0.3061 30 0.0008 - - - - -
0.3163 31 0.0044 - - - - -
0.3265 32 0.0 - - - - -
0.3367 33 0.0028 - - - - -
0.3469 34 0.0007 - - - - -
0.3571 35 0.0002 - - - - -
0.3673 36 0.0168 - - - - -
0.3776 37 0.0023 - - - - -
0.3878 38 0.0041 - - - - -
0.3980 39 0.0081 - - - - -
0.4082 40 0.0004 - - - - -
0.4184 41 0.0 - - - - -
0.4286 42 0.005 - - - - -
0.4388 43 0.0031 - - - - -
0.4490 44 0.0216 - - - - -
0.4592 45 0.0004 - - - - -
0.4694 46 0.0018 - - - - -
0.4796 47 0.0 - - - - -
0.4898 48 0.0044 - - - - -
0.5 49 0.0004 - - - - -
0.5102 50 0.0019 - - - - -
0.5204 51 0.0005 - - - - -
0.5306 52 0.0016 - - - - -
0.5408 53 0.1806 - - - - -
0.5510 54 0.0 - - - - -
0.5612 55 0.0025 - - - - -
0.5714 56 0.0002 - - - - -
0.5816 57 0.0 - - - - -
0.5918 58 0.0111 - - - - -
0.6020 59 0.0011 - - - - -
0.6122 60 0.0003 - - - - -
0.6224 61 1.8072 - - - - -
0.6327 62 0.0009 - - - - -
0.6429 63 0.0011 - - - - -
0.6531 64 0.0013 - - - - -
0.6633 65 0.0 - - - - -
0.6735 66 0.0007 - - - - -
0.6837 67 0.4116 - - - - -
0.6939 68 0.008 - - - - -
0.7041 69 0.0009 - - - - -
0.7143 70 0.0004 - - - - -
0.7245 71 0.0019 - - - - -
0.7347 72 0.0005 - - - - -
0.7449 73 0.0004 - - - - -
0.7551 74 0.0005 - - - - -
0.7653 75 0.0001 - - - - -
0.7755 76 0.0005 - - - - -
0.7857 77 0.0 - - - - -
0.7959 78 0.0001 - - - - -
0.8061 79 0.0025 - - - - -
0.8163 80 0.0 - - - - -
0.8265 81 0.0012 - - - - -
0.8367 82 0.0003 - - - - -
0.8469 83 0.0002 - - - - -
0.8571 84 0.0 - - - - -
0.8673 85 0.0 - - - - -
0.8776 86 0.0 - - - - -
0.8878 87 0.0002 - - - - -
0.8980 88 0.0009 - - - - -
0.9082 89 0.0067 - - - - -
0.9184 90 0.0 - - - - -
0.9286 91 0.0001 - - - - -
0.9388 92 0.0008 - - - - -
0.9490 93 0.0031 - - - - -
0.9592 94 0.0004 - - - - -
0.9694 95 0.0004 - - - - -
0.9796 96 0.0001 - - - - -
0.9898 97 0.0004 - - - - -
1.0 98 0.0005 0.4261 0.4154 0.4098 0.379 0.3357
1.0102 99 0.0006 - - - - -
1.0204 100 0.0011 - - - - -
1.0306 101 0.0006 - - - - -
1.0408 102 0.0 - - - - -
1.0510 103 0.0009 - - - - -
1.0612 104 0.0008 - - - - -
1.0714 105 0.0004 - - - - -
1.0816 106 0.0 - - - - -
1.0918 107 0.0005 - - - - -
1.1020 108 0.0007 - - - - -
1.1122 109 0.0003 - - - - -
1.1224 110 0.0001 - - - - -
1.1327 111 0.0001 - - - - -
1.1429 112 0.0006 - - - - -
1.1531 113 0.0005 - - - - -
1.1633 114 0.0013 - - - - -
1.1735 115 0.0 - - - - -
1.1837 116 0.0003 - - - - -
1.1939 117 0.0001 - - - - -
1.2041 118 0.0003 - - - - -
1.2143 119 0.001 - - - - -
1.2245 120 0.0 - - - - -
1.2347 121 0.0 - - - - -
1.2449 122 0.0001 - - - - -
1.2551 123 0.0011 - - - - -
1.2653 124 0.0019 - - - - -
1.2755 125 0.0 - - - - -
1.2857 126 0.0004 - - - - -
1.2959 127 0.0 - - - - -
1.3061 128 0.0 - - - - -
1.3163 129 0.0002 - - - - -
1.3265 130 0.0004 - - - - -
1.3367 131 0.0012 - - - - -
1.3469 132 0.0002 - - - - -
1.3571 133 0.0001 - - - - -
1.3673 134 0.0001 - - - - -
1.3776 135 0.0001 - - - - -
1.3878 136 0.0001 - - - - -
1.3980 137 0.0002 - - - - -
1.4082 138 0.0002 - - - - -
1.4184 139 0.0003 - - - - -
1.4286 140 0.0001 - - - - -
1.4388 141 0.0003 - - - - -
1.4490 142 0.0023 - - - - -
1.4592 143 0.0008 - - - - -
1.4694 144 0.0004 - - - - -
1.4796 145 0.0009 - - - - -
1.4898 146 0.0002 - - - - -
1.5 147 0.0 - - - - -
1.5102 148 0.0001 - - - - -
1.5204 149 0.0002 - - - - -
1.5306 150 0.0002 - - - - -
1.5408 151 0.0001 - - - - -
1.5510 152 0.0005 - - - - -
1.5612 153 0.0 - - - - -
1.5714 154 0.0001 - - - - -
1.5816 155 0.0003 - - - - -
1.5918 156 0.0001 - - - - -
1.6020 157 0.0006 - - - - -
1.6122 158 0.0002 - - - - -
1.6224 159 0.0201 - - - - -
1.6327 160 0.0003 - - - - -
1.6429 161 0.0003 - - - - -
1.6531 162 0.0001 - - - - -
1.6633 163 0.6487 - - - - -
1.6735 164 0.0013 - - - - -
1.6837 165 0.0 - - - - -
1.6939 166 0.0001 - - - - -
1.7041 167 0.0003 - - - - -
1.7143 168 0.0 - - - - -
1.7245 169 0.0001 - - - - -
1.7347 170 0.0 - - - - -
1.7449 171 0.0001 - - - - -
1.7551 172 0.0001 - - - - -
1.7653 173 0.0 - - - - -
1.7755 174 0.0001 - - - - -
1.7857 175 0.0001 - - - - -
1.7959 176 0.0006 - - - - -
1.8061 177 0.0006 - - - - -
1.8163 178 0.0001 - - - - -
1.8265 179 0.0026 - - - - -
1.8367 180 0.0003 - - - - -
1.8469 181 0.0001 - - - - -
1.8571 182 0.0003 - - - - -
1.8673 183 0.0068 - - - - -
1.8776 184 0.0004 - - - - -
1.8878 185 0.0 - - - - -
1.8980 186 0.0002 - - - - -
1.9082 187 0.0004 - - - - -
1.9184 188 0.0 - - - - -
1.9286 189 0.0002 - - - - -
1.9388 190 0.0002 - - - - -
1.9490 191 0.0001 - - - - -
1.9592 192 0.0 - - - - -
1.9694 193 0.0005 - - - - -
1.9796 194 0.0 - - - - -
1.9898 195 0.0002 - - - - -
2.0 196 0.0 0.4021 0.4038 0.4032 0.3706 0.3269
2.0102 197 0.0038 - - - - -
2.0204 198 0.0002 - - - - -
2.0306 199 0.3615 - - - - -
2.0408 200 0.0003 - - - - -
2.0510 201 0.0001 - - - - -
2.0612 202 0.0013 - - - - -
2.0714 203 0.0018 - - - - -
2.0816 204 0.0003 - - - - -
2.0918 205 0.0012 - - - - -
2.1020 206 0.0186 - - - - -
2.1122 207 0.0002 - - - - -
2.1224 208 0.0 - - - - -
2.1327 209 0.0 - - - - -
2.1429 210 0.0029 - - - - -
2.1531 211 0.0037 - - - - -
2.1633 212 0.0001 - - - - -
2.1735 213 0.0005 - - - - -
2.1837 214 0.0032 - - - - -
2.1939 215 0.0005 - - - - -
2.2041 216 0.0069 - - - - -
2.2143 217 0.0063 - - - - -
2.2245 218 0.0027 - - - - -
2.2347 219 0.0003 - - - - -
2.2449 220 0.0015 - - - - -
2.2551 221 0.0382 - - - - -
2.2653 222 0.0012 - - - - -
2.2755 223 0.0001 - - - - -
2.2857 224 0.007 - - - - -
2.2959 225 0.0 - - - - -
2.3061 226 0.0001 - - - - -
2.3163 227 0.0 - - - - -
2.3265 228 0.0003 - - - - -
2.3367 229 0.0001 - - - - -
2.3469 230 0.0013 - - - - -
2.3571 231 0.0038 - - - - -
2.3673 232 0.0161 - - - - -
2.3776 233 0.0 - - - - -
2.3878 234 0.0001 - - - - -
2.3980 235 0.0011 - - - - -
2.4082 236 0.0209 - - - - -
2.4184 237 0.0001 - - - - -
2.4286 238 0.0001 - - - - -
2.4388 239 1.2667 - - - - -
2.4490 240 0.0025 - - - - -
2.4592 241 0.023 - - - - -
2.4694 242 0.0001 - - - - -
2.4796 243 0.0 - - - - -
2.4898 244 0.0002 - - - - -
2.5 245 0.0037 - - - - -
2.5102 246 5.2145 - - - - -
2.5204 247 0.0072 - - - - -
2.5306 248 0.0006 - - - - -
2.5408 249 0.162 - - - - -
2.5510 250 0.0043 - - - - -
2.5612 251 0.0004 - - - - -
2.5714 252 0.0006 - - - - -
2.5816 253 0.0079 - - - - -
2.5918 254 0.002 - - - - -
2.6020 255 0.0003 - - - - -
2.6122 256 0.0003 - - - - -
2.6224 257 0.0046 - - - - -
2.6327 258 0.0002 - - - - -
2.6429 259 0.0001 - - - - -
2.6531 260 0.0001 - - - - -
2.6633 261 0.0118 - - - - -
2.6735 262 0.0 - - - - -
2.6837 263 0.0001 - - - - -
2.6939 264 0.0746 - - - - -
2.7041 265 0.0007 - - - - -
2.7143 266 0.0009 - - - - -
2.7245 267 0.0005 - - - - -
2.7347 268 0.8332 - - - - -
2.7449 269 0.0002 - - - - -
2.7551 270 0.0001 - - - - -
2.7653 271 0.0013 - - - - -
2.7755 272 0.0002 - - - - -
2.7857 273 0.0002 - - - - -
2.7959 274 0.0001 - - - - -
2.8061 275 0.0 - - - - -
2.8163 276 0.0008 - - - - -
2.8265 277 0.0001 - - - - -
2.8367 278 0.0008 - - - - -
2.8469 279 0.0077 - - - - -
2.8571 280 0.0078 - - - - -
2.8673 281 0.0021 - - - - -
2.8776 282 0.0 - - - - -
2.8878 283 0.5116 - - - - -
2.8980 284 0.0015 - - - - -
2.9082 285 0.0014 - - - - -
2.9184 286 0.0002 - - - - -
2.9286 287 0.0002 - - - - -
2.9388 288 0.0041 - - - - -
2.9490 289 0.0058 - - - - -
2.9592 290 0.0001 - - - - -
2.9694 291 0.0009 - - - - -
2.9796 292 0.0001 - - - - -
2.9898 293 0.0 - - - - -
3.0 294 0.0004 0.4220 0.4140 0.4084 0.3761 0.3297
  • The saved checkpoint corresponds to the final row (epoch 3.0, step 294), whose NDCG@10 values match the evaluation tables above.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.51.3
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}