IoannisKat1 committed
Commit 4d14832 · verified · 1 Parent(s): 82daac8

Finetuned Reranker

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,448 @@
+ ---
+ tags:
+ - sentence-transformers
+ - cross-encoder
+ - reranker
+ - generated_from_trainer
+ - dataset_size:443
+ - loss:BinaryCrossEntropyLoss
+ base_model: BAAI/bge-reranker-v2-m3
+ pipeline_tag: text-ranking
+ library_name: sentence-transformers
+ metrics:
+ - map
+ - mrr@10
+ - ndcg@10
+ - accuracy
+ - accuracy_threshold
+ - f1
+ - f1_threshold
+ - precision
+ - recall
+ - average_precision
+ model-index:
+ - name: CrossEncoder based on BAAI/bge-reranker-v2-m3
+   results:
+   - task:
+       type: cross-encoder-reranking
+       name: Cross Encoder Reranking
+     dataset:
+       name: gooaq dev
+       type: gooaq-dev
+     metrics:
+     - type: map
+       value: 0.6246
+       name: Map
+     - type: mrr@10
+       value: 0.6246
+       name: Mrr@10
+     - type: ndcg@10
+       value: 0.7177
+       name: Ndcg@10
+   - task:
+       type: cross-encoder-classification
+       name: Cross Encoder Classification
+     dataset:
+       name: cls dev
+       type: cls-dev
+     metrics:
+     - type: accuracy
+       value: 0.9523809523809523
+       name: Accuracy
+     - type: accuracy_threshold
+       value: 6.58235585433431e-05
+       name: Accuracy Threshold
+     - type: f1
+       value: 0.975609756097561
+       name: F1
+     - type: f1_threshold
+       value: 6.58235585433431e-05
+       name: F1 Threshold
+     - type: precision
+       value: 1.0
+       name: Precision
+     - type: recall
+       value: 0.9523809523809523
+       name: Recall
+     - type: average_precision
+       value: 1.0000000000000002
+       name: Average Precision
+ ---
+
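As a quick sanity check on the classification metrics declared in the front matter, F1 should equal the harmonic mean of the reported precision and recall. The sketch below is plain Python and uses only the values listed for the `cls dev` dataset; the variable names are illustrative and not part of the model card.

```python
# Values copied from the model-index metrics for the "cls dev" dataset.
precision = 1.0
recall = 0.9523809523809523
reported_f1 = 0.975609756097561

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Compare with a tolerance rather than exact float equality.
matches = abs(f1 - reported_f1) < 1e-9
print(f"computed F1 = {f1:.6f}, matches reported value: {matches}")
```

With precision at 1.0 and recall at 20/21, the harmonic mean works out to 40/41, which agrees with the reported F1.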
+ # CrossEncoder based on BAAI/bge-reranker-v2-m3
+
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+
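Reranking with a cross-encoder amounts to scoring each (query, passage) pair and then sorting the passages by score. The sketch below shows only the sorting step in plain Python, with made-up scores standing in for model output. `rank_by_score` is a hypothetical helper, not part of sentence-transformers, although its output mirrors the `corpus_id`/`score` dictionaries that `CrossEncoder.rank` returns.

```python
from typing import Dict, List, Optional, Sequence, Union


def rank_by_score(
    scores: Sequence[float], top_k: Optional[int] = None
) -> List[Dict[str, Union[int, float]]]:
    """Order passage indices by descending score (hypothetical helper)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores for three passages; real values would come from the model.
example_scores = [0.12, 0.87, 0.45]
ranked = rank_by_score(example_scores)
print(ranked)
```

Keeping the original index as `corpus_id` lets the caller map each result back to its passage without carrying the full text through the sort.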
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Cross Encoder
+ - **Base model:** [BAAI/bge-reranker-v2-m3](https://huggingface.co/BAAI/bge-reranker-v2-m3) <!-- at revision 953dc6f6f85a1b2dbfca4c34a2796e7dde08d41e -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Number of Output Labels:** 1 label
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import CrossEncoder
+
+ # Download from the 🤗 Hub
+ model = CrossEncoder("cross_encoder_model_id")
+ # Get scores for pairs of texts
+ pairs = [
+ ['What does dolus refer to?', 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision, it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without requiring that the benefit actually materialize;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and performs an act, omission, or acquiescence; and\n\nc) Damage to another’s property, according to civil law, which must be causally connected to the perpetrator’s deceptive acts or omissions. It is not required that the deceived person and the person who suffered the loss be the same.\n\nThe term “facts,” within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those that will occur in the future, such as mere promises or contractual obligations. 
However, when such promises or obligations are accompanied by false assurances and representations of other false facts relating to the present or the past, in such a way as to create the impression of future fulfillment, based on a false present situation fabricated by the perpetrator—who has already made the decision not to fulfill their obligation—then the crime of fraud is established.\n\nThe term “property” denotes the totality of a person’s economic assets possessing monetary value, while damage to property refers to its reduction—specifically, the difference between the property’s monetary value before the disposition caused by the fraudulent conduct and its value afterward. Property damage exists even if the victim has an active claim for its restitution.\n\nThe time of commission of fraud is considered to be the moment when the perpetrator acted and completed the deceptive conduct, that is, when they made the false representations which deceived the victim or a third party. Any later time at which the victim’s financial loss occurred—thus completing the fraud—or the time when the harmful act or omission of the deceived person took place, is irrelevant.\n\nThe reference to multiple modes of commission of fraud (i.e., both the misrepresentation of false facts and the concealment of true ones) may create ambiguity and contradiction, unless it is made clear from the overall findings that the offense was committed in one particular manner, and that the reference to the other merely serves to define the intent (mens rea) of the perpetrator—specifically, that the representations were false.\n\nFurthermore, a conviction must contain the specific and well-reasoned justification required by Articles 93 paragraph 3 of the Constitution and 139 of the Code of Criminal Procedure. 
The absence of such reasoning constitutes grounds for cassation (appeal) under Article 510 paragraph 1(d) of the Code of Criminal Procedure, when the judgment does not set out, with clarity, completeness, and consistency, the factual circumstances established by the evidence, upon which the court based its findings regarding the objective and subjective elements of the offense, the evidence supporting those findings, and the legal reasoning through which those facts were subsumed under the applicable substantive criminal provision.\n\nFor the existence of such reasoning, the explanatory and operative parts of the decision may complement each other, as they form a single, unified whole.\n\nThe existence of intent (dolus) does not generally need to be specially justified, since it is inherent in the will to bring about the factual circumstances constituting the objective elements of the offense, and it is presumed from their realization in each particular case—unless the law requires additional elements for criminal liability, such as the act being committed with knowledge of a specific circumstance (direct intent) or with the pursuit of a further purpose, i.e., the achievement of an additional result (offenses requiring a special subjective element).\n\nFurthermore, under Article 510 paragraph 1(e) of the Code of Criminal Procedure, a misapplication of substantive criminal law also constitutes grounds for cassation. Such misapplication occurs when the trial court incorrectly applies the law to the facts it has found to be true, or when the violation occurs indirectly, namely when the reasoning of the judgment—comprising the combination of its factual and operative parts and relating to the elements and identity of the offense—contains ambiguities, contradictions, or logical gaps, rendering it impossible to verify, on appeal, whether the law was applied correctly. In such cases, the judgment lacks a lawful basis.'],
+ ['What does dolus refer to?', 'According to Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom these provisions, it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence detrimental to themselves or another; and\n\nc) Damage to another’s property, as defined under civil law, which must be causally connected to the perpetrator’s deceptive acts.\n\nFrom the above provisions, it is deduced that the crime of fraud is established both objectively and subjectively through the knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true ones, by which another person is deceived and, as a result, performs an act, omission, or acquiescence involving a disposition of property that directly and necessarily causes financial damage to the deceived person or another, with the intent that the perpetrator or another gain an unlawful benefit. 
It is irrelevant whether this intended benefit was ultimately achieved.\n\nThe term “facts,” within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those expected to occur in the future, such as mere promises or contractual obligations. The false fact must have existed in the past or must be a present circumstance at the time it is asserted, and cannot relate to the future.\n\nHowever, when future circumstances—that is, promises or contractual obligations—are accompanied by false assurances and representations of other false facts referring to the present or past, in such a way as to create the impression of future fulfillment, based on a false present situation or supposed ability of the perpetrator, who had already made the decision not to fulfill their obligation, then the crime of fraud is established.'],
+ ['What does dolus refer to?', 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without it being necessary that the benefit actually materialize;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence that is detrimental to themselves or another; and\n\nc) Damage to another person’s property, as defined under civil law, which must be causally linked to the deceptive acts or omissions of the perpetrator. It is not required that the person deceived and the person who suffered the damage be the same individual.\n\nThe term “facts”, within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those that will occur in the future, such as mere promises or contractual obligations. 
However, when such promises or obligations are accompanied by false assurances and representations of other false facts referring to the present or the past, in such a manner as to create the impression of future fulfillment based on a false present situation fabricated by the perpetrator, who has already formed the decision not to fulfill their obligation, the crime of fraud is established.\n\nThe term “property” refers to the totality of a person’s economic assets that possess monetary value, while damage to property means its reduction—specifically, the difference between the monetary value the property had before the disposition caused by the fraudulent conduct and the value remaining after it. Property damage exists even if the victim possesses an active claim for restitution.\n\nThe time of commission of the fraud is considered to be the moment when the perpetrator acted and completed their fraudulent conduct, namely when they made the false representations that deceived the victim or a third party. Any subsequent moment at which the victim’s damage actually occurred—thereby completing the fraud—or the time when the victim carried out the harmful act or omission, is irrelevant.'],
+ ['What does dolus refer to?', 'According to Article 386 paragraph 1 of the Greek Penal Code, the crime of fraud is established both objectively and subjectively when a person knowingly presents false facts as true, or unlawfully conceals or suppresses true facts, thereby deceiving another person who, as a result of this deception, performs an act, omission, or acquiescence that involves a disposition of property, which directly and necessarily causes financial harm to the deceived person or another, with the intent of obtaining an unlawful benefit for themselves or another. It is irrelevant whether or not this intended benefit was actually achieved.\n\nFurthermore, within the meaning of the law, “facts” also include those referring to future events and promises, when they are accompanied by false assurances and representations of other false facts relating to the past or present, in such a manner as to create the impression of future fulfillment, based on the false situation presented by the perpetrator, who has already made the decision not to fulfill their obligation.'],
+ ['What does dolus refer to?', 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another person’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision, it follows that for the crime of fraud to be established, the following elements are required:\n\na) Intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, regardless of whether this benefit was actually realized;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which, as a causal factor, someone is deceived and acts in a way that is detrimental to themselves or another (by an act, omission, or acquiescence); and\n\nc) Damage to another’s property, in the sense recognized by civil law, which must be causally linked to the fraudulent conduct (the deceptive act or omission of the perpetrator) and to the resulting deception of the person who made the property disposition. It is not required that the person deceived be the same person who suffered the damage.\n\nProperty damage exists when there is a reduction or deterioration in the victim’s assets, even if the victim has an active claim to restitution. 
However, as an element of the objective aspect of the crime of fraud, the damage must be the direct, necessary, and exclusive result of the property disposition—namely, the act, omission, or acquiescence performed by the person deceived by the perpetrator’s fraudulent conduct.\n\nThere must therefore be a causal connection between the perpetrator’s deceptive behavior and the deception it caused, as well as between this deception and the resulting property damage, which must be the direct, necessary, and exclusive outcome of the deception and of the act, omission, or acquiescence of the deceived person.\n\nThe term “facts” refers to real circumstances relating to the past or present, and not to those expected to occur in the future, such as mere promises or contractual obligations. However, when such promises or obligations are accompanied by false assurances and representations of other false facts relating to the present or the past, in such a way as to create the impression of future fulfillment, based on the false present situation presented by a perpetrator who has already made the decision not to fulfill their obligation, then the crime of fraud is established.\n\nThe time of commission of the fraud is considered to be the moment when the perpetrator acted and completed their deceptive conduct—that is, when they made the false representations that deceived the victim or a third party. Any later time at which the victim’s financial loss actually occurred—thus completing the fraud—or the time when the deceived person performed the harmful act or omission, is irrelevant.'],
+ ]
+ scores = model.predict(pairs)
+ print(scores.shape)
+ # (5,)
+
+ # Or rank different texts based on similarity to a single text
+ ranks = model.rank(
+     'What does dolus refer to?',
+     [
+ 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision, it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without requiring that the benefit actually materialize;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and performs an act, omission, or acquiescence; and\n\nc) Damage to another’s property, according to civil law, which must be causally connected to the perpetrator’s deceptive acts or omissions. It is not required that the deceived person and the person who suffered the loss be the same.\n\nThe term “facts,” within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those that will occur in the future, such as mere promises or contractual obligations. 
However, when such promises or obligations are accompanied by false assurances and representations of other false facts relating to the present or the past, in such a way as to create the impression of future fulfillment, based on a false present situation fabricated by the perpetrator—who has already made the decision not to fulfill their obligation—then the crime of fraud is established.\n\nThe term “property” denotes the totality of a person’s economic assets possessing monetary value, while damage to property refers to its reduction—specifically, the difference between the property’s monetary value before the disposition caused by the fraudulent conduct and its value afterward. Property damage exists even if the victim has an active claim for its restitution.\n\nThe time of commission of fraud is considered to be the moment when the perpetrator acted and completed the deceptive conduct, that is, when they made the false representations which deceived the victim or a third party. Any later time at which the victim’s financial loss occurred—thus completing the fraud—or the time when the harmful act or omission of the deceived person took place, is irrelevant.\n\nThe reference to multiple modes of commission of fraud (i.e., both the misrepresentation of false facts and the concealment of true ones) may create ambiguity and contradiction, unless it is made clear from the overall findings that the offense was committed in one particular manner, and that the reference to the other merely serves to define the intent (mens rea) of the perpetrator—specifically, that the representations were false.\n\nFurthermore, a conviction must contain the specific and well-reasoned justification required by Articles 93 paragraph 3 of the Constitution and 139 of the Code of Criminal Procedure. 
The absence of such reasoning constitutes grounds for cassation (appeal) under Article 510 paragraph 1(d) of the Code of Criminal Procedure, when the judgment does not set out, with clarity, completeness, and consistency, the factual circumstances established by the evidence, upon which the court based its findings regarding the objective and subjective elements of the offense, the evidence supporting those findings, and the legal reasoning through which those facts were subsumed under the applicable substantive criminal provision.\n\nFor the existence of such reasoning, the explanatory and operative parts of the decision may complement each other, as they form a single, unified whole.\n\nThe existence of intent (dolus) does not generally need to be specially justified, since it is inherent in the will to bring about the factual circumstances constituting the objective elements of the offense, and it is presumed from their realization in each particular case—unless the law requires additional elements for criminal liability, such as the act being committed with knowledge of a specific circumstance (direct intent) or with the pursuit of a further purpose, i.e., the achievement of an additional result (offenses requiring a special subjective element).\n\nFurthermore, under Article 510 paragraph 1(e) of the Code of Criminal Procedure, a misapplication of substantive criminal law also constitutes grounds for cassation. Such misapplication occurs when the trial court incorrectly applies the law to the facts it has found to be true, or when the violation occurs indirectly, namely when the reasoning of the judgment—comprising the combination of its factual and operative parts and relating to the elements and identity of the offense—contains ambiguities, contradictions, or logical gaps, rendering it impossible to verify, on appeal, whether the law was applied correctly. In such cases, the judgment lacks a lawful basis.',
+ 'According to Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom these provisions, it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence detrimental to themselves or another; and\n\nc) Damage to another’s property, as defined under civil law, which must be causally connected to the perpetrator’s deceptive acts.\n\nFrom the above provisions, it is deduced that the crime of fraud is established both objectively and subjectively through the knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true ones, by which another person is deceived and, as a result, performs an act, omission, or acquiescence involving a disposition of property that directly and necessarily causes financial damage to the deceived person or another, with the intent that the perpetrator or another gain an unlawful benefit. It is irrelevant whether this intended benefit was ultimately achieved.\n\nThe term “facts,” within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those expected to occur in the future, such as mere promises or contractual obligations. 
The false fact must have existed in the past or must be a present circumstance at the time it is asserted, and cannot relate to the future.\n\nHowever, when future circumstances—that is, promises or contractual obligations—are accompanied by false assurances and representations of other false facts referring to the present or past, in such a way as to create the impression of future fulfillment, based on a false present situation or supposed ability of the perpetrator, who had already made the decision not to fulfill their obligation, then the crime of fraud is established.',
+ 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision it follows that, for the crime of fraud to be established, the following elements are required:\n\na) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without it being necessary that the benefit actually materialize;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence that is detrimental to themselves or another; and\n\nc) Damage to another person’s property, as defined under civil law, which must be causally linked to the deceptive acts or omissions of the perpetrator. It is not required that the person deceived and the person who suffered the damage be the same individual.\n\nThe term “facts”, within the meaning of the above provision, refers to real circumstances relating to the past or present, and not to those that will occur in the future, such as mere promises or contractual obligations. 
However, when such promises or obligations are accompanied by false assurances and representations of other false facts referring to the present or the past, in such a manner as to create the impression of future fulfillment based on a false present situation fabricated by the perpetrator, who has already formed the decision not to fulfill their obligation, the crime of fraud is established.\n\nThe term “property” refers to the totality of a person’s economic assets that possess monetary value, while damage to property means its reduction—specifically, the difference between the monetary value the property had before the disposition caused by the fraudulent conduct and the value remaining after it. Property damage exists even if the victim possesses an active claim for restitution.\n\nThe time of commission of the fraud is considered to be the moment when the perpetrator acted and completed their fraudulent conduct, namely when they made the false representations that deceived the victim or a third party. Any subsequent moment at which the victim’s damage actually occurred—thereby completing the fraud—or the time when the victim carried out the harmful act or omission, is irrelevant.',
+ 'According to Article 386 paragraph 1 of the Greek Penal Code, the crime of fraud is established both objectively and subjectively when a person knowingly presents false facts as true, or unlawfully conceals or suppresses true facts, thereby deceiving another person who, as a result of this deception, performs an act, omission, or acquiescence that involves a disposition of property, which directly and necessarily causes financial harm to the deceived person or another, with the intent of obtaining an unlawful benefit for themselves or another. It is irrelevant whether or not this intended benefit was actually achieved.\n\nFurthermore, within the meaning of the law, “facts” also include those referring to future events and promises, when they are accompanied by false assurances and representations of other false facts relating to the past or present, in such a manner as to create the impression of future fulfillment, based on the false situation presented by the perpetrator, who has already made the decision not to fulfill their obligation.',
130
+ 'According to the provision of Article 386 paragraph 1 of the Greek Penal Code,\n\n"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another person’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."\n\nFrom this provision, it follows that for the crime of fraud to be established, the following elements are required:\n\na) Intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, regardless of whether this benefit was actually realized;\n\nb) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which, as a causal factor, someone is deceived and acts in a way that is detrimental to themselves or another (by an act, omission, or acquiescence); and\n\nc) Damage to another’s property, in the sense recognized by civil law, which must be causally linked to the fraudulent conduct (the deceptive act or omission of the perpetrator) and to the resulting deception of the person who made the property disposition. It is not required that the person deceived be the same person who suffered the damage.\n\nProperty damage exists when there is a reduction or deterioration in the victim’s assets, even if the victim has an active claim to restitution. 
However, as an element of the objective aspect of the crime of fraud, the damage must be the direct, necessary, and exclusive result of the property disposition—namely, the act, omission, or acquiescence performed by the person deceived by the perpetrator’s fraudulent conduct.\n\nThere must therefore be a causal connection between the perpetrator’s deceptive behavior and the deception it caused, as well as between this deception and the resulting property damage, which must be the direct, necessary, and exclusive outcome of the deception and of the act, omission, or acquiescence of the deceived person.\n\nThe term “facts” refers to real circumstances relating to the past or present, and not to those expected to occur in the future, such as mere promises or contractual obligations. However, when such promises or obligations are accompanied by false assurances and representations of other false facts relating to the present or the past, in such a way as to create the impression of future fulfillment, based on the false present situation presented by a perpetrator who has already made the decision not to fulfill their obligation, then the crime of fraud is established.\n\nThe time of commission of the fraud is considered to be the moment when the perpetrator acted and completed their deceptive conduct—that is, when they made the false representations that deceived the victim or a third party. Any later time at which the victim’s financial loss actually occurred—thus completing the fraud—or the time when the deceived person performed the harmful act or omission, is irrelevant.',
131
+ ]
132
+ )
133
+ # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
134
+ ```
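
The commented output above is a list of dictionaries with `corpus_id` and `score` keys. As a minimal sketch of what a caller might do with it, the helper below keeps the best few passages; `top_k_documents`, the example scores, and the passage strings are all hypothetical and not part of the model card.

```python
from typing import Dict, List, Tuple


def top_k_documents(
    hits: List[Dict[str, float]],
    documents: List[str],
    k: int = 3,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return up to k (document, score) pairs, best score first."""
    # Sort defensively in case the caller passes hits in arbitrary order.
    ordered = sorted(hits, key=lambda hit: hit["score"], reverse=True)
    kept = [hit for hit in ordered if hit["score"] >= min_score]
    return [(documents[int(hit["corpus_id"])], hit["score"]) for hit in kept[:k]]


# Hypothetical scores shaped like the output shown above.
example_hits = [
    {"corpus_id": 2, "score": 0.12},
    {"corpus_id": 0, "score": 0.91},
    {"corpus_id": 1, "score": 0.47},
]
example_docs = ["passage A", "passage B", "passage C"]
best = top_k_documents(example_hits, example_docs, k=2)
```

Filtering by `min_score` is a design choice for pipelines that prefer returning fewer passages over returning weak ones; the right cut-off depends on the data and is not given in the card.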
135
+
136
+ <!--
137
+ ### Direct Usage (Transformers)
138
+
139
+ <details><summary>Click to see the direct usage in Transformers</summary>
140
+
141
+ </details>
142
+ -->
143
+
144
+ <!--
145
+ ### Downstream Usage (Sentence Transformers)
146
+
147
+ You can finetune this model on your own dataset.
148
+
149
+ <details><summary>Click to expand</summary>
150
+
151
+ </details>
152
+ -->
153
+
154
+ <!--
155
+ ### Out-of-Scope Use
156
+
157
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
158
+ -->
159
+
160
+ ## Evaluation
161
+
162
+ ### Metrics
163
+
164
+ #### Cross Encoder Reranking
165
+
166
+ * Dataset: `gooaq-dev`
167
+ * Evaluated with [<code>CrossEncoderRerankingEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
+   ```json
+   {
+       "at_k": 10,
+       "always_rerank_positives": false
+   }
+   ```
174
+
+ | Metric      | Value                |
+ |:------------|:---------------------|
+ | map         | 0.6246 (-0.0024)     |
+ | mrr@10      | 0.6246 (-0.0024)     |
+ | **ndcg@10** | **0.7177 (-0.0021)** |
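
For readers unfamiliar with the ranking metrics in this table, the sketch below shows the textbook definitions of MRR@k and NDCG@k for binary relevance labels. It is an illustration under that assumption only; the evaluator in the library may differ in details such as tie handling or how missing positives are treated, and the function names here are hypothetical.

```python
import math
from typing import Sequence


def mrr_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_labels: Sequence[int], k: int = 10) -> float:
    """DCG of the given order divided by DCG of the ideal order, both cut at k."""

    def dcg(labels: Sequence[int]) -> float:
        return sum(
            label / math.log2(position + 1)
            for position, label in enumerate(labels[:k], start=1)
        )

    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0


# One query whose only relevant passage was ranked second.
labels_in_ranked_order = [0, 1, 0, 0]
example_mrr = mrr_at_k(labels_in_ranked_order)
example_ndcg = ndcg_at_k(labels_in_ranked_order)
```

With a single relevant passage at rank 2, MRR@10 is 1/2 and NDCG@10 is 1/log2(3), roughly 0.63.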
180
+
181
+ #### Cross Encoder Classification
182
+
183
+ * Dataset: `cls-dev`
184
+ * Evaluated with [<code>CrossEncoderClassificationEvaluator</code>](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderClassificationEvaluator)
185
+
+ | Metric                | Value   |
+ |:----------------------|:--------|
+ | accuracy              | 0.9524  |
+ | accuracy_threshold    | 0.0001  |
+ | f1                    | 0.9756  |
+ | f1_threshold          | 0.0001  |
+ | precision             | 1.0     |
+ | recall                | 0.9524  |
+ | **average_precision** | **1.0** |
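
The F1 value in this table follows from the reported precision and recall, which the short check below reproduces. The counts `tp`, `fp`, and `fn` are hypothetical: they are one combination that would round to the reported figures, and the card does not state the actual counts or the size of `cls-dev`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the table above.
reported_f1 = round(f1_score(1.0, 0.9524), 4)

# Hypothetical counts that happen to reproduce the reported numbers.
tp, fp, fn = 20, 0, 1
hypothetical_precision = tp / (tp + fp)
hypothetical_recall = tp / (tp + fn)
hypothetical_f1 = f1_score(hypothetical_precision, hypothetical_recall)
```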
195
+
196
+ <!--
197
+ ## Bias, Risks and Limitations
198
+
199
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
200
+ -->
201
+
202
+ <!--
203
+ ### Recommendations
204
+
205
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
206
+ -->
207
+
208
+ ## Training Details
209
+
210
+ ### Training Dataset
211
+
212
+ #### Unnamed Dataset
213
+
214
+ * Size: 443 training samples
215
+ * Columns: <code>query</code>, <code>response</code>, and <code>label</code>
216
+ * Approximate statistics based on the first 443 samples:
217
+ | | query | response | label |
218
+ |:--------|:------------------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------|:------------------------------------------------|
219
+ | type | string | string | int |
220
+ | details | <ul><li>min: 25 characters</li><li>mean: 74.23 characters</li><li>max: 167 characters</li></ul> | <ul><li>min: 294 characters</li><li>mean: 1983.03 characters</li><li>max: 5250 characters</li></ul> | <ul><li>0: ~81.49%</li><li>1: ~18.51%</li></ul> |
221
+ * Samples:
222
+ | query | response | label |
223
+ |:---------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------|
224
+ | <code>What does dolus refer to?</code> | <code>According to the provision of Article 386 paragraph 1 of the Greek Penal Code,<br><br>"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."<br><br>From this provision, it follows that, for the crime of fraud to be established, the following elements are required:<br><br>a) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without requiring that the benefit actually materialize;<br><br>b) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is dece...</code> | <code>1</code> |
225
+ | <code>What does dolus refer to?</code> | <code>According to Article 386 paragraph 1 of the Greek Penal Code,<br><br>"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."<br><br>From these provisions, it follows that, for the crime of fraud to be established, the following elements are required:<br><br>a) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit;<br><br>b) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone is deceived and proceeds to an act, omission, or acquiescence detrimental to th...</code> | <code>0</code> |
226
+ | <code>What does dolus refer to?</code> | <code>According to the provision of Article 386 paragraph 1 of the Greek Penal Code,<br><br>"Whoever, with the intent to obtain for themselves or another an unlawful pecuniary benefit, causes damage to another’s property by persuading someone to act, omit, or tolerate something through the knowing misrepresentation of false facts as true, or through the unlawful concealment or suppression of true facts, shall be punished by imprisonment of at least three months, and if the damage caused is particularly large, by imprisonment of at least two years."<br><br>From this provision it follows that, for the crime of fraud to be established, the following elements are required:<br><br>a) The intent of the perpetrator to obtain for themselves or another an unlawful pecuniary benefit, without it being necessary that the benefit actually materialize;<br><br>b) The knowing misrepresentation of false facts as true, or the unlawful concealment or suppression of true facts, as a result of which—serving as the causal factor—someone...</code> | <code>0</code> |
227
+ * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
+   ```json
+   {
+       "activation_fn": "torch.nn.modules.linear.Identity",
+       "pos_weight": null
+   }
+   ```
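
To make the loss parameters above concrete, here is a plain-Python sketch of binary cross-entropy on a raw logit, which is what an identity activation implies, with an optional positive-class weight. It is a simplified illustration, not the library's implementation. The weight heuristic (negatives divided by positives) is a common rule of thumb and an assumption here; this run left `pos_weight` unset.

```python
import math


def bce_with_logits(logit: float, label: int, pos_weight: float = 1.0) -> float:
    """Binary cross-entropy on a raw logit, with an optional weight on positives."""
    # Numerically stable log(sigmoid(logit)).
    if logit >= 0:
        log_sigmoid = -math.log1p(math.exp(-logit))
    else:
        log_sigmoid = logit - math.log1p(math.exp(logit))
    # log(1 - sigmoid(x)) equals log(sigmoid(x)) - x.
    log_one_minus_sigmoid = log_sigmoid - logit
    return -(pos_weight * label * log_sigmoid + (1 - label) * log_one_minus_sigmoid)


# Label shares copied from the training dataset statistics in this card.
negative_share, positive_share = 81.49, 18.51
suggested_pos_weight = round(negative_share / positive_share, 2)

loss_uncertain_positive = bce_with_logits(0.0, 1)
loss_weighted_positive = bce_with_logits(0.0, 1, pos_weight=suggested_pos_weight)
```

With roughly four negatives per positive, a weight near 4.4 would make an error on a positive pair count about as much as the errors on all the negatives it is outnumbered by. Whether that helps on this dataset is untested.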
234
+
235
+ ### Training Hyperparameters
236
+ #### Non-Default Hyperparameters
237
+
238
+ - `eval_strategy`: steps
239
+ - `per_device_train_batch_size`: 16
240
+ - `per_device_eval_batch_size`: 16
241
+ - `learning_rate`: 2e-05
242
+ - `num_train_epochs`: 20
243
+ - `warmup_ratio`: 0.1
244
+ - `seed`: 12
245
+ - `dataloader_num_workers`: 4
246
+ - `load_best_model_at_end`: True
247
+
248
+ #### All Hyperparameters
249
+ <details><summary>Click to expand</summary>
250
+
251
+ - `overwrite_output_dir`: False
252
+ - `do_predict`: False
253
+ - `eval_strategy`: steps
254
+ - `prediction_loss_only`: True
255
+ - `per_device_train_batch_size`: 16
256
+ - `per_device_eval_batch_size`: 16
257
+ - `per_gpu_train_batch_size`: None
258
+ - `per_gpu_eval_batch_size`: None
259
+ - `gradient_accumulation_steps`: 1
260
+ - `eval_accumulation_steps`: None
261
+ - `torch_empty_cache_steps`: None
262
+ - `learning_rate`: 2e-05
263
+ - `weight_decay`: 0.0
264
+ - `adam_beta1`: 0.9
265
+ - `adam_beta2`: 0.999
266
+ - `adam_epsilon`: 1e-08
267
+ - `max_grad_norm`: 1.0
268
+ - `num_train_epochs`: 20
269
+ - `max_steps`: -1
270
+ - `lr_scheduler_type`: linear
271
+ - `lr_scheduler_kwargs`: {}
272
+ - `warmup_ratio`: 0.1
273
+ - `warmup_steps`: 0
274
+ - `log_level`: passive
275
+ - `log_level_replica`: warning
276
+ - `log_on_each_node`: True
277
+ - `logging_nan_inf_filter`: True
278
+ - `save_safetensors`: True
279
+ - `save_on_each_node`: False
280
+ - `save_only_model`: False
281
+ - `restore_callback_states_from_checkpoint`: False
282
+ - `no_cuda`: False
283
+ - `use_cpu`: False
284
+ - `use_mps_device`: False
285
+ - `seed`: 12
286
+ - `data_seed`: None
287
+ - `jit_mode_eval`: False
288
+ - `use_ipex`: False
289
+ - `bf16`: False
290
+ - `fp16`: False
291
+ - `fp16_opt_level`: O1
292
+ - `half_precision_backend`: auto
293
+ - `bf16_full_eval`: False
294
+ - `fp16_full_eval`: False
295
+ - `tf32`: None
296
+ - `local_rank`: 0
297
+ - `ddp_backend`: None
298
+ - `tpu_num_cores`: None
299
+ - `tpu_metrics_debug`: False
300
+ - `debug`: []
301
+ - `dataloader_drop_last`: False
302
+ - `dataloader_num_workers`: 4
303
+ - `dataloader_prefetch_factor`: None
304
+ - `past_index`: -1
305
+ - `disable_tqdm`: False
306
+ - `remove_unused_columns`: True
307
+ - `label_names`: None
308
+ - `load_best_model_at_end`: True
309
+ - `ignore_data_skip`: False
310
+ - `fsdp`: []
311
+ - `fsdp_min_num_params`: 0
312
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
313
+ - `tp_size`: 0
314
+ - `fsdp_transformer_layer_cls_to_wrap`: None
315
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
316
+ - `deepspeed`: None
317
+ - `label_smoothing_factor`: 0.0
318
+ - `optim`: adamw_torch
319
+ - `optim_args`: None
320
+ - `adafactor`: False
321
+ - `group_by_length`: False
322
+ - `length_column_name`: length
323
+ - `ddp_find_unused_parameters`: None
324
+ - `ddp_bucket_cap_mb`: None
325
+ - `ddp_broadcast_buffers`: False
326
+ - `dataloader_pin_memory`: True
327
+ - `dataloader_persistent_workers`: False
328
+ - `skip_memory_metrics`: True
329
+ - `use_legacy_prediction_loop`: False
330
+ - `push_to_hub`: False
331
+ - `resume_from_checkpoint`: None
332
+ - `hub_model_id`: None
333
+ - `hub_strategy`: every_save
334
+ - `hub_private_repo`: None
335
+ - `hub_always_push`: False
336
+ - `gradient_checkpointing`: False
337
+ - `gradient_checkpointing_kwargs`: None
338
+ - `include_inputs_for_metrics`: False
339
+ - `include_for_metrics`: []
340
+ - `eval_do_concat_batches`: True
341
+ - `fp16_backend`: auto
342
+ - `push_to_hub_model_id`: None
343
+ - `push_to_hub_organization`: None
344
+ - `mp_parameters`:
345
+ - `auto_find_batch_size`: False
346
+ - `full_determinism`: False
347
+ - `torchdynamo`: None
348
+ - `ray_scope`: last
349
+ - `ddp_timeout`: 1800
350
+ - `torch_compile`: False
351
+ - `torch_compile_backend`: None
352
+ - `torch_compile_mode`: None
353
+ - `include_tokens_per_second`: False
354
+ - `include_num_input_tokens_seen`: False
355
+ - `neftune_noise_alpha`: None
356
+ - `optim_target_modules`: None
357
+ - `batch_eval_metrics`: False
358
+ - `eval_on_start`: False
359
+ - `use_liger_kernel`: False
360
+ - `eval_use_gather_object`: False
361
+ - `average_tokens_across_devices`: False
362
+ - `prompts`: None
363
+ - `batch_sampler`: batch_sampler
364
+ - `multi_dataset_batch_sampler`: proportional
365
+ - `router_mapping`: {}
366
+ - `learning_rate_mapping`: {}
367
+
368
+ </details>
369
+
370
+ ### Training Logs
+ | Epoch   | Step | Training Loss | gooaq-dev_ndcg@10 | cls-dev_average_precision |
+ |:-------:|:----:|:-------------:|:-----------------:|:-------------------------:|
+ | -1      | -1   | -             | 0.8520 (+0.1323)  | -                         |
+ | 0.0357  | 1    | 0.9641        | -                 | -                         |
+ | 0.7143  | 20   | 0.5838        | -                 | -                         |
+ | 1.4286  | 40   | 0.3681        | -                 | -                         |
+ | 2.1429  | 60   | 0.3216        | -                 | -                         |
+ | 2.8571  | 80   | 0.3218        | -                 | -                         |
+ | 3.5714  | 100  | 0.2768        | -                 | -                         |
+ | 4.2857  | 120  | 0.2571        | -                 | -                         |
+ | 5.0     | 140  | 0.2517        | -                 | -                         |
+ | 5.7143  | 160  | 0.214         | -                 | -                         |
+ | 6.4286  | 180  | 0.1952        | -                 | -                         |
+ | 7.1429  | 200  | 0.1884        | -                 | -                         |
+ | 7.8571  | 220  | 0.1793        | -                 | -                         |
+ | 8.5714  | 240  | 0.1328        | -                 | -                         |
+ | 9.2857  | 260  | 0.1964        | -                 | -                         |
+ | 10.0    | 280  | 0.1444        | -                 | -                         |
+ | 10.7143 | 300  | 0.1408        | -                 | -                         |
+ | 11.4286 | 320  | 0.1308        | -                 | -                         |
+ | 12.1429 | 340  | 0.1147        | -                 | -                         |
+ | 12.8571 | 360  | 0.1495        | -                 | -                         |
+ | 13.5714 | 380  | 0.0991        | -                 | -                         |
+ | 14.2857 | 400  | 0.168         | -                 | -                         |
+ | 15.0    | 420  | 0.0972        | -                 | -                         |
+ | 15.7143 | 440  | 0.0859        | -                 | -                         |
+ | 16.4286 | 460  | 0.1162        | -                 | -                         |
+ | 17.1429 | 480  | 0.0693        | -                 | -                         |
+ | 17.8571 | 500  | 0.0836        | -                 | -                         |
+ | 18.5714 | 520  | 0.056         | -                 | -                         |
+ | 19.2857 | 540  | 0.0727        | -                 | -                         |
+ | 20.0    | 560  | 0.0715        | -                 | -                         |
+ | -1      | -1   | -             | 0.7177 (-0.0021)  | 1.0000                    |
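
The Epoch and Step columns in the log follow from values stated elsewhere in this card: 443 training samples, a per-device batch size of 16, 20 epochs, `gradient_accumulation_steps` of 1, and `dataloader_drop_last` set to False. The sketch below reproduces that arithmetic. It assumes a single training device, which the card does not state, and the warmup step count uses simple rounding, which may not match the trainer's own rounding exactly.

```python
import math

# Values copied from the card.
train_samples = 443
batch_size = 16
num_epochs = 20
warmup_ratio = 0.1

# The last partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * num_epochs
approx_warmup_steps = round(total_steps * warmup_ratio)


def epoch_at_step(step: int) -> float:
    """Fractional epoch shown in the log for a given optimizer step."""
    return step / steps_per_epoch


epoch_at_first_log = round(epoch_at_step(1), 4)
epoch_at_step_20 = round(epoch_at_step(20), 4)
```

This gives 28 steps per epoch and 560 steps in total, matching the final logged step, and step 20 falls at epoch 0.7143 as in the table.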
404
+
405
+
406
+ ### Framework Versions
407
+ - Python: 3.12.12
408
+ - Sentence Transformers: 5.1.1
409
+ - Transformers: 4.51.3
410
+ - PyTorch: 2.8.0+cu126
411
+ - Accelerate: 1.11.0
412
+ - Datasets: 4.0.0
413
+ - Tokenizers: 0.21.4
414
+
415
+ ## Citation
416
+
417
+ ### BibTeX
418
+
419
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
431
+
432
+ <!--
433
+ ## Glossary
434
+
435
+ *Clearly define terms in order to be accessible across audiences.*
436
+ -->
437
+
438
+ <!--
439
+ ## Model Card Authors
440
+
441
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
442
+ -->
443
+
444
+ <!--
445
+ ## Model Card Contact
446
+
447
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
448
+ -->
config.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "architectures": [
+     "XLMRobertaForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 8194,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "sentence_transformers": {
+     "activation_fn": "torch.nn.modules.activation.Sigmoid",
+     "version": "5.1.1"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce1483c47d31ce657985d4dfe229ef16a1326faa195dc0de24d2c52d60813ac5
+ size 2271071852
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cf44dabfaa82b1276a7af64a2ea2c76c047d560cf7bfb5711d6135382372c93d
+ size 17083153
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }