Commit ed58e66 · ColeH0415 · verified · 1 parent: 5413416

CE fine-tuned epoch 1/3 best_val=0.5636

README.md CHANGED
@@ -6,7 +6,7 @@ tags:
6
  - generated_from_trainer
7
  - dataset_size:7419
8
  - loss:BinaryCrossEntropyLoss
9
- base_model: cross-encoder/ms-marco-MiniLM-L6-v2
10
  pipeline_tag: text-ranking
11
  library_name: sentence-transformers
12
  metrics:
@@ -18,7 +18,7 @@ metrics:
18
  - recall
19
  - average_precision
20
  model-index:
21
- - name: CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
22
  results:
23
  - task:
24
  type: cross-encoder-classification
@@ -28,38 +28,38 @@ model-index:
28
  type: ce-val
29
  metrics:
30
  - type: accuracy
31
- value: 0.5587878787878788
32
  name: Accuracy
33
  - type: accuracy_threshold
34
- value: 0.892870306968689
35
  name: Accuracy Threshold
36
  - type: f1
37
- value: 0.6677471636952999
38
  name: F1
39
  - type: f1_threshold
40
- value: -8.320409774780273
41
  name: F1 Threshold
42
  - type: precision
43
- value: 0.5018270401948843
44
  name: Precision
45
  - type: recall
46
- value: 0.9975786924939467
47
  name: Recall
48
  - type: average_precision
49
- value: 0.5638184038907261
50
  name: Average Precision
51
  ---
52
 
53
- # CrossEncoder based on cross-encoder/ms-marco-MiniLM-L6-v2
54
 
55
- This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
56
 
57
  ## Model Details
58
 
59
  ### Model Description
60
  - **Model Type:** Cross Encoder
61
- - **Base model:** [cross-encoder/ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2) <!-- at revision c5ee24cb16019beea0893ab7796b1df96625c6b8 -->
62
- - **Maximum Sequence Length:** 512 tokens
63
  - **Number of Output Labels:** 1 label
64
  - **Supported Modality:** Text
65
  <!-- - **Training Dataset:** Unknown -->
@@ -77,7 +77,7 @@ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.h
77
 
78
  ```
79
  CrossEncoder(
80
- (0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'BertForSequenceClassification'})
81
  )
82
  ```
83
 
@@ -99,25 +99,25 @@ from sentence_transformers import CrossEncoder
99
  model = CrossEncoder("cross_encoder_model_id")
100
  # Get scores for pairs of inputs
101
  pairs = [
102
- ['The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.', 'Almost all scientists acknowledge that the rate of species loss is greater now than at any time in human history, with extinctions occurring at rates hundreds of times higher than background extinction rates.'],
103
- ['[S]unspot activity on the surface of our star has dropped to a new low.', "Patches of the star's surface with a lower temperature and luminosity than average are known as starspots."],
104
- ['More money is dedicated within the Department of Homeland Security to climate change than what\'s spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."', "The center works on the Internet's routing infrastructure (the SPRI program) and Domain Name System (DNSSEC), identity theft and other online criminal activity (ITTC), Internet traffic and networks research (PREDICT datasets and the DETER testbed), Department of Defense and HSARPA exercises (Livewire and Determined Promise), and wireless security in cooperation with Canada."],
105
- ['Worst-case global heating scenarios may need to be revised upwards in light of a better understanding of the role of clouds, scientists have said.', 'With this information, scientists can produce scenarios of how greenhouse gas emissions may vary in the future.'],
106
- ['Prof Adam Scaife, a climate modelling expert at the UK’s Met Office, said the evidence for a link to shrinking Arctic ice was now good: ‘The consensus points towards that being a real effect.’”', 'Some models of modern climate exhibit Arctic amplification without changes in snow and ice cover.'],
107
  ]
108
  scores = model.predict(pairs)
109
  print(scores)
110
- # [-0.0067 0.3615 -3.1055 -0.8462 0.1024]
111
 
112
  # Or rank different texts based on similarity to a single text
113
  ranks = model.rank(
114
- 'The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.',
115
  [
116
- 'Almost all scientists acknowledge that the rate of species loss is greater now than at any time in human history, with extinctions occurring at rates hundreds of times higher than background extinction rates.',
117
- "Patches of the star's surface with a lower temperature and luminosity than average are known as starspots.",
118
- "The center works on the Internet's routing infrastructure (the SPRI program) and Domain Name System (DNSSEC), identity theft and other online criminal activity (ITTC), Internet traffic and networks research (PREDICT datasets and the DETER testbed), Department of Defense and HSARPA exercises (Livewire and Determined Promise), and wireless security in cooperation with Canada.",
119
- 'With this information, scientists can produce scenarios of how greenhouse gas emissions may vary in the future.',
120
- 'Some models of modern climate exhibit Arctic amplification without changes in snow and ice cover.',
121
  ]
122
  )
123
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
@@ -158,13 +158,13 @@ You can finetune this model on your own dataset.
158
 
159
  | Metric | Value |
160
  |:----------------------|:-----------|
161
- | accuracy | 0.5588 |
162
- | accuracy_threshold | 0.8929 |
163
- | f1 | 0.6677 |
164
- | f1_threshold | -8.3204 |
165
- | precision | 0.5018 |
166
- | recall | 0.9976 |
167
- | **average_precision** | **0.5638** |
168
 
169
  <!--
170
  ## Bias, Risks and Limitations
@@ -190,13 +190,13 @@ You can finetune this model on your own dataset.
190
  | | sentence_0 | sentence_1 | label |
191
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
192
  | type | string | string | float |
193
- | details | <ul><li>min: 7 tokens</li><li>mean: 27.57 tokens</li><li>max: 82 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 33.03 tokens</li><li>max: 333 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.52</li><li>max: 1.0</li></ul> |
194
  * Samples:
195
- | sentence_0 | sentence_1 | label |
196
- |:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
197
- | <code>The last time the planet was even four degrees warmer, Peter Brannen points out in The Ends of the World, his new history of the planet’s major extinction events, the oceans were hundreds of feet higher.</code> | <code>Almost all scientists acknowledge that the rate of species loss is greater now than at any time in human history, with extinctions occurring at rates hundreds of times higher than background extinction rates.</code> | <code>0.0</code> |
198
- | <code>[S]unspot activity on the surface of our star has dropped to a new low.</code> | <code>Patches of the star's surface with a lower temperature and luminosity than average are known as starspots.</code> | <code>1.0</code> |
199
- | <code>More money is dedicated within the Department of Homeland Security to climate change than what's spent combating "Islamist terrorists radicalizing over the Internet in the United States of America."</code> | <code>The center works on the Internet's routing infrastructure (the SPRI program) and Domain Name System (DNSSEC), identity theft and other online criminal activity (ITTC), Internet traffic and networks research (PREDICT datasets and the DETER testbed), Department of Defense and HSARPA exercises (Livewire and Determined Promise), and wireless security in cooperation with Canada.</code> | <code>1.0</code> |
200
  * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
201
  ```json
202
  {
@@ -208,17 +208,18 @@ You can finetune this model on your own dataset.
208
  ### Training Hyperparameters
209
  #### Non-Default Hyperparameters
210
 
211
- - `per_device_train_batch_size`: 16
212
- - `per_device_eval_batch_size`: 16
213
  - `num_train_epochs`: 1
 
214
 
215
  #### All Hyperparameters
216
  <details><summary>Click to expand</summary>
217
 
218
  - `do_predict`: False
219
  - `prediction_loss_only`: True
220
- - `per_device_train_batch_size`: 16
221
- - `per_device_eval_batch_size`: 16
222
  - `gradient_accumulation_steps`: 1
223
  - `eval_accumulation_steps`: None
224
  - `torch_empty_cache_steps`: None
@@ -246,7 +247,7 @@ You can finetune this model on your own dataset.
246
  - `seed`: 42
247
  - `data_seed`: None
248
  - `bf16`: False
249
- - `fp16`: False
250
  - `bf16_full_eval`: False
251
  - `fp16_full_eval`: False
252
  - `tf32`: None
@@ -315,13 +316,16 @@ You can finetune this model on your own dataset.
315
  </details>
316
 
317
  ### Training Logs
318
- | Epoch | Step | ce-val_average_precision |
319
- |:-----:|:----:|:------------------------:|
320
- | -1 | -1 | 0.5638 |
 
 
 
321
 
322
 
323
  ### Training Time
324
- - **Training**: 33.0 seconds
325
 
326
  ### Framework Versions
327
  - Python: 3.12.13
 
6
  - generated_from_trainer
7
  - dataset_size:7419
8
  - loss:BinaryCrossEntropyLoss
9
+ base_model: cross-encoder/nli-deberta-v3-base
10
  pipeline_tag: text-ranking
11
  library_name: sentence-transformers
12
  metrics:
 
18
  - recall
19
  - average_precision
20
  model-index:
21
+ - name: CrossEncoder based on cross-encoder/nli-deberta-v3-base
22
  results:
23
  - task:
24
  type: cross-encoder-classification
 
28
  type: ce-val
29
  metrics:
30
  - type: accuracy
31
+ value: 0.5636363636363636
32
  name: Accuracy
33
  - type: accuracy_threshold
34
+ value: 0.512444794178009
35
  name: Accuracy Threshold
36
  - type: f1
37
+ value: 0.6711297071129707
38
  name: F1
39
  - type: f1_threshold
40
+ value: 0.4493926167488098
41
  name: F1 Threshold
42
  - type: precision
43
+ value: 0.5127877237851662
44
  name: Precision
45
  - type: recall
46
+ value: 0.9709443099273608
47
  name: Recall
48
  - type: average_precision
49
+ value: 0.565654586279988
50
  name: Average Precision
51
  ---
52
 
53
+ # CrossEncoder based on cross-encoder/nli-deberta-v3-base
54
 
55
+ This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [cross-encoder/nli-deberta-v3-base](https://huggingface.co/cross-encoder/nli-deberta-v3-base) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
56
 
57
  ## Model Details
58
 
59
  ### Model Description
60
  - **Model Type:** Cross Encoder
61
+ - **Base model:** [cross-encoder/nli-deberta-v3-base](https://huggingface.co/cross-encoder/nli-deberta-v3-base) <!-- at revision 6c749ce3425cd33b46d187e45b92bbf96ee12ec7 -->
62
+ - **Maximum Sequence Length:** 256 tokens
63
  - **Number of Output Labels:** 1 label
64
  - **Supported Modality:** Text
65
  <!-- - **Training Dataset:** Unknown -->
 
77
 
78
  ```
79
  CrossEncoder(
80
+ (0): Transformer({'transformer_task': 'sequence-classification', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'logits'}}, 'module_output_name': 'scores', 'architecture': 'DebertaV2ForSequenceClassification'})
81
  )
82
  ```
83
 
 
99
  model = CrossEncoder("cross_encoder_model_id")
100
  # Get scores for pairs of inputs
101
  pairs = [
102
+ ['An independent inquiry found CRU is a small research unit with limited resources and their rigour and honesty are not in doubt.', 'The media and other scientific organisations were criticised for having "sometimes neglected" to reflect the uncertainties, doubts and assumptions of the work done by the CRU.'],
103
+ ['As president, Obama will immediately close the Mississippi River Gulf Outlet, which experts say funneled floodwater into New Orleans.', 'Levees along the MRGO and the Intracoastal Waterway were breached in approximately 20 places, directly flooding most of St. Bernard Parish and New Orleans East.'],
104
+ ['If we double atmospheric carbon dioxide[…] we’d only raise global surface temperatures by about a degree Celsius.', 'Not only do increasing carbon dioxide concentrations lead to increases in global surface temperature, but increasing global temperatures also cause increasing concentrations of carbon dioxide.'],
105
+ ['But as that upper layer warms up, the oxygen-rich waters are less likely to mix down into cooler layers of the ocean because the warm waters are less dense and do not sink as readily.', 'Water that is saltier or cooler will be denser, and will sink in relation to the surrounding water.'],
106
+ ['Less than half of published scientists endorse global warming.', 'Scientists Reach 100% Consensus on Anthropogenic Global Warming.'],
107
  ]
108
  scores = model.predict(pairs)
109
  print(scores)
110
+ # [0.5286 0.4566 0.49 0.5111 0.6522]
111
 
112
  # Or rank different texts based on similarity to a single text
113
  ranks = model.rank(
114
+ 'An independent inquiry found CRU is a small research unit with limited resources and their rigour and honesty are not in doubt.',
115
  [
116
+ 'The media and other scientific organisations were criticised for having "sometimes neglected" to reflect the uncertainties, doubts and assumptions of the work done by the CRU.',
117
+ 'Levees along the MRGO and the Intracoastal Waterway were breached in approximately 20 places, directly flooding most of St. Bernard Parish and New Orleans East.',
118
+ 'Not only do increasing carbon dioxide concentrations lead to increases in global surface temperature, but increasing global temperatures also cause increasing concentrations of carbon dioxide.',
119
+ 'Water that is saltier or cooler will be denser, and will sink in relation to the surrounding water.',
120
+ 'Scientists Reach 100% Consensus on Anthropogenic Global Warming.',
121
  ]
122
  )
123
  # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 
158
 
159
  | Metric | Value |
160
  |:----------------------|:-----------|
161
+ | accuracy | 0.5636 |
162
+ | accuracy_threshold | 0.5124 |
163
+ | f1 | 0.6711 |
164
+ | f1_threshold | 0.4494 |
165
+ | precision | 0.5128 |
166
+ | recall | 0.9709 |
167
+ | **average_precision** | **0.5657** |
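
The updated metrics can be cross-checked against each other: F1 is the harmonic mean of precision and recall, and the two threshold rows are the score cut-offs at which accuracy and F1 were measured. The following is a minimal sketch, not the evaluator's own code. The `classify` helper is hypothetical, and treating a score at or above the threshold as positive is an assumption. The scores are the example outputs printed in the updated README.

```python
# Values copied from the updated metrics and the README example output.
precision = 0.5127877237851662
recall = 0.9709443099273608
f1_threshold = 0.4493926167488098
accuracy_threshold = 0.512444794178009
example_scores = [0.5286, 0.4566, 0.49, 0.5111, 0.6522]

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)


def classify(scores, threshold):
    """Hypothetical helper: label a pair 1 when its score reaches the threshold (assumed >=)."""
    return [1 if score >= threshold else 0 for score in scores]


labels_at_f1 = classify(example_scores, f1_threshold)
labels_at_acc = classify(example_scores, accuracy_threshold)

print(round(f1, 4))    # 0.6711, matching the reported F1
print(labels_at_f1)    # [1, 1, 1, 1, 1]
print(labels_at_acc)   # [1, 0, 0, 0, 1]
```

At the F1 threshold every example pair is labeled positive, which is consistent with the reported combination of recall near 0.97 and precision near 0.51.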
168
 
169
  <!--
170
  ## Bias, Risks and Limitations
 
190
  | | sentence_0 | sentence_1 | label |
191
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:---------------------------------------------------------------|
192
  | type | string | string | float |
193
+ | details | <ul><li>min: 7 tokens</li><li>mean: 26.73 tokens</li><li>max: 66 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 31.55 tokens</li><li>max: 133 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.53</li><li>max: 1.0</li></ul> |
194
  * Samples:
195
+ | sentence_0 | sentence_1 | label |
196
+ |:---------------------------------------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------|
197
+ | <code>An independent inquiry found CRU is a small research unit with limited resources and their rigour and honesty are not in doubt.</code> | <code>The media and other scientific organisations were criticised for having "sometimes neglected" to reflect the uncertainties, doubts and assumptions of the work done by the CRU.</code> | <code>0.0</code> |
198
+ | <code>As president, Obama will immediately close the Mississippi River Gulf Outlet, which experts say funneled floodwater into New Orleans.</code> | <code>Levees along the MRGO and the Intracoastal Waterway were breached in approximately 20 places, directly flooding most of St. Bernard Parish and New Orleans East.</code> | <code>1.0</code> |
199
+ | <code>If we double atmospheric carbon dioxide[…] we’d only raise global surface temperatures by about a degree Celsius.</code> | <code>Not only do increasing carbon dioxide concentrations lead to increases in global surface temperature, but increasing global temperatures also cause increasing concentrations of carbon dioxide.</code> | <code>0.0</code> |
200
  * Loss: [<code>BinaryCrossEntropyLoss</code>](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss) with these parameters:
201
  ```json
202
  {
 
208
  ### Training Hyperparameters
209
  #### Non-Default Hyperparameters
210
 
211
+ - `per_device_train_batch_size`: 4
212
+ - `per_device_eval_batch_size`: 4
213
  - `num_train_epochs`: 1
214
+ - `fp16`: True
215
 
216
  #### All Hyperparameters
217
  <details><summary>Click to expand</summary>
218
 
219
  - `do_predict`: False
220
  - `prediction_loss_only`: True
221
+ - `per_device_train_batch_size`: 4
222
+ - `per_device_eval_batch_size`: 4
223
  - `gradient_accumulation_steps`: 1
224
  - `eval_accumulation_steps`: None
225
  - `torch_empty_cache_steps`: None
 
247
  - `seed`: 42
248
  - `data_seed`: None
249
  - `bf16`: False
250
+ - `fp16`: True
251
  - `bf16_full_eval`: False
252
  - `fp16_full_eval`: False
253
  - `tf32`: None
 
316
  </details>
317
 
318
  ### Training Logs
319
+ | Epoch | Step | Training Loss | ce-val_average_precision |
320
+ |:------:|:----:|:-------------:|:------------------------:|
321
+ | 0.2695 | 500 | 0.7103 | - |
322
+ | 0.5391 | 1000 | 0.6983 | - |
323
+ | 0.8086 | 1500 | 0.6982 | - |
324
+ | -1 | -1 | - | 0.5657 |
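
The fractional epochs in the new training log follow from the dataset size and batch size given elsewhere in this commit. The sketch below reproduces them under two assumptions that the commit does not state: training ran on a single device, and the last partial batch was kept.

```python
import math

dataset_size = 7419                # from the `dataset_size:7419` tag
per_device_train_batch_size = 4    # new non-default hyperparameter
gradient_accumulation_steps = 1    # from the full hyperparameter list
num_devices = 1                    # assumption: not stated in the commit

samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Assumption: the final partial batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(dataset_size / samples_per_step)

logged_steps = [500, 1000, 1500]
epoch_fractions = [round(step / steps_per_epoch, 4) for step in logged_steps]

print(steps_per_epoch)   # 1855
print(epoch_fractions)   # [0.2695, 0.5391, 0.8086]
```

The computed fractions match the Epoch column, so the logged run covers a single pass of roughly 1855 optimizer steps.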
325
 
326
 
327
  ### Training Time
328
+ - **Training**: 7.1 minutes
329
 
330
  ### Framework Versions
331
  - Python: 3.12.13
config.json CHANGED
@@ -1,36 +1,45 @@
  {
- "add_cross_attention": false,
  "architectures": [
- "BertForSequenceClassification"
+ "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
- "bos_token_id": null,
- "classifier_dropout": null,
+ "bos_token_id": 1,
  "dtype": "float32",
- "eos_token_id": null,
- "gradient_checkpointing": false,
+ "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
- "hidden_size": 384,
+ "hidden_size": 768,
  "id2label": {
  "0": "LABEL_0"
  },
  "initializer_range": 0.02,
- "intermediate_size": 1536,
- "is_decoder": false,
+ "intermediate_size": 3072,
  "label2id": {
  "LABEL_0": 0
  },
- "layer_norm_eps": 1e-12,
+ "layer_norm_eps": 1e-07,
+ "legacy": true,
  "max_position_embeddings": 512,
- "model_type": "bert",
+ "max_relative_positions": -1,
+ "model_type": "deberta-v2",
+ "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 12,
- "num_hidden_layers": 6,
+ "num_hidden_layers": 12,
  "pad_token_id": 0,
- "position_embedding_type": "absolute",
+ "pooler_dropout": 0,
+ "pooler_hidden_act": "gelu",
+ "pooler_hidden_size": 768,
+ "pos_att_type": [
+ "p2c",
+ "c2p"
+ ],
+ "position_biased_input": false,
+ "position_buckets": 256,
+ "relative_attention": true,
+ "share_att_key": true,
  "tie_word_embeddings": true,
  "transformers_version": "5.0.0",
- "type_vocab_size": 2,
+ "type_vocab_size": 0,
  "use_cache": false,
- "vocab_size": 30522
+ "vocab_size": 128100
  }
config_sentence_transformers.json CHANGED
@@ -4,7 +4,7 @@
  "sentence_transformers": "5.4.1",
  "transformers": "5.0.0"
  },
- "activation_fn": "torch.nn.modules.linear.Identity",
+ "activation_fn": "torch.nn.modules.activation.Sigmoid",
  "default_prompt_name": null,
  "model_type": "CrossEncoder",
  "prompts": {}
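
The `activation_fn` change from `Identity` to `Sigmoid` would explain why the README's example scores and thresholds moved from unbounded values (such as `-8.3204`) to values between 0 and 1. The sketch below only illustrates the mapping with plain Python; it is not the library's implementation, and the `sigmoid` and `logit` helpers are defined here for illustration. Applying the function to the old README's example logits shows the scale change and is not a prediction from the new model.

```python
import math


def sigmoid(x):
    """Map a raw logit to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Inverse of sigmoid: map a value in (0, 1) back to a raw logit."""
    return math.log(p / (1.0 - p))


# Example outputs printed in the old README (Identity activation, raw logits).
old_example_logits = [-0.0067, 0.3615, -3.1055, -0.8462, 0.1024]
squashed = [round(sigmoid(x), 4) for x in old_example_logits]
print(squashed)

# The new F1 threshold lives on the sigmoid scale; this is its raw-logit equivalent.
new_f1_threshold = 0.4493926167488098
print(round(logit(new_f1_threshold), 4))
```

Because thresholds are only meaningful on the scale they were measured on, the old and new threshold rows should not be compared directly.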
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:5677755caa216f568eba3369d5320dff550650bc5e66300950b7802b4b550ac5
+ oid sha256:922a849772df9d09c87547738e9d2e009866a84a654b460b49743b396fc81a74
- size 90866404
+ size 737716172
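
The size change in the LFS pointer gives a rough sense of how much larger the new base model is. The estimate below is a back-of-the-envelope sketch that assumes every stored tensor is float32 (in line with `"dtype": "float32"` in config.json) and ignores the small metadata header in the file, so the counts are slight overestimates.

```python
BYTES_PER_FLOAT32 = 4  # assumption: all weights stored as float32

old_size_bytes = 90866404    # model.safetensors before this commit
new_size_bytes = 737716172   # model.safetensors after this commit

# Approximate parameter counts (the file header is not subtracted).
old_params_estimate = old_size_bytes // BYTES_PER_FLOAT32
new_params_estimate = new_size_bytes // BYTES_PER_FLOAT32
size_ratio = new_size_bytes / old_size_bytes

print(f"{old_params_estimate / 1e6:.1f}M -> {new_params_estimate / 1e6:.1f}M parameters (approx.)")
print(f"{size_ratio:.2f}x larger checkpoint")
```

Under these assumptions the checkpoint grows from roughly 22.7M to roughly 184.4M parameters, about 8.1 times larger.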
tokenizer.json CHANGED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json CHANGED
@@ -1,18 +1,21 @@
  {
+ "add_prefix_space": true,
  "backend": "tokenizers",
- "clean_up_tokenization_spaces": true,
+ "bos_token": "[CLS]",
+ "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
- "do_basic_tokenize": true,
- "do_lower_case": true,
+ "do_lower_case": false,
+ "eos_token": "[SEP]",
  "is_local": false,
  "mask_token": "[MASK]",
- "model_max_length": 512,
+ "model_max_length": 256,
  "model_specific_special_tokens": {},
- "never_split": null,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
- "strip_accents": null,
- "tokenize_chinese_chars": true,
- "tokenizer_class": "BertTokenizer",
- "unk_token": "[UNK]"
+ "sp_model_kwargs": {},
+ "split_by_punct": false,
+ "tokenizer_class": "DebertaV2Tokenizer",
+ "unk_id": 3,
+ "unk_token": "[UNK]",
+ "vocab_type": "spm"
  }