nielsr HF Staff committed on
Commit d1b0876 · verified · 1 Parent(s): 9317b64

Add paper and code links to model card


This PR improves the model card by adding a link to the research paper "[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)" and the official source code repository. It also adds the BibTeX citation and slightly updates the usage snippet to match the examples found in the official repository.

Files changed (1)
  1. README.md +24 -184
README.md CHANGED
@@ -1,20 +1,22 @@
1
  ---
2
  language:
3
  - am
 
4
  license: mit
5
  tags:
6
  - sentence-transformers
7
  - cross-encoder
8
  - generated_from_trainer
9
  - dataset_size:491752
10
  - loss:BinaryCrossEntropyLoss
11
- base_model: rasyosef/roberta-base-amharic
12
- pipeline_tag: text-ranking
13
- library_name: sentence-transformers
14
- metrics:
15
- - map
16
- - mrr@10
17
- - ndcg@10
18
  model-index:
19
  - name: reranker-amharic-base
20
  results:
@@ -31,22 +33,23 @@ model-index:
31
  - type: ndcg@10
32
  value: 0.856
33
  name: Ndcg@10
34
- datasets:
35
- - rasyosef/Amharic-Passage-Retrieval-Dataset-V2
36
  ---
37
 
38
  # reranker-amharic-base
39
 
40
  This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
41
 
42
  ## Model Details
43
 
44
  ### Model Description
45
  - **Model Type:** Cross Encoder
46
- - **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) <!-- at revision b1a3d2c267262e2b82c83be9d4e59db762a5e931 -->
47
  - **Maximum Sequence Length:** 510 tokens
48
  - **Number of Output Labels:** 1 label
49
- <!-- - **Training Dataset:** Unknown -->
50
  - **Language:** am
51
  - **License:** mit
52
 
@@ -91,33 +94,10 @@ ranks = model.rank(
91
  'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
92
  ]
93
  )
94
- # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 
95
  ```
96
 
97
- <!--
98
- ### Direct Usage (Transformers)
99
-
100
- <details><summary>Click to see the direct usage in Transformers</summary>
101
-
102
- </details>
103
- -->
104
-
105
- <!--
106
- ### Downstream Usage (Sentence Transformers)
107
-
108
- You can finetune this model on your own dataset.
109
-
110
- <details><summary>Click to expand</summary>
111
-
112
- </details>
113
- -->
114
-
115
- <!--
116
- ### Out-of-Scope Use
117
-
118
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
119
- -->
120
-
121
  ## Evaluation
122
 
123
  ### Metrics
@@ -137,18 +117,6 @@ You can finetune this model on your own dataset.
137
  | mrr@10 | 0.830 |
138
  | **ndcg@10** | **0.856** |
139
 
140
- <!--
141
- ## Bias, Risks and Limitations
142
-
143
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
144
- -->
145
-
146
- <!--
147
- ### Recommendations
148
-
149
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
150
- -->
151
-
152
  ## Training Details
153
 
154
  <details>
@@ -193,125 +161,6 @@ You can finetune this model on your own dataset.
193
  - `dataloader_num_workers`: 2
194
  - `load_best_model_at_end`: True
195
 
196
- #### All Hyperparameters
197
- <details><summary>Click to expand</summary>
198
-
199
- - `overwrite_output_dir`: False
200
- - `do_predict`: False
201
- - `eval_strategy`: epoch
202
- - `prediction_loss_only`: True
203
- - `per_device_train_batch_size`: 64
204
- - `per_device_eval_batch_size`: 64
205
- - `per_gpu_train_batch_size`: None
206
- - `per_gpu_eval_batch_size`: None
207
- - `gradient_accumulation_steps`: 1
208
- - `eval_accumulation_steps`: None
209
- - `torch_empty_cache_steps`: None
210
- - `learning_rate`: 4e-05
211
- - `weight_decay`: 0.1
212
- - `adam_beta1`: 0.9
213
- - `adam_beta2`: 0.999
214
- - `adam_epsilon`: 1e-08
215
- - `max_grad_norm`: 1.0
216
- - `num_train_epochs`: 4
217
- - `max_steps`: -1
218
- - `lr_scheduler_type`: cosine
219
- - `lr_scheduler_kwargs`: {}
220
- - `warmup_ratio`: 0.05
221
- - `warmup_steps`: 0
222
- - `log_level`: passive
223
- - `log_level_replica`: warning
224
- - `log_on_each_node`: True
225
- - `logging_nan_inf_filter`: True
226
- - `save_safetensors`: True
227
- - `save_on_each_node`: False
228
- - `save_only_model`: False
229
- - `restore_callback_states_from_checkpoint`: False
230
- - `no_cuda`: False
231
- - `use_cpu`: False
232
- - `use_mps_device`: False
233
- - `seed`: 42
234
- - `data_seed`: None
235
- - `jit_mode_eval`: False
236
- - `use_ipex`: False
237
- - `bf16`: False
238
- - `fp16`: True
239
- - `fp16_opt_level`: O1
240
- - `half_precision_backend`: auto
241
- - `bf16_full_eval`: False
242
- - `fp16_full_eval`: False
243
- - `tf32`: None
244
- - `local_rank`: 0
245
- - `ddp_backend`: None
246
- - `tpu_num_cores`: None
247
- - `tpu_metrics_debug`: False
248
- - `debug`: []
249
- - `dataloader_drop_last`: False
250
- - `dataloader_num_workers`: 2
251
- - `dataloader_prefetch_factor`: None
252
- - `past_index`: -1
253
- - `disable_tqdm`: False
254
- - `remove_unused_columns`: True
255
- - `label_names`: None
256
- - `load_best_model_at_end`: True
257
- - `ignore_data_skip`: False
258
- - `fsdp`: []
259
- - `fsdp_min_num_params`: 0
260
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
261
- - `fsdp_transformer_layer_cls_to_wrap`: None
262
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
263
- - `deepspeed`: None
264
- - `label_smoothing_factor`: 0.0
265
- - `optim`: adamw_torch
266
- - `optim_args`: None
267
- - `adafactor`: False
268
- - `group_by_length`: False
269
- - `length_column_name`: length
270
- - `ddp_find_unused_parameters`: None
271
- - `ddp_bucket_cap_mb`: None
272
- - `ddp_broadcast_buffers`: False
273
- - `dataloader_pin_memory`: True
274
- - `dataloader_persistent_workers`: False
275
- - `skip_memory_metrics`: True
276
- - `use_legacy_prediction_loop`: False
277
- - `push_to_hub`: False
278
- - `resume_from_checkpoint`: None
279
- - `hub_model_id`: None
280
- - `hub_strategy`: every_save
281
- - `hub_private_repo`: None
282
- - `hub_always_push`: False
283
- - `gradient_checkpointing`: False
284
- - `gradient_checkpointing_kwargs`: None
285
- - `include_inputs_for_metrics`: False
286
- - `include_for_metrics`: []
287
- - `eval_do_concat_batches`: True
288
- - `fp16_backend`: auto
289
- - `push_to_hub_model_id`: None
290
- - `push_to_hub_organization`: None
291
- - `mp_parameters`:
292
- - `auto_find_batch_size`: False
293
- - `full_determinism`: False
294
- - `torchdynamo`: None
295
- - `ray_scope`: last
296
- - `ddp_timeout`: 1800
297
- - `torch_compile`: False
298
- - `torch_compile_backend`: None
299
- - `torch_compile_mode`: None
300
- - `include_tokens_per_second`: False
301
- - `include_num_input_tokens_seen`: False
302
- - `neftune_noise_alpha`: None
303
- - `optim_target_modules`: None
304
- - `batch_eval_metrics`: False
305
- - `eval_on_start`: False
306
- - `use_liger_kernel`: False
307
- - `eval_use_gather_object`: False
308
- - `average_tokens_across_devices`: False
309
- - `prompts`: None
310
- - `batch_sampler`: batch_sampler
311
- - `multi_dataset_batch_sampler`: proportional
312
-
313
- </details>
314
-
315
  ### Training Logs
316
  | Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
317
  |:-------:|:---------:|:-------------:|:---------------------------------:|
@@ -336,20 +185,11 @@ You can finetune this model on your own dataset.
336
 
337
  ## Citation
338
 
339
- <!--
340
- ## Glossary
341
-
342
- *Clearly define terms in order to be accessible across audiences.*
343
- -->
344
-
345
- <!--
346
- ## Model Card Authors
347
-
348
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
349
- -->
350
-
351
- <!--
352
- ## Model Card Contact
353
-
354
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
355
- -->
 
1
  ---
2
+ base_model: rasyosef/roberta-base-amharic
3
+ datasets:
4
+ - rasyosef/Amharic-Passage-Retrieval-Dataset-V2
5
  language:
6
  - am
7
+ library_name: sentence-transformers
8
  license: mit
9
+ metrics:
10
+ - map
11
+ - mrr@10
12
+ - ndcg@10
13
+ pipeline_tag: text-ranking
14
  tags:
15
  - sentence-transformers
16
  - cross-encoder
17
  - generated_from_trainer
18
  - dataset_size:491752
19
  - loss:BinaryCrossEntropyLoss
20
  model-index:
21
  - name: reranker-amharic-base
22
  results:
 
33
  - type: ndcg@10
34
  value: 0.856
35
  name: Ndcg@10
 
 
36
  ---
37
 
38
  # reranker-amharic-base
39
 
40
  This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
41
 
42
+ This model was presented in the paper **[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)**.
43
+
44
+ Official code repository: [https://github.com/rasyosef/amharic-neural-ir](https://github.com/rasyosef/amharic-neural-ir)
45
+
46
  ## Model Details
47
 
48
  ### Model Description
49
  - **Model Type:** Cross Encoder
50
+ - **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic)
51
  - **Maximum Sequence Length:** 510 tokens
52
  - **Number of Output Labels:** 1 label
 
53
  - **Language:** am
54
  - **License:** mit
55
 
 
94
  'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
95
  ]
96
  )
97
+ print(ranks)
98
+ # [{'corpus_id': 0, 'score': np.float32(0.9555243)}, {'corpus_id': 1, 'score': np.float32(0.0012893651)}]
99
  ```
100
 
101
  ## Evaluation
102
 
103
  ### Metrics
 
117
  | mrr@10 | 0.830 |
118
  | **ndcg@10** | **0.856** |
119
 
120
  ## Training Details
121
 
122
  <details>
 
161
  - `dataloader_num_workers`: 2
162
  - `load_best_model_at_end`: True
163
 
164
  ### Training Logs
165
  | Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
166
  |:-------:|:---------:|:-------------:|:---------------------------------:|
 
185
 
186
  ## Citation
187
 
188
+ ```bibtex
189
+ @inproceedings{alemneh2026amharicir,
190
+ title = {The Multilingual Curse at the Retrieval Layer: Evidence from Amharic},
191
+ author = {Alemneh, Yosef Worku and Mekonnen, Kidist Amde and de Rijke, Maarten},
192
+ booktitle = {Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM), ACL 2026},
193
+ year = {2026},
194
+ }
195
+ ```
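
As a follow-up to the updated usage snippet, here is a minimal sketch of consuming the list that `model.rank(...)` prints. The two entries are copied from the example output in the README diff above, with the `np.float32` wrappers dropped so the snippet runs without the model or NumPy. The helper name `top_k_ids` and the `min_score` threshold of 0.5 are illustrative assumptions, not part of the model card or the sentence-transformers API.

```python
# Sample output copied from the README comment in this diff
# (np.float32 wrappers replaced by plain floats for a dependency-free sketch).
ranks = [
    {"corpus_id": 0, "score": 0.9555243},
    {"corpus_id": 1, "score": 0.0012893651},
]


def top_k_ids(ranks, k=1, min_score=0.5):
    """Return corpus ids of the k highest-scoring passages.

    min_score is a hypothetical cut-off chosen for illustration;
    passages scoring below it are dropped even if they fall within the top k.
    """
    # Sort defensively by descending score rather than relying on input order.
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [r["corpus_id"] for r in ordered[:k] if r["score"] >= min_score]


print(top_k_ids(ranks, k=2))  # prints [0]
```

Only the first passage clears the assumed threshold, so the second one is dropped even though `k=2` was requested. A suitable threshold depends on the application and would need to be tuned on held-out data.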