Add paper and code links to model card

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +24 -184
README.md CHANGED
@@ -1,20 +1,22 @@
  ---
  language:
  - am
  license: mit
  tags:
  - sentence-transformers
  - cross-encoder
  - generated_from_trainer
  - dataset_size:491752
  - loss:BinaryCrossEntropyLoss
- base_model: rasyosef/roberta-base-amharic
- pipeline_tag: text-ranking
- library_name: sentence-transformers
- metrics:
- - map
- - mrr@10
- - ndcg@10
  model-index:
  - name: reranker-amharic-base
  results:
@@ -31,22 +33,23 @@ model-index:
  - type: ndcg@10
  value: 0.856
  name: Ndcg@10
- datasets:
- - rasyosef/Amharic-Passage-Retrieval-Dataset-V2
  ---

  # reranker-amharic-base

  This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

  ## Model Details

  ### Model Description
  - **Model Type:** Cross Encoder
- - **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) <!-- at revision b1a3d2c267262e2b82c83be9d4e59db762a5e931 -->
  - **Maximum Sequence Length:** 510 tokens
  - **Number of Output Labels:** 1 label
- <!-- - **Training Dataset:** Unknown -->
  - **Language:** am
  - **License:** mit
 
@@ -91,33 +94,10 @@ ranks = model.rank(
  'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
92
  ]
93
  )
94
- # [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 
95
  ```
96
 
97
- <!--
98
- ### Direct Usage (Transformers)
99
-
100
- <details><summary>Click to see the direct usage in Transformers</summary>
101
-
102
- </details>
103
- -->
104
-
105
- <!--
106
- ### Downstream Usage (Sentence Transformers)
107
-
108
- You can finetune this model on your own dataset.
109
-
110
- <details><summary>Click to expand</summary>
111
-
112
- </details>
113
- -->
114
-
115
- <!--
116
- ### Out-of-Scope Use
117
-
118
- *List how the model may foreseeably be misused and address what users ought not to do with the model.*
119
- -->
120
-
121
  ## Evaluation
122
 
123
  ### Metrics
@@ -137,18 +117,6 @@ You can finetune this model on your own dataset.
  | mrr@10 | 0.830 |
  | **ndcg@10** | **0.856** |

- <!--
- ## Bias, Risks and Limitations
-
- *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
- -->
-
- <!--
- ### Recommendations
-
- *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
- -->
-
  ## Training Details

  <details>
@@ -193,125 +161,6 @@ You can finetune this model on your own dataset.
  - `dataloader_num_workers`: 2
  - `load_best_model_at_end`: True

- #### All Hyperparameters
- <details><summary>Click to expand</summary>
-
- - `overwrite_output_dir`: False
- - `do_predict`: False
- - `eval_strategy`: epoch
- - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 64
- - `per_device_eval_batch_size`: 64
- - `per_gpu_train_batch_size`: None
- - `per_gpu_eval_batch_size`: None
- - `gradient_accumulation_steps`: 1
- - `eval_accumulation_steps`: None
- - `torch_empty_cache_steps`: None
- - `learning_rate`: 4e-05
- - `weight_decay`: 0.1
- - `adam_beta1`: 0.9
- - `adam_beta2`: 0.999
- - `adam_epsilon`: 1e-08
- - `max_grad_norm`: 1.0
- - `num_train_epochs`: 4
- - `max_steps`: -1
- - `lr_scheduler_type`: cosine
- - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.05
- - `warmup_steps`: 0
- - `log_level`: passive
- - `log_level_replica`: warning
- - `log_on_each_node`: True
- - `logging_nan_inf_filter`: True
- - `save_safetensors`: True
- - `save_on_each_node`: False
- - `save_only_model`: False
- - `restore_callback_states_from_checkpoint`: False
- - `no_cuda`: False
- - `use_cpu`: False
- - `use_mps_device`: False
- - `seed`: 42
- - `data_seed`: None
- - `jit_mode_eval`: False
- - `use_ipex`: False
- - `bf16`: False
- - `fp16`: True
- - `fp16_opt_level`: O1
- - `half_precision_backend`: auto
- - `bf16_full_eval`: False
- - `fp16_full_eval`: False
- - `tf32`: None
- - `local_rank`: 0
- - `ddp_backend`: None
- - `tpu_num_cores`: None
- - `tpu_metrics_debug`: False
- - `debug`: []
- - `dataloader_drop_last`: False
- - `dataloader_num_workers`: 2
- - `dataloader_prefetch_factor`: None
- - `past_index`: -1
- - `disable_tqdm`: False
- - `remove_unused_columns`: True
- - `label_names`: None
- - `load_best_model_at_end`: True
- - `ignore_data_skip`: False
- - `fsdp`: []
- - `fsdp_min_num_params`: 0
- - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- - `fsdp_transformer_layer_cls_to_wrap`: None
- - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- - `deepspeed`: None
- - `label_smoothing_factor`: 0.0
- - `optim`: adamw_torch
- - `optim_args`: None
- - `adafactor`: False
- - `group_by_length`: False
- - `length_column_name`: length
- - `ddp_find_unused_parameters`: None
- - `ddp_bucket_cap_mb`: None
- - `ddp_broadcast_buffers`: False
- - `dataloader_pin_memory`: True
- - `dataloader_persistent_workers`: False
- - `skip_memory_metrics`: True
- - `use_legacy_prediction_loop`: False
- - `push_to_hub`: False
- - `resume_from_checkpoint`: None
- - `hub_model_id`: None
- - `hub_strategy`: every_save
- - `hub_private_repo`: None
- - `hub_always_push`: False
- - `gradient_checkpointing`: False
- - `gradient_checkpointing_kwargs`: None
- - `include_inputs_for_metrics`: False
- - `include_for_metrics`: []
- - `eval_do_concat_batches`: True
- - `fp16_backend`: auto
- - `push_to_hub_model_id`: None
- - `push_to_hub_organization`: None
- - `mp_parameters`:
- - `auto_find_batch_size`: False
- - `full_determinism`: False
- - `torchdynamo`: None
- - `ray_scope`: last
- - `ddp_timeout`: 1800
- - `torch_compile`: False
- - `torch_compile_backend`: None
- - `torch_compile_mode`: None
- - `include_tokens_per_second`: False
- - `include_num_input_tokens_seen`: False
- - `neftune_noise_alpha`: None
- - `optim_target_modules`: None
- - `batch_eval_metrics`: False
- - `eval_on_start`: False
- - `use_liger_kernel`: False
- - `eval_use_gather_object`: False
- - `average_tokens_across_devices`: False
- - `prompts`: None
- - `batch_sampler`: batch_sampler
- - `multi_dataset_batch_sampler`: proportional
-
- </details>
-
  ### Training Logs
  | Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
  |:-------:|:---------:|:-------------:|:---------------------------------:|
@@ -336,20 +185,11 @@ You can finetune this model on your own dataset.
 
  ## Citation

- <!--
- ## Glossary
-
- *Clearly define terms in order to be accessible across audiences.*
- -->
-
- <!--
- ## Model Card Authors
-
- *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
- -->
-
- <!--
- ## Model Card Contact
-
- *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
 
  ---
+ base_model: rasyosef/roberta-base-amharic
+ datasets:
+ - rasyosef/Amharic-Passage-Retrieval-Dataset-V2
  language:
  - am
+ library_name: sentence-transformers
  license: mit
+ metrics:
+ - map
+ - mrr@10
+ - ndcg@10
+ pipeline_tag: text-ranking
  tags:
  - sentence-transformers
  - cross-encoder
  - generated_from_trainer
  - dataset_size:491752
  - loss:BinaryCrossEntropyLoss
  model-index:
  - name: reranker-amharic-base
  results:
 
  - type: ndcg@10
  value: 0.856
  name: Ndcg@10
  ---

  # reranker-amharic-base

  This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

+ This model was presented in the paper **[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)**.
+
+ Official code repository: [https://github.com/rasyosef/amharic-neural-ir](https://github.com/rasyosef/amharic-neural-ir)
+
  ## Model Details

  ### Model Description
  - **Model Type:** Cross Encoder
+ - **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic)
  - **Maximum Sequence Length:** 510 tokens
  - **Number of Output Labels:** 1 label
  - **Language:** am
  - **License:** mit
 
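The card pairs a single output label with the `loss:BinaryCrossEntropyLoss` tag. As a minimal sketch, assuming the one logit produced per (query, passage) pair is mapped to a score with a sigmoid (consistent with the scores between 0 and 1 in the usage output, but an assumption here) and that training minimizes binary cross-entropy on 0/1 relevance labels, the two pieces look like this in plain Python. `sigmoid` and `bce_with_logits` are illustrative helpers, not sentence-transformers functions.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw cross-encoder logit to a relevance score in (0, 1), avoiding exp overflow."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def bce_with_logits(logit: float, label: float) -> float:
    """Binary cross-entropy for one (query, passage) pair, computed from the logit.

    Uses the numerically stable form max(z, 0) - z * y + log(1 + exp(-|z|)),
    which equals -[y * log(sigmoid(z)) + (1 - y) * log(1 - sigmoid(z))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# Hypothetical logits for one relevant (label 1.0) and one irrelevant (label 0.0) pair.
for logit, label in [(3.0, 1.0), (-6.0, 0.0)]:
    print(f"logit={logit:+.1f}  score={sigmoid(logit):.4f}  loss={bce_with_logits(logit, label):.4f}")
```

Computing the loss from the logit rather than from the sigmoid output avoids taking the log of a value that has rounded to 0 or 1.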
 
  'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
  ]
  )
+ print(ranks)
+ # [{'corpus_id': 0, 'score': np.float32(0.9555243)}, {'corpus_id': 1, 'score': np.float32(0.0012893651)}]
  ```
 
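`model.rank` returns the candidate documents ordered by score, each as a dict whose `corpus_id` is the document's index in the input list, as the printed output shows. A minimal pure-Python sketch of that reranking step, with a hypothetical token-overlap function standing in for the cross-encoder; `rank_by_score` and `toy_overlap_score` are illustrative names, not the sentence-transformers implementation.

```python
from typing import Callable, Dict, List, Optional, Union


def rank_by_score(
    query: str,
    documents: List[str],
    score_pair: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Dict[str, Union[int, float]]]:
    """Score each (query, document) pair and return documents by descending score."""
    results = [
        {"corpus_id": idx, "score": score_pair(query, doc)}
        for idx, doc in enumerate(documents)
    ]
    # Highest score first; ties keep their original input order (the sort is stable).
    results.sort(key=lambda r: r["score"], reverse=True)
    return results if top_k is None else results[:top_k]


def toy_overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for the cross-encoder: share of query tokens present in doc."""
    q_tokens = set(query.split())
    return len(q_tokens & set(doc.split())) / max(len(q_tokens), 1)


docs = ["the cat sat", "dogs bark loudly", "a cat and a dog"]
print(rank_by_score("cat sat", docs, toy_overlap_score))
```

Replacing `toy_overlap_score` with a function that scores each pair with a cross-encoder gives the same output shape; in practice the pairs would be scored in batches rather than one at a time.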
  ## Evaluation

  ### Metrics
 
  | mrr@10 | 0.830 |
  | **ndcg@10** | **0.856** |
 
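The table reports MRR@10 and NDCG@10, and the `metrics:` metadata also lists MAP. A minimal sketch of these metrics for one query's reranked list with binary relevance labels (1 = relevant), assuming every relevant passage appears in the list; the evaluator that produced the card's numbers may differ in details such as candidate sampling and MAP normalization. The function names are illustrative.

```python
import math
from typing import List


def mrr_at_k(relevance: List[int], k: int = 10) -> float:
    """Reciprocal rank of the first relevant passage in the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance: List[int], k: int = 10) -> float:
    """DCG of the top k divided by the DCG of the ideal (best possible) ordering."""
    def dcg(rels: List[int]) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels[:k], start=1))

    ideal = dcg(sorted(relevance, reverse=True))
    return dcg(relevance) / ideal if ideal > 0 else 0.0


def average_precision(relevance: List[int]) -> float:
    """Mean of precision@i taken at each position i that holds a relevant passage."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical reranked relevance labels for two queries, best-scored passage first.
runs = [[0, 1, 0, 0], [1, 0, 0, 1]]
print("MAP    ", sum(average_precision(r) for r in runs) / len(runs))
print("MRR@10 ", sum(mrr_at_k(r) for r in runs) / len(runs))
print("NDCG@10", sum(ndcg_at_k(r) for r in runs) / len(runs))
```

A corpus-level figure like those in the table would be the mean of such per-query scores over the dev queries.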
  ## Training Details

  <details>
 
  - `dataloader_num_workers`: 2
  - `load_best_model_at_end`: True
 
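The training hyperparameters shown in this diff include `learning_rate` 4e-05, a `cosine` `lr_scheduler_type` with `warmup_ratio` 0.05, a per-device batch size of 64, and 4 epochs. A minimal sketch of that schedule in plain Python (linear warmup, then half a cosine cycle down to 0, the shape of the Hugging Face `get_cosine_schedule_with_warmup` scheduler). The step counts assume the training set holds 491,752 pairs (read from the `dataset_size` tag) and a single device, so they are illustrative rather than taken from the training logs; `cosine_lr_with_warmup` is a hypothetical helper.

```python
import math


def cosine_lr_with_warmup(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the hyperparameter list; train_pairs is ASSUMED to equal the
# dataset_size:491752 tag, and training is ASSUMED to run on a single device.
train_pairs = 491_752
batch_size = 64       # per_device_train_batch_size
epochs = 4            # num_train_epochs
base_lr = 4e-05       # learning_rate
warmup_ratio = 0.05   # warmup_ratio

steps_per_epoch = math.ceil(train_pairs / batch_size)  # last partial batch kept (dataloader_drop_last: False)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)

for step in (0, warmup_steps // 2, warmup_steps, total_steps // 2, total_steps):
    print(f"step {step:>6}: lr = {cosine_lr_with_warmup(step, base_lr, warmup_steps, total_steps):.2e}")
```

Rounding the warmup length up with `math.ceil` follows how the Hugging Face Trainer is understood to convert `warmup_ratio` into steps; treat the exact counts as approximate.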
  ### Training Logs
  | Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
  |:-------:|:---------:|:-------------:|:---------------------------------:|
 
 
  ## Citation

+ ```bibtex
+ @inproceedings{alemneh2026amharicir,
+ title = {The Multilingual Curse at the Retrieval Layer: Evidence from Amharic},
+ author = {Alemneh, Yosef Worku and Mekonnen, Kidist Amde and de Rijke, Maarten},
+ booktitle = {Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM), ACL 2026},
+ year = {2026},
+ }
+ ```