Text Ranking
sentence-transformers
Safetensors
Amharic
xlm-roberta
cross-encoder
Generated from Trainer
dataset_size:491752
loss:BinaryCrossEntropyLoss
Eval Results (legacy)
text-embeddings-inference
Instructions to use rasyosef/reranker-amharic-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- sentence-transformers
How to use rasyosef/reranker-amharic-base with sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("rasyosef/reranker-amharic-base")

query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]

scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
- Notebooks
- Google Colab
- Kaggle
Add paper and code links to model card
#1
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,20 +1,22 @@
|
|
| 1 |
---
|
| 2 |
language:
|
| 3 |
- am
|
|
|
|
| 4 |
license: mit
|
| 5 |
tags:
|
| 6 |
- sentence-transformers
|
| 7 |
- cross-encoder
|
| 8 |
- generated_from_trainer
|
| 9 |
- dataset_size:491752
|
| 10 |
- loss:BinaryCrossEntropyLoss
|
| 11 |
-
base_model: rasyosef/roberta-base-amharic
|
| 12 |
-
pipeline_tag: text-ranking
|
| 13 |
-
library_name: sentence-transformers
|
| 14 |
-
metrics:
|
| 15 |
-
- map
|
| 16 |
-
- mrr@10
|
| 17 |
-
- ndcg@10
|
| 18 |
model-index:
|
| 19 |
- name: reranker-amharic-base
|
| 20 |
results:
|
|
@@ -31,22 +33,23 @@ model-index:
|
|
| 31 |
- type: ndcg@10
|
| 32 |
value: 0.856
|
| 33 |
name: Ndcg@10
|
| 34 |
-
datasets:
|
| 35 |
-
- rasyosef/Amharic-Passage-Retrieval-Dataset-V2
|
| 36 |
---
|
| 37 |
|
| 38 |
# reranker-amharic-base
|
| 39 |
|
| 40 |
This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
|
| 41 |
|
| 42 |
## Model Details
|
| 43 |
|
| 44 |
### Model Description
|
| 45 |
- **Model Type:** Cross Encoder
|
| 46 |
-
- **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic)
|
| 47 |
- **Maximum Sequence Length:** 510 tokens
|
| 48 |
- **Number of Output Labels:** 1 label
|
| 49 |
-
<!-- - **Training Dataset:** Unknown -->
|
| 50 |
- **Language:** am
|
| 51 |
- **License:** mit
|
| 52 |
|
|
@@ -91,33 +94,10 @@ ranks = model.rank(
|
|
| 91 |
'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
|
| 92 |
]
|
| 93 |
)
|
| 94 |
-
|
|
|
|
| 95 |
```
|
| 96 |
|
| 97 |
-
<!--
|
| 98 |
-
### Direct Usage (Transformers)
|
| 99 |
-
|
| 100 |
-
<details><summary>Click to see the direct usage in Transformers</summary>
|
| 101 |
-
|
| 102 |
-
</details>
|
| 103 |
-
-->
|
| 104 |
-
|
| 105 |
-
<!--
|
| 106 |
-
### Downstream Usage (Sentence Transformers)
|
| 107 |
-
|
| 108 |
-
You can finetune this model on your own dataset.
|
| 109 |
-
|
| 110 |
-
<details><summary>Click to expand</summary>
|
| 111 |
-
|
| 112 |
-
</details>
|
| 113 |
-
-->
|
| 114 |
-
|
| 115 |
-
<!--
|
| 116 |
-
### Out-of-Scope Use
|
| 117 |
-
|
| 118 |
-
*List how the model may foreseeably be misused and address what users ought not to do with the model.*
|
| 119 |
-
-->
|
| 120 |
-
|
| 121 |
## Evaluation
|
| 122 |
|
| 123 |
### Metrics
|
|
@@ -137,18 +117,6 @@ You can finetune this model on your own dataset.
|
|
| 137 |
| mrr@10 | 0.830 |
|
| 138 |
| **ndcg@10** | **0.856** |
|
| 139 |
|
| 140 |
-
<!--
|
| 141 |
-
## Bias, Risks and Limitations
|
| 142 |
-
|
| 143 |
-
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
|
| 144 |
-
-->
|
| 145 |
-
|
| 146 |
-
<!--
|
| 147 |
-
### Recommendations
|
| 148 |
-
|
| 149 |
-
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
|
| 150 |
-
-->
|
| 151 |
-
|
| 152 |
## Training Details
|
| 153 |
|
| 154 |
<details>
|
|
@@ -193,125 +161,6 @@ You can finetune this model on your own dataset.
|
|
| 193 |
- `dataloader_num_workers`: 2
|
| 194 |
- `load_best_model_at_end`: True
|
| 195 |
|
| 196 |
-
#### All Hyperparameters
|
| 197 |
-
<details><summary>Click to expand</summary>
|
| 198 |
-
|
| 199 |
-
- `overwrite_output_dir`: False
|
| 200 |
-
- `do_predict`: False
|
| 201 |
-
- `eval_strategy`: epoch
|
| 202 |
-
- `prediction_loss_only`: True
|
| 203 |
-
- `per_device_train_batch_size`: 64
|
| 204 |
-
- `per_device_eval_batch_size`: 64
|
| 205 |
-
- `per_gpu_train_batch_size`: None
|
| 206 |
-
- `per_gpu_eval_batch_size`: None
|
| 207 |
-
- `gradient_accumulation_steps`: 1
|
| 208 |
-
- `eval_accumulation_steps`: None
|
| 209 |
-
- `torch_empty_cache_steps`: None
|
| 210 |
-
- `learning_rate`: 4e-05
|
| 211 |
-
- `weight_decay`: 0.1
|
| 212 |
-
- `adam_beta1`: 0.9
|
| 213 |
-
- `adam_beta2`: 0.999
|
| 214 |
-
- `adam_epsilon`: 1e-08
|
| 215 |
-
- `max_grad_norm`: 1.0
|
| 216 |
-
- `num_train_epochs`: 4
|
| 217 |
-
- `max_steps`: -1
|
| 218 |
-
- `lr_scheduler_type`: cosine
|
| 219 |
-
- `lr_scheduler_kwargs`: {}
|
| 220 |
-
- `warmup_ratio`: 0.05
|
| 221 |
-
- `warmup_steps`: 0
|
| 222 |
-
- `log_level`: passive
|
| 223 |
-
- `log_level_replica`: warning
|
| 224 |
-
- `log_on_each_node`: True
|
| 225 |
-
- `logging_nan_inf_filter`: True
|
| 226 |
-
- `save_safetensors`: True
|
| 227 |
-
- `save_on_each_node`: False
|
| 228 |
-
- `save_only_model`: False
|
| 229 |
-
- `restore_callback_states_from_checkpoint`: False
|
| 230 |
-
- `no_cuda`: False
|
| 231 |
-
- `use_cpu`: False
|
| 232 |
-
- `use_mps_device`: False
|
| 233 |
-
- `seed`: 42
|
| 234 |
-
- `data_seed`: None
|
| 235 |
-
- `jit_mode_eval`: False
|
| 236 |
-
- `use_ipex`: False
|
| 237 |
-
- `bf16`: False
|
| 238 |
-
- `fp16`: True
|
| 239 |
-
- `fp16_opt_level`: O1
|
| 240 |
-
- `half_precision_backend`: auto
|
| 241 |
-
- `bf16_full_eval`: False
|
| 242 |
-
- `fp16_full_eval`: False
|
| 243 |
-
- `tf32`: None
|
| 244 |
-
- `local_rank`: 0
|
| 245 |
-
- `ddp_backend`: None
|
| 246 |
-
- `tpu_num_cores`: None
|
| 247 |
-
- `tpu_metrics_debug`: False
|
| 248 |
-
- `debug`: []
|
| 249 |
-
- `dataloader_drop_last`: False
|
| 250 |
-
- `dataloader_num_workers`: 2
|
| 251 |
-
- `dataloader_prefetch_factor`: None
|
| 252 |
-
- `past_index`: -1
|
| 253 |
-
- `disable_tqdm`: False
|
| 254 |
-
- `remove_unused_columns`: True
|
| 255 |
-
- `label_names`: None
|
| 256 |
-
- `load_best_model_at_end`: True
|
| 257 |
-
- `ignore_data_skip`: False
|
| 258 |
-
- `fsdp`: []
|
| 259 |
-
- `fsdp_min_num_params`: 0
|
| 260 |
-
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
|
| 261 |
-
- `fsdp_transformer_layer_cls_to_wrap`: None
|
| 262 |
-
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
|
| 263 |
-
- `deepspeed`: None
|
| 264 |
-
- `label_smoothing_factor`: 0.0
|
| 265 |
-
- `optim`: adamw_torch
|
| 266 |
-
- `optim_args`: None
|
| 267 |
-
- `adafactor`: False
|
| 268 |
-
- `group_by_length`: False
|
| 269 |
-
- `length_column_name`: length
|
| 270 |
-
- `ddp_find_unused_parameters`: None
|
| 271 |
-
- `ddp_bucket_cap_mb`: None
|
| 272 |
-
- `ddp_broadcast_buffers`: False
|
| 273 |
-
- `dataloader_pin_memory`: True
|
| 274 |
-
- `dataloader_persistent_workers`: False
|
| 275 |
-
- `skip_memory_metrics`: True
|
| 276 |
-
- `use_legacy_prediction_loop`: False
|
| 277 |
-
- `push_to_hub`: False
|
| 278 |
-
- `resume_from_checkpoint`: None
|
| 279 |
-
- `hub_model_id`: None
|
| 280 |
-
- `hub_strategy`: every_save
|
| 281 |
-
- `hub_private_repo`: None
|
| 282 |
-
- `hub_always_push`: False
|
| 283 |
-
- `gradient_checkpointing`: False
|
| 284 |
-
- `gradient_checkpointing_kwargs`: None
|
| 285 |
-
- `include_inputs_for_metrics`: False
|
| 286 |
-
- `include_for_metrics`: []
|
| 287 |
-
- `eval_do_concat_batches`: True
|
| 288 |
-
- `fp16_backend`: auto
|
| 289 |
-
- `push_to_hub_model_id`: None
|
| 290 |
-
- `push_to_hub_organization`: None
|
| 291 |
-
- `mp_parameters`:
|
| 292 |
-
- `auto_find_batch_size`: False
|
| 293 |
-
- `full_determinism`: False
|
| 294 |
-
- `torchdynamo`: None
|
| 295 |
-
- `ray_scope`: last
|
| 296 |
-
- `ddp_timeout`: 1800
|
| 297 |
-
- `torch_compile`: False
|
| 298 |
-
- `torch_compile_backend`: None
|
| 299 |
-
- `torch_compile_mode`: None
|
| 300 |
-
- `include_tokens_per_second`: False
|
| 301 |
-
- `include_num_input_tokens_seen`: False
|
| 302 |
-
- `neftune_noise_alpha`: None
|
| 303 |
-
- `optim_target_modules`: None
|
| 304 |
-
- `batch_eval_metrics`: False
|
| 305 |
-
- `eval_on_start`: False
|
| 306 |
-
- `use_liger_kernel`: False
|
| 307 |
-
- `eval_use_gather_object`: False
|
| 308 |
-
- `average_tokens_across_devices`: False
|
| 309 |
-
- `prompts`: None
|
| 310 |
-
- `batch_sampler`: batch_sampler
|
| 311 |
-
- `multi_dataset_batch_sampler`: proportional
|
| 312 |
-
|
| 313 |
-
</details>
|
| 314 |
-
|
| 315 |
### Training Logs
|
| 316 |
| Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
|
| 317 |
|:-------:|:---------:|:-------------:|:---------------------------------:|
|
|
@@ -336,20 +185,11 @@ You can finetune this model on your own dataset.
|
|
| 336 |
|
| 337 |
## Citation
|
| 338 |
|
| 339 |
-
|
| 340 |
-
|
| 341 |
-
|
| 342 |
-
|
| 343 |
-
|
| 344 |
-
|
| 345 |
-
|
| 346 |
-
|
| 347 |
-
|
| 348 |
-
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
|
| 349 |
-
-->
|
| 350 |
-
|
| 351 |
-
<!--
|
| 352 |
-
## Model Card Contact
|
| 353 |
-
|
| 354 |
-
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
|
| 355 |
-
-->
|
|
|
|
| 1 |
---
|
| 2 |
+
base_model: rasyosef/roberta-base-amharic
|
| 3 |
+
datasets:
|
| 4 |
+
- rasyosef/Amharic-Passage-Retrieval-Dataset-V2
|
| 5 |
language:
|
| 6 |
- am
|
| 7 |
+
library_name: sentence-transformers
|
| 8 |
license: mit
|
| 9 |
+
metrics:
|
| 10 |
+
- map
|
| 11 |
+
- mrr@10
|
| 12 |
+
- ndcg@10
|
| 13 |
+
pipeline_tag: text-ranking
|
| 14 |
tags:
|
| 15 |
- sentence-transformers
|
| 16 |
- cross-encoder
|
| 17 |
- generated_from_trainer
|
| 18 |
- dataset_size:491752
|
| 19 |
- loss:BinaryCrossEntropyLoss
|
| 20 |
model-index:
|
| 21 |
- name: reranker-amharic-base
|
| 22 |
results:
|
|
|
|
| 33 |
- type: ndcg@10
|
| 34 |
value: 0.856
|
| 35 |
name: Ndcg@10
|
| 36 |
---
|
| 37 |
|
| 38 |
# reranker-amharic-base
|
| 39 |
|
| 40 |
This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
|
| 41 |
|
| 42 |
+
This model was presented in the paper **[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)**.
|
| 43 |
+
|
| 44 |
+
Official code repository: [https://github.com/rasyosef/amharic-neural-ir](https://github.com/rasyosef/amharic-neural-ir)
|
| 45 |
+
|
| 46 |
## Model Details
|
| 47 |
|
| 48 |
### Model Description
|
| 49 |
- **Model Type:** Cross Encoder
|
| 50 |
+
- **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic)
|
| 51 |
- **Maximum Sequence Length:** 510 tokens
|
| 52 |
- **Number of Output Labels:** 1 label
|
|
|
|
| 53 |
- **Language:** am
|
| 54 |
- **License:** mit
|
| 55 |
|
|
|
|
| 94 |
'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
|
| 95 |
]
|
| 96 |
)
|
| 97 |
+
print(ranks)
|
| 98 |
+
# [{'corpus_id': 0, 'score': np.float32(0.9555243)}, {'corpus_id': 1, 'score': np.float32(0.0012893651)}]
|
| 99 |
```
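The `ranks` output above is a list of dictionaries with a `corpus_id` (index into the passage list) and a `score`. As a minimal sketch of how that output might be consumed, the snippet below maps each entry back to its passage text and drops low-scoring candidates. The helper `top_passages`, the `min_score=0.5` cutoff, and the placeholder passage strings are illustrative assumptions, not part of sentence-transformers or of this model card; only the two scores are copied from the printed output.

```python
# Hypothetical helper for illustration; not part of sentence-transformers.
def top_passages(ranks, passages, min_score=0.5):
    """Return (passage, score) pairs, best first, keeping scores >= min_score."""
    ordered = sorted(ranks, key=lambda r: float(r["score"]), reverse=True)
    return [
        (passages[r["corpus_id"]], float(r["score"]))
        for r in ordered
        if float(r["score"]) >= min_score
    ]


# Scores copied from the printed output above; passage strings are placeholders.
ranks = [
    {"corpus_id": 0, "score": 0.9555243},
    {"corpus_id": 1, "score": 0.0012893651},
]
passages = ["first candidate passage", "second candidate passage"]

results = top_passages(ranks, passages)
print(results)
```

The cutoff is a design choice that depends on the application; a retrieval pipeline may instead keep a fixed number of top-ranked passages regardless of score.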
|
| 100 |
|
| 101 |
## Evaluation
|
| 102 |
|
| 103 |
### Metrics
|
|
|
|
| 117 |
| mrr@10 | 0.830 |
|
| 118 |
| **ndcg@10** | **0.856** |
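For readers unfamiliar with the reported metrics, the following is a minimal sketch of the textbook definitions of MRR@10 and NDCG@10 for a single query, given the relevance labels of the passages in ranked order. It is an illustration only: the function names are hypothetical, and the evaluator that produced the numbers above may differ in details such as tie handling or how queries without relevant passages are treated.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(relevance, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance, k=10):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# One query whose only relevant passage was ranked second.
ranked_relevance = [0, 1, 0, 0]
print(mrr_at_k(ranked_relevance))   # 0.5
print(ndcg_at_k(ranked_relevance))  # 1 / log2(3), roughly 0.631
```

Dataset-level scores are the mean of these per-query values over all evaluation queries.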
|
| 119 |
|
| 120 |
## Training Details
|
| 121 |
|
| 122 |
<details>
|
|
|
|
| 161 |
- `dataloader_num_workers`: 2
|
| 162 |
- `load_best_model_at_end`: True
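The hyperparameters recorded for this run include `learning_rate`: 4e-05, `lr_scheduler_type`: cosine, `warmup_ratio`: 0.05, `num_train_epochs`: 4, and `per_device_train_batch_size`: 64. As a rough sketch of what that schedule looks like, the snippet below computes a linear-warmup, cosine-decay learning rate per step. It assumes a single training device, that the `dataset_size:491752` tag is the number of training samples, and the standard half-cosine decay; `lr_at_step` is a hypothetical helper, not a library function.

```python
import math


def lr_at_step(step, total_steps, base_lr=4e-05, warmup_ratio=0.05):
    """Linear warmup to base_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumption: one device, so one optimizer step per batch of 64 samples.
steps_per_epoch = math.ceil(491752 / 64)
total_steps = steps_per_epoch * 4

print(total_steps)
for step in (0, 1000, total_steps // 2, total_steps):
    print(step, lr_at_step(step, total_steps))
```

With more than one device, the effective batch size grows and the step counts shrink accordingly, so the numbers here are indicative rather than a record of the actual run.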
|
| 163 |
|
| 164 |
### Training Logs
|
| 165 |
| Epoch | Step | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
|
| 166 |
|:-------:|:---------:|:-------------:|:---------------------------------:|
|
|
|
|
| 185 |
|
| 186 |
## Citation
|
| 187 |
|
| 188 |
+
```bibtex
|
| 189 |
+
@inproceedings{alemneh2026amharicir,
|
| 190 |
+
title = {The Multilingual Curse at the Retrieval Layer: Evidence from Amharic},
|
| 191 |
+
author = {Alemneh, Yosef Worku and Mekonnen, Kidist Amde and de Rijke, Maarten},
|
| 192 |
+
booktitle = {Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM), ACL 2026},
|
| 193 |
+
year = {2026},
|
| 194 |
+
}
|
| 195 |
+
```