Add paper and code links to model card

This PR improves the model card by linking the research paper "[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)" and the official source code repository. It also adds the BibTeX citation and updates the usage snippet to print the ranking output, matching the examples in the official repository. In addition, it removes the commented-out template sections and the expanded "All Hyperparameters" list from the README.
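
For reviewers: the updated usage snippet prints the result of `model.rank(...)`, which is a list of dictionaries with `corpus_id` and `score` keys, ordered from highest to lowest score. The following is a minimal sketch of that output shape only. It assumes hypothetical pre-computed scores rather than a real model call, and `to_ranking` is an illustrative helper, not part of sentence-transformers.

```python
def to_ranking(scores):
    """Turn per-passage scores into a list of dicts sorted best-first."""
    # Pair each score with its position in the candidate list, then sort by score.
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return [{"corpus_id": idx, "score": score} for idx, score in ranked]


# Hypothetical scores for three candidate passages (not real model outputs).
ranks = to_ranking([0.12, 0.95, 0.40])
print(ranks)
```

Each `corpus_id` is an index into the list of candidate passages passed for the query, so the top-ranked passage can be looked up by that index.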

Files changed (1)

README.md +24 -184

@@ -1,20 +1,22 @@
 ---
 language:
 - am
 license: mit
 tags:
 - sentence-transformers
 - cross-encoder
 - generated_from_trainer
 - dataset_size:491752
 - loss:BinaryCrossEntropyLoss
-base_model: rasyosef/roberta-base-amharic
-pipeline_tag: text-ranking
-library_name: sentence-transformers
-metrics:
-- map
-- mrr@10
-- ndcg@10
 model-index:
 - name: reranker-amharic-base
   results:
@@ -31,22 +33,23 @@ model-index:
     - type: ndcg@10
       value: 0.856
       name: Ndcg@10
-datasets:
-- rasyosef/Amharic-Passage-Retrieval-Dataset-V2
 ---
 # reranker-amharic-base
 This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
 ## Model Details
 ### Model Description
 - **Model Type:** Cross Encoder
-- **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) <!-- at revision b1a3d2c267262e2b82c83be9d4e59db762a5e931 -->
 - **Maximum Sequence Length:** 510 tokens
 - **Number of Output Labels:** 1 label
-<!-- - **Training Dataset:** Unknown -->
 - **Language:** am
 - **License:** mit
@@ -91,33 +94,10 @@ ranks = model.rank(
         'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
     ]
 )
-# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
 ```
-<!--
-### Direct Usage (Transformers)
-<details><summary>Click to see the direct usage in Transformers</summary>
-</details>
--->
-<!--
-### Downstream Usage (Sentence Transformers)
-You can finetune this model on your own dataset.
-<details><summary>Click to expand</summary>
-</details>
--->
-<!--
-### Out-of-Scope Use
-*List how the model may foreseeably be misused and address what users ought not to do with the model.*
--->
 ## Evaluation
 ### Metrics
@@ -137,18 +117,6 @@ You can finetune this model on your own dataset.
 | mrr@10      | 0.830     |
 | **ndcg@10** | **0.856** |
-<!--
-## Bias, Risks and Limitations
-*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
--->
-<!--
-### Recommendations
-*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
--->
 ## Training Details
 <details>
@@ -193,125 +161,6 @@ You can finetune this model on your own dataset.
 - `dataloader_num_workers`: 2
 - `load_best_model_at_end`: True
-#### All Hyperparameters
-<details><summary>Click to expand</summary>
-- `overwrite_output_dir`: False
-- `do_predict`: False
-- `eval_strategy`: epoch
-- `prediction_loss_only`: True
-- `per_device_train_batch_size`: 64
-- `per_device_eval_batch_size`: 64
-- `per_gpu_train_batch_size`: None
-- `per_gpu_eval_batch_size`: None
-- `gradient_accumulation_steps`: 1
-- `eval_accumulation_steps`: None
-- `torch_empty_cache_steps`: None
-- `learning_rate`: 4e-05
-- `weight_decay`: 0.1
-- `adam_beta1`: 0.9
-- `adam_beta2`: 0.999
-- `adam_epsilon`: 1e-08
-- `max_grad_norm`: 1.0
-- `num_train_epochs`: 4
-- `max_steps`: -1
-- `lr_scheduler_type`: cosine
-- `lr_scheduler_kwargs`: {}
-- `warmup_ratio`: 0.05
-- `warmup_steps`: 0
-- `log_level`: passive
-- `log_level_replica`: warning
-- `log_on_each_node`: True
-- `logging_nan_inf_filter`: True
-- `save_safetensors`: True
-- `save_on_each_node`: False
-- `save_only_model`: False
-- `restore_callback_states_from_checkpoint`: False
-- `no_cuda`: False
-- `use_cpu`: False
-- `use_mps_device`: False
-- `seed`: 42
-- `data_seed`: None
-- `jit_mode_eval`: False
-- `use_ipex`: False
-- `bf16`: False
-- `fp16`: True
-- `fp16_opt_level`: O1
-- `half_precision_backend`: auto
-- `bf16_full_eval`: False
-- `fp16_full_eval`: False
-- `tf32`: None
-- `local_rank`: 0
-- `ddp_backend`: None
-- `tpu_num_cores`: None
-- `tpu_metrics_debug`: False
-- `debug`: []
-- `dataloader_drop_last`: False
-- `dataloader_num_workers`: 2
-- `dataloader_prefetch_factor`: None
-- `past_index`: -1
-- `disable_tqdm`: False
-- `remove_unused_columns`: True
-- `label_names`: None
-- `load_best_model_at_end`: True
-- `ignore_data_skip`: False
-- `fsdp`: []
-- `fsdp_min_num_params`: 0
-- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
-- `fsdp_transformer_layer_cls_to_wrap`: None
-- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
-- `deepspeed`: None
-- `label_smoothing_factor`: 0.0
-- `optim`: adamw_torch
-- `optim_args`: None
-- `adafactor`: False
-- `group_by_length`: False
-- `length_column_name`: length
-- `ddp_find_unused_parameters`: None
-- `ddp_bucket_cap_mb`: None
-- `ddp_broadcast_buffers`: False
-- `dataloader_pin_memory`: True
-- `dataloader_persistent_workers`: False
-- `skip_memory_metrics`: True
-- `use_legacy_prediction_loop`: False
-- `push_to_hub`: False
-- `resume_from_checkpoint`: None
-- `hub_model_id`: None
-- `hub_strategy`: every_save
-- `hub_private_repo`: None
-- `hub_always_push`: False
-- `gradient_checkpointing`: False
-- `gradient_checkpointing_kwargs`: None
-- `include_inputs_for_metrics`: False
-- `include_for_metrics`: []
-- `eval_do_concat_batches`: True
-- `fp16_backend`: auto
-- `push_to_hub_model_id`: None
-- `push_to_hub_organization`: None
-- `mp_parameters`:
-- `auto_find_batch_size`: False
-- `full_determinism`: False
-- `torchdynamo`: None
-- `ray_scope`: last
-- `ddp_timeout`: 1800
-- `torch_compile`: False
-- `torch_compile_backend`: None
-- `torch_compile_mode`: None
-- `include_tokens_per_second`: False
-- `include_num_input_tokens_seen`: False
-- `neftune_noise_alpha`: None
-- `optim_target_modules`: None
-- `batch_eval_metrics`: False
-- `eval_on_start`: False
-- `use_liger_kernel`: False
-- `eval_use_gather_object`: False
-- `average_tokens_across_devices`: False
-- `prompts`: None
-- `batch_sampler`: batch_sampler
-- `multi_dataset_batch_sampler`: proportional
-</details>
 ### Training Logs
 | Epoch   | Step      | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
 |:-------:|:---------:|:-------------:|:---------------------------------:|
@@ -336,20 +185,11 @@ You can finetune this model on your own dataset.
 ## Citation
-<!--
-## Glossary
-*Clearly define terms in order to be accessible across audiences.*
--->
-<!--
-## Model Card Authors
-*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
--->
-<!--
-## Model Card Contact
-*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
--->

 ---
+base_model: rasyosef/roberta-base-amharic
+datasets:
+- rasyosef/Amharic-Passage-Retrieval-Dataset-V2
 language:
 - am
+library_name: sentence-transformers
 license: mit
+metrics:
+- map
+- mrr@10
+- ndcg@10
+pipeline_tag: text-ranking
 tags:
 - sentence-transformers
 - cross-encoder
 - generated_from_trainer
 - dataset_size:491752
 - loss:BinaryCrossEntropyLoss
 model-index:
 - name: reranker-amharic-base
   results:
     - type: ndcg@10
       value: 0.856
       name: Ndcg@10
 ---
 # reranker-amharic-base
 This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic) using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.
+This model was presented in the paper **[The Multilingual Curse at the Retrieval Layer: Evidence from Amharic](https://huggingface.co/papers/2605.24556)**.
+Official code repository: [https://github.com/rasyosef/amharic-neural-ir](https://github.com/rasyosef/amharic-neural-ir)
 ## Model Details
 ### Model Description
 - **Model Type:** Cross Encoder
+- **Base model:** [rasyosef/roberta-base-amharic](https://huggingface.co/rasyosef/roberta-base-amharic)
 - **Maximum Sequence Length:** 510 tokens
 - **Number of Output Labels:** 1 label
 - **Language:** am
 - **License:** mit
         'የቻይናው ፕሬዝዳንት ዚ ጂንፒንግ ከትራምፕ ጋር ባደረጉት ጉባኤ ትኩረታቸው በሁለቱ ሀገራት መካከል ለወራት ከተፈጠረ ውጥረት እና የንግድ ጦርነት በኋላ የተረገጋጋ ግንኙነትን ማስቀጠል ነበር። ከፑቲን ጋር ደግሞ ዢ ለሁለቱ አገራት ስልታዊም ሆነ ኢኮኖሚያዊ ጠቀሜታ ረጅም ጊዜ የዘለቀውን አጋርነትን ይበልጥ ማጠናከር ላይ ነበር ትኩረታቸው።',
     ]
 )
+print(ranks)
+# [{'corpus_id': 0, 'score': np.float32(0.9555243)}, {'corpus_id': 1, 'score': np.float32(0.0012893651)}]
 ```
 ## Evaluation
 ### Metrics
 | mrr@10      | 0.830     |
 | **ndcg@10** | **0.856** |
 ## Training Details
 <details>
 - `dataloader_num_workers`: 2
 - `load_best_model_at_end`: True
 ### Training Logs
 | Epoch   | Step      | Training Loss | amh-passage-retrieval-dev_ndcg@10 |
 |:-------:|:---------:|:-------------:|:---------------------------------:|
 ## Citation
+```bibtex
+@inproceedings{alemneh2026amharicir,
+  title     = {The Multilingual Curse at the Retrieval Layer: Evidence from Amharic},
+  author    = {Alemneh, Yosef Worku and Mekonnen, Kidist Amde and de Rijke, Maarten},
+  booktitle = {Proceedings of the 1st Workshop on Multilinguality in the Era of Large Language Models (MeLLM), ACL 2026},
+  year      = {2026},
+}
+```
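
As background for the `mrr@10` and `ndcg@10` values in the model card's Metrics table, the sketch below shows how these two metrics are commonly computed for a single query with binary relevance labels. It is an illustration under that assumption, not the evaluator used to produce the reported numbers; the function names and the example relevance list are hypothetical.

```python
import math


def mrr_at_k(relevance, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for rank, rel in enumerate(relevance[:k], start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(relevance, k=10):
    """nDCG@k: DCG of the given ordering divided by the DCG of the ideal ordering."""

    def dcg(rels):
        # Position 1 gets discount log2(2) = 1, position 2 gets log2(3), and so on.
        return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels, start=1))

    ideal = dcg(sorted(relevance, reverse=True)[:k])
    return dcg(relevance[:k]) / ideal if ideal > 0 else 0.0


# Hypothetical ranking for one query: only the passage at rank 2 is relevant.
relevance = [0, 1, 0, 0]
print(mrr_at_k(relevance))
print(ndcg_at_k(relevance))
```

A dataset-level score is then typically the mean of these per-query values across all evaluation queries.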