SentenceTransformer based on google-bert/bert-base-multilingual-cased

This is a sentence-transformers model fine-tuned from google-bert/bert-base-multilingual-cased. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

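Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-multilingual-cased
  • Maximum Sequence Length: 64 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity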

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 64, 'do_lower_case': False, 'architecture': 'BertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
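
The Pooling module above has only pooling_mode_mean_tokens enabled, so the sentence vector is the average of the token vectors produced by the Transformer module. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name mean_pool and the toy 3-dimensional vectors are illustrative assumptions.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    n_real = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            n_real += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    n_real = max(n_real, 1)
    return [value / n_real for value in summed]


# Toy example: 3-dimensional vectors stand in for the 768-dimensional ones.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]  # the last position is padding and is ignored
sentence_vector = mean_pool(tokens, mask)
print(sentence_vector)  # [2.0, 3.0, 4.0]
```

Padding positions are excluded through the attention mask, so short inputs are not diluted by padding up to the 64-token limit.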

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

# Requires the datasets library as well: pip install -U datasets
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, util

# Load the trained model from the Hugging Face Hub
model = SentenceTransformer("dice-research/amharic-property-retriever-mbert")

# Input field you want to test
query = "book's ቋንቋ"

# Load all candidate properties from the dataset.
# Assumption: "dice-research/amharic-property-mapping" is a Hub dataset ID
# (not a local file path) and every split has a "property_text" column.
dataset = load_dataset("dice-research/amharic-property-mapping")

properties = set()
for split in dataset.values():
    properties.update(split["property_text"])

# Sort so that candidate indices are stable across runs
properties = sorted(properties)

# Compute L2-normalized embeddings
query_emb = model.encode(query, convert_to_tensor=True, normalize_embeddings=True)
prop_emb = model.encode(properties, convert_to_tensor=True, normalize_embeddings=True)

# Compute cosine similarity between the query and every candidate
scores = util.cos_sim(query_emb, prop_emb)[0]

# Get the top 5 predictions (fewer if there are fewer candidates)
top_k = min(5, len(properties))
top_results = scores.topk(top_k)

print("Input:", query)
print(f"\nTop {top_k} predictions:")

for idx, score in zip(top_results.indices, top_results.values):
    print(properties[int(idx)], score.item())
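
The ranking step above can be illustrated without the model. The sketch below, in plain Python, shows that once vectors are L2-normalized the cosine similarity is just a dot product, and that the top-k step is a sort on those scores. The helper names and the toy 2-dimensional vectors and labels are hypothetical and stand in for real embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, candidate_vecs, labels, k=5):
    """Return the k (score, label) pairs with the highest cosine similarity."""
    q = l2_normalize(query_vec)
    scored = [
        (dot(q, l2_normalize(c)), label)
        for c, label in zip(candidate_vecs, labels)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical toy data, not taken from the model or dataset.
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = ["language", "author", "publisher"]
ranking = top_k([2.0, 0.1], candidates, labels, k=2)
print([label for _, label in ranking])  # ['language', 'publisher']
```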

Evaluation

Metrics

Information Retrieval

Metric              | Value
cosine_accuracy@1   | 0.4409
cosine_accuracy@3   | 0.5538
cosine_accuracy@5   | 0.6344
cosine_accuracy@10  | 0.7151
cosine_precision@1  | 0.4409
cosine_precision@3  | 0.1846
cosine_precision@5  | 0.1269
cosine_precision@10 | 0.0715
cosine_recall@1     | 0.4409
cosine_recall@3     | 0.5538
cosine_recall@5     | 0.6344
cosine_recall@10    | 0.7151
cosine_ndcg@10      | 0.5657
cosine_mrr@10       | 0.5192
cosine_map@100      | 0.5311
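
To help read the table, here is a small sketch of how accuracy@k, precision@k, recall@k, and MRR@k relate under the assumption that each query has exactly one correct property. That assumption is consistent with recall@k equalling accuracy@k above, but it is not stated in the card. The function name ir_metrics and the toy ranks are illustrative.

```python
def ir_metrics(ranks, k):
    """Compute retrieval metrics at cutoff k.

    ranks: for each query, the 1-based rank of its single correct item,
           or None if the item was not retrieved at all.
    """
    n = len(ranks)
    hit_ranks = [r for r in ranks if r is not None and r <= k]
    hits = len(hit_ranks)
    return {
        "accuracy": hits / n,         # share of queries with a hit in the top k
        "precision": hits / (k * n),  # one relevant item among k retrieved slots
        "recall": hits / n,           # equals accuracy with one relevant item
        "mrr": sum(1.0 / r for r in hit_ranks) / n,
    }


# Toy ranks for four queries: hits at ranks 1, 2 and 4, plus one miss.
metrics = ir_metrics([1, 2, 4, None], k=3)
print(metrics)
```

With one relevant item per query, precision@k is accuracy@k divided by k, which agrees with the table: 0.5538 / 3 ≈ 0.1846 and 0.7151 / 10 ≈ 0.0715.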

Training Details

Training Dataset

  • Size: 1,224 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
            | anchor            | positive
    type    | string            | string
    details | min: 5 tokens     | min: 3 tokens
            | mean: 7.17 tokens | mean: 3.52 tokens
            | max: 11 tokens    | max: 8 tokens
  • Samples:
    anchor | positive
    case's header | title
    airline's ንብረቶች | assets
    person's እናት | mother
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
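
As a rough illustration of what the loss with these parameters computes: each anchor is scored against every positive in the batch using cosine similarity multiplied by the scale (20.0), and cross-entropy pushes the matching positive to the top while the other positives act as in-batch negatives. The sketch below is plain Python written for clarity under that reading; it is not the library's code, and the function names and toy vectors are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def mnr_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the target for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all candidates in the batch.
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy batch of two pairs: first correctly aligned, then deliberately swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = mnr_loss(anchors, [[1.0, 0.0], [0.0, 1.0]])
swapped = mnr_loss(anchors, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

The aligned batch gives a loss close to zero, while the swapped batch gives a loss close to the scale value, which shows how the scale sharpens the penalty for ranking a wrong positive first.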
    

Evaluation Dataset

  • Size: 186 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 186 samples:
            | anchor            | positive
    type    | string            | string
    details | min: 5 tokens     | min: 3 tokens
            | mean: 7.16 tokens | mean: 3.51 tokens
            | max: 12 tokens    | max: 9 tokens
  • Samples:
    anchor | positive
    soccer player's ዓመታት | years
    case's rowclass | class
    type's accessyear | year
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_ratio: 0.1
  • load_best_model_at_end: True
  • batch_sampler: no_duplicates

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

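Epoch  | Step | Training Loss | Validation Loss | dev-ir_cosine_ndcg@10
0.6410 | 25   | 2.67          | -               | -
1.0    | 39   | -             | 0.6878          | 0.4881
1.2821 | 50   | 1.6894        | -               | -
1.9231 | 75   | 1.643         | -               | -
2.0    | 78   | -             | 0.6251          | 0.5394
2.5641 | 100  | 1.3576        | -               | -
3.0    | 117  | -             | 0.6248          | 0.5641
3.2051 | 125  | 1.2821        | -               | -
3.8462 | 150  | 1.2421        | -               | -
4.0    | 156  | -             | 0.608           | 0.5657
  • The saved checkpoint was marked in bold in the original table; that formatting is not preserved here. The final row (epoch 4.0, step 156) has the lowest validation loss and matches the cosine_ndcg@10 of 0.5657 reported under Evaluation.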
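
The Epoch and Step columns above are consistent with the dataset size and batch size reported earlier. The short calculation below reproduces them, assuming a single device, no gradient accumulation, and that the last partial batch is kept (dataloader_drop_last: False). The variable and function names are illustrative.

```python
import math

train_samples = 1224  # size of the training dataset
batch_size = 32       # per_device_train_batch_size
epochs = 4            # num_train_epochs

# 1224 / 32 = 38.25, so the last partial batch adds one more step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps)  # 39 156
print(epoch_at(50), epoch_at(150))   # 1.2821 3.8462
```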

Framework Versions

  • Python: 3.9.2
  • Sentence Transformers: 5.1.2
  • Transformers: 4.57.6
  • PyTorch: 2.8.0+cu128
  • Accelerate: 1.10.1
  • Datasets: 4.5.0
  • Tokenizers: 0.22.2

Model size: 0.2B params (Safetensors, tensor type F32)