# Azerbaijani Sentence Embedding Model (v1)
The first sentence embedding model for the Azerbaijani language, designed for semantic search and RAG pipelines.
Fine-tuned from allmalab/bert-base-aze on 50,000 sentence pairs mined from Azerbaijani Wikipedia (the DOLLMA dataset) using MultipleNegativesRankingLoss.

⚠️ This is an early release (v1). Formal evaluation and benchmarking are in progress.
## Usage

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("YOUR_USERNAME/azerbaijani-sentence-embedding-v1")
embeddings = model.encode(["Bakı Azərbaycanın paytaxtıdır"])
```
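For semantic search, embeddings returned by `model.encode` are typically compared with cosine similarity. A minimal pure-Python sketch of that comparison (the vectors below are illustrative toy values, not actual model outputs, which are much higher-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors standing in for real embeddings.
query_vec = [0.9, 0.1, 0.2]
doc_vec = [0.8, 0.2, 0.1]
print(cosine_similarity(query_vec, doc_vec))
```

In a RAG pipeline, documents are ranked by this score against the query embedding, and the top-ranked passages are passed to the generator.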
## Training
- Base model: allmalab/bert-base-aze
- Training data: 50,000 sentence pairs from Azerbaijani Wikipedia (DOLLMA)
- Loss function: MultipleNegativesRankingLoss
- Epochs: 3
- Batch size: 32
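The loss above can be illustrated with a pure-Python sketch of how MultipleNegativesRankingLoss works: each (anchor, positive) pair in a batch is scored against all positives, the other pairs' positives serve as in-batch negatives, and a softmax cross-entropy pushes each anchor toward its own positive. The toy 2-d vectors and the `scale` value are illustrative only; the real loss operates on the model's embeddings.

```python
import math

def mnr_loss(anchors, positives, scale=20.0):
    """Sketch of in-batch MultipleNegativesRankingLoss: for anchor i,
    positives[i] is the correct "class" and all other positives in the
    batch act as negatives."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    total = 0.0
    for i, a in enumerate(anchors):
        # Scaled similarity of anchor i to every positive in the batch.
        scores = [scale * cos(a, p) for p in positives]
        # Cross-entropy with target index i (softmax over the batch).
        log_softmax = scores[i] - math.log(sum(math.exp(s) for s in scores))
        total -= log_softmax
    return total / len(anchors)

# Toy batch: each anchor is most similar to its own positive,
# so the loss should be close to zero.
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [0.1, 0.9]]
print(mnr_loss(anchors, positives))
```

Because every other pair in the batch is a free negative, larger batch sizes make the ranking task harder and generally improve the resulting embeddings.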
## Citation
```bibtex
@inproceedings{isbarov-etal-2024-open,
    title = "Open foundation models for {A}zerbaijani language",
    author = "Isbarov, Jafar and Huseynova, Kavsar and Mammadov, Elvin and Hajili, Mammad and Ataman, Duygu",
    booktitle = "Proceedings of the First Workshop on Natural Language Processing for Turkic Languages (SIGTURK 2024)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand and Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.sigturk-1.2",
    pages = "18--28"
}

@misc{sayqin-2026-azerbaijani-embeddings,
    author = {Sayqin Rustamli},
    title = {Azerbaijani Sentence Embedding Model v1},
    address = {Strasbourg, France},
    year = {2026},
    publisher = {HuggingFace},
    howpublished = {\url{https://huggingface.co/sayqin/azerbaijani-sentence-embedding-v1}},
    note = {First sentence embedding model for the Azerbaijani language}
}
```