# Natasha/Navec hudlit_v1 Model Card
This repository provides a Sentence Transformers version of the Natasha/Navec word-embedding model.
The model is a format-only conversion of the original Navec vectors into the Sentence Transformers framework: no changes were made to the underlying embedding weights.
This allows the embeddings to be used directly with the `sentence-transformers` API.
## Source
The original word embeddings come from the Navec project:
- Repository: https://github.com/natasha/navec
- Authors: Natasha NLP project
- License: MIT License
Navec is a compact and efficient set of Russian word embeddings trained on large Russian corpora.
## Usage

### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BorisTM/natasha_navec_hudlit_v1_12B_500K_300d_100q")

# Run inference
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 300)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8068, 0.7029],
#         [0.8068, 1.0000, 0.7831],
#         [0.7029, 0.7831, 1.0000]])
```
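Since the model is a format-only wrapper around static word vectors, a sentence embedding is presumably the mean of the sentence's token vectors, and `model.similarity` computes cosine similarity by default. A minimal NumPy sketch of both steps (using random toy vectors, not the real Navec weights):

```python
import numpy as np

def mean_pool(word_vectors):
    """Average per-word vectors into a single sentence vector."""
    return np.stack(word_vectors).mean(axis=0)

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for 300-d Navec word vectors
rng = np.random.default_rng(42)
words = [rng.normal(size=300) for _ in range(4)]
sentence = mean_pool(words)

print(sentence.shape)                        # (300,)
print(round(cosine_sim(sentence, sentence), 6))  # 1.0
```

This mirrors the usual `sentence-transformers` setup for static embeddings (a word-embedding layer followed by mean pooling); the actual module configuration is defined by the repository itself.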
## Results

### Results on MTEB (rus, v1.1)
| Task | navec_hudlit_v1_12B_500K_300d_100q | navec_news_v1_1B_250K_300d_100q |
|---|---|---|
| Mean (Task) | 36.37 | 35.51 |
| Mean (Task Type) | 34.29 | 33.82 |
| CEDRClassification | 34.67 | 33.01 |
| GeoreviewClassification | 32.42 | 32.48 |
| GeoreviewClusteringP2P | 32.09 | 26.87 |
| HeadlineClassification | 54.16 | 61.19 |
| InappropriatenessClassification | 53.73 | 52.67 |
| KinopoiskClassification | 43.96 | 42.45 |
| MassiveIntentClassification | 54.67 | 49.78 |
| MassiveScenarioClassification | 59.58 | 53.51 |
| MIRACLReranking | 10.87 | 10.81 |
| MIRACLRetrievalHardNegatives.v2 | 1.74 | 1.60 |
| RiaNewsRetrievalHardNegatives.v2 | 15.42 | 23.46 |
| RuBQReranking | 38.00 | 37.70 |
| RuBQRetrieval | 5.79 | 5.09 |
| RUParaPhraserSTS | 41.38 | 41.11 |
| RuReviewsClassification | 49.74 | 49.21 |
| RuSciBenchGRNTIClassification | 43.40 | 39.97 |
| RuSciBenchGRNTIClusteringP2P | 37.76 | 34.14 |
| RuSciBenchOECDClassification | 34.69 | 31.35 |
| RuSciBenchOECDClusteringP2P | 34.08 | 31.35 |
| SensitiveTopicsClassification | 18.10 | 17.89 |
| STS22 | 50.18 | 51.55 |
| TERRa | 53.73 | 53.99 |
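The `Mean (Task)` row is consistent with an unweighted average of the 22 per-task scores. A quick sanity check in Python, using the hudlit column from the table above:

```python
# Per-task scores for navec_hudlit_v1_12B_500K_300d_100q, in table order
hudlit_scores = [
    34.67, 32.42, 32.09, 54.16, 53.73, 43.96, 54.67, 59.58,
    10.87, 1.74, 15.42, 38.00, 5.79, 41.38, 49.74, 43.40,
    37.76, 34.69, 34.08, 18.10, 50.18, 53.73,
]
mean_task = round(sum(hudlit_scores) / len(hudlit_scores), 2)
print(mean_task)  # 36.37
```

The same calculation on the news column reproduces its reported mean of 35.51.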
## License
MIT
## Contact

Telegram: @btmalov