# Natasha/Navec hudlit_v1 Model Card
This repository provides a Sentence Transformers version of the Natasha/Navec word-embedding model.
The model is a format-only conversion of the original Navec vectors into the Sentence Transformers framework: no changes were made to the underlying embedding weights.
This allows the embeddings to be used directly with the `sentence-transformers` API.
## Source
The original word embeddings come from the Navec project:
- Repository: https://github.com/natasha/navec
- Authors: Natasha NLP project
- License: MIT License
Navec is a compact and efficient set of Russian word embeddings trained on large Russian corpora.
## Usage

### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:

```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BorisTM/natasha_navec_hudlit_v1_12B_500K_300d_100q")

# Run inference
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 300)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8068, 0.7029],
#         [0.8068, 1.0000, 0.7831],
#         [0.7029, 0.7831, 1.0000]])
```
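Since the model is a format-only wrapper around static word vectors, a sentence embedding is presumably the mean of the sentence's token vectors, and `model.similarity` computes cosine similarity by default. A minimal NumPy sketch of both steps (using random toy vectors, not the real Navec weights):

```python
import numpy as np

def mean_pool(word_vectors):
    """Average per-word vectors into a single sentence vector."""
    return np.stack(word_vectors).mean(axis=0)

def cosine_sim(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for 300-d Navec word vectors
rng = np.random.default_rng(42)
words = [rng.normal(size=300) for _ in range(4)]
sentence = mean_pool(words)

print(sentence.shape)                        # (300,)
print(round(cosine_sim(sentence, sentence), 6))  # 1.0
```

This mirrors the usual `sentence-transformers` setup for static embeddings (a word-embedding layer followed by mean pooling); the actual module configuration is defined by the repository itself.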
## Results

### Results on MTEB (rus, v1.1)
| Task | navec_hudlit_v1_12B_500K_300d_100q | navec_news_v1_1B_250K_300d_100q |
|---|---|---|
| Mean (Task) | 36.37 | 35.51 |
| Mean (Task Type) | 34.29 | 33.82 |
| CEDRClassification | 34.67 | 33.01 |
| GeoreviewClassification | 32.42 | 32.48 |
| GeoreviewClusteringP2P | 32.09 | 26.87 |
| HeadlineClassification | 54.16 | 61.19 |
| InappropriatenessClassification | 53.73 | 52.67 |
| KinopoiskClassification | 43.96 | 42.45 |
| MassiveIntentClassification | 54.67 | 49.78 |
| MassiveScenarioClassification | 59.58 | 53.51 |
| MIRACLReranking | 10.87 | 10.81 |
| MIRACLRetrievalHardNegatives.v2 | 1.74 | 1.60 |
| RiaNewsRetrievalHardNegatives.v2 | 15.42 | 23.46 |
| RuBQReranking | 38.00 | 37.70 |
| RuBQRetrieval | 5.79 | 5.09 |
| RUParaPhraserSTS | 41.38 | 41.11 |
| RuReviewsClassification | 49.74 | 49.21 |
| RuSciBenchGRNTIClassification | 43.40 | 39.97 |
| RuSciBenchGRNTIClusteringP2P | 37.76 | 34.14 |
| RuSciBenchOECDClassification | 34.69 | 31.35 |
| RuSciBenchOECDClusteringP2P | 34.08 | 31.35 |
| SensitiveTopicsClassification | 18.10 | 17.89 |
| STS22 | 50.18 | 51.55 |
| TERRa | 53.73 | 53.99 |
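The `Mean (Task)` row is consistent with an unweighted average of the 22 per-task scores. A quick sanity check in Python, using the hudlit column from the table above:

```python
# Per-task scores for navec_hudlit_v1_12B_500K_300d_100q, in table order
hudlit_scores = [
    34.67, 32.42, 32.09, 54.16, 53.73, 43.96, 54.67, 59.58,
    10.87, 1.74, 15.42, 38.00, 5.79, 41.38, 49.74, 43.40,
    37.76, 34.69, 34.08, 18.10, 50.18, 53.73,
]
mean_task = round(sum(hudlit_scores) / len(hudlit_scores), 2)
print(mean_task)  # 36.37
```

The same calculation on the news column reproduces its reported mean of 35.51.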
## License
MIT
## Contact

Telegram: @btmalov