# Natasha/Navec news_v1 Model Card

This repository provides a Sentence-Transformers version of the Natasha/Navec `news_v1` word-embedding model.

The model is a format-only conversion of the original Navec vectors into the Sentence-Transformers framework.
No changes were made to the underlying embedding weights.

This allows the embeddings to be used directly with the sentence-transformers API.
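A conversion like this typically wraps the static word vectors in a lookup table followed by mean pooling, so a sentence embedding is simply the average of its token vectors. A minimal sketch of that pooling step with toy data (the vocabulary and vectors below are random stand-ins, not the real Navec weights):

```python
import numpy as np

# Toy vocabulary of 300-d word vectors (random stand-ins for real Navec weights)
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(300) for w in ["погода", "сегодня", "прекрасная"]}
unk = np.zeros(300)  # out-of-vocabulary tokens map to a zero vector in this sketch

def sentence_embedding(sentence: str) -> np.ndarray:
    """Mean-pool the word vectors of a whitespace-tokenized sentence."""
    tokens = sentence.lower().split()
    vectors = [vocab.get(t, unk) for t in tokens]
    return np.mean(vectors, axis=0)

emb = sentence_embedding("погода сегодня прекрасная")
print(emb.shape)  # (300,)
```

Because the weights are unchanged, the quality of the sentence embeddings is entirely determined by the original Navec vectors and this pooling scheme.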

## Source

The original word embeddings come from the Navec project:

Navec is a compact and efficient set of Russian word embeddings trained on large Russian corpora.

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("BorisTM/natasha_navec_news_v1_1B_250K_300d_100q")
# Run inference
sentences = [
    'The weather is lovely today.',
    "It's so sunny outside!",
    'He drove to the stadium.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 300)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6588, 0.6942],
#         [0.6588, 1.0000, 0.6521],
#         [0.6942, 0.6521, 1.0000]])
```

## Results

Results on MTEB (rus, v1.1), compared with the larger `navec_hudlit_v1_12B_500K_300d_100q` vectors:

| Task | navec_hudlit_v1_12B_500K_300d_100q | navec_news_v1_1B_250K_300d_100q |
|---|---|---|
| Mean (Task) | 36.37 | 35.51 |
| Mean (Task Type) | 34.29 | 33.82 |
| CEDRClassification | 34.67 | 33.01 |
| GeoreviewClassification | 32.42 | 32.48 |
| GeoreviewClusteringP2P | 32.09 | 26.87 |
| HeadlineClassification | 54.16 | 61.19 |
| InappropriatenessClassification | 53.73 | 52.67 |
| KinopoiskClassification | 43.96 | 42.45 |
| MassiveIntentClassification | 54.67 | 49.78 |
| MassiveScenarioClassification | 59.58 | 53.51 |
| MIRACLReranking | 10.87 | 10.81 |
| MIRACLRetrievalHardNegatives.v2 | 1.74 | 1.60 |
| RiaNewsRetrievalHardNegatives.v2 | 15.42 | 23.46 |
| RuBQReranking | 38.00 | 37.70 |
| RuBQRetrieval | 5.79 | 5.09 |
| RUParaPhraserSTS | 41.38 | 41.11 |
| RuReviewsClassification | 49.74 | 49.21 |
| RuSciBenchGRNTIClassification | 43.40 | 39.97 |
| RuSciBenchGRNTIClusteringP2P | 37.76 | 34.14 |
| RuSciBenchOECDClassification | 34.69 | 31.35 |
| RuSciBenchOECDClusteringP2P | 34.08 | 31.35 |
| SensitiveTopicsClassification | 18.10 | 17.89 |
| STS22 | 50.18 | 51.55 |
| TERRa | 53.73 | 53.99 |

## License

MIT

## Contact

- Email: quelquemath@gmail.com
- Telegram: @btmalov
