snagbreac/russian-reverse-dictionary-semsearch

This sentence-transformers model has been trained on Russian definition-word pairs from dictionary data and crosswords (specifically the data presented here). As such, it can be used as a reverse dictionary when plugged into the semantic search pipeline as an encoder.

Usage (Sentence-Transformers)

Using this model becomes easy when you have sentence-transformers installed:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('snagbreac/rubert-base-cased-sentence-rurevdict')
embeddings = model.encode(sentences)
print(embeddings)

Usage as a Russian reverse dictionary with semantic search

For semantic search, you will require an additional list of Russian lemmas. I used a custom list compiled using Zaliznyak dictionary data and Russian Wiktionary data (with pymorphy2 morphological analysis) which I unfortunately cannot publish; however, do note that any list will do as long as it's large enough and known to contain only Russian-language lemmas.

Other than using specifically lemmas as a corpus to perform search upon, the process is quite standard for semantic search. First, you encode all the lemmas in your list as described above (try to use GPU if possible, otherwise this process might take a while). Then encode the query as well and compare it to all lemmas in the corpus, returning only the most similar entries (how many entries to return is probably up to you, I'd recommend 100). SBERT has excellent code snippets for semantic search that you can use for this.

Downloads last month
1
Safetensors
Model size
0.2B params
Tensor type
F32
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for snagbreac/rubert-base-cased-sentence-rurevdict

Finetuned
(3)
this model

Dataset used to train snagbreac/rubert-base-cased-sentence-rurevdict