snagbreac/russian-reverse-dictionary-semsearch
This sentence-transformers model was trained on Russian definition-word pairs drawn from dictionary data and crosswords (specifically the data presented here). As such, it can serve as a reverse dictionary when used as the encoder in a semantic search pipeline.
Usage (Sentence-Transformers)
Using this model is straightforward once you have sentence-transformers installed:
pip install -U sentence-transformers
Then you can use the model like this:
from sentence_transformers import SentenceTransformer
sentences = ["This is an example sentence", "Each sentence is converted"]
model = SentenceTransformer('snagbreac/rubert-base-cased-sentence-rurevdict')
embeddings = model.encode(sentences)
print(embeddings)
Usage as a Russian reverse dictionary with semantic search
For semantic search, you will also need a list of Russian lemmas. I used a custom list compiled from Zaliznyak dictionary data and Russian Wiktionary data (with pymorphy2 morphological analysis), which I unfortunately cannot publish; however, any list will do as long as it is large enough and known to contain only Russian-language lemmas.
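As a sketch of the corpus side, assuming your lemma list is a plain UTF-8 text file with one lemma per line (the filename `lemmas.txt` below is hypothetical), loading and de-duplicating it is enough to get a search corpus:

```python
from pathlib import Path

def load_lemmas(path):
    """Read, normalize, and de-duplicate a one-lemma-per-line file."""
    seen = set()
    lemmas = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        lemma = line.strip().lower()
        if lemma and lemma not in seen:
            seen.add(lemma)
            lemmas.append(lemma)
    return lemmas

# Example: write a tiny sample file, then load it.
Path("lemmas.txt").write_text("дом\nкошка\nДом\n\nкнига\n", encoding="utf-8")
print(load_lemmas("lemmas.txt"))  # → ['дом', 'кошка', 'книга']
```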
Beyond using lemmas specifically as the corpus to search over, the process is standard semantic search. First, encode all lemmas in your list as described above (use a GPU if possible; otherwise this can take a while). Then encode the query as well, compare it against every lemma embedding, and return only the most similar entries (how many to return is up to you; I'd recommend 100). SBERT provides excellent code snippets for semantic search that you can use for this.
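The ranking step above can be sketched without the model itself: given precomputed embeddings for the lemma corpus and the query (in real use, the outputs of `model.encode(lemma_list)` and `model.encode([definition])`), cosine similarity plus a top-k sort yields the candidate words. `sentence_transformers.util.semantic_search` implements this same logic; the random vectors below are stand-ins so the sketch is self-contained.

```python
import numpy as np

def top_k_lemmas(query_emb, corpus_embs, lemmas, k=100):
    """Rank lemmas by cosine similarity to the query embedding."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                    # cosine similarity to every lemma
    order = np.argsort(-sims)[:k]   # indices of the k best matches
    return [(lemmas[i], float(sims[i])) for i in order]

# Stand-in data: in real use these embeddings come from the model.
rng = np.random.default_rng(0)
lemmas = ["дом", "кошка", "собака", "книга"]
corpus_embs = rng.normal(size=(4, 8))
# A query embedding that is nearly identical to the "собака" row.
query_emb = corpus_embs[2] + 0.01 * rng.normal(size=8)

print(top_k_lemmas(query_emb, corpus_embs, lemmas, k=2))
```

With real embeddings the only change is where the vectors come from; the comparison and top-k selection are identical.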
Model tree for snagbreac/rubert-base-cased-sentence-rurevdict
Base model
DeepPavlov/rubert-base-cased-sentence