ReallyNotMe committed on
Commit 64940b9 · verified · 1 Parent(s): cbf9345

Update README.md

Files changed (1)
  1. README.md +0 -55
README.md CHANGED
@@ -1,55 +0,0 @@
1
- ---
2
- language:
3
- - ru
4
- tags:
5
- - sentiment
6
- - text-classification
7
- ---
8
-
9
- # RuBERT for Sentiment Analysis
10
- Sentiment classification for short Russian texts
11
-
12
- This is a [DeepPavlov/rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model trained on an aggregated corpus of 351,797 texts.
13
-
14
- ## Labels
15
- 0: NEUTRAL
16
- 1: POSITIVE
17
- 2: NEGATIVE
18
-
19
- ## How to use
20
- ```python
21
-
22
- import torch
23
- from transformers import AutoModelForSequenceClassification
24
- from transformers import BertTokenizerFast
25
-
26
- tokenizer = BertTokenizerFast.from_pretrained('blanchefort/rubert-base-cased-sentiment')
27
- model = AutoModelForSequenceClassification.from_pretrained('blanchefort/rubert-base-cased-sentiment', return_dict=True)
28
-
29
- @torch.no_grad()
30
- def predict(text):
31
-     inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
32
-     outputs = model(**inputs)
33
-     predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
34
-     predicted = torch.argmax(predicted, dim=1).numpy()
35
-     return predicted
36
- ```
37
-
38
-
39
- ## Datasets used for model training
40
-
41
- **[RuTweetCorp](https://study.mokoron.com/)**
42
-
43
- > Rubtsova Yu. Automatic construction and analysis of a corpus of short texts (microblog posts) for the task of developing and training a sentiment classifier // Knowledge Engineering and Semantic Web Technologies. – 2012. – Vol. 1. – pp. 109-116.
44
-
45
- **[RuReviews](https://github.com/sismetanin/rureviews)**
46
-
47
- > RuReviews: An Automatically Annotated Sentiment Analysis Dataset for Product Reviews in Russian.
48
-
49
- **[RuSentiment](http://text-machine.cs.uml.edu/projects/rusentiment/)**
50
-
51
- > A. Rogers, A. Romanov, A. Rumshisky, S. Volkova, M. Gronas, A. Gribov. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. Proceedings of COLING 2018.
52
-
53
- **[Reviews of medical institutions](https://github.com/blanchefort/datasets/tree/master/medical_comments)**
54
-
55
- > The dataset contains user reviews of medical institutions. It was collected in May 2019 from the site prodoctorov.ru
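
Note on the removed usage snippet: `predict` in the deleted README returns integer class ids, which only make sense together with the "Labels" list that this commit also removes. The sketch below shows one way those ids could be mapped back to label names. It is an illustration, not part of the original README: `ID2LABEL`, `softmax`, `decode`, and the sample logits are hypothetical names and values, and plain Python stands in for `torch` so the example runs without downloading the model.

```python
import math

# Label ids as listed in the removed README's "Labels" section.
ID2LABEL = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits_batch):
    """Turn a batch of raw logits into (label, probability) pairs."""
    results = []
    for logits in logits_batch:
        probs = softmax(logits)
        # Index of the highest probability, i.e. the argmax over classes.
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((ID2LABEL[best], probs[best]))
    return results


# Hypothetical logits for two texts. With the real model these would come
# from something like outputs.logits.tolist() inside predict().
example_logits = [[0.1, 2.3, -1.0], [1.5, -0.2, 0.3]]
decoded = decode(example_logits)
for label, prob in decoded:
    print(f"{label}: {prob:.3f}")
```

Because softmax is monotonic, the argmax over probabilities equals the argmax over raw logits; the softmax step is only needed when the probability itself is reported, as it is here.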