Update README.md
README.md
CHANGED
@@ -1,55 +0,0 @@
---
language:
- ru
tags:
- sentiment
- text-classification
---

# RuBERT for Sentiment Analysis
Sentiment classification of short Russian texts

This is a [DeepPavlov/rubert-base-cased-conversational](https://huggingface.co/DeepPavlov/rubert-base-cased-conversational) model trained on an aggregated corpus of 351,797 texts.

## Labels
0: NEUTRAL
1: POSITIVE
2: NEGATIVE
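
The class ids above can be turned back into readable names with a small lookup. The following is a minimal sketch; the names `ID2LABEL` and `to_label_names` are hypothetical helpers introduced here for illustration and are not part of the model or of `transformers`.

```python
# Label ids exactly as listed in the Labels section of this README.
ID2LABEL = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}


def to_label_names(class_ids):
    """Map an iterable of integer class ids to their label names.

    Each element is passed through int() so that NumPy integers
    (such as those in an argmax result) are accepted as well.
    """
    return [ID2LABEL[int(i)] for i in class_ids]


# Example with hand-written ids standing in for real predictions.
print(to_label_names([1, 0, 2]))
```

An array of predicted class ids, such as the one returned by an argmax over the model's logits, can be passed to this helper directly.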

## How to use
```python

import torch
from transformers import AutoModelForSequenceClassification
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained('blanchefort/rubert-base-cased-sentiment')
model = AutoModelForSequenceClassification.from_pretrained('blanchefort/rubert-base-cased-sentiment', return_dict=True)

@torch.no_grad()
def predict(text):
    # Tokenize a string or a list of strings, truncating to 512 tokens
    inputs = tokenizer(text, max_length=512, padding=True, truncation=True, return_tensors='pt')
    outputs = model(**inputs)
    # Convert logits to class probabilities, then pick the most likely class id
    predicted = torch.nn.functional.softmax(outputs.logits, dim=1)
    predicted = torch.argmax(predicted, dim=1).numpy()
    return predicted
```
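
The post-processing step in `predict` (softmax followed by argmax) can be illustrated without loading the model. The sketch below uses plain Python and made-up logits; `softmax`, `argmax`, and `example_logits` are illustrative names, not part of the model's API. It also shows that softmax does not change which class wins, so it is only needed when the probabilities themselves are of interest.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest element (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up logits for one text, ordered as [NEUTRAL, POSITIVE, NEGATIVE].
example_logits = [0.2, 2.5, -1.0]
probs = softmax(example_logits)

# Softmax is monotonic, so the winning class is the same before and after it.
assert argmax(probs) == argmax(example_logits)
print(argmax(probs), [round(p, 3) for p in probs])
```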


## Datasets used for model training

**[RuTweetCorp](https://study.mokoron.com/)**

> Rubtsova Y. Automatic construction and analysis of a corpus of short texts (microblog posts) for the task of developing and training a sentiment classifier // Knowledge Engineering and Semantic Web Technologies. 2012. Vol. 1. pp. 109-116.

**[RuReviews](https://github.com/sismetanin/rureviews)**

> RuReviews: An Automatically Annotated Sentiment Analysis Dataset for Product Reviews in Russian.

**[RuSentiment](http://text-machine.cs.uml.edu/projects/rusentiment/)**

> A. Rogers, A. Romanov, A. Rumshisky, S. Volkova, M. Gronas, A. Gribov. RuSentiment: An Enriched Sentiment Analysis Dataset for Social Media in Russian. Proceedings of COLING 2018.

**[Reviews of medical institutions](https://github.com/blanchefort/datasets/tree/master/medical_comments)**

> The dataset contains user reviews of medical institutions. It was collected in May 2019 from prodoctorov.ru