Egor-3926/ToxicLord

@@ -16,52 +16,56 @@ model-index:
     results:
       - task:
           type: text-classification
-          name: Text Classification
         dataset:
-          name: Internal held-out toxicity test split
           type: private
         metrics:
           - type: accuracy
             value: 0.968937125748503
-            name: Accuracy
           - type: precision
             value: 0.9309514251304697
-            name: Toxic precision
           - type: recall
             value: 0.905152224824356
-            name: Toxic recall
           - type: f1
             value: 0.9178705719374629
-            name: Toxic F1
           - type: f1
             value: 0.9493585102268198
             name: Macro F1
 ---
 # ToxicLord v1
-ToxicLord v1 is a Russian text classification model for chat moderation. It classifies messages as `clean` or `toxic` and is tuned for short Russian Telegram-style messages.
-The model is intended for assistive moderation workflows. It can make mistakes and should be used with logging, review, and project-specific thresholds.
-## Labels
 ```text
 0: clean
 1: toxic
 ```
-## Recommended Threshold
-For conservative Telegram moderation, use the toxic probability instead of only argmax:
 ```text
-toxic if P(toxic) >= 0.90
 ```
-## Evaluation
-Internal held-out test split:
 ```text
 accuracy:        0.9689
@@ -71,14 +75,14 @@ f1_toxic:        0.9179
 macro_f1:        0.9494
 ```
-External fixed benchmark samples at threshold `0.90`:
 ```text
-Telegram clean chat sample: 2/500 triggered, 0.4% trigger rate
-Toxic sample:               364/500 triggered, 72.8% trigger rate
 ```
-## Usage
 ```python
 import torch
@@ -102,19 +106,21 @@ label = "toxic" if toxic_score >= 0.90 else "clean"
 print(label, toxic_score)
 ```
-## Training Data
-The model was fine-tuned on a mixture of public Russian toxicity datasets and private moderation annotations/corrections. Raw training data, Telegram logs, user identifiers, and private annotations are not redistributed with this model.
-## Limitations
-- The model is optimized for Russian Telegram-style moderation and may not transfer well to formal text, long documents, or other languages.
-- Short insults and slurs may be classified as toxic even without broader context.
-- Sarcasm, quotes, jokes, reclaimed language, and moderation discussions can be misclassified.
-- The model should not be used as the only source of truth for irreversible moderation actions.
-## License
-This model is released under `cc-by-nc-nd-4.0`.
-Non-commercial use is allowed with attribution. Commercial use and derivative redistribution are not allowed under this license.

     results:
       - task:
           type: text-classification
+          name: Text classification
         dataset:
+          name: Internal toxicity test set
           type: private
         metrics:
           - type: accuracy
             value: 0.968937125748503
+            name: Accuracy
           - type: precision
             value: 0.9309514251304697
+            name: Toxic-class precision
           - type: recall
             value: 0.905152224824356
+            name: Toxic-class recall
           - type: f1
             value: 0.9178705719374629
+            name: Toxic-class F1
           - type: f1
             value: 0.9493585102268198
             name: Macro F1
 ---
+<p align="center">
+  <img src="ToxicLord.png" alt="ToxicLord" width="420"/>
+</p>
 # ToxicLord v1
+ToxicLord v1 is a Russian-language toxicity classification model for chat moderation. It classifies short messages as `clean` or `toxic` and is tuned for Telegram-style chats.
+The model is intended to assist moderation. It can make mistakes, so real-world use should include logging, manual review of borderline cases, and a threshold tuned to the specific community.
+## Labels
 ```text
 0: clean
 1: toxic
 ```
+## Recommended threshold
+For cautious Telegram moderation, use the toxic-class probability rather than `argmax` alone:
 ```text
+toxic if P(toxic) >= 0.90
 ```
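A minimal sketch of the thresholding rule above, assuming the classifier returns two logits in label order (`clean`, `toxic`). The helper names `toxic_probability` and `decide` are illustrative and not part of the model's API; the softmax is written in plain Python so the snippet runs without loading the model.

```python
import math

TOXIC_THRESHOLD = 0.90  # recommended value from the model card


def toxic_probability(logits):
    """Softmax over [clean_logit, toxic_logit]; returns P(toxic)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def decide(logits, threshold=TOXIC_THRESHOLD):
    """Return (label, P(toxic)) using a probability threshold, not argmax."""
    p = toxic_probability(logits)
    label = "toxic" if p >= threshold else "clean"
    return label, p


# argmax would pick class 1 here, but P(toxic) is below 0.90, so the label is "clean"
print(decide([0.0, 1.0]))
# a larger margin pushes P(toxic) above the threshold
print(decide([0.0, 3.0]))
```

Raising the threshold trades recall on toxic messages for fewer false triggers on ordinary chat, which is why the card suggests tuning it per community.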
+## Metrics
+Internal test set:
 ```text
 accuracy:        0.9689
 macro_f1:        0.9494
 ```
+External fixed test samples at threshold `0.90`:
 ```text
+Regular Telegram chat: 2/500 triggered, 0.4%
+Toxic sample:          364/500 triggered, 72.8%
 ```
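The reported figures can be cross-checked with a few lines of arithmetic. This sketch recomputes the toxic-class F1 from the precision and recall in the card's metadata and the trigger rates from the raw counts. The clean-class F1 is not reported anywhere in the card; the value derived here assumes Macro F1 is the unweighted mean of the two per-class F1 scores.

```python
# Values copied from the model card metadata
precision_toxic = 0.9309514251304697
recall_toxic = 0.905152224824356
macro_f1 = 0.9493585102268198

# F1 is the harmonic mean of precision and recall
f1_toxic = 2 * precision_toxic * recall_toxic / (precision_toxic + recall_toxic)

# Assumption: macro F1 = (f1_clean + f1_toxic) / 2 for the two classes
f1_clean = 2 * macro_f1 - f1_toxic


def trigger_rate(triggered, total):
    """Share of messages whose P(toxic) reached the threshold."""
    return triggered / total


print(f"f1_toxic: {f1_toxic:.4f}")
print(f"f1_clean (derived, assumption): {f1_clean:.4f}")
print(f"clean chat trigger rate: {trigger_rate(2, 500):.1%}")
print(f"toxic sample trigger rate: {trigger_rate(364, 500):.1%}")
```

On the clean chat sample the trigger rate is a false-positive rate, while on the toxic sample it is the recall at threshold `0.90`.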
+## Usage
 ```python
 import torch
 print(label, toxic_score)
 ```
+## Training data
+The model was fine-tuned on a mixture of public Russian-language toxicity datasets and private moderation annotations/corrections.
+Raw training data, Telegram logs, user identifiers, and private annotations are not distributed with the model.
+## Limitations
+- The model is optimized for Russian-language Telegram chats and may perform worse on formal text, long documents, and other languages.
+- Short insults and toxic labels may be classified as toxic even without broader context.
+- Sarcasm, quotes, jokes, discussion of the rules, and meta-commentary may be misclassified.
+- The model should not be used as the sole basis for irreversible sanctions.
+## License
+The model is published under the `cc-by-nc-nd-4.0` license.
+Non-commercial use with attribution is permitted. Commercial use and distribution of derivative versions are prohibited by the license terms.