Toxic Speech Structured Analysis (Gemma 3 1B IT)
Model Description
This model is fine-tuned from google/gemma-3-1b-it for structured toxic speech analysis in English and Turkish.
Instead of returning open-ended chat responses, it is trained to extract three structured fields:
- `number_of_bad_words`
- `tags`
- `insult_word`
The model is designed for short social text such as comments, captions, and forum-like messages.
Task
Given an input text, the model predicts:
- The number of offensive expressions.
- The toxicity label(s).
- The insulting/offensive words or phrases.
Supported label set:
- Strong Insult
- Toxic
- Sarcastic-Mocking
- Strong Insult|Threat
- Mobbing
- Mild Insult
- Discriminatory
- Threat
- Passive-Aggressive
- Strong Insult|Discriminatory
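Compound labels above join two base labels with a pipe character. A minimal sketch for splitting them back into individual tags (the helper name `split_tags` is illustrative, assuming `|` is the only separator, as in the label set above):

```python
def split_tags(tags: str) -> list[str]:
    """Split a possibly compound tag string (e.g. "Strong Insult|Threat")
    into its individual labels, dropping surrounding whitespace."""
    return [t.strip() for t in tags.split("|") if t.strip()]

print(split_tags("Strong Insult|Threat"))  # ['Strong Insult', 'Threat']
print(split_tags("Toxic"))                 # ['Toxic']
```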
Training Data
- Dataset: berkeruveyik/toxic-speech-annotated-dataset
- Languages: English and Turkish
- Supervision format: conversational (`user` input + `assistant` structured target)
Example target format:

```
number_of_bad_words: 2
tags: Strong Insult
insult_word: stupid, ugly
```
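The target above is plain `key: value` lines, so downstream code can recover a structured record from the model's raw text. A minimal parsing sketch (the helper name `parse_analysis` is illustrative; field names come from the format above):

```python
def parse_analysis(text: str) -> dict:
    """Parse the model's structured output (one `key: value` pair per line)
    into a dict; fields missing from the output stay None."""
    fields = {"number_of_bad_words": None, "tags": None, "insult_word": None}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in fields:
            fields[key] = value.strip()
    # Coerce the count to int where possible; keep the raw string otherwise.
    if fields["number_of_bad_words"] is not None:
        try:
            fields["number_of_bad_words"] = int(fields["number_of_bad_words"])
        except ValueError:
            pass
    return fields

example = "number_of_bad_words: 2\ntags: Strong Insult\ninsult_word: stupid, ugly"
print(parse_analysis(example))
```

Parsing defensively like this also makes it easy to measure the parse-valid output rate suggested under Suggested Evaluation Metrics.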
Training Procedure
The model was trained with supervised fine-tuning using TRL (SFTTrainer) on structured conversation data.
Typical training setup:
- `max_length=512`
- `num_train_epochs=10`
- `per_device_train_batch_size=8`
- `per_device_eval_batch_size=8`
- `gradient_accumulation_steps=2`
- `learning_rate=2e-5`
- `lr_scheduler_type=cosine`
- `warmup_ratio=0.1`
- `weight_decay=0.01`
- `save_strategy=epoch`
- `eval_strategy=epoch`
- `load_best_model_at_end=True`
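These hyperparameters map directly onto TRL's `SFTConfig`. A minimal training sketch, not the exact script used for this model — the `output_dir` value and the dataset split handling are assumptions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset named in this card; split layout is an assumption.
dataset = load_dataset("berkeruveyik/toxic-speech-annotated-dataset")

# Hyperparameters from the setup above; output_dir is a placeholder.
config = SFTConfig(
    output_dir="gemma3-toxic-analysis",
    max_length=512,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    save_strategy="epoch",
    eval_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",     # base model from this card
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("validation"),
)
trainer.train()
```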
Inference
For better format consistency, use deterministic decoding (do_sample=False).
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL_ID = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", dtype="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

text = "You are useless and disgusting."
messages = [{"role": "user", "content": text}]

# Render the chat template to a prompt string and decode deterministically.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=128, do_sample=False)

# Strip the echoed prompt so only the structured analysis remains.
print(out[0]["generated_text"][len(prompt):])
```
Intended Use
- Toxicity signal extraction for moderation workflows
- Annotation support
- Safety analysis prototypes for EN/TR text
Limitations
- May miss implicit abuse, sarcasm, or context-heavy harassment.
- Extracted insult words can be incomplete or noisy.
- Performance may drop on long inputs, slang drift, or unseen domains.
- Should not be the sole basis for legal, HR, or punitive decisions.
Responsible Use
- Keep human review for high-stakes cases.
- Monitor false positives and false negatives.
- Re-evaluate regularly across both languages and target domains.
Suggested Evaluation Metrics
- Tag-level Precision / Recall / F1
- Exact-match rate for full structured output
- MAE on `number_of_bad_words`
- Parse-valid output rate
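The tag-level metrics above can be computed with plain set arithmetic. A minimal sketch using micro-averaging, assuming predictions and references have already been parsed into per-example tag sets (names here are illustrative):

```python
def tag_prf(preds: list[set[str]], refs: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged tag-level precision, recall, and F1 over
    per-example sets of predicted and reference tags."""
    tp = sum(len(p & r) for p, r in zip(preds, refs))
    pred_total = sum(len(p) for p in preds)
    ref_total = sum(len(r) for r in refs)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

preds = [{"Strong Insult"}, {"Threat", "Mobbing"}]
refs = [{"Strong Insult"}, {"Threat"}]
print(tag_prf(preds, refs))
```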
Credits
- Base model: google/gemma-3-1b-it
- Dataset: berkeruveyik/toxic-speech-annotated-dataset
- License: Apache-2.0