Toxic Speech Structured Analysis (Gemma 3 1B IT)
Model Description
This model is fine-tuned from google/gemma-3-1b-it for structured toxic speech analysis in English and Turkish.
Instead of returning open-ended chat responses, it is trained to extract three structured fields:
- `number_of_bad_words`
- `tags`
- `insult_word`
The model is designed for short social text such as comments, captions, and forum-like messages.
Task
Given an input text, the model predicts:
- The number of offensive expressions.
- The toxicity label(s).
- The insulting/offensive words or phrases.
Supported label set:
- Strong Insult
- Toxic
- Sarcastic-Mocking
- Strong Insult|Threat
- Mobbing
- Mild Insult
- Discriminatory
- Threat
- Passive-Aggressive
- Strong Insult|Discriminatory
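Compound labels above join two base labels with a pipe character. A minimal sketch for splitting them back into individual tags (the helper name `split_tags` is illustrative, assuming `|` is the only separator, as in the label set above):

```python
def split_tags(tags: str) -> list[str]:
    """Split a possibly compound tag string (e.g. "Strong Insult|Threat")
    into its individual labels, dropping surrounding whitespace."""
    return [t.strip() for t in tags.split("|") if t.strip()]

print(split_tags("Strong Insult|Threat"))  # ['Strong Insult', 'Threat']
print(split_tags("Toxic"))                 # ['Toxic']
```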
Training Data
- Dataset: berkeruveyik/toxic-speech-annotated-dataset
- Languages: English and Turkish
- Supervision format: conversational (`user` input + `assistant` structured target)
Example target format:

```
number_of_bad_words: 2
tags: Strong Insult
insult_word: stupid, ugly
```
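The target above is plain `key: value` lines, so downstream code can recover a structured record from the model's raw text. A minimal parsing sketch (the helper name `parse_analysis` is illustrative; field names come from the format above):

```python
def parse_analysis(text: str) -> dict:
    """Parse the model's structured output (one `key: value` pair per line)
    into a dict; fields missing from the output stay None."""
    fields = {"number_of_bad_words": None, "tags": None, "insult_word": None}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in fields:
            fields[key] = value.strip()
    # Coerce the count to int where possible; keep the raw string otherwise.
    if fields["number_of_bad_words"] is not None:
        try:
            fields["number_of_bad_words"] = int(fields["number_of_bad_words"])
        except ValueError:
            pass
    return fields

example = "number_of_bad_words: 2\ntags: Strong Insult\ninsult_word: stupid, ugly"
print(parse_analysis(example))
```

Parsing defensively like this also makes it easy to measure the parse-valid output rate suggested under Suggested Evaluation Metrics.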
Training Procedure
The model was trained with supervised fine-tuning using TRL (SFTTrainer) on structured conversation data.
Typical training setup:
- `max_length=512`
- `num_train_epochs=10`
- `per_device_train_batch_size=8`
- `per_device_eval_batch_size=8`
- `gradient_accumulation_steps=2`
- `learning_rate=2e-5`
- `lr_scheduler_type=cosine`
- `warmup_ratio=0.1`
- `weight_decay=0.01`
- `save_strategy=epoch`
- `eval_strategy=epoch`
- `load_best_model_at_end=True`
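These hyperparameters map directly onto TRL's `SFTConfig`. A minimal training sketch, not the exact script used for this model — the `output_dir` value and the dataset split handling are assumptions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Dataset named in this card; split layout is an assumption.
dataset = load_dataset("berkeruveyik/toxic-speech-annotated-dataset")

# Hyperparameters from the setup above; output_dir is a placeholder.
config = SFTConfig(
    output_dir="gemma3-toxic-analysis",
    max_length=512,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    save_strategy="epoch",
    eval_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",     # base model from this card
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("validation"),
)
trainer.train()
```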
Inference
For better format consistency, use deterministic decoding (do_sample=False).
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL_ID = "YOUR_USERNAME/YOUR_MODEL_REPO"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", dtype="auto")
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

text = "You are useless and disgusting."
messages = [{"role": "user", "content": text}]

# Render the chat template to a prompt string and decode deterministically.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=128, do_sample=False)

# Strip the echoed prompt so only the structured analysis remains.
print(out[0]["generated_text"][len(prompt):])
```
Intended Use
- Toxicity signal extraction for moderation workflows
- Annotation support
- Safety analysis prototypes for EN/TR text
Limitations
- May miss implicit abuse, sarcasm, or context-heavy harassment.
- Extracted insult words can be incomplete or noisy.
- Performance may drop on long inputs, slang drift, or unseen domains.
- Should not be the sole basis for legal, HR, or punitive decisions.
Responsible Use
- Keep human review for high-stakes cases.
- Monitor false positives and false negatives.
- Re-evaluate regularly across both languages and target domains.
Suggested Evaluation Metrics
- Tag-level Precision / Recall / F1
- Exact-match rate for full structured output
- MAE on `number_of_bad_words`
- Parse-valid output rate
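The tag-level metrics above can be computed with plain set arithmetic. A minimal sketch using micro-averaging, assuming predictions and references have already been parsed into per-example tag sets (names here are illustrative):

```python
def tag_prf(preds: list[set[str]], refs: list[set[str]]) -> tuple[float, float, float]:
    """Micro-averaged tag-level precision, recall, and F1 over
    per-example sets of predicted and reference tags."""
    tp = sum(len(p & r) for p, r in zip(preds, refs))
    pred_total = sum(len(p) for p in preds)
    ref_total = sum(len(r) for r in refs)
    precision = tp / pred_total if pred_total else 0.0
    recall = tp / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

preds = [{"Strong Insult"}, {"Threat", "Mobbing"}]
refs = [{"Strong Insult"}, {"Threat"}]
print(tag_prf(preds, refs))
```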
Credits
- Base model: google/gemma-3-1b-it
- Dataset: berkeruveyik/toxic-speech-annotated-dataset
- License: Apache-2.0