Toxic Speech Structured Analysis (Gemma 3 1B IT)

Model Description

This model is fine-tuned from google/gemma-3-1b-it for structured toxic speech analysis in English and Turkish.

Instead of returning open-ended chat responses, it is trained to extract:

  • number_of_bad_words
  • tags
  • insult_word

The model is designed for short social text such as comments, captions, and forum-like messages.

Task

Given an input text, the model predicts:

  • The number of offensive expressions.
  • The toxicity label(s).
  • The insulting/offensive words or phrases.

Supported label set:

  • Strong Insult
  • Toxic
  • Sarcastic-Mocking
  • Strong Insult|Threat
  • Mobbing
  • Mild Insult
  • Discriminatory
  • Threat
  • Passive-Aggressive
  • Strong Insult|Discriminatory

Training Data

  • Dataset: berkeruveyik/toxic-speech-annotated-dataset
  • Languages: English and Turkish
  • Supervision format: conversational (user input + assistant structured target)

Example target format:

number_of_bad_words: 2
tags: Strong Insult
insult_word: stupid, ugly
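
Since the output is plain `key: value` lines, downstream code needs a small parser. The sketch below is an assumption about how one might consume the format shown above (the model card does not ship a parser); it returns `None` for malformed outputs, which also gives you the parse-valid rate suggested under evaluation.

```python
# Hypothetical parser for the "key: value" target format above.
# The allowed tag set is copied from this card's "Supported label set".
ALLOWED_TAGS = {
    "Strong Insult", "Toxic", "Sarcastic-Mocking", "Strong Insult|Threat",
    "Mobbing", "Mild Insult", "Discriminatory", "Threat",
    "Passive-Aggressive", "Strong Insult|Discriminatory",
}

def parse_output(raw: str):
    """Parse the model's structured output into a dict.

    Returns None when an expected field is missing or malformed,
    so callers can track a parse-valid rate.
    """
    fields = {}
    for line in raw.strip().splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    try:
        return {
            "number_of_bad_words": int(fields["number_of_bad_words"]),
            "tags": fields["tags"],
            "tags_valid": fields["tags"] in ALLOWED_TAGS,
            "insult_word": [w.strip() for w in fields["insult_word"].split(",") if w.strip()],
        }
    except (KeyError, ValueError):
        return None

example = "number_of_bad_words: 2\ntags: Strong Insult\ninsult_word: stupid, ugly"
parsed = parse_output(example)
```
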

Training Procedure

The model was trained with supervised fine-tuning using TRL (SFTTrainer) on structured conversation data.

Typical training setup:

  • max_length=512
  • num_train_epochs=10
  • per_device_train_batch_size=8
  • per_device_eval_batch_size=8
  • gradient_accumulation_steps=2
  • learning_rate=2e-5
  • lr_scheduler_type=cosine
  • warmup_ratio=0.1
  • weight_decay=0.01
  • save_strategy=epoch
  • eval_strategy=epoch
  • load_best_model_at_end=True
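
The hyperparameters above can be mapped onto TRL roughly as follows. This is a sketch, not the author's training script: the split names are assumed, and argument names vary across TRL versions (e.g. `max_length` was `max_seq_length` before TRL 0.13, and `eval_strategy` was `evaluation_strategy` in older transformers).

```python
# Sketch: reproducing the listed setup with TRL's SFTTrainer.
# Dataset/split names and some kwarg names are assumptions.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("berkeruveyik/toxic-speech-annotated-dataset")

config = SFTConfig(
    output_dir="gemma-3-1b-toxic-sft",
    max_length=512,
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    weight_decay=0.01,
    save_strategy="epoch",
    eval_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = SFTTrainer(
    model="google/gemma-3-1b-it",   # TRL accepts a model id string
    args=config,
    train_dataset=dataset["train"],
    eval_dataset=dataset.get("validation"),  # assumed split name
)
trainer.train()
```
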

Inference

For better format consistency, use deterministic decoding (do_sample=False).

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

MODEL_ID = "berkeruveyik/toxic-speech-finetune-with-gemma-3-1b-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto", torch_dtype="auto")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

text = "You are useless and disgusting."
messages = [{"role": "user", "content": text}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Deterministic decoding keeps the structured output format stable.
out = pipe(prompt, max_new_tokens=128, do_sample=False)
print(out[0]["generated_text"][len(prompt):])

Intended Use

  • Toxicity signal extraction for moderation workflows
  • Annotation support
  • Safety analysis prototypes for EN/TR text

Limitations

  • May miss implicit abuse, sarcasm, or context-heavy harassment.
  • Extracted insult words can be incomplete or noisy.
  • Performance may drop on long inputs, slang drift, or unseen domains.
  • Should not be the sole basis for legal, HR, or punitive decisions.

Responsible Use

  • Keep human review for high-stakes cases.
  • Monitor false positives and false negatives.
  • Re-evaluate regularly across both languages and target domains.

Suggested Evaluation Metrics

  • Tag-level Precision / Recall / F1
  • Exact-match rate for full structured output
  • number_of_bad_words MAE
  • Parse-valid output rate
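
As one possible concrete reading of these metrics (not an official evaluation script), the sketch below computes them over parsed predictions, where unparseable outputs are represented as `None`. Tag precision/recall/F1 treats each `|`-joined label as a set of tags; the dict keys follow the parsed-output shape assumed earlier in this card.

```python
# Hypothetical metric computation over parsed model outputs.
# preds: list of dicts (or None for unparseable outputs); golds: reference dicts.
def evaluate(preds, golds):
    n = len(golds)
    parse_valid_rate = sum(p is not None for p in preds) / n
    exact_match = sum(p == g for p, g in zip(preds, golds)) / n
    # MAE over the predictions that parsed successfully.
    pairs = [(p, g) for p, g in zip(preds, golds) if p is not None]
    mae = sum(abs(p["number_of_bad_words"] - g["number_of_bad_words"])
              for p, g in pairs) / max(len(pairs), 1)
    # Tag-level micro precision/recall/F1; "A|B" counts as the tag set {A, B}.
    tp = fp = fn = 0
    for p, g in zip(preds, golds):
        pred_tags = set(p["tags"].split("|")) if p else set()
        gold_tags = set(g["tags"].split("|"))
        tp += len(pred_tags & gold_tags)
        fp += len(pred_tags - gold_tags)
        fn += len(gold_tags - pred_tags)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"parse_valid_rate": parse_valid_rate, "exact_match": exact_match,
            "mae": mae, "precision": precision, "recall": recall, "f1": f1}

golds = [
    {"number_of_bad_words": 2, "tags": "Strong Insult", "insult_word": ["stupid", "ugly"]},
    {"number_of_bad_words": 1, "tags": "Threat", "insult_word": ["x"]},
]
preds = [golds[0], None]  # one perfect prediction, one unparseable output
metrics = evaluate(preds, golds)
```
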

Credits

  • Base model: google/gemma-3-1b-it
  • Dataset: berkeruveyik/toxic-speech-annotated-dataset
  • License: Apache-2.0