PaliGemma 2 LoRA Adapter for Multi-Modal Hateful Content Classification
Model Overview
This is a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of google/paligemma2-3b-pt-224 for multi-label hateful content detection on paired text + image data using the MMHS150K dataset.
Key Features
- Multi-Modal Understanding: Processes both text and images simultaneously for context-aware classification
- Multi-Label Classification: Can detect multiple types of hate speech in a single sample
- Generative Approach: Uses generative classification instead of traditional classification heads
- Efficient Fine-Tuning: LoRA adapter whose trainable weights total only ~24 MB on disk
- JSON Output: Generates structured JSON arrays for easy downstream processing
This model uses generative classification: instead of training a dedicated classification head, the model generates a strict JSON array of labels (e.g., ["racist", "sexist"]).
Model Details
Model Description
Given an image and its associated text, the model outputs a JSON array containing zero or more labels from a fixed label set. The model is trained to classify hateful memes and social media content into multiple hate speech categories.
High-level flow:
- Build a strict "return JSON only" prompt listing allowed labels.
- Feed (text + image) to the VLM.
- Generate a short response.
- Parse the first JSON array found (best-effort JSON extraction).
- Convert labels into multi-hot predictions and compute multi-label metrics.
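The final step, converting parsed labels into multi-hot vectors, can be sketched in plain Python. The helpers below are illustrative (not the project's evaluation code); `CLASS_NAMES` mirrors the label set used throughout this card:

```python
CLASS_NAMES = ["racist", "sexist", "homophobe", "religion", "otherhate"]

def to_multi_hot(labels):
    """Map a list of label strings onto a fixed-order 0/1 vector."""
    return [1 if name in labels else 0 for name in CLASS_NAMES]

def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly across all samples."""
    errors = sum(t != p
                 for row_t, row_p in zip(y_true, y_pred)
                 for t, p in zip(row_t, row_p))
    return errors / (len(y_true) * len(CLASS_NAMES))

y_pred = [to_multi_hot(["racist", "sexist"]), to_multi_hot([])]
y_true = [to_multi_hot(["racist"]), to_multi_hot([])]
print(hamming_loss(y_true, y_pred))  # 0.1  (1 wrong slot out of 10)
```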
| Property | Value |
|---|---|
| Developed by | Amirhossein Yousefi |
| Model type | Vision-Language Model (VLM) with LoRA adapter |
| Language(s) | English |
| License | MIT |
| Base model | google/paligemma2-3b-pt-224 |
| Parameters (Base) | 3B |
| Adapter size | ~24 MB |
| Input | Text + Image (224Γ224) |
| Output | JSON array of hate speech labels |
Model Sources
| Resource | Link |
|---|---|
| Repository | github.com/amirhossein-yousefi/text_image_multi_modal_vlm |
| Base Model | google/paligemma2-3b-pt-224 |
| Dataset | MMHS150K |
Label Classes
The model classifies content into the following 5 hate speech categories:
| Label | Description | Examples |
|---|---|---|
| `racist` | Content with racial discrimination | Slurs, stereotypes, dehumanization based on race/ethnicity |
| `sexist` | Content with gender-based discrimination | Misogyny, gender stereotypes, harassment based on gender |
| `homophobe` | Content with anti-LGBTQ+ discrimination | Slurs, stereotypes targeting LGBTQ+ individuals |
| `religion` | Content with religious discrimination | Attacks on religious groups, religious stereotypes |
| `otherhate` | Other forms of hateful content | Hate not covered by the above categories |
Uses
Direct Use
This model is intended for detecting and classifying hateful content in multimodal (text + image) social media posts, memes, and similar content. It can be used for:
- Content moderation systems - Automated flagging of potentially harmful content
- Research on hate speech detection - Academic studies on multi-modal hate speech
- Social media analysis - Understanding patterns of hateful content
- Dataset annotation assistance - Semi-automated labeling of hate speech datasets
- Educational purposes - Understanding how VLMs can be applied to content moderation
Out-of-Scope Use
- Production moderation without human review: This model should not be the sole decision-maker for content removal.
- Non-English content: The model is trained on English data only.
- Single-modality analysis: Best results are achieved with both text and image inputs.
- Real-time high-stakes decisions: The model may produce errors and should not be used for legal or high-stakes decisions without human oversight.
- Surveillance or censorship: This model should not be used for mass surveillance or unjust censorship.
Bias, Risks, and Limitations
Known Limitations
- Dataset Bias: The model is trained on MMHS150K dataset which may contain biases present in the original annotations.
- Cultural Context: Performance may vary across different types of hateful content and cultural contexts.
- Error Rate: The model may produce false positives/negatives and should be used with human oversight.
- JSON Parsing: Generated JSON output may occasionally be malformed and require robust parsing.
- Temporal Bias: The model may not recognize new slurs, memes, or evolving hate speech patterns.
- Image Quality: Performance may degrade on low-quality, distorted, or heavily edited images.
Recommendations
- Always use human review for critical content moderation decisions.
- Validate model outputs against your specific use case before deployment.
- Consider the cultural and contextual limitations of the training data.
- Implement robust JSON parsing with fallback mechanisms.
- Regularly evaluate model performance on new data distributions.
- Combine with other moderation signals for production systems.
How to Get Started with the Model
Installation
```bash
pip install transformers peft torch pillow accelerate
```
Quick Start - Load the Model
```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel
import torch

# Model identifiers
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

# Load the base model
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",  # or "cpu" for CPU-only inference
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

# Load the processor
processor = AutoProcessor.from_pretrained(BASE_MODEL)

print("Model loaded successfully!")
```
Full Inference Example
```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load base model and adapter
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

processor = AutoProcessor.from_pretrained(BASE_MODEL)
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

# Prepare input
image = Image.open("path/to/image.jpg").convert("RGB")
text = "Some text to analyze"

# Create prompt
class_names = ["racist", "sexist", "homophobe", "religion", "otherhate"]
prompt = (
    f"Classify the following text and image into zero or more of these labels: "
    f"{class_names}. Return ONLY a JSON array of applicable labels. Text: {text}"
)

# Generate, then decode only the newly generated tokens
# (the raw generate() output also contains the prompt tokens)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
input_len = inputs["input_ids"].shape[-1]
result = processor.decode(outputs[0][input_len:], skip_special_tokens=True)
print(result)  # e.g., ["racist", "sexist"]
```
Note: This is a LoRA adapter and requires loading the base model first. You cannot use `AutoModel.from_pretrained()` directly on the adapter.
Batch Inference
```python
import json
import re

def parse_json_labels(response: str) -> list:
    """Extract a JSON array from the model response, with a safe fallback."""
    try:
        # Try to find a JSON array in the response
        match = re.search(r"\[.*?\]", response)
        if match:
            return json.loads(match.group())
    except json.JSONDecodeError:
        pass
    return []

def classify_batch(model, processor, images, texts, class_names):
    """Classify a batch of image-text pairs."""
    results = []
    for image, text in zip(images, texts):
        prompt = (
            f"Classify the following text and image into zero or more of these labels: "
            f"{class_names}. Return ONLY a JSON array of applicable labels. Text: {text}"
        )
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64)
        # Decode only the newly generated tokens, not the prompt
        input_len = inputs["input_ids"].shape[-1]
        response = processor.decode(outputs[0][input_len:], skip_special_tokens=True)
        results.append(parse_json_labels(response))
    return results
```
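Generated arrays can occasionally contain labels outside the allowed set or duplicates. A small post-processing helper (illustrative, not part of the training pipeline) keeps predictions well-formed:

```python
CLASS_NAMES = ["racist", "sexist", "homophobe", "religion", "otherhate"]

def sanitize_labels(labels):
    """Keep only known class names, lowercased, deduplicated, in order."""
    seen = []
    for label in labels:
        name = str(label).strip().lower()
        if name in CLASS_NAMES and name not in seen:
            seen.append(name)
    return seen

print(sanitize_labels(["Racist", "racist", "alien", "sexist"]))  # ['racist', 'sexist']
```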
Training Details
Training Data
MMHS150K (Multi-Modal Hate Speech) - A large-scale dataset for multi-modal hate speech detection containing ~150K tweets with associated images.
| Split | Samples | Description |
|---|---|---|
| Train | ~135,000 | Training samples |
| Validation | 5,000 | Validation samples |
| Test | ~10,000 | Held-out test samples |
Dataset structure:
- `train.csv`, `val.csv`, `test.csv` with columns: `text`, `image_path`, `labels`
- Labels are multi-hot encoded for: racist, sexist, homophobe, religion, otherhate
Data Source: Twitter/X posts with associated images, annotated for hate speech categories.
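Loading the splits described above takes only a few lines of standard-library Python. The multi-hot serialization in this sketch (`"1 0 0 0 0"`) is an assumption for illustration; adapt the parsing to the actual column format in the CSVs:

```python
import csv
import io

# In-memory stand-in for train.csv; the labels encoding shown here is assumed.
SAMPLE_CSV = (
    "text,image_path,labels\n"
    "some tweet,images/0001.jpg,1 0 0 0 0\n"
    "another tweet,images/0002.jpg,0 0 0 0 0\n"
)

rows = []
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    # Parse the multi-hot string into a list of ints
    row["labels"] = [int(x) for x in row["labels"].split()]
    rows.append(row)

print(rows[0]["labels"])  # [1, 0, 0, 0, 0]
```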
Training Procedure
Hardware Used
| Component | Specification |
|---|---|
| GPU | NVIDIA A100 (40GB/80GB HBM2e) |
| Platform | Google Colab Pro |
| GPU Memory | 40GB+ |
| Precision | bf16 (Brain Float 16) mixed precision |
| CUDA Version | 11.8+ |
Note: The NVIDIA A100 is a data center GPU based on the Ampere architecture, offering 40GB or 80GB of HBM2e memory with 1.6TB/s bandwidth. It provides excellent performance for large VLM fine-tuning tasks.
Training Hyperparameters
| Parameter | Value |
|---|---|
| Training regime | bf16 mixed precision |
| Optimizer | AdamW |
| Learning rate | 2e-4 |
| Batch size | 4 (with gradient accumulation) |
| Epochs | 1 |
| Max sequence length | 512 |
| Warmup steps | 100 |
LoRA Configuration
| Parameter | Value |
|---|---|
| LoRA rank (r) | 4 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | CAUSAL_LM |
| Bias | none |
| Adapter size on disk | ~24 MB |
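Assuming these hyperparameters map directly onto PEFT's `LoraConfig`, the adapter configuration would look roughly like this sketch:

```python
from peft import LoraConfig

# Sketch of the adapter configuration from the table above (assumed mapping).
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```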
Training Time & Throughput
| Metric | Value |
|---|---|
| Validation time | 458.13s (0:07:38) |
| Validation throughput | 10.914 samples/s |
| Epochs completed | 1.0 |
| Final validation loss | 0.3525 |
Evaluation
Testing Data, Factors & Metrics
Testing Data
| Dataset | Samples | Description |
|---|---|---|
| Validation set | 5,000 | MMHS150K validation split |
| Test set | ~10,000 | MMHS150K test split |
Metrics Explained
| Metric | Description | Interpretation |
|---|---|---|
| F1 Micro | Micro-averaged F1 score across all labels | Higher is better. Gives equal weight to each sample. |
| F1 Macro | Macro-averaged F1 score (unweighted mean) | Higher is better. Gives equal weight to each class. |
| Subset Accuracy | Exact match accuracy | Higher is better. All labels must match exactly. |
| Hamming Loss | Fraction of incorrectly predicted labels | Lower is better. Measures per-label errors. |
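For concreteness, three of these metrics can be computed from multi-hot vectors in a few lines of plain Python. This is an illustrative sketch, not the project's evaluation code; note how classes with no positives anywhere score 0 here, which drags the macro average down:

```python
def f1_micro(y_true, y_pred):
    """Micro F1: pool TP/FP/FN over every label slot in every sample."""
    pairs = [(t, p) for rt, rp in zip(y_true, y_pred) for t, p in zip(rt, rp)]
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_macro(y_true, y_pred):
    """Macro F1: per-class F1, then an unweighted mean over classes."""
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(rt[c] == 1 and rp[c] == 1 for rt, rp in zip(y_true, y_pred))
        fp = sum(rt[c] == 0 and rp[c] == 1 for rt, rp in zip(y_true, y_pred))
        fn = sum(rt[c] == 1 and rp[c] == 0 for rt, rp in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes

def subset_accuracy(y_true, y_pred):
    """Exact-match accuracy: every label in a row must agree."""
    return sum(rt == rp for rt, rp in zip(y_true, y_pred)) / len(y_true)

y_true = [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]
y_pred = [[1, 1, 0, 0, 0], [0, 1, 0, 0, 0]]
print(f1_micro(y_true, y_pred))         # 0.8
print(subset_accuracy(y_true, y_pred))  # 0.5
```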
Results
This Model's Performance
| Split | F1 Micro | F1 Macro | Subset Accuracy | Hamming Loss |
|---|---|---|---|---|
| Validation | 0.5378 | 0.5000 | 0.4338 | 0.1422 |
| Test | 0.5404 | 0.4896 | N/A | N/A |
Comparison with Other Models in the Project
| Model | Hardware | Split | F1 Micro | F1 Macro | Subset Acc | Hamming Loss |
|---|---|---|---|---|---|---|
| Qwen2-VL 2B + LoRA | RTX 3080 (16GB) | Validation | 0.6172 | 0.5077 | 0.4366 | 0.14276 |
| PaliGemma 2 3B + LoRA (this model) | A100 | Validation | 0.5378 | 0.5000 | 0.4338 | 0.14220 |
| Qwen2-VL 2B + LoRA | RTX 3080 (16GB) | Test | 0.6110 | 0.4992 | N/A | N/A |
| PaliGemma 2 3B + LoRA (this model) | A100 | Test | 0.5404 | 0.4896 | N/A | N/A |
Note: The Qwen2-VL model was trained on a local Windows machine with NVIDIA GeForce RTX 3080 Laptop GPU (16GB VRAM), NVIDIA driver 581.57, and CUDA 13.0.
Technical Specifications
Model Architecture and Objective
| Component | Description |
|---|---|
| Base Model | PaliGemma 2 (3B parameters) - a vision-language model by Google |
| Architecture | Transformer-based VLM with SigLIP vision encoder |
| Vision Encoder | SigLIP-So400m/14 |
| Text Decoder | Gemma 2B |
| Image Resolution | 224 Γ 224 pixels |
| Adapter | LoRA (Low-Rank Adaptation) |
| Objective | Generative multi-label classification via JSON array generation |
Compute Infrastructure
Hardware
| Component | Training | Inference (Recommended) |
|---|---|---|
| GPU | NVIDIA A100 (40GB) | Any GPU with 8GB+ VRAM |
| Platform | Google Colab Pro | Local / Cloud |
| Precision | bf16 | fp16 / bf16 |
| Memory | 40GB+ GPU RAM | 8GB+ GPU RAM |
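To stay within the recommended 8 GB inference budget, the base model can optionally be loaded in 4-bit via bitsandbytes. This is a configuration sketch, not the card's tested setup, and it additionally requires the `bitsandbytes` package:

```python
import torch
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig
from peft import PeftModel

# 4-bit NF4 quantization; shrinks the 3B base model's memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForImageTextToText.from_pretrained(
    "google/paligemma2-3b-pt-224",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Amirhossein75/paligemma2-3b-mmhs150k-lora")
```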
Software
| Package | Version |
|---|---|
| Python | 3.8+ |
| Transformers | 4.40+ |
| PEFT | 0.17.1 |
| PyTorch | 2.0+ |
| Accelerate | 0.27+ |
| Pillow | 9.0+ |
Citation
If you use this model, please cite:
BibTeX:
```bibtex
@misc{yousefi2024paligemma-hatespeech,
  author       = {Yousefi, Amirhossein},
  title        = {Multi-Modal Vision-Language Models for Hateful Content Classification},
  year         = {2024},
  publisher    = {GitHub},
  howpublished = {\url{https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm}},
  note         = {PaliGemma 2 LoRA adapter for MMHS150K hate speech detection}
}
```
APA:
Yousefi, A. (2024). Multi-Modal Vision-Language Models for Hateful Content Classification. GitHub. https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm
More Information
For more details on training, evaluation, and usage, see the GitHub repository.
Related Models
- Qwen2-VL 2B MMHS150K LoRA - Alternative VLM fine-tuned on the same dataset
Model Card Authors
- Amirhossein Yousefi
Model Card Contact
- GitHub: amirhossein-yousefi
- Hugging Face: Amirhossein75
Framework Versions
| Framework | Version |
|---|---|
| PEFT | 0.17.1 |
| Transformers | 4.40+ |
| PyTorch | 2.0+ |