PaliGemma 2 LoRA Adapter for Multi-Modal Hateful Content Classification

🎯 Model Overview

This is a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of google/paligemma2-3b-pt-224 for multi-label hateful content detection on paired text + image data using the MMHS150K dataset.

✨ Key Features

  • Multi-Modal Understanding: Processes both text and images simultaneously for context-aware classification
  • Multi-Label Classification: Can detect multiple types of hate speech in a single sample
  • Generative Approach: Uses generative classification instead of traditional classification heads
  • Efficient Fine-Tuning: LoRA adapter whose trainable weights total only ~24 MB on disk
  • JSON Output: Generates structured JSON arrays for easy downstream processing

This model uses generative classification: instead of training a dedicated classification head, the model generates a strict JSON array of labels (e.g., ["racist", "sexist"]).
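The prompt side of this approach can be sketched as a small helper (a hypothetical function mirroring the "return JSON only" prompt style used in the inference examples in this card):

```python
# Hypothetical helper that builds the strict "return JSON only" prompt.
def build_prompt(text, class_names):
    return (
        f"Classify the following text and image into zero or more of these "
        f"labels: {class_names}. Return ONLY a JSON array of applicable labels. "
        f"Text: {text}"
    )

prompt = build_prompt("example post", ["racist", "sexist"])
```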

Model Details

Model Description

Given an image and its associated text, the model outputs a JSON array containing zero or more labels from a fixed label set. The model is trained to classify hateful memes and social media content into multiple hate speech categories.

High-level flow:

  1. Build a strict "return JSON only" prompt listing allowed labels.
  2. Feed (text + image) to the VLM.
  3. Generate a short response.
  4. Parse the first JSON array found (best-effort JSON extraction).
  5. Convert labels into multi-hot predictions and compute multi-label metrics.
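
Step 5 can be sketched in a few lines of pure Python (illustrative names; the actual pipeline may differ):

```python
# Convert a parsed label list into a multi-hot vector over the fixed label set.
CLASS_NAMES = ["racist", "sexist", "homophobe", "religion", "otherhate"]

def labels_to_multihot(labels, class_names=CLASS_NAMES):
    """Map predicted label strings to a 0/1 vector; unknown labels are ignored."""
    label_set = set(labels)
    return [1 if name in label_set else 0 for name in class_names]

print(labels_to_multihot(["racist", "sexist"]))  # -> [1, 1, 0, 0, 0]
```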

| Property | Value |
|---|---|
| Developed by | Amirhossein Yousefi |
| Model type | Vision-Language Model (VLM) with LoRA adapter |
| Language(s) | English |
| License | MIT |
| Base model | google/paligemma2-3b-pt-224 |
| Parameters (base) | 3B |
| Adapter size | ~24 MB |
| Input | Text + image (224 × 224) |
| Output | JSON array of hate speech labels |

🏷️ Label Classes

The model classifies content into the following 5 hate speech categories:

| Label | Description | Examples |
|---|---|---|
| racist | Content with racial discrimination | Slurs, stereotypes, dehumanization based on race/ethnicity |
| sexist | Content with gender-based discrimination | Misogyny, gender stereotypes, harassment based on gender |
| homophobe | Content with anti-LGBTQ+ discrimination | Slurs, stereotypes targeting LGBTQ+ individuals |
| religion | Content with religious discrimination | Attacks on religious groups, religious stereotypes |
| otherhate | Other forms of hateful content | Hate not covered by the above categories |

Uses

✅ Direct Use

This model is intended for detecting and classifying hateful content in multimodal (text + image) social media posts, memes, and similar content. It can be used for:

  • Content moderation systems - Automated flagging of potentially harmful content
  • Research on hate speech detection - Academic studies on multi-modal hate speech
  • Social media analysis - Understanding patterns of hateful content
  • Dataset annotation assistance - Semi-automated labeling of hate speech datasets
  • Educational purposes - Understanding how VLMs can be applied to content moderation

⚠️ Out-of-Scope Use

  • Production moderation without human review: This model should not be the sole decision-maker for content removal.
  • Non-English content: The model is trained on English data only.
  • Single-modality analysis: Best results are achieved with both text and image inputs.
  • Real-time high-stakes decisions: The model may produce errors and should not be used for legal or high-stakes decisions without human oversight.
  • Surveillance or censorship: This model should not be used for mass surveillance or unjust censorship.

Bias, Risks, and Limitations

Known Limitations

  • Dataset Bias: The model is trained on the MMHS150K dataset, which may contain biases present in the original annotations.
  • Cultural Context: Performance may vary across different types of hateful content and cultural contexts.
  • Error Rate: The model may produce false positives/negatives and should be used with human oversight.
  • JSON Parsing: Generated JSON output may occasionally be malformed and require robust parsing.
  • Temporal Bias: The model may not recognize new slurs, memes, or evolving hate speech patterns.
  • Image Quality: Performance may degrade on low-quality, distorted, or heavily edited images.

Recommendations

  • ✅ Always use human review for critical content moderation decisions.
  • ✅ Validate model outputs against your specific use case before deployment.
  • ✅ Consider the cultural and contextual limitations of the training data.
  • ✅ Implement robust JSON parsing with fallback mechanisms.
  • ✅ Regularly evaluate model performance on new data distributions.
  • ✅ Combine with other moderation signals for production systems.

🚀 How to Get Started with the Model

Installation

pip install transformers peft torch pillow accelerate

Quick Start - Load the Model

from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel
import torch

# Model identifiers
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

# Load the base model
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto"  # or "cpu" for CPU-only inference
)

# Load the LoRA adapter
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

# Load the processor
processor = AutoProcessor.from_pretrained(BASE_MODEL)

print("✅ Model loaded successfully!")

Full Inference Example

import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor
from peft import PeftModel

# Load base model and adapter
BASE_MODEL = "google/paligemma2-3b-pt-224"
LORA_ADAPTER = "Amirhossein75/paligemma2-3b-mmhs150k-lora"

processor = AutoProcessor.from_pretrained(BASE_MODEL)
base_model = AutoModelForImageTextToText.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, LORA_ADAPTER)

# Prepare input
image = Image.open("path/to/image.jpg").convert("RGB")
text = "Some text to analyze"

# Create prompt
class_names = ["racist", "sexist", "homophobe", "religion", "otherhate"]
prompt = f"Classify the following text and image into zero or more of these labels: {class_names}. Return ONLY a JSON array of applicable labels. Text: {text}"

# Generate
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens (the output also echoes the prompt)
generated = outputs[0][inputs["input_ids"].shape[-1]:]
result = processor.decode(generated, skip_special_tokens=True)
print(result)  # e.g., ["racist", "sexist"]

Note: This is a LoRA adapter and requires loading the base model first. You cannot use AutoModel.from_pretrained() directly on the adapter.

Batch Inference

import json
import re

def parse_json_labels(response: str) -> list:
    """Extract the first JSON array of strings from a model response, with fallback."""
    match = re.search(r'\[.*?\]', response, re.DOTALL)
    if match:
        try:
            labels = json.loads(match.group())
            if isinstance(labels, list):
                return [label for label in labels if isinstance(label, str)]
        except json.JSONDecodeError:
            pass
    return []

def classify_batch(model, processor, images, texts, class_names):
    """Classify a batch of image-text pairs."""
    results = []
    for image, text in zip(images, texts):
        prompt = f"Classify the following text and image into zero or more of these labels: {class_names}. Return ONLY a JSON array of applicable labels. Text: {text}"
        inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=64)
        # Decode only the newly generated tokens (the output also echoes the prompt)
        generated = outputs[0][inputs["input_ids"].shape[-1]:]
        response = processor.decode(generated, skip_special_tokens=True)
        results.append(parse_json_labels(response))
    return results

Training Details

📊 Training Data

MMHS150K (Multi-Modal Hate Speech) - A large-scale dataset for multi-modal hate speech detection containing ~150K tweets with associated images.

| Split | Samples | Description |
|---|---|---|
| Train | ~135,000 | Training samples |
| Validation | 5,000 | Validation samples |
| Test | ~10,000 | Held-out test samples |

Dataset structure:

  • train.csv, val.csv, test.csv with columns: text, image_path, labels
  • Labels are multi-hot encoded for: racist, sexist, homophobe, religion, otherhate

Data Source: Twitter/X posts with associated images, annotated for hate speech categories.
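
Assuming the labels column stores a JSON array of label strings (the exact encoding in the CSVs may differ), a split can be read with the standard library alone:

```python
import csv
import io
import json

# Stand-in for open("train.csv"); this row content is made up for illustration.
sample_csv = io.StringIO(
    'text,image_path,labels\n'
    '"example tweet text",images/123.jpg,"[""racist""]"\n'
)

rows = []
for row in csv.DictReader(sample_csv):
    row["labels"] = json.loads(row["labels"])  # e.g. ["racist"]
    rows.append(row)
```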

Training Procedure

🖥️ Hardware Used

| Component | Specification |
|---|---|
| GPU | NVIDIA A100 (40GB/80GB HBM2e) |
| Platform | Google Colab Pro |
| GPU Memory | 40GB+ |
| Precision | bf16 (bfloat16) mixed precision |
| CUDA Version | 11.8+ |

Note: The NVIDIA A100 is a data center GPU based on the Ampere architecture, offering 40GB or 80GB of HBM2e memory with roughly 1.6–2 TB/s of bandwidth depending on the variant. It is well suited to large VLM fine-tuning tasks.

βš™οΈ Training Hyperparameters

Parameter Value
Training regime bf16 mixed precision
Optimizer AdamW
Learning rate 2e-4
Batch size 4 (with gradient accumulation)
Epochs 1
Max sequence length 512
Warmup steps 100
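
As a hedged sketch, these hyperparameters map onto 🤗 TrainingArguments roughly as follows (output_dir and gradient_accumulation_steps are illustrative assumptions, not values taken from this card):

```python
from transformers import TrainingArguments

# Illustrative configuration only; the exact arguments used in training
# may differ from this sketch.
training_args = TrainingArguments(
    output_dir="paligemma2-mmhs150k-lora",  # hypothetical output path
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,          # assumption: not stated above
    num_train_epochs=1,
    warmup_steps=100,
    bf16=True,
    optim="adamw_torch",
)
```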

🔧 LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (r) | 4 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Task type | CAUSAL_LM |
| Bias | none |
| Adapter size | ~24 MB |
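
The table above corresponds to a peft LoraConfig along these lines (a sketch; the exact config file used for training may differ):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)
# Applied via get_peft_model(base_model, lora_config);
# peft_model.print_trainable_parameters() then reports trainable vs. total params.
```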

⏱️ Training Time & Throughput

| Metric | Value |
|---|---|
| Validation time | 458.13s (0:07:38) |
| Validation throughput | 10.914 samples/s |
| Epochs completed | 1.0 |
| Final validation loss | 0.3525 |

📈 Evaluation

Testing Data, Factors & Metrics

Testing Data

| Dataset | Samples | Description |
|---|---|---|
| Validation set | 5,000 | MMHS150K validation split |
| Test set | ~10,000 | MMHS150K test split |

Metrics Explained

| Metric | Description | Interpretation |
|---|---|---|
| F1 Micro | Micro-averaged F1 score across all labels | Higher is better. Gives equal weight to every individual label decision. |
| F1 Macro | Macro-averaged F1 score (unweighted mean over classes) | Higher is better. Gives equal weight to each class. |
| Subset Accuracy | Exact-match accuracy | Higher is better. All labels must match exactly. |
| Hamming Loss | Fraction of incorrectly predicted labels | Lower is better. Measures per-label errors. |
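
For reference, all four metrics can be computed from multi-hot vectors without external dependencies (a minimal sketch; scikit-learn's f1_score and hamming_loss are the usual choice):

```python
# Pure-Python multi-label metrics for lists of equal-length 0/1 vectors.
def multilabel_metrics(y_true, y_pred):
    n, k = len(y_true), len(y_true[0])
    exact = sum(t == p for t, p in zip(y_true, y_pred))
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    tp, fp, fn = [0] * k, [0] * k, [0] * k
    for t, p in zip(y_true, y_pred):
        for j in range(k):
            tp[j] += t[j] == 1 and p[j] == 1
            fp[j] += t[j] == 0 and p[j] == 1
            fn[j] += t[j] == 1 and p[j] == 0

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    return {
        "f1_micro": f1(sum(tp), sum(fp), sum(fn)),
        "f1_macro": sum(f1(tp[j], fp[j], fn[j]) for j in range(k)) / k,
        "subset_accuracy": exact / n,
        "hamming_loss": wrong / (n * k),
    }
```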

📊 Results

This Model's Performance

| Split | F1 Micro | F1 Macro | Subset Accuracy | Hamming Loss |
|---|---|---|---|---|
| Validation | 0.5378 | 0.5000 | 0.4338 | 0.1422 |
| Test | 0.5404 | 0.4896 | – | – |

Comparison with Other Models in the Project

| Model | Hardware | Split | F1 Micro | F1 Macro | Subset Acc | Hamming Loss |
|---|---|---|---|---|---|---|
| Qwen2-VL 2B + LoRA | RTX 3080 (16GB) | Validation | 0.6172 | 0.5077 | 0.4366 | 0.14276 |
| PaliGemma 2 3B + LoRA (this model) | A100 | Validation | 0.5378 | 0.5000 | 0.4338 | 0.14220 |
| Qwen2-VL 2B + LoRA | RTX 3080 (16GB) | Test | 0.6110 | 0.4992 | – | – |
| PaliGemma 2 3B + LoRA (this model) | A100 | Test | 0.5404 | 0.4896 | – | – |

Note: The Qwen2-VL model was trained on a local Windows machine with NVIDIA GeForce RTX 3080 Laptop GPU (16GB VRAM), NVIDIA driver 581.57, and CUDA 13.0.

🔧 Technical Specifications

Model Architecture and Objective

| Component | Description |
|---|---|
| Base Model | PaliGemma 2 (3B parameters), a vision-language model by Google |
| Architecture | Transformer-based VLM with SigLIP vision encoder |
| Vision Encoder | SigLIP-So400m/14 |
| Text Decoder | Gemma 2 (2B) |
| Image Resolution | 224 × 224 pixels |
| Adapter | LoRA (Low-Rank Adaptation) |
| Objective | Generative multi-label classification via JSON array generation |

Compute Infrastructure

Hardware

| Component | Training | Inference (Recommended) |
|---|---|---|
| GPU | NVIDIA A100 (40GB) | Any GPU with 8GB+ VRAM |
| Platform | Google Colab Pro | Local / Cloud |
| Precision | bf16 | fp16 / bf16 |
| Memory | 40GB+ GPU RAM | 8GB+ GPU RAM |

Software

| Package | Version |
|---|---|
| Python | 3.8+ |
| Transformers | 4.40+ |
| PEFT | 0.17.1 |
| PyTorch | 2.0+ |
| Accelerate | 0.27+ |
| Pillow | 9.0+ |

📚 Citation

If you use this model, please cite:

BibTeX:

@misc{yousefi2024paligemma-hatespeech,
  author = {Yousefi, Amirhossein},
  title = {Multi-Modal Vision-Language Models for Hateful Content Classification},
  year = {2024},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm}},
  note = {PaliGemma 2 LoRA adapter for MMHS150K hate speech detection}
}

APA:

Yousefi, A. (2024). Multi-Modal Vision-Language Models for Hateful Content Classification. GitHub. https://github.com/amirhossein-yousefi/text_image_multi_modal_vlm

📖 More Information

For more details on training, evaluation, and usage, see the GitHub repository.

👤 Model Card Authors

Amirhossein Yousefi
