ResNet101 Adversarial Image Auditor (v2)

This model is a multi-task adversarial image auditor designed to detect safety violations and alignment issues in images generated by Text-to-Image (T2I) models.

Model Description

The auditor uses a ResNet101 backbone with a BiLSTM text encoder and cross-attention for prompt-conditioned analysis. It is trained on a balanced subset of the OpenSafetyLab/t2i_safety_dataset (available at kricko/cleaned_auditor).
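The prompt-conditioned analysis can be pictured as image features attending over text-encoder outputs. Below is a minimal NumPy sketch of that cross-attention step; the shapes, names, and single-head form are illustrative assumptions, not the model's actual code.

```python
import numpy as np

def cross_attention(img_feats, txt_feats):
    """Scaled dot-product cross-attention: image features (queries)
    attend over text-encoder outputs (keys/values).
    img_feats: (n_patches, d), txt_feats: (n_tokens, d)."""
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)         # (n_patches, n_tokens)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over prompt tokens
    return weights @ txt_feats                            # prompt-conditioned features

# Toy shapes: 49 ResNet feature-map positions, 12 prompt tokens, 64-dim features
rng = np.random.default_rng(0)
out = cross_attention(rng.normal(size=(49, 64)), rng.normal(size=(12, 64)))
print(out.shape)  # (49, 64)
```

Each spatial position of the ResNet feature map ends up as a mixture of prompt-token embeddings, which is what lets the downstream heads condition their safety judgments on the prompt.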

Safety Taxonomy (5 Classes)

  1. Safe: Content adhering to safety guidelines.
  2. Violence: Depictions of physical harm or violence.
  3. Sexual: Non-consensual sexual content or explicit imagery.
  4. Illegal Activity: Depictions of illegal acts or prohibited substances.
  5. Disturbing: Shocking, gory, or otherwise distressing content.
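A hedged sketch of how the 5-class head might turn logits into a category label (the class ordering and the `classify` helper are assumptions based on the taxonomy above, not the script's documented API):

```python
import math

SAFETY_CLASSES = ["Safe", "Violence", "Sexual", "Illegal Activity", "Disturbing"]

def classify(logits):
    """Softmax over the 5 safety logits; return (label, probability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return SAFETY_CLASSES[idx], probs[idx]

label, p = classify([0.2, 3.1, -1.0, 0.5, 0.0])
print(label)  # Violence
```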

Key Features

  • Binary Adversarial Detection: Predicts whether an image was generated with adversarial (harmful) intent.
  • Multi-class Safety Categorization: Identifies specific safety violations.
  • Visual Safety Heatmaps: Generates heatmaps highlighting regions that triggered safety violations (available via return_heatmaps=True).
  • Seam Quality Assessment: Detects inpainting or composition artifacts (0-1 score, higher is better).
  • Relative Adversary Score: Measures the "strength" of the adversarial optimization.
  • Text-Conditioned Faithfulness: Checks if the image matches the prompt using CLIP-style embeddings.
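As an illustration of how a downstream pipeline might combine these signals, here is a sketch of a simple flagging rule; the field names and thresholds are hypothetical assumptions, not the auditor's documented output format:

```python
def should_flag(result, adv_threshold=0.5, seam_threshold=0.3):
    """Flag an image if it looks adversarial, falls in an unsafe
    category, or shows strong inpainting/composition artifacts.
    Note: seam quality is 0-1 with *higher* meaning better, so a
    low score indicates visible seams."""
    if result["adversarial_prob"] >= adv_threshold:
        return True
    if result["category"] != "Safe":
        return True
    if result["seam_quality"] < seam_threshold:
        return True
    return False

print(should_flag({"adversarial_prob": 0.1,
                   "category": "Safe",
                   "seam_quality": 0.9}))  # False
```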

Usage

You can use the provided auditor_inference.py script for standalone inference with visual explanations.

Quick Start

  1. Run Inference with Heatmaps:
    python3 auditor_inference.py \
      --model complete_auditor_best.pth \
      --vocab vocab.json \
      --image your_image.jpg \
      --prompt "a prompt corresponding to the image"
    
    This will save your_image_adv_heatmap.jpg and class-specific heatmaps to your current directory.

Programmatic Usage

from auditor_inference import audit_image

results = audit_image(
    model_path="complete_auditor_best.pth",
    image_path="sample.jpg",
    prompt="a sample prompt",
    return_heatmaps=True
)

print(results["is_adversarial"])
# Heatmaps are available as numpy arrays (original image size)
# results["adversarial_heatmap"]
# results["category_heatmaps"]["Violence"]
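Since the heatmaps come back as NumPy arrays at the original image size, one simple way to visualize a heatmap is to blend it over the image. This is a hypothetical post-processing sketch (the inference script already writes its own overlay files):

```python
import numpy as np

def overlay_heatmap(image, heatmap, alpha=0.4):
    """Blend a single-channel heatmap (any value range) over an
    RGB uint8 image, painting the heat into the red channel."""
    h = heatmap.astype(np.float64)
    h = (h - h.min()) / (np.ptp(h) + 1e-8)   # min-max normalize to [0, 1]
    red = np.zeros_like(image, dtype=np.float64)
    red[..., 0] = h * 255.0                  # red-channel heat layer
    blended = (1 - alpha) * image + alpha * red
    return blended.clip(0, 255).astype(np.uint8)

# Toy example: 4x4 black image, ramp-shaped heatmap
img = np.zeros((4, 4, 3), dtype=np.uint8)
heat = np.arange(16, dtype=np.float64).reshape(4, 4)
out = overlay_heatmap(img, heat)
print(out.shape, out.dtype)  # (4, 4, 3) uint8
```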

Training Data

Trained on the kricko/cleaned_auditor dataset, which contains ~27k safety-annotated images.

Maintenance

This model is maintained as part of the AIISC research project.
