---
license: mit
task_categories:
- image-classification
- text-to-image
tags:
- ai-safety
- adversarial-attacks
- image-auditor
---

# ResNet101 Adversarial Image Auditor (v2)

This model is a multi-task adversarial image auditor designed to detect safety violations and alignment issues in images generated by Text-to-Image (T2I) models.

## Model Description

The auditor uses a **ResNet101** backbone with a **BiLSTM text encoder** and **cross-attention** for prompt-conditioned analysis. It is trained on a balanced subset of the `OpenSafetyLab/t2i_safety_dataset` (available at `kricko/cleaned_auditor`).
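
To make the prompt-conditioning step concrete, here is a minimal NumPy sketch of scaled dot-product cross-attention in which prompt tokens attend over image regions. It is an illustration under stated assumptions, not the auditor's actual layer: the direction of attention (text as queries, image as keys and values), the feature width of 32, and the absence of learned projection matrices are all choices made for the example. The 49 regions correspond to a 7x7 feature map, which is what a ResNet101 backbone would produce for a 224x224 input.

```python
import numpy as np

def cross_attention(text_tokens, image_regions):
    """Scaled dot-product cross-attention (no learned projections).

    text_tokens:   (T, d) array, one row per prompt token (e.g. text-encoder outputs)
    image_regions: (R, d) array, one row per spatial location of the CNN feature map
    Returns (attended, weights) with shapes (T, d) and (T, R).
    """
    d = text_tokens.shape[-1]
    scores = text_tokens @ image_regions.T / np.sqrt(d)  # (T, R) similarity scores
    scores -= scores.max(axis=-1, keepdims=True)         # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # each row sums to 1
    attended = weights @ image_regions                   # (T, d) image summary per token
    return attended, weights

rng = np.random.default_rng(0)
text = rng.normal(size=(6, 32))      # 6 prompt tokens, hypothetical width 32
regions = rng.normal(size=(49, 32))  # 7x7 feature map flattened to 49 regions
attended, weights = cross_attention(text, regions)
print(attended.shape, weights.shape)
```

Each row of `weights` indicates which image regions a given prompt token focuses on, which is the kind of signal a prompt-conditioned auditor can build on.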

### Safety Taxonomy (5 Classes)
1. **Safe**: Content adhering to safety guidelines.
2. **Violence**: Depictions of physical harm or violence.
3. **Sexual**: Non-consensual sexual content or explicit imagery.
4. **Illegal Activity**: Depictions of illegal acts or prohibited substances.
5. **Disturbing**: Shocking, gory, or otherwise distressing content.
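
The taxonomy above can be turned into a small decoding helper. The sketch below assumes that the class indices follow the order of the list and that the safety head is single-label (softmax over five scores); neither is confirmed by the checkpoint, so verify the mapping before relying on it. `SAFETY_CLASSES` and `decode_safety_logits` are hypothetical names, not part of `auditor_inference.py`.

```python
import math

# Assumed index order: the order in which the classes are listed above.
# The checkpoint's real label mapping may differ.
SAFETY_CLASSES = ["Safe", "Violence", "Sexual", "Illegal Activity", "Disturbing"]

def decode_safety_logits(logits):
    """Turn 5 raw class scores into (label, probability) via a softmax."""
    if len(logits) != len(SAFETY_CLASSES):
        raise ValueError(f"expected {len(SAFETY_CLASSES)} scores, got {len(logits)}")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return SAFETY_CLASSES[best], probs[best]

label, prob = decode_safety_logits([0.1, 2.3, -1.0, 0.4, 0.0])
print(label, round(prob, 3))
```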

### Key Features
- **Binary Adversarial Detection**: Predicts whether an image was generated with harmful intent.
- **Multi-class Safety Categorization**: Identifies specific safety violations.
- **Visual Safety Heatmaps**: Generates heatmaps highlighting regions that triggered safety violations (available via `return_heatmaps=True`).
- **Seam Quality Assessment**: Detects inpainting or composition artifacts (score from 0 to 1; higher is better).
- **Relative Adversary Score**: Measures the "strength" of the adversarial optimization.
- **Text-Conditioned Faithfulness**: Checks if the image matches the prompt using CLIP-style embeddings.

## Usage

You can use the provided `auditor_inference.py` script for standalone inference with visual explanations.

### Quick Start

1. **Run Inference with Heatmaps**:
   ```bash
   python3 auditor_inference.py \
     --model complete_auditor_best.pth \
     --vocab vocab.json \
     --image your_image.jpg \
     --prompt "a prompt corresponding to the image"
   ```
   *This saves `your_image_adv_heatmap.jpg` and the class-specific heatmaps to the current directory.*
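
To audit several images, the same command can be driven from Python. The sketch below uses only the four flags shown above; `build_command` and `audit_folder` are hypothetical helpers, and the folder layout and the filename-to-prompt mapping are assumptions made for the example.

```python
import subprocess
import sys
from pathlib import Path

def build_command(image_path, prompt,
                  model="complete_auditor_best.pth", vocab="vocab.json"):
    """Assemble the documented CLI call for one image."""
    return [
        sys.executable, "auditor_inference.py",
        "--model", model,
        "--vocab", vocab,
        "--image", str(image_path),
        "--prompt", prompt,
    ]

def audit_folder(folder, prompts):
    """Run the script once per entry in `prompts` ({filename: prompt})."""
    for name, prompt in prompts.items():
        cmd = build_command(Path(folder) / name, prompt)
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(cmd, check=True)

# Show the arguments that would be passed (nothing is executed here)
print(build_command("your_image.jpg", "a prompt corresponding to the image")[1:])
```

Calling `audit_folder("images", {"cat.jpg": "a cat on a sofa"})` would then run the script once per listed image, assuming it is invoked from the directory that contains `auditor_inference.py`.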

### Programmatic Usage
```python
from auditor_inference import audit_image

results = audit_image(
    model_path="complete_auditor_best.pth",
    image_path="sample.jpg",
    prompt="a sample prompt",
    return_heatmaps=True
)

# Verdict of the binary adversarial detector
print(results["is_adversarial"])

# With return_heatmaps=True, heatmaps are returned as NumPy arrays
# at the original image resolution:
# results["adversarial_heatmap"]
# results["category_heatmaps"]["Violence"]
```
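
Once the heatmaps are available, a common next step is to see which category reacts most strongly. The following sketch ranks the entries of `results["category_heatmaps"]` by peak value and reports the fraction of pixels above a cut-off. It assumes heatmap values lie in [0, 1]; the 0.5 threshold and the helper name `rank_category_heatmaps` are illustrative, and synthetic arrays stand in for real model output.

```python
import numpy as np

def rank_category_heatmaps(category_heatmaps, threshold=0.5):
    """Rank categories by peak heatmap value.

    category_heatmaps: dict mapping category name -> 2D array.
    threshold: hypothetical cut-off; assumes values lie in [0, 1].
    Returns a list of (name, peak, hot_fraction), strongest first.
    """
    rows = []
    for name, heatmap in category_heatmaps.items():
        heatmap = np.asarray(heatmap, dtype=float)
        peak = float(heatmap.max())
        hot_fraction = float((heatmap >= threshold).mean())  # share of "hot" pixels
        rows.append((name, peak, hot_fraction))
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Synthetic stand-in for results["category_heatmaps"]
fake = {
    "Sexual": np.zeros((2, 2)),
    "Violence": np.array([[0.1, 0.9], [0.2, 0.6]]),
}
for name, peak, frac in rank_category_heatmaps(fake):
    print(f"{name}: peak={peak:.2f}, hot pixels={frac:.0%}")
```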

## Training Data

The model was trained on the [kricko/cleaned_auditor](https://huggingface.co/datasets/kricko/cleaned_auditor) dataset, which contains roughly 27k safety-annotated images.

## Maintenance

This model is maintained as part of the AIISC research project.