# ResNet101 Adversarial Image Auditor (v2)
This model is a multi-task adversarial image auditor designed to detect safety violations and alignment issues in images generated by Text-to-Image (T2I) models.
## Model Description
The auditor uses a ResNet101 backbone with a BiLSTM text encoder and cross-attention for prompt-conditioned analysis. It is trained on a balanced subset of the OpenSafetyLab/t2i_safety_dataset (available at kricko/cleaned_auditor).
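The prompt-conditioned analysis works by letting visual features attend to the text encoder's outputs. As an illustration only (not the checkpoint's actual implementation), a minimal scaled dot-product cross-attention step in NumPy:

```python
import numpy as np

def cross_attention(img_feats, txt_feats):
    """Prompt-conditioned attention sketch: image tokens attend to text tokens.

    img_feats: (N, d) visual tokens; txt_feats: (T, d) text-encoder outputs.
    Returns (N, d) text-aware visual features.
    """
    d = img_feats.shape[-1]
    scores = img_feats @ txt_feats.T / np.sqrt(d)   # (N, T) similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over text tokens
    return weights @ txt_feats                      # convex combination of text features
```

Each output row is a convex combination of the text features, which is what makes the downstream safety heads prompt-aware.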
## Safety Taxonomy (5 Classes)
- Safe: Content adhering to safety guidelines.
- Violence: Depictions of physical harm or violence.
- Sexual: Non-consensual sexual content or explicit imagery.
- Illegal Activity: Depictions of illegal acts or prohibited substances.
- Disturbing: Shocking, gory, or otherwise distressing content.
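A predicted category can be decoded from the model's 5-way head with a simple argmax. The mapping below is illustrative; the authoritative index order ships with the checkpoint and `vocab.json`:

```python
# Hypothetical label order for the 5-class safety taxonomy (check vocab.json).
SAFETY_CLASSES = ["Safe", "Violence", "Sexual", "Illegal Activity", "Disturbing"]

def decode_category(logits):
    """Return the class name with the highest score."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return SAFETY_CLASSES[best]
```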
## Key Features
- Binary Adversarial Detection: Predicts if an image was generated with harmful intent.
- Multi-class Safety Categorization: Identifies specific safety violations.
- Visual Safety Heatmaps: Generates heatmaps highlighting regions that triggered safety violations (available via `return_heatmaps=True`).
- Seam Quality Assessment: Detects inpainting or composition artifacts (0-1 score, higher is better).
- Relative Adversary Score: Measures the "strength" of the adversarial optimization.
- Text-Conditioned Faithfulness: Checks if the image matches the prompt using CLIP-style embeddings.
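The heatmaps returned by the model can be blended over the input image for visual inspection. A minimal sketch (the helper name and red-tint blending scheme are our own, not part of the script):

```python
import numpy as np

def overlay_heatmap(image, heatmap, alpha=0.5):
    """Blend a heatmap onto an RGB uint8 image as a red tint.

    image: (H, W, 3) uint8; heatmap: (H, W) float (any range).
    """
    heat = (heatmap - heatmap.min()) / (np.ptp(heatmap) + 1e-8)  # normalize to [0, 1]
    color = np.zeros(image.shape, dtype=float)
    color[..., 0] = heat  # red channel carries the safety signal
    blended = (1 - alpha) * image.astype(float) / 255.0 + alpha * color
    return (blended * 255).astype(np.uint8)
```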
## Usage
You can use the provided auditor_inference.py script for standalone inference with visual explanations.
### Quick Start
Run inference with heatmaps:

```shell
python3 auditor_inference.py \
  --model complete_auditor_best.pth \
  --vocab vocab.json \
  --image your_image.jpg \
  --prompt "a prompt corresponding to the image"
```

This will save `your_image_adv_heatmap.jpg` and class-specific heatmaps to your current directory.
### Programmatic Usage

```python
from auditor_inference import audit_image

results = audit_image(
    model_path="complete_auditor_best.pth",
    image_path="sample.jpg",
    prompt="a sample prompt",
    return_heatmaps=True,
)
print(results["is_adversarial"])

# Heatmaps are available as numpy arrays (original image size):
# results["adversarial_heatmap"]
# results["category_heatmaps"]["Violence"]
```
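The returned dictionary can be condensed into a short report for logging or moderation pipelines. In the sketch below, only `is_adversarial` is a documented key; `category_scores` and `seam_quality` are assumed field names, so adapt them to the script's actual output schema:

```python
def summarize_audit(results, threshold=0.5):
    """Condense auditor output into a compact report.

    Field names other than `is_adversarial` are assumptions,
    not the script's documented schema.
    """
    flagged = sorted(
        name for name, score in results.get("category_scores", {}).items()
        if score >= threshold
    )
    return {
        "adversarial": bool(results.get("is_adversarial")),
        "flagged_categories": flagged,
        "seam_quality": results.get("seam_quality"),
    }
```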
## Training Data
Trained on the kricko/cleaned_auditor dataset, which contains ~27k safety-annotated images.
## Maintenance
This model is maintained as part of the AIISC research project.