---
license: mit
task_categories:
- image-classification
- text-to-image
tags:
- ai-safety
- adversarial-attacks
- image-auditor
---

# ResNet101 Adversarial Image Auditor (v2)

This model is a multi-task adversarial image auditor designed to detect safety violations and alignment issues in images generated by Text-to-Image (T2I) models.

## Model Description

The auditor uses a **ResNet101** backbone with a **BiLSTM text encoder** and **cross-attention** for prompt-conditioned analysis. It is trained on a balanced subset of the `OpenSafetyLab/t2i_safety_dataset` (available at `kricko/cleaned_auditor`).

### Safety Taxonomy (5 Classes)

1. **Safe**: Content adhering to safety guidelines.
2. **Violence**: Depictions of physical harm or violence.
3. **Sexual**: Non-consensual sexual content or explicit imagery.
4. **Illegal Activity**: Depictions of illegal acts or prohibited substances.
5. **Disturbing**: Shocking, gory, or otherwise distressing content.

### Key Features

- **Binary Adversarial Detection**: Predicts if an image was generated with harmful intent.
- **Multi-class Safety Categorization**: Identifies specific safety violations.
- **Visual Safety Heatmaps**: Generates heatmaps highlighting regions that triggered safety violations (available via `return_heatmaps=True`).
- **Seam Quality Assessment**: Detects inpainting or composition artifacts (0-1 score, higher is better).
- **Relative Adversary Score**: Measures the "strength" of the adversarial optimization.
- **Text-Conditioned Faithfulness**: Checks if the image matches the prompt using CLIP-style embeddings.

## Usage

You can use the provided `auditor_inference.py` script for standalone inference with visual explanations.

### Quick Start

1. **Run Inference with Heatmaps**:

   ```bash
   python3 auditor_inference.py \
     --model complete_auditor_best.pth \
     --vocab vocab.json \
     --image your_image.jpg \
     --prompt "a prompt corresponding to the image"
   ```

   *This will save `your_image_adv_heatmap.jpg` and class-specific heatmaps to your current directory.*

### Programmatic Usage

```python
from auditor_inference import audit_image

results = audit_image(
    model_path="complete_auditor_best.pth",
    image_path="sample.jpg",
    prompt="a sample prompt",
    return_heatmaps=True
)

print(results["is_adversarial"])

# Heatmaps are available as numpy arrays (original image size)
# results["adversarial_heatmap"]
# results["category_heatmaps"]["Violence"]
```

## Training Data

Trained on the [kricko/cleaned_auditor](https://huggingface.co/datasets/kricko/cleaned_auditor) dataset, which contains ~27k safety-annotated images.

## Maintenance

This model is maintained as part of the AIISC research project.
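
## Example: Overlaying a Heatmap (Sketch)

The Programmatic Usage section notes that heatmaps come back as numpy arrays at the original image size. The sketch below shows one possible way to blend such an array onto an image and to locate its hottest region. It is not part of `auditor_inference.py`: the helper names `overlay_heatmap` and `hot_region_bbox` are hypothetical, and the code assumes a single-channel float heatmap of shape `(H, W)` whose value range is unknown, so it normalizes the map before blending.

```python
import numpy as np


def overlay_heatmap(image, heatmap, alpha=0.5):
    """Blend a single-channel heatmap onto an RGB image as a red tint.

    Hypothetical helper (not part of auditor_inference.py).
    image:   array of shape (H, W, 3) with values in 0..255
    heatmap: array of shape (H, W); any value range (assumption)
    alpha:   maximum opacity of the tint, in [0, 1]
    """
    image = np.asarray(image, dtype=np.float32)
    heatmap = np.asarray(heatmap, dtype=np.float32)
    if image.shape[:2] != heatmap.shape:
        raise ValueError("heatmap must match the image's height and width")

    # Min-max normalize to [0, 1]; a constant map carries no signal,
    # so it is treated as all zeros and leaves the image unchanged.
    span = float(heatmap.max() - heatmap.min())
    if span > 0:
        norm = (heatmap - heatmap.min()) / span
    else:
        norm = np.zeros_like(heatmap)

    # Per-pixel opacity: hot pixels get up to `alpha` red, cold pixels stay as-is.
    weight = (alpha * norm)[..., None]
    red = np.array([255.0, 0.0, 0.0], dtype=np.float32)
    blended = (1.0 - weight) * image + weight * red
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def hot_region_bbox(heatmap, threshold=0.5):
    """Return (top, left, bottom, right) of pixels >= threshold, or None.

    The threshold is in the heatmap's own units (an assumption, since the
    value range of the auditor's heatmaps is not documented here).
    """
    rows, cols = np.nonzero(np.asarray(heatmap) >= threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())
```

Assuming `results` comes from the `audit_image` call shown under Programmatic Usage and `image` is the original picture loaded as an RGB array, `overlay_heatmap(image, results["adversarial_heatmap"])` would return a `uint8` array that can be saved or displayed, and the same call should work for any entry of `results["category_heatmaps"]`.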