IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection
Paper: arXiv:2506.00979
2026.4: Our paper has been accepted by ICMR 2026.
2026.2: We release our Ivy-xDetector models for AI-generated image and video detection.
2025.12: The Ivy-Fake dataset is released.
2025.5: We release the arXiv paper.

The following snippet demonstrates how to perform inference using our model with the transformers library.
import torch
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info
# Initialize Model and Processor
model_id = "AI-Safeguard/Ivy-Fake"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    # Requires the flash-attn package; remove this argument to use the default attention.
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)
# Define the Detection Prompt
messages = [
    {
        "role": "system",
        "content": "You are an AI-generated content detector. Classify the media as real or fake. Provide reasoning inside <think>...</think> tags. End with exactly one word—real or fake—wrapped in <conclusion>...</conclusion>."
    },
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://path-to-your-image.jpg",  # Replace with your media path
            },
            {"type": "text", "text": "Is this image real or fake?"},
        ],
    }
]
# Preparation for Inference
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)  # Follow the device chosen by device_map instead of hard-coding "cuda"
# Generation
generated_ids = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text[0])
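
The snippet above prints the raw generation and only shows an image input. The sketch below adds two small helpers around it, under stated assumptions. `build_messages` and `parse_detection_output` are hypothetical names, not part of the released code. The tag names `<think>` and `<conclusion>` are taken from the system prompt above. The video entry uses the `{"type": "video", "video": ...}` layout that `qwen_vl_utils` reads, and the question wording for video is an assumption that mirrors the image question.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Tag names come from the system prompt; everything else here is illustrative.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL | re.IGNORECASE)
_CONCLUSION_RE = re.compile(r"<conclusion>(.*?)</conclusion>", re.DOTALL | re.IGNORECASE)


def build_messages(media: str, media_type: str, system_prompt: str) -> List[Dict[str, Any]]:
    """Build a chat in the same layout as the snippet above for an image or a video."""
    if media_type not in ("image", "video"):
        raise ValueError("media_type must be 'image' or 'video'")
    return [
        {"role": "system", "content": system_prompt},
        {
            "role": "user",
            "content": [
                # The media location is stored under the key named after its type.
                {"type": media_type, media_type: media},
                # Assumption: the video question mirrors the image question.
                {"type": "text", "text": f"Is this {media_type} real or fake?"},
            ],
        },
    ]


def parse_detection_output(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, verdict); verdict is 'real', 'fake', or None."""
    think = _THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else None

    conclusion = _CONCLUSION_RE.search(text)
    verdict = conclusion.group(1).strip().lower() if conclusion else None
    if verdict not in ("real", "fake"):
        # Missing or malformed conclusion, e.g. generation cut off at max_new_tokens.
        verdict = None
    return reasoning, verdict


# Hand-written sample string, not real model output.
sample = "<think>Edges look oversmoothed.</think><conclusion>fake</conclusion>"
reasoning, verdict = parse_detection_output(sample)
print(verdict)
print(reasoning)
```

In the inference snippet, `parse_detection_output(output_text[0])` would take the place of the final `print`, and `build_messages("file:///path/to/video.mp4", "video", system_prompt)` would take the place of the hand-written `messages` list. A `None` verdict is worth treating as "undecided" rather than as either class.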
If you find Ivy-Fake or IVY-XDETECTOR useful in your research, please cite:
@article{jiang2025ivy,
  title={Ivy-fake: A unified explainable framework and benchmark for image and video aigc detection},
  author={Jiang, Changjiang and Dong, Wenhui and Zhang, Zhonghao and Si, Chenyang and Yu, Fengchang and Peng, Wei and Yuan, Xinbin and Bi, Yifei and Zhao, Ming and Zhou, Zian and others},
  journal={arXiv preprint arXiv:2506.00979},
  year={2025}
}
Base model: Qwen/Qwen2.5-VL-3B-Instruct