# AOT: Dynamic Adversarial Reinforcement Learning for Robust Multimodal Large Language Models
Authors: Yicheng Bao, Xuhong Wang, Qiaosheng Zhang, Chaochao Lu, Xia Hu, Xin Tan.
Affiliations: East China Normal University, Shanghai AI Laboratory.
## Overview
This repository hosts the Defender model weights from the paper "To Deceive is to Teach? Forging Perceptual Robustness via Adversarial Reinforcement Learning".
AOT (Adversarial Opponent Training) is a co-evolutionary framework designed to address the perceptual fragility of Multimodal Large Language Models (MLLMs). Current MLLMs often fail when confronted with visually complex scenes or semantic distractors.
Instead of relying on finite, manually annotated datasets, AOT formulates the training as a dynamic, adversarial game between:
- An Attacker: An image-editing model that autonomously generates semantic adversarial examples (e.g., inserting plausible distractors) to deceive the defender.
- A Defender: The MLLM, which improves its perceptual robustness by training on the curriculum generated by the attacker.
This model is the Defender (Iter. 3), based on Qwen2.5-VL-7B-Instruct, which has been hardened through iterative adversarial self-play.
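The adversarial game above can be sketched with a toy, self-contained example. The "scenes", the lookalike-distractor trick, and the `attacker`/`defender` functions below are all illustrative stand-ins, not the paper's actual models: the point is only the dynamic — the attacker edits the image to fool the defender, and the defender becomes robust by training on exactly those failures.

```python
# Toy sketch of the attacker-defender game (illustrative names and logic,
# not the paper's implementation). A "scene" is just a list of object labels.

def defender(scene, query, known_distractors):
    """Toy defender: answers 'is `query` present in the scene?'.
    It is fooled by lookalike distractors it has not yet trained on."""
    if query in scene["objects"]:
        return True
    lookalike = f"{query}-lookalike"
    return lookalike in scene["objects"] and lookalike not in known_distractors

def attacker(scene, query):
    """Toy attacker: inserts a plausible lookalike distractor for the
    queried object (cf. the 'object addition' edit strategy)."""
    return {"objects": scene["objects"] + [f"{query}-lookalike"]}

scene = {"objects": ["dog", "tree"]}
query = "cat"  # not actually present
known = set()

adv_scene = attacker(scene, query)
assert defender(scene, query, known) is False      # clean scene: correct answer
assert defender(adv_scene, query, known) is True   # adversarial edit fools it

# "Training" on the adversarial example: the defender learns this distractor.
known.add("cat-lookalike")
assert defender(adv_scene, query, known) is False  # now robust to this attack
```

Because the attacker generates examples on demand, the defender's training signal is not bounded by a fixed annotated dataset.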
## The AOT Framework
The AOT framework orchestrates a co-evolution where both models improve in tandem:
- Attacker Evolution: The attacker (Qwen-Image-Edit) is optimized via Flow-GRPO to discover diverse attack strategies (e.g., object addition, removal, replacement) that specifically target the defender's weaknesses while maintaining semantic integrity.
- Defender Enhancement: The defender is fine-tuned via DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization, a GRPO-style RL algorithm) on the challenging adversarial data generated by the attacker.
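The tandem improvement can be illustrated with a minimal toy loop. Here the attacker's Flow-GRPO search and the defender's DAPO fine-tuning are replaced by trivial set updates, and all names (`attacker_update`, `defender_eval`, `EDIT_TYPES`) are hypothetical; what the sketch shows is the curriculum dynamic: each round the attacker finds fresh holes, and training on them closes exactly those holes.

```python
# Toy co-evolution loop: the attacker's "policy" is a set of edit strategies,
# the defender's "parameters" are the strategies it has trained against.
# Illustrative stand-ins for Flow-GRPO / DAPO updates, not the real algorithms.

EDIT_TYPES = ["add", "remove", "replace"]

def attacker_update(round_id):
    # The attacker discovers round-specific variants of each edit type,
    # targeting whatever the defender has not seen yet.
    return [f"{e}-v{round_id}" for e in EDIT_TYPES]

def defender_eval(attacks, seen):
    # Fraction of attacks the defender withstands (those it trained on).
    return sum(a in seen for a in attacks) / len(attacks)

seen = set()
history = []
for r in range(1, 4):  # three adversarial iterations, like Defender Iter. 3
    attacks = attacker_update(r)
    pre = defender_eval(attacks, seen)   # robustness before training this round
    seen |= set(attacks)                 # "fine-tune" on the new adversarial data
    post = defender_eval(attacks, seen)  # robustness after training this round
    history.append((pre, post))

assert history == [(0.0, 1.0)] * 3
```

Each round starts at 0.0 robustness against the attacker's newest strategies and ends at 1.0 after training on them, so the attacker must keep evolving — which is what produces an ever-harder curriculum for the defender.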
## Key Results
Extensive experiments demonstrate that AOT significantly enhances robustness without compromising general capabilities:
- Perceptual Robustness: Achieves state-of-the-art performance on fine-grained perception benchmarks like VStar and HRBench (4K & 8K).
- Reduced Hallucination: Outperforms baselines on POPE and HallusionBench, indicating that better perception leads to more factual responses.
- General Capabilities: Maintains or improves performance on general multimodal benchmarks such as MMMU and RealWorldQA.