---
language: en
license: apache-2.0
tags:
- text-classification
- ai-generated-text-detection
- roberta
- adversarial-training
metrics:
- roc_auc
datasets:
- liamdugan/raid
---

# ADAL: AI-Generated Text Detection using Adversarial Learning

Adversarially trained AI-generated text detector based on the RADAR framework
([Hu et al., NeurIPS 2023](https://arxiv.org/abs/2307.03838)), extended with
a multi-evasion attack pool for robust detection.

## Overview

ADAL is an adversarially trained AI-generated text detector based on the RADAR framework (Hu et al., NeurIPS 2023), extended to the RAID benchmark with multi-generator training and a multi-evasion attack pool. The system trains a detector (RoBERTa-large) and a paraphraser (T5-base) in an adversarial game: the paraphraser learns to rewrite AI-generated text so that it evades detection, while the detector learns to remain robust against those rewrites. The result is a detector that generalises across 11 AI generators and maintains high AUROC under five distinct evasion attacks.
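The adversarial game described above can be illustrated with a toy loop: a "paraphraser" rewrites AI text to fool the current detector, and the detector is then refit with those rewrites included. Everything below (the 2-D stand-in features, the logistic detector, the shift-based paraphraser) is a deliberately simplified sketch of the dynamic, not the actual ADAL training code.

```python
import math
import random

random.seed(0)

# Toy 2-D features standing in for text: dim 0 = surface style (easy to
# paraphrase away), dim 1 = residual artifact (survives paraphrasing).
def sample(mean, n):
    return [(random.gauss(mean[0], 0.4), random.gauss(mean[1], 0.4)) for _ in range(n)]

human = sample((0.0, 0.0), 150)   # human-written
ai = sample((2.0, 1.0), 150)      # AI-generated

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, x):
    return w[0] * x[0] + w[1] * x[1] + w[2]

def train_detector(pos, neg, steps=1500, lr=0.3):
    """Logistic detector D(x) = P(AI | x), fit by plain batch gradient descent."""
    w = [0.0, 0.0, 0.0]
    data = [(x, 1) for x in pos] + [(x, 0) for x in neg]
    n = len(data)
    for _ in range(steps):
        g = [0.0, 0.0, 0.0]
        for x, y in data:
            err = sigmoid(score(w, x)) - y
            g[0] += err * x[0]
            g[1] += err * x[1]
            g[2] += err
        for i in range(3):
            w[i] -= lr * g[i] / n
    return w

def auroc(w, pos, neg):
    """Fraction of (AI, human) pairs the detector score ranks correctly."""
    wins = sum(1 for a in pos for h in neg if score(w, a) > score(w, h))
    return wins / (len(pos) * len(neg))

# Round 1: detector trained on raw AI vs human text.
w1 = train_detector(ai, human)

# "Paraphraser" move: rewrite AI text so its surface style matches human text
# (dim 0 shifted to ~0) while the residual artifact on dim 1 survives.
paraphrased = [(x0 - 2.0, x1) for x0, x1 in ai]

# Round 2: detector retrained with the evasive paraphrases added to the AI class.
w2 = train_detector(ai + paraphrased, human)

print(f"round-1 AUROC on paraphrases: {auroc(w1, paraphrased, human):.3f}")
print(f"round-2 AUROC on paraphrases: {auroc(w2, paraphrased, human):.3f}")
```

The second-round detector ranks paraphrased AI text above human text more reliably, which is the same robustness effect the adversarial training aims for at scale.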

Best result: **macro AUROC 0.9951** across all 11 RAID generators, robust to all attack types.

## Training

- **Base model**: `roberta-large`
- **Generators**: chatgpt, gpt2, gpt3, gpt4, cohere, cohere-chat, llama-chat,
  mistral, mistral-chat, mpt, mpt-chat

## Architecture

```
RAID train split (attack='none')
                │
                ▼
┌────────────┐      ┌───────────────────────────────┐
│  xm (AI)   │─────▶│ Gσ — Paraphraser (T5-base)    │────▶ xp_ppo
└────────────┘      │ ramsrigouthamg/t5_paraphraser │
                    └───────────────────────────────┘
                                    │
                          PPO reward R(xp, φ)
                                    │
┌────────────┐      ┌───────────────────────────────┐
│ xh (human) │─────▶│ Dφ — Detector (RoBERTa-large) │────▶ AUROC
│  xm (AI)   │─────▶│ roberta-large                 │
│  xp_ppo    │─────▶│ (trained via reweighted       │
│  xp_det_k  │─────▶│  logistic loss)               │
└────────────┘      └───────────────────────────────┘
```
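The PPO reward in the diagram is the signal the paraphraser optimises: in the RADAR-style formulation, the paraphraser earns more reward the more the detector scores its rewrite xp as human. A minimal sketch of that reward, with the detector reduced to a stand-in pair of logits (the function name and the example logits are illustrative, not from the training code):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def paraphraser_reward(detector_logits):
    """Reward for the paraphraser: the detector's P(human) on the paraphrase
    xp. Index 0 = AI-generated, index 1 = human-written, matching this model
    card's label mapping; the logits are stand-ins for the detector's output."""
    return softmax(detector_logits)[1]

# A paraphrase the detector still flags as AI earns almost no reward...
print(f"reward (caught): {paraphraser_reward([3.0, -1.0]):.3f}")
# ...while one the detector scores as human earns a reward near 1.
print(f"reward (evaded): {paraphraser_reward([-1.0, 3.0]):.3f}")
```

Because the reward rises exactly when the detector's output falls for the AI class, improving the paraphraser directly generates the hard examples the detector is retrained on.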

## Usage

```python
...
print(f"P(human)={probs[1]:.3f} P(AI)={probs[0]:.3f}")
```

## Label mapping
- Index 0 → AI-generated
- Index 1 → Human-written
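The usage snippet ends by printing the two class probabilities; that final step can be reproduced on its own. A minimal sketch with hypothetical logits (a real run would obtain them from the fine-tuned RoBERTa-large checkpoint rather than hard-coding them):

```python
import math

# Hypothetical detector logits for one input text. Index order follows the
# mapping above: index 0 = AI-generated, index 1 = human-written.
logits = [2.0, -1.0]

exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]

# probs[1] is P(human) and probs[0] is P(AI), as in the usage snippet's printout.
print(f"P(human)={probs[1]:.3f} P(AI)={probs[0]:.3f}")
```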

## Author

**Shushanta Pudasaini**
PhD Researcher, Technological University Dublin
Supervisors: Dr. Marisa Llorens Salvador · Dr. Luis Miralles-Pechuán · Dr. David Lillis