Shushant committed on
Commit b598375 · verified · 1 Parent(s): 5bc9136

updated README

Files changed (1)
  1. README.md +41 -6
README.md CHANGED
@@ -2,20 +2,28 @@
  language: en
  license: apache-2.0
  tags:
- - text-classification
- - ai-generated-text-detection
- - roberta
- - adversarial-training
  metrics:
- - roc_auc
  ---

- # RADAR Detector (RoBERTa-large)

  Adversarially trained AI-generated text detector based on the RADAR framework
  ([Hu et al., NeurIPS 2023](https://arxiv.org/abs/2307.03838)), extended with
  a multi-evasion attack pool for robust detection.

  ## Training

  - **Base model**: `roberta-large`
@@ -25,6 +33,27 @@ a multi-evasion attack pool for robust detection.
  - **Generators**: chatgpt, gpt2, gpt3, gpt4, cohere, cohere-chat, llama-chat,
  mistral, mistral-chat, mpt, mpt-chat

  ## Usage

  ```python
@@ -45,3 +74,9 @@ print(f"P(human)={probs[1]:.3f} P(AI)={probs[0]:.3f}")
  ## Label mapping
  - Index 0 → AI-generated
  - Index 1 → Human-written

  language: en
  license: apache-2.0
  tags:
+ - text-classification
+ - ai-generated-text-detection
+ - roberta
+ - adversarial-training
  metrics:
+ - roc_auc
+ datasets:
+ - liamdugan/raid
  ---

+ # ADAL: AI-Generated Text Detection using Adversarial Learning

  Adversarially trained AI-generated text detector based on the RADAR framework
  ([Hu et al., NeurIPS 2023](https://arxiv.org/abs/2307.03838)), extended with
  a multi-evasion attack pool for robust detection.

+ ## Overview
+
+ ADAL is an adversarially trained AI-generated text detector based on the RADAR framework (Hu et al., NeurIPS 2023), extended to the RAID benchmark with multi-generator training and a multi-evasion attack pool. The system trains a detector (RoBERTa-large) and a paraphraser (T5-base) in an adversarial game: the paraphraser learns to rewrite AI-generated text so it evades detection, while the detector learns to remain robust against those rewrites. The result is a detector that generalises across 11 AI generators and maintains high AUROC under five distinct evasion attacks.
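To make the paraphraser's side of this adversarial game concrete, here is a minimal sketch of how its PPO reward could be computed from detector logits. It assumes, in the spirit of RADAR, that the reward R(xp, φ) is the detector's probability that the paraphrase xp is human-written, read from class index 1 per this model's label mapping. `paraphraser_reward` and the toy logits are hypothetical, and the PPO update itself is not shown.

```python
import math

# Label convention from the model card: index 0 = AI-generated, index 1 = human-written.
HUMAN_IDX = 1


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def paraphraser_reward(detector_logits):
    """Hypothetical PPO reward R(xp, phi): the detector's P(human) for paraphrase xp.

    A paraphrase that fools the detector (high P(human)) earns a high reward.
    """
    return softmax(detector_logits)[HUMAN_IDX]


# Toy detector logits [AI, human] for three paraphrases of AI-generated text.
batch_logits = [[2.0, -1.0], [0.0, 0.0], [-1.5, 1.5]]
rewards = [paraphraser_reward(logits) for logits in batch_logits]
print([round(r, 3) for r in rewards])  # [0.047, 0.5, 0.953]
```

A paraphrase the detector confidently flags as AI earns a reward near zero, so under this assumption PPO pushes the paraphraser toward rewrites the detector scores as human.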
+
+ Best result: **macro AUROC 0.9951** across all 11 RAID generators, robust to all attack types.
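The README does not define "macro AUROC"; the usual reading is the unweighted mean of the per-generator AUROCs. The sketch below computes it under that assumption, scoring each generator's AI texts (positives) against a shared pool of human texts (negatives) with the rank-based Mann-Whitney form of AUROC. `macro_auroc` and the toy scores are illustrative only and are not the reported results.

```python
def auroc(pos_scores, neg_scores):
    """Rank-based AUROC: the probability that a random positive outscores a
    random negative, counting ties as one half (Mann-Whitney U / (n_pos * n_neg))."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def macro_auroc(ai_scores_by_generator, human_scores):
    """Assumed definition: unweighted mean of per-generator AUROCs, where each
    generator's AI texts are positives and the shared human texts are negatives."""
    per_generator = {
        gen: auroc(scores, human_scores)
        for gen, scores in ai_scores_by_generator.items()
    }
    macro = sum(per_generator.values()) / len(per_generator)
    return macro, per_generator


# Toy P(AI) scores (index 0 of the softmax), not real results.
human = [0.1, 0.2, 0.3]
ai = {"gpt2": [0.9, 0.8], "gpt4": [0.25, 0.7]}
macro, per_gen = macro_auroc(ai, human)
print(per_gen, round(macro, 4))
```

For full evaluation sets, `sklearn.metrics.roc_auc_score` computes the same per-generator quantity far more efficiently than this O(n·m) loop.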
+

  ## Training

  - **Base model**: `roberta-large`

  - **Generators**: chatgpt, gpt2, gpt3, gpt4, cohere, cohere-chat, llama-chat,
  mistral, mistral-chat, mpt, mpt-chat
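The Architecture section feeds both models from the RAID train split with `attack='none'`, and the Training section lists the 11 generators. A hedged sketch of that selection over RAID-style records, assuming each row is a dict with `model`, `attack` and `generation` fields and that human-written rows use `model == "human"`. `select_training_rows` and the toy rows are hypothetical stand-ins for the `liamdugan/raid` dataset.

```python
# The 11 generators listed in the Training section of the model card.
GENERATORS = {
    "chatgpt", "gpt2", "gpt3", "gpt4", "cohere", "cohere-chat",
    "llama-chat", "mistral", "mistral-chat", "mpt", "mpt-chat",
}


def select_training_rows(rows):
    """Split unattacked RAID-style rows into human (xh) and AI (xm) text pools.

    Assumes each row is a dict with 'model', 'attack' and 'generation' keys and
    that human-written rows have model == 'human' (hypothetical schema).
    """
    human, ai = [], []
    for row in rows:
        if row["attack"] != "none":
            continue  # keep only the clean, unattacked split
        if row["model"] == "human":
            human.append(row["generation"])
        elif row["model"] in GENERATORS:
            ai.append(row["generation"])
    return human, ai


# Toy rows standing in for the real dataset.
rows = [
    {"model": "human", "attack": "none", "generation": "A human-written essay."},
    {"model": "gpt4", "attack": "none", "generation": "An AI-written essay."},
    {"model": "gpt4", "attack": "paraphrase", "generation": "A paraphrased essay."},
    {"model": "other-model", "attack": "none", "generation": "Out-of-scope text."},
]
xh, xm = select_training_rows(rows)
print(len(xh), len(xm))  # 1 1
```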

+ ## Architecture
+
+ ```
+ RAID train split (attack='none')
+       │
+       ▼
+ ┌────────────┐      ┌─────────────────────────────────┐
+ │ xm (AI)    │─────▶│ Gσ - Paraphraser (T5-base)      │──▶ xp_ppo
+ └────────────┘      │ ramsrigouthamg/t5_paraphraser   │
+                     └─────────────────────────────────┘
+                                      │
+                             PPO reward R(xp, φ)
+                                      │
+ ┌────────────┐      ┌─────────────────────────────────┐
+ │ xh (human) │─────▶│ Dϕ - Detector (RoBERTa-large)   │──▶ AUROC
+ │ xm (AI)    │─────▶│ roberta-large                   │
+ │ xp_ppo     │─────▶│ (trained via reweighted         │
+ │ xp_det_k   │─────▶│ logistic loss)                  │
+ └────────────┘      └─────────────────────────────────┘
+ ```
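The diagram states that Dϕ is trained with a reweighted logistic loss over xh, xm, xp_ppo and xp_det_k, but the README gives no formula. One plausible RADAR-style reading is sketched below as an assumption: a mean binary cross-entropy on human texts plus a hypothetical weight `lam` times the mean losses on original and paraphrased AI texts, with every input expressed as the detector's P(human). `reweighted_detector_loss` is illustrative, not the ADAL training code.

```python
import math


def logistic_loss(p_human, is_human, eps=1e-12):
    """Binary cross-entropy for one text, given the detector's P(human)."""
    p = min(max(p_human, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p) if is_human else -math.log(1.0 - p)


def reweighted_detector_loss(p_h, p_m, p_p, lam=0.5):
    """Hypothetical reweighted logistic loss for the detector.

    p_h: P(human) on human texts (xh); p_m: on AI texts (xm);
    p_p: on paraphrased AI texts (xp_ppo and xp_det_k pooled).
    lam is an assumed weight on the two machine-text terms.
    """
    loss_h = sum(logistic_loss(p, True) for p in p_h) / len(p_h)
    loss_m = sum(logistic_loss(p, False) for p in p_m) / len(p_m)
    loss_p = sum(logistic_loss(p, False) for p in p_p) / len(p_p)
    return loss_h + lam * (loss_m + loss_p)


# A detector that catches the paraphrases vs. one that is fooled by them.
robust = reweighted_detector_loss([0.9], [0.1], [0.2])
fooled = reweighted_detector_loss([0.9], [0.1], [0.8])
print(round(robust, 3), round(fooled, 3))
```

Raising `lam` makes misclassified AI and paraphrased texts cost more relative to human texts; the weighting actually used for ADAL is not stated in this README.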
+

  ## Usage

  ```python

  ## Label mapping
  - Index 0 → AI-generated
  - Index 1 → Human-written
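Because index 0 is AI-generated and index 1 is human-written, callers must read the class probabilities in that order. A minimal helper, assuming two-class logits such as those returned by a Hugging Face sequence-classification model; `label_from_logits` and its 0.5 threshold are illustrative choices, not part of the model card.

```python
import math

# Label mapping from the model card.
ID2LABEL = {0: "AI-generated", 1: "Human-written"}


def label_from_logits(logits, threshold=0.5):
    """Map two-class logits [AI, human] to (label, P(AI), P(human)).

    The 0.5 decision threshold is an illustrative default.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # stable softmax
    total = sum(exps)
    p_ai, p_human = exps[0] / total, exps[1] / total
    label = ID2LABEL[0] if p_ai >= threshold else ID2LABEL[1]
    return label, p_ai, p_human


label, p_ai, p_human = label_from_logits([1.2, -0.3])
print(f"{label}: P(human)={p_human:.3f} P(AI)={p_ai:.3f}")
```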
+
+ ## Author
+
+ **Shushanta Pudasaini**
+ PhD Researcher, Technological University Dublin
+ Supervisors: Dr. Marisa Llorens Salvador · Dr. Luis Miralles-Pechuán · Dr. David Lillis