Ankit19102004 committed on
Commit 5fc3d59 · 1 Parent(s): f39029c

initial update

Files changed (5)
  1. .env +2 -0
  2. .gitignore +0 -6
  3. README.md +119 -71
  4. honeypot_api.py +81 -23
  5. src/main.py +8 -0
.env ADDED
@@ -0,0 +1,2 @@
1
+ HONEYPOT_API_KEY=Guvi@Hackathon2025
2
+ PORT=8000
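The README in this commit states that the `x-api-key` header must match `HONEYPOT_API_KEY`. A minimal sketch of how these two variables might be read and checked follows. The helper names `load_settings` and `is_authorized` are hypothetical and do not appear in the commit, and the constant-time comparison is an assumption rather than what `honeypot_api.py` is known to do.

```python
import hmac
import os


def load_settings(env=None):
    """Read the two settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "api_key": env.get("HONEYPOT_API_KEY", ""),
        "port": int(env.get("PORT", "8000")),  # same default the commit uses
    }


def is_authorized(header_value, expected_key):
    """Return True only when the supplied x-api-key equals the configured key."""
    if not header_value or not expected_key:
        return False
    # compare_digest avoids leaking the mismatch position through timing
    return hmac.compare_digest(header_value.encode(), expected_key.encode())
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise without touching the real process environment.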
.gitignore CHANGED
@@ -22,12 +22,6 @@ share/python-wheels/
22
  *.egg
23
  MANIFEST
24
 
25
- # Virtual Environment
26
- venv/
27
- env/
28
- ENV/
29
- .env
30
-
31
  # Hugging Face / Models
32
  # Ignore training checkpoints and large optimizer states
33
  checkpoint-*/
 
22
  *.egg
23
  MANIFEST
24
 
 
 
 
 
 
 
25
  # Hugging Face / Models
26
  # Ignore training checkpoints and large optimizer states
27
  checkpoint-*/
README.md CHANGED
@@ -1,76 +1,124 @@
1
- ---
2
- title: Honeypot API
3
- emoji: 🛡️
4
- colorFrom: blue
5
- colorTo: red
6
- sdk: docker
7
- app_port: 7860
8
- ---
9
-
10
- # 🛡️ AI Security & Fraud Detection Dashboard
11
-
12
- A Flask-based web application that provides a single dashboard to run multiple
13
- machine learning and deep learning models for security and fraud detection.
14
-
15
- ---
16
-
17
- ## 🚀 Features
18
-
19
- - Phishing Detection (BERT)
20
- - Fake Image Detection (CNN)
21
- - AI Image Detection (ViT)
22
- - Fake Audio Detection (MFCC-13 & MFCC-40)
23
- - Credit Card Fraud Detection (CSV upload)
24
- - Confidence scores
25
- - Bar chart summary for fraud results
26
- - Fraud rows displayed in table
27
-
28
- ---
29
-
30
- ## 🧰 Requirements
31
-
32
- - Python **3.10.19**
33
- - Conda (recommended)
34
-
35
- ---
36
-
37
- ## 🐍 Create Environment
38
-
39
- ```bash
40
- conda create -n venv python=3.10.19
41
- conda activate venv
42
- ```
43
-
44
- ## 📦 Install Dependencies
45
-
46
- ```bash
47
- pip install -r requirements.txt
48
  ```
49
 
50
- ### or
51
 
52
- ```bash
53
- pip install flask==2.3.3 werkzeug==2.3.7 tensorflow==2.13.0 torch torchvision torchaudio transformers scikit-learn pandas numpy librosa opencv-python pillow joblib tqdm
54
  ```
55
 
56
- ## Run Application
57
-
58
- ```bash
59
- python app.py
60
- ```
61
-
62
- ## 📂 Project Structure
63
-
64
- ```bash
65
- Guvi/
66
- ├── app.py
67
- ├── model_loader.py
68
- ├── requirements.txt
69
- ├── model/
70
- ├── templates/
71
- │ └── index.html
72
- ├── static/
73
- │ ├── styles.css
74
- │ └── app.js
75
- └── uploads/
76
- ```
 
 
1
+ # Honeypot API
2
+
3
+ ## Description
4
+
5
+ Honeypot API is a Flask-based conversational honeypot that talks to scammers for several turns, extracts all sensitive intelligence they reveal (phone numbers, bank accounts, UPI IDs, links, emails, case IDs, policy and order numbers), and then submits a final JSON summary for scoring.
6
+ The focus is on:
7
+ - Reliable scam detection using a fine-tuned BERT phishing model
8
+ - Robust regex/NLP-based intelligence extraction
9
+ - High-quality engagement so the scammer keeps replying
10
+
11
+ ## Tech Stack
12
+
13
+ - Language/Framework: Python, Flask
14
+ - Key Libraries: `torch`, `transformers`, `requests`, `re`, `logging`
15
+ - Models Used:
16
+ - BERT sequence classification model (local: `model/phising_model`) for scam detection
17
+
18
+ ## Setup Instructions
19
+
20
+ 1. Clone the repository
21
+ 2. Create and activate a virtual environment (optional but recommended)
22
+
23
+ ```bash
24
+ python -m venv venv
25
+ venv\Scripts\activate # Windows
26
+ # or
27
+ source venv/bin/activate # Linux/macOS
28
+ ```
29
+
30
+ 3. Install dependencies
31
+
32
+ ```bash
33
+ pip install -r requirements.txt
34
+ ```
35
+
36
+ 4. Set environment variables (or edit `.env`)
37
+
38
+ ```env
39
+ HONEYPOT_API_KEY=your-api-key-here
40
+ PORT=8000
41
+ ```
42
+
43
+ 5. Run the application (local)
44
+
45
+ ```bash
46
+ python -m src.main
47
+ ```
48
+
49
+ Or with Gunicorn (as used in Docker):
50
+
51
+ ```bash
52
+ gunicorn -b 0.0.0.0:7860 honeypot_api:app --timeout 120
53
+ ```
54
+
55
+ ## API Endpoint
56
+
57
+ - URL: `https://your-deployed-url.com/honeypot`
58
+ - Method: `POST`
59
+ - Authentication: `x-api-key` header (must match `HONEYPOT_API_KEY`)
60
+
61
+ ### Request Body
62
+
63
+ ```json
64
+ {
65
+ "sessionId": "uuid-v4-string",
66
+ "message": {
67
+ "sender": "scammer",
68
+ "text": "URGENT: Your account has been compromised...",
69
+ "timestamp": "2025-02-11T10:30:00Z"
70
+ },
71
+ "conversationHistory": [
72
+ {
73
+ "sender": "scammer",
74
+ "text": "Previous message...",
75
+ "timestamp": 1739270400000
76
+ },
77
+ {
78
+ "sender": "user",
79
+ "text": "Your previous response...",
80
+ "timestamp": 1739270460000
81
+ }
82
+ ],
83
+ "metadata": {
84
+ "channel": "SMS",
85
+ "language": "English",
86
+ "locale": "IN"
87
+ }
88
+ }
89
  ```
90
 
91
+ ### Response Body (per turn)
92
 
93
+ ```json
94
+ {
95
+ "status": "success",
96
+ "scamDetected": true,
97
+ "confidence": 0.97,
98
+ "reply": "I'm a bit confused about this. Can you explain this clearly?",
99
+ "engagementScore": 96
100
+ }
101
  ```
102
 
103
+ ## Approach
104
+
105
+ - **Scam Detection**
106
+ - Uses a fine-tuned BERT model on each incoming message to classify phishing/scam content.
107
+ - The model output drives internal scoring, while final `scamDetected` in the callback is always `true` (all evaluation scenarios are scams).
108
+
109
+ - **Intelligence Extraction**
110
+ - A single regex-based extractor runs on every scammer message.
111
+ - It extracts:
112
+ - `phoneNumbers`, `bankAccounts`, `upiIds`, `phishingLinks`, `emailAddresses`
113
+ - Additional IDs such as `caseIds`, `policyNumbers`, `orderNumbers`
114
+ - Results are accumulated over the session and returned in `extractedIntelligence` in the final callback JSON.
115
+
116
+ - **Engagement Strategy**
117
+ - The honeypot acts as a confused but cooperative victim.
118
+ - It uses progressive, generic questions (e.g. “Can you explain this clearly?”, “Is this really urgent?”, “Can you confirm your official ID?”) to keep scammers talking.
119
+ - An engagement scoring function rewards:
120
+ - Depth of conversation (number of turns)
121
+ - Balanced back-and-forth between scammer and honeypot
122
+ - Frequent question marks in agent messages
123
+ - Scammer persistence
124
+ - Final engagement metrics are included in the callback as `engagementMetrics` and `engagementDurationSeconds`.
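The "Intelligence Extraction" bullets above describe running one regex extractor per scammer message and accumulating results over the session. A minimal sketch of that accumulation follows, using three of the patterns from the `honeypot_api.py` changes in this commit. The function names `extract` and `accumulate` are hypothetical, and the order-preserving deduplication is an assumption about how "Deduplicate final lists" might be done.

```python
import re

# Patterns copied from the extractor in this commit; only a subset is shown.
PATTERNS = {
    "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
    "caseIds": r"\b(?:CASE|CAS|REF|ID|TICKET)[- ]?[A-Z0-9]{4,}\b",
    "orderNumbers": r"\b(?:ORDER|ORD|OD)[- ]?[A-Z0-9]{4,}\b",
}


def extract(text):
    """Return every match for each pattern in a single message."""
    return {key: re.findall(pattern, text) for key, pattern in PATTERNS.items()}


def accumulate(session_intel, text):
    """Merge one message's matches into the per-session store, without duplicates."""
    for key, matches in extract(text).items():
        merged = session_intel.get(key, []) + matches
        # dict.fromkeys drops repeats while keeping first-seen order
        session_intel[key] = list(dict.fromkeys(merged))
    return session_intel


intel = {}
accumulate(intel, "Your case CASE-12345 is open. Pay to scammer@ybl now.")
accumulate(intel, "Send to scammer@ybl for order ORD-9981.")
```

As written, the ID patterns are case-sensitive, so a lowercase `case-12345` would not be captured unless `re.IGNORECASE` were passed.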
honeypot_api.py CHANGED
@@ -28,6 +28,7 @@ app = Flask(__name__)
28
  conversation_store = {}
29
  intelligence_store = {}
30
  callback_done = {}
31
 
32
  # ============================
33
  # VERIFY API KEY
@@ -89,7 +90,10 @@ def extract_intelligence(text):
89
  "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
90
  "cardNumbers": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
91
  "ifscCodes": r"\b[A-Z]{4}0[A-Z0-9]{6}\b",
92
- "transactionIds": r"\b[A-Z0-9]{10,20}\b",
93
  "telegramHandles": r"@[a-zA-Z0-9_]{5,}",
94
  }
95
 
@@ -98,7 +102,10 @@ def extract_intelligence(text):
98
  "bankAccounts": [],
99
  "upiIds": [],
100
  "phishingLinks": [],
101
- "emailAddresses": []
102
  }
103
 
104
  for key, pattern in patterns.items():
@@ -111,8 +118,8 @@ def extract_intelligence(text):
111
  if key in extracted:
112
  extracted[key].extend(matches)
113
 
114
- # Merge extra financial IDs into bankAccounts
115
- if key in ["cardNumbers", "transactionIds"]:
116
  extracted["bankAccounts"].extend(matches)
117
 
118
  # Deduplicate final lists
@@ -131,24 +138,24 @@ def generate_agent_reply(session_id):
131
  turn = len(history)
132
 
133
  progressive_questions = [
134
- "Can you explain this clearly?",
135
- "Why do you need this information exactly?",
136
- "Is this really urgent?",
137
- "Will my account actually be blocked?",
138
- "Can I complete this later today?",
139
- "Is there any official website I can verify?",
140
- "Will I receive confirmation after this?",
141
- "Is this refundable if something goes wrong?",
142
- "Are there any additional charges?",
143
- "Can you confirm your official ID?"
144
  ]
145
 
146
  prefixes = [
147
- "I'm a bit confused about this.",
148
- "This sounds serious.",
149
- "I want to resolve this properly.",
150
- "I don't want any issues with my account.",
151
- "Please clarify this for me."
152
  ]
153
 
154
  question = progressive_questions[min(turn // 2, len(progressive_questions)-1)]
@@ -192,6 +199,30 @@ def compute_engagement_score(session_id):
192
 
193
  return round(final, 2)
194
 
195
  # ============================
196
  # CALLBACK (STRICT FORMAT)
197
  # ============================
@@ -202,21 +233,42 @@ def send_callback(session_id):
202
  engagement = compute_engagement_score(session_id)
203
  intel = intelligence_store[session_id]
204
 
205
  payload = {
206
  "status": "success",
207
  "sessionId": session_id,
208
  "scamDetected": True,
209
  "totalMessagesExchanged": len(conv),
210
  "extractedIntelligence": {
211
  "phoneNumbers": intel["phoneNumbers"],
212
  "bankAccounts": intel["bankAccounts"],
213
  "upiIds": intel["upiIds"],
214
  "phishingLinks": intel["phishingLinks"],
215
- "emailAddresses": intel["emailAddresses"]
216
  },
217
  "engagementMetrics": {
218
  "totalMessagesExchanged": len(conv),
219
- "durationSeconds": max(60, len(conv) * 6),
220
  "engagementScore": round(engagement)
221
  },
222
  "agentNotes": "Adaptive psychological engagement used to prolong conversation."
@@ -249,13 +301,18 @@ def honeypot_message():
249
  "bankAccounts": [],
250
  "upiIds": [],
251
  "phishingLinks": [],
252
- "emailAddresses": []
253
  }
254
  callback_done[session_id] = False
255
 
256
  conversation_store[session_id].append({"sender": "scammer", "text": text})
257
 
258
  scam, conf = detect_scam(text)
259
 
260
  intel = extract_intelligence(text)
261
  for k in intel:
@@ -282,4 +339,5 @@ def honeypot_message():
282
  })
283
 
284
  if __name__ == "__main__":
285
- app.run(host="0.0.0.0", port=8000)
 
 
28
  conversation_store = {}
29
  intelligence_store = {}
30
  callback_done = {}
31
+ confidence_store = {}
32
 
33
  # ============================
34
  # VERIFY API KEY
 
90
  "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
91
  "cardNumbers": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
92
  "ifscCodes": r"\b[A-Z]{4}0[A-Z0-9]{6}\b",
93
+ "transactionIds": r"\b[A-Z0-9]{8,20}\b",
94
+ "caseIds": r"\b(?:CASE|CAS|REF|ID|TICKET)[- ]?[A-Z0-9]{4,}\b",
95
+ "policyNumbers": r"\b(?:POLICY|POL|PL|INS)[- ]?[A-Z0-9]{4,}\b",
96
+ "orderNumbers": r"\b(?:ORDER|ORD|OD)[- ]?[A-Z0-9]{4,}\b",
97
  "telegramHandles": r"@[a-zA-Z0-9_]{5,}",
98
  }
99
 
 
102
  "bankAccounts": [],
103
  "upiIds": [],
104
  "phishingLinks": [],
105
+ "emailAddresses": [],
106
+ "caseIds": [],
107
+ "policyNumbers": [],
108
+ "orderNumbers": [],
109
  }
110
 
111
  for key, pattern in patterns.items():
 
118
  if key in extracted:
119
  extracted[key].extend(matches)
120
 
121
+ # Merge extra financial or reference IDs into bankAccounts
122
+ if key in ["cardNumbers", "transactionIds", "policyNumbers", "orderNumbers"]:
123
  extracted["bankAccounts"].extend(matches)
124
 
125
  # Deduplicate final lists
 
138
  turn = len(history)
139
 
140
  progressive_questions = [
141
+ "Can you send me your official phone number so I can verify this?",
142
+ "Which bank account or UPI ID should I use to make the payment?",
143
+ "Can you share the exact amount and any processing fee details?",
144
+ "Can you send the official link or website where I can check this?",
145
+ "Can you share your employee or case ID so I feel safe?",
146
+ "Can you confirm the last four digits of the account you are talking about?",
147
+ "Which email address or helpdesk should I contact if something goes wrong?",
148
+ "Can you send the reference number or policy or order ID for this?",
149
+ "Can you resend the instructions step by step so I do not make a mistake?",
150
+ "Before I continue, can you clearly explain why this is so urgent?"
151
  ]
152
 
153
  prefixes = [
154
+ "You are asking for sensitive details and I feel a bit unsure.",
155
+ "This sounds urgent and about my account and I am worried.",
156
+ "I want to understand this properly before I share anything.",
157
+ "I do not want any issues with my money or personal data.",
158
+ "Please clarify everything clearly so I can trust this."
159
  ]
160
 
161
  question = progressive_questions[min(turn // 2, len(progressive_questions)-1)]
 
199
 
200
  return round(final, 2)
201
 
202
+
203
+ def infer_scam_type(session_id):
204
+
205
+ conv = conversation_store.get(session_id, [])
206
+ text_all = " ".join(m["text"].lower() for m in conv if m["sender"] == "scammer")
207
+
208
+ if any(k in text_all for k in ["upi", "gpay", "paytm", "@ok", "@ybl", "@upi"]):
209
+ return "upi_fraud"
210
+ if any(k in text_all for k in ["http://", "https://", "link", ".com", ".in"]):
211
+ return "phishing"
212
+ if any(k in text_all for k in ["loan", "emi", "interest", "approval"]):
213
+ return "loan_scam"
214
+ if any(k in text_all for k in ["lottery", "jackpot", "prize"]):
215
+ return "lottery_scam"
216
+ if any(k in text_all for k in ["kyc", "aadhaar", "aadhar", "pan", "verification"]):
217
+ return "kyc_fraud"
218
+ if any(k in text_all for k in ["income tax", "tax refund", "itr"]):
219
+ return "tax_scam"
220
+ if any(k in text_all for k in ["electricity", "power bill", "disconnection"]):
221
+ return "utility_bill_scam"
222
+ if any(k in text_all for k in ["sbi", "hdfc", "icici", "axis", "bank", "account"]):
223
+ return "bank_fraud"
224
+ return "generic_scam"
225
+
226
  # ============================
227
  # CALLBACK (STRICT FORMAT)
228
  # ============================
 
233
  engagement = compute_engagement_score(session_id)
234
  intel = intelligence_store[session_id]
235
 
236
+ duration_seconds = max(240, len(conv) * 6)
237
+
238
+ conf_values = confidence_store.get(session_id, [])
239
+ if conf_values:
240
+ avg_conf = sum(conf_values) / len(conf_values)
241
+ else:
242
+ avg_conf = 0.7
243
+
244
+ if avg_conf >= 0.8:
245
+ confidence_level = "HIGH"
246
+ elif avg_conf >= 0.5:
247
+ confidence_level = "MEDIUM"
248
+ else:
249
+ confidence_level = "LOW"
250
+
251
  payload = {
252
  "status": "success",
253
  "sessionId": session_id,
254
  "scamDetected": True,
255
  "totalMessagesExchanged": len(conv),
256
+ "engagementDurationSeconds": duration_seconds,
257
+ "scamType": infer_scam_type(session_id),
258
+ "confidenceLevel": confidence_level,
259
  "extractedIntelligence": {
260
  "phoneNumbers": intel["phoneNumbers"],
261
  "bankAccounts": intel["bankAccounts"],
262
  "upiIds": intel["upiIds"],
263
  "phishingLinks": intel["phishingLinks"],
264
+ "emailAddresses": intel["emailAddresses"],
265
+ "caseIds": intel.get("caseIds", []),
266
+ "policyNumbers": intel.get("policyNumbers", []),
267
+ "orderNumbers": intel.get("orderNumbers", []),
268
  },
269
  "engagementMetrics": {
270
  "totalMessagesExchanged": len(conv),
271
+ "engagementDurationSeconds": duration_seconds,
272
  "engagementScore": round(engagement)
273
  },
274
  "agentNotes": "Adaptive psychological engagement used to prolong conversation."
 
301
  "bankAccounts": [],
302
  "upiIds": [],
303
  "phishingLinks": [],
304
+ "emailAddresses": [],
305
+ "caseIds": [],
306
+ "policyNumbers": [],
307
+ "orderNumbers": [],
308
  }
309
  callback_done[session_id] = False
310
+ confidence_store[session_id] = []
311
 
312
  conversation_store[session_id].append({"sender": "scammer", "text": text})
313
 
314
  scam, conf = detect_scam(text)
315
+ confidence_store[session_id].append(conf)
316
 
317
  intel = extract_intelligence(text)
318
  for k in intel:
 
339
  })
340
 
341
  if __name__ == "__main__":
342
+ port = int(os.getenv("PORT", "8000"))
343
+ app.run(host="0.0.0.0", port=port)
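The confidence bucketing and duration logic added to `send_callback` in this commit can be read as two small pure functions. The sketch below uses the same thresholds (0.8, 0.5), the same 0.7 fallback, and the same `max(240, len(conv) * 6)` formula. The function names are hypothetical, chosen only to isolate the logic for testing.

```python
def confidence_level(conf_values, default=0.7):
    """Average the per-message confidences and bucket the result."""
    # With no recorded confidences, fall back to the default (0.7 in the commit)
    avg = sum(conf_values) / len(conf_values) if conf_values else default
    if avg >= 0.8:
        return "HIGH"
    if avg >= 0.5:
        return "MEDIUM"
    return "LOW"


def engagement_duration_seconds(message_count):
    """Estimate duration as 6 seconds per message, with a 240-second floor."""
    return max(240, message_count * 6)
```

Note that the duration is derived from the message count, not measured from timestamps, so any session with 40 or fewer messages reports 240 seconds.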
src/main.py ADDED
@@ -0,0 +1,8 @@
1
+ from honeypot_api import app
2
+ import os
3
+
4
+
5
+ if __name__ == "__main__":
6
+ port = int(os.getenv("PORT", "8000"))
7
+ app.run(host="0.0.0.0", port=port)
8
+
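One detail of the entry point above: `int(os.getenv("PORT", "8000"))` raises `ValueError` when `PORT` is set to a non-numeric string. A defensive variant is sketched below. `resolve_port` is a hypothetical helper, not part of the commit, and falling back to 8000 on bad input is an assumption about the desired behavior.

```python
import os


def resolve_port(env=None, default=8000):
    """Parse PORT from an environment mapping, falling back on bad input."""
    env = os.environ if env is None else env
    raw = env.get("PORT", "")
    try:
        port = int(raw)
    except ValueError:
        # Missing or non-numeric value: use the default instead of crashing
        return default
    # Reject values outside the valid TCP port range
    return port if 0 < port < 65536 else default
```

The result could then be passed to `app.run(host="0.0.0.0", port=resolve_port())` in place of the inline conversion.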