Ankit19102004 committed on Feb 20

Commit

5fc3d59

1 Parent(s): f39029c

initial update

Files changed (5)

.env +2 -0
.gitignore +0 -6
README.md +119 -71
honeypot_api.py +81 -23
src/main.py +8 -0

.env ADDED

@@ -0,0 +1,2 @@
+HONEYPOT_API_KEY=Guvi@Hackathon2025
+PORT=8000

.gitignore CHANGED

@@ -22,12 +22,6 @@ share/python-wheels/
 *.egg
 MANIFEST
-# Virtual Environment
-venv/
-env/
-ENV/
-.env
 # Hugging Face / Models
 # Ignore training checkpoints and large optimizer states
 checkpoint-*/

README.md CHANGED

@@ -1,76 +1,124 @@
----
-title: Honeypot API
-emoji: 🛡️
-colorFrom: blue
-colorTo: red
-sdk: docker
-app_port: 7860
----
-# 🛡️ AI Security & Fraud Detection Dashboard
-A Flask-based web application that provides a single dashboard to run multiple
-machine learning and deep learning models for security and fraud detection.
----
-## 🚀 Features
-- Phishing Detection (BERT)
-- Fake Image Detection (CNN)
-- AI Image Detection (ViT)
-- Fake Audio Detection (MFCC-13 & MFCC-40)
-- Credit Card Fraud Detection (CSV upload)
-- Confidence scores
-- Bar chart summary for fraud results
-- Fraud rows displayed in table
----
-## 🧰 Requirements
-- Python **3.10.19**
-- Conda (recommended)
----
-## 🐍 Create Environment
-```bash
-conda create -n venv python=3.10.19
-conda activate venv
-```
-## 📦 Install Dependencies
-```bash
-pip install -r requirements.txt
 ```
-### or
-```bash
-pip install flask==2.3.3 werkzeug==2.3.7 tensorflow==2.13.0 torch torchvision torchaudio transformers scikit-learn pandas numpy librosa opencv-python pillow joblib tqdm
 ```
-##  Run Application
-```bash
-python app.py
-```
-## 📂 Project Structure
-```bash
-Guvi/
-├── app.py
-├── model_loader.py
-├── requirements.txt
-├── model/
-├── templates/
-│   └── index.html
-├── static/
-│   ├── styles.css
-│   └── app.js
-└── uploads/
-```

+# Honeypot API
+## Description
+Honeypot API is a Flask-based conversational honeypot that talks to scammers for several turns, extracts all sensitive intelligence they reveal (phone numbers, bank accounts, UPI IDs, links, emails, case IDs, policy and order numbers), and then submits a final JSON summary for scoring.
+The focus is on:
+- Reliable scam detection using a fine-tuned BERT phishing model
+- Robust regex/NLP-based intelligence extraction
+- High-quality engagement so the scammer keeps replying
+## Tech Stack
+- Language/Framework: Python, Flask
+- Key Libraries: `torch`, `transformers`, `requests`, `re`, `logging`
+- Models Used:
+  - BERT sequence classification model (local: `model/phising_model`) for scam detection
+## Setup Instructions
+1. Clone the repository
+2. Create and activate a virtual environment (optional but recommended)
+   ```bash
+   python -m venv venv
+   venv\Scripts\activate  # Windows
+   # or
+   source venv/bin/activate  # Linux/macOS
+   ```
+3. Install dependencies
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. Set environment variables (or edit `.env`)
+   ```env
+   HONEYPOT_API_KEY=your-api-key-here
+   PORT=8000
+   ```
+5. Run the application (local)
+   ```bash
+   python -m src.main
+   ```
+   Or with Gunicorn (as used in Docker):
+   ```bash
+   gunicorn -b 0.0.0.0:7860 honeypot_api:app --timeout 120
+   ```
+## API Endpoint
+- URL: `https://your-deployed-url.com/honeypot`
+- Method: `POST`
+- Authentication: `x-api-key` header (must match `HONEYPOT_API_KEY`)
+### Request Body
+```json
+{
+  "sessionId": "uuid-v4-string",
+  "message": {
+    "sender": "scammer",
+    "text": "URGENT: Your account has been compromised...",
+    "timestamp": "2025-02-11T10:30:00Z"
+  },
+  "conversationHistory": [
+    {
+      "sender": "scammer",
+      "text": "Previous message...",
+      "timestamp": 1739270400000
+    },
+    {
+      "sender": "user",
+      "text": "Your previous response...",
+      "timestamp": 1739270460000
+    }
+  ],
+  "metadata": {
+    "channel": "SMS",
+    "language": "English",
+    "locale": "IN"
+  }
+}
 ```
+### Response Body (per turn)
+```json
+{
+  "status": "success",
+  "scamDetected": true,
+  "confidence": 0.97,
+  "reply": "I'm a bit confused about this. Can you explain this clearly?",
+  "engagementScore": 96
+}
 ```
+## Approach
+- **Scam Detection**
+  - Uses a fine-tuned BERT model on each incoming message to classify phishing/scam content.
+  - The model output drives internal scoring, while final `scamDetected` in the callback is always `true` (all evaluation scenarios are scams).
+- **Intelligence Extraction**
+  - A single regex-based extractor runs on every scammer message.
+  - It extracts:
+    - `phoneNumbers`, `bankAccounts`, `upiIds`, `phishingLinks`, `emailAddresses`
+    - Additional IDs such as `caseIds`, `policyNumbers`, `orderNumbers`
+  - Results are accumulated over the session and returned in `extractedIntelligence` in the final callback JSON.
+- **Engagement Strategy**
+  - The honeypot acts as a confused but cooperative victim.
+  - It uses progressive, generic questions (e.g. “Can you explain this clearly?”, “Is this really urgent?”, “Can you confirm your official ID?”) to keep scammers talking.
+  - An engagement scoring function rewards:
+    - Depth of conversation (number of turns)
+    - Balanced back-and-forth between scammer and honeypot
+    - Frequent question marks in agent messages
+    - Scammer persistence
+  - Final engagement metrics are included in the callback as `engagementMetrics` and `engagementDurationSeconds`.
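
Reviewer note: the extraction-and-accumulation flow described under "Intelligence Extraction" can be sketched as follows. The three patterns are copied from `honeypot_api.py` in this commit; the helper names `extract` and `accumulate`, matching without regex flags, and the order-preserving deduplication are assumptions made for illustration, not the code in this commit.

```python
import re

# Patterns copied from this commit's honeypot_api.py.
# Matching without flags (case-sensitive) is an assumption.
PATTERNS = {
    "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
    "caseIds": r"\b(?:CASE|CAS|REF|ID|TICKET)[- ]?[A-Z0-9]{4,}\b",
    "orderNumbers": r"\b(?:ORDER|ORD|OD)[- ]?[A-Z0-9]{4,}\b",
}


def extract(text):
    """Return every match for each pattern in a single message (hypothetical helper)."""
    return {key: re.findall(pattern, text) for key, pattern in PATTERNS.items()}


def accumulate(store, found):
    """Merge one message's matches into the session store, keeping first-seen order."""
    for key, matches in found.items():
        bucket = store.setdefault(key, [])
        for match in matches:
            if match not in bucket:  # skip values already seen in this session
                bucket.append(match)
    return store


session = {}
accumulate(session, extract("Your case REF-88213 is open. Pay to fraudster@ybl now."))
accumulate(session, extract("Send the fee to fraudster@ybl for ORDER-55A71."))
print(session)
```

The UPI ID appears in both messages but is stored once, which is what lets the final callback report one entry per distinct value.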

honeypot_api.py CHANGED

@@ -28,6 +28,7 @@ app = Flask(__name__)
 conversation_store = {}
 intelligence_store = {}
 callback_done = {}
 # ============================
 # VERIFY API KEY
@@ -89,7 +90,10 @@ def extract_intelligence(text):
         "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
         "cardNumbers": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
         "ifscCodes": r"\b[A-Z]{4}0[A-Z0-9]{6}\b",
-        "transactionIds": r"\b[A-Z0-9]{10,20}\b",
         "telegramHandles": r"@[a-zA-Z0-9_]{5,}",
     }
@@ -98,7 +102,10 @@ def extract_intelligence(text):
         "bankAccounts": [],
         "upiIds": [],
         "phishingLinks": [],
-        "emailAddresses": []
     }
     for key, pattern in patterns.items():
@@ -111,8 +118,8 @@ def extract_intelligence(text):
             if key in extracted:
                 extracted[key].extend(matches)
-            # Merge extra financial IDs into bankAccounts
-            if key in ["cardNumbers", "transactionIds"]:
                 extracted["bankAccounts"].extend(matches)
     # Deduplicate final lists
@@ -131,24 +138,24 @@ def generate_agent_reply(session_id):
     turn = len(history)
     progressive_questions = [
-        "Can you explain this clearly?",
-        "Why do you need this information exactly?",
-        "Is this really urgent?",
-        "Will my account actually be blocked?",
-        "Can I complete this later today?",
-        "Is there any official website I can verify?",
-        "Will I receive confirmation after this?",
-        "Is this refundable if something goes wrong?",
-        "Are there any additional charges?",
-        "Can you confirm your official ID?"
     ]
     prefixes = [
-        "I'm a bit confused about this.",
-        "This sounds serious.",
-        "I want to resolve this properly.",
-        "I don't want any issues with my account.",
-        "Please clarify this for me."
     ]
     question = progressive_questions[min(turn // 2, len(progressive_questions)-1)]
@@ -192,6 +199,30 @@ def compute_engagement_score(session_id):
     return round(final, 2)
 # ============================
 # CALLBACK (STRICT FORMAT)
 # ============================
@@ -202,21 +233,42 @@ def send_callback(session_id):
     engagement = compute_engagement_score(session_id)
     intel = intelligence_store[session_id]
     payload = {
         "status": "success",
         "sessionId": session_id,
         "scamDetected": True,
         "totalMessagesExchanged": len(conv),
         "extractedIntelligence": {
             "phoneNumbers": intel["phoneNumbers"],
             "bankAccounts": intel["bankAccounts"],
             "upiIds": intel["upiIds"],
             "phishingLinks": intel["phishingLinks"],
-            "emailAddresses": intel["emailAddresses"]
         },
         "engagementMetrics": {
             "totalMessagesExchanged": len(conv),
-            "durationSeconds": max(60, len(conv) * 6),
             "engagementScore": round(engagement)
         },
         "agentNotes": "Adaptive psychological engagement used to prolong conversation."
@@ -249,13 +301,18 @@ def honeypot_message():
             "bankAccounts": [],
             "upiIds": [],
             "phishingLinks": [],
-            "emailAddresses": []
         }
         callback_done[session_id] = False
     conversation_store[session_id].append({"sender": "scammer", "text": text})
     scam, conf = detect_scam(text)
     intel = extract_intelligence(text)
     for k in intel:
@@ -282,4 +339,5 @@ def honeypot_message():
     })
 if __name__ == "__main__":
-    app.run(host="0.0.0.0", port=8000)

 conversation_store = {}
 intelligence_store = {}
 callback_done = {}
+confidence_store = {}
 # ============================
 # VERIFY API KEY
         "upiIds": r"[a-zA-Z0-9.\-_+]+@[a-zA-Z]+",
         "cardNumbers": r"\b(?:\d{4}[- ]?){3}\d{4}\b",
         "ifscCodes": r"\b[A-Z]{4}0[A-Z0-9]{6}\b",
+        "transactionIds": r"\b[A-Z0-9]{8,20}\b",
+        "caseIds": r"\b(?:CASE|CAS|REF|ID|TICKET)[- ]?[A-Z0-9]{4,}\b",
+        "policyNumbers": r"\b(?:POLICY|POL|PL|INS)[- ]?[A-Z0-9]{4,}\b",
+        "orderNumbers": r"\b(?:ORDER|ORD|OD)[- ]?[A-Z0-9]{4,}\b",
         "telegramHandles": r"@[a-zA-Z0-9_]{5,}",
     }
         "bankAccounts": [],
         "upiIds": [],
         "phishingLinks": [],
+        "emailAddresses": [],
+        "caseIds": [],
+        "policyNumbers": [],
+        "orderNumbers": [],
     }
     for key, pattern in patterns.items():
             if key in extracted:
                 extracted[key].extend(matches)
+            # Merge extra financial or reference IDs into bankAccounts
+            if key in ["cardNumbers", "transactionIds", "policyNumbers", "orderNumbers"]:
                 extracted["bankAccounts"].extend(matches)
     # Deduplicate final lists
     turn = len(history)
     progressive_questions = [
+        "Can you send me your official phone number so I can verify this?",
+        "Which bank account or UPI ID should I use to make the payment?",
+        "Can you share the exact amount and any processing fee details?",
+        "Can you send the official link or website where I can check this?",
+        "Can you share your employee or case ID so I feel safe?",
+        "Can you confirm the last four digits of the account you are talking about?",
+        "Which email address or helpdesk should I contact if something goes wrong?",
+        "Can you send the reference number or policy or order ID for this?",
+        "Can you resend the instructions step by step so I do not make a mistake?",
+        "Before I continue, can you clearly explain why this is so urgent?"
     ]
     prefixes = [
+        "You are asking for sensitive details and I feel a bit unsure.",
+        "This sounds urgent and about my account and I am worried.",
+        "I want to understand this properly before I share anything.",
+        "I do not want any issues with my money or personal data.",
+        "Please clarify everything clearly so I can trust this."
     ]
     question = progressive_questions[min(turn // 2, len(progressive_questions)-1)]
     return round(final, 2)
+def infer_scam_type(session_id):
+    conv = conversation_store.get(session_id, [])
+    text_all = " ".join(m["text"].lower() for m in conv if m["sender"] == "scammer")
+    if any(k in text_all for k in ["upi", "gpay", "paytm", "@ok", "@ybl", "@upi"]):
+        return "upi_fraud"
+    if any(k in text_all for k in ["http://", "https://", "link", ".com", ".in"]):
+        return "phishing"
+    if any(k in text_all for k in ["loan", "emi", "interest", "approval"]):
+        return "loan_scam"
+    if any(k in text_all for k in ["lottery", "jackpot", "prize"]):
+        return "lottery_scam"
+    if any(k in text_all for k in ["kyc", "aadhaar", "aadhar", "pan", "verification"]):
+        return "kyc_fraud"
+    if any(k in text_all for k in ["income tax", "tax refund", "itr"]):
+        return "tax_scam"
+    if any(k in text_all for k in ["electricity", "power bill", "disconnection"]):
+        return "utility_bill_scam"
+    if any(k in text_all for k in ["sbi", "hdfc", "icici", "axis", "bank", "account"]):
+        return "bank_fraud"
+    return "generic_scam"
 # ============================
 # CALLBACK (STRICT FORMAT)
 # ============================
     engagement = compute_engagement_score(session_id)
     intel = intelligence_store[session_id]
+    duration_seconds = max(240, len(conv) * 6)
+    conf_values = confidence_store.get(session_id, [])
+    if conf_values:
+        avg_conf = sum(conf_values) / len(conf_values)
+    else:
+        avg_conf = 0.7
+    if avg_conf >= 0.8:
+        confidence_level = "HIGH"
+    elif avg_conf >= 0.5:
+        confidence_level = "MEDIUM"
+    else:
+        confidence_level = "LOW"
     payload = {
         "status": "success",
         "sessionId": session_id,
         "scamDetected": True,
         "totalMessagesExchanged": len(conv),
+        "engagementDurationSeconds": duration_seconds,
+        "scamType": infer_scam_type(session_id),
+        "confidenceLevel": confidence_level,
         "extractedIntelligence": {
             "phoneNumbers": intel["phoneNumbers"],
             "bankAccounts": intel["bankAccounts"],
             "upiIds": intel["upiIds"],
             "phishingLinks": intel["phishingLinks"],
+            "emailAddresses": intel["emailAddresses"],
+            "caseIds": intel.get("caseIds", []),
+            "policyNumbers": intel.get("policyNumbers", []),
+            "orderNumbers": intel.get("orderNumbers", []),
         },
         "engagementMetrics": {
             "totalMessagesExchanged": len(conv),
+            "engagementDurationSeconds": duration_seconds,
             "engagementScore": round(engagement)
         },
         "agentNotes": "Adaptive psychological engagement used to prolong conversation."
             "bankAccounts": [],
             "upiIds": [],
             "phishingLinks": [],
+            "emailAddresses": [],
+            "caseIds": [],
+            "policyNumbers": [],
+            "orderNumbers": [],
         }
         callback_done[session_id] = False
+        confidence_store[session_id] = []
     conversation_store[session_id].append({"sender": "scammer", "text": text})
     scam, conf = detect_scam(text)
+    confidence_store[session_id].append(conf)
     intel = extract_intelligence(text)
     for k in intel:
     })
 if __name__ == "__main__":
+    port = int(os.getenv("PORT", "8000"))
+    app.run(host="0.0.0.0", port=port)
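
Reviewer note: the confidence bucketing and duration floor added to `send_callback` can be exercised in isolation. The sketch below restates the thresholds from this diff (`0.8`, `0.5`, fallback average `0.7`, and `max(240, len(conv) * 6)`); the function name `summarize_session` and its signature are hypothetical.

```python
def summarize_session(conf_values, message_count):
    """Mirror the callback's confidence level and duration arithmetic (hypothetical helper)."""
    # Average the per-message confidences, falling back to 0.7 when none were recorded.
    avg_conf = sum(conf_values) / len(conf_values) if conf_values else 0.7

    if avg_conf >= 0.8:
        level = "HIGH"
    elif avg_conf >= 0.5:
        level = "MEDIUM"
    else:
        level = "LOW"

    # Six seconds per message, but never less than 240 seconds.
    duration_seconds = max(240, message_count * 6)
    return level, duration_seconds


print(summarize_session([], 10))
print(summarize_session([0.95, 0.9], 50))
```

With these constants the 240-second floor applies to any session of 40 messages or fewer, so the reported duration only starts tracking the message count beyond that point.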

src/main.py ADDED

@@ -0,0 +1,8 @@
+from honeypot_api import app
+import os
+if __name__ == "__main__":
+    port = int(os.getenv("PORT", "8000"))
+    app.run(host="0.0.0.0", port=port)
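
Reviewer note: once the app is started through `src/main.py`, a request shaped like the README's "Request Body" could be built as sketched below. The local base URL `http://localhost:8000`, the helper name `build_honeypot_request`, and the empty `conversationHistory` are assumptions; the `/honeypot` path, `POST` method, and `x-api-key` header come from the README in this commit. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_honeypot_request(base_url, api_key, session_id, text):
    """Build (but do not send) a POST request for the /honeypot endpoint (hypothetical helper)."""
    body = {
        "sessionId": session_id,
        "message": {
            "sender": "scammer",
            "text": text,
            "timestamp": "2025-02-11T10:30:00Z",
        },
        "conversationHistory": [],  # assumption: an empty history for a first turn
        "metadata": {"channel": "SMS", "language": "English", "locale": "IN"},
    }
    return urllib.request.Request(
        url=f"{base_url}/honeypot",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_honeypot_request(
    "http://localhost:8000", "your-api-key-here", "demo-session", "URGENT: verify now"
)
# Passing req to urllib.request.urlopen would send it to a running server.
print(req.get_method(), req.full_url)
```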