---
title: Honeypot API
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false
---
# Honeypot API
## Description
Honeypot API is a Flask-based conversational honeypot that engages scammers over several turns, extracts the intelligence they reveal (phone numbers, bank accounts, UPI IDs, links, emails, case IDs, policy and order numbers), and then submits a final JSON summary for scoring.
The focus is on:
- Reliable scam detection using a fine-tuned BERT phishing model
- Robust regex/NLP-based intelligence extraction
- High-quality engagement so the scammer keeps replying
## Tech Stack
- Language/Framework: Python, Flask
- Key Libraries: `torch`, `transformers`, `requests`, plus the standard-library `re` and `logging` modules
- Models Used:
  - BERT sequence classification model (local: `model/phising_model`) for scam detection
## Setup Instructions
1. Clone the repository
2. Create and activate a virtual environment (optional but recommended)
```bash
python -m venv venv
venv\Scripts\activate # Windows
# or
source venv/bin/activate # Linux/macOS
```
3. Install dependencies
```bash
pip install -r requirements.txt
```
4. Set environment variables (or edit `.env`)
```env
HONEYPOT_API_KEY=your-api-key-here
PORT=8000
```
5. Run the application (local)
```bash
python -m src.main
```
Or with Gunicorn (as used in Docker):
```bash
gunicorn -b 0.0.0.0:7860 honeypot_api:app --timeout 120
```
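The environment variables from step 4 can be checked once at startup so a missing key fails immediately instead of surfacing as rejected requests. A minimal sketch, assuming the app reads them from the process environment; `Settings`, `load_settings`, and the `8000` fallback are hypothetical and not taken from the project's source.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_key: str
    port: int


def load_settings(env=None):
    """Read HONEYPOT_API_KEY and PORT from the environment (or a given mapping)."""
    env = os.environ if env is None else env
    api_key = env.get("HONEYPOT_API_KEY", "").strip()
    if not api_key:
        # Fail fast at startup rather than rejecting every request later.
        raise RuntimeError("HONEYPOT_API_KEY is not set")
    # Hypothetical fallback that mirrors the .env example above.
    port = int(env.get("PORT", "8000"))
    return Settings(api_key=api_key, port=port)
```

`os.environ` does not read `.env` files by itself; if the app relies on `.env`, a loader such as `python-dotenv`'s `load_dotenv()` would have to run first (the README does not say which approach the project uses). Under Gunicorn the listening address comes from the `-b` flag, not from `PORT`.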
## API Endpoint
- URL: `https://your-deployed-url.com/honeypot`
- Method: `POST`
- Authentication: `x-api-key` header (must match `HONEYPOT_API_KEY`)
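The header check can be done with a constant-time comparison so response timing does not reveal how much of a guessed key was correct. A minimal sketch; `is_authorized` is a hypothetical helper, not the project's actual function.

```python
import hmac


def is_authorized(headers, expected_key):
    """Return True only if the x-api-key header exactly matches expected_key."""
    supplied = headers.get("x-api-key")
    if not supplied or not expected_key:
        return False
    # compare_digest runs in time independent of where the inputs first differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

In a Flask view, `request.headers` is case-insensitive, so `is_authorized(request.headers, os.environ["HONEYPOT_API_KEY"])` works however the client capitalizes the header; a plain `dict` (as in a quick test) needs the lowercase key. A failed check would typically return an error such as HTTP 401.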
### Request Body
```json
{
  "sessionId": "uuid-v4-string",
  "message": {
    "sender": "scammer",
    "text": "URGENT: Your account has been compromised...",
    "timestamp": "2025-02-11T10:30:00Z"
  },
  "conversationHistory": [
    {
      "sender": "scammer",
      "text": "Previous message...",
      "timestamp": 1739270400000
    },
    {
      "sender": "user",
      "text": "Your previous response...",
      "timestamp": 1739270460000
    }
  ],
  "metadata": {
    "channel": "SMS",
    "language": "English",
    "locale": "IN"
  }
}
```
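A client request matching this body can be built with the standard library alone. This is a sketch: `build_honeypot_request` is hypothetical, the metadata values are copied from the example above, and it uses an ISO-8601 timestamp for the new message while passing history entries through unchanged (the example itself mixes ISO strings and epoch milliseconds).

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone


def build_honeypot_request(url, api_key, text, history=None, session_id=None):
    """Build (but do not send) a POST request shaped like the request body above."""
    payload = {
        # Reuse the caller's session ID so the server can accumulate intelligence.
        "sessionId": session_id or str(uuid.uuid4()),
        "message": {
            "sender": "scammer",
            "text": text,
            "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "conversationHistory": list(history or []),
        "metadata": {"channel": "SMS", "language": "English", "locale": "IN"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
```

Sending it is then `with urllib.request.urlopen(req, timeout=30) as resp: reply = json.load(resp)`.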
### Response Body (per turn)
```json
{
  "status": "success",
  "scamDetected": true,
  "confidence": 0.97,
  "reply": "I'm a bit confused about this. Can you explain this clearly?",
  "engagementScore": 96
}
```
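On the client side, the per-turn response can be mapped onto a typed object so missing or malformed fields fail loudly. A sketch; `HoneypotReply` and `parse_reply` are hypothetical, and the range check assumes `confidence` is a probability in [0, 1], as the `0.97` example suggests.

```python
import json
from dataclasses import dataclass


@dataclass
class HoneypotReply:
    status: str
    scam_detected: bool
    confidence: float
    reply: str
    engagement_score: int


def parse_reply(raw):
    """Parse a per-turn JSON response; raises KeyError if a field is missing."""
    data = json.loads(raw)
    confidence = float(data["confidence"])
    # Assumption: confidence is a probability.
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return HoneypotReply(
        status=data["status"],
        scam_detected=bool(data["scamDetected"]),
        confidence=confidence,
        reply=data["reply"],
        engagement_score=int(data["engagementScore"]),
    )
```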
## Approach
- **Scam Detection**
  - Uses a fine-tuned BERT model on each incoming message to classify phishing/scam content.
  - The model output drives internal scoring, while the final `scamDetected` value in the callback is always `true` (all evaluation scenarios are scams).
- **Intelligence Extraction**
  - A single regex-based extractor runs on every scammer message.
  - It extracts:
    - `phoneNumbers`, `bankAccounts`, `upiIds`, `phishingLinks`, `emailAddresses`
    - Additional IDs such as `caseIds`, `policyNumbers`, `orderNumbers`
  - Results are accumulated over the session and returned in `extractedIntelligence` in the final callback JSON.
- **Engagement Strategy**
  - The honeypot acts as a confused but cooperative victim.
  - It uses progressive, generic questions (e.g. “Can you explain this clearly?”, “Is this really urgent?”, “Can you confirm your official ID?”) to keep scammers talking.
  - An engagement scoring function rewards:
    - Depth of conversation (number of turns)
    - Balanced back-and-forth between scammer and honeypot
    - Frequent question marks in agent messages
    - Scammer persistence
  - Final engagement metrics are included in the callback as `engagementMetrics` and `engagementDurationSeconds`.
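For scam detection, a sequence-classification model returns raw logits; turning them into a label and a confidence like the `0.97` in the response is a softmax followed by an argmax. A pure-Python sketch; `scam_confidence` is hypothetical, and treating index 1 as the scam label is an assumption to check against the model's `config.id2label`.

```python
import math


def scam_confidence(logits, scam_index=1):
    """Turn raw classifier logits into (is_scam, confidence of predicted class).

    Assumption: `scam_index` is the scam/phishing label.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    # Subtract the max before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Predicted class = argmax of the probabilities (first index wins ties).
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted == scam_index, probs[predicted]
```

With `transformers`, the logits for one message would come from loading `model/phising_model` via `AutoTokenizer.from_pretrained` and `AutoModelForSequenceClassification.from_pretrained`, then `model(**tokenizer(text, return_tensors="pt", truncation=True)).logits[0].tolist()`; whether the project does exactly this is not stated.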
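The intelligence-extraction step described above amounts to running one pattern per field over each scammer message and merging the hits into a per-session store. The sketch below is illustrative only: every pattern is a hypothetical approximation tuned for Indian-style formats (matching the `IN` locale in the example request), not the project's actual regexes.

```python
import re

# Hypothetical patterns; real scam messages will need broader coverage.
PATTERNS = {
    "phoneNumbers": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    "bankAccounts": re.compile(r"(?<![\d+])\d{11,18}(?!\d)"),
    # UPI handle: no dot after the provider, which separates it from an email.
    "upiIds": re.compile(r"\b[\w.-]{2,}@[A-Za-z]{2,}(?![\w.-])"),
    "phishingLinks": re.compile(r"https?://[^\s<>\"']+"),
    "emailAddresses": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def _labelled_id(label):
    # Matches e.g. "Case ID: CX-20931" or "order no. 884213"; the ID must contain a digit.
    return re.compile(
        r"\b" + label + r"\s*(?:id|no\.?|number|#)?\s*[:#-]?\s*"
        r"((?=[A-Z0-9-]*\d)[A-Z0-9][A-Z0-9-]{3,})",
        re.IGNORECASE,
    )


PATTERNS.update({
    "caseIds": _labelled_id("case"),
    "policyNumbers": _labelled_id("policy"),
    "orderNumbers": _labelled_id("order"),
})


def extract(text):
    """Return every field's matches for one message."""
    found = {}
    for field, pattern in PATTERNS.items():
        values = pattern.findall(text)
        if field == "phishingLinks":
            # Drop sentence punctuation glued to the end of a URL.
            values = [v.rstrip(".,;:!?)") for v in values]
        found[field] = values
    return found


def accumulate(session, text):
    """Merge one message's matches into the session store, keeping order, skipping duplicates."""
    for field, values in extract(text).items():
        bucket = session.setdefault(field, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return session
```

Keeping the store as ordered, de-duplicated lists per field makes it straightforward to emit as `extractedIntelligence` at the end of a session.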
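The engagement scoring function can be sketched as a weighted sum of the four factors listed above, scaled to the 0-100 range of `engagementScore`. The weights, the `max_turns` target, and the exact form of each factor are hypothetical; honeypot turns are assumed to use `"sender": "user"`, as in the request example.

```python
def engagement_score(history, max_turns=10):
    """Score a conversation from 0 to 100 using hypothetical weights."""
    scammer = [m for m in history if m.get("sender") == "scammer"]
    agent = [m for m in history if m.get("sender") == "user"]
    if not scammer or not agent:
        return 0
    # Depth: total messages relative to a target of max_turns exchanges.
    depth = min(len(history) / (2 * max_turns), 1.0)
    # Balance: 1.0 when both sides sent the same number of messages.
    balance = min(len(scammer), len(agent)) / max(len(scammer), len(agent))
    # Questions: share of agent messages containing a question mark.
    questions = sum("?" in m.get("text", "") for m in agent) / len(agent)
    # Persistence: how many times the scammer kept replying, capped at max_turns.
    persistence = min(len(scammer) / max_turns, 1.0)
    weights = (0.35, 0.2, 0.25, 0.2)
    factors = (depth, balance, questions, persistence)
    return round(100 * sum(w * f for w, f in zip(weights, factors)))
```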