metadata

title: Honeypot API
emoji: 🛡️
colorFrom: blue
colorTo: red
sdk: docker
app_port: 7860
pinned: false

Honeypot API

Description

Honeypot API is a Flask-based conversational honeypot. It keeps scammers talking over several turns, extracts the sensitive details they reveal (phone numbers, bank accounts, UPI IDs, links, email addresses, case IDs, and policy and order numbers), and then submits a final JSON summary for scoring.
The project focuses on:

Reliable scam detection using a fine-tuned BERT phishing model
Robust regex/NLP-based intelligence extraction
High-quality engagement so the scammer keeps replying

Tech Stack

Language/Framework: Python, Flask
Key Libraries: torch, transformers, requests, re, logging
Models Used:
- BERT sequence classification model (local: model/phising_model) for scam detection

Setup Instructions

Clone the repository

Create and activate a virtual environment (optional but recommended)

python -m venv venv
venv\Scripts\activate  # Windows
# or
source venv/bin/activate  # Linux/macOS

Install dependencies
```
pip install -r requirements.txt
```

Set environment variables (or edit .env)

HONEYPOT_API_KEY=your-api-key-here
PORT=8000
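
As a rough illustration of how these two variables might be consumed at startup, here is a minimal sketch. The helper name `load_settings` and the fallback port are assumptions made for this example, not code from the repository.

```python
import os


def load_settings(env=None):
    """Read HONEYPOT_API_KEY and PORT from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("HONEYPOT_API_KEY", "")
    if not api_key:
        # Refuse to start without a key so the endpoint is never left open.
        raise RuntimeError("HONEYPOT_API_KEY is not set")
    # The 8000 fallback mirrors the example value above; it is an assumption.
    port = int(env.get("PORT", "8000"))
    return {"api_key": api_key, "port": port}
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.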

Run the application (local)

python -m src.main

Or with Gunicorn (as used in Docker):

gunicorn -b 0.0.0.0:7860 honeypot_api:app --timeout 120

API Endpoint

URL: https://your-deployed-url.com/honeypot
Method: POST
Authentication: x-api-key header (must match HONEYPOT_API_KEY)
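
One way the header check could be written is sketched below. The function `is_authorized` is hypothetical and framework-independent: it takes any mapping of header names to values, so it does not reflect how the Flask app actually wires the check in.

```python
import hmac


def is_authorized(headers, expected_key):
    """Return True when the x-api-key header matches the expected key."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    supplied = normalised.get("x-api-key", "")
    if not expected_key or not supplied:
        return False
    # compare_digest runs in constant time, which avoids timing side channels.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())
```

A request that fails this check would typically be rejected with a 401 response before any model inference runs.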

Request Body

{
  "sessionId": "uuid-v4-string",
  "message": {
    "sender": "scammer",
    "text": "URGENT: Your account has been compromised...",
    "timestamp": "2025-02-11T10:30:00Z"
  },
  "conversationHistory": [
    {
      "sender": "scammer",
      "text": "Previous message...",
      "timestamp": 1739270400000
    },
    {
      "sender": "user",
      "text": "Your previous response...",
      "timestamp": 1739270460000
    }
  ],
  "metadata": {
    "channel": "SMS",
    "language": "English",
    "locale": "IN"
  }
}

Response Body (per turn)

{
  "status": "success",
  "scamDetected": true,
  "confidence": 0.97,
  "reply": "I'm a bit confused about this. Can you explain this clearly?",
  "engagementScore": 96
}
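
A client for one turn might look like the sketch below. It uses the standard library's `urllib` so the example has no dependencies (the project itself lists `requests`); `build_request` and `send` are hypothetical helper names, and the metadata values are copied from the request example above.

```python
import json
import urllib.request


def build_request(url, api_key, session_id, text, timestamp, history=None):
    """Build the POST request for one scammer message."""
    payload = {
        "sessionId": session_id,
        "message": {"sender": "scammer", "text": text, "timestamp": timestamp},
        "conversationHistory": history or [],
        "metadata": {"channel": "SMS", "language": "English", "locale": "IN"},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the per-turn JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_request(...))` against a deployed instance should return a dictionary with the `status`, `scamDetected`, `confidence`, `reply`, and `engagementScore` keys shown above.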

Approach

Scam Detection
- Runs a fine-tuned BERT model on each incoming message to classify phishing/scam content.
- The model output drives internal scoring, while the final scamDetected value in the callback is always true (all evaluation scenarios are scams).
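
To show how a classifier output could become the per-turn `confidence` value, here is a minimal sketch that applies a softmax to raw logits. It assumes a two-class model whose scam class sits at index 1 and a 0.5 decision threshold; both are assumptions for illustration, and the real label order and threshold may differ.

```python
import math


def scam_confidence(logits, scam_index=1):
    """Softmax over classifier logits; returns the scam-class probability."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    return exps[scam_index] / sum(exps)


def classify(logits, threshold=0.5):
    """Turn logits into the per-turn scamDetected and confidence fields."""
    confidence = scam_confidence(logits)
    return {
        "scamDetected": confidence >= threshold,
        "confidence": round(confidence, 2),
    }
```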
Intelligence Extraction
- A single regex-based extractor runs on every scammer message.
- It extracts:
  - phoneNumbers, bankAccounts, upiIds, phishingLinks, emailAddresses
  - Additional IDs such as caseIds, policyNumbers, orderNumbers
- Results are accumulated over the session and returned in extractedIntelligence in the final callback JSON.
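
The extraction and accumulation steps above can be sketched roughly as follows. The patterns are illustrative assumptions (for example, Indian mobile numbers and 11 to 18 digit account numbers), not the project's actual regexes, and the additional ID fields are left out for brevity.

```python
import re

# Hypothetical patterns for five of the fields; tune them for real traffic.
PATTERNS = {
    "phoneNumbers": re.compile(r"(?<![\d+])(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    "bankAccounts": re.compile(r"(?<![\d+])\d{11,18}(?!\d)"),
    # A UPI ID looks like name@handle with no dotted domain after it.
    "upiIds": re.compile(r"(?<![\w.-])[\w.-]{2,}@[A-Za-z]{2,}(?![\w-])(?!\.\w)"),
    "phishingLinks": re.compile(r"https?://[^\s<>]+"),
    "emailAddresses": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract(text):
    """Return every match per field, with trailing punctuation trimmed."""
    return {
        field: [m.group(0).rstrip(".,;:!?)") for m in pattern.finditer(text)]
        for field, pattern in PATTERNS.items()
    }


def accumulate(session, text):
    """Merge new matches into the session store, skipping duplicates."""
    for field, values in extract(text).items():
        bucket = session.setdefault(field, [])
        for value in values:
            if value not in bucket:
                bucket.append(value)
    return session
```

Calling `accumulate` once per scammer message leaves `session` in the shape expected for `extractedIntelligence`, with each field holding a de-duplicated list.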
Engagement Strategy
- The honeypot acts as a confused but cooperative victim.
- It uses progressive, generic questions (e.g. “Can you explain this clearly?”, “Is this really urgent?”, “Can you confirm your official ID?”) to keep scammers talking.
- An engagement scoring function rewards:
  - Depth of conversation (number of turns)
  - Balanced back-and-forth between scammer and honeypot
  - Frequent question marks in agent messages
  - Scammer persistence
- Final engagement metrics are included in the callback as engagementMetrics and engagementDurationSeconds.
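
A scoring function along these lines could combine the four signals listed above. This is only a sketch: the equal 25-point weights, the `max_turns` cap, and the definition of persistence (the share of honeypot messages that drew a scammer reply) are all assumptions rather than the project's actual formula.

```python
def engagement_score(history, max_turns=10):
    """Score a conversation from 0 to 100 using four equally weighted signals."""
    scammer = [m for m in history if m["sender"] == "scammer"]
    agent = [m for m in history if m["sender"] == "user"]
    if not scammer or not agent:
        return 0
    # Depth: total messages relative to a full conversation of max_turns turns.
    depth = min(len(history) / (2 * max_turns), 1.0)
    # Balance: 1.0 when both sides sent the same number of messages.
    balance = min(len(scammer), len(agent)) / max(len(scammer), len(agent))
    # Questions: share of honeypot messages that contain a question mark.
    questions = sum("?" in m["text"] for m in agent) / len(agent)
    # Persistence: share of honeypot messages followed by a scammer reply.
    replies = sum(
        1
        for prev, nxt in zip(history, history[1:])
        if prev["sender"] == "user" and nxt["sender"] == "scammer"
    )
    persistence = replies / len(agent)
    return round(25 * (depth + balance + questions + persistence))
```

Each entry in `history` is expected to follow the `conversationHistory` shape from the request body, with `sender` set to `scammer` or `user`.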