Commit · b8203af
1 Parent(s): 1d571f0
updated README
README.md
CHANGED
@@ -1,12 +1,68 @@
+# Social Media Bot Detection
+
+This project focuses on detecting automated (bot) accounts using **only user metadata and behavioral features**, without relying on text or content analysis.
+
+The goal is to build a **robust and lightweight bot detection system** that is less sensitive to content manipulation and language changes.
+
+---
+
+## What this project does
+- Uses **user-level metadata** and behavioral signals as input
+- Performs **feature engineering** to capture activity patterns
+- Trains **supervised machine learning models** to classify accounts as bot or genuine
+- Supports an **API-driven setup** for frontend or downstream integration
+
+This version intentionally avoids text-based features.
+
+---
+
+## Why metadata-only detection?
+Text-based bot detection can break when:
+- Bots generate human-like text
+- Language or topics change frequently
+
+Metadata and behavior:
+- Are harder to fake consistently
+- Capture long-term patterns
+- Generalize better across platforms
+
+---
+
+## Approach (high level)
+1. Collect user metadata
+2. Clean and preprocess the data
+3. Engineer behavioral features
+4. Train supervised ML models
+5. Evaluate using standard classification metrics
+6. Serve predictions via an API
+
+---
+
+## Model & Code
+- Training and inference code are included in this repository
+- **Model artifacts are not stored here** due to size constraints
+
+📦 Trained model weights are hosted on Hugging Face:
+👉 https://huggingface.co/spaces/ASHUT0SH-SiNGH/BotDetection
+
+---
+
+## Notes
+- Focuses on **pipeline design and modeling logic**
+- Frontend components are minimal and not the core focus
+- Designed to be extended with additional metadata features
+
+---
+
+## Status
+- Model trained and evaluated
+- API-based integration supported
+- Open to further improvements
+
+
 ---
 title: BotDetection
-
-colorFrom: blue
-colorTo: gray
-sdk: streamlit
-sdk_version: 1.42.0
-app_file: app.py
-pinned: false
+
 license: apache-2.0
 ---
 
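The README added in this commit lists "Engineer behavioral features" as a step, but the diff does not show which features are used (the `MODEL_FEATURES` list in `app.py` lies outside the visible hunk). The following is only a minimal sketch of what metadata-only feature engineering could look like. Every field name (`created_at`, `followers_count`, `following_count`, `posts_count`, `default_profile`) and every derived feature name is a hypothetical placeholder, not the project's actual schema.

```python
from datetime import datetime, timezone


def engineer_features(user: dict, now: datetime) -> dict:
    """Derive simple behavioral ratios from raw account metadata.

    All input and output field names are hypothetical placeholders.
    """
    created = datetime.fromisoformat(user["created_at"])
    # Clamp to one day so brand-new accounts do not divide by zero.
    age_days = max((now - created).days, 1)
    followers = user["followers_count"]
    following = user["following_count"]
    return {
        "account_age_days": age_days,
        "posts_per_day": user["posts_count"] / age_days,
        # +1 avoids division by zero for accounts that follow nobody.
        "follower_following_ratio": followers / (following + 1),
        "has_default_profile": int(user["default_profile"]),
    }


# Made-up account: ten days old, very high posting rate, few followers.
example = {
    "created_at": "2024-01-01T00:00:00+00:00",
    "followers_count": 10,
    "following_count": 999,
    "posts_count": 5000,
    "default_profile": True,
}
features = engineer_features(
    example, now=datetime(2024, 1, 11, tzinfo=timezone.utc)
)
```

Ratios such as posts per day depend only on counts and timestamps, which is consistent with the README's stated aim of avoiding text-based features. Whether the real pipeline uses anything similar is not confirmed by this commit.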
app.py
CHANGED
@@ -82,7 +82,7 @@ MODEL_FEATURES = [
 
 
 @st.cache_resource
-def load_model(model_path="bot_model.joblib"):
+def load_model(model_path="./bot_model.joblib"):
     try:
         model = joblib.load(model_path)
         return model
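The hunk above changes only the default argument, from `"bot_model.joblib"` to `"./bot_model.joblib"`. Both are relative paths, so both are resolved against the process's current working directory. If the intent is to find the model file regardless of where the app is launched from, one option is to anchor the path to the script's own directory. The helper below is a hedged sketch of that idea: `resolve_model_path` is a hypothetical name and is not part of this commit.

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(filename: str = "bot_model.joblib",
                       base_dir: Optional[Path] = None) -> Path:
    """Return the model path anchored to a fixed directory.

    Hypothetical helper: when base_dir is omitted, the directory that
    contains this source file is used instead of the working directory.
    """
    if base_dir is None:
        base_dir = Path(__file__).resolve().parent
    return Path(base_dir) / filename


# Possible use inside app.py (an assumption, not shown in the diff):
#     model = joblib.load(resolve_model_path())
demo = resolve_model_path("bot_model.joblib", base_dir=Path("/srv/app"))
```

Passing `base_dir` explicitly, as in the `demo` line, keeps the helper easy to test. The default branch relies on `__file__`, which is defined when the code runs as a script or module.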