---
title: Hubble AI Engine
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---
# Hubble AI Engine: Cyberbullying Detection Pipeline
A production-grade, layered AI content moderation system for detecting cyberbullying across **text**, **image**, and **video** inputs. Designed as a universal safety tool for social media platforms.
---
## Architecture Overview
```
User Input (text/image/video)
  ↓ Preprocessing (normalization, frame extraction)
  ↓ Fast AI Filter (RoBERTa text, EfficientNet image), ONNX optimized
  ↓ Risk Scoring Engine (0-100 composite score)
  ↓ LangGraph Router
      ├─ LOW (0-30) → Allow
      ├─ MEDIUM (31-65) → Warning
      └─ HIGH (66-100) → Deep Analysis
            ├─ CLIP multimodal alignment
            ├─ Gemini reasoning (via LangChain)
            └─ Final verdict
  ↓ Decision Engine (severity + user history + rules)
  ↓ Response + Logging
```
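
The routing thresholds in the diagram can be expressed as a small pure function. This is a minimal sketch, not the project's `workflow.py`: the `RiskLevel` enum and the `route_by_score` name are hypothetical, and placing fractional scores that fall between bands (for example 30.5) in the higher band is an assumption the diagram does not settle.

```python
from enum import Enum


class RiskLevel(Enum):
    LOW = "LOW"        # 0-30: allow
    MEDIUM = "MEDIUM"  # 31-65: warning
    HIGH = "HIGH"      # 66-100: deep analysis


def route_by_score(score: float) -> RiskLevel:
    """Map a 0-100 composite risk score to a routing band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score <= 30:
        return RiskLevel.LOW
    # Assumption: a score just above a band's upper bound belongs to the next band.
    if score <= 65:
        return RiskLevel.MEDIUM
    return RiskLevel.HIGH
```

Keeping the band lookup free of I/O would let a router node call it directly and makes the boundaries easy to unit test.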
### Key Components
| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Fast Filter** | RoBERTa (ONNX), EfficientNet (ONNX) | Sub-200ms first-pass classification |
| **Risk Scoring** | Custom weighted engine | Composite score with category weights + user history |
| **Routing** | LangGraph state machine | Conditional deep analysis for HIGH-risk content only |
| **Deep Analysis** | CLIP + Gemini (LangChain) | Multimodal alignment + LLM contextual reasoning |
| **Decision Engine** | Rule-based system | ALLOWED / WARNING / BLOCKED with escalation logic |
| **Observability** | LangSmith + structlog | Full pipeline tracing and structured logging |
| **Storage** | MongoDB (motor) + Redis | Moderation logs, user history, result caching |
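
The Risk Scoring row describes a weighted composite of category scores plus user history. One way such a score might be computed is sketched below. Every number here is hypothetical: the category weights, the per-violation history bonus, and its cap are illustrative placeholders, not values taken from `risk_scorer.py`. The category names `toxic` and `insult` come from the sample response in this README; `threat` is added only for illustration.

```python
# Hypothetical weights: how strongly each category contributes to the score.
CATEGORY_WEIGHTS = {"toxic": 1.0, "insult": 0.8, "threat": 1.5}
HISTORY_BONUS_PER_VIOLATION = 5.0  # hypothetical points per prior violation
HISTORY_BONUS_CAP = 20.0           # hypothetical ceiling on the history bonus


def composite_risk_score(probabilities: dict[str, float], prior_violations: int = 0) -> float:
    """Combine per-category probabilities (0-1) into a 0-100 composite score."""
    # Weight each category; unknown categories get a neutral weight of 1.0.
    weighted = [
        probability * CATEGORY_WEIGHTS.get(name, 1.0)
        for name, probability in probabilities.items()
    ]
    # Use the strongest weighted signal, normalized so the maximum is 100.
    base = 100.0 * max(weighted, default=0.0) / max(CATEGORY_WEIGHTS.values())
    # Repeat offenders receive a bounded bonus on top of the base score.
    bonus = min(prior_violations * HISTORY_BONUS_PER_VIOLATION, HISTORY_BONUS_CAP)
    return round(min(base + bonus, 100.0), 1)
```

Taking the maximum rather than the sum is a design choice in this sketch: one severe category is enough to raise the score, and several mild categories do not add up to a block.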
---
## Project Structure
```
Ai/
├── app/
│   ├── main.py                  # FastAPI app factory + lifespan
│   ├── config.py                # Pydantic Settings (.env)
│   ├── dependencies.py          # FastAPI dependency injection
│   ├── api/                     # API layer
│   │   ├── router.py            # Route aggregator
│   │   ├── v1/
│   │   │   ├── analyze.py       # POST /analyze/text, /image, /video
│   │   │   ├── health.py        # GET /health
│   │   │   └── history.py       # GET /history/{user_id}
│   │   └── schemas/
│   │       ├── requests.py      # Request models
│   │       └── responses.py     # Response models
│   ├── pipeline/                # Core moderation pipeline
│   │   ├── preprocessor.py      # Input normalization
│   │   ├── fast_filter.py       # RoBERTa + EfficientNet inference
│   │   ├── risk_scorer.py       # Composite risk scoring
│   │   ├── deep_analyzer.py     # CLIP + Gemini deep analysis
│   │   ├── decision_engine.py   # Rule-based verdicts
│   │   └── workflow.py          # LangGraph state machine
│   ├── models/                  # ML model management
│   │   ├── model_registry.py    # Singleton model loader
│   │   ├── text_model.py        # RoBERTa (ONNX)
│   │   ├── image_model.py       # EfficientNet (ONNX)
│   │   ├── clip_model.py        # OpenCLIP
│   │   └── onnx_utils.py        # ONNX export/inference
│   ├── services/                # External integrations
│   │   ├── gemini_service.py    # Gemini via LangChain
│   │   ├── mongo_service.py     # MongoDB (async)
│   │   └── redis_service.py     # Redis (async)
│   ├── observability/           # Monitoring
│   │   ├── langsmith.py         # LangSmith tracing
│   │   └── logging.py           # structlog config
│   └── utils/                   # Helpers
│       ├── image_utils.py       # Image preprocessing
│       └── video_utils.py       # Video frame extraction
├── tests/                       # Test suite
├── model_cache/                 # Downloaded models (gitignored)
├── _legacy/                     # Old code (preserved for reference)
├── .env.example                 # Environment template
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
---
## Quick Start
### 1. Setup Python Environment
```bash
cd Ai
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
```
### 2. Configure Environment
```bash
cp .env.example .env
# Edit .env with your API keys and database URIs
```
### 3. Start Services (MongoDB + Redis)
```bash
# Using Docker
docker run -d -p 27017:27017 --name hubble-mongo mongo:7
docker run -d -p 6379:6379 --name hubble-redis redis:7-alpine
```
### 4. Run the Server
```bash
cd Ai
python -m app.main
# Or: uvicorn app.main:app --reload --port 8000
```
### 5. Test the API
```bash
# Health check
curl http://localhost:8000/health

# Analyze text
curl -X POST http://localhost:8000/api/v1/analyze/text \
  -H "Content-Type: application/json" \
  -d '{"text": "You are worthless", "user_id": "user123"}'

# Analyze image
curl -X POST http://localhost:8000/api/v1/analyze/image \
  -F "file=@test.jpg" -F "user_id=user123"
```
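
The text-analysis call can also be made from Python. The sketch below mirrors the `curl` example using only the standard library; the helper names `build_text_request` and `analyze_text` are hypothetical, and the base URL assumes the server is running locally on port 8000 as in step 4.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the local uvicorn port from step 4


def build_text_request(text: str, user_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request for /api/v1/analyze/text without sending it."""
    payload = json.dumps({"text": text, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/analyze/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text: str, user_id: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_text_request(text, user_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `analyze_text("You are worthless", "user123")` should return the decoded response as a dictionary.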
---
## API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Health check with model/service status |
| `GET` | `/health/ping` | Lightweight liveness probe |
| `POST` | `/api/v1/analyze/text` | Analyze text for cyberbullying |
| `POST` | `/api/v1/analyze/image` | Analyze image for harmful content |
| `POST` | `/api/v1/analyze/video` | Analyze video (frame extraction) |
| `GET` | `/api/v1/history/{user_id}` | Get moderation history |
| `GET` | `/api/v1/history/{user_id}/summary` | Get aggregated user stats |
### Response Schema
All `/analyze/*` endpoints return a unified `AnalysisResponse`:
```json
{
  "request_id": "req_abc123",
  "input_type": "text",
  "status": "WARNING",
  "risk_level": "MEDIUM",
  "risk_score": 45.2,
  "categories": ["insult", "toxic"],
  "confidence": 0.82,
  "decision": {
    "action": "WARNING",
    "reason": "Content flagged as potentially harmful",
    "severity": "medium",
    "should_alert_parent": false
  },
  "processing_time_ms": 156,
  "cached": false
}
```
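
A client might parse this response into typed objects before acting on it. The following is a sketch under stated assumptions: the `Decision` and `AnalysisResult` dataclasses and the `parse_analysis_response` function are hypothetical client-side names, not the server's Pydantic models in `responses.py`, and only a subset of the fields shown above is mapped.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str
    reason: str
    severity: str
    should_alert_parent: bool


@dataclass(frozen=True)
class AnalysisResult:
    request_id: str
    status: str
    risk_level: str
    risk_score: float
    categories: tuple[str, ...]
    decision: Decision


def parse_analysis_response(body: str) -> AnalysisResult:
    """Parse the JSON body of an /analyze/* response into typed objects."""
    data = json.loads(body)
    decision = data["decision"]
    return AnalysisResult(
        request_id=data["request_id"],
        status=data["status"],
        risk_level=data["risk_level"],
        risk_score=float(data["risk_score"]),
        categories=tuple(data["categories"]),
        decision=Decision(
            action=decision["action"],
            reason=decision["reason"],
            severity=decision["severity"],
            should_alert_parent=bool(decision["should_alert_parent"]),
        ),
    )
```

Reading each field by name means a missing key fails with a `KeyError` at parse time instead of surfacing later in calling code.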
---
## Running Tests
```bash
cd Ai
python -m pytest tests/ -v
```
---
## Models Used
| Model | Purpose | Size | Backend |
|-------|---------|------|---------|
| `unitary/toxic-bert` | Text toxicity (6 labels) | ~450 MB | ONNX |
| `google/efficientnet-b0` | Image classification | ~20 MB | ONNX |
| `openai/clip-vit-base-patch32` | Multimodal alignment | ~600 MB | PyTorch |
| `gemini-2.0-flash` | Deep contextual reasoning | Cloud API | LangChain |
---
## Security Notes
- API keys loaded from `.env` only (never hardcoded)
- CORS restricted in production mode
- User data isolated by `user_id`
- All moderation events logged for audit
---
Built for the **National Hackathon - SentinelAI Project**