metadata
title: Hubble AI Engine
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
Hubble AI Engine: Cyberbullying Detection Pipeline
A production-grade, layered AI content moderation system for detecting cyberbullying across text, image, and video inputs. Designed as a universal safety tool for social media platforms.
Architecture Overview
User Input (text/image/video)
  ↓ Preprocessing (normalization, frame extraction)
  ↓ Fast AI Filter (RoBERTa text, EfficientNet image), ONNX optimized
  ↓ Risk Scoring Engine (0-100 composite score)
  ↓ LangGraph Router
      ├─ LOW (0-30) → Allow ✅
      ├─ MEDIUM (31-65) → Warning ⚠️
      └─ HIGH (66-100) → Deep Analysis 🔴
            ├─ CLIP multimodal alignment
            ├─ Gemini reasoning (via LangChain)
            └─ Final verdict
  ↓ Decision Engine (severity + user history + rules)
  ↓ Response + Logging
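The router's banding can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's `workflow.py`: the names `RiskLevel`, `classify_risk`, and `next_step` are hypothetical, as are the node names. The diagram only gives integer bands, so the treatment of fractional scores (for example, 30.5 falling into MEDIUM) is an assumption.

```python
from enum import Enum


class RiskLevel(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def classify_risk(score: float) -> RiskLevel:
    """Map a 0-100 composite score to one of the three bands in the diagram."""
    if not 0 <= score <= 100:
        raise ValueError(f"risk score out of range: {score}")
    if score <= 30:
        return RiskLevel.LOW
    # Assumption: anything above 30 and up to 65 counts as MEDIUM.
    if score <= 65:
        return RiskLevel.MEDIUM
    return RiskLevel.HIGH


def next_step(score: float) -> str:
    """Return the name of the next pipeline node (node names are hypothetical)."""
    routes = {
        RiskLevel.LOW: "allow",
        RiskLevel.MEDIUM: "warning",
        RiskLevel.HIGH: "deep_analysis",
    }
    return routes[classify_risk(score)]
```

A function shaped like `next_step` is the kind of callable a state-machine router can use to choose a branch, so that only HIGH-risk content pays for CLIP and Gemini analysis.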
Key Components
| Layer | Technology | Purpose |
|---|---|---|
| Fast Filter | RoBERTa (ONNX), EfficientNet (ONNX) | Sub-200ms first-pass classification |
| Risk Scoring | Custom weighted engine | Composite score with category weights + user history |
| Routing | LangGraph state machine | Conditional deep analysis for HIGH-risk content only |
| Deep Analysis | CLIP + Gemini (LangChain) | Multimodal alignment + LLM contextual reasoning |
| Decision Engine | Rule-based system | ALLOWED / WARNING / BLOCKED with escalation logic |
| Observability | LangSmith + structlog | Full pipeline tracing and structured logging |
| Storage | MongoDB (motor) + Redis | Moderation logs, user history, result caching |
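To make the "composite score with category weights + user history" row concrete, here is one possible shape for such a scorer. It is a sketch under stated assumptions: the weights, the history penalty, and the function name `composite_score` are all invented for illustration, and the real `risk_scorer.py` may combine its inputs differently.

```python
from typing import Mapping

# Hypothetical values; the README does not publish the real weights.
CATEGORY_WEIGHTS = {"threat": 1.0, "insult": 0.6, "toxic": 0.5}
DEFAULT_WEIGHT = 0.5
HISTORY_PENALTY_PER_OFFENSE = 5.0
MAX_HISTORY_PENALTY = 20.0


def composite_score(probabilities: Mapping[str, float], prior_offenses: int = 0) -> float:
    """Combine per-category probabilities (0-1) and user history into a 0-100 score."""
    # Take the strongest weighted signal so one severe category is not diluted.
    strongest = max(
        (CATEGORY_WEIGHTS.get(cat, DEFAULT_WEIGHT) * p for cat, p in probabilities.items()),
        default=0.0,
    )
    # Repeat offenders get a bounded bump on top of the content score.
    penalty = min(prior_offenses * HISTORY_PENALTY_PER_OFFENSE, MAX_HISTORY_PENALTY)
    return round(min(100.0 * strongest + penalty, 100.0), 1)
```

Using the maximum weighted probability rather than a sum is a design choice made here for simplicity; a weighted average or a learned combiner would be equally plausible.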
Project Structure
Ai/
├── app/
│   ├── main.py                  # FastAPI app factory + lifespan
│   ├── config.py                # Pydantic Settings (.env)
│   ├── dependencies.py          # FastAPI dependency injection
│   ├── api/                     # API layer
│   │   ├── router.py            # Route aggregator
│   │   ├── v1/
│   │   │   ├── analyze.py       # POST /analyze/text, /image, /video
│   │   │   ├── health.py        # GET /health
│   │   │   └── history.py       # GET /history/{user_id}
│   │   └── schemas/
│   │       ├── requests.py      # Request models
│   │       └── responses.py     # Response models
│   ├── pipeline/                # Core moderation pipeline
│   │   ├── preprocessor.py      # Input normalization
│   │   ├── fast_filter.py       # RoBERTa + EfficientNet inference
│   │   ├── risk_scorer.py       # Composite risk scoring
│   │   ├── deep_analyzer.py     # CLIP + Gemini deep analysis
│   │   ├── decision_engine.py   # Rule-based verdicts
│   │   └── workflow.py          # LangGraph state machine
│   ├── models/                  # ML model management
│   │   ├── model_registry.py    # Singleton model loader
│   │   ├── text_model.py        # RoBERTa (ONNX)
│   │   ├── image_model.py       # EfficientNet (ONNX)
│   │   ├── clip_model.py        # OpenCLIP
│   │   └── onnx_utils.py        # ONNX export/inference
│   ├── services/                # External integrations
│   │   ├── gemini_service.py    # Gemini via LangChain
│   │   ├── mongo_service.py     # MongoDB (async)
│   │   └── redis_service.py     # Redis (async)
│   ├── observability/           # Monitoring
│   │   ├── langsmith.py         # LangSmith tracing
│   │   └── logging.py           # structlog config
│   └── utils/                   # Helpers
│       ├── image_utils.py       # Image preprocessing
│       └── video_utils.py       # Video frame extraction
├── tests/                       # Test suite
├── model_cache/                 # Downloaded models (gitignored)
├── _legacy/                     # Old code (preserved for reference)
├── .env.example                 # Environment template
├── requirements.txt             # Python dependencies
└── README.md                    # This file
Quick Start
1. Setup Python Environment
cd Ai
python -m venv venv
venv\Scripts\activate # Windows
# source venv/bin/activate # macOS/Linux
pip install -r requirements.txt
2. Configure Environment
cp .env.example .env
# Edit .env with your API keys and database URIs
3. Start Services (MongoDB + Redis)
# Using Docker
docker run -d -p 27017:27017 --name hubble-mongo mongo:7
docker run -d -p 6379:6379 --name hubble-redis redis:7-alpine
4. Run the Server
cd Ai
python -m app.main
# Or: uvicorn app.main:app --reload --port 8000
5. Test the API
# Health check
curl http://localhost:8000/health
# Analyze text
curl -X POST http://localhost:8000/api/v1/analyze/text \
-H "Content-Type: application/json" \
-d '{"text": "You are worthless", "user_id": "user123"}'
# Analyze image
curl -X POST http://localhost:8000/api/v1/analyze/image \
-F "file=@test.jpg" -F "user_id=user123"
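The text-analysis call can also be made from Python. The sketch below uses only the standard library and mirrors the curl command above; `BASE_URL`, `build_text_request`, and `analyze_text` are hypothetical names, and the port assumes the local `uvicorn` setup from step 4.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server from the Quick Start


def build_text_request(text: str, user_id: str) -> urllib.request.Request:
    """Build the same POST request that the curl example sends."""
    payload = json.dumps({"text": text, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/analyze/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text: str, user_id: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires a running server)."""
    request = build_text_request(text, user_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect or unit-test without a live server.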
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check with model/service status |
| GET | `/health/ping` | Lightweight liveness probe |
| POST | `/api/v1/analyze/text` | Analyze text for cyberbullying |
| POST | `/api/v1/analyze/image` | Analyze image for harmful content |
| POST | `/api/v1/analyze/video` | Analyze video (frame extraction) |
| GET | `/api/v1/history/{user_id}` | Get moderation history |
| GET | `/api/v1/history/{user_id}/summary` | Get aggregated user stats |
Response Schema
All /analyze/* endpoints return a unified AnalysisResponse:
{
  "request_id": "req_abc123",
  "input_type": "text",
  "status": "WARNING",
  "risk_level": "MEDIUM",
  "risk_score": 45.2,
  "categories": ["insult", "toxic"],
  "confidence": 0.82,
  "decision": {
    "action": "WARNING",
    "reason": "Content flagged as potentially harmful",
    "severity": "medium",
    "should_alert_parent": false
  },
  "processing_time_ms": 156,
  "cached": false
}
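On the client side, a response like the one above could be loaded into typed objects. This is a sketch using standard-library dataclasses rather than the project's own Pydantic models in `responses.py`; the class names `Decision` and `AnalysisResult` and the helper `parse_analysis` are hypothetical, and the field types are inferred from the single sample shown.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    action: str
    reason: str
    severity: str
    should_alert_parent: bool


@dataclass(frozen=True)
class AnalysisResult:
    request_id: str
    input_type: str
    status: str
    risk_level: str
    risk_score: float
    categories: list[str]
    confidence: float
    decision: Decision
    processing_time_ms: float
    cached: bool


def parse_analysis(raw: str) -> AnalysisResult:
    """Parse a JSON response body; unknown or missing fields raise TypeError."""
    data = json.loads(raw)
    decision = Decision(**data.pop("decision"))
    return AnalysisResult(decision=decision, **data)
```

Dataclasses do not validate types at runtime, so this only checks that the expected field names are present; stricter validation would need a schema library.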
Running Tests
cd Ai
python -m pytest tests/ -v
Models Used
| Model | Purpose | Size | Backend |
|---|---|---|---|
| `unitary/toxic-bert` | Text toxicity (6 labels) | ~450 MB | ONNX |
| `google/efficientnet-b0` | Image classification | ~20 MB | ONNX |
| `openai/clip-vit-base-patch32` | Multimodal alignment | ~600 MB | PyTorch |
| `gemini-2.0-flash` | Deep contextual reasoning | Cloud API | LangChain |
Security Notes
- API keys loaded from `.env` only (never hardcoded)
- CORS restricted in production mode
- User data isolated by `user_id`
- All moderation events logged for audit
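As an illustration of the "never hardcoded" rule, a fail-fast lookup for required settings might look like the sketch below. The project itself uses Pydantic Settings in `config.py`; this standard-library version, along with the names `require_setting` and `MissingSettingError` and the example variable `GEMINI_API_KEY`, is hypothetical.

```python
import os
from typing import Mapping, Optional


class MissingSettingError(RuntimeError):
    """Raised when a required environment variable is absent or blank."""


def require_setting(name: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Return a required setting from the environment, or fail loudly at startup."""
    source = os.environ if env is None else env
    value = source.get(name, "").strip()
    if not value:
        # Report only the variable name, never a secret value.
        raise MissingSettingError(f"{name} is not set; add it to .env")
    return value
```

For example, `require_setting("GEMINI_API_KEY")` would raise at startup if the key is missing, rather than failing later inside a request.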
Built for the National Hackathon - SentinelAI Project