---
title: Hubble AI Engine
emoji: 🔍
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
pinned: false
---

# Hubble AI Engine: Cyberbullying Detection Pipeline

A production-grade, layered AI content moderation system for detecting cyberbullying across **text**, **image**, and **video** inputs. Designed as a universal safety tool for social media platforms.

---

## 🏗️ Architecture Overview

```
User Input (text/image/video)
  → Preprocessing (normalization, frame extraction)
  → Fast AI Filter (RoBERTa text, EfficientNet image) - ONNX optimized
  → Risk Scoring Engine (0-100 composite score)
  → LangGraph Router
      ├─ LOW (0-30)     → Allow ✅
      ├─ MEDIUM (31-65) → Warning ⚠️
      └─ HIGH (66-100)  → Deep Analysis 🔴
           ├─ CLIP multimodal alignment
           ├─ Gemini reasoning (via LangChain)
           └─ Final verdict
  → Decision Engine (severity + user history + rules)
  → Response + Logging
```

### Key Components

| Layer | Technology | Purpose |
|-------|-----------|---------|
| **Fast Filter** | RoBERTa (ONNX), EfficientNet (ONNX) | Sub-200ms first-pass classification |
| **Risk Scoring** | Custom weighted engine | Composite score with category weights + user history |
| **Routing** | LangGraph state machine | Conditional deep analysis for HIGH-risk content only |
| **Deep Analysis** | CLIP + Gemini (LangChain) | Multimodal alignment + LLM contextual reasoning |
| **Decision Engine** | Rule-based system | ALLOWED / WARNING / BLOCKED with escalation logic |
| **Observability** | LangSmith + structlog | Full pipeline tracing and structured logging |
| **Storage** | MongoDB (motor) + Redis | Moderation logs, user history, result caching |

---

## 📁 Project Structure

```
Ai/
├── app/
│   ├── main.py                  # FastAPI app factory + lifespan
│   ├── config.py                # Pydantic Settings (.env)
│   ├── dependencies.py          # FastAPI dependency injection
│   ├── api/                     # API layer
│   │   ├── router.py            # Route aggregator
│   │   ├── v1/
│   │   │   ├── analyze.py       # POST /analyze/text, /image, /video
│   │   │   ├── health.py        # GET /health
│   │   │   └── history.py       # GET /history/{user_id}
│   │   └── schemas/
│   │       ├── requests.py      # Request models
│   │       └── responses.py     # Response models
│   ├── pipeline/                # Core moderation pipeline
│   │   ├── preprocessor.py      # Input normalization
│   │   ├── fast_filter.py       # RoBERTa + EfficientNet inference
│   │   ├── risk_scorer.py       # Composite risk scoring
│   │   ├── deep_analyzer.py     # CLIP + Gemini deep analysis
│   │   ├── decision_engine.py   # Rule-based verdicts
│   │   └── workflow.py          # LangGraph state machine
│   ├── models/                  # ML model management
│   │   ├── model_registry.py    # Singleton model loader
│   │   ├── text_model.py        # RoBERTa (ONNX)
│   │   ├── image_model.py       # EfficientNet (ONNX)
│   │   ├── clip_model.py        # OpenCLIP
│   │   └── onnx_utils.py        # ONNX export/inference
│   ├── services/                # External integrations
│   │   ├── gemini_service.py    # Gemini via LangChain
│   │   ├── mongo_service.py     # MongoDB (async)
│   │   └── redis_service.py     # Redis (async)
│   ├── observability/           # Monitoring
│   │   ├── langsmith.py         # LangSmith tracing
│   │   └── logging.py           # structlog config
│   └── utils/                   # Helpers
│       ├── image_utils.py       # Image preprocessing
│       └── video_utils.py       # Video frame extraction
├── tests/                       # Test suite
├── model_cache/                 # Downloaded models (gitignored)
├── _legacy/                     # Old code (preserved for reference)
├── .env.example                 # Environment template
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```

---

## 🚀 Quick Start

### 1. Set Up Python Environment

```bash
cd Ai
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # macOS/Linux
pip install -r requirements.txt
```

### 2. Configure Environment

```bash
cp .env.example .env
# Edit .env with your API keys and database URIs
```

### 3. Start Services (MongoDB + Redis)

```bash
# Using Docker
docker run -d -p 27017:27017 --name hubble-mongo mongo:7
docker run -d -p 6379:6379 --name hubble-redis redis:7-alpine
```

### 4. Run the Server

```bash
cd Ai
python -m app.main
# Or: uvicorn app.main:app --reload --port 8000
```

### 5. Test the API

```bash
# Health check
curl http://localhost:8000/health

# Analyze text
curl -X POST http://localhost:8000/api/v1/analyze/text \
  -H "Content-Type: application/json" \
  -d '{"text": "You are worthless", "user_id": "user123"}'

# Analyze image
curl -X POST http://localhost:8000/api/v1/analyze/image \
  -F "file=@test.jpg" -F "user_id=user123"
```

---

## 📡 API Endpoints

| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/health` | Health check with model/service status |
| `GET` | `/health/ping` | Lightweight liveness probe |
| `POST` | `/api/v1/analyze/text` | Analyze text for cyberbullying |
| `POST` | `/api/v1/analyze/image` | Analyze image for harmful content |
| `POST` | `/api/v1/analyze/video` | Analyze video (frame extraction) |
| `GET` | `/api/v1/history/{user_id}` | Get moderation history |
| `GET` | `/api/v1/history/{user_id}/summary` | Get aggregated user stats |

### Response Schema

All `/analyze/*` endpoints return a unified `AnalysisResponse`:

```json
{
  "request_id": "req_abc123",
  "input_type": "text",
  "status": "WARNING",
  "risk_level": "MEDIUM",
  "risk_score": 45.2,
  "categories": ["insult", "toxic"],
  "confidence": 0.82,
  "decision": {
    "action": "WARNING",
    "reason": "Content flagged as potentially harmful",
    "severity": "medium",
    "should_alert_parent": false
  },
  "processing_time_ms": 156,
  "cached": false
}
```

---

## 🧪 Running Tests

```bash
cd Ai
python -m pytest tests/ -v
```

---

## 📊 Models Used

| Model | Purpose | Size | Backend |
|-------|---------|------|---------|
| `unitary/toxic-bert` | Text toxicity (6 labels) | ~450 MB | ONNX |
| `google/efficientnet-b0` | Image classification | ~20 MB | ONNX |
| `openai/clip-vit-base-patch32` | Multimodal alignment | ~600 MB | PyTorch |
| `gemini-2.0-flash` | Deep contextual reasoning | Cloud API | LangChain |

---

## 🔒 Security Notes

- API keys are loaded from `.env` only (never hardcoded)
- CORS is restricted in production mode
- User data is isolated by `user_id`
- All moderation events are logged for audit

---

Built for the **National Hackathon - SentinelAI Project**
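
The routing bands from the Architecture Overview (LOW 0-30, MEDIUM 31-65, HIGH 66-100) can be sketched as a small pure function. This is a minimal illustration, not the code in `app/pipeline/workflow.py`: the names `RiskLevel`, `risk_level`, and `next_step` are hypothetical, and placing fractional scores such as 30.5 in the higher band is an assumption, since the documented bands are integer ranges.

```python
from enum import Enum


class RiskLevel(str, Enum):
    """Risk bands named in the Architecture Overview."""

    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


# Upper bounds of the LOW and MEDIUM bands, taken from the overview diagram.
LOW_MAX = 30.0
MEDIUM_MAX = 65.0

# Hypothetical step names for what the router does with each band.
NEXT_STEP = {
    RiskLevel.LOW: "allow",
    RiskLevel.MEDIUM: "warning",
    RiskLevel.HIGH: "deep_analysis",
}


def risk_level(score: float) -> RiskLevel:
    """Map a 0-100 composite score to a risk band.

    Assumption: a fractional score that falls between two documented
    integer bands (for example 30.5) is placed in the higher band.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"risk score must be within 0-100, got {score}")
    if score <= LOW_MAX:
        return RiskLevel.LOW
    if score <= MEDIUM_MAX:
        return RiskLevel.MEDIUM
    return RiskLevel.HIGH


def next_step(score: float) -> str:
    """Return the hypothetical router step for a composite score."""
    return NEXT_STEP[risk_level(score)]


if __name__ == "__main__":
    # The sample response above has risk_score 45.2 and risk_level MEDIUM.
    print(risk_level(45.2).value, next_step(45.2))  # MEDIUM warning
```

Keeping the band lookup separate from the router makes the thresholds easy to unit test and to tune without touching the state machine; only HIGH scores would then trigger the slower CLIP and Gemini stage.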