| title: Hubble AI Engine | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| app_port: 7860 | |
| pinned: false | |
| # Hubble AI Engine: Cyberbullying Detection Pipeline | |
| A production-grade, layered AI content moderation system for detecting cyberbullying across **text**, **image**, and **video** inputs. Designed as a universal safety tool for social media platforms. | |
| --- | |
| ## Architecture Overview | |
| ``` | |
| User Input (text/image/video) | |
|   ↓ Preprocessing (normalization, frame extraction) | |
|   ↓ Fast AI Filter (RoBERTa text, EfficientNet image), ONNX optimized | |
|   ↓ Risk Scoring Engine (0-100 composite score) | |
|   ↓ LangGraph Router | |
|       ├─ LOW (0-30) → Allow | |
|       ├─ MEDIUM (31-65) → Warning | |
|       └─ HIGH (66-100) → Deep Analysis | |
|             ├─ CLIP multimodal alignment | |
|             ├─ Gemini reasoning (via LangChain) | |
|             └─ Final verdict | |
|   ↓ Decision Engine (severity + user history + rules) | |
|   ↓ Response + Logging | |
| ``` | |
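The routing bands in the diagram can be sketched in a few lines of Python. This is an illustration only, not the contents of `workflow.py`: the names `RiskLevel`, `classify`, and `next_step` are hypothetical. The diagram lists integer bands while `risk_score` can be fractional, so the sketch assumes that anything above 30 is MEDIUM and anything above 65 is HIGH.

```python
from enum import Enum


class RiskLevel(str, Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


def classify(score: float) -> RiskLevel:
    """Map a 0-100 composite score to a risk band.

    The handling of fractional scores between bands (e.g. 30.5) is an
    assumption; the diagram only gives integer ranges.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"score out of range: {score}")
    if score <= 30:
        return RiskLevel.LOW
    if score <= 65:
        return RiskLevel.MEDIUM
    return RiskLevel.HIGH


def next_step(score: float) -> str:
    """Return the (hypothetical) branch name the router would take."""
    branches = {
        RiskLevel.LOW: "allow",
        RiskLevel.MEDIUM: "warning",
        RiskLevel.HIGH: "deep_analysis",
    }
    return branches[classify(score)]
```

Only the HIGH branch leads to the CLIP and Gemini stage, which keeps the expensive calls off the common path.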
| ### Key Components | |
| | Layer | Technology | Purpose | | |
| |-------|-----------|---------| | |
| | **Fast Filter** | RoBERTa (ONNX), EfficientNet (ONNX) | Sub-200ms first-pass classification | | |
| | **Risk Scoring** | Custom weighted engine | Composite score with category weights + user history | | |
| | **Routing** | LangGraph state machine | Conditional deep analysis for HIGH-risk content only | | |
| | **Deep Analysis** | CLIP + Gemini (LangChain) | Multimodal alignment + LLM contextual reasoning | | |
| | **Decision Engine** | Rule-based system | ALLOWED / WARNING / BLOCKED with escalation logic | | |
| | **Observability** | LangSmith + structlog | Full pipeline tracing and structured logging | | |
| | **Storage** | MongoDB (motor) + Redis | Moderation logs, user history, result caching | | |
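As a rough illustration of what "composite score with category weights + user history" could mean, the sketch below takes the highest weighted category probability and adds a capped penalty for prior violations. The formula, the weights, the penalty size, and the function name `composite_score` are all assumptions made for this example; the actual logic lives in `risk_scorer.py` and may differ.

```python
from typing import Mapping


def composite_score(
    category_probs: Mapping[str, float],
    weights: Mapping[str, float],
    prior_violations: int = 0,
    penalty_per_violation: float = 5.0,  # assumed value
    max_penalty: float = 20.0,           # assumed value
) -> float:
    """Return a 0-100 score from per-category probabilities.

    Categories without an explicit weight default to 1.0, so unknown
    labels are treated as fully severe rather than ignored.
    """
    if not category_probs:
        return 0.0
    # Base score: the single worst weighted category, scaled to 0-100.
    base = 100.0 * max(p * weights.get(c, 1.0) for c, p in category_probs.items())
    # History adjustment: repeat offenders score higher, up to a cap.
    penalty = min(prior_violations * penalty_per_violation, max_penalty)
    return round(min(100.0, base + penalty), 1)
```

Using the maximum rather than an average means one strongly flagged category is not diluted by several benign ones.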
| --- | |
| ## Project Structure | |
| ``` | |
| Ai/ | |
| ├── app/ | |
| │   ├── main.py # FastAPI app factory + lifespan | |
| │   ├── config.py # Pydantic Settings (.env) | |
| │   ├── dependencies.py # FastAPI dependency injection | |
| │   ├── api/ # API layer | |
| │   │   ├── router.py # Route aggregator | |
| │   │   ├── v1/ | |
| │   │   │   ├── analyze.py # POST /analyze/text, /image, /video | |
| │   │   │   ├── health.py # GET /health | |
| │   │   │   └── history.py # GET /history/{user_id} | |
| │   │   └── schemas/ | |
| │   │       ├── requests.py # Request models | |
| │   │       └── responses.py # Response models | |
| │   ├── pipeline/ # Core moderation pipeline | |
| │   │   ├── preprocessor.py # Input normalization | |
| │   │   ├── fast_filter.py # RoBERTa + EfficientNet inference | |
| │   │   ├── risk_scorer.py # Composite risk scoring | |
| │   │   ├── deep_analyzer.py # CLIP + Gemini deep analysis | |
| │   │   ├── decision_engine.py # Rule-based verdicts | |
| │   │   └── workflow.py # LangGraph state machine | |
| │   ├── models/ # ML model management | |
| │   │   ├── model_registry.py # Singleton model loader | |
| │   │   ├── text_model.py # RoBERTa (ONNX) | |
| │   │   ├── image_model.py # EfficientNet (ONNX) | |
| │   │   ├── clip_model.py # OpenCLIP | |
| │   │   └── onnx_utils.py # ONNX export/inference | |
| │   ├── services/ # External integrations | |
| │   │   ├── gemini_service.py # Gemini via LangChain | |
| │   │   ├── mongo_service.py # MongoDB (async) | |
| │   │   └── redis_service.py # Redis (async) | |
| │   ├── observability/ # Monitoring | |
| │   │   ├── langsmith.py # LangSmith tracing | |
| │   │   └── logging.py # structlog config | |
| │   └── utils/ # Helpers | |
| │       ├── image_utils.py # Image preprocessing | |
| │       └── video_utils.py # Video frame extraction | |
| ├── tests/ # Test suite | |
| ├── model_cache/ # Downloaded models (gitignored) | |
| ├── _legacy/ # Old code (preserved for reference) | |
| ├── .env.example # Environment template | |
| ├── requirements.txt # Python dependencies | |
| └── README.md # This file | |
| ``` | |
| --- | |
| ## Quick Start | |
| ### 1. Setup Python Environment | |
| ```bash | |
| cd Ai | |
| python -m venv venv | |
| venv\Scripts\activate # Windows | |
| # source venv/bin/activate # macOS/Linux | |
| pip install -r requirements.txt | |
| ``` | |
| ### 2. Configure Environment | |
| ```bash | |
| cp .env.example .env | |
| # Edit .env with your API keys and database URIs | |
| ``` | |
| ### 3. Start Services (MongoDB + Redis) | |
| ```bash | |
| # Using Docker | |
| docker run -d -p 27017:27017 --name hubble-mongo mongo:7 | |
| docker run -d -p 6379:6379 --name hubble-redis redis:7-alpine | |
| ``` | |
| ### 4. Run the Server | |
| ```bash | |
| cd Ai | |
| python -m app.main | |
| # Or: uvicorn app.main:app --reload --port 8000 | |
| ``` | |
| ### 5. Test the API | |
| ```bash | |
| # Health check | |
| curl http://localhost:8000/health | |
| # Analyze text | |
| curl -X POST http://localhost:8000/api/v1/analyze/text \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"text": "You are worthless", "user_id": "user123"}' | |
| # Analyze image | |
| curl -X POST http://localhost:8000/api/v1/analyze/image \ | |
| -F "file=@test.jpg" -F "user_id=user123" | |
| ``` | |
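The text-analysis `curl` call above can also be made from Python using only the standard library. This is a minimal client sketch, assuming the server is running locally on port 8000 as in step 4; the helper names `build_text_request` and `analyze_text` are made up for this example and are not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: local uvicorn instance from step 4


def build_text_request(
    text: str, user_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request for /api/v1/analyze/text without sending it."""
    payload = json.dumps({"text": text, "user_id": user_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/v1/analyze/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze_text(text: str, user_id: str, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON body.

    Raises urllib.error.URLError if the server is not reachable.
    """
    request = build_text_request(text, user_id)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `analyze_text("You are worthless", "user123")` should return a dictionary shaped like the response schema documented in this README.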
| --- | |
| ## API Endpoints | |
| | Method | Endpoint | Description | | |
| |--------|----------|-------------| | |
| | `GET` | `/health` | Health check with model/service status | | |
| | `GET` | `/health/ping` | Lightweight liveness probe | | |
| | `POST` | `/api/v1/analyze/text` | Analyze text for cyberbullying | | |
| | `POST` | `/api/v1/analyze/image` | Analyze image for harmful content | | |
| | `POST` | `/api/v1/analyze/video` | Analyze video (frame extraction) | | |
| | `GET` | `/api/v1/history/{user_id}` | Get moderation history | | |
| | `GET` | `/api/v1/history/{user_id}/summary` | Get aggregated user stats | | |
| ### Response Schema | |
| All `/analyze/*` endpoints return a unified `AnalysisResponse`: | |
| ```json | |
| { | |
| "request_id": "req_abc123", | |
| "input_type": "text", | |
| "status": "WARNING", | |
| "risk_level": "MEDIUM", | |
| "risk_score": 45.2, | |
| "categories": ["insult", "toxic"], | |
| "confidence": 0.82, | |
| "decision": { | |
| "action": "WARNING", | |
| "reason": "Content flagged as potentially harmful", | |
| "severity": "medium", | |
| "should_alert_parent": false | |
| }, | |
| "processing_time_ms": 156, | |
| "cached": false | |
| } | |
| ``` | |
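On the client side, the JSON above can be loaded into typed objects. The sketch below mirrors the fields shown in the sample response using standard-library dataclasses; the class names `Decision` and `AnalysisResult` and the helper `parse_analysis` are hypothetical and are not the project's Pydantic models in `responses.py`.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    action: str
    reason: str
    severity: str
    should_alert_parent: bool


@dataclass(frozen=True)
class AnalysisResult:
    request_id: str
    input_type: str
    status: str
    risk_level: str
    risk_score: float
    categories: List[str]
    confidence: float
    decision: Decision
    processing_time_ms: int
    cached: bool


def parse_analysis(raw: str) -> AnalysisResult:
    """Parse a response body into an AnalysisResult.

    Strict by design: a missing or unexpected top-level field raises
    TypeError, and a missing "decision" object raises KeyError.
    """
    data = json.loads(raw)
    decision = Decision(**data.pop("decision"))
    return AnalysisResult(decision=decision, **data)
```

A stricter client might also validate that `status` is one of `ALLOWED`, `WARNING`, or `BLOCKED` before acting on it.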
| --- | |
| ## Running Tests | |
| ```bash | |
| cd Ai | |
| python -m pytest tests/ -v | |
| ``` | |
| --- | |
| ## Models Used | |
| | Model | Purpose | Size | Backend | | |
| |-------|---------|------|---------| | |
| | `unitary/toxic-bert` | Text toxicity (6 labels) | ~450 MB | ONNX | | |
| | `google/efficientnet-b0` | Image classification | ~20 MB | ONNX | | |
| | `openai/clip-vit-base-patch32` | Multimodal alignment | ~600 MB | PyTorch | | |
| | `gemini-2.0-flash` | Deep contextual reasoning | Cloud API | LangChain | | |
| --- | |
| ## Security Notes | |
| - API keys loaded from `.env` only (never hardcoded) | |
| - CORS restricted in production mode | |
| - User data isolated by `user_id` | |
| - All moderation events logged for audit | |
| --- | |
| Built for the **National Hackathon - SentinelAI Project**